I want to give application programmers using pack(identity) the ability to prevent users from creating obscene usernames.
I have a list of 400 words, but I can imagine it growing to a few thousand.
I want to quickly determine if one of the naughty words is a substring of the user input. So ‘jackisadoofus’ is not OK, because doofus is on my list.
Yes, there are lots of OTHER issues with doing this - I’m aware. My question is just an implementation one.
So, one way would be some sort of prefix tree.
Another would be to simply assert all the bad words as facts, then generate every substring of the input up to the length of the longest naughty word, and look each one up against the facts.
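To make the second idea concrete, here is a minimal sketch, assuming SWI-Prolog and assuming the words are stored as atoms. The names `bad_word/1`, `max_bad_length/1` and `contains_bad_word/2` are made up for illustration and are not part of pack(identity), and the two sample words stand in for the real list.

```prolog
:- use_module(library(aggregate)).
:- dynamic bad_word/1.

% Hypothetical sample entries; the real list would hold the ~400 words.
bad_word(doofus).
bad_word(nitwit).

% max_bad_length(-Max): length of the longest listed word.
max_bad_length(Max) :-
    aggregate_all(max(L), (bad_word(W), atom_length(W, L)), Max).

% contains_bad_word(+Input, -Word): Word is a listed word that occurs
% somewhere inside Input. Enumerates substrings of length 1..Max and
% looks each one up as a fact.
contains_bad_word(Input, Word) :-
    max_bad_length(Max),
    between(1, Max, Len),
    sub_atom(Input, _, Len, _, Word),
    bad_word(Word).
```

With these facts, `contains_bad_word(jackisadoofus, W)` gives `W = doofus`, and `contains_bad_word(jackisnice, W)` fails. Each lookup calls `bad_word/1` with a bound first argument, so it can use first-argument clause indexing rather than scanning the whole list.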
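It might make sense to also assert, for each bad word, its prefix of length N, where N is the length of the shortest bad word. Then, at each position in the input, I could first check whether the N characters starting there are a known prefix before trying a full match.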
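A rough sketch of that prefix filter, again assuming SWI-Prolog and atoms. All predicate names here (`bad_prefix/2`, `min_bad_length/1`, `build_prefixes/0`, `contains_bad_word_filtered/2`) are hypothetical, and storing the full word next to its prefix is my own assumption about how the filter would hand candidates to the full match.

```prolog
:- use_module(library(aggregate)).
:- dynamic bad_word/1, bad_prefix/2, min_bad_length/1.

% Hypothetical sample entries.
bad_word(doofus).
bad_word(nitwit).
bad_word(dolt).

% build_prefixes: record the shortest word length Min, and for every
% word a fact bad_prefix(Prefix, Word) holding its first Min characters.
build_prefixes :-
    retractall(bad_prefix(_, _)),
    retractall(min_bad_length(_)),
    aggregate_all(min(L), (bad_word(W), atom_length(W, L)), Min),
    assertz(min_bad_length(Min)),
    forall(bad_word(W),
           ( sub_atom(W, 0, Min, _, Prefix),
             assertz(bad_prefix(Prefix, W))
           )).

% contains_bad_word_filtered(+Input, -Word): at each position, look up
% the Min-character window as a prefix, and only then check whether a
% candidate word really starts at that position.
contains_bad_word_filtered(Input, Word) :-
    min_bad_length(Min),
    sub_atom(Input, Start, Min, _, Prefix),
    bad_prefix(Prefix, Word),
    atom_length(Word, Len),
    sub_atom(Input, Start, Len, _, Word).
```

After calling `build_prefixes` once, `contains_bad_word_filtered(jackisadoofus, W)` gives `W = doofus`. This does one prefix lookup per input position instead of one lookup per substring, at the cost of rebuilding the prefix facts whenever the word list changes.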
Anyway, how fast is lookup against a long list of facts like this?
Should I convert everything to atoms?