?- tokenize_atom('The big dog.', R).
R = ['The', big, dog, '.'].
Is there a similar predicate that lowercases each token first, so that you end up with a list of unquoted atoms, at least for the words?
For example:
?- tokenize_atom_hypothetical('The big dog.', R).
R = [the, big, dog, '.'].
Actually, I cannot think of an example where your solution behaves differently from tokenizing first and then downcasing. Are there Unicode characters that would break this if you downcase first?
I think that shouldn’t be the case. SWI-Prolog’s native Unicode handling is incomplete: it can merely pass on Unicode code points (on Windows limited to 16 bits) and do simple one-character classification and case folding. Because case folding works one character at a time, downcasing should not change where the token boundaries fall. The SWI-Prolog Unicode library provides more advanced features.
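
To make the two orderings discussed above concrete, here is a minimal sketch. The predicate names `downcase_then_tokenize/2`, `tokenize_then_downcase/2` and `downcase_token/2` are made up for illustration. The sketch assumes `tokenize_atom/2` comes from `library(porter_stem)` and that the built-in `downcase_atom/2` is an acceptable way to fold case; it is not meant as the definitive answer.

```prolog
:- use_module(library(porter_stem)).   % exports tokenize_atom/2
:- use_module(library(apply)).         % maplist/3

% Hypothetical name: downcase the whole text, then tokenize.
downcase_then_tokenize(Text, Tokens) :-
    downcase_atom(Text, Lower),
    tokenize_atom(Lower, Tokens).

% Hypothetical name: tokenize first, then downcase each token.
tokenize_then_downcase(Text, Tokens) :-
    tokenize_atom(Text, Raw),
    maplist(downcase_token, Raw, Tokens).

% tokenize_atom/2 yields numbers for numeric tokens; leave those
% untouched and only downcase the atoms.
downcase_token(Token, Lower) :-
    (   atom(Token)
    ->  downcase_atom(Token, Lower)
    ;   Lower = Token
    ).
```

For the example in the question, both predicates should give `R = [the, big, dog, '.']`. Comparing their results on your own data is a cheap way to check whether the order ever matters for the characters you care about.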