Tokenize_atom equivalent that forces everything to lowercase atoms?

prodog · August 9, 2019, 4:08am

tokenize_atom/2 returns any token in a sentence that starts with an upper case letter as a quoted atom:

https://www.swi-prolog.org/pldoc/doc_for?object=tokenize_atom/2

34 ?- tokenize_atom('The big dog.', R).
R = ['The', big, dog, '.'].

Is there a similar predicate that lower-cases each token first, so that you end up with a list of unquoted atoms, at least for the words?

For example:

34 ?- tokenize_atom_hypothetical('The big dog.', R).
R = [the, big, dog, '.'].

Boris · August 9, 2019, 6:07am

Even if there isn’t, it seems easy enough to do it:

tokenize_atom_and_downcase(A, Ts) :-
    tokenize_atom(A, Any_case),
    maplist(downcase_atom, Any_case, Ts).

With this:

?- tokenize_atom_and_downcase('The big dog ATE MY CAT!', Ts).
Ts = [the, big, dog, ate, my, cat, !].
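One caveat worth checking: tokenize_atom/2 returns numeric tokens as numbers, and as far as I can tell downcase_atom/2 converts its first argument to an atom, so mapping it over the token list may turn 3 into '3'. A minimal sketch that leaves numbers untouched, assuming that behaviour (the names downcase_token/2 and tokenize_atom_keep_numbers/2 are made up for illustration):

```prolog
:- use_module(library(porter_stem)).  % provides tokenize_atom/2
:- use_module(library(apply)).        % provides maplist/3

% downcase_token(+Token, -Lower): hypothetical helper.
% Numbers pass through unchanged; everything else is lower-cased.
downcase_token(Token, Token) :-
    number(Token),
    !.
downcase_token(Token, Lower) :-
    downcase_atom(Token, Lower).

% tokenize_atom_keep_numbers(+Atom, -Tokens): hypothetical name.
% Tokenize first, then lower-case only the non-numeric tokens.
tokenize_atom_keep_numbers(Atom, Tokens) :-
    tokenize_atom(Atom, AnyCase),
    maplist(downcase_token, AnyCase, Tokens).
```

If the tokens are only ever compared as text, the plain maplist(downcase_atom, ...) version above is simpler and probably sufficient.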

CapelliC · August 9, 2019, 6:11am

Of course, if losing information is not a problem:

?- downcase_atom('The big dog.',L),tokenize_atom(L,T).
L = 'the big dog.',
T = [the, big, dog, '.'].

Boris · August 9, 2019, 6:19am

Actually, I cannot think of an example where your solution behaves differently from first tokenizing and then downcasing. Are there Unicode characters that would break this if you downcase them first?

jan · August 9, 2019, 6:53am

I think that shouldn’t be the case. SWI-Prolog’s native Unicode handling is incomplete. It can merely pass on Unicode code points (on Windows limited to 16 bits) and do simple one-character classification and case folding. The SWI-Prolog Unicode library provides more advanced features.

CapelliC · August 9, 2019, 7:41am

SQL is case insensitive, but quoted data should preserve its casing, which my solution loses. Yours works better in this respect.

Boris · August 9, 2019, 10:17am

I see (I think). So you mean that there can be additional logic after tokenizing but before downcasing. Fair point.
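To make the "additional logic" concrete, here is a rough sketch of a pass over an already tokenized list that lower-cases tokens outside double quotes and leaves tokens between quotes alone. It assumes a double quote shows up in the list as the separate atom '"', which should be verified against what tokenize_atom/2 actually produces for your input; downcase_unquoted/2 and flip/2 are made-up names.

```prolog
% downcase_unquoted(+Tokens, -Result): hypothetical helper.
% Lower-cases tokens outside double quotes, keeps quoted ones as they are.
downcase_unquoted(Tokens, Result) :-
    downcase_unquoted(Tokens, outside, Result).

downcase_unquoted([], _, []).
downcase_unquoted(['"'|Ts], State0, ['"'|Rs]) :-
    !,
    flip(State0, State),              % a quote toggles inside/outside
    downcase_unquoted(Ts, State, Rs).
downcase_unquoted([T|Ts], outside, [L|Rs]) :-
    !,
    downcase_atom(T, L),              % outside quotes: fold case
    downcase_unquoted(Ts, outside, Rs).
downcase_unquoted([T|Ts], inside, [T|Rs]) :-
    downcase_unquoted(Ts, inside, Rs). % inside quotes: keep casing

flip(outside, inside).
flip(inside, outside).
```

This only works when tokenizing comes first: once the whole input has been downcased, the original casing of the quoted part is gone.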
