Many thanks for those pointers, Jan.
After getting to grips with the various ways of iterating in Prolog, which I described in "Six ways to iterate in Prolog", I've realised a DCG is the most elegant approach.
I've also realised that a parsing example I took from The Art of Prolog is overcomplicated and could be boiled down to one DCG (something I'll tackle next).
Here’s the simplified tokeniser I’ve settled on.
:- module(tokeniser, [string_tokens/2]).
%% string_tokens(+String, -Tokens) is semidet.
string_tokens(String, Tokens) :-
    string_chars(String, Chars),
    phrase(tokens(Tokens), Chars).
% Definite Clause Grammar (DCG)
tokens([Token|Tokens]) --> ws, token(Token), ws, !, tokens(Tokens).
tokens([]) --> [].
ws --> [W], { char_type(W, space) }, ws.
ws --> [].
token(P) --> [C], { char_type(C, punct), \+ char_type(C, quote), string_chars(P, [C]) }.
token(Q) --> quote(Cs), { string_chars(Q, Cs) }.
token(W) --> word(Cs), { string_chars(W, Cs) }.
quote([Quote|Ls]) --> [Quote], { char_type(Quote, quote) }, quote_rest(Quote, Ls).
quote_rest(Quote, [L|Ls]) --> [L], { L \= Quote }, quote_rest(Quote, Ls).
quote_rest(Quote, [Quote]) --> [Quote].
word([L|Ls]) --> [L], { char_type(L, alnum) }, word_rest(Ls).
word_rest([L|Ls]) --> [L], { char_type(L, alnum) }, word_rest(Ls).
word_rest([]) --> [].
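One edge case that may be worth checking: the base case `tokens([]) --> []` consumes nothing, so input that contains only whitespace leaves characters unparsed and `phrase/2` fails. Below is a minimal sketch of that behaviour in a cut-down grammar (words and whitespace only, tokens left as character lists rather than strings). The names `toks_a//1` and `toks_b//1` are hypothetical, made up for this illustration. `toks_b//1` shows one possible tweak, assuming you want whitespace-only input to give an empty token list: let the base case absorb any leftover whitespace.

```prolog
% Hypothetical miniature of the tokeniser grammar: words and whitespace only.
% toks_a//1 mirrors the base case used above (consumes nothing).
toks_a([T|Ts]) --> ws, word(T), ws, !, toks_a(Ts).
toks_a([])     --> [].

% toks_b//1 is the same, except the base case swallows leftover whitespace.
toks_b([T|Ts]) --> ws, word(T), ws, !, toks_b(Ts).
toks_b([])     --> ws.

% Zero or more whitespace characters, longest match tried first.
ws --> [W], { char_type(W, space) }, ws.
ws --> [].

% One or more alphanumeric characters, longest match tried first.
word([L|Ls])      --> [L], { char_type(L, alnum) }, word_rest(Ls).
word_rest([L|Ls]) --> [L], { char_type(L, alnum) }, word_rest(Ls).
word_rest([])     --> [].

% ?- string_chars("   ", Cs), phrase(toks_a(Ts), Cs).
% false.
%
% ?- string_chars("   ", Cs), phrase(toks_b(Ts), Cs).
% Ts = [] (Cs is bound to the three space characters).
%
% ?- string_chars(" ab cd ", Cs), phrase(toks_b(Ts), Cs).
% Ts = [[a, b], [c, d]].
```

Input with text in it behaves the same in both versions, because the `ws` after each token already eats trailing whitespace. Only the all-whitespace case differs.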