split_string: the Swiss Army knife of string goodness. Removing encapsulating quotes

Problem: Given a string that may or may not have " or ' at either end, in unknown quantities, for example:

'const char'
"const char"
"'const char*'"

How does one remove them all? I initially started thinking about lists, how to match the first and last elements and so on, and then, somewhere, split_string crept into the back of my mind.

sanitise_string(In, Out) :-
    split_string(In, "\"'", "\"'", [Out| _]).

Awesome! I have no idea what magic and sorcery is happening behind the scenes and quite frankly, right, I don’t give a damn! (Very bad paraphrase from Gone With The Wind).

This will break if you have any single or double quote inside the string. For example, try:

?- sanitise_string("That's all, folks", Out).
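For reference, here is a small sketch of what that query does with the definition from the first post. The results in the comments follow from splitting at the apostrophe; nothing beyond the thread's own predicate is assumed.

```prolog
% The definition from the first post, unchanged.
sanitise_string(In, Out) :-
    split_string(In, "\"'", "\"'", [Out|_]).

% The apostrophe is in the separator set, so the string is cut there
% and only the first piece is kept:
%
% ?- split_string("That's all, folks", "\"'", "\"'", Parts).
% Parts = ["That", "s all, folks"].
%
% ?- sanitise_string("That's all, folks", Out).
% Out = "That".
```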

(tumbleweed)… you are right… ah well, back to the drawing board! That’s half the fun of it.

Just leave the second argument empty: then you will not split at all, only strip from the two ends.

split_string(String, "", "\"'", [Sanitized])
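Wrapped up as a predicate, that might look like the sketch below. The name strip_quotes/2 is made up for illustration; the only library call is split_string/4 as used above.

```prolog
% strip_quotes(+In, -Out): hypothetical wrapper name.
% With an empty separator set, split_string/4 returns exactly one
% substring, with every leading and trailing " or ' removed.
strip_quotes(In, Out) :-
    split_string(In, "", "\"'", [Out]).

% ?- strip_quotes("\"'const char*'\"", Out).
% Out = "const char*".
%
% ?- strip_quotes("That's all, folks", Out).
% Out = "That's all, folks".
```

Quotes inside the string are left alone because nothing is treated as a separator.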

Obviously this doesn’t care if the quotes are balanced… I guess you’d have to use a DCG for such fancy stuff.
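If balance does matter, a small DCG along these lines could peel off only matching pairs. This is a rough sketch, not anything from the thread: unquote/2, peel//1, chars//1 and quote_char/1 are all invented names, and it works on a plain character list.

```prolog
% unquote(+In, -Out): strip outer quotes only while they come in
% matching pairs; anything else is returned unchanged.
unquote(In, Out) :-
    string_chars(In, Chars),
    once(phrase(peel(Inner), Chars)),
    string_chars(Out, Inner).

quote_char('"').
quote_char('\'').

% peel(-Inner)//0: if the input starts and ends with the same quote
% character, drop that pair and try again on what is in between.
peel(Inner) -->
    [Q], { quote_char(Q) },
    chars(Middle),
    [Q], \+ [_],            % the closing quote must be the last char
    !,
    { once(phrase(peel(Inner), Middle)) }.
peel(Chars) -->
    chars(Chars).           % no matching pair: keep everything

% chars(-Cs)//0: any sequence of characters, shortest first.
chars([]) --> [].
chars([C|Cs]) --> [C], chars(Cs).

% ?- unquote("\"'const char*'\"", Out).
% Out = "const char*".
%
% ?- unquote("'const char\"", Out).      % mismatched: left alone
% Out = "'const char\"".
```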


Clever! I’ll remember that.

In the end, it was a “code smell” that needed attention. I had blindly taken the PHP code as my guide from version 1. The real solution has been to re-do the AST parsing DCG rules so that the presence of a type after a “/” is formally identified and dealt with.

The format is “VARNAME/TYPE”, and because my tokeniser doesn’t care about whitespace, you can tokenise that as VARNAME/ TYPE, i.e. two tokens. All I had to do was extend the rules, and it all works nicely now. I was even able to delete the cruft that was splitting the token around “/” and then looking for types.

To quote that old TV advert, “One instinctively knows when something is right”.


There’s already an example of using split_string/4 for trimming on the documentation page:

% only remove leading and trailing white space
?- split_string("  SWI-Prolog  ", "", "\s\t\n", L).
L = ["SWI-Prolog"].