term_string('A', "'A'")

Sometimes term_string/2 confuses me, and it causes bugs in my code. One source of this confusion, I think, is that I believed the length of the “atom name” of the “atom” 'A' to be 1. But, as in the example below, the length of X returned by the query term_string('A', X) is 3! Maybe 'A' itself is not an atom, but merely a Prolog term that denotes the unique atom whose name is A. Of course, in practice I am satisfied with the nice property that

term_string(X, Y), term_string(Z, Y)   =>   X == Z.

How do you avoid possible confusion about term_string('A', X)?


?- atom_length('A', X).
X = 1.

?- string_length('A', X).
X = 1.

?- term_string('A', X).
X = "'A'".

?- term_string('A', X), string_length(X, L).
X = "'A'",
L = 3.

?- term_string('a', X).
X = "a".

?- term_string(a, X).
X = "a".

?- term_string("abc", X), term_string(Y, X).
X = "\"abc\"",
Y = "abc".

Perhaps you intended to use atom_string/2 and not term_string/2?

?- Atom='ABC', atom_string(Atom, X), atom_length(Atom, AtomLength), string_length(X, StringLength).
Atom = 'ABC',
X = "ABC",
AtomLength = StringLength, StringLength = 3.

In term_string('A', X), X gets a representation of the term 'A'. There’s an atom_length/2 predicate, but it only works with atoms, not with terms in general. The representation must work with all possible terms, so it needs to add quotes.

Note that term_string/2 works in both directions:

?- term_string('ABC', X).
X = "'ABC'".

?- term_string(Y, "'ABC'").
Y = 'ABC'.

Another way of thinking about it: you’re using term_string(Term, String) and expecting that term_length(Term) should be the same as string_length(String). But there is no term_length/1 – what would it mean in general (and not just for atoms or strings)?


Rather, 'A' is an atom, but since term_string/2 quotes if necessary, the string representation has a length of 3. From the docs:

Term is ‘written’ using the option quoted(true) and the result is converted to String.

Since this term has to be quoted it “increases” in length by exactly 2.

One way to think about it is that the second argument (the string) is just text that could be parsed into a Prolog term. It gets increasingly confusing, however, if you parse what would be a variable in Prolog text:

?- term_string(T, S).
S = "_27118". % conversion still went from term (variable) to string

?- term_string(T, "A").
true. % What happened?

?- term_string(T, "A"), display(T).
_31336 % OK, the text "A" was parsed into a Prolog term, a free variable
true.

?- term_string(T, 'A').
true.

?- term_string(T, 'A'), display(T).
_2788 % atoms are also text...
true.

?- term_string(T, "").
T = end_of_file.

?- term_string(T, '').
T = end_of_file. % okay

So indeed, the right argument is really just text. (But I am now more confused than before)
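
One way to make the variable case less mysterious is to ask the parser which variable names it saw. The following is a minimal sketch, assuming that term_string/3 passes its options on to read_term/2 when it parses text (as the SWI-Prolog documentation describes); text_term_vars/3 is a hypothetical helper name, not a library predicate.

```prolog
% text_term_vars(+Text, -Term, -Bindings)
% Hypothetical helper: parse Text into Term and also return the
% 'Name'=Var pairs for the named variables that occurred in Text.
% Assumes term_string/3 forwards variable_names/1 to read_term/2.
text_term_vars(Text, Term, Bindings) :-
    term_string(Term, Text, [variable_names(Bindings)]).
```

Under that assumption, text_term_vars("A", T, Bs) should bind Bs to a one-element list pairing the name 'A' with T itself, which shows that the text was read as a variable named A and not as the atom 'A'.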

Annoyingly enough, atom_length/2 seems to work with all “atomic” terms, not just atoms :slight_smile:

?- atom_length("string", N).
N = 6.

?- atom_length(0, N).
N = 1.

?- atom_length(42, N).
N = 2.

?- atom_length([], N).
N = 0.

?- term_string([], S), atom_length(S, N).
S = "[]",
N = 2.

?- atom_length(22r7, N).
N = 4.

So the “empty list” is a non-atom atomic with an atom length of 0 that can be stringified, and the atom length of that string is 2 :smiley: but that 2 is not coming from the quotes.

This is too meta for me.

So what happens here really? Is the empty list interpreted as a code list with 0 length, converted to the empty atom, which then has a length of 0? I guess so.

That seems to be the case: atom_length/2 delegates to the C function PL_get_text, which has a special case for the empty list:

int
PL_get_text(DECL_LD term_t l, PL_chars_t *text, int flags)
{ word w = valHandle(l);
  if ( (flags & CVT_ATOM) && isAtom(w) )
  { if ( isNil(w) && (flags&CVT_LIST) )
      goto case_list;
...
  case_list:
    if ( (b = codes_or_chars_to_buffer(l, BUF_STACK, FALSE, &result)) )
    { text->length = entriesBuffer(b, char);
      addBuffer(b, EOS, char);
      text->text.t = baseBuffer(b, char);
      text->encoding = ENC_ISO_LATIN_1;
    }
...
}

Where codes_or_chars_to_buffer just returns an empty buffer when given the empty list as the first argument AFAICT.
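
The effect of that special case can be checked from the Prolog side. The goals below are a small sketch, assuming SWI-Prolog 7 or later in its default (non-traditional) mode, where [] is a reserved atomic constant distinct from the atom '[]'; empty_list_checks/0 is a hypothetical name used only for illustration.

```prolog
% empty_list_checks is semidet: it succeeds if all observations hold.
% Assumes SWI-Prolog 7+ in default mode (not --traditional).
empty_list_checks :-
    atomic([]),              % [] is atomic ...
    \+ atom([]),             % ... but it is not an atom
    atom('[]'),              % the quoted '[]' is an ordinary atom
    [] \== '[]',             % and the two are different terms
    atom_length([], 0),      % [] is converted as an empty code list
    atom_length('[]', 2).    % the atom '[]' has two characters
```

In traditional mode, or in other Prolog systems, [] and '[]' are usually the same atom, so these checks are not expected to hold there.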


That is a good suggestion for me. I agree, at least as a Prolog programmer. It sounds like you are saying that the text in the second argument is the name of the Prolog term in the first argument. This intuitive reading should reduce related bugs in the future.

Well, this is the problem with using natural language for describing computer programs :wink: I thought of it more like “the second argument is a string that holds the text that would represent the term in the first argument”. I'm really not sure about “name of a Prolog term”. I know that a long time ago “name” was a thing in Prolog programming, based on the existence of the (cautiously deprecated) name/2.

FYI for those wondering how to track down source code when it passes from Prolog to C.


Normally, to see the code behind a documented predicate, just go to its SWI-Prolog documentation page and click on the source icon [image].

E.g.

For append/2, the icon [image] links to the append/2 source code.

Note: the icon [image] is at the right of [image].


However, for atom_length/2 it shows: [image]

So the documentation will not help at this point; the source code has to be searched.

The source code is on GitHub with two repositories for SWI-Prolog

Typically one's SWI-Prolog version is in sync with the latest SWI-Prolog development release, so that repository will be used.

Browse to the GitHub repository for SWI-Prolog development.
(https://github.com/SWI-Prolog/swipl-devel)

In the upper left search box, enter the search word atom_length and press Enter.

On the left at the bottom click Advanced search

For Written in this language select: C

Click Search

This shows that the C implementation of atom_length/2 is in the file pl-prims.c in the src directory.



A more efficient way I have found to search for such code is to use Notepad++ to search a directory of local copies of the GitHub repositories. A few more details are in this reply.

For the impatient, you could also use git grep with a path specification for “C files only”:

$ git grep atom_length -- *.c
src/pl-prims.c:PRED_IMPL("atom_length", 2, atom_length, PL_FA_ISO)
src/pl-prims.c:  PRED_DEF("atom_length", 2, atom_length, PL_FA_ISO)

No idea about the subtleties of quoting the path specification on different shells/OSs though :frowning:

Thanks for all the comments. I was glad to hear them. Reading them, I came to a practical conclusion about using term_string/2 in a way that avoids unexpected confusion.

Given a Prolog term X, term_string(X, Y) returns some string Y from which X can be restored as a variant term. I recommend not paying much attention to the exact form of the text Y, which may be an implementation matter. The important thing is that, using term_string/2, one can save a term as Y in a file. Then, after reading the text Y back, X is restored by term_string(X, Y).

Of course, it is necessary to pay at least some attention to the exact form of Y when Y is sent to other languages, e.g. JavaScript.

?- X = f(A, B, A),
	term_string(X, S), term_string(Y, S), variant(X, Y).
X = f(A, B, A),
S = "f(_9546,_9548,_9546)",
Y = f(_A, _, _A).
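
The save-and-restore workflow described above could be sketched as follows. This is only an illustration under stated assumptions: save_term/2 and load_term/2 are hypothetical helper names, the file holds exactly one term, and the same operator declarations and a suitable file encoding are in effect when the term is read back.

```prolog
:- use_module(library(readutil)).   % read_file_to_string/3

% save_term(+File, +Term)
% Hypothetical helper: write the quoted text of Term to File.
save_term(File, Term) :-
    term_string(Term, String),
    setup_call_cleanup(
        open(File, write, Out),
        write(Out, String),         % write/2 emits the string's text as is
        close(Out)).

% load_term(+File, -Term)
% Hypothetical helper: read the whole file as text and parse it back.
% Term is a variant of the saved term, not necessarily identical to it.
load_term(File, Term) :-
    read_file_to_string(File, String, []),
    term_string(Term, String).
```

Reading the whole file and handing it to term_string/2 avoids having to append a terminating full stop, which can interact badly with terms whose text ends in a symbol character.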

If you want to be sure that the term can be restored, you should use some extra options. I think that these suffice:

?- term_string(ハロー+'Foo', X, [ignore_ops(true), quoted(true), quote_non_ascii(true)]).
X = "+('ハロー','Foo')".

Thanks for the remark. I seldom saw such options, but I tested with cut and paste. The result is exactly as you said. It is impressive.

?- write_canonical(ハロー+'Foo').
+('ハロー','Foo')
true.

?- X = ハロー+'Foo',
|    Options=[ignore_ops(true), quoted(true), quote_non_ascii(true)],
|    term_string(X, Y,  Options), term_string(Z, Y, Options),
|    X=Z.
X = Z, Z = ハロー+'Foo',
Options = [ignore_ops(true), quoted(true), quote_non_ascii(true)],
Y = "+('ハロー','Foo')".

Maybe the easiest way to understand term_string/2 with mode (+,-) is
to compare it with write_term/2. It seems to work like write_term/2
with the option list [quoted(true)], which is also what the SWI-Prolog
documentation of term_string/2 basically says:

?- term_string('A', X).
X = "'A'".
?- term_string('$VAR'(0), X).
X = "'$VAR'(0)".

It's the same as this, with the output wrapped in a string instead of being sent to the console:

?- write_term('A',[quoted(true)]).
'A'
?- write_term('$VAR'(0),[quoted(true)]).
'$VAR'(0)

Edit 22.06.2022:
It's interesting that SWI-Prolog and most other Prolog systems already allow option
merging through the option list itself:

?- write_term('$VAR'(0),[quoted(true), quoted(false)]).
$VAR(0)

So the bootstrapping could be made easier than this:

term_string(Term, String, Options) :-
    (   '$option'(quoted(_), Options)
    ->  Options1 = Options
    ;   '$merge_options'(_{quoted:true}, Options, Options1)
    ),
    format(string(String), '~W', [Term, Options1]).

Can be replaced by:

term_string(Term, String, Options) :-
    format(string(String), '~W', [Term, [quoted(true)|Options]]).

Works like a charm, no need for '$option' and '$merge_options':

?- format(string(String), '~W', ['$VAR'(0), [quoted(true)]]).
String = "'$VAR'(0)".

?- format(string(String), '~W', ['$VAR'(0), [quoted(true), quoted(false)]]).
String = "$VAR(0)".

Be very careful with that. Practically all predicates do allow duplicate options, but there are a lot of ways to process options and some of these pick the first, while others pick the last. I think that eventually all should pick the first. That would be consistent with the Prolog library(options) and using a dict that has no duplicates and thus Opts.put(quoted,false) creates a new option dict where quoted=false. Notably the built-in option processing helper picks the last, as well as most “hand coded” option processing in C(++) that walks over the list and processes the options one by one.

SWI-Prolog seems to pick the last, in the case of quoted/1?
When you pick the last consistently for all options, you can use this idiom:

write_my_defaults(X, O) :-
    write_term(X, [my_defaults|O]).

Picking the last can also be interpreted as processing all options from
left to right, each one overwriting part of some options
record structure. This can also be done in Prolog, like here:

% decode_write_opts(+List, +Triple, -Triple)
decode_write_opts(V, _, _) :- var(V),
   throw(error(instantiation_error,_)).
decode_write_opts([X|L], I, O) :- !,
   decode_write_opt(X, I, H),
   decode_write_opts(L, H, O).
decode_write_opts([], H, H) :- !.
decode_write_opts(L, _, _) :-
   throw(error(type_error(list,L),_)).

% decode_write_opt(+Option, +Triple, -Triple)
decode_write_opt(V, _, _) :- var(V),
   throw(error(instantiation_error,_)).
decode_write_opt(variable_names(N), v(_,F,L), v(N,F,L)) :- !,
   decode_write_variable_names(N).
decode_write_opt(quoted(B), v(N,F,L), v(N,G,L)) :- !,
   decode_write_boolean(B, 1, F, G).
decode_write_opt(ignore_ops(B), v(N,F,L), v(N,G,L)) :- !,
   decode_write_boolean(B, 2, F, G).
decode_write_opt(priority(L), v(N,F,_), v(N,F,L)) :- !,
   sys_check_integer(L).
decode_write_opt(O, _, _) :-
   throw(error(type_error(write_option,O),_)).

Disclaimer: I didn’t check what the ISO core standard would
prescribe or what newer Prolog systems do, like Scryer Prolog.
So the idiom might be not that portable.

Yes, as I said, most C code picks the last, most Prolog code picks the first. The last is nice for adding defaults to options, while the first is nice to make sure some option value is used regardless of the options coming from the environment. Both clearly have use cases … It would definitely be better if all option processing was consistent.
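
For adding defaults without relying on whether a given predicate picks the first or the last duplicate, one option is to merge the lists explicitly before the call. The sketch below uses merge_options/3 from library(option), which removes conflicting options from its second argument; term_string_defaults/3 is a hypothetical name, and the sketch assumes the caller's option list contains no duplicates of its own.

```prolog
:- use_module(library(option)).     % merge_options/3

% term_string_defaults(+Term, -String, +Options)
% Hypothetical variant of the term-to-string direction of term_string/3:
% quoted(true) is only a default and is dropped if Options sets quoted/1.
term_string_defaults(Term, String, Options) :-
    merge_options(Options, [quoted(true)], Merged),
    format(string(String), "~W", [Term, Merged]).
```

With an empty option list the term 'A' is written with quotes, while passing [quoted(false)] overrides the default, independent of the first-or-last question.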

Yeah, it seems '$option' is bootstrapped from memberchk/2. It's a little
bit the same situation as with sub_atom/5 versus last_sub_atom/5.
One would need an API '$last_option' to pick the last, which would
need to call last_memberchk/2, in case one retrieves options selectively.
The decode_write_opts/3 approach, on the other hand, validates
and merges the options and puts them into a special data structure, from
which you can then retrieve them. But this is more specific: the routines that
access the special data structure need to have knowledge of that
data structure. So it's possibly not a solution for the public, except if
you use SWI-Prolog dicts for the resulting data structure of
such a decoding. Then you can make it more open.
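
A rough sketch of that dict-based variant, with last-wins semantics, might look like this. The names options_to_dict/2 and put_option/3 are hypothetical, only options of the form Name(Value) are accepted, and no per-option validation is attempted.

```prolog
:- use_module(library(apply)).      % foldl/4

% options_to_dict(+Options, -Dict)
% Hypothetical decoder: walk Options from left to right and store each
% Name(Value) in a dict, so a later duplicate overwrites an earlier one.
options_to_dict(Options, Dict) :-
    foldl(put_option, Options, _{}, Dict).

% put_option(+Option, +Dict0, -Dict)
put_option(Opt, Dict0, Dict) :-
    (   compound(Opt),
        Opt =.. [Name, Value]
    ->  put_dict(Name, Dict0, Value, Dict)
    ;   throw(error(domain_error(option, Opt), _))
    ).
```

For example, decoding [quoted(true), ignore_ops(true), quoted(false)] gives a dict in which quoted is false and ignore_ops is true, and any code can then look options up with get_dict/3 without knowing a private record layout.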

I see. In other words, term_string/2 is an inverse function of read (as a function from strings to terms).
Thanks.