Xpath and string notation

Looking at a video on Twitter from Markus Triska, I noticed a slight difference in notation regarding the use of the string quote ".

:- use_module(library(http/http_open)).
:- use_module(library(sgml)).
:- use_module(library(xpath)).

go :-
    http_open("https://news.ycombinator.com", S, []),
    load_html(S, DOM, []),
    xpath(DOM, //a(@class=storylink,text), E),
    writeln(E).

In Markus's version, developed with Scryer Prolog, the class is not written storylink but "storylink", treating it as a string.

As I saw that SWI-Prolog has been moving to " for text, I was wondering whether something along those lines is also planned for library xpath.pl, xpath/3 (swi-prolog.org)?

Slightly odd. I was under the impression that Scryer Prolog wanted to stay much closer to the ISO standard, where "storylink" is a list of integers or one-character atoms. This is a rather unpleasant notation. If we had had strings from the start, I would probably have mapped XML attribute names to atoms, and attribute values and CDATA to strings. That would explain the above. Without packed strings as a data type, I think this is way too costly and leads too easily to ambiguities.

A quick search suggests that Scryer Prolog has a packed representation for strings.

Well, good to see they are copying a lot of SWI-Prolog’s functionality!

Thanks. In any case, what matters for Prolog as a standard, and for ease of programming, is a common approach, which is why I asked about that difference. With "storylink" in SWI-Prolog I was getting false, unlike the answer shown in the example, and I was looking for concrete examples of SWI-Prolog's web use. It took me some time to understand what was causing the difference, as I am not as skilled as many others here :slight_smile:

The string vs. atom question is still somewhat unsettled. Systems such as ECLiPSe, which have had both for a very long time and do not have atom garbage collection, could settle early on a consistent choice of whether to use an atom or a string for a particular thing. Typically one uses atoms for things that come from a more or less bounded set and act as something "identifier"-like, and strings for everything else.

SWI-Prolog missed that opportunity, as strings were added late while atom garbage collection is quite old and already avoids running out of memory when more and more atoms are (temporarily) seen by the system. Strings were added for two reasons: disambiguation for dynamic data types, and reducing the overhead of atom garbage collection for threaded applications with lots of volatile atoms. An example of the first is distinguishing "true" from true as a JSON value. A much improved atom garbage collector, mostly by Keri Harris, reduced the need for avoiding atom-GC to rather extreme programs.

As a result of all this, we are faced with two representations of text that do not unify (in the Prolog sense of unification), without a widely agreed guideline on which to use when :frowning: The Prolog community has been faced with a couple of similar changes in representation that are really hard to deal with :frowning:
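To make the non-unification concrete, here is a minimal sketch for SWI-Prolog 7 or later. The predicate names same_text/2 and demo_atom_vs_string/0 are made up for this illustration; atom_string/2 and text_to_string/2 are standard SWI-Prolog built-ins. The string is built with atom_string/2 so that the example does not depend on the current double_quotes flag.

```prolog
% Hypothetical helper: true if A and B denote the same text,
% whatever their representation (atom, string, code list, char list).
same_text(A, B) :-
    text_to_string(A, SA),
    text_to_string(B, SB),
    SA == SB.

% Illustration only: a string and an atom with the same characters
% do not unify, but compare equal once both are normalised.
demo_atom_vs_string :-
    atom_string(storylink, S),   % S is the string "storylink"
    string(S),
    \+ S = storylink,            % plain unification fails
    same_text(S, storylink).     % explicit conversion succeeds
```

Calling demo_atom_vs_string at the toplevel should succeed, which is the situation described above: the two representations carry the same text but only match after an explicit conversion.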

Probably, for this case, xpath/3 should allow for both.
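Until then, one possible workaround on the user side is to normalise the class value to an atom before calling xpath/3. The sketch below assumes that attribute values in the DOM are atoms, as load_html/3 produces by default; anchor_text_by_class/3 is a hypothetical helper name, not part of library(xpath).

```prolog
:- use_module(library(xpath)).

% Hypothetical wrapper: Class may be an atom or a string.
% It is converted to an atom because the DOM attribute values
% are assumed to be atoms.
anchor_text_by_class(DOM, Class, Text) :-
    text_to_string(Class, ClassString),
    atom_string(ClassAtom, ClassString),
    xpath(DOM, //a(@class=ClassAtom, text), Text).
```

With this, anchor_text_by_class(DOM, storylink, E) and anchor_text_by_class(DOM, "storylink", E) should behave the same, regardless of how the double_quotes flag is set.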

You are much more qualified than I am to discuss that … Personally, what I appreciate about SWI-Prolog is that it is multi-OS, robust, stable, and recognized as such, with the highest performance on the market. Moreover, I have always favoured efficiency and low memory use, coming from a time when memory and data storage had a prohibitive cost … On the other hand, the internet is a world of strings everywhere and wasted space … Personally, I think that what matters is the quality of libraries and examples: the better something is documented, the less time people lose searching. From my point of view, XPCE and DCG are great features … that are under-documented … just like XPath, which made me search outside the SWI-Prolog site for examples.

Works also in SWI-Prolog. Just try this:

?- set_prolog_flag(double_quotes, atom).
true.

?- http_open("https://news.ycombinator.com", S, []),
    load_html(S, DOM, []),
    xpath(DOM, //a(@class="storylink",text), E),
    writeln(E).
PE Anatomist – Explore data structures in PE files

I didn't test setting the double_quotes flag to string. It might or might not work, I don't know.
The double_quotes flag does not control the parsing itself but the term building during parsing. The double-quote syntax is always the same, but depending on the flag the token is mapped to different Prolog atomics, or even to Prolog terms such as lists.
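As a small sketch of this mapping in SWI-Prolog, the same text "abc" can be read under each flag value by passing the double_quotes option to read_term/3. read_quoted/2 is a hypothetical helper name; the input is written as a quoted atom so the example itself does not depend on the current flag setting.

```prolog
% Hypothetical helper: read the fixed text "abc". under the given
% double_quotes setting (codes, chars, atom or string).
read_quoted(Flag, Term) :-
    setup_call_cleanup(
        open_string('"abc".', In),
        read_term(In, Term, [double_quotes(Flag)]),
        close(In)).

% ?- read_quoted(codes, T).    % T = [97,98,99]
% ?- read_quoted(chars, T).    % T = [a,b,c]
% ?- read_quoted(atom, T).     % T = abc
% ?- read_quoted(string, T).   % T is an SWI-Prolog string object
```

The token is identical in all four calls; only the term that the reader builds from it changes.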

When the SWI-Prolog manual discusses this in section "5.2.3 Why has the representation of double quoted text changed?", it only means that the default value of the double_quotes flag was changed from codes to string. But I really don't know whether the default value string works in the above example.

You have to test it yourself and maybe raise a feature request. But as Jan W. already observed, there might be obstacles such as unification. It's also not clear to me what Scryer Prolog does. What is "storylink" in Scryer Prolog? Does it unify with a cons cell [H|T]? It's difficult to say what the compatibility issues are.

Edit 05.01.2021:
From Scryer Github:

Strings and partial strings
In Scryer Prolog, the default value of the Prolog flag double_quotes is chars, which is also the recommended setting.

The chars option is also available in SWI-Prolog, as required by the ISO core standard, although a compact representation might be missing in SWI-Prolog. This still doesn't make it clear why, for example, the argument "https://news.ycombinator.com" to http_open/3 works in Scryer.

To really understand what is going on, install Scryer and see what it is doing. From an ISO core standard perspective, atoms that are just char lists would break the Prolog system, since abc would then unify with [H|T], something the ISO core standard doesn't propose.
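The last point can be checked with a few standard built-ins. This is only a sketch; atom_is_not_char_list/0 is a name invented for the illustration, and it uses nothing beyond ISO atom_chars/2 and unification.

```prolog
% Illustration only: an atom is atomic, so it never unifies with a
% cons cell; its characters are only reachable via atom_chars/2.
atom_is_not_char_list :-
    \+ abc = [_|_],          % abc is not a list
    atom_chars(abc, Cs),     % explicit conversion to a char list
    Cs = [H|T],
    H == a,
    T == [b, c].
```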

Thanks for your replies, which are really instructive. I didn't think that talking about a string would raise so many points … :slight_smile: