Xpath and string notation

Looking at a video on Twitter from Markus Triska, I noticed a slight difference in notation regarding the use of double-quoted strings ("...").

:- use_module(library(http/http_open)).
:- use_module(library(sgml)).
:- use_module(library(xpath)).

go :-
    http_open("https://news.ycombinator.com", S, []),
    load_html(S, DOM, []),
    xpath(DOM, //a(@class=storylink,text), E),
    writeln(E).

In Markus's version, developed with Scryer Prolog, it is written not as storylink but as "storylink", treating it as a string.

Since I saw that SWI-Prolog has changed how it treats ", I was wondering whether something similar is planned for xpath/3 in library xpath.pl (swi-prolog.org)?

Slightly odd. I was under the impression that Scryer Prolog wanted to stay much closer to the ISO standard, where "storylink" is a list of integers or one-character atoms. This is a rather unpleasant notation. If we had had strings from the start, I would probably have mapped XML attribute names to atoms, attribute values to strings and CDATA to strings. That would explain the above. Without packed strings as a datatype, I think this is way too costly and leads too easily to ambiguities.

A quick search suggests that Scryer Prolog has a packed representation for strings.

Well, good to see they are copying a lot of SWI-Prolog’s functionality!

Thx. Anyway, what matters for Prolog as a standard, and for ease of programming, is to have a common approach, which is why I asked about that difference. With "storylink" in SWI-Prolog I was getting false, despite the answer shown in the example, and I was looking for some concrete examples of SWI-Prolog's web use. It took me some time to understand what was causing the difference, as I am not as skilled as many others here :slight_smile:
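The false result described above is consistent with atoms and strings being distinct types in SWI-Prolog. The following is a minimal sketch, assuming SWI-Prolog 7 or later with the default double_quotes flag; the predicate names text_type/2, unifies/2 and demo/0 are made up for illustration and are not part of any library.

```prolog
% Assumes SWI-Prolog 7+ with default flags, where "..." is read as a
% string object and `...` as a list of character codes.

% text_type(+Term, -Type): classify how a piece of text is represented.
text_type(T, atom)   :- atom(T), !.
text_type(T, string) :- string(T), !.
text_type(T, codes)  :- is_list(T), maplist(integer, T), !.
text_type(_, other).

% unifies(+A, +B): true if A and B unify as they stand.
unifies(A, B) :- A = B.

% demo: print the type of each notation, then test atom against string.
demo :-
    forall(member(T, [storylink, "storylink", `storylink`]),
           ( text_type(T, Type),
             format("~q is of type ~w~n", [T, Type]) )),
    (   unifies(storylink, "storylink")
    ->  writeln('atom and string unify')
    ;   writeln('atom and string do not unify')
    ).
```

Since the original query that works uses the atom storylink, a condition written with "storylink" has to unify a string with an atom, which fails under these assumptions.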

The string vs. atom thing is still somewhat unsettled. Systems such as ECLiPSe, which have had both for a very long time and do not have atom garbage collection, could settle early on a consistent choice of whether to use an atom or a string for a particular thing. Typically one uses atoms for things that come from a more or less bounded set and act as something "identifier"-like, and strings for everything else.

SWI-Prolog missed that opportunity, as strings were added late and atom garbage collection is quite old and also avoids running out of memory when more and more atoms are (temporarily) seen by the system. Strings were added for two reasons: disambiguation for dynamic data types and reducing the overhead of atom garbage collection for threaded applications with lots of volatile atoms. An example of the first is to distinguish "true" from true as a JSON value. A much improved atom garbage collector, mostly by Keri Harris, reduced the need for avoiding atom-GC to rather extreme programs.

As a result of all this we are faced with two representations of text that do not unify (in the Prolog sense of unification), without a widely agreed guideline on which to use when :frowning: The Prolog community has been faced with a couple of similar changes in representation that are really hard to deal with :frowning:

Probably, for this case, xpath/3 should allow for both.
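Until xpath/3 does so itself, accepting both can be approximated in user code by comparing text after normalising it. This is only a sketch under assumptions: same_text/2 and has_attr_text/3 are hypothetical helpers, not part of library(xpath), and the attribute value is assumed to be plain text (an atom, as load_html/3 appears to produce for the class attribute in the example above).

```prolog
% Hypothetical helpers, not part of library(xpath).

% same_text(+A, +B): true if A and B hold the same characters,
% whether each is an atom, a string, or a code/char list.
same_text(A, B) :-
    text_to_string(A, SA),
    text_to_string(B, SB),
    SA == SB.

% has_attr_text(+Element, +Name, +Text): Element is an element/3 term
% as found in the DOM built by load_html/3; succeeds if attribute Name
% is present and its value has the same text as Text.
has_attr_text(element(_, Attrs, _), Name, Text) :-
    memberchk(Name=Value, Attrs),
    same_text(Value, Text).
```

A possible use is to select the elements first and filter afterwards, for example xpath(DOM, //a, E), has_attr_text(E, class, "storylink"), which would then succeed whether the caller writes storylink or "storylink".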

You are much more qualified than I am to discuss that. Personally, what I appreciate about SWI-Prolog is that it is multi-OS, robust, stable, and recognized as such, with the highest performance on the market. Moreover, I have always favoured efficiency and low memory use, coming from a time when memory and data storage had a prohibitive cost. On the other hand, the internet is the world of blabla, strings everywhere and wasted space. Personally, I think that what matters is the quality of libraries and examples: the better something is documented, the less time people lose searching. In my view XPCE and DCG are great but under-documented features, and the same goes for XPath, which made me search outside the SWI-Prolog site for examples.

Thx for your replies, which are really instructive. I didn't think that talking about a string would raise so many points … :slight_smile: