Decision diagram for the SWI-Prolog "pseudo-types"

Why pseudo-types? Because they are not really types! But that won’t stop us from drawing a decision tree:

(Large image!)

I’m a bit unsure about the dict with the functor C'dict'. Can I use that functor to build a dict using =..? (I haven't actually tried pulling it out of a term using =.. and using it when recomposing; I have tried typing it directly, which leads to failure.)

Note the interesting feature of the query var(X), which is a query both on the syntax and on the computational state. IMHO it should really be called freshvar(X). The queries are about syntactic structure, except for those in the number domain, which are, kind of, about the underlying representation.


EDIT by EricGT

To get a better view of the image, using your Internet browser: open link in new tab

For Chrome it is a context menu item: Right click - Open link in new tab.


Based on the ISO core standard, you could introduce more branching below
the atom type. An atom of length 1 is a char in the ISO core standard. What I don’t
know is whether SWI-Prolog sometimes also views a string of length 1 as a char,
for example whether you can pass a string of length 1 as the first argument of char_code/2.

I guess one motivation, besides readability, was that most Prolog systems have
an atom table. So if you only have an ASCII range of character codes, then
switching between char and code is quick. With Unicode and atom table garbage
collection this assumption has been shaken a little bit.

Edit 26.04.2020:
You could also add code somewhere below integer. But the ISO core standard has
two data types of code, one with -1 and one without -1. The -1 is used in
I/O to indicate EOF. See also:

get_code(-Code)
Code is unified with -1 on end of file
https://www.swi-prolog.org/pldoc/doc_for?object=get_code/1
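To make the two code types concrete, here is a small C sketch (the struct and function names are hypothetical, and it reads single bytes rather than decoding UTF-8). The reader returns either an ordinary code, which is never negative, or -1 at end of input, just like get_code/1 does:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory byte stream. */
typedef struct {
    const unsigned char *buf;
    size_t len;
    size_t pos;
} byte_stream;

/* Returns the next code (0..255 here, since this sketch only reads
   single bytes) or -1 at end of input, mirroring get_code/1. */
static int stream_get_code(byte_stream *s)
{
    if (s->pos >= s->len)
        return -1;            /* EOF: outside the ordinary code range */
    return s->buf[s->pos++];  /* ordinary code: always >= 0 */
}

/* The plain code type never includes -1 ... */
static int is_char_code(int c) { return c >= 0 && c <= 0x10FFFF; }

/* ... while the I/O variant additionally admits -1 for EOF. */
static int is_in_char_code(int c) { return c == -1 || is_char_code(c); }
```

Once a caller has tested for -1, the remaining values can be treated as codes proper.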


Although incomplete in the implementation, the overall idea is that predicates that require text input accept all text representations and produce the documented type as output.

I’m not seeing this. I think I’m misunderstanding. Why can’t the atom table use Unicode for names? Internally all strings could be pointers into a block of memory for string storage and there’s no reason this couldn’t be Unicode. I use UTF-8 encoding and it works out fine. In a flat system with no modules a functor could be saved in memory like this:

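#include <stdint.h>

/* wam_val_t is not defined in this sketch; a machine word is assumed */
typedef uintptr_t wam_val_t;

struct wam_functor
{
  wam_val_t *symbol;      /* name, kept in string storage */
  uint8_t arity;
  uint8_t spying;         /* debugger flag */
  wam_val_t *entrypoint;  /* bytecode or clause store */
};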

Here symbol is a pointer into string storage, arity is obvious, spying is a flag for the debugger, and entrypoint is either a pointer to bytecode storage or to the clause store.

I am talking about char_code/2 performance.

If your system only supports ASCII, then you can reserve
part of the atom table for the 128 ASCII char codes (0..127). But
the Unicode range is much bigger: the largest code point is 1114111 (0x10FFFF).

Now if garbage collection removes atom table entries, then
switching between char and code is no longer quick,
because every removed entry has to be recreated.
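A minimal C sketch of such a preallocated ASCII part of an atom table, under the assumption that each one-char atom records its own code (all names are made up for illustration). Code to char is an array index; anything above 127 falls outside the table and has to go through normal atom lookup or creation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-character atom entry. */
typedef struct {
    char name[2];   /* the one-character name plus terminator */
    int code;       /* its character code, stored alongside */
} char_atom;

static char_atom ascii_atoms[128];

/* Preallocate one entry per ASCII code, once, at startup. */
static void init_ascii_atoms(void)
{
    for (int c = 0; c < 128; c++) {
        ascii_atoms[c].name[0] = (char)c;
        ascii_atoms[c].name[1] = '\0';
        ascii_atoms[c].code = c;
    }
}

/* code -> char: O(1) for ASCII. NULL means the code is outside the
   preallocated range, so the atom must be looked up or created (and
   may later be garbage collected and recreated). */
static const char_atom *code_to_char(int code)
{
    if (code >= 0 && code < 128)
        return &ascii_atoms[code];
    return NULL;
}

/* char -> code: read back from the entry itself. */
static int char_to_code(const char_atom *a) { return a->code; }
```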

You can try this simple test. The varying times indicate
garbage collection. Even string_chars/2 instead of atom_chars/2
doesn’t help. What might help is if char_code/2
returned a string in its first argument instead of an atom, hence no
atom table pollution. The SWI-Prolog documentation says: "A string is a compact
representation that lives on the global (term) stack."

SWI-Prolog (threaded, 64 bits, version 8.1.29)

?- between(1,31,M), time((between(1,10000,_), test(M), fail; true)), fail; true.
% 30,069,999 inferences, 3.156 CPU in 3.148 seconds (100% CPU, 9527128 Lips)
% 30,069,999 inferences, 3.313 CPU in 3.308 seconds (100% CPU, 9077736 Lips)
% 30,069,999 inferences, 2.922 CPU in 2.931 seconds (100% CPU, 10291337 Lips)
% 30,069,999 inferences, 2.875 CPU in 2.883 seconds (100% CPU, 10459130 Lips)
% 30,069,999 inferences, 3.031 CPU in 3.041 seconds (100% CPU, 9920000 Lips)
% 30,069,999 inferences, 3.719 CPU in 3.713 seconds (100% CPU, 8086050 Lips)
% 30,069,999 inferences, 3.344 CPU in 3.337 seconds (100% CPU, 8992897 Lips)
% 30,069,999 inferences, 3.000 CPU in 3.012 seconds (100% CPU, 10023333 Lips)
% 30,069,999 inferences, 2.984 CPU in 2.973 seconds (100% CPU, 10075811 Lips)
% 30,069,999 inferences, 2.953 CPU in 2.954 seconds (100% CPU, 10182434 Lips)
% 30,069,999 inferences, 3.063 CPU in 3.073 seconds (100% CPU, 9818775 Lips)
% 30,069,999 inferences, 3.328 CPU in 3.330 seconds (100% CPU, 9035117 Lips)
% 30,069,999 inferences, 3.047 CPU in 3.058 seconds (100% CPU, 9869128 Lips)
% 30,069,999 inferences, 2.922 CPU in 2.924 seconds (100% CPU, 10291337 Lips)
% 30,069,999 inferences, 3.406 CPU in 3.405 seconds (100% CPU, 8827890 Lips)
% 30,069,999 inferences, 3.703 CPU in 3.702 seconds (100% CPU, 8120169 Lips)
% 30,069,999 inferences, 3.281 CPU in 3.275 seconds (100% CPU, 9164190 Lips)
% 30,069,999 inferences, 2.969 CPU in 2.999 seconds (99% CPU, 10128842 Lips)
% 30,069,999 inferences, 2.938 CPU in 2.940 seconds (100% CPU, 10236595 Lips)
% 30,069,999 inferences, 2.938 CPU in 2.934 seconds (100% CPU, 10236595 Lips)
% 30,069,999 inferences, 2.953 CPU in 2.955 seconds (100% CPU, 10182434 Lips)
% 30,069,999 inferences, 2.969 CPU in 2.968 seconds (100% CPU, 10128842 Lips)
% 30,069,999 inferences, 3.609 CPU in 3.608 seconds (100% CPU, 8331082 Lips)
% 30,069,999 inferences, 3.672 CPU in 3.671 seconds (100% CPU, 8189276 Lips)
% 30,069,999 inferences, 3.719 CPU in 3.718 seconds (100% CPU, 8086050 Lips)
% 30,069,999 inferences, 3.234 CPU in 3.234 seconds (100% CPU, 9297005 Lips)
% 30,069,999 inferences, 2.969 CPU in 2.968 seconds (100% CPU, 10128842 Lips)
% 30,069,999 inferences, 2.953 CPU in 2.984 seconds (99% CPU, 10182434 Lips)
% 30,069,999 inferences, 3.328 CPU in 3.327 seconds (100% CPU, 9035117 Lips)
% 30,069,999 inferences, 3.313 CPU in 3.312 seconds (100% CPU, 9077736 Lips)
% 30,069,999 inferences, 3.688 CPU in 3.679 seconds (100% CPU, 8154576 Lips)
true.

?- between(1,31,M), time((between(1,10000,_), test2(M), fail; true)), fail; true.
% 30,070,000 inferences, 2.984 CPU in 2.982 seconds (100% CPU, 10075812 Lips)
% 30,069,999 inferences, 2.922 CPU in 2.928 seconds (100% CPU, 10291337 Lips)
% 30,069,999 inferences, 3.313 CPU in 3.306 seconds (100% CPU, 9077736 Lips)
% 30,069,999 inferences, 3.281 CPU in 3.282 seconds (100% CPU, 9164190 Lips)
% 30,069,999 inferences, 3.109 CPU in 3.111 seconds (100% CPU, 9670753 Lips)
% 30,069,999 inferences, 2.875 CPU in 2.882 seconds (100% CPU, 10459130 Lips)
% 30,069,999 inferences, 2.547 CPU in 2.549 seconds (100% CPU, 11806625 Lips)
% 30,069,999 inferences, 2.766 CPU in 2.760 seconds (100% CPU, 10872768 Lips)
% 30,069,999 inferences, 3.250 CPU in 3.259 seconds (100% CPU, 9252307 Lips)
% 30,069,999 inferences, 3.281 CPU in 3.267 seconds (100% CPU, 9164190 Lips)
% 30,069,999 inferences, 2.578 CPU in 2.587 seconds (100% CPU, 11663515 Lips)
% 30,069,999 inferences, 3.047 CPU in 3.048 seconds (100% CPU, 9869128 Lips)
% 30,069,999 inferences, 3.297 CPU in 3.290 seconds (100% CPU, 9120758 Lips)
% 30,069,999 inferences, 2.953 CPU in 2.963 seconds (100% CPU, 10182434 Lips)
% 30,069,999 inferences, 3.313 CPU in 3.298 seconds (100% CPU, 9077736 Lips)
% 30,069,999 inferences, 3.281 CPU in 3.288 seconds (100% CPU, 9164190 Lips)
% 30,069,999 inferences, 3.156 CPU in 3.156 seconds (100% CPU, 9527128 Lips)
% 30,069,999 inferences, 2.531 CPU in 2.534 seconds (100% CPU, 11879506 Lips)
% 30,069,999 inferences, 2.531 CPU in 2.531 seconds (100% CPU, 11879506 Lips)
% 30,069,999 inferences, 2.547 CPU in 2.546 seconds (100% CPU, 11806625 Lips)
% 30,069,999 inferences, 2.578 CPU in 2.575 seconds (100% CPU, 11663515 Lips)
% 30,069,999 inferences, 2.609 CPU in 2.618 seconds (100% CPU, 11523832 Lips)
% 30,069,999 inferences, 2.531 CPU in 2.518 seconds (101% CPU, 11879506 Lips)
% 30,069,999 inferences, 2.547 CPU in 2.549 seconds (100% CPU, 11806625 Lips)
% 30,069,999 inferences, 2.672 CPU in 2.672 seconds (100% CPU, 11254269 Lips)
% 30,069,999 inferences, 3.328 CPU in 3.331 seconds (100% CPU, 9035117 Lips)
% 30,069,999 inferences, 3.063 CPU in 3.058 seconds (100% CPU, 9818775 Lips)
% 30,069,999 inferences, 3.375 CPU in 3.386 seconds (100% CPU, 8909629 Lips)
% 30,069,999 inferences, 3.344 CPU in 3.340 seconds (100% CPU, 8992897 Lips)
% 30,069,999 inferences, 3.281 CPU in 3.288 seconds (100% CPU, 9164190 Lips)
% 30,069,999 inferences, 3.328 CPU in 3.315 seconds (100% CPU, 9035117 Lips)
true.

The code used was:

test(M) :- L is M*1000, H is (M+1)*1000, between(L,H,N),
    char_code(C,N),             % one-char atom for code N
    atom_chars(_,[C,C]),        % plus a two-char atom
    fail ; true.

test2(M) :- L is M*1000, H is (M+1)*1000, between(L,H,N),
    char_code(C,N),             % still a one-char atom
    string_chars(_,[C,C]),      % a string instead of an atom
    fail ; true.

Then it wouldn’t be char_code/2 any more, would it? Surely your problem is not one of pollution (since you support preallocation of the one-character ASCII atoms) but one of dimensioning (since you think it too expensive to preallocate the one-character Unicode atoms).
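The dimensioning argument is plain arithmetic. Assuming, purely for illustration, 32 bytes per atom table entry (not a measured figure for any Prolog system), preallocating all one-char atoms costs:

```c
#include <assert.h>

/* ENTRY_BYTES is an assumed per-entry size, not a real measurement. */
#define ENTRY_BYTES   32ULL
#define ASCII_CODES   128ULL
#define UNICODE_CODES 0x110000ULL   /* code points 0 .. 1114111 */

/* Bytes needed to preallocate one atom per code. */
static unsigned long long prealloc_bytes(unsigned long long codes)
{
    return codes * ENTRY_BYTES;
}
```

That is 4 KiB for ASCII versus 34 MiB for the full Unicode range, a range which even includes the 2048 surrogate code points that are not characters at all.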


To clarify, by “readability” I mean the ISO fetish
for Prolog chars, namely that this is more readable:

?- atom_chars('hello', X).
X = [h, e, l, l, o].

than this. Prima facie, that has nothing to do with Unicode.

?- atom_codes('hello', X).
X = [104, 101, 108, 108, 111].

The predicate atom_chars/2 is one more place where the
atom table is polluted. This ISO fetish is the root cause:
if you didn’t have chars, there wouldn’t be this problem.

But one could now introduce string-backed chars, and
maybe have one's cake and eat it too. The question is whether
strings have the same clause indexing and unification
properties as chars, because atom-table-backed chars
still work well, for example in DCGs.


I don’t preallocate ASCII atoms in an atom table. This was only a
suggestion for a certain type of Prolog system. Also I don’t know
what heuristic causes the garbage collection in SWI-Prolog.

In my own system I don’t have an atom table in the sense that an
atom is an index into an atom table. Atoms can exist, like strings,
without an atom table entry, which makes strings superfluous.

See also: String predicate in Swi-Prolog
In the example there you can see atom table pollution
by atom_codes/2 and atom_chars/2 through their first argument.
https://stackoverflow.com/a/51911593/502187


Well, it's all fun and games until you try to implement the full set of ISO
core standard predicates. With UTF-8 you get a slowdown in many
atom operations like atom_length/2 and sub_atom/5.

What were originally quick direct-access operations now become
operations that have to scan the text. Just imagine how you would
handle these test cases based on UTF-8:

?- atom_length('Zürich', L).
L = 6
?- sub_atom('Zürich', 2, 2, _, A).
A = ri
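To make the cost concrete, here is a C sketch of atom_length/2 and of the sub_atom/5 mode with known Before and Length, working on UTF-8 bytes (function names are hypothetical; valid UTF-8 input is assumed). Both have to scan from the start, where a fixed-width array would allow direct indexing:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 continuation bytes have the form 10xxxxxx. */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* atom_length/2: count bytes that start a code point. O(n). */
static size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (!is_cont((unsigned char)*s))
            n++;
    return n;
}

/* Byte offset of code point number `index` (0-based), found by
   scanning from the start. Assumes index <= length. */
static size_t utf8_offset(const char *s, size_t index)
{
    size_t i = 0;
    while (s[i] && index > 0) {
        i++;
        while (s[i] && is_cont((unsigned char)s[i]))
            i++;
        index--;
    }
    return i;
}

/* sub_atom(S, Before, Length, _, Sub) with Before and Length known:
   convert both code point counts to byte offsets, then copy. */
static void utf8_sub(const char *s, size_t before, size_t length,
                     char *out, size_t out_size)
{
    size_t from = utf8_offset(s, before);
    size_t to = from + utf8_offset(s + from, length);
    size_t n = to - from;
    assert(n < out_size);
    memcpy(out, s + from, n);
    out[n] = '\0';
}
```

For "Zürich" (7 bytes, since ü takes two) this gives length 6 and the substring "ri".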

Same problem with UTF-16 as used in Java Strings. I get a funny result
in my Prolog system, since it uses UTF-16 and I allow surrogate input
in quoted atoms. You don’t get the first result in SWI-Prolog:

?- X = '\xD83D\\xDE00\'.
X = 😀
?- X = '\x1F600\'.
X = 😀
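Both queries give the same atom because a UTF-16 system stores U+1F600 as the surrogate pair D83D DE00. A C sketch of the standard UTF-16 pairing arithmetic (the function names are my own):

```c
#include <assert.h>
#include <stdint.h>

/* High surrogates are D800..DBFF, low surrogates DC00..DFFF. */
static int is_high_surrogate(uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint32_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Each surrogate carries 10 bits of (code point - 0x10000). */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* Inverse: split a supplementary-plane code point into a pair. */
static void split_code_point(uint32_t cp, uint32_t *hi, uint32_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;
    *hi = 0xD800 + (cp >> 10);
    *lo = 0xDC00 + (cp & 0x3FF);
}
```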

I just tried these out in my system and I get L=6 and A=ri. Did you expect those results or not? You’ve got me worried I’ve implemented Unicode support incorrectly. You’ve got me doing this:

| ?- Emote = '\u1f631'.
Emote = 😱 ? 

I can maybe see there being a problem with having a length in characters, as Unicode talks about code points (can a code point comprise several characters, graphically I mean?).

The \u syntax is not part of the ISO core standard. Also, according to
the SWI-Prolog documentation, \u takes exactly four hex digits. Unlike
the ISO core standard syntax \x, which is terminated by a closing
\, the \u escape has no terminating character, therefore it is fixed
at four hex digits.

So the result for SWI-Prolog is not what you showed.
Note that the fifth hex digit, 1, is not part of the \u escape; it is read as an ordinary character:

SWI-Prolog (threaded, 64 bits, version 8.1.29)

?- Emote = '\u1f631'.
Emote = ὣ1.
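The ὣ1 result follows from the fixed width: a parser consumes exactly four hex digits after \u, giving U+1F63 (ὣ), and the trailing 1 stays as ordinary text. A C sketch of such a fixed-width parser (a hypothetical helper, not SWI-Prolog's actual reader):

```c
#include <assert.h>
#include <ctype.h>

/* Consume exactly `width` hex digits from s and store their value.
   Returns the number of characters consumed, or 0 if a digit is
   missing. Since \u has no closing delimiter, whatever follows the
   digits is ordinary text. */
static int parse_fixed_hex(const char *s, int width, unsigned long *out)
{
    unsigned long v = 0;
    for (int i = 0; i < width; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c))
            return 0;   /* also stops at the terminating '\0' */
        v = v * 16 + (unsigned long)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *out = v;
    return width;
}
```

With width 8, as for \U, the same input 0001f631 gives U+1F631.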

To make it work you need to use the capital \U, which takes
eight hex digits. Then it works, well, not on
Windows, only on Mac, since in the default SWI-Prolog build on Windows
Unicode is capped at 16 bits:

?- Emote = '\U0001f631'.
ERROR: Syntax error: Illegal character code

This SWI-Prolog specific \u and \U syntax is documented here:
2.16.1.3 Character Escape Syntax
https://www.swi-prolog.org/pldoc/man?section=charescapes


I’m not surprised it isn’t, because the output I showed wasn’t from SWI-Prolog. I wrote

I just tried these out in my system

And by “my system” I mean a Prolog system I built myself. See page 11 of this manual http://barrywatson.se/download/manual.pdf for my implementation of Unicode escape sequences.

Just to be clear. I’m making no claims about SWI Prolog.

This is also a funny test case:

?- atom_concat(X, Y, 'Zürich').
 X = '', Y = 'Zürich' ;
 X = 'Z', Y = ürich ;
 X = 'Zü', Y = rich Etc..
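On UTF-8 text the solutions of atom_concat/3 in this mode correspond to the byte offsets that do not fall inside a multi-byte sequence. A C sketch that collects those split points (a hypothetical helper; valid UTF-8 is assumed):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill `offsets` with every byte position i such that X = s[0..i)
   and Y = s[i..len) are both valid UTF-8, in ascending order.
   Returns the count, which is the number of code points + 1. */
static size_t utf8_split_points(const char *s, size_t *offsets, size_t max)
{
    size_t n = 0, len = strlen(s);
    for (size_t i = 0; i <= len; i++) {
        if (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
            continue;               /* inside a code point: not a split */
        assert(n < max);
        offsets[n++] = i;
    }
    return n;
}
```

For "Zürich" this yields 7 split points; the third one, byte offset 3, is X = 'Zü', Y = rich.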

An often neglected piece of atom functionality is searching backwards.
Think of Java, which has String.indexOf() and String.lastIndexOf()
for the two search directions:

?- last_atom_concat(X, Y, 'Zürich').
 X = 'Zürich', Y = '' ;
 X = 'Züric', Y = h ;
 X = 'Züri', Y = ch Etc..

Searching backwards through a string works for both UTF-8
and UTF-16, since both encodings allow backward scanning. A Prolog system
can provide further analogues, for example last_sub_atom/5
as the analogue of sub_atom/5.
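A C sketch of the backward step for UTF-8 (a hypothetical helper; valid UTF-8 is assumed). Continuation bytes have the bit pattern 10xxxxxx, so stepping back means skipping them until a lead byte is found. Applying it repeatedly from the end gives the last_atom_concat/3 order above:

```c
#include <assert.h>
#include <stddef.h>

/* From byte offset i (> 0), return the start of the previous code
   point by skipping continuation bytes. */
static size_t utf8_prev(const char *s, size_t i)
{
    assert(i > 0);
    do {
        i--;
    } while (i > 0 && ((unsigned char)s[i] & 0xC0) == 0x80);
    return i;
}
```

For UTF-16 the same idea works by stepping over a low surrogate.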


Is Barry’s Prolog open source somewhere?
Quite a fancy Prolog, I read (page 106 manual.p):

?- write_canonical(3.14), nl. 
'$float'(14480694097861998019,2,64)

Does your case_shift/2 predicate work for Unicode?
I guess Unicode needs more than Lower is Upper-"A"+"a".
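A tiny C illustration of why the ASCII offset trick is not enough (the function name is made up). Applied to U+0100 (Ā) it yields U+0120 (Ġ) instead of U+0101 (ā). Unicode case mappings are table driven and can even change length, e.g. ß uppercases to "SS":

```c
#include <assert.h>

/* The ASCII trick Lower = Upper - 'A' + 'a' applied blindly to code
   points. It is right for 'A'..'Z' (and, by accident of layout, for
   some other blocks), but not in general. */
static long naive_to_lower(long upper) { return upper - 'A' + 'a'; }
```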

Maybe putting atom/1 above atomic/1 could also be
misleading. Not only strings are atomic; atoms are atomic too.

?- atomic('abc').
true.

?- atomic("abc").
true.

Actually, I think for Prolog we have this identity, right?

atomic(X) <=> \+ var(X), \+ compound(X).

So the box “In this branch always atomic(X)” already applies
one branch above, a little earlier. I also think
a blob is never a compound, making the identity
true for the SWI-Prolog implementation-specific blobs as well.
But interestingly, strings are not blobs:

?- blob('abc', X).
X = text.

?- blob("abc", X).
false.

The atomic part should look like this

  • atomic
    • number
      • float
      • rational
        • integer
          • code (character)
    • string
    • blob
      • atom
        • char (1-length atom)
      • special constants ([], dict functor)
      • encapsulated foreign resources
        • stream handle
        • thread handle
        • clause handle
        • … (many more)

Blobs are globally shared (between threads) objects and are subject to (atom) garbage collection. All the others live on the Prolog (global) stack, are thread-local and are subject to traditional Prolog GC. As @dtonhofer says, several of these types have multiple implementations that one could consider subtypes. These distinctions are fully transparent to the user, but knowing about them can have some value for minimizing resource usage and for designing tests for transformations that cross these borders. In particular:

  • integers come in three forms
    • Inlined (min/max_tagged_integer) integers use no additional storage
    • 64-bit signed integers use 64 bits + two guard words on the global stack
    • GMP integers for anything larger. This is a serialization of the GMP structure on the global stack, again with two guard words.
  • Atoms and strings come in two forms
    • those with all characters in the range 0…255 are represented as a char* array
    • the others are a wchar_t* array, UCS-2 on Windows, UCS-4 on anything else.
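A C sketch of the choice between the two text forms described above (names are hypothetical; this is not SWI-Prolog source): scan the code points once and fall back to the wide form as soon as one exceeds 255. Note that with a 16-bit wchar_t, as on Windows, a code point such as U+1F600 would not fit even in the wide form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum text_form { TEXT_NARROW, TEXT_WIDE };

/* Narrow (one byte per character) if every code is in 0..255,
   otherwise wide. */
static enum text_form choose_form(const uint32_t *codes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (codes[i] > 0xFF)
            return TEXT_WIDE;
    return TEXT_NARROW;
}
```

So 'Zürich' (ü is U+00FC) still fits the narrow form.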

char and code do not use any special implementation.


Now I have a feeling that SWI-Prolog's
new rational numbers are missing a
predicate. Before SWI-Prolog had rational
numbers, I could check for a proper ratio
via this test, assuming we already know
\+ var(X):

?- .., X = _ rdiv _.

What's the replacement now? Common
Lisp makes this distinction:

Common Lisp the Language, 2nd Edition
The types integer and ratio are disjoint subtypes of rational.
https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node42.html

So I guess a new predicate ratio/1 could
be useful. So then the check would be:

?- .., ratio(X).

Maybe it could be bootstrapped via the denominator (a proper
ratio has a denominator other than 1), but having a test predicate could be more efficient.

It’s not open source. As you can see, floats can be given any precision you want. What you see is 3.14 encoded with 64 bits of precision. Here’s the same for 8 bits and 800 bits. See the documentation for eval/2 for how it is encoded. All floating point operations are written in Prolog.

| ?- set_prolog_flag(floating_point_precision, 8).
% yes
| ?- write_canonical(3.14), nl.
'$float'(201,2,8)
% yes
| ?- set_prolog_flag(floating_point_precision, 800).
% yes
| ?- write_canonical(3.14), nl.
'$float'(5234391329810685605152683655716187370758635283017987905911062382234769231686841176144324277684170118233199979815845093750022709298483701588085234501079701586184698699918501428393516454440774009412237537666670982555627237209419022113563643740,2,800)

% yes
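Reading the examples, the triple seems to encode Mantissa × 2^(Exponent − Precision): 201 × 2^(2−8) = 3.140625 is the closest 8-bit value to 3.14. That reading is my assumption, checked only against the outputs shown; the eval/2 documentation in that manual is the authority. A C sketch that decodes mantissas fitting in 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED encoding: value = mantissa * 2^(exponent - precision).
   The conversion to double may round if mantissa > 2^53; the scaling
   by powers of two is exact (barring underflow/overflow). */
static double decode_float(uint64_t mantissa, int exponent, int precision)
{
    double value = (double)mantissa;
    int shift = exponent - precision;
    for (; shift < 0; shift++) value /= 2.0;
    for (; shift > 0; shift--) value *= 2.0;
    return value;
}
```

The 800-bit mantissa would need a bignum library and is left out here.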

I don’t implement that predicate.

I’m not entirely against it. I’m also not very convinced we need it. You get this using rational(X), \+ integer(X) or rational(X), denominator(X) =\= 1. I certainly agree this is a bit clumsy, but it works, and I just do not see many cases where you would want to use this test. In short, I'd rather wait for somebody with a real problem for which this would significantly simplify the code. Adding more primitives is not free, and primitives that are never used make the system worse rather than better.
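For what it's worth, the denominator test only works because rationals are kept in canonical form (as, to my understanding, SWI-Prolog's are), so an integral value always ends up with denominator 1. A C sketch with a toy normalized rational type (names hypothetical) showing that reasoning:

```c
#include <assert.h>

/* Toy rational, always normalized: gcd(num, den) == 1 and den > 0. */
typedef struct { long num, den; } rat;

static long gcd(long a, long b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

static rat make_rat(long num, long den)
{
    assert(den != 0);
    if (den < 0) { num = -num; den = -den; }   /* sign lives in num */
    long g = gcd(num, den);                    /* g > 0 since den > 0 */
    rat r = { num / g, den / g };
    return r;
}

/* Analogue of rational(X), denominator(X) =\= 1. */
static int is_ratio(rat r) { return r.den != 1; }
```

Without the normalization, 4/2 would wrongly count as a proper ratio.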


Pity Barry’s Prolog is not open source. What's the
result of write_canonical(pi)? Do you have arbitrary
precision trigonometric functions? Here is my take:
it calculates pi on demand for the requested
precision and caches it thread-locally. The library
is still experimental:

Jekejeke Prolog 4, Runtime Library 1.4.4

?- use_module(library(decimal/multi)).
% 9 consults and 0 unloads in 197 ms.
Yes

?- X is mp(pi,3).
X = 0d3.14 

?- X is mp(pi,100).
X = 0d3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117070

The prefix 0d indicates a Java BigDecimal. And it's open source
here. Because of multi-threading I avoid a Prolog flag,
so there is a context operator mp/2 for multi-precision, which
also changes the semantics of all evaluable functions used by is/2.
Accuracy is not yet optimal: I do not yet compute with some
excess precision, so the last few digits might be wrong.
