Once this predicate is tabled, it loops forever

dtonhofer · August 31, 2020, 6:08am

In SWI-Prolog 8.3.5, a simple predicate to generate a random amount of whitespace relatively quickly:

:- module(heavycarbon_strings_spaces,
   [
      string_of_spaces/2  % string_of_spaces(N,Spaces)
   ]).

% :- table string_of_spaces/2. % this causes an infinite loop to appear

% ==============================================================================
% Generating/Recognizing/Verifying "strings made of spaces"
%
% string_of_spaces(?N,?Spaces)
%
% N      : integer >= 0
% Spaces : a string on output (accepts the same stuff atom_string/2 accepts on input)
% ==============================================================================

string_of_spaces(N,Spaces) :-
   nonvar(Spaces),                     % case: "check length or determine length of 'Spaces'"
   !,
   string_concat("",Spaces,Str),       % make sure it's a string for "==" later; may throw
   string_length(Str,N),               % length is now known
   string_of_spaces(N,StrNew),         % regenerate spacey string for comparison
   Str == StrNew.                      % must be the same (i.e. fail if Spaces is not "spacey")

string_of_spaces(N,Spaces) :-
   var(Spaces),nonvar(N),              % case: "generate a string"
   !,
   gen_string_of_spaces(N,Spaces).

string_of_spaces(N,Spaces) :-
   var(Spaces),var(N),                 % case: "generate pairs"
   !,
   between(0,inf,N),
   (gen_string_of_spaces(N,Spaces) -> true ; throw("Problem in gen_string_of_spaces/2")).

gen_string_of_spaces( 0,"")           :- !.
gen_string_of_spaces( 1," ")          :- !.
gen_string_of_spaces( 2,"  ")         :- !.

gen_string_of_spaces(N,Spaces) :-
   N>2, !,
   divmod(N,2,Times,Remainder),
   gen_string_of_spaces(Times,S1),
   string_concat(S1,S1,S2),
   (Remainder>0
    -> (string_of_spaces(Remainder,SR), string_concat(S2,SR,Spaces))
    ;  Spaces = S2).

Note the commented-out table directive.

The above can generate String/Length pairs, as is the Prolog custom:

?- string_of_spaces(L,S).
L = 0,
S = "" ;
L = 1,
S = " " ;
L = 2,
S = "  " ;
L = 3,
S = "   " ;

However, once one tables string_of_spaces/2, an infinite loop seems to appear:

?- string_of_spaces(L,S).
(noise of CPU fan intensifies)

This is probably a problem.

Boris · August 31, 2020, 6:55am

I can’t be certain about it, but maybe the liberal use of cuts is part of the issue.

In the docs, I read:

Tabling guarantees logically correct results and termination provided the computation only involves terms of bounded size on pure Prolog programs, i.e., Prolog programs without side effects or pruning of choice points (cut, ->/2, etc.).

I hadn’t read the rest of the section carefully enough, but this tells me that with the cuts, neither logically correct results nor termination is “guaranteed”.

jan · August 31, 2020, 7:18am

Cut should not affect termination. They may lead to incomplete results (and using negation to wrong answers). The current implementation does not check whether filling a table is truncated due to a cut.

Non-termination with tabling should only occur if the number of solutions is infinite. Note however that using the current local scheduling technique (which is the most popular), all tables are fully filled before the first answer is produced. Thus, goals with a large number of answers may take long to produce the first answer.

Boris · August 31, 2020, 7:27am

I didn’t read the code of @dtonhofer super carefully, but it seems that the predicate can have infinitely many solutions, in the spirit of

?- length(L, N).

or

?- between(0, inf, N).

Is that what you mean by “number of solutions is infinite”?

jan · August 31, 2020, 7:33am

Those are two examples of goals with an infinite number of solutions repeat/0 also works

Boris · August 31, 2020, 7:38am

In regards to your implementation, did you make sure that this implementation is any faster than more trivial code?

For example, “this is a string of length N, with only spaces in it” could probably be done as:

string_length(Spaces, N),
split_string(Spaces, "", " ", [""])

(I am assuming here that the implementation of split_string/4 is clever enough to first get rid of padding characters in linear time and constant memory.)

I also suspect that this is quite OK for “generate a string of spaces with length N”, if the N is not something ridiculously large:

length(L, N),
maplist(=(" "), L),
atomics_to_string(L, Spaces)

but I didn’t try to profile your code and compare it.

jan · August 31, 2020, 7:43am

As for the implementation, there are simpler ways To generate a string of N
spaces you can use format/3, e.g.,

?- format(string(Spaces), '~t~*|', [2]).

To check a string only holds spaces

?- split_string("   ", "", " ", [""]).
true.
?- split_string("x  ", "", " ", [""]).
false.

The more logical (but slower) way is like this. The reverse (check) is the string_codes/2 followed by the maplist/2.

length(Codes, N),
maplist(=(0'\s), Codes),
string_codes(Codes, String).

dtonhofer · August 31, 2020, 11:37am

That must be it then, because with both variables unbound, a +oo number of solutions is produced (it’s the “generate the whole universe of strings” branch).

dtonhofer · August 31, 2020, 12:51pm

Interesting ways of generating and testing there. Thanks.

I will run a naive performance test a bit later.

dtonhofer · September 1, 2020, 10:49am

A little test for creation of “space strings”:

Two gripes:

There should be a way to access the “marker id” of the test clause present in the head from INSIDE the body.
1. In fact, it should be possible throughout Prolog to access ARG(i) of the head without having to retype it in the body:
  f(h(X),g(Y)) :- foo(ARG_0,ARG_1) instead of
  f(h(X),g(Y)) :- foo(h(X),g(Y)).,
  ARG_0 and ARG_1 being just aliasing variables. Pretty weird this isn’t possible.
There should also be a way to capture the output of time/1 which can’t be done neither directly using variables not indirectly using with_output_to/2… That’s fixable though. I will have to take a look at statistics.pl

Oh yeah, the results!

It’s not as if such a predicate would be called all that often, but there is a difference (factor of 2 for max string size 100, the string size being chosen randomly; if we increase the max string size to 500, the factor increases to 5…7):

Creating random strings of space, then dropping them:

Using the “special predicate”:

% 2,460,130 inferences, 0.602 CPU in 0.604 seconds 
% (100% CPU, 4089084 Lips)
drop them immediately (100000 calls) (100 max size) 
using goal 'goal_special': ""

Using the “length/maplist/string_codes” sequence:

% 11,225,260 inferences, 1.443 CPU in 1.447 seconds 
% (100% CPU, 7778712 Lips)
drop them immediately (100000 calls) (100 max size) 
using goal 'goal_direct': ""

Creating random strings of space, but keeping them in a list till the end:

Using the “special predicate”:

% 2,459,621 inferences, 0.662 CPU in 0.665 seconds 
% (100% CPU, 3714638 Lips)
collect in list (100000 calls) (100 max size) 
using goal 'goal_special': ""

Using the “length/maplist/string_codes” sequence:

% 11,197,933 inferences, 2.085 CPU in 2.093 seconds
% (100% CPU, 5370583 Lips)
collect in list (100000 calls) (100 max size) 
using goal 'goal_direct': ""

Code

:- use_module(library('heavycarbon/strings/string_of_spaces.pl')).

:- begin_tests(stringy_performance).

callcount(100000).
max_string_size(100).

goal_special(SzMax,Spaces) :-
   random_between(0,SzMax,RL),
   string_of_spaces(RL,Spaces). % The special predicate

goal_direct(SzMax,Spaces) :-
   random_between(0,SzMax,RL),
   length(Codes,RL),
   maplist(=(0'\s),Codes),
   string_codes(Codes,Spaces). % The length/maplist/string_codes sequence

test("generate strings of spaces of random length and drop them immediately after creation") :-
   maplist([Goal]>>create_and_drop(Goal),[goal_special,goal_direct]).

create_and_drop(Goal) :-
   callcount(CC),
   max_string_size(SzMax),
   with_output_to(string(StatsTxt),time(forall(between(1,CC,_),call(Goal,SzMax,_Spaces)))),
   % doesn't work: time/1 output is NOT captured in StatsTxt
   format("~s (~d calls) (~d max size) using goal '~s': ~q\n",["drop them immediately",CC,SzMax,Goal,StatsTxt]).

test("generate strings of spaces of random length and store them in a list") :-
   maplist([Goal]>>create_and_collect(Goal),[goal_special,goal_direct]).

create_and_collect(Goal) :-
   callcount(CC),
   max_string_size(SzMax),
   length(CollectionList,CC),
   with_output_to(string(StatTxt),time(maplist({Goal,SzMax}/[Spaces]>>call(Goal,SzMax,Spaces),CollectionList))),
   % doesn't work: time/1 output is NOT captured in StatsTxt
   format("~s (~d calls) (~d max size) using goal '~s': ~q\n",["collect in list",CC,SzMax,Goal,StatTxt]).

:- end_tests(stringy_performance).

jan · September 1, 2020, 1:41pm

The normal Prolog idiom is

f(A1, A2) :-
    A1 = h(X),
    A2 = h(Y),
    foo(A1, A2).

Currently the price is that this stops this clause to be considered for clause indexing. That should hopefully change at some point. Technically it should be not to hard what you propose. I have my doubt it is worth breaking the uniformity for this reason. Note that (also not implemented), the implementation could detect compound arguments being passed to subgoals and simply pass the term rather than building it again.

Ideally one (or both) of these optimizations should be implemented.

Boris · September 1, 2020, 6:59pm

This is a bit off topic, but now that I read your code carefully enough, I noticed that you are doing something that is very much like this other thing, namely, raising to the power. I once shared the code, it is here:

So, if you wanted to not have to code gen_string_of_spaces/2 yourself, but still keep the same idea, it would go something like this:

:- use_module(power).

gen_string_of_spaces(N, Spaces) :-
    power(" ", N, string_concat, "", Spaces).

This is it. We are back to a one-liner, as with using format, and it has roughly the same hackiness index.

I have explained it in the original post, but briefly:

the first argument is the Element, in this case a string with one space;
the second argument is the power, here the length of the final string;
the third argument is the operation, here string_concat/3
the fourth argument is the neutral element, the empty string (if your N is 0)
the last argument is the result

I added the following two goals to your test suite:

goal_power(SzMax, Spaces) :-
    random_between(0, SzMax, RL),
    power(" ", RL, string_concat, "", Spaces).

goal_format(SzMax, Spaces) :-
    random_between(0, SzMax, RL),
    format(string(Spaces), "~t~*|", [RL]).

… and I also added them to the list of tested predicates:

test("generate strings of spaces of random length and drop them immediately after creation") :-
   maplist([Goal]>>create_and_drop(Goal),[goal_special,goal_direct,goal_power,goal_format]).

% SNIP

test("generate strings of spaces of random length and store them in a list") :-
   maplist([Goal]>>create_and_collect(Goal),[goal_special,goal_direct,goal_power,goal_format]).

Now, when I run your tests, I see:

% PL-Unit: stringy_performance 
% 3,712,138 inferences, 0.526 CPU in 0.526 seconds (100% CPU, 7063825 Lips)
drop them immediately (100000 calls) (100 max size) using goal 'goal_special': ""
% 11,194,928 inferences, 0.900 CPU in 0.901 seconds (100% CPU, 12433048 Lips)
drop them immediately (100000 calls) (100 max size) using goal 'goal_direct': ""
% 4,369,662 inferences, 0.536 CPU in 0.537 seconds (100% CPU, 8146089 Lips)
drop them immediately (100000 calls) (100 max size) using goal 'goal_power': ""
% 700,000 inferences, 0.313 CPU in 0.314 seconds (100% CPU, 2237162 Lips)
drop them immediately (100000 calls) (100 max size) using goal 'goal_format': ""
.
% 3,707,469 inferences, 0.649 CPU in 0.650 seconds (100% CPU, 5715648 Lips)
collect in list (100000 calls) (100 max size) using goal 'goal_special': ""
% 11,202,787 inferences, 1.316 CPU in 1.318 seconds (100% CPU, 8510957 Lips)
collect in list (100000 calls) (100 max size) using goal 'goal_direct': ""
% 4,367,630 inferences, 0.573 CPU in 0.574 seconds (100% CPU, 7623688 Lips)
collect in list (100000 calls) (100 max size) using goal 'goal_power': ""
% 700,001 inferences, 0.307 CPU in 0.308 seconds (100% CPU, 2278379 Lips)
collect in list (100000 calls) (100 max size) using goal 'goal_format': ""
. done
% All 2 tests passed
true.

The results are pretty similar between runs.

Please confirm my interpretation: it seems that using format(string(Spaces), "~t~*|", [N]) is still roughly twice as fast as the other approaches, with short strings? I suspect that doing less iteration but generating longer strings might make your initial implementation (and the “power”, which is basically the same) somewhat faster. I tried it with:

callcount(10000).
max_string_size(1000).

Indeed, now I get:

% PL-Unit: stringy_performance 
% 606,014 inferences, 0.095 CPU in 0.095 seconds (100% CPU, 6353769 Lips)
drop them immediately (10000 calls) (1000 max size) using goal 'goal_special': ""
% 10,019,866 inferences, 0.766 CPU in 0.767 seconds (100% CPU, 13079676 Lips)
drop them immediately (10000 calls) (1000 max size) using goal 'goal_direct': ""
% 647,857 inferences, 0.092 CPU in 0.092 seconds (100% CPU, 7050067 Lips)
drop them immediately (10000 calls) (1000 max size) using goal 'goal_power': ""
% 70,000 inferences, 0.153 CPU in 0.154 seconds (100% CPU, 456227 Lips)
drop them immediately (10000 calls) (1000 max size) using goal 'goal_format': ""
.
% 607,793 inferences, 0.122 CPU in 0.122 seconds (100% CPU, 4992397 Lips)
collect in list (10000 calls) (1000 max size) using goal 'goal_special': ""
% 10,156,645 inferences, 1.353 CPU in 1.354 seconds (100% CPU, 7506170 Lips)
collect in list (10000 calls) (1000 max size) using goal 'goal_direct': ""
% 649,262 inferences, 0.104 CPU in 0.105 seconds (100% CPU, 6213561 Lips)
collect in list (10000 calls) (1000 max size) using goal 'goal_power': ""
% 70,001 inferences, 0.155 CPU in 0.156 seconds (100% CPU, 450529 Lips)
collect in list (10000 calls) (1000 max size) using goal 'goal_format': ""
. done
% All 2 tests passed
true.

To be honest, I find this at least a bit surprising. I would have expected that using format is going to be always faster. I suspect that string_concat/3 is doing some trickery; I would have to read the implementation if I really wanted to know

dtonhofer · September 3, 2020, 10:48am

Thanks Boris. That’s very detailed.

Using the format/3 approach I have:

When dropping the generated strings:

% PL-Unit: string_of_spaces_performance 
% 4,033,404 inferences, 1.016 CPU in 1.020 seconds (100% CPU, 3968449 Lips)
drop them immediately (100000 calls) (500 max size) using goal 'goal_special': ""
% 51,102,978 inferences, 6.319 CPU in 6.342 seconds (100% CPU, 8087494 Lips)
drop them immediately (100000 calls) (500 max size) using goal 'goal_direct': ""
% 800,000 inferences, 1.226 CPU in 1.233 seconds (99% CPU, 652618 Lips)
drop them immediately (100000 calls) (500 max size) using goal 'goal_format': ""

When keeping the generated strings (stack size set to 2 GiB):

% 5,136,173 inferences, 1.781 CPU in 1.789 seconds (100% CPU, 2883480 Lips)
collect in list (100000 calls) (500 max size) using goal 'goal_special': ""
% 52,297,635 inferences, 9.843 CPU in 9.898 seconds (99% CPU, 5313253 Lips)
collect in list (100000 calls) (500 max size) using goal 'goal_direct': ""
% 1,900,001 inferences, 1.716 CPU in 1.723 seconds (100% CPU, 1107403 Lips)
collect in list (100000 calls) (500 max size) using goal 'goal_format': ""

So timewise it’s about the same as “task-specific exponentiation” but the number of “inferences” (clause calls?) is markedly lower: a divisor 5 when “dropping strings”, 2.5 when “keeping strings”.

Boris · September 3, 2020, 11:38am

To be honest, I am not sure how to interpret the timings. I would have to profile to understand what is going on. One thing is for sure: calls to format do not really influence the number of inferences in a meaningful way. Try this for yourself:

?- time(format(string(Spaces), "~t~*|", [0])).
% what do you expect to see here?
?- time(format(string(Spaces), "~t~*|", [1000])).
% what do you expect to see now?

Also, when I was running the tests, your exponentiation code had less inferences but took a bit longer than the “power” algorithm. Why? Not sure either. I explained it away with less predicate calls altogether for your original version, but many cuts, in contrast to more predicate calls in the “power” version, but no choice points at all. This was a speculation.

Another wild speculation is that there is something strange going on in terms of memory allocation when there is a large N in this call:

format(string(Spaces), "~t~*|", [N])

Again, since format was never meant to be used like this, nor am I ever going to generate strings of spaces 1000 long, this is a dead-end in a way

Boris · September 3, 2020, 11:59am

One strange thing: in my tests, “dropping” and “keeping” had no effect on the number of inferences for the “format” goal; in your tests, they do have an effect (twice as many for “keeping”) – how do you explain that?

dtonhofer · September 4, 2020, 8:06am

One strange thing: in my tests, “dropping” and “keeping” had no effect on the number of inferences for the “format” goal; in your tests, they do have an effect (twice as many for “keeping”) – how do you explain that?

Firs, corrected numbers, I stupidly left a check against the string length in. This gets rid of 1 LIF per string.

700,000 inferences, 1.208 CPU in 1.213 seconds (100% CPU, 579530 Lips)
drop them immediately (100000 calls) (500 max size) using goal 'goal_format': ""

1,800,001 inferences, 1.758 CPU in 1.767 seconds (99% CPU, 1023608 Lips)
collect in list (100000 calls) (500 max size) using goal 'goal_format': ""

This is better.

Now, I expect the collection to have a constant factor of LIF more work to do, as it has to perform whatever the yall Lambda Expression that does it resolves to:

In case of dropping, the instruction is:

time(forall(between(1,CC,_),call(Goal,SzMax,_Spaces)))

But in case of keeping the instruction is:

time(maplist({Goal,SzMax}/[Spaces]>>call(Goal,SzMax,Spaces),CollectionList))

As for format/2:

?-  time(format(string(Spaces), "~t~*|", [0])).
% 1 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 53405 Lips)
 
?- time(format(string(Spaces), "~t~*|", [1000])).
% 0 inferences, 0.000 CPU in 0.000 seconds (97% CPU, 0 Lips)

Hah.

Well, it’s a “builtin”. So it immediately drops to no-inference generating code. That “1/0 inferences” may well come from time/1 being off-by-one (in which direction I don’t know as there is no specification as to what it should show)

Topic		Replies	Views
Puzzling Performance Issue Help!	8	885	November 28, 2021
Unexpected behavior of term_string Predicate	2	282	December 28, 2022
Tabling space error Help!	4	397	January 7, 2023
Btw. I found 8 possibilities to concat atoms in Swi Prolog Nice to know	6	880	March 24, 2022
Trivial parsing and generating DCG Help!	5	875	August 3, 2019

Once this predicate is tabled, it loops forever

Code

Related topics