Autumn Challenge: Short Deadfish Numbers

Most likely the tabling solution is in essence Dijkstra's algorithm.
Here is a tabling solution, only “i”, “d” and “s”, no “o” yet.
The following fails:

% edge(+Atom, +Integer, -Integer)
% The search runs backwards from the target value, so each clause
% inverts one Deadfish opcode (256 wraps to 0):
edge(s, N, M) :- (N = 0 -> M = 16; H is sqrt(N),      % M*M gave N
   M is truncate(H), H is float(M)).
edge(i, N, M) :- (N = 0 -> M = 255; M is N-1).        % M+1 gave N
edge(d, N, M) :- M is N+1.                            % M-1 gave N

% path(+Integer, -Integer, -List)
:- table(path(_, _, lattice(shortest/3))).
path(N, N, [o]).
path(N, M, [X|L]) :- path(N, H, L), edge(X, H, M).
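
Here shortest/3 is the lattice join, which is not shown in the post. Presumably it is defined along the lines of the answer-subsumption example in the SWI-Prolog manual, keeping the shorter of two candidate paths:

% shortest(+Path1, +Path2, -Shorter): join operation for the lattice
shortest(P1, P2, P) :-
    length(P1, L1),
    length(P2, L2),
    (   L1 < L2
    ->  P = P1
    ;   P = P2
    ).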

It gives this answer:

?- path(90, 0, L), write(L), nl.
ERROR: '$tbl_wkl_add_answer'/4: Not enough resources: private_table_space

Now if I bound the length of L:

path(N, N, [o]).
path(N, M, [X|L]) :- path(N, H, L), length(L, K), K =< 21, edge(X, H, M).

It works, and it is quite speedy:

?- time(path(90, 0, L)), write(L), nl.
% 2,979 inferences, 0.000 CPU in 0.001 seconds (0% CPU, Infinite Lips)
[i,i,i,s,s,i,i,i,i,i,i,i,i,i,o]
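
To double-check such answers, here is a minimal forward interpreter (my sketch, not part of the original post). Deadfish normalises an accumulator of -1 or 256 to 0:

% run(+Program, -FinalAccumulator); 'o' is a no-op here (real Deadfish prints)
run(Prog, Acc) :-
    foldl(step, Prog, 0, Acc).

step(i, A0, A) :- A1 is A0 + 1, norm(A1, A).
step(d, A0, A) :- A1 is A0 - 1, norm(A1, A).
step(s, A0, A) :- A1 is A0 * A0, norm(A1, A).
step(o, A, A).

norm(A0, A) :- (A0 =:= -1 -> A = 0 ; A0 =:= 256 -> A = 0 ; A = A0).

?- run([i,i,i,s,s,i,i,i,i,i,i,i,i,i,o], A).
A = 90.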

Edit 07.10.2023
Altogether, the whole Deadfish challenge with SWI-Prolog lattice tabling:

?- time(total(S,M,C)).
% 835,418 inferences, 0.188 CPU in 0.204 seconds (92% CPU, 4455563 Lips)
S = 3215,
M = 22,
C = 255.
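
total/3 itself is not shown in the thread. Judging from the reported values, it presumably aggregates the minimal program lengths over the byte range: S the sum, M the maximum, C the count. A sketch under that assumption (the range 1..255 is a guess, chosen to match the reported C = 255):

total(S, M, C) :-
    findall(K,
            ( between(1, 255, N),
              path(N, 0, L),
              length(L, K)
            ),
            Ks),
    sum_list(Ks, S),
    max_list(Ks, M),
    length(Ks, C).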

A little bit slower than my hand-rolled memoization, since the
bound K =< 21 leaves a lot of search-space room, making it slower
than iterative deepening with tabling.

0.2 seconds is ca. 9 times faster than the 1.8 seconds reported
by @peter.ludemann. But I need to check what it does for
“o”. With “o” we can use a smaller bound; maybe it's faster.

I added that optimisation to my code and got an 8x speed-up – it now computes all minimum results in 0.25 seconds.

I don’t believe you! Can you give proof? kwon-young
reported 0.619 seconds, so how do you want to get 0.25 seconds?
That’s impossible, unless your machine is 3 times faster.

If you posted your code, we could reproduce your claims.
Most likely you will never be faster than kwon-young, because
you use dicts, and kwon-young doesn’t use dicts.

You can only break the speed barrier if you fundamentally change
the algorithm, from brute force to something else.

The factor of 8x is much too high; testing shows:

/* dif(P, d) and dif(P, i) commented out */
?- time(total(S,M,C)).
% 50,326,245 inferences, 3.266 CPU in 3.253 seconds (100% CPU, 15410908 Lips)
S = 2036,
M = 14,
C = 255.

/* dif(P, d) and dif(P, i) in place */
?- time(total(S,M,C)).
% 8,179,672 inferences, 0.531 CPU in 0.538 seconds (99% CPU, 15397030 Lips)
S = 2036,
M = 14,
C = 255.

This gives a factor of ca. 6x. Maybe you got 8x because
you also deployed the other dif/2 that kwon-young uses, and the when/2 call?

I don’t use dicts - I use compile-time expanded records (see library(record)).

My code is here (it still needs a bit of clean-up): deadfish code improved · GitHub

How does your record solution compare to kwon-young’s deadfish.pl?
What are the two timings? Is the record solution faster? You
have set_XXX calls from library(record); they are as slow as dicts:

If you set one attribute, you nevertheless copy the whole record.
That’s the same dead end as using dicts. For example,
this record declaration:

:- record point(x:integer=0, y:integer=0).

generates this setter:

set_x_of_point(A, point(_, B), point(A, B)) :-
    must_be(integer, A).

So when you first set the x coordinate and then the y coordinate,
it creates two new records: one for nothing, which needs to be
garbage collected, and one which is what you want from the bulk change.
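
For illustration (my sketch, reusing the point record above): chaining the generated setters materialises an intermediate record that immediately becomes garbage:

% Hypothetical helper; P1 = point(X, _) lives only until the next call
move_point(P0, X, Y, P) :-
    set_x_of_point(X, P0, P1),
    set_y_of_point(Y, P1, P).

If I read the library(record) documentation correctly, set_point_fields/3 performs such a bulk change in one step and avoids the intermediate copy.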

The short Deadfish example doesn’t have that many bulk changes;
often only one attribute is changed. What then makes it slow is just
the length of the record Prolog term, of which, I guess, the Prolog system
needs to make a copy. You might also have that in a manual solution,
like kwon-young’s, but to a lesser extent: the SWI-Prolog ZIP
machine is especially optimized for argument permutations in tail recursion,
which don’t need predicate goal re-creation.

The Mercury !IO notation would have different dynamics.
It is really friendly to Prolog systems such as SWI-Prolog,
which are strong in tail recursion.

If you use Mercury !IO in a predicate argument position, no
pair record is introduced; the argument pair is spread
into the predicate head or goal.
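
Schematically (my rendering of Mercury's state-variable notation), a clause written as

incr(!Acc) :- !:Acc = !.Acc + 1.

is expanded at compile time into plain threaded arguments, spread into the head rather than materialised as a pair term:

incr(Acc0, Acc) :- Acc = Acc0 + 1.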

But to test that, you would need a Mercury !IO processor, maybe
the prototype from Jan W., and also write a solution
using !IO notation. I am not sure whether Mercury !IO
has a place in Short Deadfish.

These are expanded at compile time (in expand_record/5). If they’re not expanded, the performance using library(record) is about the same as using dicts.

Still, even expanded, they are slower than kwon-young’s deadfish.pl.
Could you make a measurement and compare against kwon-young’s deadfish.pl
on your machine? What are the two timings? Is the record solution faster?

I guess not. Also, you could expand even more; currently you only expand:

get_set_record(Name, Rec, V0, NewRec, V) :-
    dcg_record_name(RecName),
    call_univ([RecName, '_', Name], [Rec, V0]),
    call_univ([set_, Name, '_of_', RecName], [V, Rec, NewRec]).

But instead of generating a set_XXX call, you could generate what is behind
set_XXX, i.e. inline set_XXX itself. What would be the difference? Yet another 40% faster?
But still slower than kwon-young’s deadfish.pl? Yes or no?
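
Such inlining could be done with goal_expansion/2; a hypothetical sketch for the point record from above (the must_be/2 type check could also be dropped):

goal_expansion(set_x_of_point(A, R0, R),
               ( must_be(integer, A),
                 R0 = point(_, B),
                 R = point(A, B) )).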

So, just with small incremental optimisations, like removing pure constructs such as when/2 or dif/2 and inlining a few predicates, I got down to 0.179 sec without optimisation and 0.145 sec with optimisation.

Code optimized for speed:
deadfish(Prog, I) :-
   length(Prog, _),
   number_chars(I, Digits),
   phrase(deadfish(Prog, o, 0), Digits).

deadfish([], o, _) -->
   [].
deadfish([o | Prog], _, I) -->
   {
      I >= 0,
      number_chars(I, Digits)
   },
   deadfish(Digits),
   deadfish(Prog, o, I).
deadfish([i | Prog], P, I) -->
   {
      P \== d,
      I1 is I + 1,
      (  I1 == -1
      -> I2 = 0
      ;  I1 == 256
      -> I2 = 0
      ;  I2 = I1
      )
   },
   deadfish(Prog, i, I2).
deadfish([d | Prog], P, I) -->
   {
      P \== i, I \== 0,
      I1 is I - 1,
      (  I1 == -1
      -> I2 = 0
      ;  I1 == 256
      -> I2 = 0
      ;  I2 = I1
      )
   },
   deadfish(Prog, d, I2).
deadfish([s | Prog], _, I) -->
   {
      I \== 0, I \== 1,
      I1 is I * I,
      (  I1 == -1
      -> I2 = 0
      ;  I1 == 256
      -> I2 = 0
      ;  I2 = I1
      )
   },
   deadfish(Prog, s, I2).
deadfish([]) -->
   [].
deadfish([Digit | Digits]) -->
   [Digit],
   deadfish(Digits).
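
For reference, this version searches by output value, e.g. (my query, not from the post):

?- deadfish(Prog, 90).

Thanks to the length/2-driven iterative deepening, the first answers have minimal length; one 15-element answer is Prog = [i,i,i,s,o,d,d,d,d,d,d,d,d,d,o].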

Brute force, in this example, will never beat a classical shortest-path
search algorithm, because brute force has exponential complexity
whereas a shortest-path search algorithm is quadratic (*). Yes or no?

To not compare apples with oranges, but only oranges
with oranges, I tried to bring your “di” and “id” filtering heuristic
to tabling as well. In your case it gave a 6x speed-up.

What is the speed-up when tabling is involved? First, how
would one code your heuristic in a tabling solution? Here is
a take; change edge and path so that they apply the same filter:

% edge(+Atom, +Atom, +Integer, -Integer)
% The second argument is the opcode that follows in the emitted program;
% the A \== d and A \== i tests block the useless "id" and "di" pairs.
edge(s, _, N, M) :- (N = 0 -> M = 16; H is sqrt(N),
   M is truncate(H), H is float(M)).
edge(i, A, N, M) :- A \== d, (N = 0 -> M = 255; M is N-1).
edge(d, A, N, M) :- A \== i, M is N+1.

% path(+Integer, -Integer, -List)
% L = [Y|_] peeks at the opcode already chosen after X.
:- table(path(_, _, lattice(shortest/3))).
path(N, N, [o]).
path(N, M, [X|L]) :- path(N, H, L), length(L, K),
   K =< 21, L = [Y|_], edge(X, Y, H, M).

Now we can compare. Unfortunately there is no “o” operator yet,
but we already get a feeling for the speed-up:

/* without "di" and "id" filtering heuristic */
?- time(total(S,M,C)).
% 835,398 inferences, 0.203 CPU in 0.193 seconds (105% CPU, 4112729 Lips)
S = 3215,
M = 22,
C = 255.

/* with "di" and "id" filtering heuristic */
?- time(total(S,M,C)).
% 545,033 inferences, 0.141 CPU in 0.144 seconds (98% CPU, 3875790 Lips)
S = 3215,
M = 22,
C = 255.

The impact is smaller, but still visible. I didn’t test yet what happens when
the operator “o” is also present. Most likely we get a solution faster than brute force again.

(*)
Dijkstra’s algorithm has O(V^2) complexity, since it visits each
vertex only once. A brute-force algorithm, on the other hand, visits
vertices many times and usually has much higher complexity.

But now there is a stalemate. I have to find a better tabling
solution, or directly a graph shortest-path algorithm not using
tabling; I cannot beat kwon-young’s solution anymore.

On the other hand, the hand-rolled memoization solution
still beats both tabling and kwon-young’s solution.
But it is an iterative-deepening-based shortest-path algorithm,
not lattice-based tabling.

I also think the hand-rolled version can be improved further; the
0.052 seconds clearly shows the potential. Only, how does one realize
this potential with the out-of-the-box tabling found in SWI-Prolog,
so as to avoid hand-rolling the memoization? Maybe I could just place
a table/1 directive where I have the hand-rolling, but then it is no
longer the lattice-based algorithm.
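
Since every opcode has weight 1, a direct shortest-path search degenerates to breadth-first search over the accumulator values, visiting each value once. A minimal sketch (mine, not from the thread), with the accumulator restricted to 0..255 as in edge/3 above; wraps to 0 are dropped, since 0 is the source anyway:

fwd(N, M) :- N < 255, M is N + 1.    % i
fwd(N, M) :- N > 0,   M is N - 1.    % d
fwd(N, M) :- N =< 15, M is N * N.    % s

% bfs_levels(+Frontier, +Seen, -Levels): a value in the K-th level
% (0-based) is reachable in K opcodes, plus the final o.
bfs_levels([], _, []).
bfs_levels(F, Seen, [F|Ls]) :-
    findall(M, (member(N, F), fwd(N, M), \+ memberchk(M, Seen)), Ms0),
    sort(Ms0, Ms),
    append(Ms, Seen, Seen1),
    bfs_levels(Ms, Seen1, Ls).

?- bfs_levels([0], [0], Ls), nth0(D, Ls, F), memberchk(90, F).

D = 14 here would be consistent with the 15-character program (including the final o) found above.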

/* kwon-young's latest inlining, deepening based, with "di" and "id" heuristic */
?- time(total(S,M,C)).
% 1,598,618 inferences, 0.188 CPU in 0.187 seconds (100% CPU, 8525963 Lips)
S = 2036,
M = 14,
C = 255.

/* my latest tabling, lattice based, with "di" and "id" heuristic */
?- time(total(S,M,C)).
% 851,775 inferences, 0.172 CPU in 0.197 seconds (87% CPU, 4955782 Lips)
S = 2036,
M = 14,
C = 255.

/* my older handrolled, deepening based, without "di" and "id" heuristic */
?- time(total(S,M,C)).
% 249,166 inferences, 0.047 CPU in 0.052 seconds (90% CPU, 5315541 Lips)
S = 2036,
M = 14,
C = 255.

My latest tabling is here:

fishheur2.p.log (1.3 KB)

My older handrolled is here:

shortout2.p.log (2.9 KB)

For comparison, the code runs in 0.325 / 0.279 sec with / without optimisation on my machine (my guess is that the speed difference is due to the smaller cache on my laptop’s CPU).

This is better than my program’s speed: 0.415 / 0.363.
But then I reordered the generation of the opcodes to be the same as in @kwon-young’s code and got 0.349 / 0.275 – essentially the same.
The important thing is to try the opcode “o” first (I had it last).

I don’t think so… For my code, the various constraints (disallow [i,d], limit the situations where s is tried, prune the tree when the output doesn’t match the expected output, etc.) make the complexity less than exponential – it might even be quadratic. (Or perhaps you can say that my code isn’t “brute force”, because it does trim some infeasible solutions.)

Excuse me Jan, I am totally out of the challenge, but I can still look at pictures… What you added to your post looks a lot like data visualization, or cosmetics if you prefer… matplotlib or what? Cheers

Interesting challenge! I tried it on Scryer Prolog. First I’ll show you this naive implementation:

:- use_module(library(clpz)).
:- use_module(library(dcgs)).
:- use_module(library(lists)).

% Increment the accumulator
ops(i, X0, X) :-
    X #= X0 + 1.

% Square the accumulator
ops(s, X0, X) :-
    X #= X0 * X0.

% Decrement the accumulator
ops(d, X0, X) :-
    X #= X0 - 1.

deadfish(N, Code) :-
    length(Code, _),
    phrase(deadfish_(0, N), Code).

deadfish_(X, X) --> [o]. % Output always at the end
deadfish_(X0, X) -->
    { ops(Op, X0, X1) },
    [Op],
    deadfish_(X1, X).
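
For example (my query, not from the post), thanks to the length/2-driven iterative deepening the first answer already has minimal length:

?- deadfish(90, Code).

One of the 15-character answers is "iiissiiiiiiiiio": 3 squared twice gives 81, plus nine increments.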
    

But then I saw that you can output stuff in the middle. I rearranged things so I could try tabling, and I introduced some predicates to make the code work bidirectionally (so it can also execute Deadfish code!)

:- use_module(library(clpz)).
:- use_module(library(lists)).
:- use_module(library(dif)).
:- use_module(library(freeze)).

% Increment the accumulator
ops(i, X0, X, S, S) :-
    X0 #< 255,
    X #= X0 + 1.

% Square the accumulator
ops(s, X0, X, S, S) :-
    X0 #< 16,
    X #= X0 * X0.

% Decrement the accumulator
ops(d, X0, X, S, S) :-
    X0 #> 0,
    X #= X0 - 1.

% Output the accumulator
ops(o, X, X, Screen0, Screen) :-
    dif(Str, ""),
    append(Screen0, Str, Screen),
    freeze(X, number_chars(X, Str)).

deadfish(Str, Code) :-
    length(Code, _),
    deadfish(_, Str, Code).

deadfish(0, "", []).
deadfish(N, Str, Code) :-
    ops(Op, N0, N, Str0, Str),
    append(Code0, [Op], Code),
    restrict_code(Code0, Op),
    deadfish(N0, Str0, Code0).

% restrict_code(+Code, +Op): may opcode Op be appended after Code?
% It inspects the last opcode of Code and blocks the useless pairs
% "id" and "di" (d directly after i, i directly after d):
restrict_code([], _).
restrict_code(Code, o) :- append(_, [_], Code).
restrict_code(Code, s) :- append(_, [_], Code).
restrict_code(Code, d) :- member(Op, [s, o, d]), append(_, [Op], Code).
restrict_code(Code, i) :- member(Op, [s, o, i]), append(_, [Op], Code).

However, it’s very slow (in fact just using freeze/2 adds a big delay). I tried tabling but it’s slower.

I’ve looked at your code for minimize_backward/3 (short2.p.log), and I don’t understand how to apply it to the “deadfish” problem. Do you have a solution to the “deadfish” problem that uses either your dynamic programming or Dijkstra’s algorithm? (I’ve also tried to figure out how to use Dijkstra’s algorithm, but couldn’t figure out a suitable “distance” for labeling the arcs – Dijkstra’s algorithm requires that the total distances are monotonically non-decreasing; but for Deadfish, for example, the minimal program for 7 is longer than the one for 8: iisiiio vs. iiisdo.)

(I tried adding a table/1 directive to my code, but it made no significant difference)

Just use your keyboard and then type the below. What do you get?

?- time(total(S,M,C)).

But you get better results with this one, which is chronologically later
in this thread and thus my more recent solution:

I modified the short2.p.log code to use tabling instead of memoization (with assertz/1). Memoization was almost 20x faster (the results below are with the -O flag):

% 18,011 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 8197400 Lips)
S = 3215, M = 22, C = 255.

vs tabling:

?- time(total(S,M,C)).
% 306,625 inferences, 0.050 CPU in 0.050 seconds (100% CPU, 6086382 Lips)
S = 3215, M = 22, C = 255.

Compared with my code, short.p.log doesn’t give minimal solutions, because it only uses the “o” opcode at the end; presumably the code can be modified to handle this situation:
90: iiisodddddddddo (mine)
90: iiissiiiiiiiiio (short)
222: iiooo (mine)
222: iissdsdddddo (short)

Unfortunately, I don’t get notified of edits to older posts, which is why I didn’t see this. (Is there a way to subscribe to edited posts, and also to deleted posts?)
My apologies for the noise (and also for mis-reading the earlier “short2.p.log” and not understanding how to run it – I can only plead the usual human shortcoming of looking but somehow not seeing, much like the car driver who almost ran me over yesterday and claimed he didn’t see me).

[I’m still trying to understand how short2.p.log works – I need to rewrite it to make the single-letter variable names easier for my dull brain to understand. Incidentally, making the “Goal” in the findall/3 into a separate predicate gave me an almost 2x speedup.

Code changed by making the goal in findall/3 into a separate predicate, for performance:
minimize_backward(0, _, 0, []) :- !.
minimize_backward(N, D, J, [C|L]) :-
    D > 0,
    E is D-1,
    findall(K-C-L, step(N,E,K,C,L), S), % was: (member(C,[s,i,d]), ...)
    keysort(S, [K-C-L|_]),
    J is K+1.

step(N,E,K,C,L) :-
    member(C, [s, i, d]),
    step_backward(C, N, M),
    minimize_backward(M, E, K, L).

]

That’s also obsolete – where did you get that from?
Look at shortout2.p: there is no findall/3 anymore.
The findall/3 and keysort/2 combo has been replaced
by a cut (!). During iterative deepening we get a shortest
solution as the first solution anyway, so we can use a cut
if we are only interested in one shortest solution.

Could be that this is a secret sauce that makes it a little faster;
I don’t remember exactly how much speed is gained by it.
But in essence it is still a form of dynamic programming
in my opinion, only it exploits the property that all weights
are 1, together with iterative deepening. That’s roughly how I see it.

Without the cut it could also be that tabling computes
all solutions, and that this makes it slow. But the design is
currently rather that producing only one solution is also
ingrained into the memoization realization. Producing only
one solution also uses less memory.

With the shortout2.p program, which tries a lot more possibilities, memoization (with assertz/1) was only about 10% faster than tabling, although tabling registered about 2x the inferences:
memoization: 220,246 inferences, 0.066 CPU in 0.066 seconds (100% CPU, 3320079 Lips)
tabling: 456,194 inferences, 0.072 CPU in 0.072 seconds (100% CPU, 6308292 Lips)

@jan – I assume that tabling also records failures? I couldn’t find this documented (I also looked in the XSB documentation).
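
Empirically, failed calls do seem to be tabled. A quick sketch (the side effect is only for demonstration) shows that a failing tabled goal is not re-evaluated, i.e. its complete but empty answer table is kept:

:- table p/1.
p(X) :- format("evaluating ~w~n", [X]), fail.

?- p(1).                        % prints "evaluating 1", then fails
?- p(1).                        % fails silently, without re-evaluation
?- current_table(p(1), Trie).   % the (empty) table exists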