Partition number in ZDD reply

You can do it in a single predicate:

:- table p/2.
p(0, 1) :- !.
p(N, X) :-
    aggregate_all(sum(Z), (between(1,inf,K), M is K*(3*K-1)//2,
           (M>N, !, fail; L is N-M, p(L,Y), Z is (-1)^K*Y)), A),
    aggregate_all(sum(Z), (between(1,inf,K), M is K*(3*K+1)//2,
           (M>N, !, fail; L is N-M, p(L,Y), Z is (-1)^K*Y)), B),
    X is -A-B.
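
For reference, this is Euler's pentagonal-number recurrence (terms whose argument would be negative are simply dropped):

p(n) = \sum_{k \ge 1} (-1)^{k+1} \left[ p\!\left(n - \tfrac{k(3k-1)}{2}\right) + p\!\left(n - \tfrac{k(3k+1)}{2}\right) \right], \qquad p(0) = 1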

Here are the results:

/* SWI-Prolog 8.3.21 */
?- time(p(6666,X)).
% 13,962,294 inferences, 2.610 CPU in 2.743 seconds (95% CPU, 5350059 Lips)
X = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863.

Thank you for the elegant, more Prolog-like code. Fortunately my code returns the same partition number for 6666 as your code:

% ?- time(zdd((pn(6666, P)))).
%@ % 10,873,956 inferences, 1.154 CPU in 1.179 seconds (98% CPU, 9422855 Lips)
%@ P = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863.

Kuniaki Mukai

It's a Prolog translation of the Maple solution here:

time(p(6666));
# 0.796
time(combinat[numbpart](6666));
# 0.406

https://rosettacode.org/wiki/Partition_function_P#Maple

Edit 10.04.2021:
Crazy side note, you can compute p(n) also with interval arithmetic:

Arb computes the 111391-digit number p(10^10) in 0.3 seconds, whereas Mathematica 9.0 takes one minute. Arb has been used to compute the record value p(10^20) = 1838176508 . . . 6788091448, an integer with more than 11 billion digits. This took 110 hours (205 hours split across two cores) with 130 GB peak memory usage.

https://arxiv.org/abs/1611.02831
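
As a rough sanity check on those digit counts, the Hardy–Ramanujan asymptotic

p(n) \sim \frac{1}{4n\sqrt{3}} \exp\!\left(\pi \sqrt{\tfrac{2n}{3}}\right)

gives log10 p(n) ≈ π·sqrt(2n/3)/ln 10, i.e. roughly 110,000 digits for n = 10^10 and a bit more than 11 billion digits for n = 10^20, consistent with the figures quoted above.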

It really sounds like an algorithm of a demon. Thanks for the side note.

K. Mukai

Now I am waiting for a Picat solution. I want to see how well its tabling fares. But I haven't yet gotten a response on Stack Overflow. There is possibly a problem with languages that invent some new ad-hoc language, like Picat, Logica, etc… For example, Picat could have used foreach/2 instead of a complicated syntax with syntactic sugar like the "end" keyword. That's possibly a dead end and will only attract a small base of enthusiastic users.

Every new language that cannot be parsed by Prolog, but leans toward Prolog, closes the door towards existing Prolog systems and their power of meta-programming etc…

@j4n_bur53

Sorry, I missed your StackOverflow question. It's answered now: loops - Partition function P in Picat - Stack Overflow.

Here's my Picat solution, inspired by the Maple approach, using a while loop (and table):

table
partition1(0) = 1.
partition1(N) = P =>
  S = 0,
  K = 1,
  M = (K*(3*K-1)) // 2,  
  while (M <= N)
     M := (K*(3*K-1)) // 2,  
     S := S - ((-1)**K)*partition1(N-M),
     K := K + 1
  end,
  K := 1,
  M := (K*(3*K+1)) // 2,
  while (M <= N)
     M := (K*(3*K+1)) // 2,  
     S := S - ((-1)**K)*partition1(N-M),
     K := K + 1
  end,
  P = S.

Your (neat) SWI-Prolog version takes about 1.9s for p(6666) on my machine:

?- time(p(6666,X)), write(X), nl.
% 13,959,720 inferences, 1.916 CPU in 1.916 seconds (100% CPU, 7285567 Lips)
193655306161707661080005073394486091998480950338405932486880600467114423441282418165863
X = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863.

My (not as neat) Picat version takes about 0.2s:

p(6666) = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863

CPU time 0.214 seconds.

That's a nice result.

I tried poor man's tabling as well:

:- thread_local p_cache/2.

p_manual(N, X) :- p_cache(N, X), !.
p_manual(0, 1) :- !, assertz(p_cache(0, 1)).
p_manual(N, X) :-
    aggregate_all(sum(Z), (between(1,inf,K), M is K*(3*K-1)//2,
           (M>N, !, fail; L is N-M, p_manual(L,Y), Z is (-1)^K*Y)), A),
    aggregate_all(sum(Z), (between(1,inf,K), M is K*(3*K+1)//2,
           (M>N, !, fail; L is N-M, p_manual(L,Y), Z is (-1)^K*Y)), B),
    X is -A-B,
    assertz(p_cache(N, X)).
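
Side note: for repeated timing runs, the manual cache can be cleared between runs. A trivial helper (reset_p_cache/0 is just my own name here, not part of the solution above) could look like:

% Clear the poor man's tabling cache between timing runs.
reset_p_cache :-
    retractall(p_cache(_, _)).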

Results on my machine (it's a slower machine than yours):

/* SWI-Prolog 8.3.21 */
?- time(p(6666,X)).
% 13,930,884 inferences, 2.672 CPU in 2.812 seconds (95% CPU, 5214609 Lips)
X = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863.

?- time(p_manual(6666,X)).
% 5,953,138 inferences, 1.037 CPU in 1.152 seconds (90% CPU, 5742763 Lips)
X = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863.

I wonder what the higher inference count in the tabling solution p/2 means.
Does this have to do with SLG resolution?

Edit 14.04.2021:
My tabling is not that clever; I cannot yet deal with all the cases that SLG resolution can deal with. As a result my tabling is a little faster than SWI-Prolog's, since it is less complex. But strangely the poor man's solution isn't faster than SWI-Prolog's:

/* Jekejeke Prolog 1.5.0 */
?- time(p(6666,X)).
% Up 2,072 ms, GC 53 ms, Threads 2,000 ms (Current 04/14/21 00:19:00)
X = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863

?- time(p_manual(6666,X)).
% Up 1,369 ms, GC 24 ms, Threads 1,329 ms (Current 04/14/21 00:20:32)
X = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863

Open source:

Partition Number, SWI-Prolog solution
https://gist.github.com/jburse/8a24fe5668960c8889770f40c65cdf35#file-partswi-pl

Partition Number, Jekejeke Prolog solution
https://gist.github.com/jburse/8a24fe5668960c8889770f40c65cdf35#file-part-pl

@hakank findall/3 doesn't mimic aggregate_all/3, since SWI-Prolog's aggregate_all/3 doesn't use findall/3 for the sum/1 aggregate. Please try the newest release of SWI-Prolog.

This was maybe the case in the old days, but now SWI-Prolog has a fast path for sum/1 that uses a failure-driven loop and a value holder:

aggregate_all(sum(X), Goal, Sum) :-
    !,
    State = state(0),
    (   call(Goal),
        arg(1, State, S0),
        S is S0 + X,
        nb_setarg(1, State, S),
        fail
    ;   arg(1, State, Sum)
    ).

https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/3
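
The same value-holder trick can be used directly outside library(aggregate). As a minimal sketch (sum_squares/2 is a made-up example predicate, not a library one):

% Sum the squares of 1..N with a failure-driven loop and a
% non-backtrackable value holder, mirroring the fast path above.
sum_squares(N, Sum) :-
    State = state(0),
    (   between(1, N, I),
        arg(1, State, S0),
        S is S0 + I*I,
        nb_setarg(1, State, S),
        fail
    ;   arg(1, State, Sum)
    ).

?- sum_squares(10, S).
S = 385.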

Edit 14.04.2021:
What I also don't understand is why the Picat solution says:

  M := (K*(3*K+1)) // 2,
  while (M <= N)
     M := (K*(3*K+1)) // 2,  
     S := S - ((-1)**K)*partition1(N-M),
     K := K + 1
  end,

Shouldn’t this be?

  M := (K*(3*K+1)) // 2,
  while (M <= N)
     S := S - ((-1)**K)*partition1(N-M),
     K := K + 1,
     M := (K*(3*K+1)) // 2
  end,

Do I see this correctly: your Java-based solution is almost as fast as SWI-Prolog's C-based solution, when the code is well optimized?

@j4n_bur53 When I wrote that I "mimicked" the SWI-Prolog approach in Picat, I just meant a solution that didn't use the explicit while loop. The tests I did were using the latest SWI-Prolog version (I always have the latest SWI-Prolog version. :-)). Sorry about the confusion.

Regarding your comment about the position of the M := line (first or last in the while loops), I tested both versions and they don't seem to differ at all, neither in result nor in speed. However, I checked the number of steps in partition1(10) for the while loops of each version: my version (the M line first) takes 50 steps while your version (M line last) takes 29 steps, so I changed to your version.

It's interesting that there are no changes in times between these versions for partition1(6666), so I tested with p(66666). But there's no discernible difference in times there either; both versions range between 16.9s and 17.6s.

But I keep your version. Thanks!

It's faster, 736 ms versus 896 ms, on a MacBook Air 2019. Will check on a Windows machine as well. I don't know why; usually SWI-Prolog is faster than my Prolog system. But the loop unraveling results in some nasty calls like:

    p21([M, N, Z, K], [_, N, L, _]),

In my Prolog system this doesn't cost anything, since I have a non-stack parameter-passing architecture. Once the variables of a clause are allocated, everything is free. In SWI-Prolog, with its stack-like parameter-passing architecture, something needs to be pushed on the stack.

I would also like to compare with Picat more closely. Hakan reports 0.2 seconds for Picat. What is the B-Prolog code that Picat generates? According to the PADL17 paper it should be something similar. But can B-Prolog execute it even faster? I will also measure Picat on the Windows machine.

I diverged from the PADL17 paper insofar as I pass around the variables V1,…,Vn of a loop in a list [V1,…,Vn] instead of as arguments. The PADL17 paper shows a more flattened translation for foreach, which leads to a higher number of arguments. Curious what Picat does.
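
To make that concrete, here is a hypothetical reconstruction (not the actual generated code) of how the second while loop could be translated with the loop state threaded through as a single list, calling the tabled p/2 from the top of this thread:

% Loop state [M, N, S, K]; iterate while M =< N, then return the final state.
p21([M, N, S, K], Out) :-
    (   M =< N
    ->  L is N - M,
        p(L, Y),
        S1 is S - (-1)^K * Y,
        K1 is K + 1,
        M1 is K1*(3*K1+1)//2,
        p21([M1, N, S1, K1], Out)
    ;   Out = [M, N, S, K]
    ).

The call p21([M, N, Z, K], [_, N, L, _]) shown above then just picks the accumulated sum out of the final state list.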

Interesting.

Is this non-stack-passing technique tightly integrated with your other Prolog implementation features (or even with the Java VM), or could such an optimization be introduced in SWI-Prolog as well?

I imagine that, since parameter passing is so ubiquitous (it essentially happens all the time), this would make a significant difference.

Dan

@j4n_bur53 In Picat there’s an option to convert Picat code to B-Prolog (compile_bp/1). I can do this later today, but now I have to go to the dentist…

That's too bad; "arcane" isn't an engineering tradeoff. If it works well, is there an engineering tradeoff one makes to gain the speed?

Dan

Thanks.

In this text you can see the engineering tradeoff: speed of parameter passing (significantly increased) vs. speed of data access ("somewhat slower"), so this might make a lot of sense to do.

Regarding the CPU architecture: since, say, SWI-Prolog runs as a VM, an engineering decision that aims to align with the CPU design for function calling doesn't make sense, since this is all happening in the VM and hence on a "virtual" CPU…

What you refer to reminds me of structure sharing; is this also what is going on?

Does SWI-Prolog use structure sharing as well?

Dan

It should say archaic architecture. Sorry. Prolog system designers prefer either a register file or a stack for parameter passing, because this is closer to what CPUs offer for parameter passing, or to the way the VM is fundamentally designed.

A goal is passed by a so-called molecule in my Prolog system. This is a very early 1970s Prolog technique, going back to Marseille (what about Edinburgh? LoL), and therefore archaic, and possibly also arcane compared to the other techniques:

https://www.researchgate.net/profile/Luis-Pereira-25/publication/239578881
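
To illustrate what a molecule denotes (a toy sketch only, not how Jekejeke Prolog is actually implemented): a structure-shared goal can be seen as a pair mol(Skeleton, Frame), where the skeleton's variables are placeholders '$VAR'(I) and the frame holds their current values. Decoding such a pair into an ordinary term could look like:

% mol_to_term(+Molecule, -Term): expand a (skeleton, frame) pair into a term.
mol_to_term(mol(Skel, Frame), Term) :-
    skel_to_term(Skel, Frame, Term).

skel_to_term('$VAR'(I), Frame, Value) :- !,   % placeholder: look up in frame
    arg(I, Frame, Value).
skel_to_term(Skel, _Frame, Skel) :-
    \+ compound(Skel), !.                     % atoms and numbers stay as-is
skel_to_term(Skel, Frame, Term) :-
    compound_name_arguments(Skel, Name, Args0),
    maplist(skel_arg(Frame), Args0, Args),
    compound_name_arguments(Term, Name, Args).

skel_arg(Frame, A0, A) :- skel_to_term(A0, Frame, A).

?- mol_to_term(mol(p('$VAR'(1), f('$VAR'(2))), frame(a, B)), T).
T = p(a, f(B)).

The point is that the engine never needs to build the expanded term when passing a goal; it only passes the pair.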

Thank you for the reference.

It is the first time I have started reading about a Prolog implementation that I can actually readily understand…

Interestingly, the approach they took, focusing on creating (CPU machine) instructions for unification in the head, would not readily work for implementing SSU, where you want to defer unification to the body to ensure steadfastness.

…if I gathered this correctly…

Dan

Yes. Will edit.

I think what is also remarkable here is that your Prolog is implemented in Java, not a language known for high (or system-level) performance. Yet it performs very well.

Does it mean that quite a bit more performance can be obtained in SWI-Prolog if it implements the "archaic" approach to handling argument passing?

Or does it mean that Java, along with an efficiency-oriented way to program it, can perform close to C?

Where did you get that from? Good ole HotSpot Java is usually the same speed as C. Here is also a Java version of the Partition Function, which is like 10x faster than my Prolog solution:

P(6666) = 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863
elapsed time: 59 milliseconds

https://rosettacode.org/wiki/Partition_function_P#Java

But I didn't try it yet on my Windows 10 machine. What's faster than Java is GMP. GMP is faster than Java BigInteger, because GMP has better algorithms and less overhead. You find this also on Rosetta:

193655306161707661080005073394486091998480950338405932486880600467114423441282418165863
elapsed time: 8.99497 milliseconds

https://rosettacode.org/wiki/Partition_function_P#C.2B.2B

But since SWI-Prolog uses GMP, shouldn't it be faster? This was also a question when I looked at the Picat solution, but Hakan responded that Picat should have no problems with bignums.

Maybe in SWI-Prolog there is some copying of bignums involved in tabling?