SWI Prolog and knowledge graphs

Hi all,
we have started working on applying SWI Prolog and Liftcover to knowledge graph completion; you can find a summary of our efforts at

In summary, we have not yet been able to beat AnyBURL in terms of the quality of the learned solutions.
But what surprised me was that SWI Prolog was way slower than PyClause in computing the quality metrics, which require finding all solutions to queries over simple rule bases. PyClause is a Python interface to AnyBURL and AMIE and uses a C++ library for metrics computation.

Fabrizio

Can you give some details? How do you represent the KG in Prolog and compute the metrics? Also, what does PyClause use? Note that if you use HDTs you can produce many statistics quite fast.

I created a minimal working example for one task: computing the confidence of a set of rules.
You can find the files at GitHub - friguzzi/lp4kg: Logic Programming for Knowledge Graphs
In folder wn18rr you can find the Wordnet dataset in Prolog and PyClause format.
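
For illustration, one possible Prolog encoding looks roughly like this; the exact encoding in the repo may differ, and the entity names below are placeholders:

% One fact per training triple; entities and relation names as atoms
% (WN18RR relation names start with '_' and therefore need quotes).
train(e1, '_hypernym', e2).
train(e2, '_synset_domain_topic_of', e3).
% ... one fact per line of train.txt

% A learned rule "H <- B1, B2" kept as a Head-Body pair over the same
% triple representation:
rule(train(X, '_synset_domain_topic_of', Y),
     ( train(X, '_hypernym', Z),
       train(Z, '_synset_domain_topic_of', Y) )).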
You can associate a confidence with each rule by running

% swipl confidence.pl

and

% python confidence.py

This is what I get on my machine:

% time python confidence.py 
python confidence.py  23,65s user 0,37s system 99% cpu 24,238 total
% time swipl confidence.pl
swipl confidence.pl  116,61s user 1,52s system 99% cpu 1:58,86 total

Luckily we can beat Python by changing

 maplist(confidence, R, RC, S)

into

 concurrent_maplist(confidence, R, RC, S)
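
Concretely, the pattern is roughly the following. This is only a sketch, assuming rules are available as rule(Head, Body) terms whose head and body are callable over the triple facts (as in the sketch above); it is not the actual code of confidence.pl:

:- use_module(library(thread)).     % concurrent_maplist/4
:- use_module(library(aggregate)).  % aggregate_all/3

% Illustrative rule confidence: solutions of (Body, Head) over the KG
% divided by solutions of Body alone.  Note that this counts duplicate
% groundings; making the counts distinct is discussed below.
confidence(rule(Head, Body), Conf, Support) :-
    aggregate_all(count, Body, BodyCount),
    aggregate_all(count, (Body, Head), Support),
    (   BodyCount =:= 0
    ->  Conf = 0
    ;   Conf is Support / BodyCount
    ).

% Score all rules in parallel; each rule is independent of the others,
% so the goals can safely run in separate threads.
confidences(Rules, Confs, Supports) :-
    concurrent_maplist(confidence, Rules, Confs, Supports).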

Your machine seems considerably faster than mine. The original takes 3m32. After the above change and loading train.pl as a .qlf file I get 20 seconds, a little more than a 10-fold speedup (AMD 3950X: 16 cores, 32 threads). That is a little disappointing; I would have expected a speedup closer to 20.

Needs more analysis to see how fair this comparison is and whether it can be improved significantly.


After using concurrent_maplist and once/1 it gets down to 11 s; not bad, thanks

Correction: I used one once/1 too many. With just one once/1 in the second query it takes 33.79 s, while without once/1 it takes 31.15 s
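
For context, the kind of place where a stray once/1 can creep into these counting queries is around the head check. A sketch (not the exact queries from confidence.pl), assuming triples are stored as unique facts:

% With an explicit once/1 around the head check:
support_once(Head, Body, Support) :-
    aggregate_all(count, (Body, once(Head)), Support).

% Without it: when Body fully instantiates Head, the fact lookup is
% deterministic anyway, so the extra once/1 only adds call overhead.
support_plain(Head, Body, Support) :-
    aggregate_all(count, (Body, Head), Support).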

Yes, I used your approach.
I redid the experiments and now with just concurrent_maplist I get 34,84 s while with your approach 32,29 s, thanks

I also tried replacing aggregate_all(count, Witness, Goal, Count) by the equivalent

aggregate_all(count, distinct(Witness, Goal), Count)

The funny thing is that this is slower, but it gives a much better speedup under concurrent execution, so in total it is about the same on my hardware. Both could be improved: distinct/2 by using a low-level implementation of add_nb_set/3, and aggregate_all/4 by figuring out what causes the poor speedup.
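
Side by side, the two formulations look like this on a concrete (purely illustrative) rule body:

:- use_module(library(aggregate)).           % aggregate_all/3,4
:- use_module(library(solution_sequences)).  % distinct/2

% aggregate_all/4 with an explicit witness:
body_count_v1(Count) :-
    aggregate_all(count, X-Y,
                  ( train(X, '_hypernym', Z),
                    train(Z, '_synset_domain_topic_of', Y) ),
                  Count).

% The equivalent count using distinct/2 inside aggregate_all/3:
body_count_v2(Count) :-
    aggregate_all(count,
                  distinct(X-Y,
                           ( train(X, '_hypernym', Z),
                             train(Z, '_synset_domain_topic_of', Y) )),
                  Count).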

Yes. Using your distinct/2 with aggregate_all/3 is about the same as using aggregate_all/4. As I also found, the aggregate_all/4 version is faster, but scales less well when executed concurrently. The difference is partly due to the efficiency of sort/2, where distinct/2 instead uses a non-backtrackable hash table implemented in pure Prolog. Actually, I’m surprised it is as fast as it is. Using distinct/2 gives near-perfect speedup under concurrency, while the aggregate_all/4 version does not (getting 10 times from 16 cores/32 threads). That means there is some interaction between the threads. I wonder where that comes from. findall/3 has been decoupled quite extensively as far as I know.

I updated confidence.pl to also print the counts as comments, please pull.
It seems they coincide with those of PyClause…

Let’s consider a more challenging task: the computation of the metrics.
This is what I get (all the code is in the repo)

Computation of Hits@K and MRR

With PyClause:

  1. computation of the scores of the possible answers
% /usr/bin/time python ranking.py 
...
Ranking file written to:  ranking-file.txt
      1,24 real         1,00 user         0,03 sys
  2. computation of the metrics
% /usr/bin/time python metrics.py 
>>> loading triple set from path test.txt ...
>>> ... read and indexed 3134 triples.
*** EVALUATION RESULTS ****
Num triples: 3134
hits@1  0.344288
hits@3  0.452138
hits@5  0.494257
hits@10 0.547543
MRR     0.412144

        0,25 real         0,21 user         0,02 sys

With Prolog:

  1. computation of all the possible answers
% cd rank
% /usr/bin/time swipl computeinsts.pl > inst.pl
        5,79 real         5,59 user         0,16 sys
  2. computation of the scores for the answers
% /usr/bin/time swipl inst2ranks.pl > rank.pl
      283,08 real      2778,12 user        32,34 sys
  3. computation of the metrics
% /usr/bin/time swipl hitsranks.pl
0.19495851946394385
0.40523292916400766
0.48851308232291
0.5449904275686024
0.32174862969445756
        2,02 real         1,94 user         0,03 sys
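
For reference, Hits@K is the fraction of test queries whose correct answer is ranked within the top K, and MRR is the mean of the reciprocal ranks. A minimal sketch of computing them from a list of ranks (illustrative only, not the code of hitsranks.pl):

:- use_module(library(aggregate)).  % aggregate_all/3
:- use_module(library(lists)).      % member/2

% hits_at_k(+Ranks, +K, -Hits): fraction of ranks that are =< K.
hits_at_k(Ranks, K, Hits) :-
    length(Ranks, N),
    aggregate_all(count, (member(R, Ranks), R =< K), InTopK),
    Hits is InTopK / N.

% mrr(+Ranks, -MRR): mean reciprocal rank, i.e. the average of 1/Rank.
mrr(Ranks, MRR) :-
    length(Ranks, N),
    aggregate_all(sum(1/R), member(R, Ranks), Sum),
    MRR is Sum / N.

% Example: ?- hits_at_k([1,4,12,2], 3, H), mrr([1,4,12,2], M).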

Note that instead of using :- include(train)., you’d better use

:- ensure_loaded(train).

and prior run

swipl qlf compile train.pl

That saves nearly a second. Do the same trick for the other larger include files.

In general I’m not really surprised. PyClause is fast, but it is dedicated machinery for a single task written in C++, as opposed to a generic machine. As we have seen, you can also speed up a couple of times by writing the Prolog code optimally, and some more by exploiting concurrency.


I have added the file convert.pl that I used for the conversion from train.txt to train.pl.

I think the problem is in PyClause. As you can see, I also get different results for the metrics. I think it uses some unsound approximations.

PyClause describes their algorithm for “standard confidence” as follows:

[image: PyClause’s definition of standard confidence]

But this is just a conditional probability, frequentist expressed somehow:

\[
\frac{\#\{\, H\sigma \in KG \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}}
     {\#\{\, H\sigma \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}}
\;=\;
\frac{\#\bigl( KG \cap \{\, H\sigma \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}\bigr)}
     {\#\{\, H\sigma \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}}
\]

But this vague specification leaves a lot of room for distinct/2 & friends.

What does “approximation” mean for PyClause? I don’t know about this, nor how it could explain the discrepancy. In the worst case PyClause really has a bug, maybe a C++ glitch in multi-threading?

Prolog is a little safer for multi-threading: you don’t have buffer-area communication, which is sometimes a bad idea when data doesn’t get fetched immediately and is overwritten by new data, screwing everything up.

Thanks @jan.
Sorry, I forgot to use concurrent_maplist when finding the instantiations.
I also precompiled the input files to qlf.
Now I have

% /usr/bin/time swipl computeinsts.pl > inst.pl
        2,95 real         9,92 user         0,56 sys

which is much better, but still far from the 1,2 s for the whole process.
What do you think is the culprit? Do you think it is unification? In this case we just have constants, so maybe the full unification algorithm is overkill?

Looking at a recent paper from the PyClause group

I see this formula
[image: confidence formula from the paper]
so I think it is your Bayesian Brain I.
I don’t know what PyClause is doing.

When you, @friguzzi, talked about approximation in PyClause, did you mean that PyClause uses some sampling technique? This could also explain why it’s much faster, not only why the results are different. Would it be possible to implement such sampling also in Prolog? Didn’t PRISM by Sato (I don’t know which version) already offer some sampling in the past?

I was referring to metrics computation, not confidence computation.

This is what I know


from https://ceur-ws.org/Vol-3733/paper4.pdf

I understand that PyClause finds only the first 100 possible answers for a KGC query.
But sometimes I have found that PyClause misses some answers.
For example, on the WN18RR dataset, for triple 10551265 _synset_domain_topic_of 15259284 it finds 16 answers while according to my calculations there are 18. The missing ones are 06371413 and 09587565.
You can find the data and code used in my experiments at lp4kg/wn18rr at main · friguzzi/lp4kg · GitHub
In particular, the rule set is lp4kg/wn18rr/theory.txt at 73b81274e5b26864c5b2cb08ba1869718acaa9c6 · friguzzi/lp4kg · GitHub and the ranking file is lp4kg/wn18rr/ranking-file.txt at 73b81274e5b26864c5b2cb08ba1869718acaa9c6 · friguzzi/lp4kg · GitHub
The code used to produce it is at lp4kg/wn18rr/ranking.py at 73b81274e5b26864c5b2cb08ba1869718acaa9c6 · friguzzi/lp4kg · GitHub
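
The kind of check I mean can be written along these lines; predicts/3 here is a placeholder for “some rule in theory.txt derives this triple from the training triples”, so this is only a sketch of the query shape, not the code in the repo:

:- use_module(library(solution_sequences)).  % distinct/2

% Collect and count all distinct candidate answers for one side of a
% KGC query, e.g. (X, '_synset_domain_topic_of', '15259284').
answers_for(Rel, Obj, Answers, N) :-
    findall(X, distinct(X, predicts(X, Rel, Obj)), Answers),
    length(Answers, N).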

Have you filed a bug report with PyClause?