SWI Prolog and knowledge graphs

Hi all,
we have started working on applying SWI Prolog and Liftcover to knowledge graph completion; you can find a summary of our efforts at

In summary, we have not yet been able to beat AnyBURL in terms of the quality of the learned solutions.
But what surprised me was that SWI Prolog was way slower than PyClause in computing the quality metrics, which require finding all solutions to queries over simple rule bases. PyClause is a Python interface to AnyBURL and AMIE and uses a C++ library for metrics computation.

Fabrizio

Can you give some details? How do you represent the KG in Prolog and compute the metrics? Also, what does PyClause use? Note that if you use HDTs you can produce many statistics quite fast.

I created a minimal working example for one task: computing the confidence of a set of rules.
You can find the files at GitHub - friguzzi/lp4kg: Logic Programming for Knowledge Graphs
In folder wn18rr you can find the Wordnet dataset in Prolog and PyClause format.
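
For illustration, one possible Prolog encoding looks roughly like this; the exact encoding in the repo may differ, and the entity names below are placeholders:

% One fact per training triple; entities and relation names as atoms
% (WN18RR relation names start with '_' and therefore need quotes).
train(e1, '_hypernym', e2).
train(e2, '_synset_domain_topic_of', e3).
% ... one fact per line of train.txt

% A learned rule "H <- B1, B2" kept as a Head-Body pair over the same
% triple representation:
rule(train(X, '_synset_domain_topic_of', Y),
     ( train(X, '_hypernym', Z),
       train(Z, '_synset_domain_topic_of', Y) )).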
You can associate a confidence with each rule by running

% swipl confidence.pl

and

% python confidence.py

This is what I get on my machine:

% time python confidence.py 
python confidence.py  23,65s user 0,37s system 99% cpu 24,238 total
% time swipl confidence.pl
swipl confidence.pl  116,61s user 1,52s system 99% cpu 1:58,86 total

Luckily we can beat Python by changing

 maplist(confidence, R, RC, S)

into

 concurrent_maplist(confidence, R, RC, S)
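
Concretely, the pattern is roughly the following. This is only a sketch, assuming rules are available as rule(Head, Body) terms whose head and body are callable over the triple facts (as in the sketch above); it is not the actual code of confidence.pl:

:- use_module(library(thread)).     % concurrent_maplist/4
:- use_module(library(aggregate)).  % aggregate_all/3

% Illustrative rule confidence: solutions of (Body, Head) over the KG
% divided by solutions of Body alone.  Note that this counts duplicate
% groundings; making the counts distinct is discussed below.
confidence(rule(Head, Body), Conf, Support) :-
    aggregate_all(count, Body, BodyCount),
    aggregate_all(count, (Body, Head), Support),
    (   BodyCount =:= 0
    ->  Conf = 0
    ;   Conf is Support / BodyCount
    ).

% Score all rules in parallel; each rule is independent of the others,
% so the goals can safely run in separate threads.
confidences(Rules, Confs, Supports) :-
    concurrent_maplist(confidence, Rules, Confs, Supports).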

Your machine seems considerably faster than mine. The original takes 3m32. After the above change and loading train.pl as a .qlf file I get 20 seconds, a little more than a 10-fold speedup (AMD 3950X: 16 cores, 32 threads). That is a little disappointing; I would have expected a speedup closer to 20.

Needs more analysis to see how fair this comparison is and whether it can be improved significantly.


After using concurrent_maplist and once/1 it gets down to 11 s; not bad, thanks

Correction: I used one once/1 too many. With just one once/1 in the second query it takes 33.79 s, while without once/1 it takes 31.15 s
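
For context, the kind of place where a stray once/1 can creep into these counting queries is around the head check. A sketch (not the exact queries from confidence.pl), assuming triples are stored as unique facts:

% With an explicit once/1 around the head check:
support_once(Head, Body, Support) :-
    aggregate_all(count, (Body, once(Head)), Support).

% Without it: when Body fully instantiates Head, the fact lookup is
% deterministic anyway, so the extra once/1 only adds call overhead.
support_plain(Head, Body, Support) :-
    aggregate_all(count, (Body, Head), Support).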

Yes, I used your approach.
I redid the experiments and now with just concurrent_maplist I get 34,84 s while with your approach 32,29 s, thanks

I also tried replacing aggregate_all(count, Witness, Goal, Count) by the equivalent

aggregate_all(count, distinct(Witness, Goal), Count)

The funny thing is that this is slower, but it gives a much better speedup under concurrent execution, so in total it is about the same on my hardware. Both could be improved: distinct/2 by using a low-level implementation of add_nb_set/3, and aggregate_all/4 by figuring out what causes the poor speedup.
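
Side by side, the two formulations look like this on a concrete (purely illustrative) rule body:

:- use_module(library(aggregate)).           % aggregate_all/3,4
:- use_module(library(solution_sequences)).  % distinct/2

% aggregate_all/4 with an explicit witness:
body_count_v1(Count) :-
    aggregate_all(count, X-Y,
                  ( train(X, '_hypernym', Z),
                    train(Z, '_synset_domain_topic_of', Y) ),
                  Count).

% The equivalent count using distinct/2 inside aggregate_all/3:
body_count_v2(Count) :-
    aggregate_all(count,
                  distinct(X-Y,
                           ( train(X, '_hypernym', Z),
                             train(Z, '_synset_domain_topic_of', Y) )),
                  Count).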

Yes. Using your distinct/2 with aggregate_all/3 is about the same as using aggregate_all/4. As I also found, the aggregate_all/4 version is faster, but scales less well when executed concurrently. The difference is partly due to the efficiency of sort/2, where distinct/2 instead uses a non-backtrackable hash table implemented in pure Prolog. Actually, I’m surprised it is as fast as it is. Using distinct/2 gives near-perfect speedup under concurrency, while the aggregate_all/4 version does not (getting 10 times from 16 cores/32 threads). That means there is some interaction between the threads. I wonder where that comes from. findall/3 has been decoupled quite extensively as far as I know.

I updated confidence.pl to also print the counts as comments, please pull.
It seems they coincide with those of PyClause…

Let’s consider a more challenging task: the computation of the metrics.
This is what I get (all the code is in the repo)

Computation of Hits@K and MRR

With PyClause:

  1. computation of the scores of the possible answers
% /usr/bin/time python ranking.py 
...
Ranking file written to:  ranking-file.txt
      1,24 real         1,00 user         0,03 sys
  2. computation of the metrics
% /usr/bin/time python metrics.py 
>>> loading triple set from path test.txt ...
>>> ... read and indexed 3134 triples.
*** EVALUATION RESULTS ****
Num triples: 3134
hits@1  0.344288
hits@3  0.452138
hits@5  0.494257
hits@10 0.547543
MRR     0.412144

        0,25 real         0,21 user         0,02 sys

With Prolog:

  1. computation of all the possible answers
% cd rank
% /usr/bin/time swipl computeinsts.pl > inst.pl
        5,79 real         5,59 user         0,16 sys
  2. computation of the scores for the answers
% /usr/bin/time swipl inst2ranks.pl > rank.pl
      283,08 real      2778,12 user        32,34 sys
  3. computation of the metrics
% /usr/bin/time swipl hitsranks.pl
0.19495851946394385
0.40523292916400766
0.48851308232291
0.5449904275686024
0.32174862969445756
        2,02 real         1,94 user         0,03 sys
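
For reference, Hits@K is the fraction of test queries whose correct answer is ranked within the top K, and MRR is the mean of the reciprocal ranks. A minimal sketch of computing them from a list of ranks (illustrative only, not the code of hitsranks.pl):

:- use_module(library(aggregate)).  % aggregate_all/3
:- use_module(library(lists)).      % member/2

% hits_at_k(+Ranks, +K, -Hits): fraction of ranks that are =< K.
hits_at_k(Ranks, K, Hits) :-
    length(Ranks, N),
    aggregate_all(count, (member(R, Ranks), R =< K), InTopK),
    Hits is InTopK / N.

% mrr(+Ranks, -MRR): mean reciprocal rank, i.e. the average of 1/Rank.
mrr(Ranks, MRR) :-
    length(Ranks, N),
    aggregate_all(sum(1/R), member(R, Ranks), Sum),
    MRR is Sum / N.

% Example: ?- hits_at_k([1,4,12,2], 3, H), mrr([1,4,12,2], M).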

Note that instead of using :- include(train)., you’d better use

:- ensure_loaded(train).

and prior run

swipl qlf compile train.pl

That saves nearly a second. Do the same trick for the other larger include files.

In general I’m not really surprised. PyClause is fast, but it is dedicated machinery for a single task written in C++, as opposed to a generic machine. As we have seen, you can also speed up a couple of times by writing the Prolog code optimally, and some more by exploiting concurrency.


I have added the file convert.pl that I used for the conversion from train.txt to train.pl.

I think the problem is in PyClause. As you can see, I also get different results for the metrics. I think it uses some unsound approximations.

PyClause describes their algorithm for “standard confidence” as follows:

[image: PyClause’s definition of standard confidence]

But this is just a conditional probability, frequentist expressed somehow:

\[
\frac{\#\{\, H\sigma \in KG \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}}
     {\#\{\, H\sigma \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}}
\;=\;
\frac{\#\bigl( KG \cap \{\, H\sigma \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}\bigr)}
     {\#\{\, H\sigma \mid B_1\sigma,\dots,B_n\sigma \in KG \,\}}
\]

But this vague specification leaves a lot of room for distinct/2 & friends.

What does “approximation” mean for PyClause? I don’t know about this, nor how it could explain the discrepancy. In the worst case PyClause really has a bug, maybe a C++ glitch in multi-threading?

Prolog is a little safer for multi-threading: you don’t have buffer-area communication, which is sometimes a bad idea when data doesn’t get fetched immediately and is overwritten by new data, screwing everything up.

Thanks @jan.
Sorry, I forgot to use concurrent_maplist when finding the instantiations.
I also precompiled the input files to qlf.
Now I have

% /usr/bin/time swipl computeinsts.pl > inst.pl
        2,95 real         9,92 user         0,56 sys

which is much better, but still far from the 1,2 s for the whole process.
What do you think is the culprit? Do you think it is unification? In this case we just have constants, so maybe the full unification algorithm is overkill?

Looking at a recent paper from the PyClause group

I see this formula
[image: confidence formula from the paper]
so I think it is your Bayesian Brain I.
I don’t know what PyClause is doing.

When you, @friguzzi, talked about approximation in PyClause, did you mean that PyClause uses some sampling technique? This could also explain why it’s much faster, not only why the results are different. Would it be possible to implement such sampling also in Prolog? Didn’t PRISM by Sato (I don’t know which version) already offer some sampling in the past?

I was referring to metrics computation, not confidence computation.

This is what I know


from https://ceur-ws.org/Vol-3733/paper4.pdf

I understand that PyClause finds only the first 100 possible answers for a KGC query.
But sometimes I have found that PyClause misses some answers.
For example, on the WN18RR dataset, for triple 10551265 _synset_domain_topic_of 15259284 it finds 16 answers while according to my calculations there are 18. The missing ones are 06371413 and 09587565.
You can find the data and code used in my experiments at lp4kg/wn18rr at main · friguzzi/lp4kg · GitHub
In particular, the rule set is lp4kg/wn18rr/theory.txt at 73b81274e5b26864c5b2cb08ba1869718acaa9c6 · friguzzi/lp4kg · GitHub and the ranking file is lp4kg/wn18rr/ranking-file.txt at 73b81274e5b26864c5b2cb08ba1869718acaa9c6 · friguzzi/lp4kg · GitHub
The code used to produce it is at lp4kg/wn18rr/ranking.py at 73b81274e5b26864c5b2cb08ba1869718acaa9c6 · friguzzi/lp4kg · GitHub
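
The kind of check I mean can be written along these lines; predicts/3 here is a placeholder for “some rule in theory.txt derives this triple from the training triples”, so this is only a sketch of the query shape, not the code in the repo:

:- use_module(library(solution_sequences)).  % distinct/2

% Collect and count all distinct candidate answers for one side of a
% KGC query, e.g. (X, '_synset_domain_topic_of', '15259284').
answers_for(Rel, Obj, Answers, N) :-
    findall(X, distinct(X, predicts(X, Rel, Obj)), Answers),
    length(Answers, N).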

Have you filed a bug report with PyClause?