How to detect duplicate facts in the knowledge base?

Hi,
By accident, I found that you can assert the exact same fact (same functor and same arguments) as many times as you want.

So, my question is: how can I detect and remove the duplicate clauses? If the clauses are identical, it doesn't look like retract/1 can help.

insect(fly).
insect(bee).
insect(ant).
insect(fly).
insect(fly).
insect(fly).

This is a partial solution to your question, though I doubt it is what you intended.

% ?- find_multiples(S, S0, Dups).
%@ S = [fly, bee, ant, fly, fly, fly],
%@ S0 = [ant, bee, fly],
%@ Dups = [fly, fly, fly] 

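% find_multiples(-S, -S0, -Dups): S holds every insect/1 value, S0 the
% distinct values, and Dups what is left of S after removing one
% occurrence of each distinct value (i.e. the surplus copies).
find_multiples(S, S0, Dups):- findall(X, insect(X), S),
	sort(S, S0),
	foldl(select, S0, S, Dups).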

insect(fly).
insect(bee).
insect(ant).
insect(fly).
insect(fly).
insect(fly).
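If you only need to know which facts are duplicated and how often, a variant of the same idea sorts without removing duplicates and counts the runs. This is a sketch assuming SWI-Prolog, where msort/2 is built in and clumped/2 comes from library(lists); duplicate_counts/1 is a hypothetical name, not part of any library.

```prolog
:- use_module(library(lists)).   % clumped/2
:- use_module(library(apply)).   % include/3
:- use_module(library(yall)).    % [X]>>Goal lambdas

:- dynamic insect/1.

insect(fly).
insect(bee).
insect(ant).
insect(fly).
insect(fly).
insect(fly).

% duplicate_counts(-Pairs): Pairs is a list of Value-Count for every
% insect/1 value that occurs more than once.
duplicate_counts(Pairs) :-
    findall(X, insect(X), Xs),
    msort(Xs, Sorted),                      % sort but keep duplicates
    clumped(Sorted, Counts),                % e.g. [ant-1, bee-1, fly-4]
    include([_-N]>>(N > 1), Counts, Pairs). % keep only repeated values
```

With the facts above, `?- duplicate_counts(P).` gives `P = [fly-4]`.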

IMO, a better approach is to avoid duplicate facts instead of detecting and removing them afterward.

The following are some possible options:

  1. When asserting a new fact, check and throw it away if it’s a duplicate.
  2. Maintain a unique key for each fact.

That would make your life easier.
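Option 2 can be sketched by treating one argument as the key and replacing, rather than adding, any fact with the same key. The sketch assumes a two-argument insect(CommonName, ScientificName) table keyed on CommonName; set_insect/2 is a hypothetical name, not a library predicate.

```prolog
:- use_module(library(error)).   % must_be/2

:- dynamic insect/2.

% set_insect(+CommonName, +ScientificName): store at most one fact per
% CommonName. Any existing fact with that key is removed first, so
% repeated calls can never create duplicates ("last write wins").
set_insect(CommonName, ScientificName) :-
    must_be(atom, CommonName),              % the key must be bound
    retractall(insect(CommonName, _)),
    assertz(insect(CommonName, ScientificName)).
```

After `set_insect(ant, formicidae), set_insect(ant, formicidae), set_insect(fly, diptera)`, the query `findall(C-S, insect(C, S), L)` gives `L = [ant-formicidae, fly-diptera]`. The design choice here is that a new value for an existing key overwrites the old one instead of being ignored.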


If you want to detect and remove duplicates:

?- forall((clause(insect(A), Body, Ref), clause(insect(A), Body, Ref2), Ref @< Ref2), format('~q  ~w ~w~n', [insect(A), Ref, Ref2])).
insect(fly)  <clause>(0x566ed049e8a0) <clause>(0x566ed05cede0)
insect(fly)  <clause>(0x566ed049e8a0) <clause>(0x566ed05cecc0)
insect(fly)  <clause>(0x566ed049e8a0) <clause>(0x566ed05ceba0)
insect(fly)  <clause>(0x566ed05cecc0) <clause>(0x566ed05cede0)
insect(fly)  <clause>(0x566ed05ceba0) <clause>(0x566ed05cede0)
insect(fly)  <clause>(0x566ed05ceba0) <clause>(0x566ed05cecc0)

You can use erase/1 to remove the duplicates.
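Putting clause/3 and erase/1 together, one way to remove the duplicates is to walk the clauses in database order, remember the heads already kept, and erase every later clause whose head is a variant (=@=) of an earlier one. This is a sketch for facts only (clauses whose body is `true`); remove_duplicate_clauses/1 and erase_repeats/2 are hypothetical names.

```prolog
:- dynamic insect/1.

insect(fly).
insect(bee).
insect(ant).
insect(fly).
insect(fly).
insect(fly).

% remove_duplicate_clauses(+Name/Arity): keep the first copy of every
% fact of Name/Arity and erase the later identical copies.
remove_duplicate_clauses(Name/Arity) :-
    functor(Head, Name, Arity),                        % most general head
    findall(Head-Ref, clause(Head, true, Ref), Pairs), % database order
    erase_repeats(Pairs, []).

% erase_repeats(+Pairs, +Seen): Seen holds the heads kept so far.
erase_repeats([], _).
erase_repeats([Head-Ref|Rest], Seen) :-
    (   member(Kept, Seen), Kept =@= Head   % an identical fact was kept
    ->  erase(Ref),
        Seen1 = Seen
    ;   Seen1 = [Head|Seen]
    ),
    erase_repeats(Rest, Seen1).
```

After `?- remove_duplicate_clauses(insect/1).` the remaining facts are insect(fly), insect(bee) and insect(ant), in that order. The Seen list makes this quadratic in the number of facts, which is fine for small tables.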


Quick answer:

You want to check for the fact or assert the fact. The "or" in that sentence is a logical or, which in Prolog is the ; operator.

Here is an example

add_imports(Module,Import) :-
    (
        imports(Module,Import), !
    ;
        assertz(imports(Module,Import))
    ).

This is the first part of the statement, which checks whether the fact already exists:

imports(Module,Import)

This is the second part of the statement that asserts the fact if the first part of the or fails.

assertz(imports(Module,Import))

Note the use of logical or (;) and the cut (!). The cut is used to avoid leaving behind unnecessary choice points.
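The cut also matters for correctness, not only for choice points: without it, backtracking into the predicate after imports/2 has succeeded runs the assertz/1 branch as well and adds exactly the duplicate the predicate was meant to prevent. A small sketch for comparison; add_imports_nocut/2 and count_imports/3 are hypothetical helpers.

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3

:- dynamic imports/2.

add_imports(Module, Import) :-
    (   imports(Module, Import), !
    ;   assertz(imports(Module, Import))
    ).

% Same as add_imports/2 but WITHOUT the cut (deliberately broken).
add_imports_nocut(Module, Import) :-
    (   imports(Module, Import)
    ;   assertz(imports(Module, Import))
    ).

% count_imports(+Module, +Import, -N): stored copies of imports(Module, Import).
count_imports(Module, Import, N) :-
    aggregate_all(count, imports(Module, Import), N).
```

`?- add_imports(m, lists), forall(add_imports(m, lists), true), count_imports(m, lists, N).` gives `N = 1`, while a following `?- forall(add_imports_nocut(m, lists), true), count_imports(m, lists, N).` gives `N = 2`, because forall/2 backtracks into the second branch of the disjunction.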


The example is taken from real working code in the post

Basic persistency

which in the example is

add_imports(Module,Import) :-
    (
        imports(Module,Import), !
    ;
        assert_imports(Module,Import)
    ).

but note that library(persistency) generates the predicate assert_imports/2 from the persistent/1 declaration, which is why the library(persistency) version of the code differs.

But the cut has a larger scope than needed: it cuts the whole add_imports/2 clause, not just the disjunction. What is wrong with this?

add_imports(Module,Import) :-
    (   imports(Module,Import)
    ->  true
    ;   assert_imports(Module,Import)
    ).

Nothing.


In the few years I have used it my way (; with a cut, instead of the if-then-else -> with ;) together with library(persistency), it has worked as needed.


On a side note, it might be useful for library(persistency) to generate another predicate that asserts only unique facts. That would remove about a third to half of the lines of code I write for large modules that use library(persistency).

I know, pull requests are desired. :slightly_smiling_face:

The following approach implements @peter.ludemann's idea.

:- dynamic insect/1.

insect(fly).
insect(bee).
insect(ant).
insect(fly).
insect(fly).
insect(fly).

?- setof(Ref2, Ref^A^Body^(clause(insect(A), Body, Ref), clause(insect(A), Body, Ref2), Ref @< Ref2), RList),
   maplist([R] >> erase(R), RList), listing(insect(_)).
:- dynamic insect/1.

insect(fly).
insect(bee).
insect(ant).

RList = [<clause>(00000000026AD930),<clause>(00000000026AE350),<clause>(00000000026AE3B0)].

Here the Ref plays the role of the unique key for each fact.

Ohh…
I see.
Apparently a careful combination of clause/3 with duplicate checking is all that is needed…

Thank you very much all for your insights…

or this?

assert_insect( Insect_atom ):-  insect( Insect_atom ),!. %  already asserted
assert_insect( Insect_atom ):- assert( insect( Insect_atom ) ).

That last one from dspro looks incredibly simple… but it seems to work fine.

Worth studying it…

I’m not a fan of cuts, so I’d write it:

assert_insect(Insect) :-
    assertion(ground(Insect)),
    (  insect(Insect)
    -> true
    ;  assertz(insect(Insect))
    ).

The assertion/1 call is there because this code could give unexpected results if Insect is a term that contains a variable.

Another way of doing this is to add the table/1 directive:

?- dynamic insect/1.
true.

?- table insect/1.
true.

?- assertz(insect(fly)).
true.

?- assertz(insect(flea)).
true.

?- assertz(insect(fly)).
true.

?- bagof(Insect, insect(Insect), Insects).
Insects = [flea, fly].

If the table/1 directive is removed, the bagof/3 query gets Insects = [fly, flea, fly].

PS: assert/1 is deprecated and assertz/1 is recommended instead.


Ok,
I guess the cut ruins the non-determinism of the predicate.
But yes, peter.ludemann’s version with if-then-else looks more elegant…

However, the behavior of assertion/1 will require some more experiments before I clearly understand it…
The tabling solution is very clear, although I'm not sure the memory cost makes it worthwhile…

Suppose that instead of insect/1, you had insect(CommonName, ScientificName), and there’s no assertion((ground(CommonName), ground(ScientificName))):

assert_insect(CommonName, ScientificName) :-
    (   insect(CommonName, ScientificName)
    ->  true
    ;   assertz(insect(CommonName, ScientificName))
    ).

?- assert_insect(cockroach, blattodea).
?- assert_insect(ant, _).
?- assert_insect(ant, formicidae).
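With the unification-based check above, the third call succeeds without asserting anything: insect(ant, formicidae) unifies with the stored insect(ant, _), so the database ends up holding only insect(cockroach, blattodea) and insect(ant, _). If the intent is to reject only facts that are identical up to variable renaming, one option (a swapped-in technique, not code from this thread) is a variant check with =@=. has_variant_fact/1 and assert_insect_v/2 are hypothetical names.

```prolog
:- dynamic insect/2.

% has_variant_fact(+Fact): some stored fact is a variant of Fact, i.e. the
% same term up to renaming of variables (mere unification is not enough).
has_variant_fact(Fact) :-
    functor(Fact, Name, Arity),
    functor(Head, Name, Arity),     % most general head, e.g. insect(_, _)
    clause(Head, true),
    Head =@= Fact.

% assert_insect_v(?CommonName, ?ScientificName): assert unless a variant
% of insect(CommonName, ScientificName) is already stored.
assert_insect_v(CommonName, ScientificName) :-
    Fact = insect(CommonName, ScientificName),
    (   has_variant_fact(Fact)
    ->  true
    ;   assertz(Fact)
    ).
```

Running the same three calls with assert_insect_v/2, plus a repeated `assert_insect_v(ant, _)`, stores three facts: insect(cockroach, blattodea), insect(ant, _) and insect(ant, formicidae). Whether insect(ant, _) should block insect(ant, formicidae) depends on what a variable in a stored fact is supposed to mean; if facts are meant to be ground, the assertion((ground(...))) guard is the simpler fix.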