Update arg in the dynamic fact base

Hello,

Since today I am full of great new ideas, here is another one I have been contemplating :slight_smile:

Suppose we have a dynamic fact such as the one below, and we want to update the list to include another element, e:

:- dynamic adjacent/2.

adjacent(a, [b, c, d]).

Today one would do something like this:

update_adjacent(X, Added) :-
   adjacent(X, Current_List),
   append(Current_List, Added, New_List),
   retract(adjacent(X, Current_List)),
   assert(adjacent(X, New_List)).

Could one envision a dynamic_set_arg/3 operator that takes a position, the fact to update, and a new argument, and updates the internal representation with the new argument, like so:

update_adjacent(X, Added) :-
   adjacent(X, Current_List),
   append(Current_List, Added, New_List),
   dynamic_set_arg(2, adjacent(X, Current_List), New_List).

Would it be worthwhile, i.e. save time both by needing only one operation instead of two (retract/assert) and by making indexing simpler for updates?
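For concreteness, here is a hypothetical user-level emulation of such a dynamic_set_arg/3 (it is not a built-in). It still retracts and re-asserts the clause underneath, so it saves nothing; it only pins down the intended interface, and the helper calls are just one way to write it:

:- use_module(library(lists)).   % nth1/4

% Hypothetical emulation: replace argument Pos of a matching dynamic fact.
dynamic_set_arg(Pos, Fact, NewArg) :-
   retract(Fact),
   Fact =.. [Name|Args],
   nth1(Pos, Args, _OldArg, Rest),     % remove the old argument
   nth1(Pos, NewArgs, NewArg, Rest),   % insert the new one at position Pos
   NewFact =.. [Name|NewArgs],
   assertz(NewFact).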

Edit:

It would be in line with, say, SQL Update:

https://www.w3schools.com/sql/sql_update.asp

Dan

The more natural way to model “adjacent” in Prolog would be:

:- dynamic adjacent/2.

adjacent(a, b).
adjacent(a, c).
adjacent(a, d).

Do you see how things got easier already, both for maintaining the adjacency matrix and for querying?

If you need to efficiently manipulate a list, look at the code that Jan kindly shared in this post.

Hi Boris,

Thank you.

Yes, I am aware of this. My aim is performance when looking up adjacent nodes, and one of the best ways to achieve that is to use a representation that avoids the creation of choice points when looking up the adjacent nodes of a node.

The “normalized” representation you suggest requires choice points, whereas the “list”-based representation does not.

Naturally, I will have to deal with the less convenient representation; for example, adding a node will require a retract and an assert rather than just an assert. And other kinds of lookup might suggest having a normalized representation as well (perhaps via a rule) …

But, then again, I am still learning what might work best, and perhaps the idea of using an un-normalized representation isn't a good one.

Dan

To collect all nodes adjacent to a, you would write:

findall(X, adjacent(a, X), Xs)

Right, but with my representation I could write:

adjacent(X, Arg_List).

Which, I hope, would be faster access.

It's the tradeoff between a more convenient representation, which also performs better during assert, vs. faster lookup. And, in my application, lookup happens much more often than assert, hence I am considering this.

I guess it's the eternal question of normalization …

From what I read, RDF has this problem: it aims at such fine-grained normalization, for the sake of linking anything to anything, that query performance takes a significant hit.

Edit:

As I mentioned in another post, perhaps with tabling one can have one's cake and eat it too: one can go with the adjacent/2 predicate as you have shown, while adding a tabled predicate for lookup, like the one I suggested.

Unfortunately that is not going to work, for many different reasons. To name a few:

  • retract/1 can backtrack.
  • We need to respect the logical update semantics.
  • Other threads may be executing the clause that is being updated.
  • The compiled code length will change, so you anyway need to add a new clause and remove the old one.

@boris’s representation is clearly simpler. Indeed, getting a list of neighbors is less efficient. Possibly you can avoid materializing a list? I’ve seen enough people using findall/3 to assemble a list and subsequently traverse it using member/2 rather than just querying the relation directly … Another option could be to table the adjacent “Node → neighbors” relation? Could use too much memory.
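For illustration, here is a minimal sketch of that tabled “Node → neighbors” idea on top of the normalized representation (neighbors/2 is a made-up name for this sketch):

:- dynamic adjacent/2.
:- table neighbors/2.

adjacent(a, b).
adjacent(a, c).
adjacent(a, d).

% The first call materializes the list; later calls answer from the table.
% Note: after changing adjacent/2 the table must be abolished (see below).
neighbors(Node, Neighbors) :-
   findall(N, adjacent(Node, N), Neighbors).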

During graph traversal the use of those lists of neighbors is short lived, so collecting them for subsequent use would require going beyond the specific query, i.e. some sort of memoing or tabling.

I guess there is no way around it: either it's normalized and inefficient, or a non-normalized tabled relation duplicates (or even triples, given that it's a bi-directional graph) memory requirements over time.

Or, instead, one uses the adjacency list in the predicate, which, however, would duplicate storage as well, since I would need adjacencies for both directions, which comes for free in @boris's representation.

But, I guess, if instead of a list I used a functor, then having two adjacency (array-based) lists of identifiers per node, one outgoing and one incoming, would not be much different from programming this in C, which would also require keeping track of the outgoing and incoming links in some kind of list, leading to some “duplication”.
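As a sketch of that “functor instead of a list” idea, one could store the outgoing and incoming neighbors of each node as compound terms and fetch individual entries with arg/3 in constant time (node_links/3 and the other names are made up):

:- dynamic node_links/3.              % node_links(Node, OutTerm, InTerm)

add_node(Node, Outgoing, Incoming) :-
   OutTerm =.. [out|Outgoing],        % e.g. out(b, c, d)
   InTerm  =.. [in|Incoming],         % e.g. in(x, y)
   assertz(node_links(Node, OutTerm, InTerm)).

% Constant-time access to the I-th outgoing neighbor of a node.
nth_outgoing(Node, I, Neighbor) :-
   node_links(Node, OutTerm, _),
   arg(I, OutTerm, Neighbor).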

There is a third approach, suggested by Markus, which is to store the graph as a term on the heap and, somehow, keep it around; this would result in a true pointer-based solution, I guess.

But I can't get my head around how such a solution would work without “locking” the whole graph for each API call, since the whole graph has to be passed, relationally updated, and threaded through each new API call.

Dan

You can use global variables via b_setval/2. A stack-based solution limits access to a single thread though. It also tends to make garbage collection slower. From what I know about your scenario, this won't be a step forward. I would consider a plain graph representation as @boris suggests. You can also cache the adjacency lists, either as a table or using dynamic facts. And yes, some queries will be faster using the adjacency list, but I'm sure quite a few will be considerably faster with the plain representation.
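For illustration only, a minimal single-threaded sketch of the global-variable route mentioned above, storing an assoc-based graph term with b_setval/2 and reading it back with b_getval/2 (with_graph/1 and the graph contents are made up):

:- use_module(library(assoc)).

% Build a small graph as an assoc from node to neighbor list, store it in a
% backtrackable global variable and run Goal, which may look it up.
with_graph(Goal) :-
   list_to_assoc([a-[b,c,d], b-[a], c-[a], d-[a]], Graph),
   b_setval(graph, Graph),
   call(Goal).

graph_neighbors(Node, Neighbors) :-
   b_getval(graph, Graph),
   get_assoc(Node, Graph, Neighbors).

% ?- with_graph(graph_neighbors(a, Ns)).
% Ns = [b, c, d].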

Hi Jan,

Thank you.

Markus, generally speaking, seems to passionately advocate putting the graph on the heap, and also making use of attributed variables to store things more locally for per-node and per-link access, I guess instead of my current use of an assoc-based mapping.

But, I agree, I will keep things in the fact base for this round of development, and will likely update the assoc mapping with your suggestion to use an arg array (“heat map”).

@boris's solution is the one I currently have, as part of the general graph representation of links.

To test assumptions, I will try to cache the adjacency mapping, first with tabling, then perhaps with memoing. It happens that this lookup occurs a lot in a time-critical path, so, I think, any cycle I can save would quickly add up.
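For reference, a hypothetical memoing variant that uses a dynamic cache fact instead of tabling (cached_neighbors/2 and the helper names are made up):

:- dynamic cached_neighbors/2.

% Compute the neighbor list once per node, assert it and reuse it afterwards.
memo_neighbors(Node, Ns) :-
   (   cached_neighbors(Node, Ns)
   ->  true
   ;   findall(N, adjacent(Node, N), Ns),
       assertz(cached_neighbors(Node, Ns))
   ).

% The cache must be cleared whenever adjacent/2 changes.
clear_neighbor_cache :-
   retractall(cached_neighbors(_, _)).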

If tabling really helps, then I will also need to think about memory utilization, but perhaps not in this round either.

Dan

Every one of us comes from a particular background and experience and has a certain angle on things that is sometimes not easy to just “give forward” to another person.

This is why I firmly believe that the only way to progress is to acquire first-hand experience by trying things out, using a systematic approach and collecting and analyzing results. But again, this firm belief of mine is a result of my own specific experience with experimental research.

But please don’t get me started on “publishing” results or (the horror!) trying to use other people’s results :wink: To quote the classics: “You’re entering a world of pain!”

PS: my own hands are dirty too, I contributed to the problem and not to the solution.


If you use tables only for caching you can delete them periodically. See abolish_all_tables/0 and some more restrictive alternatives. At some point I might maintain some access statistics and allow the user to flag a table for automatic deletion to reclaim space. I.e., the system can automatically decide to destroy big tables that are rarely used.
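A small sketch of such periodic clearing, assuming the tables are used purely as a cache (the interval and the predicate names are arbitrary):

:- use_module(library(time)).          % alarm/4

% Drop all tables every Interval seconds; entries are recomputed on the
% next lookup.
clear_tables_periodically(Interval) :-
   alarm(Interval, clear_and_reschedule(Interval), _Id, [remove(true)]).

clear_and_reschedule(Interval) :-
   abolish_all_tables,
   clear_tables_periodically(Interval).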

Perhaps something similar to the staleness-detection tactics used in caches could apply here. I read up on them a while ago.

Also, pinpointed updates would help, e.g. when an adjacent node (and its link) is added from outside, if that is not already implemented. I have less use for deletion, but I guess this could happen as well in the general case.

How does tabling play with transactions, and in particular, rollback?

Dan

I agree. However, sometimes the cost of trying things out is too high, and you try to get first-cut assessments from other people's experiences, which then need to be taken with a grain of salt.

For example, I am wondering which option would perform better:

  • Prolog implementation via fact base
  • Prolog implementation on the heap
  • Various optimization techniques for the above in Prolog, including potential additions to the VM
  • C/C++ implementation of a (directed) graph with pointers (as a foreign predicate)
  • C/C++ implementation via some array-based adjacency structure, plus a graceful approach to growing memory as the graph grows (also as foreign predicates)
  • Rust-based implementation of the above, although Rust doesn't deal well with graphs and pointers and would require unsafe code

Why Rust? Because it's an up-and-coming language with built-in memory safety.

I can't try all of these out, so I need to evaluate them in principle and then choose one or, perhaps, two for deeper analysis.

I fully agree. SWI-Prolog, like most high level declarative systems, is a highly dynamic environment and it is typically hard to predict the impact of the various optimizations that may or may not be used (depending on heuristics). Note that some of these heuristics may suit your workload poorly and some may even be plain wrong (well, not optimal). Sometimes the profiler gives hints on things that may go wrong, such as a large percentage of the time spent in one of the garbage collectors or an unexpected amount of time spent on some simple predicate indicating indexing issues.

At this moment it does not (except for a few very specific scenarios). That is work in progress; I'm not sure where it is on the current priority list. For one project I hacked a little on top of transaction_updates/1 to automatically invalidate (incremental) tables. That is fairly trivial.

I guess I might then take out transactions for this round, as I develop a PoC, or try memoing first.

Invalidating tables is not that hard … And no matter how you change the data on which the tables are based, you need to deal with consistency. This can be done by destroying dependent tables or by using incremental or monotonic tabling to automate re-evaluation when needed.
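For completeness, a minimal sketch of the incremental-tabling route on a graph relation: declaring both the dynamic data and the table as incremental makes the system invalidate and re-evaluate the affected tables when adjacent/2 changes (reachable/2 is the usual textbook example, not code from this thread):

:- dynamic adjacent/2 as incremental.
:- table reachable/2 as incremental.

adjacent(a, b).
adjacent(b, c).

reachable(X, Y) :- adjacent(X, Y).
reachable(X, Z) :- adjacent(X, Y), reachable(Y, Z).

% After assertz(adjacent(c, d)) or retract(adjacent(b, c)), the affected
% reachable/2 tables are re-evaluated on the next call.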