I run the profiler from time to time, and it points me to two things: first, property graph traversal, including property lookup (currently an assoc), is high up in usage; second, there is a very long tail.
And with the more recent introduction of transaction/1, its overhead has also started to show up.
I hope that with a better understanding of what choice points are and how to control them via code/argument structuring, indexing and cuts, I will be able to improve things.
Here are the strategies I am thinking about:
One move of a cut from client code into the called predicate, which happened to be heavily used, had a dramatic effect the other month; hopefully other such opportunities are lurking.
re: property graph representation
I need to scrutinize the graph representation. Currently it's a straightforward one, node/1 and link/3 (each link also carrying an identifier). This is possibly not the most performant choice, since it creates a choice point for each outgoing (and incoming) link accessed while traversing the graph.
An adjacency-list-based graph representation might be a better choice, since it would access all adjacencies at once, binding them into a variable, without the need for choice points, and with reduced recurring access to the fact base. It is one option I am considering: something like node/1, node_adjacency/2, e.g. (each pair also carrying an identifier for the link):
node_adjacency(a, items(l1-a, l2-b, l3-c)).
(Note: if it's not a list but a functor with arguments, as above, I would need some predicates to access the arguments like a list. The assumption here is that arg/3 access to a compound-term "array" would be faster than walking a cons list.)
A key drawback is that I would need to maintain two such adjacency sets, one for outgoing and one for incoming links, since it's a directed graph.
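A minimal sketch of how such a representation might look and be traversed, assuming hypothetical predicate names (node_adjacency_out/2, node_adjacency_in/2, outgoing/3 are mine, not existing code):

```prolog
% Hypothetical adjacency facts; the items(...) compound holds
% LinkId-Neighbour pairs, readable in O(1) per slot via arg/3.
node_adjacency_out(a, items(l1-b, l2-c, l3-d)).
node_adjacency_in(b,  items(l1-a)).

% Enumerate outgoing links of a node: the fact base is touched
% exactly once per node, and only the arg/3 walk backtracks.
outgoing(Node, LinkId, Neighbour) :-
    node_adjacency_out(Node, Items),
    functor(Items, _, Arity),
    between(1, Arity, I),
    arg(I, Items, LinkId-Neighbour).
```

Collecting all neighbours needs no backtracking at all: a single call to node_adjacency_out/2 binds the whole compound.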
Perhaps tabling could offer a happy medium: keep the normalized representation (node/1, link/3) and add a tabled rule that turns it into the node_adjacency/2 representation. It would still require more memory and a bit more compute, but at least only the graph areas that are actually traversed would take up extra memory.
It would be important to update tabled adjacencies when new links are asserted; best if this can be done in a pinpointed way rather than invalidating the whole table.
Alternatively, I could roll my own memoing for this purpose and ensure that the memoed adjacencies are updated when nodes and links are added.
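A sketch of the tabled variant, assuming SWI-Prolog's incremental tabling (the link/3 facts here are made-up sample data):

```prolog
% Declare the source facts and the derived view as incremental,
% so that changes to link/3 invalidate only dependent answers
% of node_adjacency/2, not the whole table.
:- dynamic link/3 as incremental.
:- table node_adjacency/2 as incremental.

link(l1, a, b).
link(l2, a, c).

% Derive the adjacency view from the normalized link/3 facts.
node_adjacency(Node, Links) :-
    findall(Id-To, link(Id, Node, To), Links).
```

After `assertz(link(l3, a, d))`, only the answers depending on links of `a` should need recomputation on the next call; whether the dependency tracking is fine-grained enough here is exactly what I would want to measure.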
re: graph properties
Perhaps assoc is not the best choice for storing properties over nodes and links, and there is a better way to associate them with properties. Jan suggested an implementation based on arg/3 (essentially array-based, with node and link ids as indexes; so instead of an assoc (AVL tree) search, it's an indexed lookup).
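A minimal sketch of that idea, assuming dense integer ids; the names make_prop_array/3, set_prop/2 and node_prop/3 are hypothetical:

```prolog
% Dense property store: one compound term whose I-th argument
% holds the property value for the node with id I. Unset slots
% remain unbound variables.
make_prop_array(Pairs, MaxId, Props) :-
    functor(Props, props, MaxId),
    maplist(set_prop(Props), Pairs).

set_prop(Props, Id-Value) :-
    arg(Id, Props, Value).

% Lookup is a single arg/3 call, O(1), versus the O(log n)
% comparisons of an assoc.
node_prop(Props, NodeId, Value) :-
    arg(NodeId, Props, Value).
```

E.g. `make_prop_array([1-red, 3-blue], 4, P), node_prop(P, 3, V)` binds `V = blue`.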
Some properties on the graph could eventually be turned into bitmaps (essentially hard-coding them), with bitmap tests in some code areas; perhaps there is some benefit there as well. This could be one (research) outcome of my work: identifying such properties and elevating them into the "language", as it were.
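For a boolean property keyed by integer node id, the bitmap could simply be a (big) integer; a sketch with hypothetical names:

```prolog
% Build a bitmap from a list of node ids: bit Id is set iff the
% node has the property.
bitmap_from_ids(Ids, Map) :-
    foldl([Id, M0, M]>>(M is M0 \/ (1 << Id)), Ids, 0, Map).

% Membership test is one shift, one AND and one comparison.
has_property(Map, Id) :-
    0 =\= Map /\ (1 << Id).
```

E.g. `bitmap_from_ids([1,3], M), has_property(M, 3)` succeeds while `has_property(M, 2)` fails.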
Also, there is some potential for parallelism that I noticed and haven't yet fully exploited. I'd also like to try the spawn library for this.
re: first argument indexing
I will also scrutinize maplist, which is used quite a bit to filter results of queries, first for potentially missed first-argument indexing and then for parallelism potential.
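As a reminder to myself of what "missed first-argument indexing" looks like, a toy example (link_kind/2 is made up):

```prolog
% With distinct atoms in the first argument, a call such as
% link_kind(l2, K) hits clause indexing and succeeds
% deterministically, leaving no choice point behind.
link_kind(l1, contains).
link_kind(l2, refers).
link_kind(l3, contains).

% If the discriminating constant sat in the second argument
% instead, the call would have to rely on SWI-Prolog's deep
% (JIT) indexing, or scan clauses and leave choice points.
```

In SWI-Prolog, jiti_list/0 reports which indexes the just-in-time indexer actually built, which should make such misses visible.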
re: term expansion to remove call indirection and “simulate” constants
I have also been experimenting a bit with term expansion, mainly to remove call indirection due to the modular architecture (g1 :- module1:g2 and g2 :- module2:g3).
And to simulate constants, instead of retrieving them through a predicate (e.g. a call to constant_1(A) translated into A=1 in the compiled code).
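For the constants case, goal_expansion/2 is enough; a sketch (constant_1/1 and demo/1 are illustrative names, and the expansion rule must be visible to the module being compiled):

```prolog
% Compile-time constant folding: every occurrence of the goal
% constant_1(A) is rewritten into a plain unification, so no
% predicate call is made at run time.
goal_expansion(constant_1(A), A = 1).

% After expansion, a clause such as
%   demo(X) :- constant_1(X).
% is compiled as if it were written
%   demo(X) :- X = 1.
```

The module-qualified indirection (g1 :- module1:g2) could be collapsed the same way, by expanding the wrapper goal into the target goal in place.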
The current library and call structure causes some guards to be executed more than once. I will need to clean this up, essentially by having guarded calls intended for the end user and unguarded calls intended for internal (re)use.
However, I also want strict rules for calls, where failure should most often lead to an exception; I will likely relegate this to debugging time via conditional compilation, to ensure performance.
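One way to sketch the guarded/unguarded split with conditional compilation, using SWI-Prolog's :- if/else/endif directives (the debug_guards flag, traverse/2 and traverse_/2 are all made up for illustration):

```prolog
% Guarded entry point for end users; the type check is compiled
% in only when the (hypothetical) debug_guards flag is set at
% load time, so release builds pay nothing for it.
:- if(current_prolog_flag(debug_guards, true)).
traverse(Node, Path) :-
    must_be(atom, Node),      % throws type_error instead of failing
    traverse_(Node, Path).
:- else.
traverse(Node, Path) :-
    traverse_(Node, Path).
:- endif.

% Unguarded worker for internal (re)use.
traverse_(Node, [Node]).      % stand-in for the real traversal
```

Internal callers invoke traverse_/2 directly, so guards run at most once per external call.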
Finally, I need to investigate tabling and the reuse of computed results: when a recomputation is fresh and when it might become stale (disregarding memory footprint at this stage).
I would be glad to hear about additional optimization tactics …