Advice on implementing Dijkstra's algorithm

Thanks for this work. Interesting use case. Can you also share the other implementations and test data? Dijkstra’s algorithm probably fits Prolog poorly. I had similar experiences with computing the least edit distance between two strings. Anything that requires an updating array is hard. The fastest ways I can think of in not-so-pure Prolog are:

  • Create a map (predicate) from the node identities to small (nearly) consecutive integers and then use a compound as an array. nb_setarg/3 is pretty fast, although we should reconsider some of the implementation of foreign predicates, possibly adding a shortcut for foreign predicates that never need GC to kick in.
  • If the node IDs are small integers or atoms, create a dict holding all nodes. Now you can use nb_set_dict/3. This is O(log(N)), but with pretty small constants.
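
A minimal sketch of both suggestions (all names here are illustrative, not an existing API):

```prolog
% Option 1: a compound term as a mutable array over small integer node IDs.
demo_array(V) :-
    compound_name_arity(Dist, dist, 5),   % dist(_,_,_,_,_): 5 fresh cells
    nb_setarg(3, Dist, 42),               % destructive, non-backtrackable update
    arg(3, Dist, V).                      % V == 42

% Option 2: a dict keyed by node ID, updated in place with nb_set_dict/3.
demo_dict(V) :-
    Dist = dist{a: inf, b: inf, c: inf},
    nb_set_dict(b, Dist, 42),             % O(log N), but small constants
    get_dict(b, Dist, V).                 % V == 42
```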

Using destructive updates is IMO unavoidable for such algorithms. Without them you get O(log(N)) algorithms with significant constants, as well as the need for garbage collection.

I’m naive but doing mutable Prolog feels horrible to me. I immediately get the “well I might as well do it in X instead then” reaction. In use cases – perhaps such as this – where Prolog doesn’t fit, @jan is there some guidance somewhere regarding how one could pursue a C-based hybrid solution? E.g. would it be possible to write this algorithm in C and then just dynamically link it to a Prolog predicate for convenience (without having to re-compile SWI!) in order to get the benefits of both? At least then if it doesn’t belong in Prolog we can put a box around it and get on with life :slight_smile:

Prolog – to me as a relative newcomer – comes across as a great glue language. If I could pair it with some other more performance-oriented language that can export a library with C-style signatures (e.g. C, C++, Go, Haskell?), it feels like it would answer more questions. I think Janus – as an example – is an important contribution in that direction, effectively enabling a fairly seamless pairing of Prolog and Python.
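
Calling into Python through Janus is indeed about as short as it gets (a toy sketch; `math.sqrt` just stands in for any Python library):

```prolog
:- use_module(library(janus)).

% ?- py_call(math:sqrt(2), X).
% X = 1.4142135623730951.
```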

I think there are two options. Either find an alternative algorithm to the problem that does fit Prolog or, indeed, do parts “outside Prolog”. Using destructive updates on Prolog data structures could be considered an in-between solution.

And yes, SWI-Prolog has (IMHO) excellent bindings to C, C++ (thanks to @peter.ludemann) and Rust (thanks to @matthijs) for performance-oriented escapes. In addition it has good interfaces to notably Python and Java, but these are aimed more at the rich ecosystems of those languages.

You can do two things. One is to define a new data structure in one of these performance-oriented language bindings that solves the limitation you experience in Prolog. The other is to do the whole thing in the other language. In this case I’d be tempted to try the first, so we can keep accessing the data from Prolog directly. But then, the two solutions I gave might get close to what you can achieve if the “visited” data structure is the bottleneck. Using C/C++/Rust always comes with some portability issues in getting the code compiled into a shared object and loaded into Prolog.
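
To make the Prolog side of such an escape concrete: once a C implementation is compiled into a shared object and registers its predicates via PL_register_foreign(), loading it is one directive. None of these names exist as a package; this only sketches the shape of the glue:

```prolog
% Loads dijkstra.so / dijkstra.dll from the foreign search path.
% The library's install function is expected to have registered
% its predicates with PL_register_foreign().
:- use_foreign_library(foreign(dijkstra)).

% After which a hypothetical C-implemented predicate is called
% like any other Prolog predicate:
% ?- c_dijkstra(Graph, Start, End, Path).
```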

I have been trying to tell that story for decades :slight_smile: Not many people buy it though …

2 Likes

When I have the time to spare – or if others do – what’d be great to see is a write-up of something like this Dijkstra problem as a hybrid solution in the vein of: here it is in Prolog, not fast enough, I do some bits in C, I compile and dynamically link, and now look, it’s really fast in Prolog.

Dijkstra’s algorithm is an extremely advanced topic. It’s interesting that there are people willing to tackle it. I would just be happy to learn how that family name is pronounced…

PS: sorry for the digression (you may as well ignore it)

For sure, writing Prolog code has a vibe and there is a tradition of pure vs impure.
However, when you consider the common citation that Algorithm = Logic + Control, I recently realized that SLD or SLG resolution are just ready-made control strategies provided by Prolog.
There is nothing inherently bad about providing an alternative control strategy, as long as you shield users from side effects and impurities.
Moreover, Prolog was originally marketed as a general-purpose language and specifically not as a DSL.

So I would like to argue that implementing the Dijkstra algorithm is akin to implementing an alternative control tool for a specific type of problem, and that using impure predicates is justified in this case.
Just look at how many impure predicates are used to implement widely used libraries like library(clpfd) or library(aggregate).

Implementing high-performance algorithms should be one of the targets of every general-purpose programming language, and I believe we should improve Prolog so that we can write such algorithms.
I dream of the day when we could use array based programming (like numpy) in Prolog :slight_smile:

1 Like

I’d really like to test this approach, I’ll report once I’ve tried implementing it.

Do you think this could be upstreamed somewhere? I suppose there is already an add-on package that implements Dijkstra somewhere?

Unfortunately, I did not keep the different iterations of my work.
The only test data I used is the random graph I generate in the tests in the linked gists.
I know this is a very poor testing framework and I should find a better test dataset…

1 Like

Curious. Please also test the dict version. It might do quite well. The dict access functions may be O(log(N)), but due to the small constants involved they compete with compound terms as arrays up to fairly high numbers of keys.
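
For reference, the dict version of the state update could look roughly like this (an untested sketch with illustrative names; it assumes every node was pre-filled in the dict, since nb_set_dict/3 only replaces existing keys):

```prolog
% State is a dict mapping node IDs to unvisited(Distance, Prev)
% or visited(Prev) terms, all keys created up front.
update_distance_dict(Start, End, NewDistance, State) :-
    (   get_dict(End, State, unvisited(Current, _)),
        NewDistance < Current
    ->  nb_set_dict(End, State, unvisited(NewDistance, Start))
    ;   true                              % already visited or not shorter
    ).
```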

Not that I am aware of. Quite likely someone implemented it for his/her needs, but whether it was shared? Any textbook that discusses shortest path finding in Prolog?

There’s a chapter devoted to search algorithms in O’Keefe (Craft of Prolog), but I was led astray in other directions lately. Don’t know if he tackles such topics as well.
I guess that search algorithms are a rather standard topic for a computer science student. But of course implementing them in Prolog is something maybe a little bit more “special”.

At the moment, no. The main attack vector is probably crafted names (atoms). term_hash/2 produces a stable hash across processes based on this seed though. As this feature is rarely used we could consider a flag to enable it.

clpBNR has Travelling Salesman code.

So, I have managed to use compound terms to store the state of the Dijkstra algorithm, as well as compound terms to store the graph and its weights.
This significantly reduces the runtime of the algorithm, bringing it somewhat close (albeit still slower) to the runtime of networkx!
Now, the runtime of the algorithm is dominated by the min heap operations.

I have made a complete test script with both the red-black tree and the compound term approach, as well as using networkx via the awesome janus interface:

test script
:- use_module(library(rbtrees)).
:- use_module(library(heaps)).
:- use_module(library(ugraphs)).  % vertices_edges_to_ugraph/3
:- use_module(library(pairs)).    % pairs_values/2
:- use_module(library(janus)).    % py_call/2,3 (networkx comparison)

add(A, B, C) :- C is A+B.

update_distances1(Start, End, NewDistance, TreeIn-HeapIn, TreeOut-HeapOut) :-
   (  rb_lookup(End, Val, TreeIn)
   -> (  Val = unvisited(CurrentDistance, _),
         NewDistance < CurrentDistance
      -> rb_update(TreeIn, End, unvisited(NewDistance, Start), TreeOut),
         add_to_heap(HeapIn, NewDistance, End, HeapOut)
      ;  TreeIn = TreeOut,
         HeapIn = HeapOut
      )
   ;  rb_insert(TreeIn, End, unvisited(NewDistance, Start), TreeOut),
      add_to_heap(HeapIn, NewDistance, End, HeapOut)
   ).

get_unvisited_min_node1(Tree, HeapIn, P, K, HeapOut) :-
   get_from_heap(HeapIn, PTmp, KTmp, HeapTmp),
   (  rb_lookup(KTmp, Val, Tree)
   -> (  Val = unvisited(_, _)
      -> HeapTmp = HeapOut,
         P = PTmp, K = KTmp
      ;  get_unvisited_min_node1(Tree, HeapTmp, P, K, HeapOut)
      )
   ;  HeapTmp = HeapOut,
      P = PTmp, K = KTmp
   ).

build_path1(Tree, From, To, From) :-
   rb_lookup(To, visited(From), Tree).

:- meta_predicate dijkstra1(+, 3, +, +, -).

dijkstra1(Graph, Edge, Start, End, Path) :-
   ord_list_to_rbtree(Graph, GraphTree),
   empty_heap(Heap),
   list_to_rbtree([Start-unvisited(0, no)], TreeIn),
   dijkstra_1(GraphTree, Edge, Start, End, 0, TreeIn, TreeOut, Heap),
   once(foldl(build_path1(TreeOut), ReversePath, End, Start)),
   reverse([End | ReversePath], Path).

dijkstra_1(Graph, Edge, Start, End, CurrentDistance, TreeIn, TreeOut, Heap) :-
   rb_lookup(Start, Neighbours, Graph),
   maplist(call(Edge, Start), Neighbours, NeighboursWeights),
   maplist(add(CurrentDistance), NeighboursWeights, NeighboursDistances),
   foldl(update_distances1(Start),
         Neighbours, NeighboursDistances,
         TreeIn-Heap, Tree2-Heap2),
   rb_update(Tree2, Start, unvisited(_, StartPrev), visited(StartPrev), Tree3),
   get_unvisited_min_node1(Tree3, Heap2, NextDistance, NextNode, Heap3),
   (  NextNode == End
   -> rb_update(Tree3, End, unvisited(_, EndPrev), visited(EndPrev), TreeOut)
   ;  dijkstra_1(Graph, Edge, NextNode, End, NextDistance, Tree3, TreeOut, Heap3)
   ).

update_distances2(State, Start, End, NewDistance, HeapIn, HeapOut) :-
   arg(End, State, Val),
   (  nonvar(Val)
   -> (  Val = unvisited(CurrentDistance, _),
         NewDistance < CurrentDistance
      -> nb_setarg(End, State, unvisited(NewDistance, Start)),
         add_to_heap(HeapIn, NewDistance, End, HeapOut)
      ;  HeapIn = HeapOut
      )
   ;  arg(End, State, unvisited(NewDistance, Start)),
      add_to_heap(HeapIn, NewDistance, End, HeapOut)
   ).

get_unvisited_min_node2(State, HeapIn, P, K, HeapOut) :-
   get_from_heap(HeapIn, PTmp, KTmp, HeapTmp),
   (  arg(KTmp, State, Val)
   -> (  Val = unvisited(_, _)
      -> HeapTmp = HeapOut,
         P = PTmp, K = KTmp
      ;  get_unvisited_min_node2(State, HeapTmp, P, K, HeapOut)
      )
   ;  HeapTmp = HeapOut,
      P = PTmp, K = KTmp
   ).

build_path2(State, From, To, From) :-
   arg(To, State, visited(From)).

add_to_heap_(Default, Key, HeapIn, HeapOut) :-
   add_to_heap(HeapIn, Default, Key, HeapOut).

:- meta_predicate dijkstra2(+, 3, +, +, -).

dijkstra2(Graph, Edge, Start, End, Path) :-
   pairs_values(Graph, Neighbours),
   compound_name_arguments(GraphCompound, graph, Neighbours),
   empty_heap(Heap),
   length(Graph, NumNodes),
   compound_name_arity(State, state, NumNodes),
   arg(Start, State, unvisited(0, no)),
   dijkstra_2(GraphCompound, Edge, Start, End, 0, State, Heap),
   once(foldl(build_path2(State), ReversePath, End, Start)),
   reverse([End | ReversePath], Path).

dijkstra_2(Graph, Edge, Start, End, CurrentDistance, State, Heap) :-
   arg(Start, Graph, Neighbours),
   maplist(call(Edge, Start), Neighbours, NeighboursWeights),
   maplist(add(CurrentDistance), NeighboursWeights, NeighboursDistances),
   foldl(update_distances2(State, Start),
         Neighbours, NeighboursDistances,
         Heap, Heap2),
   arg(Start, State, unvisited(_, StartPrev)),
   nb_setarg(Start, State, visited(StartPrev)),
   get_unvisited_min_node2(State, Heap2, NextDistance, NextNode, Heap3),
   (  NextNode == End
   -> arg(End, State, unvisited(_, EndPrev)),
      nb_setarg(End, State, visited(EndPrev))
   ;  dijkstra_2(Graph, Edge, NextNode, End, NextDistance, State, Heap3)
   ).

random_directed_edge(Low, High, Weights, Start-End, -(Start, End, Weight), (Start-End)-Weight) :-
   random_between(Low, High, Start),
   random_between(Low, High, End),
   random_between(Low, High, Weight),
   arg(Start, Weights, Ends),
   memberchk(End-Weight, Ends).

weight1(Weights, From, To, Distance) :-
   rb_lookup(From-To, Distance, Weights).
weight2(Weights, From, To, Distance) :-
   arg(From, Weights, Ends),
   memberchk(To-Distance, Ends).

random_graph(NumNodes, NumEdges, UGraph, weight1(RbWeights), weight2(Weights), NxGraph) :-
   numlist(1, NumNodes, Nodes),
   length(Edges, NumEdges),
   compound_name_arity(Weights, weights, NumNodes),
   maplist(random_directed_edge(1, NumNodes, Weights), Edges, Tuples, Pairs),
   list_to_rbtree(Pairs, RbWeights),
   vertices_edges_to_ugraph(Nodes, Edges, UGraph),
   py_call(networkx:'DiGraph'(), NxGraph, [py_object(true)]),
   py_call(NxGraph:add_weighted_edges_from(Tuples)).

main(Seed, X) :-
   N is 10**X,
   M is N*2,
   set_random(seed(Seed)),
   random_graph(N, M, UGraph, Edge1, Edge2, NxGraph),
   random_between(1, N, Start),
   random_between(1, N, End),
   format("Prolog dijkstra 1~n"),
   time(dijkstra1(UGraph, Edge1, Start, End, Path)),
   format("Prolog dijkstra 2~n"),
   time(dijkstra2(UGraph, Edge2, Start, End, Path)),
   format("Networkx dijkstra~n"),
   time(py_call(networkx:dijkstra_path(NxGraph, Start, End))).

Here are some notable results:

?- set_prolog_flag(stack_limit, 2_147_483_648).
true.
?- main(1, 6). % run dijkstra on random graph with 1M nodes and 2M edges
Prolog dijkstra 1 % red black tree approach
% 455,909,369 inferences, 44.604 CPU in 44.738 seconds (100% CPU, 10221305 Lips)
Prolog dijkstra 2 % compound term approach
% 88,561,958 inferences, 9.340 CPU in 9.367 seconds (100% CPU, 9481945 Lips)
Networkx dijkstra
% -1 inferences, 6.474 CPU in 6.495 seconds (100% CPU, 0 Lips)
true.
3 Likes

NetworkX uses the heap implementation from the python standard library ^^

I came to the same conclusion when I found out that the heap was the main bottleneck of the code!
I’ll see if I can write an implementation of a heap based on compounds, following the Wikipedia page on d-ary heaps.

1 Like

I’ve just spent the last two days implementing a non-backtracking, array-based d-ary min heap… But I can’t make it go faster than the original pairing heap :frowning:
So I’m starting to lose motivation on this subject ^^

If anyone shows interest, I’ll share the code.

By the way, I tried to make a dict-based implementation of the Dijkstra algorithm, but I couldn’t make it run in a reasonable time either…

Overall, the best implementation I found was to use pairing min heaps with an array-based state. However, I prefer the red-black tree version for its cleanliness and its ability to handle arbitrary node data.

1 Like

I’m not interested. I don’t have much time right now and I’m doing small things in other “fields”. But as a general suggestion I would say: leave it aside for the moment. Do something else. If it is something that really interests you, you’ll find a way to tackle the issue with a fresh and renewed perspective. Otherwise you’ll forget it and do other things. That would be ok as well.

PS: it was meant to be an answer to you @kwon-young but I didn’t do it right

I think we probably need to settle for “good enough”, and what you’ve done seems very good. If a Prolog user needs a predicate for shortest paths which works, then there it is. Meanwhile, if they need a very fast predicate, a C/C++ implementation accessed via FFI may be a better option.

If it helps, I have the skeleton for a “blob” that wraps a C++ “map” (updateable red-black tree). It shouldn’t be difficult to modify this to have a backtrackable update, if that’s useful.

(I’ve run into a subtle bug with the blob API, which I’m in the process of writing up; but I don’t think that this would affect anyone using blobs to wrap C or C++ functionality)

BTW, Richard O’Keefe implemented Fast Fourier Transforms using ordinary Prolog data structures, and with reasonable efficiency: Learning to implement data structures & algorithms in Prolog - #4 by peter.ludemann

Wasn’t expecting to make it a Christmas miracle.
The year still has eleven (11) other months to go.

Yes!

Of course, would love to see, please share :grinning:

Since some people have shown interest, here is the code for a non-backtracking d-ary heap based on Prolog compound terms, together with some tests:

d-ary heaps
:- module(dary_heap, [
   create_dary_heap/3,
   nb_add_to_heap/3,
   nb_get_from_heap/3,
   nb_decrease_key/3]).

create_dary_heap(D, Size, heap(D, 1, Array, Map)) :-
   compound_name_arity(Array, array, Size),
   compound_name_arity(Map, map, Size).

set(I, Array, P-K, Map) :-
   nb_setarg(I, Array, P-K),
   nb_setarg(K, Map, I).

swap(Array, I1, E1, I2, E2, Map) :-
   set(I2, Array, E1, Map),
   set(I1, Array, E2, Map).

nb_add_to_heap(Priority, Key, Heap) :-
   Heap = heap(D, Last, Array, Map),
   compound_name_arity(Array, _, Size),
   Size >= Last,
   set(Last, Array, Priority-Key, Map),
   compare(R, Last, 1),
   sift_up(R, D, Last, Array, Priority-Key, Map),
   Last1 is Last + 1,
   nb_setarg(2, Heap, Last1).

sift_up(=, _, _, _, _, _).
sift_up(>, D, Index, Array, Priority-Key, Map) :-
   ParentIndex is (Index + D - 2) // D,   % parent in a 1-based d-ary heap; the original (Index+2)//D only holds for D=4
   arg(ParentIndex, Array, ParentPriority-ParentKey),
   compare(R, ParentPriority, Priority),
   sift_up(R, D, Index, Array, Priority-Key, ParentIndex, ParentPriority-ParentKey, Map).

sift_up(< , _ , _     , _     , _     , _           , _           , _).
sift_up(= , _ , _     , _     , _     , _           , _           , _).
sift_up(> , D , Index , Array , Entry , ParentIndex , ParentEntry , Map) :-
   swap(Array, Index, Entry, ParentIndex, ParentEntry, Map),
   compare(R, ParentIndex, 1),
   sift_up(R, D, ParentIndex, Array, Entry, Map).

nb_get_from_heap(Priority, Key, Heap) :-
   Heap = heap(D, Last, Array, Map),
   Last > 1,
   arg(1, Array, Priority-Key),
   Last1 is Last - 1,
   arg(Last1, Array, Entry),
   set(1, Array, Entry, Map),
   (  Last1 > 1
   -> sift_down(D, Last1, 1, Array, Entry, Map)
   ;  true
   ),
   nb_setarg(2, Heap, Last1).

% :- table child_indices/2.

% child_indices(Index, [C1, C2, C3, C4]) :-
%    C1 is (Index-1) * 4 + 2,
%    C2 is C1 + 1,
%    C3 is C2 + 1,
%    C4 is C3 + 1.
%
% arg_(C, I, P-(I-K)) :- arg(I, C, P-K).

% sift_down(D, Last, Index, Array, Priority-Key, Map) :-
%    child_indices(Index, ChildIndices),
%    include('>'(Last), ChildIndices, ValidChildIndices),
%    (  ValidChildIndices \== []
%    -> maplist(arg_(Array), ValidChildIndices, Childs),
%       keysort(Childs, [ChildPriority-(Child-ChildKey) | _]),
%       (  Priority > ChildPriority
%       -> swap(Array, Index, Priority-Key, Child, ChildPriority-ChildKey, Map),
%          sift_down(D, Last, Child, Array, Priority-Key, Map)
%       ;  true
%       )
%    ;  true
%    ).

sift_down(D, Last, Index, Array, Priority-Key, Map) :-
   C1 is (Index-1) * D + 2,
   (  C1 >= Last
   -> true
   ;  arg(C1, Array, P1-K1),
      C2 is C1 + 1,
      (  C2 >= Last
      -> (  Priority > P1
         -> swap(Array, Index, Priority-Key, C1, P1-K1, Map),
            sift_down(D, Last, C1, Array, Priority-Key, Map)
         ;  true
         )
      ;  arg(C2, Array, P2-K2),
         C3 is C2 + 1,
         (  C3 >= Last
         -> (  P1 < P2
            -> (  Priority > P1
               -> swap(Array, Index, Priority-Key, C1, P1-K1, Map),
                  sift_down(D, Last, C1, Array, Priority-Key, Map)
               ;  true
               )
            ;  Priority > P2
            -> swap(Array, Index, Priority-Key, C2, P2-K2, Map),
               sift_down(D, Last, C2, Array, Priority-Key, Map)
            ;  true
            )
         ;  arg(C3, Array, P3-K3),
            C4 is C3 + 1,
            (  C4 >= Last
            -> (  P1 < P2
               -> (  P1 < P3
                  -> (  Priority > P1
                     -> swap(Array, Index, Priority-Key, C1, P1-K1, Map),
                        sift_down(D, Last, C1, Array, Priority-Key, Map)
                     ;  true
                     )
                  ;  Priority > P3
                  -> swap(Array, Index, Priority-Key, C3, P3-K3, Map),
                     sift_down(D, Last, C3, Array, Priority-Key, Map)
                  ;  true
                  )
               ;  (  P2 < P3
                  -> (  Priority > P2
                     -> swap(Array, Index, Priority-Key, C2, P2-K2, Map),
                        sift_down(D, Last, C2, Array, Priority-Key, Map)
                     ;  true
                     )
                  ;  Priority > P3
                  -> swap(Array, Index, Priority-Key, C3, P3-K3, Map),
                     sift_down(D, Last, C3, Array, Priority-Key, Map)
                  ;  true
                  )
               )
            ;  arg(C4, Array, P4-K4),
               (  P1 < P2
               -> (  P1 < P3
                  -> (  P1 < P4
                     -> (  Priority > P1
                        -> swap(Array, Index, Priority-Key, C1, P1-K1, Map),
                           sift_down(D, Last, C1, Array, Priority-Key, Map)
                        ;  true
                        )
                     ;  Priority > P4
                     -> swap(Array, Index, Priority-Key, C4, P4-K4, Map),
                        sift_down(D, Last, C4, Array, Priority-Key, Map)
                     ;  true
                     )
                  ;  (  P3 < P4
                     -> (  Priority > P3
                        -> swap(Array, Index, Priority-Key, C3, P3-K3, Map),
                           sift_down(D, Last, C3, Array, Priority-Key, Map)
                        ;  true
                        )
                     ;  Priority > P4
                     -> swap(Array, Index, Priority-Key, C4, P4-K4, Map),
                        sift_down(D, Last, C4, Array, Priority-Key, Map)
                     ;  true
                     )
                  )
               ;  (  P2 < P3
                  -> (  P2 < P4
                     -> (  Priority > P2
                        -> swap(Array, Index, Priority-Key, C2, P2-K2, Map),
                           sift_down(D, Last, C2, Array, Priority-Key, Map)
                        ;  true
                        )
                     ;  Priority > P4
                     -> swap(Array, Index, Priority-Key, C4, P4-K4, Map),
                        sift_down(D, Last, C4, Array, Priority-Key, Map)
                     ;  true
                     )
                  ;  (  P3 < P4
                     -> (  Priority > P3
                        -> swap(Array, Index, Priority-Key, C3, P3-K3, Map),
                           sift_down(D, Last, C3, Array, Priority-Key, Map)
                        ;  true
                        )
                     ;  Priority > P4
                     -> swap(Array, Index, Priority-Key, C4, P4-K4, Map),
                        sift_down(D, Last, C4, Array, Priority-Key, Map)
                     ;  true
                     )
                  )
               )
            )
         )
      )
   ).

nb_decrease_key(NewPriority, Key, Heap) :-
   Heap = heap(D, _, Array, Map),
   arg(Key, Map, Index),
   set(Index, Array, NewPriority-Key, Map),
   compare(R, Index, 1),
   sift_up(R, D, Index, Array, NewPriority-Key, Map).

:- begin_tests(heaps).

test(add_to_heap, []) :-
   create_dary_heap(4, 5, Heap),
   nb_add_to_heap(2, 1, Heap),
   nb_add_to_heap(3, 2, Heap),
   nb_add_to_heap(1, 3, Heap).

test(get_from_heap, []) :-
   create_dary_heap(4, 5, Heap),
   nb_add_to_heap(1, 1, Heap),
   nb_get_from_heap(P, K, Heap),
   P == 1, K == 1.

test(multiple_get_from_heap, []) :-
   create_dary_heap(4, 5, Heap),
   nb_add_to_heap(3, 1, Heap),
   nb_add_to_heap(2, 2, Heap),
   nb_add_to_heap(1, 3, Heap),
   nb_get_from_heap(P1, K1, Heap),
   P1 == 1, K1 == 3,
   nb_get_from_heap(P2, K2, Heap),
   P2 == 2, K2 == 2,
   nb_get_from_heap(P3, K3, Heap),
   P3 == 3, K3 == 1.

test(decrease_key, []) :-
   create_dary_heap(4, 5, Heap),
   nb_add_to_heap(5, 1, Heap),
   nb_decrease_key(1, 1, Heap),
   nb_get_from_heap(P, K, Heap),
   P == 1, K == 1.

test(decrease_key_then_get, []) :-
   create_dary_heap(4, 5, Heap),
   nb_add_to_heap(4, 1, Heap),
   nb_add_to_heap(3, 2, Heap),
   nb_add_to_heap(2, 3, Heap),
   nb_decrease_key(1, 1, Heap),
   nb_get_from_heap(P, K, Heap),
   P == 1, K == 1.

:- end_tests(heaps).

From my Dijkstra tests, I’ve found that the bottleneck of the code is nb_get_from_heap, especially the sift_down step, where you need to find the minimum-priority child of the current node.
To make this as efficient as possible, I have manually unrolled the use of meta-predicates like include/3, maplist/3 and keysort/2 in sift_down.

However, the results are still disappointing:

?- main(2, 5).
Prolog dijkstra 1 % red-black tree state + pairing heaps
% 16,066,388 inferences, 1.442 CPU in 1.446 seconds (100% CPU, 11141349 Lips)
Prolog dijkstra 2 % non-backtracking compound state + pairing heap
% 3,178,067 inferences, 0.287 CPU in 0.288 seconds (100% CPU, 11071715 Lips)
Prolog dijkstra 2_2 % non-backtracking compound state + nb 4-ary heap
% 4,316,582 inferences, 0.581 CPU in 0.583 seconds (100% CPU, 7431574 Lips)
Networkx dijkstra
% -1 inferences, 0.141 CPU in 0.141 seconds (100% CPU, -7 Lips)

When profiling, the runtime of the 4-ary heap is dominated by arg/3 and nb_setarg/3.
I wonder if a special-purpose nb_swap predicate in C, swapping two arguments of a compound term in one call, would improve the runtime significantly?