Question about dictionaries and their nuances

I’m using: SWI-Prolog version 8.1.9.

I’ve been reading about Prolog dictionaries for a while but so far have avoided them because I don’t understand the nuances of them:

https://www.swi-prolog.org/pldoc/man?section=bidicts

My concern is that some subtle nuance in the behavior of dicts will bite me somewhere down the line when I use them in a complex search or scan. Is there anything you can’t do with a dictionary field that you can do with a plain Prolog term?

For example, if I wanted to pull out a single field in a dict item and use its value for a choice point in a Prolog goal, will I discover that there are differences in how the code would behave compared to a plain Prolog term?

The problem is my current mental images have to do with either JSON objects or SQL records whereby to do anything with a field in an object or a record (respectively), you first have to pull up the object/record from a collection, and then do your work on the retrieved object/record. I worry that Prolog dicts will have the same or similar behavior.
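
Concretely, something like this made-up sketch (the predicate and key names are invented) is what I have in mind:

% Items is a list of dicts; I want a choice point over them, filtering
% on the value of a single field.
open_item(Items, Item) :-
    member(Item, Items),              % choice point over the dicts
    get_dict(status, Item, open).     % use one field's value to accept or reject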

One gotcha I’ve found is performance (which is mentioned in the documentation) – dicts aren’t a replacement for library(assoc) or library(rbtrees). I have some code that does a big update on dicts (I’ll eventually get rid of this dict and replace it with an rbtree):

%! update_new_dict(+KVs:list(pair), +Dict0:dict, -Dict:dict) is det.
%% Add all key-value pairs in KVs into the dict, skipping any that
%% are already in the dict.
%% The keys are processed in order, so if a key is added, any subsequent
%% values of that key are ignored.
%% Note that dict_pairs(Dict, Tag, KVs) doesn't allow duplicate keys.

%% This is the straightforward implementation, but it creates
%% a lot of garbage and is slow.

%% update_new_dict([], Dict, Dict).
%% update_new_dict([K-V|KVs], Dict0, Dict) :-
%%     (  get_dict(K, Dict0, _)
%%     -> Dict1 = Dict0
%%     ;  put_dict(K, Dict0, V, Dict1)
%%     ),
%%     update_new_dict(KVs, Dict1, Dict).

%% The following is several times faster than the above.  (For 11,000
%% items, make_rb is about 20x faster than the equivalent loop for a
%% dict, but there's still a significant cost converting to/from the rbtree).
%% It would be better to just leave everything as an RB-tree.

update_new_dict(KVs, Dict0, Dict) :-
    dict_pairs(Dict0, Tag, KVs0),
    ord_list_to_rbtree(KVs0, Rb0),
    make_rb(KVs, Rb0, Rb1),
    rb_visit(Rb1, KVs1),
    dict_pairs(Dict, Tag, KVs1).

make_rb([], Rb, Rb).
make_rb([K-V|KVs], Rb0, Rb) :-
    (   rb_insert_new(Rb0, K, V, Rb1)   % fails if K is already present
    ->  true
    ;   Rb1 = Rb0                       % key exists: keep the old value
    ),
    make_rb(KVs, Rb1, Rb).
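
For example (the point tag and the keys are just for illustration):

?- update_new_dict([b-20, c-3], point{a:1, b:2}, _Dict),
   dict_pairs(_Dict, _, Pairs).
Pairs = [a-1, b-2, c-3].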

Prolog dicts are just compound terms:

5 ?- _{a:1, b:2} =.. L.
L = [C'dict', _6186, 1, a, 2, b].

The value-key pairs are adjacent and sorted on their internal representation. As a compound term is, deep down, a C array, we can do a quite efficient binary search for a key, and we can quickly merge two dicts. The C'dict' is an internal constant that identifies the dict functor.
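
For example, looking up a concrete key uses that binary search; get_dict/3 is the basic primitive (with an unbound key it enumerates all pairs on backtracking, in the internal key order):

?- get_dict(b, _{a:1, b:2, c:3}, V).
V = 2.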

2 Likes

Since I mainly use dicts provided by json_read_dict/2, I tend to think of them as synonymous with JSON. The ‘dot’ syntax used to get values from a Prolog dict is nearly identical to the syntax used by Javascript et al to get values from a JSON object, so it’s possibly more familiar to people coming from a traditional C-family background than to Prolog purists.

I find dicts handy if the data is heavily nested, which JSON supplied by various web applications tend to be (I’ve provided an example of getting weather data at https://github.com/roblaing/swipl-webapp-howto/tree/master/unit5).
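
As a rough sketch (the field names below are invented; real APIs nest things differently), reading a JSON document into a dict and walking down into it looks something like:

:- use_module(library(http/json)).

temperature(Stream, Temp) :-
    json_read_dict(Stream, Dict),
    Temp = Dict.main.temp.     % nested access, much like JavaScript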

But if you aren’t dealing with JSON, and the data isn’t represented by heavily nested lists, I find sticking to traditional Prolog keeps the code neater and easier to understand later.

2 Likes

The problem is that I have a term with about 14 “fields” now (arguments?), and when I create predicates that need to operate on a particular field or fields within that term, or on a list of those terms, I find myself continually tweaking those predicates to account for new fields I add later on. I’m hoping that I can use a dict instead to make things cleaner.
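
To make that concrete, here is roughly the kind of accessor I keep having to touch (all names invented):

% Plain compound term: the accessor is tied to the arity and argument
% positions, so adding a 15th field means revisiting clauses like this.
item_cost_v1(item(_Name, _Qty, Cost, _Supplier), Cost).

% Dict: the accessor mentions only the key it needs, so new fields
% don't disturb it.
item_cost_v2(Item, Cost) :-
    get_dict(cost, Item, Cost).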

1 Like

To quote myself from something I wrote a few weeks back and then forgot until I reread it again just now:

I find dictionaries a great addition to Prolog in that they circumvent a problem of traditional Prolog: the proliferation of context arguments that have to be passed to predicates, since the original designers were opposed to lexical or any other kind of scoping.

I put that in the linking to a database part of my web tutorial at https://github.com/roblaing/swipl-webapp-howto/tree/master/unit3.

If the 14 fields and growing are context arguments, I’d say initially creating something like

  Dict = _{key1:value1, key2:value2,...}

which can then be passed around as a single argument, with each individual element referenced as, say, Dict.key1, makes life a lot easier.

There are probably dozens of other ways to do this, including bunging everything in the clause store, but I like keeping an environment dictionary to pass between predicates since I’m no Prolog purist and hop between lots of different programming languages.
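
A minimal sketch of that environment-dict style (the predicate and key names are invented):

handle_request(Env) :-
    log_request(Env),
    greet_user(Env).

log_request(Env) :-
    format("~w ~w~n", [Env.method, Env.path]).

greet_user(Env) :-
    format("Hello ~w~n", [Env.user]).

% ?- handle_request(_{method:get, path:'/home', user:judy}).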

1 Like

Same here. 🙂

1 Like

Using a Logtalk parametric object, you can simply pass an RB-tree as a parameter and then access it from any of the object predicates. Access to the parameters is O(1), so the overall complexity is that of the RB-tree operation itself. But given that parameters are logical variables, you can look up key values but not add new key-value pairs or update existing keys. You could use an “input” RB-tree parameter and an “output” RB-tree parameter (as in the RB-tree predicates that make updates), but I suspect that would not cover your usage scenario…? If updating pairs is a central requirement, dicts may provide the most practical choice.

The lexical scope of Logtalk parameter variables is the whole parametric object (or category), so they are visible to all encapsulated directives, predicates, and grammar rules. They use a “syntax” that makes them easy to distinguish visually from other variables (they must start and end with an underscore). They are a practical choice for a small number of context arguments, but for a large number of “fields”, combining them in e.g. a dict or an RB-tree is more practical and also makes it simpler to add and remove “fields”.
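
A minimal sketch of the parametric object idea (the names are invented, and the {}/1 call assumes a backend, such as SWI-Prolog, where the rbtrees library predicates are available):

:- object(context(_Pairs_)).

    :- public(lookup/2).
    lookup(Key, Value) :-
        % call the backend library predicate as a plain Prolog goal
        {rb_lookup(Key, Value, _Pairs_)}.

:- end_object.

% ?- ord_list_to_rbtree([host-localhost, port-8080], RB),
%    context(RB)::lookup(port, Port).
% Port = 8080.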

@pmoura - I’m a bit confused by your response … my original code used dicts for convenience and readability (I didn’t use updateable dicts – instead I used input and output parameters, hidden away using EDCG preprocessing) but profiling showed me that updating the dict was slow, so I wrote this code to speed up one bulk update. Eventually I’ll switch everything that’s performance-critical from dict to rbtree (in my case, rbtree turns out to be faster than assoc; YMMV).

[As for readability: I could have written a portray/1 rule for pretty-printing an rbtree; but dict already has a readable format, with the advantage that it also handles input – there’s no input equivalent to portray/1.]

Dicts by default follow the normal Prolog model that only allows instantiation; the data is otherwise read-only. As with compounds, there are predicates to destructively update values of existing keys. A normal relational update is O(n), as opposed to O(log(n)) for assoc/rbtree; the constant factor is seriously lower, though. If I recall correctly, an insert uses binary search to find the insertion location and then uses memcpy() to copy the unmodified parts.
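
For example, nb_set_dict/3 updates a value in place (the counts tag and hits key are arbitrary names):

?- D = counts{hits:0},
   nb_set_dict(hits, D, 1).
D = counts{hits:1}.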

At some point I did some comparison and, from memory, assoc starts to win above approximately 1,000 keys. YMMV though, as dicts also use less memory, but an update creates a complete copy while assoc/rbtrees copy only log(n) nodes.

In my experience dicts are ok for most purposes except for incrementally building them from a (large) data set. In that case you use something else for the incremental build and translate the result into a dict. Note that merging is very fast, so if you can do the incremental build by splitting the data in two and merging the resulting dicts (recursively), this can be clean and fast (and run concurrently).
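
For example, merging two dicts is a single put_dict/3 call (the d tag and the keys are arbitrary); values from the first argument win for duplicate keys:

?- put_dict(_{b:20, c:3}, d{a:1, b:2}, _Merged),
   dict_pairs(_Merged, _, Pairs).
Pairs = [a-1, b-20, c-3].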

3 Likes