Syntax/semantics for partial dicts

So, I’ve been thinking about dicts a bunch, and I’ve got an idea that addresses some of the headaches I (and, I gather, others) have been having with them, especially around unification and clause heads and whatnot. What if we extend the dict syntax to allow for partial/incomplete dicts, like we can for lists with [1,2|T]? Not getting into implementation yet, just syntax and semantics.

As far as syntax goes, I was thinking of two options:

  • The more Prolog-esque tag{k1: V1, k2: V2|Rest} (where Rest could of course be _ ) - which could cause some backwards incompatibility, since that will currently be parsed as tag{k1: V1, k2: (V2|Rest)} despite the operator precedence mismatch between ,/2 and |/2.
  • The other option would be a more ECMAScript-flavored tag{k1: V1, k2: V2, ...} for the anonymous-unification version, or tag{k1: V1, k2: V2, ...Rest} for the tail-unification mode. That also has a bit of weird operator-precedence syntactics, but it uses a syntax that currently isn’t implemented by SWI-Prolog, so it would have no backwards-compatibility implications.

Personally, I’m inclined to lean towards the first syntax, both because it feels better for Prolog and because if that syntax is used, then dicts could (conceivably) be implemented as a library, the way library(yall) is. I’ll be using that syntax for the rest of this discussion.

As for semantics, it’s basically the same as for lists. A “rest” term in a well-formed dict representation will unify to a (possibly empty) dict, none of whose keys are in the primary dict. So, we could declare the following:

?- FullDict = full{a:1, b:2, c:3, d:4}), PartialDict = _{c:3, b:2|_}.
FullDict = full{a:1, b:2, c:3, d:4},
PartialDict = _5296{b:2, c:3|_5292}.

The unification rules for full dicts with no “rest” component would be the same as they always have been; they will unify with another dict with the same tag, the same keys, and the same values for each key. This ensures backwards compatibility.

For partial dicts, we need to look at the “conceptual implementation” a little bit. In the same way that a list is defined as either the empty list [] or a term [H|T] where T is a list, a well-formed dict is defined as either:

  • The empty dict, represented {}, whose tag is the empty dict; or:
  • A term with three components:
    1. A tag, which must be an atom;
    2. A set of key-value pairs, where every key is an atom or smallint; and
    3. A “rest” parameter, which should be a dict.

Assuming the “rest” parameter is a dict, it is included in the lookup chain for any keys in this dict. If its tag unifies with the primary dict’s tag, then it is also considered logically part of the primary dict, and the two sets of keyvalue pairs may end up being transparently merged by Prolog. To maintain the division between primary dict and rest dict such that they’re both recoverable from Prolog, they must have distinct, instantiated tags.

To access this “underlying” representation, I’ll use a new system predicate called '$dict_composition'(Dict, Tag, Pairs, Rest), where Pairs unifies to a proper list of Key:Value pairs of that dict. That allows me to define the public-facing dict_rest/3 as follows:

% split Dict into Proper, a closed dict with all the existing keys, and Rest
dict_rest(Dict, Proper, Rest), Dict == {} =>
    Proper = {}, Rest = {}.
dict_rest(Dict, Proper, Rest), nonvar(Dict) =>
    must_be(Dict, dict),
    '$dict_composition'(Dict, Tag, Pairs, RTerm),
    (   is_dict(RTerm, Tag) % can we merge RTerm?
    ->  dict_rest(RTerm, RProper, Rest),
        '$dict_composition'(RProper, _, RPairs, {}),
        '$merge_dict_pairs'(Pairs, RPairs, MPairs),
        '$dict_composition'(Proper, Tag, MPairs, {})
    ;   Rest = RTerm,
        '$dict_composition'(Proper, Tag, Pairs, {})
    ).
dict_rest(Dict, Proper, Rest), nonvar(Proper) =>
    must_be(Proper, dict),
    dict_rest(Proper, PProper, {}),
    '$dict_composition'(PProper, Tag, PPairs, {}),
    (   is_dict(Rest, Tag)
    ->  dict_rest(Rest, RProper, RRest),
        '$merge_dict_pairs'(PPairs, RPairs, MPairs),
        '$dict_composition'(Dict, Tag, MPairs, RRest)
    ;   '$dict_composition'(Dict, Tag, PPairs, Rest)
    ).
dict_rest(Dict, Proper, Rest) =>
    instantiation_error((Dict, Proper)). % must provide either Dict or Proper

% '$merge_dict_pairs'(+ProperPairs, +RestPairs, -MergedPairs)
'$merge_dict_pairs'([], [], []) :- !.
'$merge_dict_pairs'(P, [], P) :- !.
'$merge_dict_pairs'([], R, R) :- !.
'$merge_dict_pairs'([K:V|PT], [K:_|RT], [K:V|MT]) :-
    !,
    '$merge_dict_pairs'(PT, RT, MT).
'$merge_dict_pairs'([K:V|PT], R, [K:V|MT]) :-
    R = [RK:_|_],
    K @< RK,
    !,
    '$merge_dict_pairs'(PT, R, MT).
'$merge_dict_pairs'(P, [K:V|RT], [K:V|MT]) :-
    P = [PK:_|_],
    K @< PK,
    !,
    '$merge_dict_pairs'(P, RT, MT).

dict_rest/3 can operate in +,-,- mode, to decompose a dict, or -,+,+ mode, to compose a dict (like, to add a Rest element to a currently-closed dict, or to append one dict to another). Note that, by the definition of '$merge_dict_pairs'/3, any keys in an instantiated Rest dict (whose tag unifies with Proper’s) that match a key in the Proper dict are silently discarded. This means that a dict with an unbound Rest component cannot have any of its existing keyvalues modified by instantiation of Rest.

This gives us the concept of the chain view of a dict, which is either a single dict with an unbound tag and a rest component which is either {} or unbound, or a chain of one or more dicts linked by their rest components, each with an instantiated tag that differs from its parent, the last of which has a rest component of {} or unbound. This is as opposed to the flat view of a dict, which has the tag of the top-level dict, all visible keyvalues, and a {}/unbound rest component. dict_rest/3 operates on the chain view of a dict.

In addition to dict_rest/3, we can operate on the chain view using a couple new syntaxes, Tag{Head|Rest} or Tag{Head} (identical to Tag{Head|{}} in the same way that [H] is identical to [H|[]]), which operate on unification with a dict. This syntax can also be used to create a dict out of a keyvalue pair term, using the ensure_dict/2 predicate:

% Tag{Head|Rest} parses to the term '{|}'(Tag, Head, Rest)
% Tag{Head} parses to the term '{|}'(Tag, Head, {})
ExistingDict = Tag{Head|Rest} :-
    is_dict(ExistingDict, Tag),
    ensure_dict(Head, HDict),
    dict_rest(ExistingDict, HDict, Rest).

ensure_dict(Data, Dict), Data = _,_ =>
    comma_list(Data, DList),
    ensure_dict(DList, Dict).
ensure_dict(Data, Dict), Data = [_|_] =>
    dict_create(Dict, _, Data).
ensure_dict([], Dict) =>
    Dict = _{}.
ensure_dict(Data, Dict) =>
    must_be(Data, dict),
    Dict = Data.

Existing dict predicates work on the flat view of a dict in a parallel manner to list predicates, and things like dict_pairs/3 will map partial dicts to partial lists, and vice versa:

dict_pairs(Dict, Tag, Pairs) :-
    nonvar(Dict), !,
    is_dict(Dict, Tag),
    dict_pairs_([], Dict, Pairs).

dict_pairs(Dict, Tag, Pairs) :-
    nonvar(Pairs), !,
    dict_pairs_rest(KVs, KVRest, Pairs, _),
    (   var(KVRest)
    ->  KVRest = [], % close the list
        '$dict_composition'(Dict, Tag, KVs, _)
    ;   '$dict_composition'(Dict, Tag, KVs, {})
    ).

dict_pairs_(KVs, Term, Pairs), var(Term) =>
    dict_pairs_rest(KVs, _, Pairs, _).
dict_pairs_(KVs, {}, Pairs) =>
    dict_pairs_rest(KVs, [], Pairs, []).
dict_pairs_(KVs0, Term, Pairs) =>
    must_be(Term, dict),
    '$dict_composition'(Term, _, TKVs, TR),
    '$merge_dict_pairs'(KVs0, TKVs, KVs1),
    dict_pairs_(KVs1, TR, Pairs).

% dict_pairs_rest/4: convert between dict Key:Value notation and pair Key-Value
% notation, preserving unbound variables at the end of both
dict_pairs_rest(DictKVs, DRest, Pairs, PRest) :-
    var(DictKVs), var(Pairs),
    !,
    DRest = DictKVs,
    PRest = Pairs.
dict_pairs_rest([K:V|DT], DR, [K-V|PT], PR) :-
    dict_pairs_rest(DT, DR, PT, PR).
dict_pairs_rest([], [], [], []).

Worth noting, put_dict/3 can at this point be implemented quite simply as:

put_dict(New, DictIn, DictOut) :-
    DictOut = _{New|DictIn}.

And finally, unification. Unification operates on the chain view of each dict, meaning that consecutive unifiable tags are merged, logically if not physically, and it works on the same principle, in reverse; since you can compose or extend a dict by adding another dict with the same tag to the head of the chain, it follows that you can decompose any dict into a two-dict chain that share the same tag. So, the logic for unifying two dicts is as follows:

  1. Ensure each dict is in proper chain view
  2. Unify the tags of the two dicts
  3. For each dict:
    • If there are remaining keys, use them to form a new dict with the same tag and unify it with its counterpart’s “rest” component (which will fail unless the rest component is unbound)
    • Otherwise, unify this dict’s “rest” with its counterpart’s “rest”

Represented in Prolog form:

% dict_composition/4: a friendlier version of '$dict_composition', which
% respects tag-merging rules
dict_composition(Dict, Tag, KVPairs, Rest) :-
    dict_rest(Dict, Proper, Rest),
    '$dict_composition'(Proper, Tag, KVPairs, _).

% '$unify_dicts', called when unifying two instantiated dict terms
'$unify_dicts'(Dict1, Dict2) :-
    dict_composition(Dict1, Tag, KV1, Rest1),
    dict_composition(Dict2, Tag, KV2, Rest2),
    '$unify_dict_pairs'(KV1, R1, KV2, R2),
    (   R1 == [], R2 == []
    ->  Rest1 = Rest2
    ;   R1 == []
    ->  Rest1 = Tag{R2|Rest2}
    ;   R2 == []
    ->  Tag{R1|Rest1} = Rest2
    ).

'$unify_dict_pairs'([K:V|KV1], R1, [K:V|KV2], R2) :-
    '$unify_dict_pairs'(KV1, R1, KV2, R2).
'$unify_dict_pairs'([K1:V1|KV1], R1, KV2, [K1:V1|R2]) :-
    KV2 = [K2:_|_],
    K1 @< K2,
    '$unify_dict_pairs'(KV1, R1, KV2, R2).
'$unify_dict_pairs'(KV1, [K2:V2|R2], [K2:V2|KV2], R2) :-
    KV1 = [K1:_|_],
    K2 @< K1,
    '$unify_dict_pairs'(KV1, R1, KV2, R2).
'$unify_dict_pairs'([], KV2, KV2, []).
'$unify_dict_pairs'(KV1, [], [], KV1).

And @jan, before you start panicking about internals, I do of course have ideas about internal representation and implementation, but I figured I’d start a discussion about syntax and semantics before getting into that, not to mention this post is long enough already :grinning:

I think this is very good, following the list syntax.

I constantly run into this problem, this would be a great addition to for flexible matching of dicts in the head.

I think Jan had this kind of thing in mind already here:

Note In the current implementation, two dicts unify only if they have the same set of keys and the tags and values associated with the keys unify. In future versions, the notion of unification between dicts could be modified such that two dicts unify if their tags and the values associated with common keys unify, turning both dicts into a new dict that has the union of the keys of the two original dicts.

Yeah, that was the comment that led me to think that this is maybe something that could be revised! One thing I kind of love about this idea is, it opens the door to a proper Prolog inheritance model:

new_baseclass(Foo, baseclass{foo: Foo, bar: standard_bar}).

new_subclass(Foo, OverrideBar, Baz, subclass{bar: OverrideBar, baz: Baz|Base}) :-
    new_baseclass(Foo, Base).

get_foobar(Dict, Foobar) :-
    atom_concat(
        Dict.foo, % direct lookup for a baseclass{}, chain lookup for subclass{}
        Dict.bar, % gets standard_bar for a baseclass{}, OverrideBar for subclass{}
        Foobar).

I just glanced through the proposal and one item came to my mind …

Unlike lists, the key-value pairs in a dict do not have an order defined – i.e. two dicts that include the same key-value pairs are, by definition, equivalent no matter the order … i think this affects the unification semantics:

[First | Rest] in lists is “positional” – whereas in a dict D = struct:{First | Rest} – might mean something like this: member(First, D), subtract(First, D, Rest).

Dan

The D = struct{First|Rest} syntax doesn’t get individual members, it pulls the entire top dictionary off the lookup chain, leaving you with a closed, simple dictionary First and a possible dictionary chain (or possibly just the empty dict {}) Rest. So, in a way, it is positional.

If it helps, think about lookup chains as a list of dictionaries, like [specific{a:1,b:2}, general{a:5,c:7}, base{x:2}]. The dot operator will “look through” the dictionaries in that order, so D.a will resolve to 1, D.c resolves to 7, and D.x resolves to 2. If you write_canonical(D) on that, you’ll get something like (with spaces and annotations for clarity):

 specific{a:1, b:2 | general{a:5, c:7 | base{x:2 | {} } } }
%specific{         |                                      }
%                    general{         |                 }
%                                       base{    |    }
% empty dict symbol:                               {}

Running the above query would leave you with (omitting the ending {} sentinels like list portrayal does):

?- D = specific{First|Rest}.
D = specific{a:1, b:2 | general{a:5, c:7 | base{x:2} } },
First = specific{a:1, b:2},
Rest = general{a:5, c:7 | base{x:2} }.

On the other hand, if you use the {|} syntax to pull out a single member of the top dict and that doesn’t leave it empty, you’ll get:

?- D = specific{a: A|Rest}.
D = specific{a:1, b:2 | general{a:5, c:7 | base{x:2} } },
A = 1,
Rest = specific{b:2 | general{a:5, c:7 | base{x:2} } }.

And finally, if you use that syntax to add a value to the top dict, you get:

?- NewD = Tag{c:17|D}.
D = specific{a:1, b:2 | general{a:5, c:7 | base{x:2} } },
Tag = specific,
NewD = specific{a:1, b:2, c:17 | general{a:5, c:7 | base{x:2} } }.

(all this, of course, assuming that somehow D magically populates itself in the query >.>)

Thanks.

I guess, its then a new kind of thing – its like analogous of a list of lists – or perhaps (more) deeply nested structure.

Dan

I am greeting the initiative, I work (also) with lingustic applications, and I have already badly needed the (partial) unification feature…
I don’t really mind the formalism, I can use parentheses in case of “|” - but I am not against the other “…” either…
Yes I am using dict-in-dict at the moment, I hope, it wouldnt change… I am not against small syntactical changes… my applications are Prolog-like, dicts are usually used on the tip of the iceberg…

yours: Imre

1 Like

Interestingly, that’s almost an exact parallel structure to the internal dict representation I’ve been thinking of! It makes a lot of sense to have the encapsulation chain inverted though, when you’re trying to implement a polymorphic type system.

Yeah, that was my thought as well. To be honest, I’d probably use parentheses there anyway, just because the precedence mismatch would feel weird.

I certainly wouldn’t expect anything to change about the existing semantics! That was definitely one of my primary goals when I was thinking about this. Also, I’m not sure if this is relevant to you, but in cases where you have a lot of similar dicts in a list or something, this method could actually have a much more compact internal representation than the current implementation.

After a couple of years of experience with dicts it is surely worthwhile to evaluate the result :slight_smile: Instead of jumping to conclusions I propose to get an idea of which problems people expected dicts to solve where dicts fails to satisfy the expectations. Partial dicts seem to be one of them. I have the impression various people want this to mean different things.

If I understand the proposal correctly @dmchurch wants to realize a chain of dicts where the actual value associated to a key is the value found in the first dict that contains the key. At first sight, I do not really like the _{k:v, … | Rest} for this. This looks like a sensible syntax for extracting some keys from a dict and produce another dict with the remaining keys. There are surely use-cases for that. Unlike processing lists this way though, for dicts this leads to N^2 algorithms and shouldn’t be invited by syntactic support IMO. Note that dict_pairs/3 is the dict alternative for =../2 and we also do not have a(A,B|Rest).

There seem to be multiple ways to use chains of dicts … I think this needs some time to reflect and establish good practice before looking into low-level support. Note that a chain of dicts can also be represented using some suitable operator and then e.g. D1::D2::D3, …

1 Like

My thought was to move more of the processing to compile-time, actually. At the moment, a dict is a functor with a common reserved-symbol name of arity 2K+1 (tag plus key-value pairs); I’d like to change that to a functor of arity K+1 (K values, plus chain term) whose name is, effectively, a compound term of arity K+1 (tag plus K keys). In much the same way as atoms themselves are globally allocated, any literal tagged dict seen anywhere in source can cause a registration of that dict’s tag/keys (in addition, of course, to registering the atoms involved themselves). At that point a partial head unification of a known dict can be transformed into a static functor of known arity at parse time. I’d originally worried about the compatibility concerns of internally using a compound term instead of an atom as a functor name, but then I learned that the existing dict functor’s name is a non-atom special symbol, so I don’t feel as bad :grinning_face_with_smiling_eyes:

(oh, and one other thing - I consider this model a starting point for discussion and revision, not a finished design set in stone. I take “request for comments” quite literally!)

That does lead me to the question as to what is the “dot on the horizon” you have in mind?

Hmm, good question. Thinking about it, I’ve got three main goals I’d like to see from dicts:

1) Dicts can be used as clause heads without explicitly naming all the keys.

Even just a change of syntax/implementation that amounted to dicts being represented as

'<dict>'(Tag, keys(K1,K2,K3...), values(V1,V2,V3...))
% separating keys() and values() allows the definitely-grounded keys() to be shared

would be welcome, since it would allow unification of dicts on tag alone, while still preserving the access-by-argument implementation of the current system.

2) Accessing a value in a well-known dict by key is an O(1) operation.

This speaks to the “dicts as functors with named arguments” idea. I’d like these two clauses to perform with the same efficiency:

area(circle(_X, _Y, R), A) :- A is pi * R * R.
area(circle{Circle},    A) :- A is pi * Circle.r * Circle.r.

Ideally, the second would compile into effectively-identical code to the first, given a priori (or at least, discoverable) knowledge of the structure of the circle dict.

3) Dict storage overhead approaches 0 as more dicts with the same structure are used

Where by “overhead” I’m referring to storage space beyond an equivalent compound-term implementation. To be honest, for simple tasks I’m likely just to use compound terms anyway, when the number of values is small, obvious, and sensibly ordered - like, there are really only two potential orderings for a circle(_, _, _) term, depending on whether you put radius first or last :joy:

The times I’d really want to lean on dicts are when I have (a) a large-ish number of keys, like say “my whole application state”, and I really don’t want to be typing underscores out to the edge of the screen when unifying (and where I might want that O(1) field access time, as above) and I don’t want to double the time it takes to memory copies when updating state, or when (b) I’ve got a huge dataset of items, all alike (perhaps loaded from a CSV or from JSON) and I don’t want to be using a ton of extra space to store keys with each and every dict.

2 Likes

Fair enough. Note that there are a lot of dynamic (unique) dicts around. Preserving key ordering might be another thing to consider. It is complicated stuff though. I think we do want to preserve the current unification semantics.

Definitely! Preserving the unification semantics for existing dicts is an overriding priority, I’d say, which is why anything that alters the semantics would need to be coupled with some amount of new syntax, something that would make it opt-in on the part of the developer.

Also, the need for dynamic/unique/extensible dicts is part of what was spurring my chain resolution idea. If, say, you wanted to take a dict with a well-known (and thus, storage/access-optimized) structure and add an arbitrarily-named field to it, the engine wouldn’t need to create a whole new functor to represent it, it could just add the new field as a key in a chained dict.

Incidentally, that’s why the stipulation that adjacent dicts in a chain with unifiable tags are logically “part of” the same dict structure - it’s partly to enable the unification semantics, yes, but it’s also to allow the system to transparently split a large dict between “hot” and “cold” sections, to avoid having to search as many keys on hot accesses, and to avoid having to update (and copy) a huge structure when adding (or removing, for that matter, you just need a reserved “whiteout” symbol like Linux uses for overlayfs) keys to an existing dict.

For that matter, you could even have a cutover point where adding dict fields starts out by appending them in transparent, single-field dicts to be searched sequentially if the bsearch fails, only doing the full key-merge operation after a certain threshold is reached. The amortized time savings on workloads where keys are frequently added to dicts would be huge, I’d imagine. It’s the same reason other language implementations might use a list/array implementation for small dictionaries and switch to a hash table for large ones. (See, for example: HybridDictionary Class (System.Collections.Specialized) | Microsoft Docs)

1 Like

Interestingly, i am using something like this by way of assoc – where global relational states are tagged away through keys.

This however has the drawback that not only do i have to pass along the assoc before and after state to every goal – its also often a two step access – to get to application state – first the global assoc and then a second assoc that relationally stores state.

Is this the use case you have in mind as well?

Dan

I was not immediately familiar with the term “whiteout” as used so looked it up and found this blog.

How containers work: overlayfs

This is also a case where I find using Google image search (search) works better than Google text search (search)


Of interest is the MIT open courseware on Advanced Data Structures by Erik Demaine

Related keywords are
Persistent data structure
Retroactive data structure

1 Like

Hi,

Thank you.

I need a retractable structure that supports backtracking.

I guess this begs the question what approach to a backtrackable “state” data structure is better / faster – use of a relational data structures passed along as bound variables, such as assoc or a structure that gets asserted and retracted during backtracking.

I think DCG grammar with their implicit threading of states are one example of a syntax that manage state across predicates without requiring to explicitly mention them in the code.

I guess, in so far, keep backtrackable (relational) state is something that happens frequently, perhaps a generalization of the DCG could be envisioned which also optimizes fast access and update.

I was at a time think to use global variables for backtrackable states – but, global variables are currently thread specific, so can’t be shared across threads – which is limiting.

Dan