So, I’ve been thinking about dicts a bunch, and I’ve got an idea that addresses some of the headaches I (and, I gather, others) have been having with them, especially around unification and clause heads and whatnot. What if we extend the dict syntax to allow for partial/incomplete dicts, like we can for lists with [1,2|T]
? Not getting into implementation yet, just syntax and semantics.
As far as syntax goes, I was thinking of two options:
- The more Prolog-esque
tag{k1: V1, k2: V2|Rest}
(whereRest
could of course be_
) - which could cause some backwards incompatibility, since that will currently be parsed astag{k1: V1, k2: (V2|Rest)}
despite the operator precedence mismatch between,/2
and|/2
. - The other option would be a more ECMAScript-flavored
tag{k1: V1, k2: V2, ...}
for the anonymous-unification version, ortag{k1: V1, k2: V2, ...Rest}
for the tail-unification mode. That also has a bit of weird operator-precedence syntactics, but it uses a syntax that currently isn’t implemented by SWI-Prolog, so it would have no backwards-compatibility implications.
Personally, I’m inclined to lean towards the first syntax, both because it feels better for Prolog and because if that syntax is used, then dicts could (conceivably) be implemented as a library, the way library(yall)
is. I’ll be using that syntax for the rest of this discussion.
As for semantics, it’s basically the same as for lists. A “rest” term in a well-formed dict representation will unify to a (possibly empty) dict, none of whose keys are in the primary dict. So, we could declare the following:
?- FullDict = full{a:1, b:2, c:3, d:4}), PartialDict = _{c:3, b:2|_}.
FullDict = full{a:1, b:2, c:3, d:4},
PartialDict = _5296{b:2, c:3|_5292}.
The unification rules for full dicts with no “rest” component would be the same as they always have been; they will unify with another dict with the same tag, the same keys, and the same values for each key. This ensures backwards compatibility.
For partial dicts, we need to look at the “conceptual implementation” a little bit. In the same way that a list is defined as either the empty list []
or a term [H|T]
where T is a list, a well-formed dict is defined as either:
- The empty dict, represented
{}
, whose tag is the empty dict; or: - A term with three components:
- A tag, which must be an atom;
- A set of key-value pairs, where every key is an atom or smallint; and
- A “rest” parameter, which should be a dict.
Assuming the “rest” parameter is a dict, it is included in the lookup chain for any keys in this dict. If its tag unifies with the primary dict’s tag, then it is also considered logically part of the primary dict, and the two sets of keyvalue pairs may end up being transparently merged by Prolog. To maintain the division between primary dict and rest dict such that they’re both recoverable from Prolog, they must have distinct, instantiated tags.
To access this “underlying” representation, I’ll use a new system predicate called '$dict_composition'(Dict, Tag, Pairs, Rest)
, where Pairs
unifies to a proper list of Key:Value
pairs of that dict. That allows me to define the public-facing dict_rest/3
as follows:
% split Dict into Proper, a closed dict with all the existing keys, and Rest
dict_rest(Dict, Proper, Rest), Dict == {} =>
Proper = {}, Rest = {}.
dict_rest(Dict, Proper, Rest), nonvar(Dict) =>
must_be(Dict, dict),
'$dict_composition'(Dict, Tag, Pairs, RTerm),
( is_dict(RTerm, Tag) % can we merge RTerm?
-> dict_rest(RTerm, RProper, Rest),
'$dict_composition'(RProper, _, RPairs, {}),
'$merge_dict_pairs'(Pairs, RPairs, MPairs),
'$dict_composition'(Proper, Tag, MPairs, {})
; Rest = RTerm,
'$dict_composition'(Proper, Tag, Pairs, {})
).
dict_rest(Dict, Proper, Rest), nonvar(Proper) =>
must_be(Proper, dict),
dict_rest(Proper, PProper, {}),
'$dict_composition'(PProper, Tag, PPairs, {}),
( is_dict(Rest, Tag)
-> dict_rest(Rest, RProper, RRest),
'$merge_dict_pairs'(PPairs, RPairs, MPairs),
'$dict_composition'(Dict, Tag, MPairs, RRest)
; '$dict_composition'(Dict, Tag, PPairs, Rest)
).
dict_rest(Dict, Proper, Rest) =>
instantiation_error((Dict, Proper)). % must provide either Dict or Proper
% '$merge_dict_pairs'(+ProperPairs, +RestPairs, -MergedPairs)
'$merge_dict_pairs'([], [], []) :- !.
'$merge_dict_pairs'(P, [], P) :- !.
'$merge_dict_pairs'([], R, R) :- !.
'$merge_dict_pairs'([K:V|PT], [K:_|RT], [K:V|MT]) :-
!,
'$merge_dict_pairs'(PT, RT, MT).
'$merge_dict_pairs'([K:V|PT], R, [K:V|MT]) :-
R = [RK:_|_],
K @< RK,
!,
'$merge_dict_pairs'(PT, R, MT).
'$merge_dict_pairs'(P, [K:V|RT], [K:V|MT]) :-
P = [PK:_|_],
K @< PK,
!,
'$merge_dict_pairs'(P, RT, MT).
dict_rest/3
can operate in +,-,-
mode, to decompose a dict, or -,+,+
mode, to compose a dict (like, to add a Rest element to a currently-closed dict, or to append one dict to another). Note that, by the definition of '$merge_dict_pairs'/3
, any keys in an instantiated Rest dict (whose tag unifies with Proper’s) that match a key in the Proper dict are silently discarded. This means that a dict with an unbound Rest component cannot have any of its existing keyvalues modified by instantiation of Rest.
This gives us the concept of the chain view of a dict, which is either a single dict with an unbound tag and a rest component which is either {}
or unbound, or a chain of one or more dicts linked by their rest components, each with an instantiated tag that differs from its parent, the last of which has a rest component of {}
or unbound. This is as opposed to the flat view of a dict, which has the tag of the top-level dict, all visible keyvalues, and a {}
/unbound rest component. dict_rest/3
operates on the chain view of a dict.
In addition to dict_rest/3
, we can operate on the chain view using a couple new syntaxes, Tag{Head|Rest}
or Tag{Head}
(identical to Tag{Head|{}}
in the same way that [H]
is identical to [H|[]]
), which operate on unification with a dict. This syntax can also be used to create a dict out of a keyvalue pair term, using the ensure_dict/2
predicate:
% Tag{Head|Rest} parses to the term '{|}'(Tag, Head, Rest)
% Tag{Head} parses to the term '{|}'(Tag, Head, {})
ExistingDict = Tag{Head|Rest} :-
is_dict(ExistingDict, Tag),
ensure_dict(Head, HDict),
dict_rest(ExistingDict, HDict, Rest).
ensure_dict(Data, Dict), Data = _,_ =>
comma_list(Data, DList),
ensure_dict(DList, Dict).
ensure_dict(Data, Dict), Data = [_|_] =>
dict_create(Dict, _, Data).
ensure_dict([], Dict) =>
Dict = _{}.
ensure_dict(Data, Dict) =>
must_be(Data, dict),
Dict = Data.
Existing dict predicates work on the flat view of a dict in a parallel manner to list predicates, and things like dict_pairs/3
will map partial dicts to partial lists, and vice versa:
dict_pairs(Dict, Tag, Pairs) :-
nonvar(Dict), !,
is_dict(Dict, Tag),
dict_pairs_([], Dict, Pairs).
dict_pairs(Dict, Tag, Pairs) :-
nonvar(Pairs), !,
dict_pairs_rest(KVs, KVRest, Pairs, _),
( var(KVRest)
-> KVRest = [], % close the list
'$dict_composition'(Dict, Tag, KVs, _)
; '$dict_composition'(Dict, Tag, KVs, {})
).
dict_pairs_(KVs, Term, Pairs), var(Term) =>
dict_pairs_rest(KVs, _, Pairs, _).
dict_pairs_(KVs, {}, Pairs) =>
dict_pairs_rest(KVs, [], Pairs, []).
dict_pairs_(KVs0, Term, Pairs) =>
must_be(Term, dict),
'$dict_composition'(Term, _, TKVs, TR),
'$merge_dict_pairs'(KVs0, TKVs, KVs1),
dict_pairs_(KVs1, TR, Pairs).
% dict_pairs_rest/4: convert between dict Key:Value notation and pair Key-Value
% notation, preserving unbound variables at the end of both
dict_pairs_rest(DictKVs, DRest, Pairs, PRest) :-
var(DictKVs), var(Pairs),
!,
DRest = DictKVs,
PRest = Pairs.
dict_pairs_rest([K:V|DT], DR, [K-V|PT], PR) :-
dict_pairs_rest(DT, DR, PT, PR).
dict_pairs_rest([], [], [], []).
Worth noting, put_dict/3 can at this point be implemented quite simply as:
put_dict(New, DictIn, DictOut) :-
DictOut = _{New|DictIn}.
And finally, unification. Unification operates on the chain view of each dict, meaning that consecutive unifiable tags are merged, logically if not physically, and it works on the same principle, in reverse; since you can compose or extend a dict by adding another dict with the same tag to the head of the chain, it follows that you can decompose any dict into a two-dict chain that share the same tag. So, the logic for unifying two dicts is as follows:
- Ensure each dict is in proper chain view
- Unify the tags of the two dicts
- For each dict:
- If there are remaining keys, use them to form a new dict with the same tag and unify it with its counterpart’s “rest” component (which will fail unless the rest component is unbound)
- Otherwise, unify this dict’s “rest” with its counterpart’s “rest”
Represented in Prolog form:
% dict_composition/4: a friendlier version of '$dict_composition', which
% respects tag-merging rules
dict_composition(Dict, Tag, KVPairs, Rest) :-
dict_rest(Dict, Proper, Rest),
'$dict_composition'(Proper, Tag, KVPairs, _).
% '$unify_dicts', called when unifying two instantiated dict terms
'$unify_dicts'(Dict1, Dict2) :-
dict_composition(Dict1, Tag, KV1, Rest1),
dict_composition(Dict2, Tag, KV2, Rest2),
'$unify_dict_pairs'(KV1, R1, KV2, R2),
( R1 == [], R2 == []
-> Rest1 = Rest2
; R1 == []
-> Rest1 = Tag{R2|Rest2}
; R2 == []
-> Tag{R1|Rest1} = Rest2
).
'$unify_dict_pairs'([K:V|KV1], R1, [K:V|KV2], R2) :-
'$unify_dict_pairs'(KV1, R1, KV2, R2).
'$unify_dict_pairs'([K1:V1|KV1], R1, KV2, [K1:V1|R2]) :-
KV2 = [K2:_|_],
K1 @< K2,
'$unify_dict_pairs'(KV1, R1, KV2, R2).
'$unify_dict_pairs'(KV1, [K2:V2|R2], [K2:V2|KV2], R2) :-
KV1 = [K1:_|_],
K2 @< K1,
'$unify_dict_pairs'(KV1, R1, KV2, R2).
'$unify_dict_pairs'([], KV2, KV2, []).
'$unify_dict_pairs'(KV1, [], [], KV1).
And @jan, before you start panicking about internals, I do of course have ideas about internal representation and implementation, but I figured I’d start a discussion about syntax and semantics before getting into that, not to mention this post is long enough already