I’m using dicts to read large external JSON datasets (~100 MB), which may have up to 50k top-level JSON objects. They read fine using `json_read_dict`, but it appears that all the tags are unbound, which I assume means there’s a var for each tag? As I don’t have any use for the tags, would it save memory if I used `default_tag('')` when reading the JSON in? Thanks!
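In case it matters, what I’m doing is roughly this (file and predicate names are just placeholders):

```prolog
:- use_module(library(http/json)).

%% read_dataset(+File, -Dicts) is det.
%  Read a JSON file (an array of objects) into a list of dicts.
%  default_tag('') is the option I'm wondering about.
read_dataset(File, Dicts) :-
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),
        json_read_dict(In, Dicts, [default_tag('')]),
        close(In)).
```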
No. A variable, an atom and a small number all use the same amount of memory. A dict with N key-value pairs takes 2+N*2 “words” (8 bytes each), regardless of the tag.
What can make a difference is the `value_string_as` option. When using the default (`string`), each JSON value that is a string is allocated on the stack as a Prolog string. This takes 16+(len+9)//8 bytes for text that fits in ISO-Latin-1 (all code points < 256). Alternatively, use `atom`. Atoms fit directly in the dict, so they do not make the dict bigger, and duplicate strings use the same atom. But (there is always a but) atoms themselves are bigger than strings, and they are shared resources, which makes them more expensive to create and reclaim. So if there are few duplicate string values, the memory usage goes up while the stack usage still goes down. If the stack is not big enough, though, you can enlarge it to any size your computer can deal with.
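As a sketch, with made-up file and predicate names, that would look like the following (the `stack_limit` value is just an example):

```prolog
:- use_module(library(http/json)).

% Allow the stacks to grow to ~8 GB (value in bytes; pick what your machine can handle).
:- set_prolog_flag(stack_limit, 8_000_000_000).

read_dataset(File, Dicts) :-
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),
        json_read_dict(In, Dicts,
                       [ value_string_as(atom)   % JSON strings become (shared) atoms
                       ]),
        close(In)).
```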
Thanks for the quick answer. I’m already using `value_string_as(atom)` because I know there are lots of duplicated URI-like strings in the data, and I’d guessed that atoms were probably shared. The JSON is deeply nested, but apart from one initial transform on the top level of the dict “tree” it’s going to be immutable, as I’ll be matching predicates against it to look for configuration problems in the system it represents.
I’m still not clear what the use case for tags is; the docs make an oblique mention of modules, but other than that I haven’t seen anything that couldn’t be done without them.
If you use the dict as a (temporary, backtrackable) key-value store, you don’t need the tag.
If you want to define functions on a dict, you need the tag. See “User defined functions on dicts”.
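For the record, the example from that manual section is roughly the following; note that the functions live in a module whose name matches the tag:

```prolog
:- module(point, []).

% point{x:_, y:_}.multiply(F) evaluates to a new point scaled by F.
M.multiply(F) := point{x:X, y:Y} :-
    X is M.x*F,
    Y is M.y*F.
```

Used as, e.g., `X = point{x:1, y:2}.multiply(2)`, which binds `X` to `point{x:2, y:4}`.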
One reason to leave the dict tag a variable might be to prevent you from accidentally defining a function on that tag elsewhere in your code.
Ahah, thanks. That might indeed be useful, so good to know.
I tend to use `#` for “anonymous” dicts. The disadvantage of a variable tag is that the dict is not a ground term even if all values are ground.
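A quick toplevel illustration of that difference:

```prolog
?- D = _{a:1}, ground(D).
false.

?- D = #{a:1}, ground(D).
D = #{a:1}.
```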
If you want to do a lot of reasoning over this dict data, I’d consider first creating a set of (dynamic) predicates from it that reflect the parts you are interested in. Most likely that makes the subsequent reasoning a lot easier to read as well as a lot faster. As JSON data tends to be very verbose, it may also reduce the size of the data considerably. If you represent everything as predicates, though, you will likely need more space.
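A minimal sketch of that idea, with a made-up fact schema and made-up attribute names:

```prolog
:- dynamic resource_attr/3.        % resource_attr(Key, Attr, Value) -- invented schema

%% load_resources(+Dicts) is det.
%  Turn each resource dict into resource_attr/3 facts for the
%  attributes of interest (the attribute list here is made up).
load_resources(Dicts) :-
    forall(member(Dict, Dicts), load_resource(Dict)).

load_resource(Dict) :-
    get_dict(id, Dict, Key),
    forall(( member(Attr, [type, status, owner]),
             get_dict(Attr, Dict, Value)
           ),
           assertz(resource_attr(Key, Attr, Value))).
```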
OK, `#` sounds like what I want, thanks.
The data represents a network of interconnected resources that may have configuration issues. Each resource has a unique key and a set of attributes. Relationships between resources are expressed using the unique keys. I’ll need to express two sorts of correctness checks:
- Checks within a resource, e.g. does a given attribute hold one of a set of valid values.
- Checks between resources, e.g. if resource A has an attribute B with value C, all the X resources with a reference to A should also have the same value C in their Y attribute.
I have thought about extracting just the “interesting” subsets of each resource’s attributes from the JSON dicts and keeping a reference back to the full resource record for displaying results etc., so it sounds like I’m not too far off the mark, thanks!
I’ve also thought about providing a DSL for expressing the various rules, but first I want to make sure that it works with “hand crafted” rules.
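For instance, a first hand-crafted version of the second kind of check might look something like this, assuming I extract facts such as resource_attr/3 and resource_ref/2 (all names invented, including the region attribute):

```prolog
%% region_mismatch(-RefKey, -Key, -Expected, -Actual) is nondet.
%  RefKey references Key but carries a different region value.
region_mismatch(RefKey, Key, Expected, Actual) :-
    resource_attr(Key, region, Expected),
    resource_ref(RefKey, Key),                 % RefKey refers to Key
    resource_attr(RefKey, region, Actual),
    Actual \== Expected.
```

Enumerating `region_mismatch/4` would then list every offending pair.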
The last Prolog I wrote was in the 1980s, so I’m having to reboot a lot of very dormant neurons!