Locality of Reference: linked lists vs. dynamic (resizable) vectors


Thinking about the advantages of linked lists (e.g. cons / Prolog lists) vs. array representations (e.g. args), and the related copying overhead of arrays when they need to be resized, I noticed the following article, which offers some benchmarks.

It seems that locality of reference is a key reason for the performance difference – i.e. arrays make optimal use of the memory cache, whereas pointer chasing in linked lists often misses the cache (a point also made in a referenced lecture by Bjarne Stroustrup [1]).

I also noticed early papers on Hashed Array Trees [2] and unrolled linked lists [3]. I find the Hashed Array Tree in particular interesting, but unrolled lists also seem to have much potential.
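For concreteness, here is a minimal Python sketch of the unrolled-list idea from [3] (the class names and the capacity of 4 are my own choices, not from the paper): each node holds a small contiguous array of elements, so a traversal follows one pointer per block instead of one per element, and neighbouring elements share cache lines.

```python
# Sketch of an unrolled linked list: each node stores up to `capacity`
# elements contiguously; only full nodes spawn a successor.

class UnrolledNode:
    def __init__(self, capacity):
        self.items = []          # up to `capacity` elements, stored contiguously
        self.capacity = capacity
        self.next = None

class UnrolledList:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.head = None
        self.tail = None

    def append(self, value):
        # start a new node only when the tail block is full (or list is empty)
        if self.tail is None or len(self.tail.items) == self.capacity:
            node = UnrolledNode(self.capacity)
            if self.tail is None:
                self.head = node
            else:
                self.tail.next = node
            self.tail = node
        self.tail.items.append(value)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield from node.items   # one pointer hop per block of elements
            node = node.next

ul = UnrolledList(capacity=4)
for i in range(10):
    ul.append(i)
print(list(ul))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] across only 3 nodes
```

Ten elements here occupy three nodes, so a full traversal chases three pointers instead of ten.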

I wonder how such data-structure considerations compare with how Prolog stores and accesses list and graph structures, stored either as dynamic facts or as terms, during predicate processing.


[1] Day 1 Keynote - Bjarne Stroustrup: C++11 Style | GoingNative 2012 | Channel 9

[2] https://www.drdobbs.com/database/algorithm-alley/184409965?pgno=5
[3] Unrolled linked list - Wikipedia

CDR-coding is a third possibility.

I find it interesting that the article starts with a warning sign (as in, a picture), followed by a lengthy explanation of why the "number crunching" part of the title is important. The short five-paragraph exposition ends with (and here I have kept the author's text formatting):

I hope with this Mea Culpa you can see beyond any initial confusion which is all my fault and see to the core issue: the cons of linked-list for a set of number crunching problems .

I think it's indeed crucial to keep the access patterns in mind – which I guess is the meaning of "number crunching". In addition, the author considers the expansion pattern of memory usage, the size of each (passive) data structure, and the size of the memory cache.


Not sure I understand what you mean here. Again, from the Intro of the article that you linked:

Number crunching is the term used here for describing the crunching, sorting, inserting, removing and simply juggling of raw numbers or POD data.

One subtle difference between “raw numbers” and plain old data structures, on the one hand, and almost anything else, on the other, is that all numbers are of the same size (in bits). You know in advance how much space you need to put a bunch of them next to each other.
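This fixed-size property is easy to see even in Python: the standard `array` module packs same-sized machine numbers back to back, while a plain `list` stores pointers to individually boxed objects.

```python
import array
import sys

# Raw 8-byte C doubles, stored contiguously: the total space is known
# in advance as count * itemsize.
nums = array.array('d', [1.0, 2.0, 3.0])

# A plain list holds pointers to boxed float objects; getsizeof() below
# counts only the pointer array, not the float objects themselves.
boxed = [1.0, 2.0, 3.0]

print(nums.itemsize)                            # 8 bytes per element
print(len(nums) * nums.itemsize)                # 24 bytes of payload, contiguous
print(sys.getsizeof(boxed) > len(nums) * nums.itemsize)
```

With uniform element sizes, the address of element *i* is just `base + i * itemsize` – no pointer dereference at all, which is precisely what makes the contiguous layout cache-friendly.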

An example of a programming environment that does mainly “number crunching” is R. It comes with its own programming language (also known as R) but here is a snippet I found on their FAQ page:

Note that you need a FORTRAN compiler or perhaps f2c in addition to a C compiler to build R.

And, at least on my computer, LAPACK is a hard dependency.

I think the question here is how the memory space allocated for the array is dynamically extended when it fills up. In the Tau Prolog case, it's JavaScript handling this internally – presumably by copying the existing array into a larger memory area.
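A minimal sketch of that grow-by-copying strategy (geometric doubling – the usual approach in CPython, JavaScript engines, C++ `std::vector`, etc.; the exact growth factor varies by implementation and the one here is illustrative):

```python
# Illustrative dynamic array: when the backing store fills up, allocate
# a store twice as large and copy the old elements over. Any single
# append may trigger an O(n) copy, but each element is copied O(1)
# times on average, so append is amortized O(1).

class DynArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.store = [None] * self.capacity
        self.copies = 0                    # count element copies, for illustration

    def append(self, value):
        if self.size == self.capacity:
            self.capacity *= 2             # geometric growth
            new_store = [None] * self.capacity
            for i in range(self.size):     # the copying overhead discussed above
                new_store[i] = self.store[i]
                self.copies += 1
            self.store = new_store
        self.store[self.size] = value
        self.size += 1

a = DynArray()
for i in range(1000):
    a.append(i)
print(a.capacity)   # 1024
print(a.copies)     # 1023 element copies in total for 1000 appends
```

Note the total copying work (1 + 2 + 4 + … + 512 = 1023 copies) stays proportional to the number of appends; growing by a fixed increment instead of a factor would make the total quadratic.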

The Hashed Array Tree is a solution that aims to reduce this copying cost significantly.
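A rough sketch of the Hashed Array Tree idea as I understand it from [2]: elements live in fixed-size leaf blocks reached through one top-level index array, and growth just allocates another leaf block rather than copying every element. (This is simplified – a full HAT also resizes the top array and the block size as it grows; those details are omitted here.)

```python
# Simplified Hashed Array Tree: a top-level index of fixed-size leaf
# blocks. Appending never copies existing elements; a power-of-two
# block size makes indexing a shift and a mask.

BLOCK_BITS = 2
BLOCK_SIZE = 1 << BLOCK_BITS          # 4 elements per leaf block

class HAT:
    def __init__(self):
        self.top = []                  # top-level index array of leaf blocks
        self.size = 0

    def append(self, value):
        if self.size % BLOCK_SIZE == 0:
            self.top.append([None] * BLOCK_SIZE)   # new leaf; no element copies
        self.top[self.size >> BLOCK_BITS][self.size & (BLOCK_SIZE - 1)] = value
        self.size += 1

    def __getitem__(self, i):
        # element i lives in block i // BLOCK_SIZE at offset i % BLOCK_SIZE
        return self.top[i >> BLOCK_BITS][i & (BLOCK_SIZE - 1)]

h = HAT()
for i in range(10):
    h.append(i)
print([h[i] for i in range(10)])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(h.top))                  # 3 leaf blocks allocated
```

Compared with a doubling vector, only the small top-level array ever needs copying, which bounds both the worst-case append cost and the transient memory overhead during growth.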