Locality of Reference linked lists v.s dynamic (resizeable) vectors

grossdan · May 24, 2021, 3:28am

Hello,

Thinking about the advantages of linked lists (e.g. cons / prolog lists) vs. array representations (e.g. args), and related copying overhead of arrays when they need to be resized, I noticed the following article that offers some benchmarks.

It seems that locality of reference – i.e. the optimal use of memory cache vs. pointer chasing in linked-lists which often misses caches is a key reason of reduced performance – also, per a referenced lecture by Bjarne Stroustrup [1])

I also noticed early papers on Hashed Arrays [2] and unrolled lists [3] – i find the Hashed Array in particular interesting, but unrolled lists also seem to have much potential.

I wonder how such data structures considerations compare to how Prolog stores and accesses list and graph structures stored as dynamic facts or as terms during predicate processing.

Dan

[1] Day 1 Keynote - Bjarne Stroustrup: C++11 Style | GoingNative 2012 | Channel 9

[2] https://www.drdobbs.com/database/algorithm-alley/184409965?pgno=5
[3] Unrolled linked list - Wikipedia

peter.ludemann · May 24, 2021, 5:45am

CDR-coding is a 3rd possibility.

Boris · May 24, 2021, 6:05am

I find it interesting that the article starts with a warning sign (as in, a picture), followed by a lengthy explanation why the Number crunching part of the title is important. The short 5-paragraph exposition ends with (and here I have kept the author’s text formatting):

I hope with this Mea Culpa you can see beyond any initial confusion which is all my fault and see to the core issue: the cons of linked-list for a set of number crunching problems .

grossdan · May 24, 2021, 7:04am

I think its indeed crucial to keep the access patterns in mind – which i guess is the meaning of “number crunching” … in addition he considers the expansion pattern of memory usage and size of each (passive) data structure and the size of memory cache …

Dan

Boris · May 24, 2021, 7:56am

Not sure I understand what you mean here. Again, from the Intro of the article that you linked:

Number crunching is the term used here for describing the crunching, sorting, inserting, removing and simply juggling of raw numbers or POD data.

One subtle difference between “raw numbers” and plain old data structures, on the one hand, and almost anything else, on the other, is that all numbers are of the same size (in bits). You know in advance how much space you need to put a bunch of them next to each other.

An example of a programming environment that does mainly “number crunching” is R. It comes with its own programming language (also known as R) but here is a snippet I found on their FAQ page:

Note that you need a FORTRAN compiler or perhaps f2c in addition to a C compiler to build R.

And, at least on my computer, LAPACK is a hard dependency.

grossdan · May 24, 2021, 10:42am

I think the question here is how memory space allocated for the array is dynamically extended, when it fills up – in the Tau prolog case its JavaScript handing this internally – perhaps through copying the existing array into a larger array memory space.

Hash Array is a solution that aims to reduce this cost significantly …

Dan

Topic		Replies	Views
Linking from clause store to terms on the stack Discussion	48	1533	April 29, 2021
"copy constructors" -- when are terms copied and when linked Help!	15	1064	January 19, 2020
Update arg in the dynamic fact base Request For Comments	15	2563	April 6, 2021
Performance cost of dictionaries? General	14	297	September 18, 2024
Clause lookup with Reference vs term lookup w/ first argument Help!	5	347	November 25, 2019

Locality of Reference linked lists v.s dynamic (resizeable) vectors

Related topics