Preallocating storage for assoc

Hello,

I’d like to try using an assoc instead of the fact store to store a graph structure. I am wondering: is it possible to preallocate memory for an assoc, so that allocation isn’t performed piece by piece during the construction of a larger graph?

Thank you,

Dan

It’s not clear that you’ll gain anything. Here’s a fairly succinct description of how most memory allocators work:

A typical allocator implementation will first call the operating system to get a huge block of memory; then, to satisfy your request, it will give you a piece of that memory. This is known as suballocation. If it runs out of memory, it will get more from the operating system.

The allocator must keep track of both the big blocks it got from the operating system and the small blocks it handed out to its clients. It must also accept blocks back from clients.

A typical suballocation algorithm keeps a list of returned blocks of each size, called a freelist, and always tries to satisfy a request from the freelist, only going to the main block if the freelist is empty. This technique is extremely fast and quite efficient for average programs, though it has woeful fragmentation properties if request sizes are all over the place (which is unusual for most programs).

Modern allocators like GNU’s malloc implementation are complex, but they embody many decades of experience and are good enough that it is very rare to need to write your own specialised suballocator.

Hi Peter,

Thank you for your comments; this is indeed very interesting.

Dan

The assoc data lives on the Prolog (global) stack. That stack is managed as one big chunk using malloc()/realloc(), which typically means it is a memory-mapped region. The biggest price you pay is Prolog garbage collection.

Malloc() doesn’t always do a good job. Notably, SWISH seems to suffer from poor reuse of memory; see the last chart at https://swish.swi-prolog.org/example/stats.swinb. I’m still looking for tools that can tell me what happens …


Hi Peter,

I am reading your response again.

In some industries preallocation is apparently not only about efficiency but also about safety, and perhaps security – i.e. preallocated memory is a required style.

Can one force preallocation in SWI-Prolog, to use it in such an environment?

Dan

I just listened to a YouTube talk on optimization techniques for high-frequency trading [1].

Apparently, new and delete can have costs such as system calls and even paging.

It is better to preallocate memory and keep it in a pool of sorts.

Instead of assoc lists, my zdd library uses an extendible vector of buckets built with the SWI built-ins functor/3 and term_hash/2. Recently I added an experimental feature that reclaims the used hash table by calling garbage_collect/0. The effect was far greater than I expected, though I cannot explain well why it works so well: for a certain graph path counting problem, the macOS Activity Monitor showed almost 200GB of memory, but the garbage-collecting version uses less than 20GB!

As a Prologer I like using assoc lists, but I have often been disappointed by their slowness on the big problems seen in ZDD programming.

re: NVM, and mini PCI

I got myself an FPGA card with a mini PCI interface – and got curious whether something like this could somehow be connected to Prolog.

I guess the way to go – and it seems pretty complex – is via foreign predicates written in C/C++ against the mini PCI interface …

Ideally, I want to have some logic implemented in the FPGA hardware which is then presented as a callable (deterministic?) predicate to Prolog.
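A deterministic foreign predicate along those lines could look roughly like this. The SWI-Prolog C API calls (PL_get_integer, PL_unify_integer, PL_register_foreign) are real; fpga_query/2 and fpga_read_reg() are hypothetical placeholders for whatever the card’s driver or library actually exposes:

```c
#include <SWI-Prolog.h>

static int fpga_read_reg(int reg)     /* placeholder for real device access */
{ return reg * 2;                     /* pretend the hardware answered */
}

/* fpga_query(+Register, -Value): deterministic, fails on bad input */
static foreign_t
pl_fpga_query(term_t reg_t, term_t val_t)
{ int reg;

  if ( !PL_get_integer(reg_t, &reg) )
    return FALSE;                     /* not an integer: fail */

  return PL_unify_integer(val_t, fpga_read_reg(reg));
}

install_t
install_fpga(void)                    /* run when the library is loaded */
{ PL_register_foreign("fpga_query", 2, pl_fpga_query, 0);
}
```

Such a file would be compiled into a shared object with swipl-ld and loaded with use_foreign_library/1, after which a goal like ?- fpga_query(Reg, Value). calls straight into the C code.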

It’s all new to me, so I am just grasping in the dark …

Dan

I think mini PCI is memory mapped …

There is some “controller magic” happening – to make different PCI memory area types appear to the CPU as all the same type – but access is, I think, via the data bus.

A page load is possibly done by DMA via the P-Box and M-Box of the chipset, without the involvement of the CPU. The bottleneck of such an approach could be serving multiple cores: what if all of them have a page fault at once and need a page load? I am not sure whether the P-Boxes and M-Boxes can work in true parallel, so maybe the future isn’t as bright as I thought.

But who knows what future chipset and memory designs will deliver. The cores already compete for normal memory access and can only sustain performance through the faster memory caches, so a normal memory access is already a kind of page fault. Maybe the trend is nevertheless towards no difference between on-board memory and peripheral memory.

See also the nice picture on page 9 (1.1 Introduction):

Xeon Processor Series Uncore Programming Guide
Intel Xeon Processor 7500 Series Block Diagram
https://www.manualslib.com/products/Intel-Xeon-7500-Series-10152130.html