Disabling tcmalloc: warning for data science and big data apps

Interesting. I’ve added a Prolog flag malloc that is set to tcmalloc or ptmalloc if either is detected (and left unset if we do not know). I also added trim_heap/0 (named after trim_stacks/0) to release memory to the OS. Like trim_stacks/0, trim_heap/0 is called by the interactive toplevel. I’m not sure about calling it automatically, e.g., after atom or clause garbage collection. What is the performance impact for large heaps in multi-threaded applications? Is it even desirable to give memory back to the OS in all cases?
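A minimal sketch of how user code might use the new flag and predicate, assuming a SWI-Prolog version that already has both. The name report_and_trim_heap/0 is made up for this example. Because the malloc flag is left unset when the allocator is unknown, current_prolog_flag/2 simply fails in that case, and the sketch then skips trimming.

```prolog
% Sketch: report which allocator SWI-Prolog detected and, only if one
% was detected, ask it to return unused heap memory to the OS.
% report_and_trim_heap/0 is a hypothetical helper name.
report_and_trim_heap :-
    (   current_prolog_flag(malloc, Allocator)
    ->  format("Allocator: ~w; calling trim_heap/0~n", [Allocator]),
        trim_heap
    ;   format("Allocator not detected; not trimming~n")
    ).
```

Run it as ?- report_and_trim_heap. The output depends on how your swipl binary was built and linked.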

Detection for the flag and the additional predicates is now based on finding the allocator functions at runtime, as well as checking the amount of memory allocated through them. If this amount is really low, we assume that allocator is not the active one.

I hope this settles the issue.

If you change the PPA version of swipl to use tcmalloc, wouldn’t that require adding libtcmalloc-minimal4 as a prerequisite package (on my system, it shows as libtcmalloc-minimal4:amd64), and could this potentially break unrelated things?

Yes, it sets up one more dependency. Well, there are already so many dependencies that one more or less shouldn’t bother us. The Linux package dependency handling is pretty good :slight_smile:

It will only break things that do not work together with tcmalloc. If I recall correctly, RocksDB is one of these. It uses jemalloc and the two don’t seem to like each other :slight_smile: Ideally we would have a universal allocator that is better than any other in every respect … I fear we won’t see that soon (if ever). For 99% of applications this is all pretty irrelevant, but for some it can have a big impact on memory usage as well as performance. Allocation speed is not much of a problem, as Prolog manages its most volatile data on the stacks. False sharing, where threads modify distinct objects that the allocator placed on the same cache line, may slow down multi-threaded applications considerably.

Great, thanks for adding this.

There are too many variables to consider: in some scenarios it is better to release memory automatically, in others it is better to do it manually.

Solution: I think the best option is to provide a boolean Prolog flag, trim_memory_on_gc, that users can set according to their needs. This would solve the problem.

EDIT: Just to give some context, in the embedded world it is extremely important to release memory back to the OS. This applies to Raspberry Pis and other single-board computers (SBCs) running one or several SWI-Prolog server-like apps.
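A rough sketch of what the proposed flag could look like. trim_memory_on_gc is only the name suggested in this post, not an existing SWI-Prolog flag, and maybe_trim_heap/0 is a hypothetical helper that the GC code would have to call after collecting atoms or clauses. create_prolog_flag/3 with keep(true) defines the flag as a boolean defaulting to false without overriding a value the user already set.

```prolog
% Hypothetical flag from the proposal above; default: do not trim.
% keep(true) leaves an existing user setting untouched.
:- create_prolog_flag(trim_memory_on_gc, false,
                      [ type(boolean), keep(true) ]).

% Hypothetical helper that garbage collection could call:
% release heap memory to the OS only if the user asked for it.
maybe_trim_heap :-
    (   current_prolog_flag(trim_memory_on_gc, true)
    ->  trim_heap
    ;   true
    ).
```

An embedded deployment would then opt in with :- set_prolog_flag(trim_memory_on_gc, true). while desktop and server setups keep the default.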

Possibly. A sensible alternative could be to make gc_loop/0 from boot/gc.pl hookable. That would allow other libraries to implement more advanced strategies. I did a similar trick in the HTTP library to decide whether to increase the number of HTTP worker threads. Such an approach would get even better if we wake up the gc thread on more memory-relevant events: think of threads that ask for a lot of stack space, high demands on table space, etc.

There are a lot of opportunities for dynamic, smart planning of resource usage. Most of the required information is already available from Prolog and most of the possible actions can already be performed. Maybe allowing process/1 in gc.pl to be hooked is good enough for now?

I think this is a great idea :grinning:, much better than a Prolog flag. For simple cases we can just call trim_heap/0 within the hook, and it opens the door to much more advanced strategies.
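Until such a hook exists, the "simple case" can be approximated from user code. The sketch below is a hypothetical stand-in, not the proposed hook: start_heap_trimmer/1, stop_heap_trimmer/0 and the heap_trimmer alias are made-up names, and the interval you pass is a placeholder rather than a tuned value. It runs a detached thread that calls trim_heap/0 each time its wait for a stop message times out.

```prolog
% Hypothetical stand-in for the proposed gc hook: a detached thread
% that calls trim_heap/0 every IntervalSeconds until told to stop.
start_heap_trimmer(IntervalSeconds) :-
    must_be(positive_integer, IntervalSeconds),
    thread_create(trim_loop(IntervalSeconds), _,
                  [ alias(heap_trimmer), detached(true) ]).

% Ask the trimmer thread to finish; it exits after reading `stop`.
stop_heap_trimmer :-
    thread_send_message(heap_trimmer, stop).

% Wait for a `stop` message; on timeout, trim the heap and wait again.
trim_loop(IntervalSeconds) :-
    thread_self(Me),
    (   thread_get_message(Me, stop, [timeout(IntervalSeconds)])
    ->  true
    ;   trim_heap,
        trim_loop(IntervalSeconds)
    ).
```

Usage: ?- start_heap_trimmer(60). and later ?- stop_heap_trimmer. A fixed timer is blunt compared to the hook idea, which could react to events such as large stack or table-space requests instead of polling.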

The latest source changes appear to have broken something:

```
$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 8.3.26)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

ERROR: Unknown procedure: '$toplevel':trim_heap/0
ERROR: In:
ERROR:    [5] trim_heap
   Exception: (5) trim_heap ? 
```

I’ve rolled back to e9a3fac3da8d74e3624a1401002ea073d71a4b36.

I think your installation is somehow mixed up, with old and new code combined somewhere. First, because it claims to be version 8.3.26, while the latest version should print 8.3.26-40-g5772c8039. Every build from git shows this pattern; only exact releases print a plain version number. Second, trim_heap/0 is defined in the core, so there cannot be a version issue. Check that your git tree is fully up to date. If you have no local modifications, do:

```
git fetch
git reset --hard origin/master
git status
```

and make sure all submodules except the one you are working on are up to date. This discards local modifications:

```
git submodule update --init
git submodule foreach git reset --hard
```

And possibly get rid of old stuff. Be careful: this deletes every untracked file (including ignored ones) in the source tree.

```
git clean -xfd
git submodule foreach git clean -xfd
```

Finally, messing with PATH, LD_LIBRARY_PATH (or its macOS counterpart, DYLD_LIBRARY_PATH), LD_PRELOAD and some other environment variables may cause this type of issue. The CMake configuration step checks for some of these pitfalls, but that check is most likely incomplete.
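To see which pieces are actually being combined, a small helper along these lines may help. installation_report/0 is a made-up name and this is a sketch, not an official tool. It prints the flags that identify the running binary and its home directory, plus the environment variables mentioned above (SWI_HOME_DIR added as another one that can point swipl elsewhere). It assumes the version_git flag is only present for builds from a git checkout, so it reports its absence instead of failing.

```prolog
% Sketch: show which swipl binary and home directory are in use and
% which environment variables could redirect it to another build.
installation_report :-
    current_prolog_flag(version, Version),
    format("version (numeric): ~w~n", [Version]),
    (   current_prolog_flag(version_git, Git)
    ->  format("version_git: ~w~n", [Git])
    ;   format("version_git: not set (not built from git?)~n")
    ),
    current_prolog_flag(executable, Exe),
    format("executable: ~w~n", [Exe]),
    current_prolog_flag(home, Home),
    format("home: ~w~n", [Home]),
    forall(member(Var, [ 'PATH', 'LD_LIBRARY_PATH', 'DYLD_LIBRARY_PATH',
                         'LD_PRELOAD', 'SWI_HOME_DIR' ]),
           (   getenv(Var, Value)
           ->  format("~w=~w~n", [Var, Value])
           ;   format("~w is not set~n", [Var])
           )).
```

If executable or home points into an older installation than the one you just built, that would be consistent with a toplevel calling trim_heap/0 on a core that does not define it.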
