I’m working on a dashboard to provide as much insight as possible into the running Prolog program. This started as a toolkit to get insight into time and memory usage for a tabled program developed at CMU that I ported to SWI-Prolog (branch
swi). The work in progress is in webstat.
The current challenge is the execution profiler. The low-level machinery is older than SWI-Prolog’s recorded history, i.e., it predates 1992. The details have changed a couple of times, but the current implementation reflects the
gprof way of collecting data, with some extensions. The current graphical view is also based on that approach.
The problem is that this way of collecting the execution context deals very poorly with mutual recursion and with meta-calling predicates. The profiler has a hack that removes (some) meta-calling predicates from the graph, adding their cost to the predicate that calls them (see noprofile/1). Analyzing the state, I discovered that the way mutual recursion is handled is broken, and since we added delimited continuations things got a lot worse. In short, it is time to look for something new.
Starting at Google Scholar I quickly discovered there is a whole lot of relevant literature. So far I selected
- “Comparative performance analysis of calling context profiling data
structures” by Prita Yadav and Paramvir Singh. DOI: 10.1007/s13198-016-0447-x
- “Exploiting hardware performance counters with flow and context
sensitive profiling” by Glenn Ammons, Thomas Ball and James R. Larus.
- “Incremental call-path profiling” by Andrew R. Bernat and Barton P. Miller
For Prolog we have the additional challenge of dealing with non-determinism.
Currently I hook into the VM, which tells me
- When a predicate is called. That is reliable (and the only simple and reliable hook). It creates a node in the dynamic call graph and stores a reference to this node in the environment frame it creates for the call.
- When we get an exit. We do know whether the exit is deterministic or not, but I currently do not use that information. We know which call-graph node exited (from the environment). There may be nodes missing due to last-call optimization; currently I walk up the dynamic call graph until I find the target node and exit all the intermediate nodes. I fear this is dubious under certain abstractions of the call graph.
- On a failure we can find the frame (and thus the call-graph node) where we resume. I tried to translate that into redo/fail counts, but this turns out to be harder than I expected.
Note that we want something at the VM level that doesn’t interfere with normal optimized execution. If we do interfere, counting becomes much simpler, but large programs can no longer be handled and timing will be very poor. We also want to be able to enable and disable recording on the fly, so that we can record a running program for a while.
Timing is currently based on sampling. We could also consider using performance timers at the various ports. I don’t know how portable that is or how much impact it has, but it would be great to have.
Does anyone have good leads or ideas on how to handle this? Ideally I’d like to end up with something similar to kcachegrind, with additional Prolog-relevant annotations (determinism and tabling come to mind).
If anyone is looking for a project to contribute to, please step in. There is a lot to do to find and specify the required accounting data and algorithms. The implementation in
src/pl-prof.c is pretty much isolated from the rest of the system. Next there is the challenge of writing (Prolog) routines to nicely present the results in the web interface.