Having used the two modes more on my own examples, I think this is a mistake. If there are two modes, they should be readily distinguishable from the presentation. I’ll leave it to someone else to figure out how and when to indicate questionable data; I really have run out of ideas.
And don’t forget the API (library(prolog_profile)), where “warnings” and “indications” might be problematic. I also think warnings, indications, and phrases like “as long as there is no indirect recursion” aren’t very user friendly, particularly for new users, but maybe they aren’t the main focus of a profiling tool.
Having said that, I’ll add a third possibility to the now renamed ports option - classic, which will be the default and will result in numbers equivalent to the current presentation. (But I take no responsibility for explaining them.) For the record, I am unlikely to ever use classic mode; for the same amount of memory, ports(false) provides equivalent useful information (IMO) with slightly lower overhead. And you did say in a previous post:
Moving on:
I should have said sampling, not collecting, as in recording the ticks into the call tree. BTW I think “distortion” is also not a very user friendly term. It suggests, at least to me, that a higher number means the timing and port count data is less accurate. In fact, the only thing affecting accuracy is the number of “useful” samples; a better statistic to display might be that number, e.g., Total samples = sT, Useful samples = sU in t sec;... .
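To make the suggestion concrete, here’s a trivial Python sketch of the kind of summary line I have in mind; the names (`total_samples`, `useful_samples`, `elapsed`) and the exact format are my own invention, not anything the profiler currently produces:

```python
# Hypothetical rendering of the proposed statistic: report the raw and
# "useful" sample counts rather than a "distortion" figure.
def summary(total_samples, useful_samples, elapsed):
    return (f"Total samples = {total_samples}, "
            f"Useful samples = {useful_samples} in {elapsed:.2f} sec")

print(summary(1000, 950, 5.0))
# → Total samples = 1000, Useful samples = 950 in 5.00 sec
```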
Casual readers may want to skip the rest of this.
Let’s go back to basics. We have a Prolog engine capable of 10 million calls (inferences) per second, give or take. We’re trying to construct a model of what that engine is doing using a nominal sample interval of 5 ms (minimum), so within a nominal interval about 50,000 calls are made on average. That’s roughly equivalent to trying to reconstruct a multi-MHz signal with a sampling rate of 200 Hz. Clearly a few samples isn’t going to cut it; depending on code complexity, it’s going to take hundreds if not thousands of samples, i.e. typically on the order of 5 seconds’ worth, minimum.
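For concreteness, the back-of-envelope arithmetic above written out (integer math so the numbers are exact):

```python
# Rough model of the sampling problem described above.
calls_per_sec = 10_000_000   # ~10 M inferences/sec, give or take
interval_ms = 5              # nominal 5 ms sampling interval

calls_per_interval = calls_per_sec * interval_ms // 1000   # 50,000 calls
sample_rate_hz = 1000 // interval_ms                       # 200 Hz

# Hundreds-to-thousands of samples implies seconds of profiling:
min_samples = 1000
min_run_sec = min_samples * interval_ms / 1000             # 5.0 seconds
print(calls_per_interval, sample_rate_hz, min_run_sec)
```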
Now the current handler is also trying to compensate for “jitter” in the sampling period. But what does that actually buy you when sampling a multi-MHz signal at a 200 Hz sampling rate? I would claim nothing - sampling at a slightly different spot in time might cause a different node’s tick count to be incremented, but it’s all going to average out given that a sufficient number of samples is collected.
Making it even more questionable is that the jitter is calculated in milliseconds, so about 5000 calls are made (on average) within the rounding error of the calculation.
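To put a number on that: with millisecond granularity the average rounding error is 0.5 ms, and at ~10 M calls/sec that’s the ~5,000 calls mentioned above:

```python
# Calls completed within the average rounding error of a millisecond-
# granularity jitter calculation.
calls_per_sec = 10_000_000
avg_rounding_error_ms = 0.5   # half the 1 ms resolution, on average
calls_in_error = int(calls_per_sec * avg_rounding_error_ms / 1000)
print(calls_in_error)  # 5000
```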
What really counts is how many samples are collected, assuming you have some assurance that the samples are relatively evenly distributed over the total profiling time. The latter seems to be guaranteed(?) by the OS, which says timer events will not occur before the set time period is reached, so there’s no real possibility of the samples being concentrated in a small region of the total time.
In terms of analysis, the units of a tick are irrelevant as long as all ticks are equal. The other input requirement is the elapsed time of the profiling run, which can be based on either the thread clock or the wall clock. Time spent in each node is then calculated by multiplying the total profiling time by the ratio of the node’s tick count to the total tick count.
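That attribution step is just a ratio; a one-line sketch (function name and numbers are my own, for illustration):

```python
# Attribute time to a node by tick ratio: the unit of a tick never
# appears, only the ratio and the measured total profiling time.
def node_time(total_time, node_ticks, total_ticks):
    return total_time * node_ticks / total_ticks

# E.g. a 10-second run, 2000 ticks total, 500 landing in one node:
print(node_time(10.0, 500, 2000))  # 2.5 seconds attributed to that node
```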
So if the tick units can be anything, why not make a tick equal to a sample? And the answer, after actually trying it, seems to be: no reason at all. AFAICT, short-circuiting the jitter calculation and treating each sample as a single tick had no significant effect on the profiler data. And it makes things a whole lot simpler for the signal handler; in fact, the handler doesn’t have to be aware of time at all. Now I don’t think I’ve made any bad assumptions in coming to this conclusion, but maybe I’ve missed something.
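To illustrate the “one signal, one tick” idea, here’s a minimal Python sketch (Unix-only, since it uses SIGPROF/ITIMER_PROF, which count CPU time); the single `"work"` bucket stands in for the current call-tree node, and everything here is my own illustration, not the actual profiler code:

```python
import signal
import time

ticks = {}  # per-node tick counts; one bucket here for illustration

def on_sigprof(signum, frame):
    # The handler never consults a clock: one signal == one tick.
    # The real profiler would increment the current call-tree node.
    ticks["work"] = ticks.get("work", 0) + 1

signal.signal(signal.SIGPROF, on_sigprof)
signal.setitimer(signal.ITIMER_PROF, 0.005, 0.005)  # 5 ms CPU-time interval

# Burn some CPU so ITIMER_PROF (which counts CPU time) actually fires.
deadline = time.process_time() + 0.2
while time.process_time() < deadline:
    pass

signal.setitimer(signal.ITIMER_PROF, 0)  # disarm
print(ticks.get("work", 0))              # some positive number of ticks
```

With roughly 0.2 s of CPU time and a 5 ms interval you’d expect on the order of 40 ticks, but the exact count is up to the OS - which is precisely the point: the handler itself needs no notion of time.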
But I was surprised to see that this had no noticeable effect on the “lost sample” problem I was observing. However, changing the time option continued to have an effect on the number of samples collected per second. I conclude that the difference is not due to “lost” samples, but to the rate at which the OS triggers the signal, i.e., out of our control. So my hypothesis is that, on MacOS at least, the thread timer interval does not include kernel time (including the overhead incurred by the timer itself in the kernel), but the thread clock does. This seems to be validated by experimenting with the sampling interval, e.g., from 1 ms to 10 ms. In any case, I don’t think it matters for the big picture since the SIGPROF timer is now only used to drive the accumulation of sample data.
With minimal timer overhead in the app itself, it’s interesting to consider the possibility of supporting an option for the timer interval (currently fixed at 5 ms). Relatively modern hardware and OSes can support considerably smaller (and, obviously, larger) intervals, but I’m not sure when this flexibility would ever be useful.