Announcing plstat - statistics using prolog

Hi,
I’ve gathered some statistics-related predicates (mean, variance, correlation, skew, etc.), plus some utilities I often use, into a package called plstat; maybe it can be useful to someone. They are all written in Prolog. You can have a look at this link: GitHub - damianoazzolini/plstat: Statistics using prolog

If you have suggestions about anything (fixing code style, more predicates to add, etc.) or spot any bugs, please let me know!

Looks very good, thanks for sharing it. In particular it is very useful that you provide examples in the documentation for each of the predicates.

I would suggest you use %! somepred(A,B) ... in the documentation instead of /* ... */, so that the documentation is nicely produced on a webpage by pldoc (just like the docs for swi-prolog).
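
As a minimal sketch of what such a structured comment could look like (the predicate `my_mean/2` is a hypothetical example, not taken from plstat):

```prolog
%!  my_mean(+Numbers:list(number), -Mean:number) is semidet.
%
%   True when Mean is the arithmetic mean of Numbers.
%   Fails if Numbers is the empty list.
%
%   Hypothetical example predicate, used only to show the
%   PlDoc comment layout: a `%!` line with the head, argument
%   modes and determinism, followed by a description.
my_mean(Numbers, Mean) :-
    Numbers = [_|_],            % reject the empty list
    sum_list(Numbers, Sum),
    length(Numbers, N),
    Mean is Sum / N.
```

The existing usage examples could then go into the description part of the comment.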

I’m no statistician, but it strikes me that it might be helpful to have a functional form of these predicates for use in arithmetic expressions. They’re certainly in the right format, i.e., the return value is the last argument.

For example:

?- S is sum([1,24,2,3,-1]).
S = 29.

?- std_dev([1,2,4,6,7,8,9]) > 2.
true.

?- N=10, prod(seq(1,N,1))=:=factorial(N).
N = 10.

I’ve been looking at this issue for arithmetic operations on arrays, e.g., for solving the matrix form of a linear system of equations. But maybe it’s not so applicable to the statistics domain.

If you think it’s worth exploring, I’ve published a pack at https://github.com/ridgeworks/arithmetic_types which may help.

I am not sure that this does what you think it does. My understanding is that with the current implementation, this will also materialize the list (using findall?). The source is here:

Well you are now not showing the same thing that you showed above. This will not materialize a list:

aggregate_all(sum(X), ..., Sum),
aggregate_all(count, ..., N),
Mean is Sum / N

but it will obviously iterate twice.
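
To make the two-pass version concrete, here is a small self-contained sketch. The `reading/1` facts and the `mean_reading/1` predicate are made up for illustration; they stand in for whatever goal the `...` above represents.

```prolog
:- use_module(library(aggregate)).

% Hypothetical data: three readings.
reading(4).
reading(8).
reading(6).

% Two passes over the solutions of reading/1:
% one to sum the values, one to count them.
mean_reading(Mean) :-
    aggregate_all(sum(X), reading(X), Sum),
    aggregate_all(count, reading(_), N),
    N > 0,                      % avoid division by zero
    Mean is Sum / N.
```

The goal is run twice, so this is only attractive when enumerating the solutions is cheap.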

I was commenting on this formulation, again:

I thought that this will go here, and there I do see a findall.

You wrote this after I wrote that :wink: Either way, yes, I agree, it would be very nice to have something that:

  • goes over the solutions only once, and
  • has an obvious interface for plugging in your own aggregator.

It will look exactly like a fold, but for solutions instead of a list.
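
A rough sketch of what such a fold over solutions might look like, using a failure-driven loop with `nb_setarg/3` to carry the accumulator across backtracking. The names `fold_solutions/5`, `sum_count_step/3` and `mean_of_solutions/3` are hypothetical, not an existing library interface:

```prolog
:- meta_predicate fold_solutions(3, ?, 0, +, -).
:- meta_predicate mean_of_solutions(?, 0, -).

%!  fold_solutions(:Step, ?X, :Goal, +Acc0, -Acc) is det.
%
%   For every solution of Goal, call Step(X, AccIn, AccOut).
%   Goal is enumerated exactly once and no list is built.
fold_solutions(Step, X, Goal, Acc0, Acc) :-
    State = state(Acc0),
    (   call(Goal),
        arg(1, State, A0),
        call(Step, X, A0, A1),
        nb_setarg(1, State, A1),    % survives backtracking
        fail
    ;   arg(1, State, Acc)
    ).

% User-supplied aggregator: running sum and count as a pair.
sum_count_step(X, S0-N0, S-N) :-
    S is S0 + X,
    N is N0 + 1.

% Mean of all X for which Goal holds, in a single pass.
mean_of_solutions(X, Goal, Mean) :-
    fold_solutions(sum_count_step, X, Goal, 0-0, Sum-N),
    N > 0,
    Mean is Sum / N.
```

For example, `?- mean_of_solutions(X, member(X, [1,2,3,4]), M).` would sum and count in one pass. Note that `nb_setarg/3` copies the accumulator on every step, so a large accumulator term makes each step more expensive.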

Yes, I have tried out such things myself, and was also happy to see that it is indeed faster. I actually started reading about how to really calculate the mean, and went a bit too deep. I think I gave up somewhere around this point: Kahan summation algorithm - Wikipedia. (EDIT: no, it was something else, which calculated the running mean as a side effect of trying to find the standard deviation. I cannot find it any more; I decided to stop wasting time on it back then…) (One of the very few things they managed to teach me at university is a healthy fear of floating point math.)
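
One well-known algorithm that fits that description is Welford's online algorithm, which maintains a running mean while accumulating the sum of squared deviations. Whether it is the one meant above is only a guess. A minimal sketch over a list, with hypothetical predicate names:

```prolog
:- use_module(library(apply)).

%!  welford(+Numbers:list(number), -Mean, -StdDev) is semidet.
%
%   Single pass over Numbers. Mean is the running mean after the
%   last element, StdDev the sample standard deviation (divisor
%   N-1). Fails for lists with fewer than two elements.
welford(Numbers, Mean, StdDev) :-
    foldl(welford_step, Numbers, w(0, 0.0, 0.0), w(N, Mean, M2)),
    N > 1,
    StdDev is sqrt(M2 / (N - 1)).

% State is w(Count, RunningMean, SumOfSquaredDeviations).
welford_step(X, w(N0, Mean0, M20), w(N, Mean, M2)) :-
    N is N0 + 1,
    Delta is X - Mean0,
    Mean is Mean0 + Delta / N,
    M2 is M20 + Delta * (X - Mean).
```

This avoids forming a large sum of squares and subtracting another large value from it, which is where the naive one-pass variance formula tends to lose precision.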

Long story short, if I actually needed to do statistics I would probably fall back to R. This is however completely orthogonal to the question of an aggregate_all that can be used as a foldl.