Hi,
I’ve gathered some of the predicates related to statistics (mean, variance, correlation, skew, etc.), plus some utilities I often use, into a package called plstat; maybe it can be useful to someone. They are all written in Prolog. You can have a look at this link: GitHub - damianoazzolini/plstat: Statistics using prolog
If you have suggestions about anything (fixing code style, more predicates to add, etc.) or spot some bugs, please let me know!
Looks very good, thanks for sharing it. In particular it is very useful that you provide examples in the documentation for each of the predicates.
I would suggest you use %! somepred(A,B) ... in the documentation instead of /* ... */, so that PlDoc can render the documentation nicely as a web page (just like the docs for SWI-Prolog).
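As a rough illustration of that comment style, here is a minimal sketch of a PlDoc structured comment. The predicate example_mean/2 is a hypothetical stand-in written for this example, not the actual plstat code; the mode line and determinism annotation are my assumptions about how such a predicate might be documented.

```prolog
:- use_module(library(lists)).   % sum_list/2

%!  example_mean(+List:list(number), -Mean:number) is semidet.
%
%   True when Mean is the arithmetic mean of the numbers in List.
%   Fails for the empty list. (Hypothetical example predicate,
%   not taken from plstat.)
%
%   ==
%   ?- example_mean([1, 2, 3, 4], M).
%   M = 2.5.
%   ==

example_mean(List, Mean) :-
    List = [_|_],                % reject the empty list
    sum_list(List, Sum),
    length(List, N),
    Mean is Sum / N.
```

PlDoc picks up comments that start with %! (or %%) directly above the predicate, as well as block comments that open with /** instead of /*.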
I’m no statistician, but it strikes me that it might be helpful to have a functional form of these predicates for use in arithmetic expressions. They’re certainly in the right format, i.e., the return value is the last argument.
For example:
?- S is sum([1,24,2,3,-1]).
S = 29.
?- std_dev([1,2,4,6,7,8,9]) > 2.
true.
?- N=10, prod(seq(1,N,1))=:=factorial(N).
N = 10.
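One way to experiment with this idea without changing is/2 itself is a small wrapper evaluator. The sketch below is only an assumption about how that could look: stat_is/2, stat_eval/2 and the local definitions behind sum, prod, seq and std_dev are hypothetical stand-ins, not plstat's predicates, and std_dev is assumed here to be the sample standard deviation.

```prolog
:- use_module(library(lists)).   % sum_list/2
:- use_module(library(apply)).   % foldl/4, maplist/3

%!  stat_is(-Value, +Expr) is semidet.
%   Hypothetical counterpart of is/2 that also understands a few
%   list-valued functions: sum/1, prod/1, std_dev/1 and seq/3.
stat_is(Value, Expr) :-
    stat_eval(Expr, Value).

stat_eval(X, _) :- var(X), !, throw(error(instantiation_error, _)).
stat_eval(X, V) :- number(X), !, V = X.
stat_eval(A, V) :- atom(A), !, V is A.            % constants such as pi, e
stat_eval(sum(L0), V) :- !, list_value(L0, L), sum_list(L, V).
stat_eval(prod(L0), V) :- !, list_value(L0, L), foldl(mul, L, 1, V).
stat_eval(std_dev(L0), V) :- !, list_value(L0, L), sample_std_dev(L, V).
stat_eval(Expr, V) :-
    % Any other compound term: evaluate the arguments first,
    % then hand the plain arithmetic term to is/2.
    compound(Expr),
    Expr =.. [F|Args0],
    maplist(stat_eval, Args0, Args),
    Plain =.. [F|Args],
    V is Plain.

% A list argument is either seq(From, To, Step) or a proper list.
% seq/3 is assumed to take integers with Step > 0.
list_value(seq(From, To, Step), L) :- !,
    (   To < From
    ->  L = []
    ;   N is (To - From) // Step,
        findall(X, (between(0, N, I), X is From + I*Step), L)
    ).
list_value(L, L) :- is_list(L).

mul(X, A0, A) :- A is A0 * X.

% Sample standard deviation (divides by N - 1); needs N > 1.
sample_std_dev(L, SD) :-
    length(L, N),
    N > 1,
    sum_list(L, Sum),
    Mean is Sum / N,
    foldl(add_sq_dev(Mean), L, 0, SS),
    SD is sqrt(SS / (N - 1)).

add_sq_dev(Mean, X, A0, A) :-
    D is X - Mean,
    A is A0 + D*D.
```

With this loaded, a query such as stat_is(S, sum([1,24,2,3,-1])) binds S to 29, and stat_is(P, prod(seq(1,10,1))) gives the same value as 10 factorial. Comparisons still need an explicit step, e.g. stat_is(SD, std_dev([1,2,4,6,7,8,9])), SD > 2.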
I’ve been looking at this issue for arithmetic operations on arrays, e.g., for solving the matrix form of a linear system of equations. But maybe it’s not so applicable to the statistics domain.
I am not sure that this does what you think it does. My understanding is that with the current implementation, this will also materialize the list (using findall?). The source is here:
Yes, I have tried out such things myself, and was also happy to see that it is indeed faster. I actually started reading about how to really calculate the mean, and went a bit too deep. I think I gave up somewhere around this point: Kahan summation algorithm - Wikipedia. (EDIT: no, it was something else, which calculated the running mean as a side effect of trying to find the standard deviation. I cannot find it any more; I decided to stop wasting time on it back then…) One of the very few things they managed to teach me at university is a healthy fear of floating-point math.
Long story short, if I actually needed to do statistics I would probably fall back to R. This is however completely orthogonal to the question of an aggregate_all that can be used as a foldl.
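For reference, the Kahan summation mentioned above fits naturally into a foldl, which is the kind of accumulation being discussed. This is only a minimal sketch: kahan_sum/2 and kahan_step/3 are names made up for this example, and the code is not taken from plstat or from any SWI-Prolog library.

```prolog
:- use_module(library(apply)).   % foldl/4

%!  kahan_sum(+Numbers:list(number), -Sum:float) is det.
%   Compensated (Kahan) summation of a list of numbers.
%   Hypothetical example predicate.
kahan_sum(Numbers, Sum) :-
    foldl(kahan_step, Numbers, 0.0-0.0, Sum-_).

% The accumulator is Sum-C, where C is the running compensation
% for the low-order bits lost in the previous additions.
kahan_step(X, Sum0-C0, Sum-C) :-
    Y is X - C0,             % correct the next input by the carried error
    Sum is Sum0 + Y,         % low-order bits of Y may be lost here
    C is (Sum - Sum0) - Y.   % recover what was lost, to subtract next time
```

Calling kahan_sum/2 on a list of ten 0.1 values and comparing the result with sum_list/2 on the same list is a quick way to see the effect of the compensation term.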