Announcing plstat - statistics using Prolog

damiazz94 · May 15, 2021, 2:47pm

Hi,
I’ve gathered some statistics predicates (mean, variance, correlation, skew, etc.), plus some utilities I use regularly, into a package called plstat; maybe it can be useful to someone. They are all written in Prolog. You can have a look here: GitHub - damianoazzolini/plstat: Statistics using prolog

If you have suggestions about anything (code style fixes, more predicates to add, etc.) or spot some bugs, please let me know!

swi · May 15, 2021, 6:01pm

Looks very good, thanks for sharing it. In particular it is very useful that you provide examples in the documentation for each of the predicates.

I would suggest using %! somepred(A,B) ... comments in the documentation instead of /* ... */, so that pldoc can render the documentation nicely as a web page (just like the docs for SWI-Prolog).
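As a sketch of the suggested style, this is roughly what a pldoc structured comment could look like for a simple mean predicate. The predicate and its mode line are illustrative assumptions, not plstat's actual code; the `%!` header line and the `==`-delimited example block follow pldoc's comment conventions.

```prolog
:- use_module(library(lists)).   % sum_list/2

%!  mean(+List:list(number), -Mean:number) is semidet.
%
%   Mean is the arithmetic mean of the numbers in List.
%   Fails if List is empty.
%
%   ==
%   ?- mean([1,2,3,4], M).
%   M = 2.5.
%   ==
mean(List, Mean) :-
    length(List, N),
    N > 0,                 % no mean for an empty list
    sum_list(List, Sum),
    Mean is Sum / N.
```

With comments in this form, pldoc picks up the mode line, the description and the example when it generates the HTML documentation.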

ridgeworks · May 17, 2021, 9:52pm

I’m no statistician, but it strikes me that it might be helpful to have a functional form of these predicates for use in arithmetic expressions. They’re certainly in the right format, i.e., the return value is the last argument.

For example:

?- S is sum([1,24,2,3,-1]).
S = 29.

?- std_dev([1,2,4,6,7,8,9]) > 2.
true.

?- N=10, prod(seq(1,N,1))=:=factorial(N).
N = 10.
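For comparison, without a functional layer the same three checks have to be written relationally, threading each intermediate result through a variable. A minimal sketch using only standard SWI-Prolog library predicates; `std_dev/2`, `prod_list/2` and `factorial/2` are hypothetical helpers (plstat's own standard deviation may differ, e.g. by using the sample rather than the population formula).

```prolog
:- use_module(library(lists)).   % sum_list/2, numlist/3
:- use_module(library(apply)).   % foldl/4

% Hypothetical helper: population standard deviation of a non-empty list.
std_dev(List, SD) :-
    length(List, N),
    N > 0,
    sum_list(List, Sum),
    Mean is Sum / N,
    foldl(add_sq_dev(Mean), List, 0, SS),
    SD is sqrt(SS / N).

% Accumulate the squared deviation of X from Mean.
add_sq_dev(Mean, X, Acc0, Acc) :-
    D is X - Mean,
    Acc is Acc0 + D*D.

% Hypothetical helper: product of a list of numbers.
mul(X, P0, P) :- P is P0 * X.
prod_list(List, P) :- foldl(mul, List, 1, P).

% Hypothetical helper: factorial of a non-negative integer.
factorial(0, 1) :- !.
factorial(N, F) :-
    N > 0,
    N1 is N - 1,
    factorial(N1, F1),
    F is N * F1.
```

The queries then become `?- sum_list([1,24,2,3,-1], S).`, `?- std_dev([1,2,4,6,7,8,9], SD), SD > 2.` and `?- N = 10, numlist(1, N, L), prod_list(L, P), factorial(N, F), P =:= F.`, which shows how much plumbing a functional form would save.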

I’ve been looking at this issue for arithmetic operations on arrays, e.g., for solving the matrix form of a linear system of equations. But maybe it’s not so applicable to the statistics domain.

If you think it’s worth exploring, I’ve published a pack at https://github.com/ridgeworks/arithmetic_types which may help.

Boris · May 24, 2021, 1:00pm

I am not sure that this does what you think it does. My understanding is that with the current implementation, this will also materialize the list (using findall?). The source is here:

github.com: SWI-Prolog/swipl-devel/blob/4e808fb378bc64b36d63dbc8ac87688f401219b3/library/aggregate.pl


Boris · May 24, 2021, 1:22pm

Well, what you are showing now is not the same thing you showed above. This will not materialize a list:

aggregate_all(sum(X), ..., Sum),
aggregate_all(count, ..., N),
Mean is Sum / N

but it will obviously iterate over the solutions twice.
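The two-pass formulation can be made runnable by wrapping it in a predicate that takes the template variable and the goal explicitly. A minimal sketch; `mean_of/3` is a hypothetical name, while `aggregate_all(sum(X), ...)` and `aggregate_all(count, ...)` are standard library(aggregate) templates.

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3

:- meta_predicate mean_of(?, 0, -).

% Hypothetical helper: mean of X over all solutions of Goal.
% Goal is executed twice: once for the sum, once for the count.
% Fails if Goal has no solutions.
mean_of(X, Goal, Mean) :-
    aggregate_all(sum(X), Goal, Sum),
    aggregate_all(count, Goal, N),
    N > 0,
    Mean is Sum / N.
```

For example, `?- mean_of(X, member(X, [1,2,3,4]), M).` gives `M = 2.5`. The price is that Goal runs twice, which matters if it is expensive or has side effects.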

I was commenting on this formulation, again:

I thought that this would end up here, and there I do see a findall.

Boris · May 24, 2021, 1:33pm

You wrote this after I wrote that. Either way, yes, I agree: it would be very nice to have something that:

goes over the solutions only once, and
has an obvious interface for plugging in your own aggregator.

It would look exactly like a fold, but over solutions instead of a list.
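A minimal sketch of such a fold over solutions, assuming SWI-Prolog's non-backtrackable `nb_setarg/3` to carry the accumulator across backtracking. `fold_solutions/4`, `mean_step/3` and `mean_of/3` are hypothetical names, not existing library predicates. The folder is an ordinary closure, so plugging in your own aggregator means writing one step predicate.

```prolog
% Hypothetical: fold Folder over every solution of Goal, in a single pass
% and without building a list. The accumulator lives in the State term;
% nb_setarg/3 updates it in a way that survives backtracking.
:- meta_predicate fold_solutions(2, 0, +, -).

fold_solutions(Folder, Goal, V0, V) :-
    State = state(V0),
    (   call(Goal),
        arg(1, State, Acc0),
        call(Folder, Acc0, Acc),
        nb_setarg(1, State, Acc),
        fail                      % backtrack into Goal for the next solution
    ;   arg(1, State, V)          % all solutions seen: read the final value
    ).

% Folder for the mean: the accumulator is Count-Sum. X is shared with
% Goal, so it holds the value of the current solution when called.
mean_step(X, N0-S0, N-S) :-
    N is N0 + 1,
    S is S0 + X.

mean_of(X, Goal, Mean) :-
    fold_solutions(mean_step(X), Goal, 0-0, N-S),
    N > 0,
    Mean is S / N.
```

Here `?- mean_of(X, member(X, [1,2,3,4]), M).` also gives `M = 2.5`, but Goal runs only once and no intermediate list is built. The trade-off is relying on an extra-logical, destructive primitive.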

Boris · May 24, 2021, 1:46pm

Yes, I have tried out such things myself, and was also happy to see that it is indeed faster. I actually started reading about how to really calculate the mean, and went a bit too deep. I think I gave up somewhere around this point: ~~Kahan summation algorithm - Wikipedia~~ (EDIT: no, it was something else, which calculated the running mean as a side effect of computing the standard deviation. I cannot find it any more; I decided to stop wasting time on it back then.) (One of the very few things they managed to teach me at university is a healthy fear of floating-point math.)
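One well-known algorithm that produces the running mean as a by-product of computing the variance is Welford's online algorithm; whether it is the one meant above is an assumption. It avoids the naive "sum of squares minus square of sum" formula, which can lose precision badly in floating point. A sketch over a list with foldl/4; `welford_step/3` and `welford/3` are hypothetical names.

```prolog
:- use_module(library(apply)).   % foldl/4

% One update of Welford's online algorithm.
% State is s(Count, Mean, M2), where M2 is the running sum of squared
% deviations from the current mean.
welford_step(X, s(N0, Mean0, M20), s(N, Mean, M2)) :-
    N is N0 + 1,
    Delta is X - Mean0,
    Mean is Mean0 + Delta / N,
    M2 is M20 + Delta * (X - Mean).

% Mean and sample variance (divisor N-1) in a single pass over List.
% Fails for lists with fewer than two elements.
welford(List, Mean, Variance) :-
    foldl(welford_step, List, s(0, 0.0, 0.0), s(N, Mean, M2)),
    N > 1,
    Variance is M2 / (N - 1).
```

For `[2,4,4,4,5,5,7,9]` this yields a mean of 5 and a sample variance of 32/7 (up to floating-point rounding).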

Long story short, if I actually needed to do statistics, I would probably fall back to R. This is, however, completely orthogonal to the question of an aggregate_all that can be used as a foldl.
