Is there a linter for prolog programs?

It would be useful to have a tool that does static checks in the prolog source filee like “ruff check” works for python.

For instance “type checking” using the info from comments like

%!      length(+List:list, -Length:int) is det.
1 Like

Logtalk has a Prolog linter

1 Like

See "mavis" pack for SWI-Prolog

In addition, there is library(check), providing check/0 and a couple of predicates to check specific issues. E.g, list_undefined/0, which is used by make/0.

There is the compiler that spots some issues such as refining singleton detection to branches (;/2) and spots various issues concerning meaningless calls to built-ins.
For example using ==/2 where one side is known to be a fresh variable (always fails).

Finally, there is the built-in editor PceEmacs, the sweep module for GNU Emacs, SWI-Tinker (the WASM Prolog shell) and SWISH that all expose the syntax highlighting provided by library(prolog_colour). It is not a linter, but it highlights a lot of possible issues with your code.

But no, there is no comprehensive “linter”. Would be great to have!

3 Likes

The direction that for example GCC has taken is to gradually add warnings to the compiler for things that started as linter warnings. But the compiler only looks at the code of course.

My gut feeling is that annotation in “non-code” is a work-around that shouldn’t be necessary in a language like Prolog. But it seems popular.

Eventually, the mode declaration should probably be replaced by formal directives. We have had several discussions on standardizing the mode declarations in the PIP working group. This turns out far harder than expected. To give an idea, classically the mode - means “must be uninstantiated at call and will be instantiated on success”. But, this is practically meaningless. The stream argument of open/3 is one of the few cases where this cleanly applies. Most predicates unify output variables with some value they derives from the system state and/or input variables. You’d like to capture that notion in your documentation. None of the standard modes typically capture this though. Think about read/1. Surely it is valid to call with an instantiated argument. In addition, it may not bind the variable if the term read is a variable. So, the (classical) mode must be read(?), leaving the reader of your documentation completely in the dark on what read/1 does with its argument.

SICStus and SWI-Prolog redefine - to mean “output argument”. That does lead to meaningful documentation. For validators, being static analysis or runtime verification, this is pretty much meaningless.

Joachim Schimpf proposed to use read(=Term) to capture the notion of an “output” argument. This annotation merely claims that the semantics of read(Term) is the same as read(NewVar), Term = NewVar. In SWI-Prolog’s documentation, read/1 should also have been declared det, meaning it has exactly one solution. This clearly is not true though as read(a) can fail. It is true after rewriting to read(NewVar), a = NewVar though.

The = mode fixes confusion about -, but does not resolve the issue that documentation and analysis/verification seem to require different (mode) annotations. If that is indeed the case, having both a formal declaration that is useful for verification and a documentation mode in the (structured) comment might not be too bad …

When we started this discussion I also thought it was simple(r) :frowning:

1 Like

In statically typed programming languages, we often have a kind of struct or class, along with associated functions that take objects of that type as parameters or return them. Based on the type information, we can then infer that the return value of ‘obj f()’ can be passed to ‘g(obj)’. Of course, this is not always straightforward, especially with simple data types, as there may be additional rules that need to be satisfied for such an operation to be allowed. But when we have a class, we also define member functions and a constructor, and it’s quite clear that the ‘this’ pointer or ‘self’ is actually the first parameter of the method and expects this specific object. And the same logic generally applies to other classes, which are passed around elsewhere (not necessarily as parameter 1).

Some Prolog typing packages attempt to adopt this concept in a similar way, but in my opinion, this fails because a variable can have many states, somewhere between ‘var’ and ‘ground’. A type check based on the variable might therefore become too complicated and is unfortunately performed at runtime. It’s also worth questioning to what extent a type-checking package restricts the programmer. The ‘type_check’ package, which implements the Hindley-Milner type system (also present in languages like OCaml or Haskell, and similarly in Coq and Lean), is a very clean implementation of type checking as such, but it could restrict the programmer too much, as certain parameter constructions would no longer be permitted.

An interesting aspect of the ‘type_check’ package is that it performs static type checking, which does not burden the program with runtime type checks.

In my opinion, what could also work is implementing the programmer’s intent regarding typing.

The example above: if ‘obj f()’ and ‘g(obj)’, then ‘g(f())’ is allowed. In Prolog, this might look like: ‘f(X), g(X)’. We might not even need to look closely at the types. We only need to ensure that when ‘f(X)’ is called, then ‘g(X)’ is also permitted. Things get complicated with goals—how do I know if an expression is allowed as a goal? Here, I can help myself by building a constructor predicate that has a named type (kind of like a class name) and then creating predicates that use this name as a parameter. A type definition could look like this: ‘schema(g(obj)). schema(f(obj)).’ That’s how I currently do it in my experiments. I define:

schema(current_input(inputstream)).
schema(read_string(inputstream, delimiter_chars, padchars, used_delimiter_code, read_string)).

schema(forall(goal_cond, goal_action)).
schema(findall(template_fa, goal_fa, bag_fa)).

schema(call(goal)).
schema(call(partial_goal, partial_goal_parameter_1)).

And then a meta-interpreter checks the code.

‘Goal’ might still be too general; a type hierarchy could be built, and arity could also be incorporated, but that’s the basic idea. I am working on something similar compared to what I am proposing here—I try to use knowledge about types to enable shorter expressions for programming. Put simply, if I know that ‘f’ and ‘g’ expect ‘X’ of type ‘obj’, then I only need to write ‘f/1, g/1’. But that’s just a side note.

Perhaps this impulse could steer Prolog typing in the right direction.

yes… this is still not clear to me. This supposedly can never be what you meant to type:

1 is X rem 2

but it is used on purpose and it “works” often enough, even though it feels like a compiler warning or error? Especially if there is a floating point number on the left-hand side?

But how is this different from

process_wait(PID, exit(0))

Is it a good idea to “check exit status” like that? Are those two examples the same or different?

PS: The two examples are different, for a reason I didn’t notice first: process_wait(PID, exit(0)), just like read(a), both have a side-effect that cannot be undone.

I rather find the proliferation of symbols a red herring. This works fine: https://github.com/jp-diegidio/Nan.System.Sources-Prolog/blob/396e54dbb911f2f7a4e8a1324f84c6c05db67bd8/Code/sources_docs.txt Indeed, I’ve never seen the need for ?? or = or other contraptions.

In particular, we don’t change the determinism mode because the predicate is steadfast on an output argument: e.g. we declare read(-Term) is det if it is not steadfast, and read(?Term) is det if it is: with the semantics in the latter case that det becomes (i.e. behaves as) semidet, or multi becomes nondet, if the caller passes an instantiated or partially instantiated Term. Namely, we do not need extra symbols, just clean and clear semantics: the usual semantics…

[MOD: The “system” should stop rewriting my links…]

I think this goal is perfectly fine, though good style would ask for using 1 =:= X rem 2. Using is/2 and if the left side is a float, I think the ideal compiler should say “goal always fails”.

It is (IMO) not different. If your goal is to test whether some output argument unifies to some value you can put that value at the place of the argument … unless the predicate is not stead-fast. Public predicates should IMO always be steadfast. As a result, that also holds for built-ins and library predicates. I’m ok with local “helper” predicates not being steadfast.

Fine. I see from your link

| *Instantiation* | *Direction* | *Argument requirements* |
| =|+|= | Input argument  | Must satisfy _type_ at call time.  May be instantiated further. |
| =|-|= | Output argument | Must be unbound at call time.  Will satisfy _type_ on exit. |
| =|?|= | Output argument | As =|-|= but "steadfast".  (With =nonvar= argument, =det= becomes =semidet=, etc.) |
| =|:|= | Input argument  | As =|+|= but treated as a meta-argument. |
| =|@|= | Input argument  | As =|+|= but not instantiated further. |
| =|!|= | Input argument  | As =|+|= but may be modified destructively. |

First of all, ? is typically not considered an “output” argument. Consider append(?,?,?). Typically ? means “input or output” or “pure” (as in purely logical, i.e., valid in any mode). In other words, it means nothing :slight_smile: That is why = was proposed.

Otherwise I mostly agree. As said though, that makes - pretty much useless as non-steadfast predicates should not exist except for helpers that you normally do not document. Only for predicates such as open/3 it makes sense. As you can not create anything (except for a variable) that will unify with the stream argument, having this argument instantiated will always call open/3 to fail.

+ as input argument is fine, but the relation between instantiation and typing is complicated. We must at least accept a variable as valid input for term or any. If we do so though, it means we do not say anything observable about the argument. Note that the classical meaning is nonvar. The classical meaning does not combine with types though.

: can, unfortunately also be an output argument. Consider clause/2, current_predicate/1, etc.

Not even I am totally satisfied with the phrasing there, although it is an output argument, or the semantics of the predicate is simply not well-defined!

Indeed, think that append(?,?,?) is rather short-hand for append(+,+,?) and all the other combinations: and then again, the phrasing in the docs may be better, you still do not need any additional symbols/modes…

P.S. And short-hands are often just mis-specifications: if you write append(?,?,?), that does/should mean a predicate whose fundamental mode is append(-,-,-), i.e. that enumerates its domain, just steadfast on all arguments. – An exhaustive enumeration that of course append cannot do (is it also going to enumerate all possible Prolog terms for each element?), so that really is the wrong specification. Anyway, that’s just an example.

I disagree:

(ins)?- append( X, Y, Z).
X = [],
Y = Z ;
X = [_A],
Z = [_A|Y] ;
X = [_A, _B],
Z = [_A, _B|Y] ;
X = [_A, _B, _C],
Z = [_A, _B, _C|Y] ;
X = [_A, _B, _C, _D],
Z = [_A, _B, _C, _D|Y] .
...

The plus/3 predicate fails to enumerate all combinations. Maybe we need a better mode declaration. A mode declaration could be by itself an enumeration. Consider:

:- mode( +, +, -); mode( -, +, +); mode( +, -, +).
plus( A; B, C) :- ...

And I think the mode declaration can also be derived for a term like this:

TERM = ( plus( M, N, O), plus( N, O, P))

Then I would know how to fill the variables before calling the term.

The only problem with that is the number of mode combinations which can be too much after a while.

But we can look for shorter definitions in the clpb package there is the card operator which seems to be able to shorten this statement.

So when I say for instance that the number of + in plus parameters has to be 2 and the number of - parameters has to be 1 that could be an approach for a shorter mode definition.

Whether you agree or not, it is a fact that it won’t enumerate all possible lists: those are open lists of distinct variables, one for each possible list length, nothing less nothing more: and of course append does not only handle that, nor a plain append(?,?,?) captures any of it…

That’s not called an “enumeration”, anyway we do already have that at least in code docs: multiple modes for predicates are not an exception, in fact in most cases are the correct way to go, very few predicates are actually “relational in all arguments”…

I know some people on Stack Overflow who would disagree. Also in real life we see quite often “pure” predicates. The simplest being a set of facts. I think mode declarations fit these very poorly and possibly we should have a dedicated annotation that says that the predicate is “pure” (or “relational in all arguments”).

Also on non-pure predicates the number of modes can be pretty high. Think sub_atom/5. Actually, this predicate is “pure” except for the first argument that must be ground. There are no issues with partially instantiated arguments and we (thus) have 16 modes …

How many predicates representing set of facts do you have in the SWI Prolog manual?

I think mode declarations fit these very poorly

Actually, those are the easiest: the simple enumerations.

I think many people are just very confused… Anyway, my 2c.

Can someone please explain what a “steadfast” predicate is? Thanks.

In fact, the term more applies to arguments. Below, the predicate p/1 is steadfast for its argument if the two clauses below always produce the same answers (i.e., regardless of the instantiation of X).

t(X) :- p(X).
t(X) :- p(Y), X = Y.

A typical example of non-steadfast code is below. I.e, calling max(3,5,3) succeeds, while calling max(3,5.X), X = 3 fails.

max(X,Y,X) :- X >= Y, !.
max(_,Y,Y).
1 Like

Thanks, but I’m very confused now. The non-steadfast max/3 doesn’t work like you say and I’m not sure what’s the right instantiation to show what you meant to show:

?- [user].
|: max(X,Y,X) :- X >= Y, !.
|: max(_,Y,Y).
|: 
% user://1 compiled 0.00 sec, 2 clauses
true.

?- max(3,5,3).
false.

?- max(3,5,X), X = 3.
false.

?- listing(max/3).
max(A, B, A) :-
    A>=B,
    !.
max(_, A, A).

true.

@jan ‘s example was not quite right. A correct example is the first definition of integer_integer_maximum at https://stackoverflow.com/questions/39191184/steadfastness-definition-and-its-relation-to-logical-purity-and-termination

The bug with integer_integer_maximum is that it prematurely binds the 3rd argument:

integer_integer_maximum(X, Y, Y) :-
    Y >= X,
    !.
integer_integer_maximum(X, _, X).

… so this is wrong:

?- M = 0, integer_integer_maximum(0, 1, M).
M = 0. % wrong!

… because integer_integer_maximum(0, 1, 0) cannot unify with integer_integer_maximum(X, Y, Y), so its cut does not get applied.

To correct the code, the 3rd argument should be another variable which only gets bound after the cut, i.e.:

integer_integer_maximum_correct(X, Y, M) :-
    Y >= X,
    !,
    M = Y.
integer_integer_maximum_correct(X, _, X).

However - for such code, the alternative style used in the built-in max/3 is simpler and easier:

?- listing(max/3).
max(X, Y, Max) :-
    (   X@<Y
    ->  Max=Y
    ;   Max=X
    ).
1 Like

Thanks @brebs . The code is still wrong, but the given query didn’t show it. This does

?- max(5,3,3).
true.

In general, you often loose steadfastness when unifying “output” arguments before a cut. That is one of the reasons to have => rules. This would be the literal translation for my original program.

max(X,Y,X), X >= Y => true.
max(_,Y,Y) => true.

Now, ?- max(5,3,3). is still true, so it seems we didn’t buy much. But,

?- max(5,3,X).
ERROR: No rule matches max(5,3,_11658)
ERROR: In:
ERROR:   [14] max(5,3,_11706)
ERROR:   [13] toplevel_call(user:user: ...) at /home/jan/src/swipl-devel/build.pgo/home/boot/toplevel.pl:1520
   Exception: (14) max(5, 3, _2062) ?

… the normal use case is now an error because => rules disallow the head unification to instantiate variables. So, it is obvious the predicate is broken. To fix it we need a fresh variable for the output argument, ending up with

max(X,Y,M), X >= Y => M = X.
max(_,Y,M) => M = Y.
1 Like