It would be useful to have a tool that does static checks in the prolog source filee like “ruff check” works for python.
For instance “type checking” using the info from comments like
%! length(+List:list, -Length:int) is det.
It would be useful to have a tool that does static checks in the prolog source filee like “ruff check” works for python.
For instance “type checking” using the info from comments like
%! length(+List:list, -Length:int) is det.
Logtalk has a Prolog linter
See "mavis" pack for SWI-Prolog
In addition, there is library(check), providing check/0 and a couple of predicates to check specific issues. E.g, list_undefined/0, which is used by make/0.
There is the compiler that spots some issues such as refining singleton detection to branches (;/2) and spots various issues concerning meaningless calls to built-ins.
For example using ==/2 where one side is known to be a fresh variable (always fails).
Finally, there is the built-in editor PceEmacs, the sweep module for GNU Emacs, SWI-Tinker (the WASM Prolog shell) and SWISH that all expose the syntax highlighting provided by library(prolog_colour). It is not a linter, but it highlights a lot of possible issues with your code.
But no, there is no comprehensive “linter”. Would be great to have!
The direction that for example GCC has taken is to gradually add warnings to the compiler for things that started as linter warnings. But the compiler only looks at the code of course.
My gut feeling is that annotation in “non-code” is a work-around that shouldn’t be necessary in a language like Prolog. But it seems popular.
Eventually, the mode declaration should probably be replaced by formal directives. We have had several discussions on standardizing the mode declarations in the PIP working group. This turns out far harder than expected. To give an idea, classically the mode - means “must be uninstantiated at call and will be instantiated on success”. But, this is practically meaningless. The stream argument of open/3 is one of the few cases where this cleanly applies. Most predicates unify output variables with some value they derives from the system state and/or input variables. You’d like to capture that notion in your documentation. None of the standard modes typically capture this though. Think about read/1. Surely it is valid to call with an instantiated argument. In addition, it may not bind the variable if the term read is a variable. So, the (classical) mode must be read(?), leaving the reader of your documentation completely in the dark on what read/1 does with its argument.
SICStus and SWI-Prolog redefine - to mean “output argument”. That does lead to meaningful documentation. For validators, being static analysis or runtime verification, this is pretty much meaningless.
Joachim Schimpf proposed to use read(=Term) to capture the notion of an “output” argument. This annotation merely claims that the semantics of read(Term) is the same as read(NewVar), Term = NewVar. In SWI-Prolog’s documentation, read/1 should also have been declared det, meaning it has exactly one solution. This clearly is not true though as read(a) can fail. It is true after rewriting to read(NewVar), a = NewVar though.
The = mode fixes confusion about -, but does not resolve the issue that documentation and analysis/verification seem to require different (mode) annotations. If that is indeed the case, having both a formal declaration that is useful for verification and a documentation mode in the (structured) comment might not be too bad …
When we started this discussion I also thought it was simple(r) ![]()
In statically typed programming languages, we often have a kind of struct or class, along with associated functions that take objects of that type as parameters or return them. Based on the type information, we can then infer that the return value of ‘obj f()’ can be passed to ‘g(obj)’. Of course, this is not always straightforward, especially with simple data types, as there may be additional rules that need to be satisfied for such an operation to be allowed. But when we have a class, we also define member functions and a constructor, and it’s quite clear that the ‘this’ pointer or ‘self’ is actually the first parameter of the method and expects this specific object. And the same logic generally applies to other classes, which are passed around elsewhere (not necessarily as parameter 1).
Some Prolog typing packages attempt to adopt this concept in a similar way, but in my opinion, this fails because a variable can have many states, somewhere between ‘var’ and ‘ground’. A type check based on the variable might therefore become too complicated and is unfortunately performed at runtime. It’s also worth questioning to what extent a type-checking package restricts the programmer. The ‘type_check’ package, which implements the Hindley-Milner type system (also present in languages like OCaml or Haskell, and similarly in Coq and Lean), is a very clean implementation of type checking as such, but it could restrict the programmer too much, as certain parameter constructions would no longer be permitted.
An interesting aspect of the ‘type_check’ package is that it performs static type checking, which does not burden the program with runtime type checks.
In my opinion, what could also work is implementing the programmer’s intent regarding typing.
The example above: if ‘obj f()’ and ‘g(obj)’, then ‘g(f())’ is allowed. In Prolog, this might look like: ‘f(X), g(X)’. We might not even need to look closely at the types. We only need to ensure that when ‘f(X)’ is called, then ‘g(X)’ is also permitted. Things get complicated with goals—how do I know if an expression is allowed as a goal? Here, I can help myself by building a constructor predicate that has a named type (kind of like a class name) and then creating predicates that use this name as a parameter. A type definition could look like this: ‘schema(g(obj)). schema(f(obj)).’ That’s how I currently do it in my experiments. I define:
schema(current_input(inputstream)).
schema(read_string(inputstream, delimiter_chars, padchars, used_delimiter_code, read_string)).
schema(forall(goal_cond, goal_action)).
schema(findall(template_fa, goal_fa, bag_fa)).
schema(call(goal)).
schema(call(partial_goal, partial_goal_parameter_1)).
And then a meta-interpreter checks the code.
‘Goal’ might still be too general; a type hierarchy could be built, and arity could also be incorporated, but that’s the basic idea. I am working on something similar compared to what I am proposing here—I try to use knowledge about types to enable shorter expressions for programming. Put simply, if I know that ‘f’ and ‘g’ expect ‘X’ of type ‘obj’, then I only need to write ‘f/1, g/1’. But that’s just a side note.
Perhaps this impulse could steer Prolog typing in the right direction.
yes… this is still not clear to me. This supposedly can never be what you meant to type:
1 is X rem 2
but it is used on purpose and it “works” often enough, even though it feels like a compiler warning or error? Especially if there is a floating point number on the left-hand side?
But how is this different from
process_wait(PID, exit(0))
Is it a good idea to “check exit status” like that? Are those two examples the same or different?
PS: The two examples are different, for a reason I didn’t notice first: process_wait(PID, exit(0)), just like read(a), both have a side-effect that cannot be undone.
I think this goal is perfectly fine, though good style would ask for using 1 =:= X rem 2. Using is/2 and if the left side is a float, I think the ideal compiler should say “goal always fails”.
It is (IMO) not different. If your goal is to test whether some output argument unifies to some value you can put that value at the place of the argument … unless the predicate is not stead-fast. Public predicates should IMO always be steadfast. As a result, that also holds for built-ins and library predicates. I’m ok with local “helper” predicates not being steadfast.
Fine. I see from your link
| *Instantiation* | *Direction* | *Argument requirements* |
| =|+|= | Input argument | Must satisfy _type_ at call time. May be instantiated further. |
| =|-|= | Output argument | Must be unbound at call time. Will satisfy _type_ on exit. |
| =|?|= | Output argument | As =|-|= but "steadfast". (With =nonvar= argument, =det= becomes =semidet=, etc.) |
| =|:|= | Input argument | As =|+|= but treated as a meta-argument. |
| =|@|= | Input argument | As =|+|= but not instantiated further. |
| =|!|= | Input argument | As =|+|= but may be modified destructively. |
First of all, ? is typically not considered an “output” argument. Consider append(?,?,?). Typically ? means “input or output” or “pure” (as in purely logical, i.e., valid in any mode). In other words, it means nothing
That is why = was proposed.
Otherwise I mostly agree. As said though, that makes - pretty much useless as non-steadfast predicates should not exist except for helpers that you normally do not document. Only for predicates such as open/3 it makes sense. As you can not create anything (except for a variable) that will unify with the stream argument, having this argument instantiated will always call open/3 to fail.
+ as input argument is fine, but the relation between instantiation and typing is complicated. We must at least accept a variable as valid input for term or any. If we do so though, it means we do not say anything observable about the argument. Note that the classical meaning is nonvar. The classical meaning does not combine with types though.
: can, unfortunately also be an output argument. Consider clause/2, current_predicate/1, etc.
I disagree:
(ins)?- append( X, Y, Z).
X = [],
Y = Z ;
X = [_A],
Z = [_A|Y] ;
X = [_A, _B],
Z = [_A, _B|Y] ;
X = [_A, _B, _C],
Z = [_A, _B, _C|Y] ;
X = [_A, _B, _C, _D],
Z = [_A, _B, _C, _D|Y] .
...
The plus/3 predicate fails to enumerate all combinations. Maybe we need a better mode declaration. A mode declaration could be by itself an enumeration. Consider:
:- mode( +, +, -); mode( -, +, +); mode( +, -, +).
plus( A; B, C) :- ...
And I think the mode declaration can also be derived for a term like this:
TERM = ( plus( M, N, O), plus( N, O, P))
Then I would know how to fill the variables before calling the term.
The only problem with that is the number of mode combinations which can be too much after a while.
But we can look for shorter definitions in the clpb package there is the card operator which seems to be able to shorten this statement.
So when I say for instance that the number of + in plus parameters has to be 2 and the number of - parameters has to be 1 that could be an approach for a shorter mode definition.
I know some people on Stack Overflow who would disagree. Also in real life we see quite often “pure” predicates. The simplest being a set of facts. I think mode declarations fit these very poorly and possibly we should have a dedicated annotation that says that the predicate is “pure” (or “relational in all arguments”).
Also on non-pure predicates the number of modes can be pretty high. Think sub_atom/5. Actually, this predicate is “pure” except for the first argument that must be ground. There are no issues with partially instantiated arguments and we (thus) have 16 modes …
Can someone please explain what a “steadfast” predicate is? Thanks.
In fact, the term more applies to arguments. Below, the predicate p/1 is steadfast for its argument if the two clauses below always produce the same answers (i.e., regardless of the instantiation of X).
t(X) :- p(X).
t(X) :- p(Y), X = Y.
A typical example of non-steadfast code is below. I.e, calling max(3,5,3) succeeds, while calling max(3,5.X), X = 3 fails.
max(X,Y,X) :- X >= Y, !.
max(_,Y,Y).
Thanks, but I’m very confused now. The non-steadfast max/3 doesn’t work like you say and I’m not sure what’s the right instantiation to show what you meant to show:
?- [user].
|: max(X,Y,X) :- X >= Y, !.
|: max(_,Y,Y).
|:
% user://1 compiled 0.00 sec, 2 clauses
true.
?- max(3,5,3).
false.
?- max(3,5,X), X = 3.
false.
?- listing(max/3).
max(A, B, A) :-
A>=B,
!.
max(_, A, A).
true.
@jan ‘s example was not quite right. A correct example is the first definition of integer_integer_maximum at https://stackoverflow.com/questions/39191184/steadfastness-definition-and-its-relation-to-logical-purity-and-termination
The bug with integer_integer_maximum is that it prematurely binds the 3rd argument:
integer_integer_maximum(X, Y, Y) :-
Y >= X,
!.
integer_integer_maximum(X, _, X).
… so this is wrong:
?- M = 0, integer_integer_maximum(0, 1, M).
M = 0. % wrong!
… because integer_integer_maximum(0, 1, 0) cannot unify with integer_integer_maximum(X, Y, Y), so its cut does not get applied.
To correct the code, the 3rd argument should be another variable which only gets bound after the cut, i.e.:
integer_integer_maximum_correct(X, Y, M) :-
Y >= X,
!,
M = Y.
integer_integer_maximum_correct(X, _, X).
However - for such code, the alternative style used in the built-in max/3 is simpler and easier:
?- listing(max/3).
max(X, Y, Max) :-
( X@<Y
-> Max=Y
; Max=X
).
Thanks @brebs . The code is still wrong, but the given query didn’t show it. This does
?- max(5,3,3).
true.
In general, you often loose steadfastness when unifying “output” arguments before a cut. That is one of the reasons to have => rules. This would be the literal translation for my original program.
max(X,Y,X), X >= Y => true.
max(_,Y,Y) => true.
Now, ?- max(5,3,3). is still true, so it seems we didn’t buy much. But,
?- max(5,3,X).
ERROR: No rule matches max(5,3,_11658)
ERROR: In:
ERROR: [14] max(5,3,_11706)
ERROR: [13] toplevel_call(user:user: ...) at /home/jan/src/swipl-devel/build.pgo/home/boot/toplevel.pl:1520
Exception: (14) max(5, 3, _2062) ?
… the normal use case is now an error because => rules disallow the head unification to instantiate variables. So, it is obvious the predicate is broken. To fix it we need a fresh variable for the output argument, ending up with
max(X,Y,M), X >= Y => M = X.
max(_,Y,M) => M = Y.
But how is steadfastness in nondet predicates? Is the handling of attributed variables for instance like clpfd it does also included?
Or predicates which can provide indefinite many solutions like length?
length( L, 3) vs. length( L, LEN), LEN=3
At a first glance it provides at least the same solution.
length/2 is intentionally designed to be helpful, when the length is unspecified. Prolog uses backtracking, so a Length variable which is only instantiated after length/2 has been called, is not in scope (i.e. is unavailable) to length/2. A more obvious example is:
?- format('A equals ~w~n', [A]), A = 5.
A equals _8072
A = 5.
The order of the constraints is critically important, to be able to satisfy the intentions of the program.
The usual intentions of a program are to work “correctly” (i.e. as the programmer intended), and be reasonably performant.
If different behaviour is desired, then we can create our own predicates (e.g. list_length at https://ridgeworks.github.io/clpBNR/CLP_BNR_Guide/CLP_BNR_Guide.html ), or e.g. at least prevent going off into infinity:
?- between(2, 3, Len), length(List, Len).
Len = 2,
List = [_, _] ;
Len = 3,
List = [_, _, _].
So, “at first glance” is a desirable aim for program understandability, but sometimes unachievable because there is always the implicit hard requirement that the person already appreciates the foibles of the language - e.g. compare/3 when given vars:
?- compare(C, A, B).
C = (<).
Thanks. This is an interesting case. And no, according to the definition length/2 is not steadfast. Yes, it produces the correct solution, but (1) it does so much slower than when called as length(L, 3) and (2) it runs out of stack on backtracking rather than being deterministic.
In my current view on modes, I think it should be descriped as
I.e., if the first argument is a list, it is steadfast in the 2nd and visa versa. Otherwise it enumerates pairs of lists and integers.
length/2 is a pretty complicated predicate. Just try implementing it correctly in Prolog, providing all solutions when needed and determinism when possible ![]()
But we can make it steadfast :
?- when( (nonvar(L); ground(LEN) ), length( L, LEN)), member( LEN, [2,3]).
L = [_, _],
LEN = 2 ;
L = [_, _, _],
LEN = 3.
Does that fit the definition?
This goes off into infinity, though:
?- when( (nonvar(L); ground(LEN) ), length( L, LEN)), L = [_|_], member( LEN, [2,3]).
… because nonvar(L) is triggered by the L = [_|_].
This works better:
?- when( ground(LEN), length( L, LEN)), L = [_|_], member( LEN, [2,3]).
LEN = 2,
L = [_, _] ;
LEN = 3,
L = [_, _, _].