Improving unit tests

Hi

Apologies for the GDrive permission snag.

Let me just add the code right here (reindent as needed).

/* -*- Mode: Prolog -*- */

/* plunit_reporting.pl --
 * Extra test running and reporting predicates to ease automating
 * multiple tests of similar code.
 *
 * See the file COPYING for copyright and licensing information.
 */

:- module(plunit2,
	  [ run_tests_and_report/1,
	    run_tests_and_report/2
	  ]).
:- use_module(library(plunit)).
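% Add plunit at the start of plunit2's import-module chain, so the
% plunit internals used below (cleanup/0, run_unit/1,
% setup_trap_assertions/1, number_of_clauses/2, ...) resolve without
% being exported or explicitly imported.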
:- add_import_module(plunit2, plunit, start).


%!  run_tests_and_report(-Report) is semidet.
%!  run_tests_and_report(+TestSet, -Report) is semidet.
%
%   Run tests and report about the results.  The predicate
%   run_tests_and_report/1 runs all known tests that are not
%   blocked.  The predicate run_tests_and_report/2 takes a
%   specification of tests to run.  This is either a single
%   specification or a list of specifications.  Each single
%   specification is either the name of a test-unit or a term
%   <test-unit>:<test>, denoting a single test within a unit.
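%
%   For example (hypothetical unit name and counts, shown only for
%   illustration):
%
%       ?- run_tests_and_report(my_unit, report(P, F, FA, B, STO)).
%       P = 10, F = 0, FA = 0, B = 1, STO = 0.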

/* The predicates report_and_cleanup2/2 and report2/5 are named this
 * way to avoid, for the time being, issues with 'discontinuous'
 * settings.
 */

run_tests_and_report(report(Passed,
			    Failed,
			    FailedAssertions,
			    Blocked,
			    STO)) :-
    cleanup,
    setup_call_cleanup(
        setup_trap_assertions(Ref),
        (   run_current_units2,
            %% writeln(">>> Current units run"),
            true
        ),
        report_and_cleanup2(Ref,
			    report(Passed,
				   Failed,
				   FailedAssertions,
				   Blocked,
				   STO))).


run_current_units2 :-
    forall(current_test_set(Set),
           run_unit(Set)),
    check_for_test_errors2.



report_and_cleanup2(Ref,
		    report(Passed,
			   Failed,
			   FailedAssertions,
			   Blocked,
			   STO)) :-
    cleanup_trap_assertions(Ref),
    %% format(">>> After trap cleanup ~w~n", [Ref]),
    report2(Passed,
	    Failed,
	    FailedAssertions,
	    Blocked,
	    STO),
    /*
    format(">>> ~d ~d ~d ~d ~d~n",
           [Passed,
            Failed,
            FailedAssertions,
            Blocked,
            STO]),
    */
    cleanup_after_test2.


run_tests_and_report(Set,
		     report(Passed,
			    Failed,
			    FailedAssertions,
			    Blocked,
			    STO)) :-
    cleanup,
    setup_call_cleanup(
        setup_trap_assertions(Ref),
        (
            %% format(">>> Current unit running ~w~n", [Set]),
            run_unit_and_check_errors2(Set),
            %% format(">>> Current unit run and errors checked ~w~n", [Set]),
            true
        ),
        report_and_cleanup2(Ref,
			    report(Passed,
				   Failed,
				   FailedAssertions,
				   Blocked,
				   STO))).


run_unit_and_check_errors2(Set) :-
    run_unit(Set),
    %% format(">>> RUCE ~w~n", [Set]),
    check_for_test_errors2,
    %% format(">>> RUCE done~n").
    true.


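% Stub that always succeeds; the totals are computed directly from
% plunit's tables in report2/5 below instead.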
check_for_test_errors2 :- true.


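% passed/5, failed/4, failed_assertion/7, blocked/4 and sto/4 are
% plunit's dynamic result tables, reached via the import module set up
% above; number_of_clauses/2 counts their clauses.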
report2(Passed, Failed, FailedAssertion, Blocked, STO) :-
    number_of_clauses(passed/5, Passed),
    number_of_clauses(failed/4, Failed),
    number_of_clauses(failed_assertion/7, FailedAssertion),
    number_of_clauses(blocked/4, Blocked),
    number_of_clauses(sto/4, STO),
    (   Passed+Failed+FailedAssertion+Blocked+STO =:= 0
    ->  info(plunit(no_tests))
    ;   Failed+FailedAssertion+Blocked+STO =:= 0
    ->  report_fixme,
        info(plunit(all_passed(Passed)))
    ;   report_blocked,
        report_fixme,
        report_failed_assertions,
        report_failed,
        report_sto,
        info(plunit(passed(Passed)))
    ).



cleanup_after_test2 :-
    current_test_flag(test_options, Options),
    %% format(">>> Options ~w~n", [Options]),
    option(cleanup(Cleanup), Options, false),
    %% format(">>> Cleanup ~w~n", [Cleanup]),
    (   Cleanup == true
    ->  cleanup
    ;  (
           %%  writeln(">>> Not cleaning up"),
            true
       )
    ).


/* end of file -- plunit_reporting.pl */

I do not use sto(true) and do not change the occurs_check flag.


The like on the reply with the code posting is not because I like the reply, but to increase the likelihood of it being included in the Discourse Activity Summary.

Do I understand correctly that you only want access to the totals? It does strengthen my idea to add

 run_tests(+Set, +Options)

where Set should probably be allowed to be a variable, or something like all to run all tests. Options can do the same as set_test_options/1 and add additional options, such as returning the overall statistics. But also things such as

?- run_tests(_, [jobs(4)]).
?- run_tests(_, [format(tap)]).

etc.

Yes. I want access to the totals. But I do not want to have to parse text of any kind to access them.

My solution works (I am using it, in conjunction with a mostly working home-brewed timeout facility). If you have a different, better one, that will be one thing off my plate.

But the “no parsing” is a must.

What about something like:

?- run_tests(_, [..., report(Passed, Failed, FailedAssertions, Blocked, STO), ...]).

All the best

Marco

I will add some option to return a report on the executed tests. Using messages, however, does not require parsing; message_hook/3 has this signature:

message_hook(+Term, +Kind, +Lines)

Here, Term conveys the meaning of the message as a Prolog term, Kind is the warning, error, etc., and Lines is the (somewhat encoded) text generated from Term. The hook may inspect Term to get access to the data associated with the message and handle it any way it wishes. If it merely wants to print the message to a non-standard location, it can use print_message_lines/3 with the Lines argument.
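As a sketch, assuming the summary terms emitted by report2/5 above (plunit(no_tests), plunit(all_passed(Count)) and plunit(passed(Count)); other plunit versions may emit different terms), a hook that records the totals without suppressing the printed message could look like:

:- dynamic test_summary/1.

% Record a plunit summary term, then fail so that the message is
% still printed through the normal route.
user:message_hook(plunit(Summary), _Kind, _Lines) :-
    summary_term(Summary),
    retractall(test_summary(_)),
    assertz(test_summary(Summary)),
    fail.

% Summary terms as emitted by the code earlier in this thread.
summary_term(no_tests).
summary_term(all_passed(_)).
summary_term(passed(_)).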

We should thank Quintus for this quite neat design for dealing with messages!


If a test hangs, and one presses ^C, then the abort(y/n) prompt is not displayed; perhaps the output should be re-enabled temporarily when ^C is pressed.

The new output is nice, but because the lines overlay each other, only the last test's output shows, and there's no total time. Of course, I can do time(run_tests), but the old plunit used to give a total, I think, and also did so when the tests were automatically run by make. (I might like an option for the outputs not to overlay, so that I get the time for each test separately.)

?- run_tests.
[88/88] nerdle:p61 .......................................... passed (0.004 sec)
% All 88 tests passed
true.

?- time(run_tests).
[88/88] nerdle:p61 .......................................... passed (0.004 sec)
% All 88 tests passed
% 421,243,035 inferences, 40.918 CPU in 40.930 seconds (100% CPU, 10294774 Lips)
true.

Thanks for noting this. Pushed a fix that restores output on ^C. The only downside is that if you use c (continue), output keeps going directly to the terminal for the interrupted test. That is hard to avoid.

That is achieved using

?- set_test_options([format(log)]).

Or

?- run_tests(all, [format(log)]).

The log format is the default if the output is not to a terminal, so we get a full log (hence the name) when running in normal scripting environments. That is one reason for the current changes: build+test logs where a test crashed or got stuck used to be very hard to debug. On the other hand, when running make/0 with a lot of tests, I typically just want to see that they all passed rather than pages of output.

It now also prints a total time. Currently this is wall time, as CPU time is a little harder to compute when using concurrency.

Thanks, it seems to me that it is okay for output to remain directed to the terminal, since we pressed ^C.

get_test_options/1

Shouldn't we have a get_test_options/1 or current_test_options/1? This could be useful for programmatically doing different things when some test option is enabled.
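For instance (purely hypothetical, since the predicate does not exist yet):

% Hypothetical sketch: act on an enabled test option, assuming the
% proposed current_test_options/1 returns the active option list.
note_log_format :-
    current_test_options(Options),
    (   memberchk(format(log), Options)
    ->  format("tests will produce full log output~n")
    ;   true
    ).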

run_tests/0

There seems to be a bug: if you have two or more begin_tests/1 units, run_tests/0 only runs the first test unit; you have to use run_tests(all) to run them all. Shouldn't run_tests/0 be the same as run_tests(all)? A minimal file showing the behaviour is below.
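(Unit and test names are just for illustration.)

:- use_module(library(plunit)).

:- begin_tests(unit_a).
test(a1) :- 1 =:= 1.
:- end_tests(unit_a).

:- begin_tests(unit_b).
test(b1) :- 2 =:= 2.
:- end_tests(unit_b).

% Observed:  ?- run_tests.       only runs unit_a
% Expected:  ?- run_tests(all).  runs both units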

This is a bit of a mess. Part of that is because library(plunit) was portable. I’m afraid that is gone anyway (although the test file syntax is still the same, which is most likely what matters most to users). I am considering replacing the test options with straight Prolog flags called plunit_*, i.e.

 :- set_prolog_flag(plunit_format, log).

That gets what you want and allows people to set defaults in their init file without loading plunit. Would that make sense?

Works fine for me, and in plunit we see

run_tests :-
    run_tests(all).

Some local redefinition?

Hmmm… I restarted SWI-Prolog and the problem went away; it is working now. No idea what caused it; probably some old code still in memory.

I think this works, but I wouldn’t eliminate run_tests(+TestSet,+Options) (with options as the last arg).

I will surely keep that. I’ll also keep set_test_options/1 for compatibility. I would just like to have something cheap to put in your init file or project load file that deals with things such as test output, concurrency, etc.


A small trick for seeing all the tests’ outputs, rather than having them overwrite each other, is to pipe the output to cat. For example, this is what ctest runs for one set of tests I’m working on, with “|cat” added to tell plunit that it’s outputting to a “dumb” terminal:

/home/peter/src/swipl-devel/build/src/swipl "-p" "foreign=" "-f" "none" "--no-packs" "--on-error=status" "-s" "/home/peter/src/swipl-devel/packages/cpp/test_ffi.pl" "-g" "test_ffi" "-t" "halt" | cat

Apologies for the bogus bug report; here’s a real one (using the command env -i PATH=/usr/bin LANG=en_US.UTF-8 /usr/bin/ctest -j8 -V -R cpp:ffi). It only occurs sometimes.

42: Test command: /home/peter/src/swipl-devel/build/src/swipl "-p" "foreign=" "-f" "none" "--no-packs" "--on-error=status" "-s" "/home/peter/src/swipl-devel/packages/cpp/test_ffi.pl" "-g" "test_ffi" "-t" "halt"
42: Test timeout computed to be: 10000000
42: % Start unit: ffi
42: % [1/38] ffi:range1 ..
42:     [[ EXCEPTION while printing message '~`.t ~w (~3f sec)~*|'
42:        with arguments [passed,0.021554946899414062,-8]:
42:        raised: format('no or negative integer for `*\' argument')
42:     ]]
42: 
42: % [2/38] ffi:range2 ..
42:     [[ EXCEPTION while printing message '~`.t ~w (~3f sec)~*|'
42:        with arguments [passed,2.5987625122070312e-5,-8]:
42:        raised: format('no or negative integer for `*\' argument')
42:     ]]

There are some changes in this latest commit to PlUnit: “Include test unit in generating the summary” (SWI-Prolog/packages-plunit@09c90d4 on GitHub).

Those might be causing this regression.

The problem seems to be in plunit:tty_width/2, because under some circumstances, tty_size/2 can return Cols=0 (seems weird, but I’m running under Emacs and it has its own idea about how a shell window should behave; sometimes it returns the correct number and sometimes it returns 0). There are a few obvious work-arounds, but I don’t know which @jan would prefer because tty_columns/2 is used in a few places. Here’s one fix (adding Cols >= 8 to tty_width/1):

tty_width(W) :-
    current_predicate(tty_size/2),
    catch(tty_size(_Rows, Cols), error(_,_), fail),
    Cols >= 8,   % <<< add this line: "8" is used in tty_columns/2
    !,
    W = Cols.
tty_width(80).
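For reference, the failing directive from the log can be reproduced in isolation; with tty_size/2 reporting zero columns, the column argument passed to ~*| presumably comes out as 0 - 8 = -8, and format/2 raises the same error as in the log above:

?- format("~`.t ~w (~3f sec)~*|~n", [passed, 0.0216, -8]).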

Piping indeed stops overwriting. Normally you’ll run this test as

src/swipl ../packages/cpp/test_ffi.pl
?- test_ffi.

If you want log-style output, run

?- run_tests(all, [format(log)]).

Or add to your init file (requires current GIT version)

:- set_prolog_flag(plunit_format, log).

I think I found the wrong format in the message generation. Pushed a fix. Probably the checking of format arguments should be extended to cope with the message-generating DCG conventions. That is a bit harder to do unambiguously, though.

OK, this is probably the real cause. The other is a real patch too, but for a different scenario. Added (demanding a minimum of 25 columns, as it makes little sense with less).