Improving unit tests

Hi

Apologies for the GDrive permission snag.

Let me just add the code right here (reindent as needed).

/* -*- Mode: Prolog -*- */

/* plunit_reporting.pl --
 * Extra test running and reporting predicates to ease automating
 * multiple tests of similar code.
 *
 * See the file COPYING for copyright and licensing information.
 */

:- module(plunit2,
	  [ run_tests_and_report/1,
	    run_tests_and_report/2
	  ]).
:- use_module(library(plunit)).
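% Add plunit at the start of plunit2's import-module chain, so the
% plunit internals used below (cleanup/0, run_unit/1,
% setup_trap_assertions/1, number_of_clauses/2, ...) resolve without
% being exported or explicitly imported.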
:- add_import_module(plunit2, plunit, start).


%!  run_tests_and_report(-Report) is semidet.
%!  run_tests_and_report(+TestSet, -Report) is semidet.
%
%   Run tests and report about the results.  The predicate
%   run_tests_and_report/1 runs all known tests that are not
%   blocked.  The predicate run_tests_and_report/2 takes a
%   specification of tests to run.  This is either a single
%   specification or a list of specifications.  Each single
%   specification is either the name of a test-unit or a term
%   <test-unit>:<test>, denoting a single test within a unit.
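%
%   For example (hypothetical unit name and counts, shown only for
%   illustration):
%
%       ?- run_tests_and_report(my_unit, report(P, F, FA, B, STO)).
%       P = 10, F = 0, FA = 0, B = 1, STO = 0.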

/* The predicates report_and_cleanup2/2 and report2/5 are named this
 * way to avoid, for the time being, issues with 'discontinuous'
 * settings.
 */

run_tests_and_report(report(Passed,
			    Failed,
			    FailedAssertions,
			    Blocked,
			    STO)) :-
    cleanup,
    setup_call_cleanup(
        setup_trap_assertions(Ref),
        (   run_current_units2,
            %% writeln(">>> Current units run"),
            true
        ),
        report_and_cleanup2(Ref,
			    report(Passed,
				   Failed,
				   FailedAssertions,
				   Blocked,
				   STO))).


run_current_units2 :-
    forall(current_test_set(Set),
           run_unit(Set)),
    check_for_test_errors2.



report_and_cleanup2(Ref,
		    report(Passed,
			   Failed,
			   FailedAssertions,
			   Blocked,
			   STO)) :-
    cleanup_trap_assertions(Ref),
    %% format(">>> After trap cleanup ~w~n", [Ref]),
    report2(Passed,
	    Failed,
	    FailedAssertions,
	    Blocked,
	    STO),
    /*
    format(">>> ~d ~d ~d ~d ~d~n",
           [Passed,
            Failed,
            FailedAssertions,
            Blocked,
            STO]),
    */
    cleanup_after_test2.


run_tests_and_report(Set,
		     report(Passed,
			    Failed,
			    FailedAssertions,
			    Blocked,
			    STO)) :-
    cleanup,
    setup_call_cleanup(
        setup_trap_assertions(Ref),
        (
            %% format(">>> Current unit running ~w~n", [Set]),
            run_unit_and_check_errors2(Set),
            %% format(">>> Current unit run and errors checked ~w~n", [Set]),
            true
        ),
        report_and_cleanup2(Ref,
			    report(Passed,
				   Failed,
				   FailedAssertions,
				   Blocked,
				   STO))).


run_unit_and_check_errors2(Set) :-
    run_unit(Set),
    %% format(">>> RUCE ~w~n", [Set]),
    check_for_test_errors2,
    %% format(">>> RUCE done~n").
    true.


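% Stub that always succeeds; the totals are computed directly from
% plunit's tables in report2/5 below instead.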
check_for_test_errors2 :- true.


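% passed/5, failed/4, failed_assertion/7, blocked/4 and sto/4 are
% plunit's dynamic result tables, reached via the import module set up
% above; number_of_clauses/2 counts their clauses.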
report2(Passed, Failed, FailedAssertion, Blocked, STO) :-
    number_of_clauses(passed/5, Passed),
    number_of_clauses(failed/4, Failed),
    number_of_clauses(failed_assertion/7, FailedAssertion),
    number_of_clauses(blocked/4, Blocked),
    number_of_clauses(sto/4, STO),
    (   Passed+Failed+FailedAssertion+Blocked+STO =:= 0
    ->  info(plunit(no_tests))
    ;   Failed+FailedAssertion+Blocked+STO =:= 0
    ->  report_fixme,
        info(plunit(all_passed(Passed)))
    ;   report_blocked,
        report_fixme,
        report_failed_assertions,
        report_failed,
        report_sto,
        info(plunit(passed(Passed)))
    ).



cleanup_after_test2 :-
    current_test_flag(test_options, Options),
    %% format(">>> Options ~w~n", [Options]),
    option(cleanup(Cleanup), Options, false),
    %% format(">>> Cleanup ~w~n", [Cleanup]),
    (   Cleanup == true
    ->  cleanup
    ;  (
           %%  writeln(">>> Not cleaning up"),
            true
       )
    ).


/* end of file -- plunit_reporting.pl */

I do not use sto(true) and do not change the occurs_check flag.


The like on the reply with the code posting is not because I like the reply, but to increase the likelihood of it being included in the Discourse Activity Summary.

Do I understand correctly that you only want access to the totals? It does strengthen my idea to add

 run_tests(+Set, +Options)

where Set should probably be allowed to be a variable, or something like all to run all tests. Options can do the same as set_test_options/1 and add additional options, such as returning the overall statistics. But also things such as

?- run_tests(_, [jobs(4)]).
?- run_tests(_, [format(tap)]).

etc.

Yes. I want access to the totals. But I do not want to have to parse text of any kind to access them.

My solution works (I am using it, in conjunction with a mostly working home-brewed timeout facility). If you have a different, better one, that will be one thing off my plate.

But the “no parsing” is a must.

What about something like:

?- run_tests(_, [..., report(Passed, Failed, FailedAssertions, Blocked, STO), ...]).

All the best

Marco

I will add some option to return a report on the executed tests. Using messages, however, does not require parsing; message_hook/3 has this signature:

message_hook(+Term, +Kind, +Lines)

Here, Term conveys the meaning of the message as a Prolog term, Kind is the warning, error, etc., and Lines is the (somewhat encoded) text generated from Term. The hook may inspect Term to get access to the data associated with the message and handle it any way it wishes. If it merely wants to print the message to a non-standard location, it can use print_message_lines/3 with the Lines argument.
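As a sketch, assuming the summary terms emitted by report2/5 above (plunit(no_tests), plunit(all_passed(Count)) and plunit(passed(Count)); other plunit versions may emit different terms), a hook that records the totals without suppressing the printed message could look like:

:- dynamic test_summary/1.

% Record a plunit summary term, then fail so that the message is
% still printed through the normal route.
user:message_hook(plunit(Summary), _Kind, _Lines) :-
    summary_term(Summary),
    retractall(test_summary(_)),
    assertz(test_summary(Summary)),
    fail.

% Summary terms as emitted by the code earlier in this thread.
summary_term(no_tests).
summary_term(all_passed(_)).
summary_term(passed(_)).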

We should thank Quintus for this quite neat design for dealing with messages!


If a test hangs, and one presses ^C, then the abort(y/n) prompt is not displayed; perhaps the output should be re-enabled temporarily when ^C is pressed.

The new output is nice, but because the lines overlay each other, only the last test's output shows, and there's no total time. Of course, I can do time(run_tests), but the old plunit used to give a total, I think, and also did so when the tests were automatically run by make. (I might like an option for the outputs not to overlay, so that I get the time for each test separately.)

?- run_tests.
[88/88] nerdle:p61 .......................................... passed (0.004 sec)
% All 88 tests passed
true.

?- time(run_tests).
[88/88] nerdle:p61 .......................................... passed (0.004 sec)
% All 88 tests passed
% 421,243,035 inferences, 40.918 CPU in 40.930 seconds (100% CPU, 10294774 Lips)
true.

Thanks for noting this. Pushed a fix that restores output on ^C. The only downside is that if you use c (continue), output keeps going directly to the terminal for the interrupted test. That is hard to avoid.

That is achieved using

?- set_test_options([format(log)]).

Or

?- run_tests(all, [format(log)]).

The log format is the default if the output is not to a terminal, so we get a full log (hence the name) when running in normal scripting environments. That is one reason for the current changes: build+test logs where a test crashed or got stuck used to be very hard to debug. On the other hand, when running make/0 with a lot of tests, I typically just want to see that they all passed rather than pages of output.

It now also prints a total time. Currently this is wall time, as CPU time is a little harder to compute when using concurrency.

Thanks, it seems to me that it is okay for output to remain directed to the terminal, since we pressed ^C.

get_test_options/1

Shouldn't we have a get_test_options/1 or current_test_options/1? This could be useful for programmatically doing different things when some test option is enabled.
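For instance (purely hypothetical, since the predicate does not exist yet):

% Hypothetical sketch: act on an enabled test option, assuming the
% proposed current_test_options/1 returns the active option list.
note_log_format :-
    current_test_options(Options),
    (   memberchk(format(log), Options)
    ->  format("tests will produce full log output~n")
    ;   true
    ).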

run_tests/0

There seems to be a bug: if you have two or more begin_tests/1 units, run_tests/0 only runs the first test unit; you have to use run_tests(all) to run them all. Shouldn't run_tests/0 be the same as run_tests(all)? A minimal file showing the behaviour is below.
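(Unit and test names are just for illustration.)

:- use_module(library(plunit)).

:- begin_tests(unit_a).
test(a1) :- 1 =:= 1.
:- end_tests(unit_a).

:- begin_tests(unit_b).
test(b1) :- 2 =:= 2.
:- end_tests(unit_b).

% Observed:  ?- run_tests.       only runs unit_a
% Expected:  ?- run_tests(all).  runs both units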

This is a bit of a mess. Part of that is because library(plunit) was portable. I’m afraid that is gone anyway (although the test file syntax is still the same, which is most likely what matters most to users). I am considering replacing the test options with straight Prolog flags called plunit_*, i.e.

 :- set_prolog_flag(plunit_format, log).

That gets what you want and allows people to set defaults in their init file without loading plunit. Would that make sense?

Works fine for me, and in plunit we see

run_tests :-
    run_tests(all).

Some local redefinition?

Hmmm… I restarted SWI-Prolog and the problem went away; it is working now. No idea what caused it; probably some old code still in memory.

I think this works, but I wouldn’t eliminate run_tests(+TestSet,+Options) (with options as the last arg).

I will surely keep that. I’ll also keep set_test_options/1 for compatibility. I would just like to have something cheap to put in your init file or project load file that deals with things such as test output, concurrency, etc.


A small trick for seeing all the tests’ outputs, rather than having them overwrite each other, is to pipe the output to cat. For example, this is what ctest runs for one set of tests I’m working on, with “|cat” added to tell plunit that it’s outputting to a “dumb” terminal:

/home/peter/src/swipl-devel/build/src/swipl "-p" "foreign=" "-f" "none" "--no-packs" "--on-error=status" "-s" "/home/peter/src/swipl-devel/packages/cpp/test_ffi.pl" "-g" "test_ffi" "-t" "halt" | cat

Apologies for the bogus bug report; here’s a real one (using the command env -i PATH=/usr/bin LANG=en_US.UTF-8 /usr/bin/ctest -j8 -V -R cpp:ffi). It only occurs sometimes.

42: Test command: /home/peter/src/swipl-devel/build/src/swipl "-p" "foreign=" "-f" "none" "--no-packs" "--on-error=status" "-s" "/home/peter/src/swipl-devel/packages/cpp/test_ffi.pl" "-g" "test_ffi" "-t" "halt"
42: Test timeout computed to be: 10000000
42: % Start unit: ffi
42: % [1/38] ffi:range1 ..
42:     [[ EXCEPTION while printing message '~`.t ~w (~3f sec)~*|'
42:        with arguments [passed,0.021554946899414062,-8]:
42:        raised: format('no or negative integer for `*\' argument')
42:     ]]
42: 
42: % [2/38] ffi:range2 ..
42:     [[ EXCEPTION while printing message '~`.t ~w (~3f sec)~*|'
42:        with arguments [passed,2.5987625122070312e-5,-8]:
42:        raised: format('no or negative integer for `*\' argument')
42:     ]]

There are some changes in this latest commit to PlUnit: “Include test unit in generating the summary” (SWI-Prolog/packages-plunit@09c90d4 on GitHub).

Those might be causing this regression.

The problem seems to be in plunit:tty_width/2, because under some circumstances, tty_size/2 can return Cols=0 (seems weird, but I’m running under Emacs and it has its own idea about how a shell window should behave; sometimes it returns the correct number and sometimes it returns 0). There are a few obvious work-arounds, but I don’t know which @jan would prefer because tty_columns/2 is used in a few places. Here’s one fix (adding Cols >= 8 to tty_width/1):

tty_width(W) :-
    current_predicate(tty_size/2),
    catch(tty_size(_Rows, Cols), error(_,_), fail),
    Cols >= 8,   % <<< add this line: "8" is used in tty_columns/2
    !,
    W = Cols.
tty_width(80).
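For reference, the failing directive from the log can be reproduced in isolation; with tty_size/2 reporting zero columns, the column argument passed to ~*| presumably comes out as 0 - 8 = -8, and format/2 raises the same error as in the log above:

?- format("~`.t ~w (~3f sec)~*|~n", [passed, 0.0216, -8]).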

Piping indeed stops overwriting. Normally you’ll run this test as

src/swipl ../packages/cpp/test_ffi.pl
?- test_ffi.

If you want log-style output, run

?- run_tests(all, [format(log)]).

Or add to your init file (requires current GIT version)

:- set_prolog_flag(plunit_format, log).

I think I found the wrong format in the message generation. Pushed a fix. Probably the checking of format arguments should be extended to cope with the message-generating DCG conventions. That is a bit harder to do unambiguously, though.

OK, this is probably the real cause. The other is a real patch too, but for a different scenario. Added (demanding a minimum of 25 columns, as it makes little sense with less).