Understanding the output of the profiler

Hi all,

I’m using: SWI-Prolog version 9.2.9

I’m troubleshooting a bottleneck in my application that uses swipl, so I ran the profiler as shown below.

profile_h(Goal) :-
    strip_module(Goal, Module, Pred),
    get_term_arity(Pred, F, A),
    (current_prolog_flag(optimise, false) ->
        writeln(user_error, 'WARNING: Profiling non-optimized code can be misleading')
    ;
        true
    ),
    format(user_error, 'INFO: Profiling ~p~n', [Module:F/A]),
    profile(Goal, [cumulative(false), top(100)]).

and this is the output I get (trimmed for clarity)

INFO: Profiling eclair:unit_work_orig/3
=====================================================================
Total time: 827.587 seconds
=====================================================================
Predicate                       Box Entries =    Calls+Redos     Time
=====================================================================
retract/1                         1,811,405 =1,469,505+341,900  30.5%
eclair:$metrics__value/6          1,308,047 =1,087,301+220,746  27.6%
eclair:parser_type_fqname/2       1,980,085 =1,980,085+0         5.7%
eclair:parser_get_type/2          4,247,369 =4,247,369+0         1.3%
nb_getval/2                     101,436,385 =101,436,385+0         1.1%
$memberchk/3                      9,942,183 =9,942,183+0         0.8%
eclair:parser_decl_fqname/2         272,729 =  272,729+0         0.8%
eclair:fQualType_getDesugaredType/32,819,578 =2,819,578+0         0.7%
retractall/1                        113,964 =  113,964+0         0.7%
term_hash/2                      29,634,269 =29,634,269+0         0.7%
garbage_collect_clauses/0                 1 =        1+0         0.6%
$garbage_collect/1                        7 =        7+0         0.6%
assertz/1                         3,967,256 =3,967,256+0         0.6%
eclair:ast_visitor_has_mode/1    56,679,914 =56,679,914+0         0.5%
eclair:parser_get_redecls/2         991,618 =  991,618+0         0.5%
utilities:matches/3               9,126,278 =8,649,698+476,580   0.5%
eclair:parsed_visitor_enter/2     1,616,883 =  920,024+696,859   0.3%
eclair:matches_approx/5           1,374,712 =1,373,625+1,087     0.3%
eclair:$mcrargpn__arg_expand_data/92,951,751 =1,532,630+1,419,121 0.3%
eclair:parser_expr_is_evaluatable/3 126,309 =  126,309+0         0.3%
eclair:parsed_visitor_exit/2      1,908,236 =  920,024+988,212   0.3%
eclair:regexec/2                  2,421,380 =2,421,380+0         0.3%
eclair:$unuslocl__decl/3            101,300 =   91,450+9,850     0.3%
nb_linkval/2                     21,574,876 =21,574,876+0         0.3%
eclair:g_type_selector__matches_aux/218,446,744,073,713,445,529 =8,250,221+18,446,744,073,705,195,308 0.3%
eclair:parser_loc_get_nspelling_info/43,229,208 =3,229,208+0         0.2%
eclair:parser_token_get_text/2    2,408,468 =2,408,468+0         0.2%
$bags:cleanup_bag/2               3,641,638 =3,626,803+14,835    0.2%
utilities:list_exists/3           1,484,346 =1,484,346+0         0.2%
eclair:has_function_props/10     15,178,388 =15,178,388+0         0.2%
parser:is_node/2                 12,904,999 =12,904,999+0         0.2%
eclair:is_inside_signature_for_any_function_decl/31,685,544 =1,685,544+0         0.2%
eclair:parser_fid_has_info/4      4,729,506 =4,729,506+0         0.2%
eclair:parser_type_add_qualifiers/3 697,502 =  697,502+0         0.2%
eclair:fStmt_getSourceRange/2     1,143,695 =1,143,695+0         0.2%
eclair:fTypeLoc_getSourceRange/2    311,247 =  311,247+0         0.2%
eclair:frame_matches/1           19,187,280 =19,187,280+0         0.2%
eclair:parser_file_get_info/3     2,063,583 =2,063,583+0         0.2%
eclair:g_decl_selector__matches_aux/218,446,744,073,713,234,421 =4,516,608+18,446,744,073,708,717,813 0.2%
eclair:node_has_handle/2          5,960,516 =5,960,516+0         0.2%
eclair:ast_subtree__is_selected/1 5,204,455 =5,204,455+0         0.2%
utilities:list_exists/2           3,625,976 =3,619,550+6,426     0.2%
eclair:matches_alt/5 18,446,744,073,714,797,408 =5,482,643+18,446,744,073,709,314,765 0.2%

Aside from the top offenders, which are not what interests me in this context, I’m trying to understand the reason behind the huge number of retries in e.g. matches_alt.

This is what I gathered so far from interpreting that output, but I may be totally mistaken:

  • It’s not possible that the predicate was redone that many times, especially given the very low cpu time percentage, and also because it’s a multi-clause predicate with at most tens of clauses, so clearly it’s not executed that many times;
  • Perhaps the number is negative, which then prompts the question why the number of retries may go into the negatives, unless somehow it’s due to overflow

I did test the last hypotesis (in a different run, but showing roughly the same behavior) and this is the result:

eclair:matches_alt/518,446,744,073,727,905,772 =20,682,929+18,446,744,073,707,222,843 1.1%
(gdb) print/x 18446744073707222843LL
$4 = 0xffffffffffdc773b
(gdb) print/x ~18446744073707222843LL
$5 = 0x2388c4
(gdb) print/d ~18446744073707222843LL
$6 = 2328772

which seems to confirm the hypotesis that the huge number is just a negative quantity.
Could it be that the number of redos is unsigned and therefore if they are subtracted (too much) it behaves like this? From my tests, the application works correctly, so perhaps it’s just manifesting in the profiler.

Thanks

Add ports(true) to the options. That gives correct port counts. Without, the call ports are fine, but the others vary. The downside is that on some cases of mutual recursion the recorded call tree can become huge. If you are interested primarily in time with some indication of activity, the default is fine. If you want to understand your program, ports(true) is the way to go.

Note that there is also coverage/2 to annotate your program behaviour by line. This relies on source code mapping though and does not work well on programs that use a lot of term_expansion/2 and/or goal_expansion/2.

1 Like

Hi Jan,

I did some experiments with ports(true), but as you indicated it has an excessive impact for most of the usecases. coverage/2 might be interesting to use to find unused code, though there are more than a few uses of goal_expansion/2 and term_expansion/2, so I’m not sure it would be effective. Thanks for the help.

Can you elaborate? Can you provide some information on memory usage? (See SWI-Prolog -- library(statistics): Get information about resource usage ). Having such data for different profiler options should determine whether the profiler is having a significant impact in your case; there may even be lingering bug fixes or optimizations that need attention. (Generate an issue if you think so.)

I also note that garbage collection (including clauses) is run very infrequently suggesting that the automatic mechanisms don’t seem to be reacting how you would expect if memory resources were being depleted.

Further the data in the original post indicates that 30% of the time is spent in retract/1. Assuming the counts for the call port are correct, that’s approximately 30% of 827 sec. for 1.5 million calls, or about 165 microseconds per call. That seems high, or at least worthy of further investigation.

And, as Jan points out, using the ports(true) will consume more memory when the program is mutually recursive (but not immediately recursive). Does your program fit this description? (Called/calling details from the call graph can be used to determine this.) If so, running such a program for 827 seconds (~15 minutes) is probably going to run out of memory at some point - using ports(true) may only be exacerbating an issue that already exists. If so, there may well be a use case that runs out of memory even when the profiler isn’t being used at all.

It’s also not clear (to me) whether you’re interested in general timing information to point to general areas in a large program with unknown performance characteristics that need attention. If so, the default where only the call port count is accurate makes sense and has fairly modest memory requirements.

If you’re more interested in the details at the individual predicate level, then having accurate port counts is useful, e.g., for non-deterministic predicates where clause ordering may be an issue. But in this case, you generally want to work with smaller programs or sub-programs (i.e., not programs that need to run for 15 minutes) where the memory overhead isn’t as big an issue. (This latter case dominates for my particular programs of interest.)