Database micro-benchmark

A quick micro-benchmark for the different database mechanisms (treated as a key store). These database mechanisms have different advantages and disadvantages (backtracking, erasing methods, logical update view, etc) but I just wanted to get a rough idea.
I don’t like micro-benchmarks too much, but it is useful for a relative comparison:

With SWI-Prolog, linux, threaded, 64 bits, version 8.1.13-27-gd480a2001-:

14 ?- go.
rec_gen
% 2,000,001 inferences, 0.377 CPU in 0.377 seconds (100% CPU, 5305095 Lips)
rec_lookup
% 2,000,001 inferences, 0.355 CPU in 0.355 seconds (100% CPU, 5633298 Lips)
rec_lookup (2nd time)
% 2,000,001 inferences, 0.356 CPU in 0.356 seconds (100% CPU, 5620478 Lips)
nb_gen
% 4,000,001 inferences, 0.255 CPU in 0.255 seconds (100% CPU, 15704181 Lips)
nb_lookup
% 2,000,001 inferences, 0.131 CPU in 0.132 seconds (99% CPU, 15222120 Lips)
nb_lookup (2nd time)
% 2,000,001 inferences, 0.131 CPU in 0.131 seconds (100% CPU, 15274105 Lips)
asrt_gen
% 2,000,001 inferences, 0.604 CPU in 0.607 seconds (100% CPU, 3313618 Lips)
asrt_lookup
% 2,000,001 inferences, 1.516 CPU in 1.523 seconds (100% CPU, 1318850 Lips)
asrt_lookup (2nd time)
% 2,000,001 inferences, 0.381 CPU in 0.382 seconds (100% CPU, 5254804 Lips)
true.

So nb_getval and nb_setval are the fastest as would be expected (with the limitation of only one value per key). The record database is fast on the first run of lookups and maintains the same speed on further lookups. The assert database is slow on the first lookup, but reaches record db speeds afterwards.

I wonder if the assert lookup speeds up considerably on the second run because of JIT indexing?

With the stable version (linux, threaded, 64 bits, version 8.0.3) things are faster in general:

2 ?- go.
rec_gen
% 2,000,002 inferences, 0.313 CPU in 0.313 seconds (100% CPU, 6394146 Lips)
rec_lookup
% 2,000,002 inferences, 0.262 CPU in 0.263 seconds (100% CPU, 7621965 Lips)
rec_lookup (2nd time)
% 2,000,002 inferences, 0.259 CPU in 0.260 seconds (100% CPU, 7711132 Lips)
nb_gen
% 4,000,002 inferences, 0.168 CPU in 0.168 seconds (100% CPU, 23870140 Lips)
nb_lookup
% 2,000,002 inferences, 0.101 CPU in 0.101 seconds (100% CPU, 19743315 Lips)
nb_lookup (2nd time)
% 2,000,002 inferences, 0.101 CPU in 0.101 seconds (100% CPU, 19876580 Lips)
asrt_gen
% 2,000,002 inferences, 0.626 CPU in 0.628 seconds (100% CPU, 3194608 Lips)
asrt_lookup
% 2,000,002 inferences, 0.904 CPU in 0.906 seconds (100% CPU, 2212180 Lips)
asrt_lookup (2nd time)
% 2,000,002 inferences, 0.289 CPU in 0.289 seconds (100% CPU, 6931781 Lips)
true.

I guess the (dynamic) tabling code is making thinks a little bit slower in the dev version.

Here is the micro-benchmark code (not perfect, but enough for a rough idea) would like to see what you get on your machine:

go :-
   test(rec),
   test(nb),
   test(asrt).

test(Name) :-
   N = 1 000 000,
   atomic_list_concat([Name,'_',gen],Gen),
   atomic_list_concat([Name,'_',lookup],Lookup),
   atomic_list_concat([Name,'_',clear],Clear),
   ignore(call(Clear,N)),
   writeln(Gen),
   time( call(Gen,N) ),
   writeln(Lookup),
   time( call(Lookup,N) ),
   write(Lookup), writeln(' (2nd time)'),
   time( call(Lookup,N) ).

rec_clear(N) :-
   forall(  between(1,N,I),
            (  recorded(I, I, Ref), erase(Ref)
	    )
   ).


rec_gen(N) :-
   forall(  between(1,N,I),
            (  recordz(I, I)
	    )
   ).


rec_lookup(N) :-
   forall(  between(1,N,I),
            (  recorded(I, I)
	    )
   ).

nb_clear(_).

nb_gen(N) :-
   forall(  between(1,N,I),
            %(  atom_number(A,I), nb_setval(A, I)
            (  nb_setval('500', I)
	    )
   ).

nb_lookup(N) :-
   forall(  between(1,N,_),
            (  nb_getval('500', _)
	    )
   ).

asrt_clear(_) :-
   retractall(keydb(_,_)).

asrt_gen(N) :-
   %retractall(keydb(_,_)),
   forall(  between(1,N,I),
            (  assertz(keydb(I,I))
	    )
   ).

asrt_lookup(N) :-
   forall(  between(1,N,I),
            (  keydb(I,I)
	    )
   ).

EDIT: the difference between stable and dev is not much, so it could be within the statistical error.

4 Likes

Well done. Would love to see it in SWISH, with graphical reporting.

1 Like

The links for those of who like to click.
NB Updated version of code appears in later post.

SWI-Prolog Database

The recorded database

Global variables

Managing (dynamic) predicates

Consult

Trie

Any chance of seeing the following added?

Hard coded facts read in via consult/1
Tries
Association lists
ODBC Interface

EDIT

With dynamic predicates/facts and later loading via consult/1, instead consider the use of library(persistency). I don’t expect that to be added, but am noting it for my use or others who want to explore more possibilities along the line of thought you are exploring.

While not directly related to accessing data in a Prolog database or data structure, another concept of note is quick load files. (Timing examples)

Yes

No. Anyway, that would only affect the assert/retract case. The hooks for dealing with incremental udates of tables are realised using different preample (supervisor) code for incremental dynamic predicates. There is a very quick check for a flag in assert/retract to call update hooks. These really matter for dynamic incremental predicates, but should not be measurable otherwise.

There is a general extra price for meta-calling due to the wrap_predicate/4 primitive. The effect seems pretty small though.

I cannot reproduce your figures. Over the whole run I get 3.78 (8.13) vs. 3.67 (8.03). This figure varies a bit between runs and boils down to 3%. Note that re-compiling the system with trivial changes often leads to a few percent difference, I guess due to different cache behavior.

Note that loading library(apply_macros) makes more difference as this inlines forall/2 rather than meta-calling it. Also using scripts/pgo-compile.sh to build the system is likely to make more difference.

Did you really compile on the same machine with the same compiler and same configuration?

1 Like

Didn’t know this, thanks!

You’re right, I used the pre-built distro binary for stable; when I compile the stable myself I don’t see a significant difference between stable and dev.

I added tries and consult, this is what I get on
8.1.13-27-gd480a2001:

9 ?- go.
Database raw and crude micro benchmark
   -> Testing 1,000,000 entries
rec_gen
% 0.468 CPU in 0.469 seconds (100% CPU)
rec_lookup
% 0.426 CPU in 0.427 seconds (100% CPU)
rec_lookup (2nd time)
% 0.422 CPU in 0.423 seconds (100% CPU)
nb_gen
% 0.257 CPU in 0.257 seconds (100% CPU)
nb_lookup
% 0.133 CPU in 0.133 seconds (100% CPU)
nb_lookup (2nd time)
% 0.131 CPU in 0.131 seconds (100% CPU)
asrt_gen
% 0.487 CPU in 0.487 seconds (100% CPU)
asrt_lookup
% 1.366 CPU in 1.369 seconds (100% CPU)
asrt_lookup (2nd time)
% 0.395 CPU in 0.396 seconds (100% CPU)
consult_gen
% 19.954 CPU in 19.979 seconds (100% CPU)
consult_lookup
% 0.300 CPU in 0.301 seconds (100% CPU)
consult_lookup (2nd time)
% 0.300 CPU in 0.301 seconds (100% CPU)
trie_gen
% 0.493 CPU in 0.494 seconds (100% CPU)
trie_lookup
% 0.273 CPU in 0.273 seconds (100% CPU)
trie_lookup (2nd time)
% 0.274 CPU in 0.274 seconds (100% CPU)

          Database raw and crude micro benchmark           
                     1,000,000 entries                     
Database  Operation                              Wall time
--------  ---------                              ---------
nb        1st lookup  .......................... 0.133 secs.
trie      1st lookup  .......................... 0.273 secs.
consult   1st lookup  .......................... 0.301 secs.
rec       1st lookup  .......................... 0.427 secs.
asrt      1st lookup  .......................... 1.369 secs.

nb        2nd lookup  .......................... 0.131 secs.
trie      2nd lookup  .......................... 0.274 secs.
consult   2nd lookup  .......................... 0.301 secs.
asrt      2nd lookup  .......................... 0.396 secs.
rec       2nd lookup  .......................... 0.423 secs.

nb        insert      .......................... 0.257 secs.
rec       insert      .......................... 0.469 secs.
asrt      insert      .......................... 0.487 secs.
trie      insert      .......................... 0.494 secs.
consult   insert      .......................... 19.979 secs.
true.

The new code which adds consult and tries, prints the nice table, and stores the results as db_bench facts (for further processing):

:- dynamic keydb/2,
           trie/1,
	   consult_db/2,
	   consult_file/1,
	   db_bench/2.

:- use_module(library(assoc)).
:- use_module(library(apply_macros)).

go :-
   writeln('Database raw and crude micro benchmark'),
   N = 1 000 000,
   format('   -> Testing ~:d entries~n',[N]),
   retractall(db_bench(_,_)),
   test_db(rec,N),
   test_db(nb,N),
   test_db(asrt,N),
   test_db(consult,N),
   test_db(trie,N),
   print_header(N),
   print_results.


test_db(Name,N) :-
   atomic_list_concat([Name,'_',gen],Gen),
   atomic_list_concat([Name,'_',lookup],Lookup),
   atomic_list_concat([Name,'_',cleanup],Cleanup),

   setup_call_cleanup(
      (  writeln(Gen),
         benchmark_goal( [Name,insert], call(Gen,N))
      ),
      (  writeln(Lookup),
         benchmark_goal( [Name,'1st lookup'], call(Lookup,N)),
         write(Lookup), writeln(' (2nd time)'),
         benchmark_goal( [Name,'2nd lookup'], call(Lookup,N))
      ),
      (  catch( call(Cleanup,N),
                error(existence_error(procedure,_),_),
	        true)
      )
   ) .




                         /**********************
                          *    Db operations   *
                          **********************/
rec_gen(N) :-
   forall(  between(1,N,I),
            (  recordz(I, I)
	    )
   ).

rec_lookup(N) :-
   forall(  between(1,N,I),
            (  recorded(I, I)
	    )
   ).

rec_cleanup(N) :-
   forall(  between(1,N,I),
            (  recorded(I, I, Ref), erase(Ref)
	    )
   ).



nb_gen(N) :-
   forall(  between(1,N,I),
            %(  atom_number(A,I), nb_setval(A, I)
            (  nb_setval('500', I)
	    )
   ).

nb_lookup(N) :-
   forall(  between(1,N,_),
            (  nb_getval('500', _)
	    )
   ).

nb_cleanup(_) :-
   nb_delete('500').



asrt_gen(N) :-
   %retractall(keydb(_,_)),
   forall(  between(1,N,I),
            (  assertz(keydb(I,I))
	    )
   ).

asrt_lookup(N) :-
   forall(  between(1,N,I),
            (  keydb(I,I)
	    )
   ).

asrt_cleanup(_) :-
   retractall(keydb(_,_)).



trie_gen(N) :-
   trie_new(Trie),
   assertz(trie(Trie)),
   forall(  between(1,N,I),
            (  trie_insert(Trie,I,I)
	    )
   ).

trie_lookup(N) :-
   trie(Trie),
   forall(  between(1,N,I),
            (  trie_lookup(Trie,I,I)
	    )
   ).

trie_cleanup(_) :-
   trie(Trie),
   trie_destroy(Trie),
   retractall(trie(_)).



consult_gen(N) :-
   File = '/tmp/consult_db.db_bench',
   assertz(consult_file(File)),
   setup_call_cleanup(
      open(File,write,Stream),
      consult_write_terms(Stream,N),
      close(Stream)
   ),
   load_files([File]).

consult_write_terms(Stream,N) :-
   forall(  between(1,N,I),
            (  format(Stream,'~w.~n',[consult_db(I,I)])
	    )
   ).

consult_lookup(N) :-
   forall(  between(1,N,I),
            (  consult_db(I,I)
	    )
   ).

consult_cleanup(_) :-
   consult_file(F),
   retractall(consult_file(_)),
   delete_file(F).



                         /**********************
                          *    Nice printout   *
                          **********************/
print_header(N) :-
   nl,
   format('~t~w~t~59|~n',['Database raw and crude micro benchmark']),
   format('~t~:d entries~t~59|~n',[N]),
   format('~w ~t~10|~w ~t~12+~t~48| ~w~n',['Database','Operation','Wall time']),
   format('~w ~t~10|~w ~t~12+~t~48| ~w',  ['--------','---------','---------']).

print_results :-
   bagof( b(Db,W),
          S^(db_bench([Db,Type|_], S), get_dict(wall_time,S,W)),
	  Res),
   sort(2,'@<',Res,Sorted),
   nl,
   maplist( {Type}/[b(Db,Wall)]>>
            format('~w ~t~10|~w ~t~12+~`.t~48| ~3f secs.~n',[Db,Type,Wall]),
            Sorted),

   fail.

print_results.



                            /****************
                             *    Helpers   *
                             ****************/

% Measure, print and store wall and cpu time
% used by Goal.
benchmark_goal(BenchMark,Goal) :-
   get_time(OldWall),
   statistics(cputime, OldCpu),
   call(Goal),
   get_time(NewWall),
   statistics(cputime, NewCpu),
   UsedWall is NewWall - OldWall,
   UsedCpu  is NewCpu  - OldCpu,
   assertz( db_bench(BenchMark, stat{  cpu_time: UsedCpu,
                                       wall_time: UsedWall })),
   print_message(information, bench(UsedCpu, UsedWall)).

prolog:message(bench(Cpu, Wall)) -->
   { Perc is round(Cpu*100/Wall) },
   [ '~3f CPU in ~3f seconds (~w% CPU)' - [Cpu, Wall, Perc] ].

1 Like

Mistery solved :slight_smile: I should figure out what flags are used for the PPA build as these seem to produce a significantly faster version. They probably use -O3. That used to miscompile the system, but I think these issues have been fixed long ago. With older GCC versions the difference between -O2 and -O3 was barely measurable, so I never switched to -O3. Possibly that has changed.

2 Likes

Actually I used the archlinux build, and they are quite good at optmizing gcc flags. I tried looking at current_prolog_flag(c_cflags,C) but it is not capturing all the flags.

It picks the CMake variable C_CFLAGS. That should do the trick AFAIK.

the CMAKE_C_CFLAGS variable is not reliable because it is attached to the target and can change afterwards, people try other methods, see https://stackoverflow.com/questions/46303348/print-complete-cflags-cxxflags-of-cmake-project

I get a good speed up changing the compilation flags:

With this (which is what Archlinux uses):

cmake ../swipl-$pkgver \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DCMAKE_C_FLAGS="$CFLAGS -fPIC -ffile-prefix-map=$PWD= -w" \
    -DLIBEDIT_LIBRARIES=/usr/lib/libedit.so.0 \
    -DLIBEDIT_INCLUDE_DIR=/usr/include \
    -G Ninja
  ../swipl-$pkgver/scripts/pgo-compile.sh
  ninja

and this (I added the -march=native):

CFLAGS="-march=native -O3 -pipe -fstack-protector-strong -fno-plt"

I get:

   Database raw and crude micro benchmark           
                     1,000,000 entries                     
Database  Operation                              Wall time
--------  ---------                              ---------
nb        1st lookup  .......................... 0.100 secs.
trie      1st lookup  .......................... 0.163 secs.
rec       1st lookup  .......................... 0.237 secs.
consult   1st lookup  .......................... 0.802 secs.
asrt      1st lookup  .......................... 0.846 secs.

nb        2nd lookup  .......................... 0.099 secs.
trie      2nd lookup  .......................... 0.162 secs.
rec       2nd lookup  .......................... 0.239 secs.
consult   2nd lookup  .......................... 0.269 secs.
asrt      2nd lookup  .......................... 0.293 secs.

nb        insert      .......................... 0.169 secs.
trie      insert      .......................... 0.257 secs.
rec       insert      .......................... 0.411 secs.
asrt      insert      .......................... 0.482 secs.
consult   insert      .......................... 15.458 secs.
true.

which is much better compared to the plain cmake -G ninja:

          Database raw and crude micro benchmark           
                     1,000,000 entries                     
Database  Operation                              Wall time
--------  ---------                              ---------
nb        1st lookup  .......................... 0.134 secs.
trie      1st lookup  .......................... 0.210 secs.
rec       1st lookup  .......................... 0.275 secs.
consult   1st lookup  .......................... 0.834 secs.
asrt      1st lookup  .......................... 0.868 secs.

nb        2nd lookup  .......................... 0.137 secs.
trie      2nd lookup  .......................... 0.214 secs.
rec       2nd lookup  .......................... 0.277 secs.
consult   2nd lookup  .......................... 0.301 secs.
asrt      2nd lookup  .......................... 0.330 secs.

nb        insert      .......................... 0.252 secs.
trie      insert      .......................... 0.302 secs.
rec       insert      .......................... 0.425 secs.
asrt      insert      .......................... 0.495 secs.
consult   insert      .......................... 19.608 secs.
true.
3 Likes

Thanks! Let us have a little look.

  • DCMAKE_BUILD_TYPE=Release makes of course sense. The default is release with debug info, which should be as fast but provides better stack traces in case of a crash (and uses more disk space).
  • fPIC is double: that is used unless the user explicitly asks not to put the core in a shared object.
  • ffile-prefix-map is just for debugging purposes (getting the file names correct)
  • Bit weird that LIBEDIT needs to be specified as this is just a default location. Anyway, irrelevant for performance.
  • Ninja makes the build faster :slight_smile:

So far, this all seems archlinux policies that have no effect.

  • pgo-compile.sh does have a large effect. The compilation of the core takes twice as long and every tiny change requires redoing the whole thing (twice) rather than only the changed file. More below.
  • -march=native and -O3 may well help. Can you test them separately?
  • -pipe has effect on the compilation speed, not the result.
  • -fstack-protector-strong makes things more secure and slower.
  • -fno-plt seems to do something with the calling conventions in position independent code. I didn’t find a description I could fully understand. May be important.

So, for the default build we could consider -march=native, -O3 and -fno-plt. We could also consider PGO optimization, but that would require proper integration in the CMake build. I tried and failed, ending up with this ugly script. If some CMake expert wants to give it a try, please contribute.

My suspicion is that most of the speedup comes from PGO (Profile Guided Optimization). This is also because there is a rough constant 30ms time difference between most tests. Only the consult test also has a significant speedup. Now, PGO depends on the coverage in the benchmark suite and this is really poor, involving no database benchmarks AFAIK. So, overall Prolog gets a bit better, reducing the lop overhead and consult which does quite a bit in Prolog.

If people want to contribute to performance, please send contributions for the benchmark suite (bench subdir of the sources). There are several issues:

  • Quite a few of the programs have no license, which causes Debian and some more to remove them from the sources and thus make PGO compiling impossible. You can help by replacing or chasing the authors to get permission to put some license on it (any OS license will do as running them does not affect the SWI-Prolog license conditions)
  • Improve the coverage. Constraints, database operations, tabling, I/O, etc. Note that the benchmarks must be single threaded.
  • Fix the above mentioned CMake integration.
1 Like

-march=native will not work for you, but it is useful for individual setups, because it may emit instructions particular to the host CPU.

So here is a test that may work for a general release:
With:

CFLAGS="-march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt"

I get:

          Database raw and crude micro benchmark           
                     1,000,000 entries                     
Database  Operation                              Wall time
--------  ---------                              ---------
nb        1st lookup  .......................... 0.098 secs.
trie      1st lookup  .......................... 0.161 secs.
rec       1st lookup  .......................... 0.246 secs.
consult   1st lookup  .......................... 0.801 secs.
asrt      1st lookup  .......................... 0.847 secs.

nb        2nd lookup  .......................... 0.098 secs.
trie      2nd lookup  .......................... 0.170 secs.
rec       2nd lookup  .......................... 0.244 secs.
consult   2nd lookup  .......................... 0.281 secs.
asrt      2nd lookup  .......................... 0.297 secs.

nb        insert      .......................... 0.167 secs.
trie      insert      .......................... 0.263 secs.
rec       insert      .......................... 0.427 secs.
asrt      insert      .......................... 0.508 secs.
consult   insert      .......................... 15.273 secs.

So -march=native doesn’t make much of a difference for this microbenchmark, and the above CFLAGS allow you to release for any x64 CPU.

I will try to repackage the microbenchmark and submit a patch for the bench directory.

Here is a patch that adds the DB micro benchmarks to the ./bench directory:

diff --git a/bench/db_bench.pl b/bench/db_bench.pl
new file mode 100644
index 0000000..6f7007b
--- /dev/null
+++ b/bench/db_bench.pl
@@ -0,0 +1,225 @@
+/*
+     Permission is hereby granted, free  of charge, to any person obtaining
+     a  copy  of this  software  and  associated documentation  files  (the
+     "Software"), to  deal in  the Software without  restriction, including
+     without limitation  the rights to  use, copy, modify,  merge, publish,
+     distribute, sublicense,  and/or sell  copies of  the Software,  and to
+     permit persons to whom the Software  is furnished to do so, subject to
+     the following conditions:
+
+     The  above  copyright  notice  and this  permission  notice  shall  be
+     included in all copies or substantial portions of the Software.
+
+     THE  SOFTWARE IS  PROVIDED  "AS  IS", WITHOUT  WARRANTY  OF ANY  KIND,
+     EXPRESS OR  IMPLIED, INCLUDING  BUT NOT LIMITED  TO THE  WARRANTIES OF
+     MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+     IN NO EVENT  SHALL THE AUTHORS OR COPYRIGHT HOLDERS  BE LIABLE FOR ANY
+     CLAIM, DAMAGES OR  OTHER LIABILITY, WHETHER IN AN  ACTION OF CONTRACT,
+     TORT OR  OTHERWISE, ARISING  FROM, OUT  OF OR  IN CONNECTION  WITH THE
+     SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+:- dynamic keydb/2,
+           trie/1,
+          consult_db/2,
+          consult_file/1,
+          db_bench/2.
+
+:- use_module(library(assoc)).
+:- use_module(library(apply_macros)).
+
+go :-
+   writeln('Database raw and crude micro benchmark'),
+   %N = 1 000 000,
+   N = 300 000,
+   format('   -> Testing ~:d entries~n',[N]),
+   retractall(db_bench(_,_)),
+   test_db(rec,N),
+   test_db(nb,N),
+   test_db(asrt,N),
+   test_db(consult,N),
+   test_db(trie,N),
+   print_header(N),
+   print_results.
+
+
+test_db(Name,N) :-
+   atomic_list_concat([Name,'_',gen],Gen),
+   atomic_list_concat([Name,'_',lookup],Lookup),
+   atomic_list_concat([Name,'_',cleanup],Cleanup),
+
+   setup_call_cleanup(
+      (  writeln(Gen),
+         benchmark_goal( [Name,insert], call(Gen,N))
+      ),
+      (  writeln(Lookup),
+         benchmark_goal( [Name,'1st lookup'], call(Lookup,N)),
+         write(Lookup), writeln(' (2nd time)'),
+         benchmark_goal( [Name,'2nd lookup'], call(Lookup,N))
+      ),
+      (  catch( call(Cleanup,N),
+                error(existence_error(procedure,_),_),
+               true)
+      )
+   ) .
+
+
+
+
+                         /**********************
+                          *    Db operations   *
+                          **********************/
+rec_gen(N) :-
+   forall(  between(1,N,I),
+            (  recordz(I, I)
+           )
+   ).
+
+rec_lookup(N) :-
+   forall(  between(1,N,I),
+            (  recorded(I, I)
+           )
+   ).
+
+rec_cleanup(N) :-
+   forall(  between(1,N,I),
+            (  recorded(I, I, Ref), erase(Ref)
+           )
+   ).
+
+
+
+nb_gen(N) :-
+   forall(  between(1,N,I),
+            %(  atom_number(A,I), nb_setval(A, I)
+            (  nb_setval('500', I)
+           )
+   ).
+
+nb_lookup(N) :-
+   forall(  between(1,N,_),
+            (  nb_getval('500', _)
+           )
+   ).
+
+nb_cleanup(_) :-
+   nb_delete('500').
+
+
+
+asrt_gen(N) :-
+   %retractall(keydb(_,_)),
+   forall(  between(1,N,I),
+            (  assertz(keydb(I,I))
+           )
+   ).
+
+asrt_lookup(N) :-
+   forall(  between(1,N,I),
+            (  keydb(I,I)
+           )
+   ).
+
+asrt_cleanup(_) :-
+   retractall(keydb(_,_)).
+
+
+
+trie_gen(N) :-
+   trie_new(Trie),
+   assertz(trie(Trie)),
+   forall(  between(1,N,I),
+            (  trie_insert(Trie,I,I)
+           )
+   ).
+
+trie_lookup(N) :-
+   trie(Trie),
+   forall(  between(1,N,I),
+            (  trie_lookup(Trie,I,I)
+           )
+   ).
+
+trie_cleanup(_) :-
+   trie(Trie),
+   trie_destroy(Trie),
+   retractall(trie(_)).
+
+
+
+consult_gen(N) :-
+   File = '/tmp/consult_db.db_bench',
+   assertz(consult_file(File)),
+   setup_call_cleanup(
+      open(File,write,Stream),
+      consult_write_terms(Stream,N),
+      close(Stream)
+   ),
+   load_files([File]).
+
+consult_write_terms(Stream,N) :-
+   forall(  between(1,N,I),
+            (  format(Stream,'~w.~n',[consult_db(I,I)])
+           )
+   ).
+
+consult_lookup(N) :-
+   forall(  between(1,N,I),
+            (  consult_db(I,I)
+           )
+   ).
+
+consult_cleanup(_) :-
+   consult_file(F),
+   retractall(consult_file(_)),
+   delete_file(F).
+
+
+
+                         /**********************
+                          *    Nice printout   *
+                          **********************/
+print_header(N) :-
+   nl,
+   format('~t~w~t~59|~n',['Database raw and crude micro benchmark']),
+   format('~t~:d entries~t~59|~n',[N]),
+   format('~w ~t~10|~w ~t~12+~t~48| ~w~n',['Database','Operation','Wall time']),
+   format('~w ~t~10|~w ~t~12+~t~48| ~w',  ['--------','---------','---------']).
+
+print_results :-
+   bagof( b(Db,W),
+          S^(db_bench([Db,Type|_], S), get_dict(wall_time,S,W)),
+         Res),
+   sort(2,'@<',Res,Sorted),
+   nl,
+   maplist( {Type}/[b(Db,Wall)]>>
+            format('~w ~t~10|~w ~t~12+~`.t~48| ~3f secs.~n',[Db,Type,Wall]),
+            Sorted),
+
+   fail.
+
+print_results.
+
+
+
+                            /****************
+                             *    Helpers   *
+                             ****************/
+
+% Measure, print and store wall and cpu time
+% used by Goal.
+benchmark_goal(BenchMark,Goal) :-
+   get_time(OldWall),
+   statistics(cputime, OldCpu),
+   call(Goal),
+   get_time(NewWall),
+   statistics(cputime, NewCpu),
+   UsedWall is NewWall - OldWall,
+   UsedCpu  is NewCpu  - OldCpu,
+   assertz( db_bench(BenchMark, stat{  cpu_time: UsedCpu,
+                                       wall_time: UsedWall })),
+   print_message(information, bench(UsedCpu, UsedWall)).
+
+prolog:message(bench(Cpu, Wall)) -->
+   { Perc is round(Cpu*100/Wall) },
+   [ '~3f CPU in ~3f seconds (~w% CPU)' - [Cpu, Wall, Perc] ].
diff --git a/bench/run.pl b/bench/run.pl
index 05cb99b..5b79892 100644
--- a/bench/run.pl
+++ b/bench/run.pl
@@ -47,7 +47,8 @@
 :- initialization(run(1), main).
 
 run(F) :-
-       run(current_output, F).
+       run(current_output, F),
+        run_db_bench.
 
 run(S, F):-
        compile_programs,
@@ -67,6 +68,10 @@ run(S, F):-
        assert(user:file_search_path(bench, Dir))
    ).
 
+run_db_bench :-
+       load_files([bench(db_bench)],[module(db_bench)]),
+       db_bench:go.
+
 compile_programs :-
        style_check(-singleton),
        forall(program(P, _),

With the patch applied, PGO does even better for database operations:

          Database raw and crude micro benchmark           
                     1,000,000 entries                     
Database  Operation                              Wall time
--------  ---------                              ---------
nb        1st lookup  .......................... 0.087 secs.
trie      1st lookup  .......................... 0.157 secs.
rec       1st lookup  .......................... 0.214 secs.
consult   1st lookup  .......................... 0.789 secs.
asrt      1st lookup  .......................... 0.811 secs.

nb        2nd lookup  .......................... 0.087 secs.
trie      2nd lookup  .......................... 0.166 secs.
rec       2nd lookup  .......................... 0.210 secs.
consult   2nd lookup  .......................... 0.263 secs.
asrt      2nd lookup  .......................... 0.281 secs.

nb        insert      .......................... 0.146 secs.
trie      insert      .......................... 0.275 secs.
rec       insert      .......................... 0.375 secs.
asrt      insert      .......................... 0.402 secs.
consult   insert      .......................... 14.841 secs.

EDIT: Since PGO gives a good improvement, I think it is worth it to add other benchmarks like clp(fd), tabling, perhaps using some of @hakank programs!

3 Likes

Thanks. Could you attach this or make a pull request on github or send to bugs@swi-prolog.org? It is rather nasty to get a long fragment from a post :frowning:

I think discourse only allows image attachments, is there a way I can attach a .gz, zip or text file?

It seems not and I think we should keep that. On a free plan we don’t have unlimited storage and I’m sure people will start attaching big stuff. There are plenty of places to exchange code. Mail certainly suffices here, as does Github.

I don’t like github too much, here is the patch in a gitlab snippet.
EDIT: Use any license you want, I just put the MIT one there because it’s the one that came up first. I don’t mind.