Effective implementation of a set

Currently I’m aware of 2 set implementations in SWI:

  1. library(ordset), which is quite inefficient for my current use case (adding and removing items in a tight loop with a maximum size of a million terms)
  2. library(nb_sets), which seems not to have an option to remove an element from the set.

So are there any implementations of sets available that I have not yet found that are fully functional (in the sense that I can do all common set operations on it, adding, removing elements as well as creating unions, intersections and differences) while still beeing effective?

I do not care if they are “mutating” a value nb_sets does or if they do it similar to ordsets with a relationship.

library(lists): List Manipulation

has

list_to_set/2
is_set/1
intersection/3
union/3
subset/2
subtract/3

You have to decide if it meets your criteria.

As those listbased sets do not even need to be ordered, its probably even less efficient than library(ordset). Most of the set manipulation predicates even link to ordset equivalents.

There are many alternatives. What is best depends mostly on what you need to do (frequently) with the sets and what is in the sets. Two options you misses are library(assoc) (or library(rbtrees)), both providing balanced tree based sets. The advantage is that these perform O(log(N)) insert/delete both in terms of time and space. They do not provide the usual intersection, union, etc. operations though, but they can be mapped efficiently to/from ordsets and are thus interesting if you need to add/delete lots of elements.

If your sets are (small) integers you can use integers as bitmaps and express the set operations as simple bit operations. Although bit operations, adding an element is still O(N) as it creates a copy of the bitmap.

Some of the issues are limitations of (pure) Prolog which states that the only thing you can change to a term is binding variables. Using the non-backtrackable primitives you can create data structures with destructive assignment, but this can hardly be called Prolog programming anymore.

Could you a bit clearer on what you are after?

My “use case” currently is just solving Advent of Code 2015 Day 6 (part 1).

It provides a list of instructions to switch on, off or toggle lights on a 1000 by 1000 grid.

The most efficient solutions I’ve seen so far across languages use some implementation of a set to keep track of a lights state and use the sets size at the end as the solution to the problem.

Part 2 is a little bit different and will require some kind of key-value store, as the instructions then will be re-interpreted to increase/decrease a lights brightness.

Currently I use a set of coords (X-Y), though I could “hash” them into X * 1000 + Y. I don’t think though that it will help me to solve the O(n) problem of ordsets.

I’ll take a look though at assoc and rbtree this evening.

Thank you!

Using an integer bitmap and the X1000Y encoding might b e worth trying :slight_smile:

Yeah, it might be worth a try.

I tried this out, here is a short write-up.

TL;DR: When very large integers are involved, set_prolog_flag(optimise, true) does not seem to help (on my machine). Also, choosing the right algorithm does make a difference.


The original problem statement can be found here:

https://adventofcode.com/2015/day/6

The essence of the problem is to apply a couple of hundred (300 exactly) instructions (set/reset/flip) to a 1_000_000 element bitset. Every instruction is defined as a rectangle in a 1000x1000 bitmap. The question at the end is “how many bits in the bitmap are on”.

At the bottom of this post you can find the full code listings and the running times I measured. The interesting bits:

  • Yes, it is perfectly fine to make an integer with 1_000_000 bits and set/reset/flip those.
  • In my original solution, most of the time (95% of about a second) went into generating the “rectangular bitmasks”. The time is spend inside is/2!
  • Using set_prolog_flag(optimise, true) increases the runtime slightly.

The original code for preparing the rectangular bitmasks looked like this:

bitmask(X0, Y0, X1, Y1, Mask) :-
    Row is (1 << (X1 - X0 + 1)) - 1,
    N_cols0 is Y1 - Y0,
    rectangle(N_cols0, Row, 1000, Mask0),
    Mask is Mask0 << (X0 + 1000 * Y0).

rectangle(N_cols, Row, Shift, Mask) :-
    (   succ(N_cols0, N_cols)
    ->  rectangle(N_cols0, Row, Shift, Mask0),
        Mask is (Mask0 << Shift) + Row
    ;   Mask = Row
    ).

This is where the program was spending 95% of its time. The only idea I could come up to improve this was to instead use the “power of associative operation” algorithm for creating the bitmask; the idea is outlined on pages 31-40 of this PDF; obviously, this idea is ancient (like, ancient Egypt) but I think that this is where I originally stole the code from.

Long story short, as expected, this decreases the running time by a factor of 10 (from 11 seconds down to 0.8 seconds for 10 runs), and is/2 no longer dominates when I profile. The power we are raising to is anything between 1 and 1000 (for my input, mean is 266 and median is 207).

If there is interest, I can wrap up inside a module the “raising to power with associative operation” code I have and post it here.


Code listings

Original solution

:- module(d06, [run/2]).

run(Filename, N) :-
    phrase_from_file(light_instr(0, Lights), Filename),
    N is popcount(Lights).

light_instr(L0, L) -->
    instr(A, X0, Y0, X1, Y1), !,
    { do_instr(A, X0, Y0, X1, Y1, L0, L1) },
    light_instr(L1, L).
light_instr(L, L) --> [].

do_instr(A, X0, Y0, X1, Y1, L0, L1) :-
    bitmask(X0, Y0, X1, Y1, Mask),
    apply_mask(A, Mask, L0, L1).

bitmask(X0, Y0, X1, Y1, Mask) :-
    Row is (1 << (X1 - X0 + 1)) - 1,
    N_cols0 is Y1 - Y0,
    rectangle(N_cols0, Row, 1000, Mask0),
    Mask is Mask0 << (X0 + 1000 * Y0).

rectangle(N_cols, Row, Shift, Mask) :-
    (   succ(N_cols0, N_cols)
    ->  rectangle(N_cols0, Row, Shift, Mask0),
        Mask is (Mask0 << Shift) + Row
    ;   Mask = Row
    ).

apply_mask(set, Mask, L0, L1) :-
    L1 is L0 \/ Mask.
apply_mask(reset, Mask, L0, L1) :-
    L1 is L0 /\ ((1 << 1_000_000) - Mask - 1).
apply_mask(flip, Mask, L0, L1) :-
    L1 is xor(L0, Mask).

:- use_module(library(dcg/basics)).

instr(A, X0, Y0, X1, Y1) -->
    action(A), " ",
    integer(X0), ",", integer(Y0),
    " through ",
    integer(X1), ",", integer(Y1),
    "\n".

action(set) --> "turn on".
action(reset) --> "turn off".
action(flip) --> "toggle".

Running this on my computer, with my input (you can download it from the “Advent of Code” website I linked above):

?- time( forall(between(1, 10, _), run('d06.input', N)) ).
% 2,605,420 inferences, 11.797 CPU in 11.879 seconds (99% CPU, 220851 Lips)
true.

“Power-semigroup” solution

:- module(d06, [run/2]).

:- use_module(library(dcg/basics), [integer//1]).

run(Filename, N) :-
    phrase_from_file(light_instr(0, Lights), Filename),
    N is popcount(Lights).

light_instr(L0, L) -->
    instr(A, X0, Y0, X1, Y1), !,
    { format(user_error, "~d,", [Y1 - Y0 + 1]) },
    { do_instr(A, X0, Y0, X1, Y1, L0, L1) },
    light_instr(L1, L).
light_instr(L, L) --> [].

do_instr(A, X0, Y0, X1, Y1, L0, L1) :-
    Row_mask is ((1 << (X1 - X0 + 1)) - 1) << X0,
    N is Y1 - Y0 + 1,
    power_semigroup(m(Row_mask, 1000), N, append_rows, m(Mask0, _)),
    Mask is Mask0 << (1000 * Y0),
    apply_mask(A, Mask, L0, L1).

append_rows(m(M_a, N_a), m(M_b, N_b), m(M, N)) :-
    M is (M_a << N_b) + M_b,
    N is N_a + N_b.

apply_mask(set, Mask, L0, L1) :-
    L1 is L0 \/ Mask.
apply_mask(reset, Mask, L0, L1) :-
    L1 is L0 /\ ((1 << 1_000_000) - Mask - 1).
apply_mask(flip, Mask, L0, L1) :-
    L1 is xor(L0, Mask).

instr(A, X0, Y0, X1, Y1) -->
    action(A), " ",
    integer(X0), ",", integer(Y0),
    " through ",
    integer(X1), ",", integer(Y1),
    "\n".

action(set) --> "turn on".
action(reset) --> "turn off".
action(flip) --> "toggle".

/* raise to power */
goal_expansion(odd(N),        N /\ 0x1 =:= 1).
goal_expansion(half(N, Half), Half is N >> 1).

power_accumulate_semigroup(R, A, N, Op, P) :-
    (   N == 0
    ->  P = R
    ;   power_accumulate_semigroup_1(R, A, N, Op, P)
    ).

power_accumulate_semigroup_1(R, A, N, Op, P) :-
    (   odd(N)
    ->  power_accumulate_semigroup_oddn(R, A, N, Op, P)
    ;   power_accumulate_semigroup_anyn(R, A, N, Op, P)
    ).

power_accumulate_semigroup_oddn(R, A, N, Op, P) :-
    call(Op, R, A, R1),
    (   N == 1
    ->  P = R1
    ;   power_accumulate_semigroup_anyn(R1, A, N, Op, P)
    ).

power_accumulate_semigroup_anyn(R, A, N, Op, P) :-
    half(N, N1),
    call(Op, A, A, A1),
    power_accumulate_semigroup_1(R, A1, N1, Op, P).

power_semigroup(A, N, Op, P) :-
    must_be(positive_integer, N),
    power_semigroup_(A, N, Op, P).

power_semigroup_(A, N, Op, P) :-
    power_semigroup_evenn(A, N, Op, A0, N0),
    (   N0 == 1
    ->  P = A0
    ;   call(Op, A0, A0, A1),
        half(N0 - 1, N1),
        power_accumulate_semigroup(A0, A1, N1, Op, P)
    ).

power_semigroup_evenn(A, N, Op, A1, N1) :-
    (   odd(N)
    ->  A = A1,
        N = N1
    ;   call(Op, A, A, A0),
        half(N, N0),
        power_semigroup_evenn(A0, N0, Op, A1, N1)
    ).

Time:

?- time( forall(between(1, 10, _), run('d06.input', N)) ).
% 414,149 inferences, 0.740 CPU in 0.756 seconds (98% CPU, 559307 Lips)
true.
2 Likes

Thanks for sharing.

The low Lips number shows that the math is pretty expensive. I’m a bit surprised optimized arithmetic doesn’t help here. Might experiment a bit to find out why. In some other project we cheated by using the foreign language interface to deal with a bitmap as a string and use destructive assignment on it. Ugly, but effective.

Since I do not understand in enough detail what optimise does, I tried out many things. One funny observation is that in the original “rectangle” code:

rectangle(N_cols, Row, Shift, Mask) :-
    (   succ(N_cols0, N_cols)
    ->  rectangle(N_cols0, Row, Shift, Mask0),
        Mask is (Mask0 << Shift) + Row
    ;   Mask = Row
    ).

… changing the is to a = (so, kinda “inlining” the arithmetic) decreases run time to a bit more than half. I still got slightly longer run times with optimise set to true.

You mean that instead of executing the arithmetic step by step you assemble a huge expression and evaluate it at the end? Yes, that helps because an important part of the costs is in moving the large numbers from GMP to the Prolog stacks (back is quite cheap). By assembling a huge expression you do this only once.

When I was doing the timings, preparing the expression decreased the total running time only for this very tight loop. I tried progressively replacing is/2 with =/2, until I was left with a single evaluation of is/2, the one in the run/2 predicate in my code:

N is popcount(Lights)

All versions of the code had higher total running time.

Again, using “power” is the only change that had a significant positive effect on the efficiency.