How to "One-Hot" cleanly

Greetings.

This may seem trivial, but I’m learning…

I faced a classic one-hot situation and have not resolved yet. Can you help me?

I have lists of two types:

   Elements = [a,b,c,d,e,f,g].
   Marks    = [1,0,0,1,1,0,0].

and I want to relate them in such a way that the Marks list specifies a filter over Elements elements. In other words, I need to “one hot” with them so I get a third list

   Selection = [a,d,e].

At first I figure some kind of maplist/3,4, particularly I studied convlist/3 but it seems convlist has strictly one list to apply the criteria to.

I seem to need some combination of maplist and convlist

Any Ideas? What would be the most clean or elegant way for doing this?

Trying the SWI-Prolog assistent:

The code is

one_hot_filter(Elements, Marks, Selection) :-
    pairs_keys_values(Pairs, Marks, Elements),
    convlist(marked_element, Pairs, Selection).

marked_element(1-Element, Element).

With a nice explanation. Personally, I’d go for a non-library solution, e.g.,

one_hot_filter([], [], []).
one_hot_filter([H|T0], [1|TM], [H|T]) :-
    one_hot_filter(T0, TM, T).
one_hot_filter([H|T0], [0|TM], T) :-
    one_hot_filter(T0, TM, T).

Fot one thing, it also works to compute the marks :slight_smile:

3 Likes

Another method, for variety (not that it’s any particular improvement):

match1(0, _, T, T).
match1(1, E, [E|T], T).

match1s([], [], []).
match1s([E|Es], [M|Ms], M1s) :-
    match1(M, E, M1s, M1sT),
    match1s(Es, Ms, M1sT).

Result:

?- match1s([a,b,c,d,e,f,g], [1,0,0,1,1,0,0], M1s).
M1s = [a, d, e].

Can be used as a generator (as can one_hot_filter) - evidence:

?- length(Es, 2), match1s(Es, Ms, M1s).
Es = [_, _],
Ms = [0, 0],
M1s = [] ;
Es = [_, _A],
Ms = [0, 1],
M1s = [_A] ;
Es = [_A, _],
Ms = [1, 0],
M1s = [_A] ;
Es = M1s, M1s = [_, _],
Ms = [1, 1].
1 Like

I would take this ‘old-school’ approach

one_hot(Elements,Marks,Selection) :-
  findall(Element, (
    nth1(I,Marks,1),
    nth1(I,Elements,Element)
  ), Selection).

that allows for

?- one_hot([a,b,c,d,e,f,g],[1,0,0,1,1,0,0],Selection).
Selection = [a, d, e].

Doesn’t work in the general case though, so I would deduct a point :grinning_face:

?- length(Es, 2), one_hot(Es, Ms, M1s).
<hangs>

Not only that. It also has complexity O(N^2) and copies the selected elements of the list, which may break variable sharing, constraints as well as wasting space unless the selected elements are atoms or small integers.

findall/3 is rarely a good way to filter. If you insist using libraries, foldl/5 provides a solution. To me, the mental load finding this is higher than solving this without any libraries. I guess you can get used to it though :slight_smile:

:- use_module(library(apply)).

onehot(Elements, Marks, Selection) :-
    foldl(marked, Elements, Marks, Selection, []).

marked(Elem, 1, [Elem|Tail], Tail).
marked(_,    0, Elems, Elems).
3 Likes

Performance comparison, in order of duration:

?- time(forall(between(1, 10_000_000, _), match1s([a,b,c,d,e,f,g], [1,0,0,1,1,0,0], M1s))).
% 159,999,999 inferences, 6.817 CPU in 6.817 seconds (100% CPU, 23472389 Lips)
true.

% The non-library version, i.e. without convlist
?- time(forall(between(1, 10_000_000, _), one_hot_filter([a,b,c,d,e,f,g], [1,0,0,1,1,0,0], M1s))).
% 89,999,999 inferences, 7.262 CPU in 7.262 seconds (100% CPU, 12393638 Lips)
true.

?- time(forall(between(1, 10_000_000, _), onehot([a,b,c,d,e,f,g], [1,0,0,1,1,0,0], M1s))).
% 179,999,999 inferences, 12.963 CPU in 12.963 seconds (100% CPU, 13885611 Lips)
true.

?- time(forall(between(1, 10_000_000, _), one_hot([a,b,c,d,e,f,g], [1,0,0,1,1,0,0], M1s))).
% 300,000,139 inferences, 24.563 CPU in 24.563 seconds (100% CPU, 12213613 Lips)
true.

match1s seems fastest by a hair, as an example of how the inference count may not correlate to performance.

I thank you all for you ideas…

I am a bit better Prolog programmer now :grin: