Matching an identical subset of two lists

So, I have two lists which may have overlapping sublists. In that case, I want to delete one sublist.

The actual data comes from a binary tree, so if tree format is better, we can pursue that angle instead. But basically I want to do this:

List1: [1, 1, 2, 3, 5, 8, 7, 7, 4]
List2: [3, 5, 5, 8, 12, 15]

match: [5, 8]
(specifically in that order, as a set, not matching each number every time it shows up)

newList1: [1, 1, 2, 3, 7, 7, 4]
List 2: unchanged

I can’t use set operations because I need them in identical order and allowing duplicates. Tree form is hard to think through for me, which is why I am approaching this as a list problem, but as a tree problem, it would be that:

                                    A
                                  /   \
                                 B     E
                                / \     \
                               C   D     F
                              / \
                             H   I
                            /
                           J

Branch J matches Branch F perfectly, thus delete Branch F and keep the rest of the tree unchanged.

In your example, not only is [5,8] a common sublist, but so is [3,5]. And one could also argue that [3], [5], etc. are also common sublists.

So, the specification needs to be made more precise.

Here’s a simple (and inefficient) predicate that looks for sublists (and can be easily modified to remove the common sublist from L1 and L2):

select_sublist(L1, L2, Sublist) :-
    append(_, S1, L1),
    append(_, S2, L2),
    Sublist = [_|_],
    append(Sublist, _, S1),
    append(Sublist, _, S2).

I agree that my list lacks a good way to tell branches apart. I was struggling with that as well, and how to recognize (or specify) the input in such a way that it removes the correct sections. It probably is better to approach this as a tree after all, if for no other reason than the syntax

InputTree = t(L, Root, R)

ensures that we can cut individual layers at their dividing point. So for instance, in the above example, if J matches F, we could return just the root of F as a leaf, and J could continue its branch.

With this format, the specificity issue is solved, but the tree can be very large, so I would need an efficient way to check for equality. On a reddit post I made (regarding a slightly different issue), brebs sent me to this link: performance - Efficient solution for the same-fringe problem for binary trees - Stack Overflow

I didn’t realize it’s a notable problem in programming, but that comforts me some in why it is hard for me to reason through. I’m walking through the answers right now to try and understand it, and then I’ll look into modifying my situation so that this takes the right inputs and gives the right outputs. Still, there’s a lot about this that I don’t fully understand.

I refined the code - it’s now 59 times the speed :grinning:

2 Likes

Does the same fringe problem concern itself with the order of the leaves (1-2-3 vs 1-3-2) or is it just their values?

The order and the values. For the purposes of that stackoverflow question, anyway - perhaps there are variants :grinning:

I ask because, I’m curious what the use case for such a stringent condition is. I could see an order-independent one being useful as a comparison between functions or something along those lines (do two things map to the exact same results for a variety of input consitions), but having order-dependent comparison of leaves seems too restricting to be useful particularly often.

Also, just in terms of this question, I found out that my particular use-case was missing a rule to check the condition of the output. The branch-comparison happens as they are built, but because I was missing one condition, I was getting duplicates.

Lesson learned - while overconstrained systems return no results, underconstrained systems provide bad / excess results. So check that everything is defined the way it should be first.

Here’s the code if anyone wants to see. The absurd amount of parentheses might be offputting, but it helps me think through scoping, and since I started with procedural programming, it serves like the curly braces in an if(X) { do stuff } statement:

/* Increment the input upwards to the limit, which also checks for repeat values and stops that branch if it finds them */
increment(Input, Limit, InList, Tree) :-
	List2 = [Input | InList],
	(Input #> Limit -> Tree = t(0, Input, 0);  
	(
		Tree = t(L, Input, R),
		
		pathUpA(Input, R1), 
		pathUpB(Input, L1),

		(member(R1, List2) -> 
		(
			R #= R1,
			increment(L1, Limit, List2, L)
		);
		(member(L1, List2) -> 
		(
			L #= L1,
			increment(R1, Limit, List2, R)
		);
		(R1 #= 0 ->
		(
			R #= 0,
			increment(L1, Limit, List2, L)
		);
		(
			increment(R1, Limit, List2, R),
			increment(L1, Limit, List2, L)
		))))
	)).