Matching an identical subset of two lists

konokaiden · September 10, 2024, 4:58pm

So, I have two lists which may have overlapping sublists. In that case, I want to delete one sublist.

The actual data comes from a binary tree, so if tree format is better, we can pursue that angle instead. But basically I want to do this:

List1: [1, 1, 2, 3, 5, 8, 7, 7, 4]
List2: [3, 5, 5, 8, 12, 15]

match: [5, 8]
(specifically in that order, as a set, not matching each number every time it shows up)

newList1: [1, 1, 2, 3, 7, 7, 4]
List 2: unchanged

I can’t use set operations because I need them in identical order and allowing duplicates. Tree form is hard to think through for me, which is why I am approaching this as a list problem, but as a tree problem, it would be that:

                                    A
                                  /   \
                                 B     E
                                / \     \
                               C   D     F
                              / \
                             H   I
                            /
                           J

Branch J matches Branch F perfectly, thus delete Branch F and keep the rest of the tree unchanged.

peter.ludemann · September 10, 2024, 6:06pm

In your example, not only is [5,8] a common sublist, but so is [3,5]. And one could also argue that [3], [5], etc. are also common sublists.

So, the specification needs to be made more precise.

Here’s a simple (and inefficient) predicate that looks for sublists (and can be easily modified to remove the common sublist from L1 and L2):

select_sublist(L1, L2, Sublist) :-
    append(_, S1, L1),
    append(_, S2, L2),
    Sublist = [_|_],
    append(Sublist, _, S1),
    append(Sublist, _, S2).

konokaiden · September 10, 2024, 6:20pm

I agree that my list lacks a good way to tell branches apart. I was struggling with that as well, and how to recognize (or specify) the input in such a way that it removes the correct sections. It probably is better to approach this as a tree after all, if for no other reason than the syntax

InputTree = t(L, Root, R)

ensures that we can cut individual layers at their dividing point. So for instance, in the above example, if J matches F, we could return just the root of F as a leaf, and J could continue its branch.

With this format, the specificity issue is solved, but the tree can be very large, so I would need an efficient way to check for equality. On a reddit post I made (regarding a slightly different issue), brebs sent me to this link: performance - Efficient solution for the same-fringe problem for binary trees - Stack Overflow

I didn’t realize it’s a notable problem in programming, but that comforts me some in why it is hard for me to reason through. I’m walking through the answers right now to try and understand it, and then I’ll look into modifying my situation so that this takes the right inputs and gives the right outputs. Still, there’s a lot about this that I don’t fully understand.

brebs · September 15, 2024, 8:00am

I refined the code - it’s now 59 times the speed

konokaiden · September 17, 2024, 1:19pm

Does the same fringe problem concern itself with the order of the leaves (1-2-3 vs 1-3-2) or is it just their values?

brebs · September 17, 2024, 1:43pm

The order and the values. For the purposes of that stackoverflow question, anyway - perhaps there are variants

konokaiden · September 17, 2024, 1:57pm

I ask because, I’m curious what the use case for such a stringent condition is. I could see an order-independent one being useful as a comparison between functions or something along those lines (do two things map to the exact same results for a variety of input consitions), but having order-dependent comparison of leaves seems too restricting to be useful particularly often.

konokaiden · September 17, 2024, 2:00pm

Also, just in terms of this question, I found out that my particular use-case was missing a rule to check the condition of the output. The branch-comparison happens as they are built, but because I was missing one condition, I was getting duplicates.

Lesson learned - while overconstrained systems return no results, underconstrained systems provide bad / excess results. So check that everything is defined the way it should be first.

Here’s the code if anyone wants to see. The absurd amount of parentheses might be offputting, but it helps me think through scoping, and since I started with procedural programming, it serves like the curly braces in an if(X) { do stuff } statement:

/* Increment the input upwards to the limit, which also checks for repeat values and stops that branch if it finds them */
increment(Input, Limit, InList, Tree) :-
	List2 = [Input | InList],
	(Input #> Limit -> Tree = t(0, Input, 0);  
	(
		Tree = t(L, Input, R),
		
		pathUpA(Input, R1), 
		pathUpB(Input, L1),

		(member(R1, List2) -> 
		(
			R #= R1,
			increment(L1, Limit, List2, L)
		);
		(member(L1, List2) -> 
		(
			L #= L1,
			increment(R1, Limit, List2, R)
		);
		(R1 #= 0 ->
		(
			R #= 0,
			increment(L1, Limit, List2, L)
		);
		(
			increment(R1, Limit, List2, R),
			increment(L1, Limit, List2, L)
		))))
	)).

Topic		Replies	Views
Set Difference of Two Compatible Lists Algorithm	4	1502	May 4, 2021
Compare two lists = if sublist of a list = true Help!	10	8590	March 8, 2019
Program to check if a sublist appears in a list in the same order Help!	5	410	March 9, 2021
Help with creating two lists solved Help! list	4	730	March 25, 2021
Need help with a homework Help!	14	557	November 19, 2022

Matching an identical subset of two lists

Related topics