Remove all substrings in a string

Hello.

?- teto("abab", "cc","ababvvvababeeeeababtttttabab", R).

I get the output below, but only because of the write(Q) call:

[c,c,v,v,v,a,b,a,b,e,e,e,e,a,b,a,b,t,t,t,t,t,a,b,a,b]
[c,c,v,v,v,c,c,e,e,e,e,a,b,a,b,t,t,t,t,t,a,b,a,b]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,a,b,a,b]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,c,c,e,e,e,e,a,b,a,b,t,t,t,t,t,c,c]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,a,b,a,b,e,e,e,e,c,c,t,t,t,t,t,a,b,a,b]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,a,b,a,b]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,a,b,a,b,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,a,b,a,b,e,e,e,e,a,b,a,b,t,t,t,t,t,c,c]
[c,c,v,v,v,c,c,e,e,e,e,a,b,a,b,t,t,t,t,t,c,c]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,a,b,a,b,e,e,e,e,c,c,t,t,t,t,t,c,c]
[c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]

But I only want the following output: R = [c,c,v,v,v,c,c,e,e,e,e,c,c,t,t,t,t,t,c,c]
Without write(Q) I only get false. I also tried append and other tricks, but it always returns false.

% here I start the program.

With count_substring I find out how many times the substring occurs in the string, so I know how often the loop has to run to replace all occurrences.

teto(Ag, B,Z, R) :-
   count_substring(Z,Ag,X),
   stringToList(Ag,U),
   stringToList(B,E),
   stringToList(Z,J),
   loop(U,E,J, R, X ).

% just base cases
loop(Ag, B,Z, Xs, 0) :- not(replacement(Ag, B, Z, Xs)), !.
loop(Ag, B,Z, Xs, 0) :- replacement(Ag, B, Z, Xs), !.

% loops until N is 0, replacing every substring "abab" with "cc" in the string
% "ababvvvababeeeeababtttttabab"

loop(Ag, B,Z, W, N) :- 
   N>0,
   S is N-1,
   loop(Ag, B,  Z,Q, S),
   write(Q), nl,
   replacement(Ag, B, Q, W).

% count the occurrences of the substring in the string
count_substring(String, Sub, Total) :-
    count_substring(String, Sub, 0, Total).
 
count_substring(String, Sub, Count, Total) :-
    ( substring_rest(String, Sub, Rest)
    ->
        succ(Count, NextCount),
        count_substring(Rest, Sub, NextCount, Total)
    ;
        Total = Count
    ).
 
substring_rest(String, Sub, Rest) :-
    sub_string(String, Before, Length, Remain, Sub),
    DropN is Before + Length,
    sub_string(String, DropN, Remain, 0, Rest).

% convert the string to a list and flatten it, because replacement accepts only lists
stringToList(L,Y) :- 
   atom_codes(A, L),
   atom_chars(A, K),
  flatten(K, Y).

% in a list such as [a,b,a,b,v,v,v,a,b,a,b], replace the sublist [a,b,a,b] with another list [c,c]
replacement(A, B,  Ag, Bg) :-
   phrase((seq(S1),seq(A),seq(S2)), Ag),
   phrase((seq(S1),seq(B),seq(S2)), Bg).

seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).

Thank you.

DCGs seem like a good fit here. How about this?

list([])     --> [].
list([L|Ls]) --> [L], list(Ls).

substitution(This,That,MyStr,Result) :-
   phrase(substitution(This,That,MyStr),Result,[]).

substitution([], _, MyStr) --> list(MyStr).
substitution(This, _, MyStr) --> { \+ contains(This, MyStr) }, list(MyStr).
substitution(This, That, MyStr) -->
   { concatenation([Before, This, After], MyStr),
     \+ contains(This, Before)
   },
   list(Before),
   list(That),
   substitution(This, That, After).

concatenation(Ls,R) :-
   phrase(concatenation(Ls),R).
concatenation([]) --> [].
concatenation([List|Lists]) -->
        list(List),
        concatenation(Lists).

contains(This, MyStr) :-
   concatenation([_,This,_], MyStr).
?- substitution(`this`,`that`,`athishellothis`,R).
R = `athathellothat` ;
false.

Note: Some of the predicates above are from here.

EDIT: Of course you could use re_replace/4, but I figured you wanted a Prolog solution.


Are you sure?

?- substitution(`this`,`that`,`athishellothis`,R).
R = [97, 116, 104, 105, 115, 104, 101, 108, 108|...] ;
R = [97, 116, 104, 105, 115, 104, 101, 108, 108|...] ;

I looked up the ASCII codes:
athishe…
It did not change anything.

I also tried:

?- re_replace("[ab]+", "ccc", "bbaabdddaaeeeavvv", R).
R = "cccdddaaeeeavvv".

It can only replace characters at the end or at the beginning…

If you use library(pcre), you can provide options to the patterns like this:

?- re_replace("abab"/g, "cc", "ababvvvababeeeeababtttttabab", Result).
Result = "ccvvvcceeeecctttttcc".

?- re_replace("ABAB"/g, "cc", "ababvvvababeeeeababtttttabab", Result).
Result = "ababvvvababeeeeababtttttabab".

?- re_replace("ABAB"/gi, "cc", "ababvvvababeeeeababtttttabab", Result).
Result = "ccvvvcceeeecctttttcc".

Note that “[ab]+” matches any sequence of “a” or “b”, for example “aaa” or “ba” or anything really. If you need to match a sequence of one or more “ab”, you should use groups:

?- re_replace("(ab)+"/g, "cc", "ababvvvababeeeeabababababtttttabab", Result).
Result = "ccvvvcceeeecctttttcc".

But are you actually using strings (then library(pcre) should be the better choice) or are you working with lists?
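
In case it helps with that question: in SWI-Prolog 7 or later, a double-quoted string can be turned into a list with the built-ins string_chars/2 or string_codes/2, so a hand-written helper such as stringToList/2 may not be needed. A minimal sketch, assuming default flags; show_conversions/0 is a name made up for illustration:

```prolog
% Sketch, assuming SWI-Prolog 7+ with default flags (double_quotes = string).
% show_conversions/0 is a hypothetical helper name, not a library predicate.
show_conversions :-
    string_chars("abab", Chars),    % Chars = [a,b,a,b]
    string_codes("abab", Codes),    % Codes = [97,98,97,98]
    string_chars(Back, Chars),      % Back = "abab" (the conversion works both ways)
    format("~w ~w ~w~n", [Chars, Codes, Back]).
```

The char list is the form the replacement/4 predicate in the question works on; the code list is what the backquoted examples in this thread use.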


Thank you. The code works very well.

Strange. Everything works fine here (notice the 97 in the 4th position):

?- substitution(`this`,`that`,`athishellothis`,R).
R = [97, 116, 104, 97, 116, 104, 101, 108, 108|...] ;
false.

It is suspicious that backtracking gives you another solution. My guess is that some setting is affecting things, perhaps something in the initialization file, or that an error crept in when copying the code.
However, re_replace/4 is the better choice for you, since you were not looking for a pure Prolog solution.
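
To rule out a display issue, it may help to print the whole answer as text instead of reading a truncated code list. A small sketch; show_codes/1 is a name invented here, and it assumes its argument is a proper list of character codes:

```prolog
% Sketch: print a code list as readable text (assumes a proper list of codes).
% show_codes/1 is a hypothetical helper, not a library predicate.
show_codes(Codes) :-
    string_codes(String, Codes),   % convert the codes back into a string
    format("~s~n", [Codes]),       % ~s prints a list of codes as text
    writeln(String).               % the same text, printed via the string
```

It could be called as `substitution(`this`,`that`,`athishellothis`,R), show_codes(R)` to compare the complete result on both machines.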


Never underestimate append/3:

subst(This, That, MyStr, Result) :-
    append(This, After, Rest),
    append(Before, Rest, MyStr),
    !,
    subst(This, That, After, AfterResult),
    append([Before,That,AfterResult], Result).
subst(_, _, S, S).

To test:

?- portray_text(true).
true.

?- subst(`this`,`that`,`athishellothis`,R).
R = `athathellothat`.

But yes, especially if you have atoms or strings to begin with and portability is not a goal, re_replace/4 is probably a simpler and more efficient choice.
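
If the input is a string rather than a code list and you still want the list version, a thin wrapper is one option. The sketch below repeats subst/4 so that it loads on its own; subst_string/4 is a name made up for this example, and the guard against an empty pattern is an added assumption (with an empty This, the first clause of subst/4 would match at every step and recurse without end).

```prolog
% Sketch, assuming SWI-Prolog 7+ (string_codes/2, append/2 from library(lists)).
:- use_module(library(lists)).

% subst/4 as posted above: replace every occurrence of This by That.
subst(This, That, MyStr, Result) :-
    append(This, After, Rest),
    append(Before, Rest, MyStr),
    !,
    subst(This, That, After, AfterResult),
    append([Before,That,AfterResult], Result).
subst(_, _, S, S).

% subst_string/4 is a hypothetical wrapper: strings in, string out.
subst_string(This, That, Str, Result) :-
    string_codes(This, ThisCodes),
    ThisCodes \== [],                  % added guard: an empty pattern would loop
    string_codes(That, ThatCodes),
    string_codes(Str, StrCodes),
    subst(ThisCodes, ThatCodes, StrCodes, ResultCodes),
    string_codes(Result, ResultCodes).
```

With the string from the original question:

?- subst_string("abab", "cc", "ababvvvababeeeeababtttttabab", R).
R = "ccvvvcceeeecctttttcc".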


I always forget about this :)
