Hybrid Regular Expressions to Automata

I have a cgi page on the title as I posted earlier. Hybrid Regex to Automaton. Most of below are a copy of part of the page. As regex has been already built-in SWI-Prolog, I have no plan to develop the old page further, but I found the hybrid regex is enough handy to parse and extract information from text for my daily use of DCG-based coding.

  • Input: A regular expression possibly with standard boolean operaters, character classes and strings for the regular expression in unix-like syntax.
  • Output: An automaton.
  • Input-output relation: The output is a minimum automata for the input regular expression among the bisimulation equivalences.
  • Features: Boolean operations on the intervals of charaters are suppored
    to compute a set of mutual disjoint partition for the character classes
    in the input regular expression.
  • To run the CGI: Put a regular expression into the input box,
    and push the button,In stead of writing,
    you may select a regular expression from the prepared sample expressions.

I used these hybrid regexes for testing the transformation to automata.

Hybrid Regexes
 ""
 []
 "a"
 \ "a"
 \ a
 \ \ a
 "ab"
 "."
 ".*"
 ".+*"
 (.)^3
 "a.*b"
 a^3 | a^5
 rev("abcdefg")
 "(abc)*" & rev("(abc)*")
 "a*b*c*" & rev("a*b*c*")
 "[a-zA-Z]+$"
 *(a^3 | a^5)
 *(a^3) | *(a^5)
 a^4 | a^6
 *(a^4) | *(a^6)
 *(a^4 | a^6)
 *(a^4 | a^9 | a^5)
 *(a^4) | *(a^9) | *(a^5)
 *char(alnum) +  (+char(digit))
 *(aaa) |  *(aa)
 *(aaaa) |  *(aa)
 *(aaa) |  *(aaaaa)
 "a|b"
 "a|a|a|a"
 "[^a]"
 "[^ab]"
 "a*|b*"
 *(a) + b
 *(a)+ *(b)
 *(a) + *(a) + *(a) +  *(a) + *(a) + *(a) + *(a) + *(a) + *(a)
 *(a|b) + *(a|b) + *(a|c) + *(a|c) + *(a|d) + *(a|d) + *(a|e) + *(a|e) + *(a|e)
 *(*(a))
 *char(alnum) +  (+char(digit))
 *(char(alnum) \ char(digit))
 (s|t)+ char(digit)^5 + char(lower)^2
 *(a) \ *(a)
 *(a) \ []
 \("ab")
 \("a*b*c*" & rev("a*b*c*"))
 (a+ *(b)) + (*(a)+b)
 *(a) + *(*(a) + b + *(a) + b + *(a)) + *(a)
 *(a) + *(*(a) + b + *(a) + b  + *(a))
 *(*(a) + b +  *(a) +  b  +  *(a)) +  *(a)
(.)^3
(.)< 5
  *(*(a) + b +  *(a) +  b  +  *(a)) &  *(*(b) + a +  *(b) +  a  +  *(b))
 *(0+ *(1 +  *(0) + 1))  &  *(1+ *(0 +  *(1) + 0))
 *(a) + b
 *(a)+ *(b)
 *(a) + *(a)
 *(*(a))
 "a|b"
 "a|a|a|a"
 "[^a]"
 "[^ab]"
 "a*|b*"
 "[abcdefg]++++******++++++"
 "[3a4b5c98d7exfg]++++******++++++"
 *(a) \ *(a)
 *(a) \ []
 (a+ *(b)) + (*(a)+b)
 *(a) + *(*(a) + b + *(a) + b + *(a)) + *(a)
 *(a) + *(*(a) + b + *(a) + b  + *(a))
 *(*(a) + b +  *(a) +  b  +  *(a)) +  *(a)
 *(*(a) + b +  *(a) +  b  +  *(a)) +  *(b)
 *(*(a) + b +  *(a) +  b  +  *(a))
 *(a+ *(b +  *(a) + b))
 ("/\\*" + *(.) + "\\*/") | (("//" | "%") +  ".*" + "\n" )
  (("/\\*" +  *(\ ("\\*/")) + "\\*/") | (("//" | "%") +  "[^\n]*" + "\n"))

The cgi demonstrates a feature of comparing two regular expressions to
test incomparable, subset, or equality as for extension of acceptable language.

Compare Regexes
 ""  :    []
 *(*(*(a))) : *(a)
 a : b
 *(a) : b
 *(a) : a
 (*(a+b)\(*b)) : *(a)
 \(\(*(a))) : *(a)
 (("/\\*" + *(.) + "\\*/") | (("//" | "%") +  ".*" + "\n" )) :   (("/\\*" +  *(\ ("\\*/")) + "\\*/") | (("//" | "%") +  "[^\n]*" + "\n"))

I am afraid of possible bugs. It would be a pleasure of mine to hear from someone
who enjoys it.

1 Like