I have a cgi page on the title as I posted earlier. Hybrid Regex to Automaton. Most of below are a copy of part of the page. As regex has been already built-in SWI-Prolog, I have no plan to develop the old page further, but I found the hybrid regex is enough handy to parse and extract information from text for my daily use of DCG-based coding.
- Input: A regular expression possibly with standard boolean operaters, character classes and strings for the regular expression in unix-like syntax.
- Output: An automaton.
- Input-output relation: The output is a minimum automata for the input regular expression among the bisimulation equivalences.
- Features: Boolean operations on the intervals of charaters are suppored
to compute a set of mutual disjoint partition for the character classes
in the input regular expression. - To run the CGI: Put a regular expression into the input box,
and push the button,In stead of writing,
you may select a regular expression from the prepared sample expressions.
I used these hybrid regexes for testing the transformation to automata.
Hybrid Regexes
""
[]
"a"
\ "a"
\ a
\ \ a
"ab"
"."
".*"
".+*"
(.)^3
"a.*b"
a^3 | a^5
rev("abcdefg")
"(abc)*" & rev("(abc)*")
"a*b*c*" & rev("a*b*c*")
"[a-zA-Z]+$"
*(a^3 | a^5)
*(a^3) | *(a^5)
a^4 | a^6
*(a^4) | *(a^6)
*(a^4 | a^6)
*(a^4 | a^9 | a^5)
*(a^4) | *(a^9) | *(a^5)
*char(alnum) + (+char(digit))
*(aaa) | *(aa)
*(aaaa) | *(aa)
*(aaa) | *(aaaaa)
"a|b"
"a|a|a|a"
"[^a]"
"[^ab]"
"a*|b*"
*(a) + b
*(a)+ *(b)
*(a) + *(a) + *(a) + *(a) + *(a) + *(a) + *(a) + *(a) + *(a)
*(a|b) + *(a|b) + *(a|c) + *(a|c) + *(a|d) + *(a|d) + *(a|e) + *(a|e) + *(a|e)
*(*(a))
*char(alnum) + (+char(digit))
*(char(alnum) \ char(digit))
(s|t)+ char(digit)^5 + char(lower)^2
*(a) \ *(a)
*(a) \ []
\("ab")
\("a*b*c*" & rev("a*b*c*"))
(a+ *(b)) + (*(a)+b)
*(a) + *(*(a) + b + *(a) + b + *(a)) + *(a)
*(a) + *(*(a) + b + *(a) + b + *(a))
*(*(a) + b + *(a) + b + *(a)) + *(a)
(.)^3
(.)< 5
*(*(a) + b + *(a) + b + *(a)) & *(*(b) + a + *(b) + a + *(b))
*(0+ *(1 + *(0) + 1)) & *(1+ *(0 + *(1) + 0))
*(a) + b
*(a)+ *(b)
*(a) + *(a)
*(*(a))
"a|b"
"a|a|a|a"
"[^a]"
"[^ab]"
"a*|b*"
"[abcdefg]++++******++++++"
"[3a4b5c98d7exfg]++++******++++++"
*(a) \ *(a)
*(a) \ []
(a+ *(b)) + (*(a)+b)
*(a) + *(*(a) + b + *(a) + b + *(a)) + *(a)
*(a) + *(*(a) + b + *(a) + b + *(a))
*(*(a) + b + *(a) + b + *(a)) + *(a)
*(*(a) + b + *(a) + b + *(a)) + *(b)
*(*(a) + b + *(a) + b + *(a))
*(a+ *(b + *(a) + b))
("/\\*" + *(.) + "\\*/") | (("//" | "%") + ".*" + "\n" )
(("/\\*" + *(\ ("\\*/")) + "\\*/") | (("//" | "%") + "[^\n]*" + "\n"))
The cgi demonstrates a feature of comparing two regular expressions to
test incomparable, subset, or equality as for extension of acceptable language.
Compare Regexes
"" : []
*(*(*(a))) : *(a)
a : b
*(a) : b
*(a) : a
(*(a+b)\(*b)) : *(a)
\(\(*(a))) : *(a)
(("/\\*" + *(.) + "\\*/") | (("//" | "%") + ".*" + "\n" )) : (("/\\*" + *(\ ("\\*/")) + "\\*/") | (("//" | "%") + "[^\n]*" + "\n"))
I am afraid of possible bugs. It would be a pleasure of mine to hear from someone
who enjoys it.