author aarne <aarne@cs.chalmers.se> 2006-01-07 15:42:13 +0000
committer aarne <aarne@cs.chalmers.se> 2006-01-07 15:42:13 +0000
commit 00ea4e3dcd0dda6c8353e4134d8ddf106e1d18e7
tree 8684af90411395398d30b2d6eed0322c8e8c4734
parent d133e0353ca614b36357dadb782aea43de895e09
documenting regular patterns
-rw-r--r-- doc/gf-history.html | 58
-rw-r--r-- src/GF/Grammar/PatternMatch.hs | 2
2 files changed, 44 insertions, 16 deletions
diff --git a/doc/gf-history.html b/doc/gf-history.html
index 7b4e1091c..c58d39fa5 100644
--- a/doc/gf-history.html
+++ b/doc/gf-history.html
@@ -14,26 +14,54 @@ Changes in functionality since May 17, 2005, release of GF Version 2.2
<p>
-5/1 (BB) New grammar printers <tt>slf_sub</tt> and <tt>slf_sub_graphviz</tt>
-for creating SLF networks with sub-automata.
-
-<hr>
-
-6/1/2006 (AR) Concatenative string patterns to help morphology definitions.
-The pattern <tt>Predef.CC p1 p2</tt> matches a string literal <tt>s</tt>
-with the first (i.e. shortest-prefix) division <tt>s1 + s2 = s</tt> such that
-<tt>p1</tt> matches <tt>s1</tt> and <tt>p2</tt> matches <tt>s2</tt>. For example,
-the following expression produces the English plural forms
-<i>boy-boys, play-plays, fly-flies, dog-dogs</i>:
+7/1 (AR) Full set of regular expression patterns, with
+as-patterns to enable variable bindings to matched expressions:
+<ul>
+ <li> <i>p</i> <tt>+</tt> <i>q</i> : token consisting of <i>p</i> followed by <i>q</i>
+ <li> <i>p</i> <tt>*</tt> : token <i>p</i> repeated 0 or more times
+ (max the length of the string to be matched)
+ <li> <tt>-</tt> <i>p</i> : matches anything that <i>p</i> does not match
+ <li> <i>x</i> <tt>@</tt> <i>p</i> : bind to <i>x</i> what <i>p</i> matches
+ <li> <i>p</i> <tt>|</tt> <i>q</i> : matches what either <i>p</i> or <i>q</i> matches
+</ul>
+The last three apply to all types of patterns, the first two only to token strings.
+Example: plural formation in Swedish 2nd declension
+(<i>pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar</i>):
<pre>
- plur : Str -> Str = \s -> case s of {
- CC x (CC ("a" | "o") "y") => s + "s" ;
- CC x "y" => x + "ies" ;
- _ => s + "s"
+ plural2 : Str -> Str = \w -> case w of {
+ pojk + "e" => pojk + "ar" ;
+ nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
+ bil => bil + "ar"
} ;
</pre>
+Semantics: variables are always bound to the <b>first match</b>, in the sequence defined
+as the list <tt>Match p v</tt> as follows:
+<pre>
+ Match (p1|p2) v = Match p1 v ++ Match p2 v
+ Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
+ Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
+ Match c v = [[]] if c == v -- for constant patterns c
+ Match x v = [[(x,v)]] -- for variable patterns x
+ Match x@p v = [[(x,v)]] + M if M = Match p v /= []
+ Match p v = [] otherwise -- failure
+</pre>
+Examples:
+<ul>
+<li> <tt>x + "e" + y</tt> matches <tt>"peter"</tt> with <tt>x = "p", y = "ter"</tt>
+<li> <tt>x@("foo"*)</tt> matches any token with <tt>x = ""</tt>
+<li> <tt>x + y@("er"*)</tt> matches <tt>"burgerer"</tt> with <tt>x = "burg", y = "erer"</tt>
+</ul>
+<p>
+
+6/1 (AR) Concatenative string patterns to help morphology definitions...
This can be seen as a step towards regular expression string patterns.
The natural notation <tt>p1 + p2</tt> will be considered later.
+<b>Note</b>. This was done on 7/1.
+
+<p>
+
+5/1/2006 (BB) New grammar printers <tt>slf_sub</tt> and <tt>slf_sub_graphviz</tt>
+for creating SLF networks with sub-automata.
<hr>
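The <tt>Match</tt> semantics documented in the hunk above can be read as a list-producing function whose first element is the binding that gets used. The following is a minimal Haskell sketch of that reading, not the code in <tt>PatternMatch.hs</tt>: the type <tt>Pat</tt>, its constructors, and the helper <tt>firstMatch</tt> are hypothetical names introduced for illustration, tokens are plain strings, and the negation pattern <tt>-p</tt> is left out because the documented equations do not cover it.

```haskell
-- Hypothetical pattern type mirroring the documented operators.
-- This is an illustration only, not GF's actual pattern type.
data Pat
  = PStr String     -- constant token, e.g. "e"
  | PVar String     -- variable pattern x
  | PAlt Pat Pat    -- p | q
  | PSeq Pat Pat    -- p + q
  | PRep Pat        -- p *
  | PAs String Pat  -- x @ p
  deriving Show

-- One match is a list of (variable, matched token) bindings.
type Binding = [(String, String)]

-- All matches of a pattern against a token, in the documented order.
match :: Pat -> String -> [Binding]
match (PStr c)   s = [[] | c == s]
match (PVar x)   s = [[(x, s)]]
match (PAlt p q) s = match p s ++ match q s
match (PSeq p q) s =
  [ b1 ++ b2
  | i <- [0 .. length s]            -- shortest prefix first
  , let (s1, s2) = splitAt i s
  , b1 <- match p s1
  , b2 <- match q s2
  ]
match (PRep p)   s =
  -- 0, 1, ..., length s copies of p, each terminated by the empty token
  concat [ match (foldr (const (PSeq p)) (PStr "") [1 .. n]) s
         | n <- [0 .. length s] ]
match (PAs x p)  s = [ (x, s) : b | b <- match p s ]

-- Variables are bound according to the first match, if there is one.
firstMatch :: Pat -> String -> Maybe Binding
firstMatch p s = case match p s of
  []      -> Nothing
  (b : _) -> Just b
```

Under these assumptions, <tt>firstMatch (PSeq (PSeq (PVar "x") (PStr "e")) (PVar "y")) "peter"</tt> yields <tt>Just [("x","p"),("y","ter")]</tt>, and <tt>firstMatch (PSeq (PVar "x") (PAs "y" (PRep (PStr "er")))) "burgerer"</tt> yields <tt>Just [("x","burg"),("y","erer")]</tt>, in line with the first and third documented examples.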
diff --git a/src/GF/Grammar/PatternMatch.hs b/src/GF/Grammar/PatternMatch.hs
index f850981f0..2724bd263 100644
--- a/src/GF/Grammar/PatternMatch.hs
+++ b/src/GF/Grammar/PatternMatch.hs
@@ -105,7 +105,7 @@ tryMatch (p,t) = do
return (concat matches)
(PRep p1, ([],K s, [])) -> checks [
- trym (foldr (const (PSeq p1)) (PString "") [0..n]) t' | n <- [1 .. length s]
+ trym (foldr (const (PSeq p1)) (PString "") [1..n]) t' | n <- [0 .. length s]
]
_ -> prtBad "no match in case expr for" t
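The one-line change to the <tt>PRep</tt> case shifts which repetition counts are tried. The sketch below isolates the two <tt>foldr</tt> expressions from the hunk and counts how many copies of the repeated pattern each unfolding contains. The cut-down <tt>Patt</tt> type and the helpers <tt>copies</tt>, <tt>oldUnfoldings</tt> and <tt>newUnfoldings</tt> are hypothetical and exist only for this illustration; the real GF pattern type has more constructors.

```haskell
-- Hypothetical two-constructor stand-in for GF's pattern type.
data Patt = PString String | PSeq Patt Patt
  deriving (Eq, Show)

-- Number of copies of the repeated pattern in an unfolded sequence.
copies :: Patt -> Int
copies (PSeq _ rest) = 1 + copies rest
copies (PString _)   = 0

-- Unfoldings tried before the commit: [0..n] has n+1 elements, n starts at 1.
oldUnfoldings :: Patt -> String -> [Patt]
oldUnfoldings p s =
  [ foldr (const (PSeq p)) (PString "") [0 .. n] | n <- [1 .. length s] ]

-- Unfoldings tried after the commit: [1..n] has n elements, n starts at 0.
newUnfoldings :: Patt -> String -> [Patt]
newUnfoldings p s =
  [ foldr (const (PSeq p)) (PString "") [1 .. n] | n <- [0 .. length s] ]
```

For a four-character token, <tt>map copies (oldUnfoldings (PString "ab") "abab")</tt> gives <tt>[2,3,4,5]</tt>, while <tt>map copies (newUnfoldings (PString "ab") "abab")</tt> gives <tt>[0,1,2,3,4]</tt>. In this reading, the old expression never tried zero or one repetition, which is what the documented rule "repeated 0 or more times" requires.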