| author | kr.angelov <kr.angelov@gmail.com> | 2012-01-10 19:36:28 +0000 |
|---|---|---|
| committer | kr.angelov <kr.angelov@gmail.com> | 2012-01-10 19:36:28 +0000 |
| commit | 796dd530eedee8a7ccd605aa956a087c77719ab6 | |
| tree | ecdb1b4fe984887b18fa5c9ffc2411b2a794fed4 /examples/PennTreebank/PennFormat.hs | |
| parent | 1732254a1b4f9a229e638cd4142604acc9003b70 | |
the translation script from the Penn format to GF RGL is now in examples/PennTreebank
Diffstat (limited to 'examples/PennTreebank/PennFormat.hs')
| -rw-r--r-- | examples/PennTreebank/PennFormat.hs | 38 |
1 files changed, 38 insertions, 0 deletions
diff --git a/examples/PennTreebank/PennFormat.hs b/examples/PennTreebank/PennFormat.hs
new file mode 100644
index 000000000..2aaf0a6b6
--- /dev/null
+++ b/examples/PennTreebank/PennFormat.hs
@@ -0,0 +1,38 @@
+module PennFormat(parseTreebank, showTree) where
+
+import Text.PrettyPrint
+import Data.Tree
+import Data.Char
+
+parseTreebank :: String -> [Tree String]
+parseTreebank [] = []
+parseTreebank (c:cs)
+  | isSpace c = parseTreebank cs
+  | c == '('  = let (ts,cs1) = parseTrees cs
+                in ts ++ parseTreebank cs1
+
+parseTrees [] = ([],[])
+parseTrees (c:cs)
+  | isSpace c = parseTrees cs
+  | c == ')'  = ([],cs)
+  | c == '('  = let (w, cs1) = parseWord cs
+                    (children,cs2) = parseTrees cs1
+                    (rest, cs3) = parseTrees cs2
+                in (Node (normalize w) children : rest,cs3)
+  | otherwise = let (w, cs1) = parseWord (c:cs)
+                    (rest, cs2) = parseTrees cs1
+                in (Node w [] : rest,cs2)
+
+normalize tag =
+  let (tag0,mod) = break (=='-') tag
+  in if null tag0
+       then tag
+       else tag0
+
+parseWord = break (\c -> isSpace c || c == '(' || c == ')')
+
+printTree (Node w []) = text w
+printTree (Node l children) = parens (text l <+> hsep (map printTree children))
+
+showTree :: Tree String -> String
+showTree = render . printTree
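To make the tag handling and output shape in this patch easier to follow, here is a small standalone sketch. It is not part of the commit. `normalizeTag`, `renderTree`, and `sample` are hypothetical names introduced for illustration: `normalizeTag` restates the logic of `normalize`, and `renderTree` approximates `showTree` using `unwords` instead of `Text.PrettyPrint`. The sample tree is what the parser is expected to build for the input `( (S (NP-SBJ (DT The) (NN cat)) (VP (VBZ sleeps))) )`, assuming the usual Penn Treebank convention of an outer wrapping parenthesis.

```haskell
import Data.Tree (Tree(..))

-- Restates `normalize` from the patch (hypothetical name): keep the part of
-- a tag before the first '-', unless that part is empty, as in "-NONE-".
normalizeTag :: String -> String
normalizeTag tag =
  case break (== '-') tag of
    ("", _)   -> tag
    (tag0, _) -> tag0

-- Approximates `showTree` (hypothetical name): leaves print as bare words,
-- inner nodes as "(label child1 child2 ...)". Uses unwords rather than
-- Text.PrettyPrint, which is an assumption made to keep the sketch small.
renderTree :: Tree String -> String
renderTree (Node w [])       = w
renderTree (Node l children) =
  "(" ++ unwords (l : map renderTree children) ++ ")"

-- Hand-built sample tree; labels are already normalized (NP-SBJ became NP).
sample :: Tree String
sample =
  Node "S"
    [ Node "NP" [Node "DT" [Node "The" []], Node "NN" [Node "cat" []]]
    , Node "VP" [Node "VBZ" [Node "sleeps" []]]
    ]
```

Loaded in GHCi, `map normalizeTag ["NP-SBJ", "-NONE-", "VP"]` gives `["NP", "-NONE-", "VP"]`, and `renderTree sample` gives `"(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))"`.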
