diff options
| author | aarne <aarne@cs.chalmers.se> | 2006-10-30 16:43:13 +0000 |
|---|---|---|
| committer | aarne <aarne@cs.chalmers.se> | 2006-10-30 16:43:13 +0000 |
| commit | 8b753833b5858977138c9a622d187ddeec85135c (patch) | |
| tree | b1b8d3a93472b3c57157bcc5addf40811c7fe7f6 | |
| parent | f240612ef4d33640eae8a04b8fe3e369465799c7 (diff) | |
compilation doc close to final for presentation
| -rw-r--r-- | doc/compiling-gf.txt | 91 |
1 files changed, 62 insertions, 29 deletions
diff --git a/doc/compiling-gf.txt b/doc/compiling-gf.txt index 997b6647a..6f7b60eb8 100644 --- a/doc/compiling-gf.txt +++ b/doc/compiling-gf.txt @@ -28,6 +28,10 @@ The grammar gives a declarative description of these functionalities, on a high abstraction level that improves grammar writing productivity. +For efficiency, the grammar is compiled to lower-level formats. + +Type checking is another essential compilation phase. + #NEW @@ -53,11 +57,41 @@ Pattern matching. Higher-order functions. Dependent types. +``` + flip : (a, b, c : Type) -> (a -> b -> c) -> b -> a -> c = + \_,_,_,f,y,x -> f x y ; +``` + + +#NEW + +==The module system of GF== -Module system reminiscent of ML (signatures, structures, functors). +Main division: abstract syntax and concrete syntax ``` + abstract Greeting = { + cat Greet ; + fun Hello : Greet ; + } + + concrete GreetingEng of Greeting = { + lincat Greet = {s : Str} ; + lin Hello = {s = "hello"} ; + } + concrete GreetingIta of Greeting = { + param Politeness = Familiar | Polite ; + lincat Greet = {s : Politeness => Str} ; + lin Hello = {s = table { + Familiar => "ciao" ; + Polite => "buongiorno" + } ; + } ``` +Other features of the module system: +- extension and opening +- parametrized modules (cf. ML: signatures, structures, functors) + @@ -110,7 +144,7 @@ GF includes a complete Logical Framework). A concrete syntax defines a homomorphism (compositional mapping) from the abstract syntax to a system of tuples of strings. -The homomorphism can as such be used as linearization algorithm. +The homomorphism can as such be used as linearization function. It is a functional program, but a restricted one, since it works in the end on finite data structures only. @@ -136,7 +170,7 @@ The MPCFG grammar can be used for parsing, with (unbounded) polynomial time complexity (w.r.t. the size of the string). For these target formats, we have also built interpreters in -different programming languages (C++, Haskell, Java, Prolog). +different programming languages (C, C++, Haskell, Java, Prolog). Moreover, we generate supplementary formats such as grammars required by various speech recognition systems. @@ -183,9 +217,7 @@ built for sets of modules: GFCC and the parser formats.) When the grammar compiler is called, it has a main module as its argument. It then builds recursively a dependency graph with all the other modules, and decides which ones must be recompiled. -The behaviour is rather similar to GHC, and we don't go into -details (although it would be beneficial to make explicit the -rules that are right now just in the implementation...) +The behaviour is rather similar to GHC. Separate compilation is //extremely important// when developing big grammars, especially when using grammar libraries. Example: compiling @@ -271,7 +303,7 @@ This has helped to make the formats portable. "Almost compositional functions" (``composOp``) are used in many compiler passes, making them easier to write and understand. -A grep on the sources reveals 40 uses (outside the definition +A ``grep`` on the sources reveals 40 uses (outside the definition of ``composOp`` itself). The key algorithmic ideas are @@ -361,8 +393,8 @@ the application inside the table, ``` table {P => f a ; Q = g a} ! x ``` -The transformation is the same as Prawitz's (1965) elimination -of maximal segments: +This transformation is the same as Prawitz's (1965) elimination +of maximal segments in natural deduction: ``` A B C -> D C C -> D C @@ -397,7 +429,7 @@ argument by a variable: ``` table { table { P => t ; ---> x => t[P->x] ; - Q => t x => t[Q->x] + Q => u x => u[Q->x] } } ``` If the resulting branches are all equal, you can replace the table @@ -412,6 +444,24 @@ with the lambda abstract is efficient. #NEW +==Course-of-values tables== + +By maintaining a canonical order of parameters in a type, we can +eliminate the left hand sides of branches. +``` + table { table T [ + P => t ; ---> P ; + Q => u Q + } ] +``` +The treatment is similar to ``Enum`` instances in Haskell. + +Actually, all parameter types could be translated to +initial segments of integers. This is done in the GFCC format. + + +#NEW + ==Common subexpression elimination== Algorithm: @@ -430,23 +480,6 @@ But since all variables (and constants) are unique, they can be computed by simple replacement. -#NEW - -==Course-of-values tables== - -By maintaining a canonical order of parameters in a type, we can -eliminate the left hand sides of branches. -``` - table { table T [ - P => t ; ---> P ; - Q => u Q - } ] -``` -The treatment is similar to ``Enum`` instances in Haskell. - -Actually, all parameter types could be translated to -initial segments of integers. This is done in the GFCC format. - #NEW @@ -467,7 +500,7 @@ Example: the German resource grammar Optimization "all" means parametrization + course-of-values. The source code size is an estimate, since it includes -potentially irrelevant library modules. +potentially irrelevant library modules, and comments. The GFCC format is not reusable in separate compilation. @@ -568,4 +601,4 @@ Compilation-related modules that need rewriting - ``Compile``: clarify the logic of what to do with each module - ``Compute``: make the evaluation more efficient - ``Parsing/*``, ``OldParsing/*``, ``Conversion/*``: reduce the number - of parser formats and algorithms
\ No newline at end of file + of parser formats and algorithms |
