summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/compiling-gf.txt143
-rw-r--r--doc/gf-compiler.dot3
-rw-r--r--doc/gf-compiler.pngbin31740 -> 27451 bytes
-rw-r--r--examples/gfcc/compiler/gfcc2
4 files changed, 147 insertions, 1 deletions
diff --git a/doc/compiling-gf.txt b/doc/compiling-gf.txt
new file mode 100644
index 000000000..edc061a5f
--- /dev/null
+++ b/doc/compiling-gf.txt
@@ -0,0 +1,143 @@
+Compiling GF
+Aarne Ranta
+
+==The compilation task==
+
+GF is a grammar formalism, i.e. a special purpose programming language
+for writing grammars.
+
+Cf: BNF, YACC, Happy (grammars for programming languages);
+PATR, HPSG, LFG (grammars for natural languages).
+
+The grammar compiler prepares a GF grammar for two computational tasks:
+- linearization: take syntax trees to strings
+- parsing: take strings to syntax trees
+
+
+The grammar gives a declarative description of these functionalities,
+preferably on a high abstraction level enhancing grammar writing
+productivity.
+
+
+==Characteristics of GF language==
+
+Functional language with types, both built-in and user-defined.
+
+Pattern matching and higher-order functions.
+
+Module system reminiscent of ML (signatures, structures, functors).
+
+
+==GF vs. Haskell==
+
+Some things that (standard) Haskell hasn't:
+- records and record subtyping
+- regular expression patterns
+- dependent types
+- ML-style modules
+
+
+Some things that GF hasn't:
+- infinite (recursive) data types
+- recursive functions
+- classes, polymorphism
+
+
+==GF vs. most linguistic grammar formalisms==
+
+GF separates abstract syntax from concrete syntax
+
+GF has a module system with separate compilation
+
+GF is generation-oriented (as opposed to parsing)
+
+GF has unidirectional matching (as opposed to unification)
+
+GF has a static type system (as opposed to a type-free universe)
+
+"I was - and I still am - firmly convinced that a program composed
+out of statically type-checked parts is more likely to faithfully
+express a well-thought-out design than a program relying on
+weakly-typed interfaces or dynamically-checked interfaces."
+(B. Stroustrup, 1994, p. 107)
+
+
+==The computation model==
+
+An abstract syntax defines a free algebra of trees (using
+dependent types, recursion, higher-order abstract syntax: GF has a
+complete Logical Framework).
+
+A concrete syntax defines a homomorphism (compositional mapping)
+from the abstract syntax to a system of tuples of strings.
+
+The homomorphism can as such be used as linearization algorithm.
+
+The parsing problem can be reduced to that of MPCFG (Multiple
+Parallel Context Free Grammars), see P. Ljunglöf's thesis (2004).
+
+
+==The compilation task, again==
+
+1. From a GF source grammar, derive a canonical GF grammar
+(a much simpler format)
+
+2. From the canonical GF grammar derive an MPCFG grammar
+
+The canonical GF grammar can be used for linearization, with
+linear time complexity (w.r.t. the size of the tree).
+
+The MPCFG grammar can be used for parsing, with (unbounded)
+polynomial time complexity (w.r.t. the size of the string).
+
+For these target formats, we have also built interpreters in
+different programming languages (C++, Haskell, Java, Prolog).
+
+Moreover, we generate supplementary formats such as grammars
+required by various speech recognition systems.
+
+
+==An overview of compilation phases==
+
+Legend:
+- ellipse node: representation saved in a file
+- plain text node: internal representation
+- solid arrow or ellipse: essential phare or format
+- dashed arrow or ellipse: optional phase or format
+- arrow label: the module implementing the phase
+
+
+[gf-compiler.png]
+
+
+==Using the compiler==
+
+Batch mode (cf. GHC)
+
+Interactive mode, building the grammar incrementally from
+different files, with the possibility of testing them
+(cf. GHCI)
+
+The interactive mode was first, built on the model of ALF-2
+(L. Magnusson), and there was no file output of compiled
+grammars.
+
+
+==Modules and separate compilation==
+
+The above diagram shows essentially what happens to each module.
+(But not quite, since some of the back-end formats must be
+built for sets of modules.)
+
+When the grammar compiler is called, it has a main module as its
+argument. It then builds recursively a dependency graph with all
+the other modules, and decides which ones must be recompiled.
+The behaviour is rather similar to GHC, and we don't go into
+details (although it would be beneficial to spell out the
+rules that are right now just in the implementation...)
+
+Separate compilation is //extremely important// when developing
+big grammars, especially when using grammar libraries. Compiling
+the GF resource grammar library takes 5 minutes, whereas reading
+in the compiled image takes 10 seconds.
+
diff --git a/doc/gf-compiler.dot b/doc/gf-compiler.dot
index 69916c695..f8ce1aaae 100644
--- a/doc/gf-compiler.dot
+++ b/doc/gf-compiler.dot
@@ -43,6 +43,9 @@ gfc -> gf11 [label = " PrintGFC", style = "solid"];
gf11 [label = "file.gfc", style = "solid", shape = "ellipse"];
+ gfcc [label = "file.gfcc", style = "solid", shape = "ellipse"];
+ gfc -> gfcc [label = " CanonToGFCC", style = "solid"];
+
mcfg [label = "file.gfcm", style = "dashed", shape = "ellipse"];
gfc -> mcfg [label = " PrintGFC", style = "dashed"];
diff --git a/doc/gf-compiler.png b/doc/gf-compiler.png
index 64b68b962..6949c37b5 100644
--- a/doc/gf-compiler.png
+++ b/doc/gf-compiler.png
Binary files differ
diff --git a/examples/gfcc/compiler/gfcc b/examples/gfcc/compiler/gfcc
index 9750b8133..5be0e4094 100644
--- a/examples/gfcc/compiler/gfcc
+++ b/examples/gfcc/compiler/gfcc
@@ -1,4 +1,4 @@
./TestImperC $1 | tail -1 >gft.tmp
echo "es -file=typecheck.gfs" | gf -s Imper.gfcm
-runhugs CleanJVM jvm.tmp $1
+runghc CleanJVM jvm.tmp $1
rm *.tmp