1 files changed, 0 insertions, 750 deletions
diff --git a/doc/compiling-gf.txt b/doc/compiling-gf.txt
deleted file mode 100644
index 9e438f40f..000000000
--- a/doc/compiling-gf.txt
+++ /dev/null
@@ -1,750 +0,0 @@
-Compiling GF
-Aarne Ranta
-Proglog meeting, 1 November 2006
-
-% to compile: txt2tags -thtml compiling-gf.txt ; htmls compiling-gf.html
-
-%!target:html
-%!postproc(html): #NEW <!-- NEW -->
-
-#NEW
-
-==The compilation task==
-
-GF is a grammar formalism, i.e. a special purpose programming language
-for writing grammars.
-
-Other grammar formalisms: 
-- BNF, YACC, Happy (grammars for programming languages); 
-- PATR, HPSG, LFG (grammars for natural languages).
-
-
-The grammar compiler prepares a GF grammar for two computational tasks:
-- linearization: take syntax trees to strings
-- parsing: take strings to syntax trees
-
-
-The grammar gives a declarative description of these functionalities,
-on a high abstraction level that improves grammar writing
-productivity.
-
-For efficiency, the grammar is compiled to lower-level formats.
-
-Type checking is another essential compilation phase. Its purpose is
-twofold, as usual:
-- checking the correctness of the grammar
-- type-annotating expressions for code generation
-
-
-#NEW
-
-==Characteristics of GF language==
-
-Functional language with types, both built-in and user-defined.
-```
-  Str : Type
-
-  param Number = Sg | Pl
-
-  param AdjForm = ASg Gender | APl
-
-  Noun : Type = {s : Number => Str ; g : Gender}
-```
-Pattern matching.
-```
-  svart_A = table {
-    ASg _ => "svart" ;
-    _ => "svarta"
-    }
-```
-Higher-order functions.
-
-Dependent types.
-```
-  flip : (a, b, c : Type) -> (a -> b -> c) -> b -> a -> c = 
-    \_,_,_,f,y,x -> f x y ;
-```
-
-
-#NEW
-
-==The module system of GF==
-
-Main division: abstract syntax and concrete syntax
-```
-  abstract Greeting = {
-    cat Greet ;
-    fun Hello : Greet ;
-  }
-
-  concrete GreetingEng of Greeting = {
-    lincat Greet = {s : Str} ;
-    lin Hello = {s = "hello"} ;
-  }
-
-  concrete GreetingIta of Greeting = {
-    param Politeness = Familiar | Polite ;
-    lincat Greet = {s : Politeness => Str} ;
-    lin Hello = {s = table {
-      Familiar => "ciao" ;
-      Polite => "buongiorno"
-    } ;
-  }
-```
-Other features of the module system:
-- extension and opening
-- parametrized modules (cf. ML: signatures, structures, functors)
-
-
-
-
-#NEW
-
-==GF vs. Haskell==
-
-Some things that (standard) Haskell hasn't:
-- records and record subtyping
-- regular expression patterns
-- dependent types
-- ML-style modules 
-
-
-Some things that GF hasn't:
-- infinite (recursive) data types
-- recursive functions
-- classes, polymorphism
-
-
-#NEW
-
-==GF vs. most linguistic grammar formalisms==
-
-GF separates abstract syntax from concrete syntax.
-
-GF has a module system with separate compilation.
-
-GF is generation-oriented (as opposed to parsing).
-
-GF has unidirectional matching (as opposed to unification).
-
-GF has a static type system (as opposed to a type-free universe).
-
-"I was - and I still am - firmly convinced that a program composed
-out of statically type-checked parts is more likely to faithfully
-express a well-thought-out design than a program relying on
-weakly-typed interfaces or dynamically-checked interfaces."
-(B. Stroustrup, 1994, p. 107)
-
-
-
-#NEW
-
-==The computation model: abstract syntax==
-
-An abstract syntax defines a free algebra of trees (using
-dependent types, recursion, higher-order abstract syntax: 
-GF includes a complete Logical Framework).
-```
-  cat C (x_1 : A_1)...(x_n : A_n)  
-  a_1 : A_1 
-  ... 
-  a_n : A_n{x_1 : A_1,...,x_n-1 : A_n-1}
-  ----------------------------------------------------
-  (C a_1 ... a_n) : Type 
-
-
-  fun f : (x_1 : A_1) -> ... -> (x_n : A_n) -> A
-  a_1 : A_1 
-  ... 
-  a_n : A_n{x_1 : A_1,...,x_n-1 : A_n-1}
-  ----------------------------------------------------
-  (f a_1 ... a_n) : A{x_1 : A_1,...,x_n : A_n}
-
-
- A : Type  x : A |- B : Type       x : A |- b : B        f : (x : A) -> B   a : A
- ----------------------------   ----------------------   ------------------------
-     (x : A) -> B : Type        \x -> b : (x : A) -> B       f a : B{x := A}
-```
-Notice that all syntax trees are in eta-long form.
-
-
-#NEW
-
-==The computation model: concrete syntax==
-
-A concrete syntax defines a homomorphism (compositional mapping)
-from the abstract syntax to a system of concrete syntax objects.
-```
-  cat C _
-  --------------------
-  lincat C = C* : Type
-
-  fun f : (x_1 : A_1) -> ... -> (x_n : A_n) -> A
-  -----------------------------------------------
-  lin f = f* : A_1* -> ... -> A_n* -> A*
-
-  (f a_1 ... a_n)*  =  f* a_1* ... a_n*
-```
-The homomorphism can as such be used as linearization function.
-
-It is a functional program, but a restricted one, since it works 
-in the end on finite data structures only.
-
-But a more efficient program is obtained via compilation to 
-GFC = Canonical GF: the "machine code" of GF.
-
-The parsing problem of GFC can be reduced to that of MPCFG (Multiple
-Parallel Context Free Grammars), see P. Ljunglöf's thesis (2004).
-
-
-
-#NEW
-
-==The core type system of concrete syntax: basic types==
-
-```
-                   param P      P : PType
-  PType : Type    ---------     ---------
-                  P : PType      P : Type
-
-                                          s : Str  t : Str
-  Str : type     "foo" : Str   [] : Str   ----------------
-                                            s ++ t : Str
-```
-
-
-#NEW
-
-==The core type system of concrete syntax: functions and tables==
-
-```
- A : Type  x : A |- B : Type       x : A |- b : B        f : (x : A) -> B   a : A
- ----------------------------   ----------------------   ------------------------
-     (x : A) -> B : Type        \x -> b : (x : A) -> B       f a : B{x := A}
-
-
-        P : PType   A : Type      t : P => A  p : p
-        --------------------      -----------------
-            P => A : Type            t ! p : A
-
-                   v_1,...,v_n : A
-       ---------------------------------------------- P = {C_1,...,C_n}
-       table {C_1 => v_1 ; ... ; C_n => v_n} : P => A 
-```
-Pattern matching is treated as an abbreviation for tables. Notice that
-```
-  case e of {...}  ==  table {...} ! e
-```
-
-
-#NEW
-
-==The core type system of concrete syntax: records==
-
-```
-              A_1,...,A_n : Type
-     ------------------------------------ n >= 0
-     {r_1 : A_1 ; ... ; r_n : A_n} : Type
-
-
-                 a_1 : A_1  ...  a_n : A_n
-     ------------------------------------------------------------
-     {r_1 = a_1 ; ... ; r_n = a_n} : {r_1 : A_1 ; ... ; r_n : A_n}
-
-
-             r : {r_1 : A_1 ; ... ; r_n : A_n}
-            ----------------------------------- i = 1,...,n
-                          r.r_1 : A_1
-```
-Subtyping: if ``r : R`` then ``r : R ** {r : A}``
-
-
-
-#NEW
-
-==Computation rules==
-
-```
-  (\x -> b) a = b{x := a}
-
-  (table {C_1 => v_1 ; ... ; C_n => v_n} : P => A) ! C_i  =  v_i
-
-  {r_1 = a_1 ; ... ; r_n = a_n}.r_i  =  a_i
-```
-
-
-
-#NEW
-
-==Canonical GF==
-
-Concrete syntax type system:
-```
-                             A_1 : Type ... A_n : Type
-  Str : Type   Int : Type    -------------------------   $i : A
-                               [A_1, ..., A_n] : Type
-
-
-     a_1 : A_1  ...  a_n : A_n         t : [A_1, ..., A_n]
-  ---------------------------------    ------------------- i = 1,..,n
-  [a_1, ..., a_n] : [A_1, ..., A_n]        t ! i : A_i 
-```
-Tuples represent both records and tables.
-
-There are no functions.
-
-Linearization:
-```
-  lin f = f*
-
-  (f a_1 ... a_n)*  =  f*{$1 = a_1*, ..., $n = a_n*} 
-```
-
-
-#NEW
-
-==The compilation task, again==
-
-1. From a GF source grammar, derive a canonical GF grammar.
-
-2. From the canonical GF grammar derive an MPCFG grammar
-
-The canonical GF grammar can be used for linearization, with
-linear time complexity (w.r.t. the size of the tree).
-
-The MPCFG grammar can be used for parsing, with (unbounded)
-polynomial time complexity (w.r.t. the size of the string).
-
-For these target formats, we have also built interpreters in
-different programming languages (C, C++, Haskell, Java, Prolog).
-
-Moreover, we generate supplementary formats such as grammars
-required by various speech recognition systems.
-
-
-#NEW
-
-==An overview of compilation phases==
-
-Legend:
-- ellipse node: representation saved in a file
-- plain text node: internal representation
-- solid arrow or ellipse: essential phare or format
-- dashed arrow or ellipse: optional phase or format
-- arrow label: the module implementing the phase
-
-
-[gf-compiler.png]
-
-
-#NEW
-
-==Using the compiler==
-
-Batch mode (cf. GHC).
-
-Interactive mode, building the grammar incrementally from
-different files, with the possibility of testing them
-(cf. GHCI).
-
-The interactive mode was first, built on the model of ALF-2
-(L. Magnusson), and there was no file output of compiled
-grammars.
-
-
-#NEW
-
-==Modules and separate compilation==
-
-The above diagram shows what happens to each module.
-(But not quite, since some of the back-end formats must be
-built for sets of modules: GFCC and the parser formats.)
-
-When the grammar compiler is called, it has a main module as its
-argument. It then builds recursively a dependency graph with all
-the other modules, and decides which ones must be recompiled.
-The behaviour is rather similar to GHC.
-
-Separate compilation is //extremely important// when developing
-big grammars, especially when using grammar libraries. Example: compiling
-the GF resource grammar library takes 5 minutes, whereas reading
-in the compiled image takes 10 seconds.
-
-
-#NEW
-
-==Module dependencies and recompilation==
-
-(For later use, not for the Proglog talk)
-
-For each module M, there are 3 kinds of files:
-- M.gf, source file
-- M.gfc, compiled file ("object file")
-- M.gfr, type-checked and optimized source file (for resource modules only)
-
-
-The compiler reads gf files and writes gfc files (and gfr files if appropriate)
-
-The Main module is the one used as argument when calling GF.
-
-A module M (immediately) depends on the module K, if either
-- M is a concrete of K
-- M is an instance of K
-- M extends K
-- M opens K
-- M is a completion of K with something
-- M is a completion of some module with K instantiated with something
-
-
-A module M (transitively) depends on the module K, if either
-- M immediately depends on K
-- M depends on some L such that L immediately depends on K
-
-
-Immediate dependence is readable from the module header without parsing
-the whole module.
-
-The compiler reads recursively the headers of all modules that Main depends on.
-
-These modules are arranged in a dependency graph, which is checked to be acyclic.
-
-To decide whether a module M has to be compiled, do:
-+ Get the time stamps t() of M.gf and M.gfc (if a file doesn't exist, its
-  time is minus infinity).
-+ If t(M.gf) > t(M.gfc), M must be compiled.
-+ If M depends on K and K must be compiled, then M must be compiled.
-+ If M depends on K and t(K.gf) > t(M.gfc), then M must be compiled.
-
-
-Decorate the dependency graph by information on whether the gf or the gfc (and gfr)
-format is to be read.
-
-Topologically sort the decorated graph, and read each file in the chosen format.
-
-The gfr file is generated for these module types only:
-- resource
-- instance
-
-
-When reading K.gfc, also K.gfr is read if some M depending on K has to be compiled.
-In other cases, it is enough to read K.gfc.
-
-In an interactive GF session, some modules may be in memory already.
-When read to the memory, each module M is given time stamp t(M.m).
-The additional rule now is:
-- If M.gfc is to be read, and t(M.m) > t(M.gfc), don't read M.gfc.
-
-
-
-
-#NEW
-
-==Techniques used==
-
-The compiler is written in Haskell, with some C foreign function calls
-in the interactive version (readline, killing threads).
-
-BNFC is used for generating both the parsers and printers.
-This has helped to make the formats portable.
-
-"Almost compositional functions" (``composOp``) are used in
-many compiler passes, making them easier to write and understand. 
-A ``grep`` on the sources reveals 40 uses (outside the definition 
-of ``composOp`` itself).
-
-The key algorithmic ideas are
-- type-driven partial evaluation in GF-to-GFC generation
-- common subexpression elimination as back-end optimization
-- some ideas in GFC-to-MCFG encoding
-
-
-#NEW
-
-==Type-driven partial evaluation==
-
-Each abstract syntax category in GF has a corresponding linearization type:
-```
-  cat C
-  lincat C = T
-```
-The general form of a GF rule pair is
-```
-  fun f : C1 -> ... -> Cn -> C
-  lin f = t
-```
-with the typing condition following the ``lincat`` definitions
-```
-  t : T1 -> ... -> Tn -> T
-```
-The term ``t`` is in general built by using abstraction methods such
-as pattern matching, higher-order functions, local definitions,
-and library functions.
-
-The compilation technique proceeds as follows:
-- use eta-expansion on ``t`` to determine the canonical form of the term
-```
-  \ $C1, ...., $Cn -> (t $C1 .... $Cn)
-```
-with unique variables ``$C1 .... $Cn`` for the arguments; repeat this
-inside the term for records and tables
-- evaluate the resulting term using the computation rules of GF
-- what remains is a canonical term with ``$C1 .... $Cn`` the only
-variables (the run-time input of the linearization function)
-
-
-#NEW
-
-==Eta-expanding records and tables==
-
-For records that are valied via subtyping, eta expansion 
-eliminates superfluous fields:
-```
-  {r1 = t1 ; r2 = t2} : {r1 : T1}  ---->  {r1 = t1}
-```
-For tables, the effect is always expansion, since
-pattern matching can be used to represent tables
-compactly:
-```
-  table {n => "fish"} : Number => Str   --->
-
-  table {
-    Sg => "fish" ;
-    Pl => "fish"
-    }
-```
-This can be helped by back-end optimizations (see below).
-
-
-#NEW
-
-==Eliminating functions==
-
-"Everything is finite": parameter types, records, tables;
-finite number of string tokens per grammar.
-
-But "inifinite types" such as function types are useful when
-writing grammars, to enable abstractions.
-
-Since function types do not appear in linearization types,
-we want functions to be eliminated from linearization terms.
-
-This is similar to the **subformula property** in logic.
-Also the main problem is similar: function depending on
-a run-time variable,
-```
-  (table {P => f ; Q = g} ! x) a
-```
-This is not a redex, but we can make it closer to one by moving
-the application inside the table,
-```
-  table {P => f a ; Q = g a} ! x
-```
-This transformation is the same as Prawitz's (1965) elimination
-of maximal segments in natural deduction:
-```
-                                            A           B
-                                          C -> D  C   C -> D  C
-           A       B                      ---------   ---------
-  A v B  C -> D  C -> D            A v B       D          D
-  ---------------------     ===>   -------------------------
-      C -> D             C                    D
-      --------------------
-              D
-```
-
-
-
-#NEW
-
-==Size effects of partial evaluation==
-
-Irrelevant table branches are thrown away, which can reduce the size.
-
-But, since tables are expanded and auxiliary functions are inlined,
-the size can grow exponentially.
-
-How can we keep the first property and eliminate the second?
-
-
-#NEW
-
-==Parametrization of tables==
-
-Algorithm: for each branch in a table, consider replacing the
-argument by a variable:
-```
-  table {             table {
-    P => t ;    --->    x => t[P->x] ;
-    Q => u              x => u[Q->x]
-    }                   }
-```
-If the resulting branches are all equal, you can replace the table
-by a lambda abstract
-```
-  \\x => t[P->x]
-```
-If each created variable ``x`` is unique in the grammar, computation
-with the lambda abstract is efficient.
-
-
-
-#NEW
-
-==Course-of-values tables==
-
-By maintaining a canonical order of parameters in a type, we can
-eliminate the left hand sides of branches.
-```
-  table {              table T [
-    P => t ;    --->     t ;
-    Q => u               u
-    }                    ]
-```
-The treatment is similar to ``Enum`` instances in Haskell.
-
-In the end, all parameter types can be translated to
-initial segments of integers.
-
-
-#NEW
-
-==Common subexpression elimination==
-
-Algorithm: 
-+ Go through all terms and subterms in a module, creating
-  a symbol table mapping terms to the number of occurrences.
-+ For each subterm appearing at least twice, create a fresh
-  constant defined as that subterm.
-+ Go through all rules (incl. rules for the new constants),
-  replacing largest possible subterms with such new constants.
-
-
-This algorithm, in a way, creates the strongest possible abstractions.
-
-In general, the new constants have open terms as definitions.
-But since all variables (and constants) are unique, they can
-be computed by simple replacement.
-
-
-
-#NEW
-
-==Size effects of optimizations==
-
-Example: the German resource grammar
-``LangGer``
-
-|| optimization |  lines |  characters |  size % | blow-up |
-| none       |  5394 |  3208435 |   100 |  25 |
-| all        |  5394 |   750277 |    23 |   6 |
-| none_subs  |  5772 |  1290866 |    40 |  10 |
-| all_subs   |  5644 |   414119 |    13 |   3 |
-| gfcc       |  3279 |   190004 |     6 |   1.5 |
-| gf source  |  3976 |   121939 |     4 |   1 |
-
-
-Optimization "all" means parametrization + course-of-values.
-
-The source code size is an estimate, since it includes
-potentially irrelevant library modules, and comments.
-
-The GFCC format is not reusable in separate compilation.
-
-
-
-#NEW
-
-==The shared prefix optimization==
-
-This is currently performed in GFCC only.
-
-The idea works for languages that have a rich morphology
-based on suffixes. Then we can replace a course of values
-with a pair of a prefix and a suffix set:
-```
-  ["apa", "apan", "apor", "aporna"] ---> 
-  ("ap" + ["a", "an", "or", "orna"])
-```
-The real gain comes via common subexpression elimination:
-```
-  _34 = ["a", "an", "or", "orna"]
-  apa = ("ap" + _34)
-  blomma = ("blomm" + _34)
-  flicka = ("flick" + _34)
-```
-Notice that it now matters a lot how grammars are written.
-For instance, if German verbs are treated as a one-dimensional
-table,
-```
-  ["lieben", "liebe", "liebst", ...., "geliebt", "geliebter",...]
-```
-no shared prefix optimization is possible. A better form is
-separate tables for non-"ge" and "ge" forms:
-```
-  [["lieben", "liebe", "liebst", ....], ["geliebt", "geliebter",...]]
-```
-
-
-#NEW
-
-==Reuse of grammars as libraries==
-
-The idea of resource grammars: take care of all aspects of
-surface grammaticality (inflection, agreement, word order).
-
-Reuse in application grammar: via translations
-```
-  cat C          --->  oper C : Type = T
-  lincat C = T 
-
-  fun f : A      --->  oper f : A* = t
-  lin f = t 
-```
-The user only needs to know the type signatures (abstract syntax).
-
-However, this does not quite guarantee grammaticality, because
-different categories can have the same lincat:
-```
-  lincat Conj = {s : Str}
-  lincat Adv  = {s : Str}
-```
-Thus someone may by accident use "and" as an adverb!
-
-
-#NEW
-
-==Forcing the type checker to act as a grammar checker==
-
-We just have to make linearization types unique for each category.
-
-The technique is reminiscent of Haskell's ``newtype`` but uses
-records instead: we add **lock fields** e.g.
-```
-  lincat Conj = {s : Str ; lock_Conj : {}}
-  lincat Adv  = {s : Str ; lock_Adv  : {}}
-```
-Thanks to record subtyping, the translation is simple:
-```
-  fun f : C1 -> ... -> Cn -> C        
-  lin f = t
-
-                  --->
-
-  oper f : C1* -> ... -> Cn* -> C* = 
-    \x1,...,xn -> (t x1 ... xn) ** {lock_C = {}}
-```
-
-#NEW
-
-==Things to do==
-
-Better compression of gfc file format.
-
-Type checking of dependent-type pattern matching in abstract syntax.
-
-Compilation-related modules that need rewriting
-- ``ReadFiles``: clarify the logic of dependencies
-- ``Compile``: clarify the logic of what to do with each module
-- ``Compute``: make the evaluation more efficient
-- ``Parsing/*``, ``OldParsing/*``, ``Conversion/*``: reduce the number
-  of parser formats and algorithms