From ce15ec7b787479ca4c7295863ea7fa5cfdd16755 Mon Sep 17 00:00:00 2001 From: aarne Date: Wed, 22 Dec 2010 14:08:42 +0000 Subject: moved parts of doc to deprecated/doc --- deprecated/Resource-HOWTO.html | 967 +++++++++++++++++++++++++++++++++++++ deprecated/Resource-HOWTO.txt | 827 +++++++++++++++++++++++++++++++ deprecated/Syntax.png | Bin 0 -> 104604 bytes deprecated/doc/2341.html | 259 ++++++++++ deprecated/doc/DocGF.pdf | Bin 0 -> 56906 bytes deprecated/doc/DocGF.tex | 569 ++++++++++++++++++++++ deprecated/doc/German.png | Bin 0 -> 21000 bytes deprecated/doc/Grammar.dot | 75 +++ deprecated/doc/Grammar.png | Bin 0 -> 78790 bytes deprecated/doc/TODO | 231 +++++++++ deprecated/doc/compiling-gf.txt | 750 ++++++++++++++++++++++++++++ deprecated/doc/eu-langs.dot | 79 +++ deprecated/doc/eu-langs.png | Bin 0 -> 85484 bytes deprecated/doc/food-translet.png | Bin 0 -> 22916 bytes deprecated/doc/food1.png | Bin 0 -> 22805 bytes deprecated/doc/food2.png | Bin 0 -> 31506 bytes deprecated/doc/gf-compiler.dot | 88 ++++ deprecated/doc/gf-compiler.png | Bin 0 -> 27451 bytes deprecated/doc/gf-formalism.html | 350 ++++++++++++++ deprecated/doc/gf-formalism.txt | 279 +++++++++++ deprecated/doc/gf-ideas.html | 311 ++++++++++++ deprecated/doc/gf-ideas.txt | 231 +++++++++ deprecated/doc/gf-statistics.txt | 289 +++++++++++ deprecated/doc/gf-summerschool.txt | 533 ++++++++++++++++++++ deprecated/doc/gf3-release.html | 73 +++ deprecated/doc/gf3-release.txt | 58 +++ deprecated/doc/school-langs.dot | 106 ++++ deprecated/doc/school-langs.png | Bin 0 -> 131704 bytes deprecated/doc/summer-align.png | Bin 0 -> 449911 bytes deprecated/doc/summer-langs.png | Bin 0 -> 1885485 bytes deprecated/doc/vr.html | 46 ++ deprecated/doc/vr.txt | 32 ++ 32 files changed, 6153 insertions(+) create mode 100644 deprecated/Resource-HOWTO.html create mode 100644 deprecated/Resource-HOWTO.txt create mode 100644 deprecated/Syntax.png create mode 100644 deprecated/doc/2341.html create mode 100644 deprecated/doc/DocGF.pdf create mode 100644 deprecated/doc/DocGF.tex create mode 100644 deprecated/doc/German.png create mode 100644 deprecated/doc/Grammar.dot create mode 100644 deprecated/doc/Grammar.png create mode 100644 deprecated/doc/TODO create mode 100644 deprecated/doc/compiling-gf.txt create mode 100644 deprecated/doc/eu-langs.dot create mode 100644 deprecated/doc/eu-langs.png create mode 100644 deprecated/doc/food-translet.png create mode 100644 deprecated/doc/food1.png create mode 100644 deprecated/doc/food2.png create mode 100644 deprecated/doc/gf-compiler.dot create mode 100644 deprecated/doc/gf-compiler.png create mode 100644 deprecated/doc/gf-formalism.html create mode 100644 deprecated/doc/gf-formalism.txt create mode 100644 deprecated/doc/gf-ideas.html create mode 100644 deprecated/doc/gf-ideas.txt create mode 100644 deprecated/doc/gf-statistics.txt create mode 100644 deprecated/doc/gf-summerschool.txt create mode 100644 deprecated/doc/gf3-release.html create mode 100644 deprecated/doc/gf3-release.txt create mode 100644 deprecated/doc/school-langs.dot create mode 100644 deprecated/doc/school-langs.png create mode 100644 deprecated/doc/summer-align.png create mode 100644 deprecated/doc/summer-langs.png create mode 100644 deprecated/doc/vr.html create mode 100644 deprecated/doc/vr.txt (limited to 'deprecated') diff --git a/deprecated/Resource-HOWTO.html b/deprecated/Resource-HOWTO.html new file mode 100644 index 000000000..ce2c15137 --- /dev/null +++ b/deprecated/Resource-HOWTO.html @@ -0,0 +1,967 @@ + + + + +Resource grammar writing HOWTO + +

Resource grammar writing HOWTO

+ +Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: Mon Sep 22 14:28:01 2008 + + +

The resource grammar structure +
+
Language-dependent syntax modules +
- The present-tense fragment +
+
Phases of the work +
+
Lexicon extension +
+
Extending the resource grammar API +
Using parametrized modules +
- Writing an instance of parametrized resource grammar implementation +
- Parametrizing a resource grammar implementation +
+
Character encoding and transliterations +
Coding conventions in GF +
Transliterations +

+ +

+History +

+September 2008: updated for Version 1.5. +

+October 2007: updated for Version 1.2. +

+January 2006: first version. +

+The purpose of this document is to tell how to implement the GF +resource grammar API for a new language. We will not cover how +to use the resource grammar, nor how to change the API. But we +will give some hints how to extend the API. +

+A manual for using the resource grammar is found in +

+www.cs.chalmers.se/Cs/Research/Language-technology/GF/lib/resource/doc/synopsis.html. +

+A tutorial on GF, also introducing the idea of resource grammars, is found in +

+www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-tutorial.html. +

+This document concerns the API v. 1.5, while the current stable release is 1.4. +You can find the code for the stable release in +

+www.cs.chalmers.se/Cs/Research/Language-technology/GF/lib/resource/ +

+and the next release in +

+www.cs.chalmers.se/Cs/Research/Language-technology/GF/next-lib/src/ +

+It is recommended to build new grammars to match the next release. +

+ +

The resource grammar structure

+The library is divided into a bunch of modules, whose dependencies +are given in the following figure. +

+ +

+Modules of different kinds are distinguished as follows: +

solid contours: module seen by end users +
dashed contours: internal module +
ellipse: abstract/concrete pair of modules +
rectangle: resource or instance +
diamond: interface +

+ +

+Put in another way: +

solid rectangles and diamonds: user-accessible library API +
solid ellipses: user-accessible top-level grammar for parsing and linearization +
dashed contours: not visible to users +

+ +

+The dashed ellipses form the main parts of the implementation, on which the resource +grammar programmer has to work with. She also has to work on the Paradigms +module. The rest of the modules can be produced mechanically from corresponding +modules for other languages, by just changing the language codes appearing in +their module headers. +

+The module structure is rather flat: most modules are direct +parents of Grammar. The idea +is that the implementors can concentrate on one linguistic aspect at a time, or +also distribute the work among several authors. The module Cat +defines the "glue" that ties the aspects together - a type system +to which all the other modules conform, so that e.g. NP means +the same thing in those modules that use NPs and those that +constructs them. +

+ +

Library API modules

+For the user of the library, these modules are the most important ones. +In a typical application, it is enough to open Paradigms and Syntax. +The module Try combines these two, making it possible to experiment +with combinations of syntactic and lexical constructors by using the +cc command in the GF shell. Here are short explanations of each API module: +

Try: the whole resource library for a language (Paradigms, Syntax, + Irreg, and Extra); + produced mechanically as a collection of modules +
Syntax: language-independent categories, syntax functions, and structural words; + produced mechanically as a collection of modules +
Constructors: language-independent syntax functions and structural words; + produced mechanically via functor instantiation +
Paradigms: language-dependent morphological paradigms +

+ + +

Phrase category modules

+The immediate parents of Grammar will be called phrase category modules, +since each of them concentrates on a particular phrase category (nouns, verbs, +adjectives, sentences,...). A phrase category module tells +how to construct phrases in that category. You will find out that +all functions in any of these modules have the same value type (or maybe +one of a small number of different types). Thus we have +

Noun: construction of nouns and noun phrases +
Adjective: construction of adjectival phrases +
Verb: construction of verb phrases +
Adverb: construction of adverbial phrases +
Numeral: construction of cardinal and ordinal numerals +
Sentence: construction of sentences and imperatives +
Question: construction of questions +
Relative: construction of relative clauses +
Conjunction: coordination of phrases +
Phrase: construction of the major units of text and speech +
Text: construction of texts as sequences of phrases +
Idiom: idiomatic expressions such as existentials +

+ + +

Infrastructure modules

+Expressions of each phrase category are constructed in the corresponding +phrase category module. But their use takes mostly place in other modules. +For instance, noun phrases, which are constructed in Noun, are +used as arguments of functions of almost all other phrase category modules. +How can we build all these modules independently of each other? +

+As usual in typeful programming, the only thing you need to know +about an object you use is its type. When writing a linearization rule +for a GF abstract syntax function, the only thing you need to know is +the linearization types of its value and argument categories. To achieve +the division of the resource grammar to several parallel phrase category modules, +what we need is an underlying definition of the linearization types. This +definition is given as the implementation of +

Cat: syntactic categories of the resource grammar +

+ +

+Any resource grammar implementation has first to agree on how to implement +Cat. Luckily enough, even this can be done incrementally: you +can skip the lincat definition of a category and use the default +{s : Str} until you need to change it to something else. In +English, for instance, many categories do have this linearization type. +

+ +

Lexical modules

+What is lexical and what is syntactic is not as clearcut in GF as in +some other grammar formalisms. Logically, lexical means atom, i.e. a +fun with no arguments. Linguistically, one may add to this +that the lin consists of only one token (or of a table whose values +are single tokens). Even in the restricted lexicon included in the resource +API, the latter rule is sometimes violated in some languages. For instance, +Structural.both7and_DConj is an atom, but its linearization is +two words e.g. both - and. +

+Another characterization of lexical is that lexical units can be added +almost ad libitum, and they cannot be defined in terms of already +given rules. The lexical modules of the resource API are thus more like +samples than complete lists. There are two such modules: +

Structural: structural words (determiners, conjunctions,...) +
Lexicon: basic everyday content words (nouns, verbs,...) +

+ +

+The module Structural aims for completeness, and is likely to +be extended in future releases of the resource. The module Lexicon +gives a "random" list of words, which enables testing the syntax. +It also provides a check list for morphology, since those words are likely to include +most morphological patterns of the language. +

+In the case of Lexicon it may come out clearer than anywhere else +in the API that it is impossible to give exact translation equivalents in +different languages on the level of a resource grammar. This is no problem, +since application grammars can use the resource in different ways for +different languages. +

+ +

Language-dependent syntax modules

+In addition to the common API, there is room for language-dependent extensions +of the resource. The top level of each languages looks as follows (with German +as example): +

+    abstract AllGerAbs = Lang, ExtraGerAbs, IrregGerAbs
+

+where ExtraGerAbs is a collection of syntactic structures specific to German, +and IrregGerAbs is a dictionary of irregular words of German +(at the moment, just verbs). Each of these language-specific grammars has +the potential to grow into a full-scale grammar of the language. These grammar +can also be used as libraries, but the possibility of using functors is lost. +

+To give a better overview of language-specific structures, +modules like ExtraGerAbs +are built from a language-independent module ExtraAbs +by restricted inheritance: +

+    abstract ExtraGerAbs = Extra [f,g,...]
+

+Thus any category and function in Extra may be shared by a subset of all +languages. One can see this set-up as a matrix, which tells +what Extra structures +are implemented in what languages. For the common API in Grammar, the matrix +is filled with 1's (everything is implemented in every language). +

+In a minimal resource grammar implementation, the language-dependent +extensions are just empty modules, but it is good to provide them for +the sake of uniformity. +

+ +

The present-tense fragment

+Some lines in the resource library are suffixed with the comment +

+    --# notpresent
+

+which is used by a preprocessor to exclude those lines from +a reduced version of the full resource. This present-tense-only +version is useful for applications in most technical text, since +they reduce the grammar size and compilation time. It can also +be useful to exclude those lines in a first version of resource +implementation. To compile a grammar with present-tense-only, use +

+    make Present
+

+with resource/Makefile. +

+ +

Phases of the work

+ +

Putting up a directory

+Unless you are writing an instance of a parametrized implementation +(Romance or Scandinavian), which will be covered later, the +simplest way is to follow roughly the following procedure. Assume you +are building a grammar for the German language. Here are the first steps, +which we actually followed ourselves when building the German implementation +of resource v. 1.0 at Ubuntu linux. We have slightly modified them to +match resource v. 1.5 and GF v. 3.0. +

Create a sister directory for GF/lib/resource/english, named + german. +

+         cd GF/lib/resource/
+         mkdir german
+         cd german
+

Check out the [ISO 639 3-letter language code + http://www.w3.org/WAI/ER/IG/ert/iso639.htm] + for German: both Ger and Deu are given, and we pick Ger. + (We use the 3-letter codes rather than the more common 2-letter codes, + since they will suffice for many more languages!) +
+
Copy the *Eng.gf files from english german, + and rename them: +
```
+         cp ../english/*Eng.gf .
+         rename 's/Eng/Ger/' *Eng.gf
+
```
+ If you don't have the rename command, you can use a bash script with mv. +

+ +

Change the Eng module references to Ger references + in all files: +
```
+         sed -i 's/English/German/g' *Ger.gf
+         sed -i 's/Eng/Ger/g' *Ger.gf
+
```
+ The first line prevents changing the word English, which appears + here and there in comments, to Gerlish. The sed command syntax + may vary depending on your operating system. +
+
This may of course change unwanted occurrences of the + string Eng - verify this by +
```
+         grep Ger *.gf
+
```
+ But you will have to make lots of manual changes in all files anyway! +
+
Comment out the contents of these files: +
```
+         sed -i 's/^/--/' *Ger.gf
+
```
+ This will give you a set of templates out of which the grammar + will grow as you uncomment and modify the files rule by rule. +
+
In all .gf files, uncomment the module headers and brackets, + leaving the module bodies commented. Unfortunately, there is no + simple way to do this automatically (or to avoid commenting these + lines in the previous step) - but uncommenting the first + and the last lines will actually do the job for many of the files. +
+
Uncomment the contents of the main grammar file: +
```
+         sed -i 's/^--//' LangGer.gf
+
```
+
+
Now you can open the grammar LangGer in GF: +
```
+         gf LangGer.gf
+
```
+ You will get lots of warnings on missing rules, but the grammar will compile. +
+
At all the following steps you will now have a valid, but incomplete + GF grammar. The GF command +
```
+         pg -missing
+
```
+ tells you what exactly is missing. +

+ +

+Here is the module structure of LangGer. It has been simplified by leaving out +the majority of the phrase category modules. Each of them has the same dependencies +as VerbGer, whose complete dependencies are shown as an example. +

+ +

Direction of work

+The real work starts now. There are many ways to proceed, the most obvious ones being +

Top-down: start from the module Phrase and go down to Sentence, then + Verb, Noun, and in the end Lexicon. In this way, you are all the time + building complete phrases, and add them with more content as you proceed. + This approach is not recommended. It is impossible to test the rules if + you have no words to apply the constructions to. +
+
Bottom-up: set as your first goal to implement Lexicon. To this end, you + need to write ParadigmsGer, which in turn needs parts of + MorphoGer and ResGer. + This approach is not recommended. You can get stuck to details of + morphology such as irregular words, and you don't have enough grasp about + the type system to decide what forms to cover in morphology. +

+ +

+The practical working direction is thus a saw-like motion between the morphological +and top-level modules. Here is a possible course of the work that gives enough +test data and enough general view at any point: +

Define Cat.N and the required parameter types in ResGer. As we define +
```
+    lincat N  = {s : Number => Case => Str ; g : Gender} ;
+
```
+we need the parameter types Number, Case, and Gender. The definition +of Number in common/ParamX +works for German, so we +use it and just define Case and Gender in ResGer. +
+
Define some cases of mkN in ParadigmsGer. In this way you can +already implement a huge amount of nouns correctly in LexiconGer. Actually +just adding the worst-case instance of mkN (the one taking the most +arguments) should suffice for every noun - but, +since it is tedious to use, you +might proceed to the next step before returning to morphology and defining the +real work horse, mkN taking two forms and a gender. +
+
While doing this, you may want to test the resource independently. Do this by + starting the GF shell in the resource directory, by the commands +
```
+    > i -retain german/ParadigmsGer
+    > cc -table mkN "Kirche"
+
```
+
+
Proceed to determiners and pronouns in +NounGer (DetCN UsePron DetQuant NumSg DefArt IndefArt UseN) and +StructuralGer (i_Pron this_Quant). You also need some categories and +parameter types. At this point, it is maybe not possible to find out the final +linearization types of CN, NP, Det, and Quant, but at least you should +be able to correctly inflect noun phrases such as every airplane: +
```
+    > i german/LangGer.gf
+    > l -table DetCN every_Det (UseN airplane_N)
+  
+    Nom: jeder Flugzeug
+    Acc: jeden Flugzeug
+    Dat: jedem Flugzeug
+    Gen: jedes Flugzeugs
+
```
+
+
Proceed to verbs: define CatGer.V, ResGer.VForm, and +ParadigmsGer.mkV. You may choose to exclude notpresent +cases at this point. But anyway, you will be able to inflect a good +number of verbs in Lexicon, such as +live_V (mkV "leben"). +
+

Now you can soon form your first sentences: define VP and +Cl in CatGer, VerbGer.UseV, and SentenceGer.PredVP. +Even if you have excluded the tenses, you will be able to produce +

+    > i -preproc=./mkPresent german/LangGer.gf
+    > l -table PredVP (UsePron i_Pron) (UseV live_V)
+  
+    Pres Simul Pos Main: ich lebe
+    Pres Simul Pos Inv:  lebe ich
+    Pres Simul Pos Sub:  ich lebe
+    Pres Simul Neg Main: ich lebe nicht
+    Pres Simul Neg Inv:  lebe ich nicht
+    Pres Simul Neg Sub:  ich nicht lebe
+

+You should also be able to parse: +

+    > p -cat=Cl "ich lebe"
+    PredVP (UsePron i_Pron) (UseV live_V)
+

Transitive verbs +(CatGer.V2 CatGer.VPSlash ParadigmsGer.mkV2 VerbGer.ComplSlash VerbGer.SlashV2a) +are a natural next step, so that you can +produce ich liebe dich ("I love you"). +
+
Adjectives (CatGer.A ParadigmsGer.mkA NounGer.AdjCN AdjectiveGer.PositA) +will force you to think about strong and weak declensions, so that you can +correctly inflect mein neuer Wagen, dieser neue Wagen +("my new car, this new car"). +
+
Once you have implemented the set +(``Noun.DetCN Noun.AdjCN Verb.UseV Verb.ComplSlash Verb.SlashV2a Sentence.PredVP), +you have overcome most of difficulties. You know roughly what parameters +and dependences there are in your language, and you can now proceed very +much in the order you please. +

+ + +

The develop-test cycle

+The following develop-test cycle will +be applied most of the time, both in the first steps described above +and in later steps where you are more on your own. +

Select a phrase category module, e.g. NounGer, and uncomment some + linearization rules (for instance, DetCN, as above). +
+
Write down some German examples of this rule, for instance translations + of "the dog", "the house", "the big house", etc. Write these in all their + different forms (two numbers and four cases). +
+
Think about the categories involved (CN, NP, N, Det) and the + variations they have. Encode this in the lincats of CatGer. + You may have to define some new parameter types in ResGer. +
+
To be able to test the construction, + define some words you need to instantiate it + in LexiconGer. You will also need some regular inflection patterns + inParadigmsGer. +
+
Test by parsing, linearization, + and random generation. In particular, linearization to a table should + be used so that you see all forms produced; the treebank option + preserves the tree +
```
+      > gr -cat=NP -number=20 | l -table -treebank
+
```
+
+
Save some tree-linearization pairs for later regression testing. You can save + a gold standard treebank and use the Unix diff command to compare later + linearizations produced from the same list of trees. If you save the trees + in a file trees, you can do as follows: +
```
+      > rf -file=trees -tree -lines | l -table -treebank | wf -file=treebank
+
```
+
+
A file with trees testing all resource functions is included in the resource, + entitled resource/exx-resource.gft. A treebank can be created from this by + the Unix command +
```
+    % runghc Make.hs test langs=Ger
+
```
+

+ +

+You are likely to run this cycle a few times for each linearization rule +you implement, and some hundreds of times altogether. There are roughly +70 cats and +600 funs in Lang at the moment; 170 of the funs are outside the two +lexicon modules). +

+ +

Auxiliary modules

+These auxuliary resource modules will be written by you. +

ResGer: parameter types and auxiliary operations +(a resource for the resource grammar!) +
ParadigmsGer: complete inflection engine and most important regular paradigms +
MorphoGer: auxiliaries for ParadigmsGer and StructuralGer. This need +not be separate from ResGer. +

+ +

+These modules are language-independent and provided by the existing resource +package. +

ParamX: parameter types used in many languages +
CommonX: implementation of language-uniform categories + such as $Text$ and $Phr$, as well as of + the logical tense, anteriority, and polarity parameters +
Coordination: operations to deal with lists and coordination +
Prelude: general-purpose operations on strings, records, + truth values, etc. +
Predef: general-purpose operations with hard-coded definitions +

+ +

+An important decision is what rules to implement in terms of operations in +ResGer. The golden rule of functional programming says: +

Whenever you find yourself programming by copy and paste, write a function instead!. +

+ +

+This rule suggests that an operation should be created if it is to be +used at least twice. At the same time, a sound principle of vicinity says: +

It should not require too much browsing to understand what a piece of code does. +

+ +

+From these two principles, we have derived the following practice: +

If an operation is needed in two different modules, + it should be created in as an oper in ResGer. An example is mkClause, + used in Sentence, Question, and Relative- +
If an operation is needed twice in the same module, but never + outside, it should be created in the same module. Many examples are + found in Numerals. +
If an operation is needed twice in the same judgement, but never + outside, it should be created by a let definition. +
If an operation is only needed once, it should not be created as an oper, + but rather inlined. However, a let definition may well be in place just + to make the readable. + Most functions in phrase category modules + are implemented in this way. +

+ +

+This discipline is very different from the one followed in early +versions of the library (up to 0.9). We then valued the principle of +abstraction more than vicinity, creating layers of abstraction for +almost everything. This led in practice to the duplication of almost +all code on the lin and oper levels, and made the code +hard to understand and maintain. +

+ +

Morphology and lexicon

+The paradigms needed to implement +LexiconGer are defined in +ParadigmsGer. +This module provides high-level ways to define the linearization of +lexical items, of categories N, A, V and their complement-taking +variants. +

+For ease of use, the Paradigms modules follow a certain +naming convention. Thus they for each lexical category, such as N, +the overloaded functions, such as mkN, with the following cases: +

the worst-case construction of N. Its type signature + has the form +
```
+         mkN : Str -> ... -> Str -> P -> ... -> Q -> N
+
```
+ with as many string and parameter arguments as can ever be needed to + construct an N. +
the most regular cases, with just one string argument: +
```
+         mkN : Str -> N
+
```
+
A language-dependent (small) set of functions to handle mild irregularities + and common exceptions. +

+ +

+For the complement-taking variants, such as V2, we provide +

a case that takes a V and all necessary arguments, such + as case and preposition: +
```
+         mkV2 : V -> Case -> Str -> V2 ;
+
```
+
a case that takes a Str and produces a transitive verb with the direct + object case: +
```
+         mkV2 : Str -> V2 ;
+
```
+
A language-dependent (small) set of functions to handle common special cases, + such as transitive verbs that are not regular: +
```
+         mkV2 : V -> V2 ;
+
```
+

+ +

+The golden rule for the design of paradigms is that +

The user of the library will only need function applications with constants and strings, never any records or tables. +

+ +

+The discipline of data abstraction moreover requires that the user of the resource +is not given access to parameter constructors, but only to constants that denote +them. This gives the resource grammarian the freedom to change the underlying +data representation if needed. It means that the ParadigmsGer module has +to define constants for those parameter types and constructors that +the application grammarian may need to use, e.g. +

+    oper 
+      Case : Type ;
+      nominative, accusative, genitive, dative : Case ;
+

+These constants are defined in terms of parameter types and constructors +in ResGer and MorphoGer, which modules are not +visible to the application grammarian. +

+ +

Lock fields

+An important difference between MorphoGer and +ParadigmsGer is that the former uses "raw" record types +for word classes, whereas the latter used category symbols defined in +CatGer. When these category symbols are used to denote +record types in a resource modules, such as ParadigmsGer, +a lock field is added to the record, so that categories +with the same implementation are not confused with each other. +(This is inspired by the newtype discipline in Haskell.) +For instance, the lincats of adverbs and conjunctions are the same +in CommonX (and therefore in CatGer, which inherits it): +

+    lincat Adv  = {s : Str} ;
+    lincat Conj = {s : Str} ;
+

+But when these category symbols are used to denote their linearization +types in resource module, these definitions are translated to +

+    oper Adv  : Type = {s : Str  ; lock_Adv  : {}} ;
+    oper Conj : Type = {s : Str} ; lock_Conj : {}} ;
+

+In this way, the user of a resource grammar cannot confuse adverbs with +conjunctions. In other words, the lock fields force the type checker +to function as grammaticality checker. +

+When the resource grammar is opened in an application grammar, the +lock fields are never seen (except possibly in type error messages), +and the application grammarian should never write them herself. If she +has to do this, it is a sign that the resource grammar is incomplete, and +the proper way to proceed is to fix the resource grammar. +

+The resource grammarian has to provide the dummy lock field values +in her hidden definitions of constants in Paradigms. For instance, +

+    mkAdv : Str -> Adv ;
+    -- mkAdv s = {s = s ; lock_Adv = <>} ;
+

+ +

Lexicon construction

+The lexicon belonging to LangGer consists of two modules: +

StructuralGer, structural words, built by using both + ParadigmsGer and MorphoGer. +
LexiconGer, content words, built by using ParadigmsGer only. +

+ +

+The reason why MorphoGer has to be used in StructuralGer +is that ParadigmsGer does not contain constructors for closed +word classes such as pronouns and determiners. The reason why we +recommend ParadigmsGer for building LexiconGer is that +the coverage of the paradigms gets thereby tested and that the +use of the paradigms in LexiconGer gives a good set of examples for +those who want to build new lexica. +

+ +

Lexicon extension

+ +

The irregularity lexicon

+It is useful in most languages to provide a separate module of irregular +verbs and other words which are difficult for a lexicographer +to handle. There are usually a limited number of such words - a +few hundred perhaps. Building such a lexicon separately also +makes it less important to cover everything by the +worst-case variants of the paradigms mkV etc. +

+ +

Lexicon extraction from a word list

+You can often find resources such as lists of +irregular verbs on the internet. For instance, the +Irregular German Verb page +previously found in +http://www.iee.et.tu-dresden.de/~wernerr/grammar/verben_dt.html +page gives a list of verbs in the +traditional tabular format, which begins as follows: +

+    backen (du bäckst, er bäckt)                   backte [buk]              gebacken
+    befehlen (du befiehlst, er befiehlt; befiehl!) befahl (beföhle; befähle) befohlen
+    beginnen                                       begann (begönne; begänne) begonnen
+    beißen                                         biß                       gebissen
+

+All you have to do is to write a suitable verb paradigm +

+    irregV : (x1,_,_,_,_,x6 : Str) -> V ;
+

+and a Perl or Python or Haskell script that transforms +the table to +

+    backen_V   = irregV "backen" "bäckt" "back" "backte" "backte" "gebacken" ;
+    befehlen_V = irregV "befehlen" "befiehlt" "befiehl" "befahl" "beföhle" "befohlen" ;
+

+When using ready-made word lists, you should think about +coyright issues. All resource grammar material should +be provided under GNU Lesser General Public License (LGPL). +

+ +

Lexicon extraction from raw text data

+This is a cheap technique to build a lexicon of thousands +of words, if text data is available in digital format. +See the Extract Homepage +homepage for details. +

+ +

Bootstrapping with smart paradigms

+This is another cheap technique, where you need as input a list of words with +part-of-speech marking. You initialize the lexicon by using the one-argument +mkN etc paradigms, and add forms to those words that do not come out right. +This procedure is described in the paper +

+A. Ranta. +How predictable is Finnish morphology? An experiment on lexicon construction. +In J. Nivre, M. Dahllöf and B. Megyesi (eds), +Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein, +University of Uppsala, +2008. +Available from the series homepage +

+ +

Extending the resource grammar API

+Sooner or later it will happen that the resource grammar API +does not suffice for all applications. A common reason is +that it does not include idiomatic expressions in a given language. +The solution then is in the first place to build language-specific +extension modules, like ExtraGer. +

+ +

Using parametrized modules

+ +

Writing an instance of parametrized resource grammar implementation

+Above we have looked at how a resource implementation is built by +the copy and paste method (from English to German), that is, formally +speaking, from scratch. A more elegant solution available for +families of languages such as Romance and Scandinavian is to +use parametrized modules. The advantages are +

theoretical: linguistic generalizations and insights +
practical: maintainability improves with fewer components +

+ +

+Here is a set of +slides +on the topic. +

+ +

Parametrizing a resource grammar implementation

+This is the most demanding form of resource grammar writing. +We do not recommend the method of parametrizing from the +beginning: it is easier to have one language first implemented +in the conventional way and then add another language of the +same family by aprametrization. This means that the copy and +paste method is still used, but at this time the differences +are put into an interface module. +

+ +

Character encoding and transliterations

+This section is relevant for languages using a non-ASCII character set. +

+ +

Coding conventions in GF

+From version 3.0, GF follows a simple encoding convention: +

GF source files may follow any encoding, such as isolatin-1 or UTF-8; + the default is isolatin-1, and UTF8 must be indicated by the judgement +
```
+      flags coding = utf8 ;
+
```
+ in each source module. +
for internal processing, all characters are converted to 16-bit unicode, + as the first step of grammar compilation guided by the coding flag +
as the last step of compilation, all characters are converted to UTF-8 +
thus, GF object files (gfo) and the Portable Grammar Format (pgf) + are in UTF-8 +

+ +

+Most current resource grammars use isolatin-1 in the source, but this does +not affect their use in parallel with grammars written in other encodings. +In fact, a grammar can be put up from modules using different codings. +

+Warning. While string literals may contain any characters, identifiers +must be isolatin-1 letters (or digits, underscores, or dashes). This has to +do with the restrictions of the lexer tool that is used. +

+ +

Transliterations

+While UTF-8 is well supported by most web browsers, its use in terminals and +text editors may cause disappointment. Many grammarians therefore prefer to +use ASCII transliterations. GF 3.0beta2 provides the following built-in +transliterations: +

Arabic +
Devanagari (Hindi) +
Thai +

+ +

+New transliterations can be defined in the GF source file +GF/Text/Transliterations.hs. +This file also gives instructions on how new ones are added. +

+ + + + diff --git a/deprecated/Resource-HOWTO.txt b/deprecated/Resource-HOWTO.txt new file mode 100644 index 000000000..8e50974a7 --- /dev/null +++ b/deprecated/Resource-HOWTO.txt @@ -0,0 +1,827 @@ +Resource grammar writing HOWTO +Author: Aarne Ranta +Last update: %%date(%c) + +% NOTE: this is a txt2tags file. +% Create an html file from this file using: +% txt2tags --toc -thtml Resource-HOWTO.txt + +%!target:html + +**History** + +September 2008: updated for Version 1.5. + +October 2007: updated for Version 1.2. + +January 2006: first version. + + +The purpose of this document is to tell how to implement the GF +resource grammar API for a new language. We will //not// cover how +to use the resource grammar, nor how to change the API. But we +will give some hints how to extend the API. + +A manual for using the resource grammar is found in + +[``www.cs.chalmers.se/Cs/Research/Language-technology/GF/lib/resource/doc/synopsis.html`` ../lib/resource/doc/synopsis.html]. + +A tutorial on GF, also introducing the idea of resource grammars, is found in + +[``www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-tutorial.html`` ./gf-tutorial.html]. + +This document concerns the API v. 1.5, while the current stable release is 1.4. +You can find the code for the stable release in + +[``www.cs.chalmers.se/Cs/Research/Language-technology/GF/lib/resource/`` ../lib/resource] + +and the next release in + +[``www.cs.chalmers.se/Cs/Research/Language-technology/GF/next-lib/src/`` ../next-lib/src] + +It is recommended to build new grammars to match the next release. + + + + +==The resource grammar structure== + +The library is divided into a bunch of modules, whose dependencies +are given in the following figure. + +[Syntax.png] + +Modules of different kinds are distinguished as follows: +- solid contours: module seen by end users +- dashed contours: internal module +- ellipse: abstract/concrete pair of modules +- rectangle: resource or instance +- diamond: interface + + +Put in another way: +- solid rectangles and diamonds: user-accessible library API +- solid ellipses: user-accessible top-level grammar for parsing and linearization +- dashed contours: not visible to users + + +The dashed ellipses form the main parts of the implementation, on which the resource +grammar programmer has to work with. She also has to work on the ``Paradigms`` +module. The rest of the modules can be produced mechanically from corresponding +modules for other languages, by just changing the language codes appearing in +their module headers. + +The module structure is rather flat: most modules are direct +parents of ``Grammar``. The idea +is that the implementors can concentrate on one linguistic aspect at a time, or +also distribute the work among several authors. The module ``Cat`` +defines the "glue" that ties the aspects together - a type system +to which all the other modules conform, so that e.g. ``NP`` means +the same thing in those modules that use ``NP``s and those that +constructs them. + + +===Library API modules=== + +For the user of the library, these modules are the most important ones. +In a typical application, it is enough to open ``Paradigms`` and ``Syntax``. +The module ``Try`` combines these two, making it possible to experiment +with combinations of syntactic and lexical constructors by using the +``cc`` command in the GF shell. Here are short explanations of each API module: +- ``Try``: the whole resource library for a language (``Paradigms``, ``Syntax``, + ``Irreg``, and ``Extra``); + produced mechanically as a collection of modules +- ``Syntax``: language-independent categories, syntax functions, and structural words; + produced mechanically as a collection of modules +- ``Constructors``: language-independent syntax functions and structural words; + produced mechanically via functor instantiation +- ``Paradigms``: language-dependent morphological paradigms + + + + + +===Phrase category modules=== + +The immediate parents of ``Grammar`` will be called **phrase category modules**, +since each of them concentrates on a particular phrase category (nouns, verbs, +adjectives, sentences,...). A phrase category module tells +//how to construct phrases in that category//. You will find out that +all functions in any of these modules have the same value type (or maybe +one of a small number of different types). Thus we have +- ``Noun``: construction of nouns and noun phrases +- ``Adjective``: construction of adjectival phrases +- ``Verb``: construction of verb phrases +- ``Adverb``: construction of adverbial phrases +- ``Numeral``: construction of cardinal and ordinal numerals +- ``Sentence``: construction of sentences and imperatives +- ``Question``: construction of questions +- ``Relative``: construction of relative clauses +- ``Conjunction``: coordination of phrases +- ``Phrase``: construction of the major units of text and speech +- ``Text``: construction of texts as sequences of phrases +- ``Idiom``: idiomatic expressions such as existentials + + + + +===Infrastructure modules=== + +Expressions of each phrase category are constructed in the corresponding +phrase category module. But their //use// takes mostly place in other modules. +For instance, noun phrases, which are constructed in ``Noun``, are +used as arguments of functions of almost all other phrase category modules. +How can we build all these modules independently of each other? + +As usual in typeful programming, the //only// thing you need to know +about an object you use is its type. When writing a linearization rule +for a GF abstract syntax function, the only thing you need to know is +the linearization types of its value and argument categories. To achieve +the division of the resource grammar to several parallel phrase category modules, +what we need is an underlying definition of the linearization types. This +definition is given as the implementation of +- ``Cat``: syntactic categories of the resource grammar + + +Any resource grammar implementation has first to agree on how to implement +``Cat``. Luckily enough, even this can be done incrementally: you +can skip the ``lincat`` definition of a category and use the default +``{s : Str}`` until you need to change it to something else. In +English, for instance, many categories do have this linearization type. + + + +===Lexical modules=== + +What is lexical and what is syntactic is not as clearcut in GF as in +some other grammar formalisms. Logically, lexical means atom, i.e. a +``fun`` with no arguments. Linguistically, one may add to this +that the ``lin`` consists of only one token (or of a table whose values +are single tokens). Even in the restricted lexicon included in the resource +API, the latter rule is sometimes violated in some languages. For instance, +``Structural.both7and_DConj`` is an atom, but its linearization is +two words e.g. //both - and//. + +Another characterization of lexical is that lexical units can be added +almost //ad libitum//, and they cannot be defined in terms of already +given rules. The lexical modules of the resource API are thus more like +samples than complete lists. There are two such modules: +- ``Structural``: structural words (determiners, conjunctions,...) +- ``Lexicon``: basic everyday content words (nouns, verbs,...) + + +The module ``Structural`` aims for completeness, and is likely to +be extended in future releases of the resource. The module ``Lexicon`` +gives a "random" list of words, which enables testing the syntax. +It also provides a check list for morphology, since those words are likely to include +most morphological patterns of the language. + +In the case of ``Lexicon`` it may come out clearer than anywhere else +in the API that it is impossible to give exact translation equivalents in +different languages on the level of a resource grammar. This is no problem, +since application grammars can use the resource in different ways for +different languages. + + +==Language-dependent syntax modules== + +In addition to the common API, there is room for language-dependent extensions +of the resource. The top level of each languages looks as follows (with German +as example): +``` + abstract AllGerAbs = Lang, ExtraGerAbs, IrregGerAbs +``` +where ``ExtraGerAbs`` is a collection of syntactic structures specific to German, +and ``IrregGerAbs`` is a dictionary of irregular words of German +(at the moment, just verbs). Each of these language-specific grammars has +the potential to grow into a full-scale grammar of the language. These grammar +can also be used as libraries, but the possibility of using functors is lost. + +To give a better overview of language-specific structures, +modules like ``ExtraGerAbs`` +are built from a language-independent module ``ExtraAbs`` +by restricted inheritance: +``` + abstract ExtraGerAbs = Extra [f,g,...] +``` +Thus any category and function in ``Extra`` may be shared by a subset of all +languages. One can see this set-up as a matrix, which tells +what ``Extra`` structures +are implemented in what languages. For the common API in ``Grammar``, the matrix +is filled with 1's (everything is implemented in every language). + +In a minimal resource grammar implementation, the language-dependent +extensions are just empty modules, but it is good to provide them for +the sake of uniformity. + + + +===The present-tense fragment=== + +Some lines in the resource library are suffixed with the comment +``` + --# notpresent +``` +which is used by a preprocessor to exclude those lines from +a reduced version of the full resource. This present-tense-only +version is useful for applications in most technical text, since +they reduce the grammar size and compilation time. It can also +be useful to exclude those lines in a first version of resource +implementation. To compile a grammar with present-tense-only, use +``` + make Present +``` +with ``resource/Makefile``. + + + +==Phases of the work== + +===Putting up a directory=== + +Unless you are writing an instance of a parametrized implementation +(Romance or Scandinavian), which will be covered later, the +simplest way is to follow roughly the following procedure. Assume you +are building a grammar for the German language. Here are the first steps, +which we actually followed ourselves when building the German implementation +of resource v. 1.0 at Ubuntu linux. We have slightly modified them to +match resource v. 1.5 and GF v. 3.0. + ++ Create a sister directory for ``GF/lib/resource/english``, named + ``german``. +``` + cd GF/lib/resource/ + mkdir german + cd german +``` + ++ Check out the [ISO 639 3-letter language code + http://www.w3.org/WAI/ER/IG/ert/iso639.htm] + for German: both ``Ger`` and ``Deu`` are given, and we pick ``Ger``. + (We use the 3-letter codes rather than the more common 2-letter codes, + since they will suffice for many more languages!) + ++ Copy the ``*Eng.gf`` files from ``english`` ``german``, + and rename them: +``` + cp ../english/*Eng.gf . + rename 's/Eng/Ger/' *Eng.gf +``` + If you don't have the ``rename`` command, you can use a bash script with ``mv``. + + ++ Change the ``Eng`` module references to ``Ger`` references + in all files: +``` + sed -i 's/English/German/g' *Ger.gf + sed -i 's/Eng/Ger/g' *Ger.gf +``` + The first line prevents changing the word ``English``, which appears + here and there in comments, to ``Gerlish``. The ``sed`` command syntax + may vary depending on your operating system. + ++ This may of course change unwanted occurrences of the + string ``Eng`` - verify this by +``` + grep Ger *.gf +``` + But you will have to make lots of manual changes in all files anyway! + ++ Comment out the contents of these files: +``` + sed -i 's/^/--/' *Ger.gf +``` + This will give you a set of templates out of which the grammar + will grow as you uncomment and modify the files rule by rule. + ++ In all ``.gf`` files, uncomment the module headers and brackets, + leaving the module bodies commented. Unfortunately, there is no + simple way to do this automatically (or to avoid commenting these + lines in the previous step) - but uncommenting the first + and the last lines will actually do the job for many of the files. + ++ Uncomment the contents of the main grammar file: +``` + sed -i 's/^--//' LangGer.gf +``` + ++ Now you can open the grammar ``LangGer`` in GF: +``` + gf LangGer.gf +``` + You will get lots of warnings on missing rules, but the grammar will compile. + ++ At all the following steps you will now have a valid, but incomplete + GF grammar. The GF command +``` + pg -missing +``` + tells you what exactly is missing. + + +Here is the module structure of ``LangGer``. It has been simplified by leaving out +the majority of the phrase category modules. Each of them has the same dependencies +as ``VerbGer``, whose complete dependencies are shown as an example. + +[German.png] + + +===Direction of work=== + +The real work starts now. There are many ways to proceed, the most obvious ones being +- Top-down: start from the module ``Phrase`` and go down to ``Sentence``, then + ``Verb``, ``Noun``, and in the end ``Lexicon``. In this way, you are all the time + building complete phrases, and add them with more content as you proceed. + **This approach is not recommended**. It is impossible to test the rules if + you have no words to apply the constructions to. + +- Bottom-up: set as your first goal to implement ``Lexicon``. To this end, you + need to write ``ParadigmsGer``, which in turn needs parts of + ``MorphoGer`` and ``ResGer``. + **This approach is not recommended**. You can get stuck to details of + morphology such as irregular words, and you don't have enough grasp about + the type system to decide what forms to cover in morphology. + + +The practical working direction is thus a saw-like motion between the morphological +and top-level modules. Here is a possible course of the work that gives enough +test data and enough general view at any point: ++ Define ``Cat.N`` and the required parameter types in ``ResGer``. As we define +``` + lincat N = {s : Number => Case => Str ; g : Gender} ; +``` +we need the parameter types ``Number``, ``Case``, and ``Gender``. The definition +of ``Number`` in [``common/ParamX`` ../lib/resource/common/ParamX.gf] +works for German, so we +use it and just define ``Case`` and ``Gender`` in ``ResGer``. + ++ Define some cases of ``mkN`` in ``ParadigmsGer``. In this way you can +already implement a huge amount of nouns correctly in ``LexiconGer``. Actually +just adding the worst-case instance of ``mkN`` (the one taking the most +arguments) should suffice for every noun - but, +since it is tedious to use, you +might proceed to the next step before returning to morphology and defining the +real work horse, ``mkN`` taking two forms and a gender. + ++ While doing this, you may want to test the resource independently. Do this by + starting the GF shell in the ``resource`` directory, by the commands +``` + > i -retain german/ParadigmsGer + > cc -table mkN "Kirche" +``` + ++ Proceed to determiners and pronouns in +``NounGer`` (``DetCN UsePron DetQuant NumSg DefArt IndefArt UseN``) and +``StructuralGer`` (``i_Pron this_Quant``). You also need some categories and +parameter types. At this point, it is maybe not possible to find out the final +linearization types of ``CN``, ``NP``, ``Det``, and ``Quant``, but at least you should +be able to correctly inflect noun phrases such as //every airplane//: +``` + > i german/LangGer.gf + > l -table DetCN every_Det (UseN airplane_N) + + Nom: jeder Flugzeug + Acc: jeden Flugzeug + Dat: jedem Flugzeug + Gen: jedes Flugzeugs +``` + ++ Proceed to verbs: define ``CatGer.V``, ``ResGer.VForm``, and +``ParadigmsGer.mkV``. You may choose to exclude ``notpresent`` +cases at this point. But anyway, you will be able to inflect a good +number of verbs in ``Lexicon``, such as +``live_V`` (``mkV "leben"``). + ++ Now you can soon form your first sentences: define ``VP`` and +``Cl`` in ``CatGer``, ``VerbGer.UseV``, and ``SentenceGer.PredVP``. +Even if you have excluded the tenses, you will be able to produce +``` + > i -preproc=./mkPresent german/LangGer.gf + > l -table PredVP (UsePron i_Pron) (UseV live_V) + + Pres Simul Pos Main: ich lebe + Pres Simul Pos Inv: lebe ich + Pres Simul Pos Sub: ich lebe + Pres Simul Neg Main: ich lebe nicht + Pres Simul Neg Inv: lebe ich nicht + Pres Simul Neg Sub: ich nicht lebe +``` +You should also be able to parse: +``` + > p -cat=Cl "ich lebe" + PredVP (UsePron i_Pron) (UseV live_V) +``` + ++ Transitive verbs +(``CatGer.V2 CatGer.VPSlash ParadigmsGer.mkV2 VerbGer.ComplSlash VerbGer.SlashV2a``) +are a natural next step, so that you can +produce ``ich liebe dich`` ("I love you"). + ++ Adjectives (``CatGer.A ParadigmsGer.mkA NounGer.AdjCN AdjectiveGer.PositA``) +will force you to think about strong and weak declensions, so that you can +correctly inflect //mein neuer Wagen, dieser neue Wagen// +("my new car, this new car"). + ++ Once you have implemented the set +(``Noun.DetCN Noun.AdjCN Verb.UseV Verb.ComplSlash Verb.SlashV2a Sentence.PredVP), +you have overcome most of difficulties. You know roughly what parameters +and dependences there are in your language, and you can now proceed very +much in the order you please. + + + +===The develop-test cycle=== + +The following develop-test cycle will +be applied most of the time, both in the first steps described above +and in later steps where you are more on your own. + ++ Select a phrase category module, e.g. ``NounGer``, and uncomment some + linearization rules (for instance, ``DetCN``, as above). + ++ Write down some German examples of this rule, for instance translations + of "the dog", "the house", "the big house", etc. Write these in all their + different forms (two numbers and four cases). + ++ Think about the categories involved (``CN, NP, N, Det``) and the + variations they have. Encode this in the lincats of ``CatGer``. + You may have to define some new parameter types in ``ResGer``. + ++ To be able to test the construction, + define some words you need to instantiate it + in ``LexiconGer``. You will also need some regular inflection patterns + in``ParadigmsGer``. + ++ Test by parsing, linearization, + and random generation. In particular, linearization to a table should + be used so that you see all forms produced; the ``treebank`` option + preserves the tree +``` + > gr -cat=NP -number=20 | l -table -treebank +``` + ++ Save some tree-linearization pairs for later regression testing. You can save + a gold standard treebank and use the Unix ``diff`` command to compare later + linearizations produced from the same list of trees. If you save the trees + in a file ``trees``, you can do as follows: +``` + > rf -file=trees -tree -lines | l -table -treebank | wf -file=treebank +``` + ++ A file with trees testing all resource functions is included in the resource, + entitled ``resource/exx-resource.gft``. A treebank can be created from this by + the Unix command +``` + % runghc Make.hs test langs=Ger +``` + + + +You are likely to run this cycle a few times for each linearization rule +you implement, and some hundreds of times altogether. There are roughly +70 ``cat``s and +600 ``funs`` in ``Lang`` at the moment; 170 of the ``funs`` are outside the two +lexicon modules). + + +===Auxiliary modules=== + +These auxuliary ``resource`` modules will be written by you. + +- ``ResGer``: parameter types and auxiliary operations +(a resource for the resource grammar!) +- ``ParadigmsGer``: complete inflection engine and most important regular paradigms +- ``MorphoGer``: auxiliaries for ``ParadigmsGer`` and ``StructuralGer``. This need +not be separate from ``ResGer``. + + +These modules are language-independent and provided by the existing resource +package. + +- ``ParamX``: parameter types used in many languages +- ``CommonX``: implementation of language-uniform categories + such as $Text$ and $Phr$, as well as of + the logical tense, anteriority, and polarity parameters +- ``Coordination``: operations to deal with lists and coordination +- ``Prelude``: general-purpose operations on strings, records, + truth values, etc. +- ``Predef``: general-purpose operations with hard-coded definitions + + +An important decision is what rules to implement in terms of operations in +``ResGer``. The **golden rule of functional programming** says: +- //Whenever you find yourself programming by copy and paste, write a function instead!//. + + +This rule suggests that an operation should be created if it is to be +used at least twice. At the same time, a sound principle of **vicinity** says: +- //It should not require too much browsing to understand what a piece of code does.// + + +From these two principles, we have derived the following practice: +- If an operation is needed //in two different modules//, + it should be created in as an ``oper`` in ``ResGer``. An example is ``mkClause``, + used in ``Sentence``, ``Question``, and ``Relative``- +- If an operation is needed //twice in the same module//, but never + outside, it should be created in the same module. Many examples are + found in ``Numerals``. +- If an operation is needed //twice in the same judgement//, but never + outside, it should be created by a ``let`` definition. +- If an operation is only needed once, it should not be created as an ``oper``, + but rather inlined. However, a ``let`` definition may well be in place just + to make the readable. + Most functions in phrase category modules + are implemented in this way. + + +This discipline is very different from the one followed in early +versions of the library (up to 0.9). We then valued the principle of +abstraction more than vicinity, creating layers of abstraction for +almost everything. This led in practice to the duplication of almost +all code on the ``lin`` and ``oper`` levels, and made the code +hard to understand and maintain. + + + +===Morphology and lexicon=== + +The paradigms needed to implement +``LexiconGer`` are defined in +``ParadigmsGer``. +This module provides high-level ways to define the linearization of +lexical items, of categories ``N, A, V`` and their complement-taking +variants. + +For ease of use, the ``Paradigms`` modules follow a certain +naming convention. Thus they for each lexical category, such as ``N``, +the overloaded functions, such as ``mkN``, with the following cases: + +- the worst-case construction of ``N``. Its type signature + has the form +``` + mkN : Str -> ... -> Str -> P -> ... -> Q -> N +``` + with as many string and parameter arguments as can ever be needed to + construct an ``N``. +- the most regular cases, with just one string argument: +``` + mkN : Str -> N +``` +- A language-dependent (small) set of functions to handle mild irregularities + and common exceptions. + + +For the complement-taking variants, such as ``V2``, we provide +- a case that takes a ``V`` and all necessary arguments, such + as case and preposition: +``` + mkV2 : V -> Case -> Str -> V2 ; +``` +- a case that takes a ``Str`` and produces a transitive verb with the direct + object case: +``` + mkV2 : Str -> V2 ; +``` +- A language-dependent (small) set of functions to handle common special cases, + such as transitive verbs that are not regular: +``` + mkV2 : V -> V2 ; +``` + + +The golden rule for the design of paradigms is that +- //The user of the library will only need function applications with constants and strings, never any records or tables.// + + +The discipline of data abstraction moreover requires that the user of the resource +is not given access to parameter constructors, but only to constants that denote +them. This gives the resource grammarian the freedom to change the underlying +data representation if needed. It means that the ``ParadigmsGer`` module has +to define constants for those parameter types and constructors that +the application grammarian may need to use, e.g. +``` + oper + Case : Type ; + nominative, accusative, genitive, dative : Case ; +``` +These constants are defined in terms of parameter types and constructors +in ``ResGer`` and ``MorphoGer``, which modules are not +visible to the application grammarian. + + +===Lock fields=== + +An important difference between ``MorphoGer`` and +``ParadigmsGer`` is that the former uses "raw" record types +for word classes, whereas the latter used category symbols defined in +``CatGer``. When these category symbols are used to denote +record types in a resource modules, such as ``ParadigmsGer``, +a **lock field** is added to the record, so that categories +with the same implementation are not confused with each other. +(This is inspired by the ``newtype`` discipline in Haskell.) +For instance, the lincats of adverbs and conjunctions are the same +in ``CommonX`` (and therefore in ``CatGer``, which inherits it): +``` + lincat Adv = {s : Str} ; + lincat Conj = {s : Str} ; +``` +But when these category symbols are used to denote their linearization +types in resource module, these definitions are translated to +``` + oper Adv : Type = {s : Str ; lock_Adv : {}} ; + oper Conj : Type = {s : Str} ; lock_Conj : {}} ; +``` +In this way, the user of a resource grammar cannot confuse adverbs with +conjunctions. In other words, the lock fields force the type checker +to function as grammaticality checker. + +When the resource grammar is ``open``ed in an application grammar, the +lock fields are never seen (except possibly in type error messages), +and the application grammarian should never write them herself. If she +has to do this, it is a sign that the resource grammar is incomplete, and +the proper way to proceed is to fix the resource grammar. + +The resource grammarian has to provide the dummy lock field values +in her hidden definitions of constants in ``Paradigms``. For instance, +``` + mkAdv : Str -> Adv ; + -- mkAdv s = {s = s ; lock_Adv = <>} ; +``` + + +===Lexicon construction=== + +The lexicon belonging to ``LangGer`` consists of two modules: +- ``StructuralGer``, structural words, built by using both + ``ParadigmsGer`` and ``MorphoGer``. +- ``LexiconGer``, content words, built by using ``ParadigmsGer`` only. + + +The reason why ``MorphoGer`` has to be used in ``StructuralGer`` +is that ``ParadigmsGer`` does not contain constructors for closed +word classes such as pronouns and determiners. The reason why we +recommend ``ParadigmsGer`` for building ``LexiconGer`` is that +the coverage of the paradigms gets thereby tested and that the +use of the paradigms in ``LexiconGer`` gives a good set of examples for +those who want to build new lexica. + + + + + +==Lexicon extension== + +===The irregularity lexicon=== + +It is useful in most languages to provide a separate module of irregular +verbs and other words which are difficult for a lexicographer +to handle. There are usually a limited number of such words - a +few hundred perhaps. Building such a lexicon separately also +makes it less important to cover //everything// by the +worst-case variants of the paradigms ``mkV`` etc. + + + +===Lexicon extraction from a word list=== + +You can often find resources such as lists of +irregular verbs on the internet. For instance, the +Irregular German Verb page +previously found in +``http://www.iee.et.tu-dresden.de/~wernerr/grammar/verben_dt.html`` +page gives a list of verbs in the +traditional tabular format, which begins as follows: +``` + backen (du bäckst, er bäckt) backte [buk] gebacken + befehlen (du befiehlst, er befiehlt; befiehl!) befahl (beföhle; befähle) befohlen + beginnen begann (begönne; begänne) begonnen + beißen biß gebissen +``` +All you have to do is to write a suitable verb paradigm +``` + irregV : (x1,_,_,_,_,x6 : Str) -> V ; +``` +and a Perl or Python or Haskell script that transforms +the table to +``` + backen_V = irregV "backen" "bäckt" "back" "backte" "backte" "gebacken" ; + befehlen_V = irregV "befehlen" "befiehlt" "befiehl" "befahl" "beföhle" "befohlen" ; +``` + +When using ready-made word lists, you should think about +coyright issues. All resource grammar material should +be provided under GNU Lesser General Public License (LGPL). + + + +===Lexicon extraction from raw text data=== + +This is a cheap technique to build a lexicon of thousands +of words, if text data is available in digital format. +See the [Extract Homepage http://www.cs.chalmers.se/~markus/extract/] +homepage for details. + + +===Bootstrapping with smart paradigms=== + +This is another cheap technique, where you need as input a list of words with +part-of-speech marking. You initialize the lexicon by using the one-argument +``mkN`` etc paradigms, and add forms to those words that do not come out right. +This procedure is described in the paper + +A. Ranta. +How predictable is Finnish morphology? An experiment on lexicon construction. +In J. Nivre, M. Dahllöf and B. Megyesi (eds), +//Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein//, +University of Uppsala, +2008. +Available from the [series homepage http://publications.uu.se/abstract.xsql?dbid=8933] + + + + +==Extending the resource grammar API== + +Sooner or later it will happen that the resource grammar API +does not suffice for all applications. A common reason is +that it does not include idiomatic expressions in a given language. +The solution then is in the first place to build language-specific +extension modules, like ``ExtraGer``. + +==Using parametrized modules== + +===Writing an instance of parametrized resource grammar implementation=== + +Above we have looked at how a resource implementation is built by +the copy and paste method (from English to German), that is, formally +speaking, from scratch. A more elegant solution available for +families of languages such as Romance and Scandinavian is to +use parametrized modules. The advantages are +- theoretical: linguistic generalizations and insights +- practical: maintainability improves with fewer components + + +Here is a set of +[slides http://www.cs.chalmers.se/~aarne/geocal2006.pdf] +on the topic. + + +===Parametrizing a resource grammar implementation=== + +This is the most demanding form of resource grammar writing. +We do //not// recommend the method of parametrizing from the +beginning: it is easier to have one language first implemented +in the conventional way and then add another language of the +same family by aprametrization. This means that the copy and +paste method is still used, but at this time the differences +are put into an ``interface`` module. + + +==Character encoding and transliterations== + +This section is relevant for languages using a non-ASCII character set. + +==Coding conventions in GF== + +From version 3.0, GF follows a simple encoding convention: +- GF source files may follow any encoding, such as isolatin-1 or UTF-8; + the default is isolatin-1, and UTF8 must be indicated by the judgement +``` + flags coding = utf8 ; +``` + in each source module. +- for internal processing, all characters are converted to 16-bit unicode, + as the first step of grammar compilation guided by the ``coding`` flag +- as the last step of compilation, all characters are converted to UTF-8 +- thus, GF object files (``gfo``) and the Portable Grammar Format (``pgf``) + are in UTF-8 + + +Most current resource grammars use isolatin-1 in the source, but this does +not affect their use in parallel with grammars written in other encodings. +In fact, a grammar can be put up from modules using different codings. + +**Warning**. While string literals may contain any characters, identifiers +must be isolatin-1 letters (or digits, underscores, or dashes). This has to +do with the restrictions of the lexer tool that is used. + + +==Transliterations== + +While UTF-8 is well supported by most web browsers, its use in terminals and +text editors may cause disappointment. Many grammarians therefore prefer to +use ASCII transliterations. GF 3.0beta2 provides the following built-in +transliterations: +- Arabic +- Devanagari (Hindi) +- Thai + + +New transliterations can be defined in the GF source file +[``GF/Text/Transliterations.hs`` ../src/GF/Text/Transliterations.hs]. +This file also gives instructions on how new ones are added. + + + + + diff --git a/deprecated/Syntax.png b/deprecated/Syntax.png new file mode 100644 index 000000000..f36c098f6 Binary files /dev/null and b/deprecated/Syntax.png differ diff --git a/deprecated/doc/2341.html b/deprecated/doc/2341.html new file mode 100644 index 000000000..ff3e9644d --- /dev/null +++ b/deprecated/doc/2341.html @@ -0,0 +1,259 @@ + + + +af_tunni : lámma kún síddi? boqól afartón i ków + +

+albanian : dy mijë tre qind e dyzet e një + +

+amharic : ሁለት ሺህ ሦስት መቶ ኣርባ ኣንድ + +

+arabic_classical : الفان و ثلاث مائة و واحد و أربعون + +

+arabic_modern : ﺍﻟﻔﻴﻦ ﻭ ﺛﻼﺛﻤﺎﺋﺔ ﻭ ﻭﺍﺣﺪ ﻭ ﺃﺭﺑﻌﻴﻦ + +

+basque : bi mila ta hirurehun berrogei ta bat + +

+bearlake_slave : nákee lamíl tai lak'o, óno, di,i, honéno, ?ó, l-ée + +

+bulgarian : две жиляди триста четирисет и едно + +

+catalan : dos mil tres-cents quaranta - u + +

+chinese : è´° ä» é¶ å ä½° è æ¾ å£¹ + +

+croatian : dva hiljade tri stotine četrdeset i jedan + +

+czech : dva tisíce tr^i sta čtyr^icet jeden + +

+dagur : hoire miange guarebe jau duci neke + +

+danish : to tusind og tre hundrede og en og fyrre + +

+decimal : 2341 + +

+dutch : twee duizend drie honderd een en veertig + +

+english : two thousand three hundred and forty - one + +

+finnish : kaksi tuhatta kolme sataa neljä kymmentä yksi + +

+french : deux mille trois cent quarante et un + +

+french_swiss : deux mille trois cent quarante et un + +

+fulfulde : ujine d.id.i temed.d.e tati e chappand.e nai e go'o + +

+geez : ዕሽራ ወ ሠላስቱ ምእት አርብዓ ወ አሐዱ + +

+german : zwei tausend drei hundert ein und vierzig + +

+greek_classical : δισχίλιοι τριακόσιοι τετταράκοντα εἵς + +

+greek_modern : δύο χιλιάδες τριακόσια σαράντα ένα + +

+guahibo : aniha sunu akueya sia yana bae kae + +

+guarani : moko~i ma mpohapy sa~ irundy kua~ petei~ + +

+hebrew_biblical : אלפים ו שלש מאות ו ארבעים ו אחד + +

+hindi : दो हज़ार तीन सौ एक्तालीस + +

+hungarian : két ezer három száz negyven egy + +

+icelandic : tvö Þúsund Þrjú hundrað fjörutíu og einn + +

+irish : dhá mhíle trí chead dhá fhichead a haon + +

+italian : due mila tre cento quaranta uno + +

+japanese : にせんさんびゃくよんぢゅういち + +

+kabardian : m&yn&yt' s'a&ys' p'L-'&s'ra z&ra + +

+kambera : dua riu tailu ngahu patu kambulu hau + +

+kawaiisu : N +

+khmer : bīra bā'na pī raya sē sipa mwya + +

+khowar : joo hazâr troi shọr oché joo bîsher î + +

+kodagu : i:ra:yrat mu:nu:yt.a na:padï + +

+kolyma_yukaghir : N +

+kulung : ni habau su chhum lik i + +

+kwami : dùbúk póllów dálmágí kúnún kán kúu pòD^òw kán múndí + +

+kwaza : N +

+lalo : `n. t'w sa há i tjhí tjh`& + +

+lamani : di hajaar do se caaLise par ek + +

+latvian : divtu^kstoš trīssimt četrdesmit viens + +

+lithuanian : dù tú:kstanc^iu, try:s s^imtai~ ke:turiasdes^imt víenas + +

+lotuxo : tausand ârrexai ikO EssIxa xunixoi ikO atOmwana aNwan x' âbotye + +

+maale : lam?ó $íya haitsó s'ééta ?oydí-támmi pétte + +

+malay : dua ribu tiga ratus empat puluh satu + +

+maltese : elfejn tliet mija u wieh-ed u erbgh-in + +

+mapuche : epu warangka külá pataka meli mari kiñe + +

+margi : dúbú s`&d>àN ghàrú mák`&r agá fód>ú kùmì gà s'&r pátlú* + +

+maybrat : N +

+miya : d'&bu ts`&r '`&náa d>àriy kìdi '`&náa díb>i f`&d>& bèh&n wut'& + +

+mongolian : qoyar mingGan Gurban ĵa'un döčin nigän + +

+nenets : side juonar n-ahar jur t-êt ju' ~ob + +

+norwegian_book : to tusen og tre hundre og førti et + +

+old_church_slavonic : дъвѣ тысѭшти триѥ съта четыре десѧте и ѥдинъ + +

+oromo : kuma lama fi dhibba sadii fi afurtamii tokko + +

+pashto : دوه زره دري سوه او يو څلوۍښت + +

+polish : dwa tysiace trzysta czterdziesci jeden + +

+portuguese : dois mil trezentos quarenta e um + +

+quechua : iskay warank'a kinsa pachak tawa chunka jukniyuq + +

+romanian : două mii trei sute patruzeci şi unu + +

+russian : две тысячи триста сорок один + +

+sango : ngbangbu bale óse na ndó ní ngbangbu otá na ndó ní bale osió na ndó ní ÓkO + +

+sanskrit : त्रि शतान्य एकचत्वारिंशच च द्वे सहस्रे + +

+slovak : dva tisic tri sto styridsat jedna + +

+sorani : دۇ ههزار سىسهد ځل و يهك + +

+spanish : dos mil trescientos cuarenta y uno + +

+stieng : baar ban pê riêng puôn jo't muôi + +

+swahili : elfu mbili mia tatu arobaini na moja + +

+swedish : två tusen tre hundra fyrtio ett + +

+tamil : இரணௌடௌ ஆயாரதௌதீ மீனௌ ந஽ரீ ந஽ரௌ பதௌ ஓனௌரீ + +

+tampere : kaks tuhatta kolme sataa nel kyt yks + +

+tibetan : t̆ong ṭ'a' n̆yī d́ang sumğya d́ang z̆hyib chu źhye chi' + +

+totonac : maa t~u3 mil lii ~a tuhun pus^um tun + +

+tuda_daza : dubu cu sao kidra ago.zo. sao mOrta tozo sao tro + +

+tukang_besi : dua riwu tolu hatu hato hulu sa'asa + +

+turkish : iki bin üç yüz kırk bir + +

+votic : kahsi tuhatta keVmsata: nelläts^ümmet ühsi + +

+welsh : dau fil tri chan un a deugain + +

+yasin_burushaski : altó hazár iskí tha altó-áltar hek + +

+zaiwa : i55 hing55 sum11 syo31 mi11 cue31 ra11 + + + + diff --git a/deprecated/doc/DocGF.pdf b/deprecated/doc/DocGF.pdf new file mode 100644 index 000000000..27e4262db Binary files /dev/null and b/deprecated/doc/DocGF.pdf differ diff --git a/deprecated/doc/DocGF.tex b/deprecated/doc/DocGF.tex new file mode 100644 index 000000000..6388d3548 --- /dev/null +++ b/deprecated/doc/DocGF.tex @@ -0,0 +1,569 @@ +\batchmode +%This Latex file is machine-generated by the BNF-converter + +\documentclass[a4paper,11pt]{article} +\author{BNF-converter} +\title{The Language GF} +\setlength{\parindent}{0mm} +\setlength{\parskip}{1mm} +\begin{document} + +\maketitle + +\newcommand{\emptyP}{\mbox{$\epsilon$}} +\newcommand{\terminal}[1]{\mbox{{\texttt {#1}}}} +\newcommand{\nonterminal}[1]{\mbox{$\langle \mbox{{\sl #1 }} \! \rangle$}} +\newcommand{\arrow}{\mbox{::=}} +\newcommand{\delimit}{\mbox{$|$}} +\newcommand{\reserved}[1]{\mbox{{\texttt {#1}}}} +\newcommand{\literal}[1]{\mbox{{\texttt {#1}}}} +\newcommand{\symb}[1]{\mbox{{\texttt {#1}}}} + +This document was automatically generated by the {\em BNF-Converter}. It was generated together with the lexer, the parser, and the abstract syntax module, which guarantees that the document matches with the implementation of the language (provided no hand-hacking has taken place). + +\section*{The lexical structure of GF} +\subsection*{Identifiers} +Identifiers \nonterminal{Ident} are unquoted strings beginning with a letter, +followed by any combination of letters, digits, and the characters {\tt \_ '}, +reserved words excluded. + + +\subsection*{Literals} +Integer literals \nonterminal{Int}\ are nonempty sequences of digits. + + +String literals \nonterminal{String}\ have the form +\terminal{"}$x$\terminal{"}, where $x$ is any sequence of any characters +except \terminal{"}\ unless preceded by \verb6\6. + + + + +LString literals are recognized by the regular expression +$\mbox{`''} ({\nonterminal{anychar}} - \mbox{`''})* \mbox{`''}$ + + +\subsection*{Reserved words and symbols} +The set of reserved words is the set of terminals appearing in the grammar. Those reserved words that consist of non-letter characters are called symbols, and they are treated in a different way from those that are similar to identifiers. The lexer follows rules familiar from languages like Haskell, C, and Java, including longest match and spacing conventions. + +The reserved words used in GF are the following: \\ + +\begin{tabular}{lll} +{\reserved{Lin}} &{\reserved{PType}} &{\reserved{Str}} \\ +{\reserved{Strs}} &{\reserved{Tok}} &{\reserved{Type}} \\ +{\reserved{abstract}} &{\reserved{case}} &{\reserved{cat}} \\ +{\reserved{concrete}} &{\reserved{data}} &{\reserved{def}} \\ +{\reserved{flags}} &{\reserved{fn}} &{\reserved{fun}} \\ +{\reserved{grammar}} &{\reserved{in}} &{\reserved{include}} \\ +{\reserved{incomplete}} &{\reserved{instance}} &{\reserved{interface}} \\ +{\reserved{let}} &{\reserved{lin}} &{\reserved{lincat}} \\ +{\reserved{lindef}} &{\reserved{lintype}} &{\reserved{of}} \\ +{\reserved{open}} &{\reserved{oper}} &{\reserved{out}} \\ +{\reserved{package}} &{\reserved{param}} &{\reserved{pattern}} \\ +{\reserved{pre}} &{\reserved{printname}} &{\reserved{resource}} \\ +{\reserved{reuse}} &{\reserved{strs}} &{\reserved{table}} \\ +{\reserved{tokenizer}} &{\reserved{transfer}} &{\reserved{union}} \\ +{\reserved{var}} &{\reserved{variants}} &{\reserved{where}} \\ +{\reserved{with}} & & \\ +\end{tabular}\\ + +The symbols used in GF are the following: \\ + +\begin{tabular}{lll} +{\symb{;}} &{\symb{{$=$}}} &{\symb{\{}} \\ +{\symb{\}}} &{\symb{(}} &{\symb{)}} \\ +{\symb{:}} &{\symb{{$-$}{$>$}}} &{\symb{**}} \\ +{\symb{,}} &{\symb{[}} &{\symb{]}} \\ +{\symb{.}} &{\symb{{$|$}}} &{\symb{\%}} \\ +{\symb{?}} &{\symb{{$<$}}} &{\symb{{$>$}}} \\ +{\symb{@}} &{\symb{!}} &{\symb{*}} \\ +{\symb{$\backslash$}} &{\symb{{$=$}{$>$}}} &{\symb{{$+$}{$+$}}} \\ +{\symb{{$+$}}} &{\symb{\_}} &{\symb{\$}} \\ +{\symb{/}} &{\symb{{$-$}}} & \\ +\end{tabular}\\ + +\subsection*{Comments} +Single-line comments begin with {\symb{{$-$}{$-$}}}. \\Multiple-line comments are enclosed with {\symb{\{{$-$}}} and {\symb{{$-$}\}}}. + +\section*{The syntactic structure of GF} +Non-terminals are enclosed between $\langle$ and $\rangle$. +The symbols {\arrow} (production), {\delimit} (union) +and {\emptyP} (empty rule) belong to the BNF notation. +All other symbols are terminals.\\ + +\begin{tabular}{lll} +{\nonterminal{Grammar}} & {\arrow} &{\nonterminal{ListModDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListModDef}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{ModDef}} {\nonterminal{ListModDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ModDef}} & {\arrow} &{\nonterminal{ModDef}} {\terminal{;}} \\ + & {\delimit} &{\terminal{grammar}} {\nonterminal{Ident}} {\terminal{{$=$}}} {\terminal{\{}} {\terminal{abstract}} {\terminal{{$=$}}} {\nonterminal{Ident}} {\terminal{;}} {\nonterminal{ListConcSpec}} {\terminal{\}}} \\ + & {\delimit} &{\nonterminal{ComplMod}} {\nonterminal{ModType}} {\terminal{{$=$}}} {\nonterminal{ModBody}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ConcSpec}} & {\arrow} &{\nonterminal{Ident}} {\terminal{{$=$}}} {\nonterminal{ConcExp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListConcSpec}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{ConcSpec}} \\ + & {\delimit} &{\nonterminal{ConcSpec}} {\terminal{;}} {\nonterminal{ListConcSpec}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ConcExp}} & {\arrow} &{\nonterminal{Ident}} {\nonterminal{ListTransfer}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListTransfer}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Transfer}} {\nonterminal{ListTransfer}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Transfer}} & {\arrow} &{\terminal{(}} {\terminal{transfer}} {\terminal{in}} {\nonterminal{Open}} {\terminal{)}} \\ + & {\delimit} &{\terminal{(}} {\terminal{transfer}} {\terminal{out}} {\nonterminal{Open}} {\terminal{)}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ModType}} & {\arrow} &{\terminal{abstract}} {\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{resource}} {\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{interface}} {\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{concrete}} {\nonterminal{Ident}} {\terminal{of}} {\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{instance}} {\nonterminal{Ident}} {\terminal{of}} {\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{transfer}} {\nonterminal{Ident}} {\terminal{:}} {\nonterminal{Open}} {\terminal{{$-$}{$>$}}} {\nonterminal{Open}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ModBody}} & {\arrow} &{\nonterminal{Extend}} {\nonterminal{Opens}} {\terminal{\{}} {\nonterminal{ListTopDef}} {\terminal{\}}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{with}} {\nonterminal{ListOpen}} \\ + & {\delimit} &{\nonterminal{ListIdent}} {\terminal{**}} {\nonterminal{Ident}} {\terminal{with}} {\nonterminal{ListOpen}} \\ + & {\delimit} &{\terminal{reuse}} {\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{union}} {\nonterminal{ListIncluded}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListTopDef}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{TopDef}} {\nonterminal{ListTopDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Extend}} & {\arrow} &{\nonterminal{ListIdent}} {\terminal{**}} \\ + & {\delimit} &{\emptyP} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListOpen}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Open}} \\ + & {\delimit} &{\nonterminal{Open}} {\terminal{,}} {\nonterminal{ListOpen}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Opens}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\terminal{open}} {\nonterminal{ListOpen}} {\terminal{in}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Open}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{(}} {\nonterminal{QualOpen}} {\nonterminal{Ident}} {\terminal{)}} \\ + & {\delimit} &{\terminal{(}} {\nonterminal{QualOpen}} {\nonterminal{Ident}} {\terminal{{$=$}}} {\nonterminal{Ident}} {\terminal{)}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ComplMod}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\terminal{incomplete}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{QualOpen}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\terminal{incomplete}} \\ + & {\delimit} &{\terminal{interface}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListIncluded}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Included}} \\ + & {\delimit} &{\nonterminal{Included}} {\terminal{,}} {\nonterminal{ListIncluded}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Included}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{[}} {\nonterminal{ListIdent}} {\terminal{]}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Def}} & {\arrow} &{\nonterminal{ListName}} {\terminal{:}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{ListName}} {\terminal{{$=$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Name}} {\nonterminal{ListPatt}} {\terminal{{$=$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{ListName}} {\terminal{:}} {\nonterminal{Exp}} {\terminal{{$=$}}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{TopDef}} & {\arrow} &{\terminal{cat}} {\nonterminal{ListCatDef}} \\ + & {\delimit} &{\terminal{fun}} {\nonterminal{ListFunDef}} \\ + & {\delimit} &{\terminal{data}} {\nonterminal{ListFunDef}} \\ + & {\delimit} &{\terminal{def}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{data}} {\nonterminal{ListDataDef}} \\ + & {\delimit} &{\terminal{transfer}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{param}} {\nonterminal{ListParDef}} \\ + & {\delimit} &{\terminal{oper}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{lincat}} {\nonterminal{ListPrintDef}} \\ + & {\delimit} &{\terminal{lindef}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{lin}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{printname}} {\terminal{cat}} {\nonterminal{ListPrintDef}} \\ + & {\delimit} &{\terminal{printname}} {\terminal{fun}} {\nonterminal{ListPrintDef}} \\ + & {\delimit} &{\terminal{flags}} {\nonterminal{ListFlagDef}} \\ + & {\delimit} &{\terminal{printname}} {\nonterminal{ListPrintDef}} \\ + & {\delimit} &{\terminal{lintype}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{pattern}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{package}} {\nonterminal{Ident}} {\terminal{{$=$}}} {\terminal{\{}} {\nonterminal{ListTopDef}} {\terminal{\}}} {\terminal{;}} \\ + & {\delimit} &{\terminal{var}} {\nonterminal{ListDef}} \\ + & {\delimit} &{\terminal{tokenizer}} {\nonterminal{Ident}} {\terminal{;}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{CatDef}} & {\arrow} &{\nonterminal{Ident}} {\nonterminal{ListDDecl}} \\ + & {\delimit} &{\terminal{[}} {\nonterminal{Ident}} {\nonterminal{ListDDecl}} {\terminal{]}} \\ + & {\delimit} &{\terminal{[}} {\nonterminal{Ident}} {\nonterminal{ListDDecl}} {\terminal{]}} {\terminal{\{}} {\nonterminal{Integer}} {\terminal{\}}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{FunDef}} & {\arrow} &{\nonterminal{ListIdent}} {\terminal{:}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{DataDef}} & {\arrow} &{\nonterminal{Ident}} {\terminal{{$=$}}} {\nonterminal{ListDataConstr}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{DataConstr}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{.}} {\nonterminal{Ident}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListDataConstr}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{DataConstr}} \\ + & {\delimit} &{\nonterminal{DataConstr}} {\terminal{{$|$}}} {\nonterminal{ListDataConstr}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ParDef}} & {\arrow} &{\nonterminal{Ident}} {\terminal{{$=$}}} {\nonterminal{ListParConstr}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{{$=$}}} {\terminal{(}} {\terminal{in}} {\nonterminal{Ident}} {\terminal{)}} \\ + & {\delimit} &{\nonterminal{Ident}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ParConstr}} & {\arrow} &{\nonterminal{Ident}} {\nonterminal{ListDDecl}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{PrintDef}} & {\arrow} &{\nonterminal{ListName}} {\terminal{{$=$}}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{FlagDef}} & {\arrow} &{\nonterminal{Ident}} {\terminal{{$=$}}} {\nonterminal{Ident}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListDef}} & {\arrow} &{\nonterminal{Def}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{Def}} {\terminal{;}} {\nonterminal{ListDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListCatDef}} & {\arrow} &{\nonterminal{CatDef}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{CatDef}} {\terminal{;}} {\nonterminal{ListCatDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListFunDef}} & {\arrow} &{\nonterminal{FunDef}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{FunDef}} {\terminal{;}} {\nonterminal{ListFunDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListDataDef}} & {\arrow} &{\nonterminal{DataDef}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{DataDef}} {\terminal{;}} {\nonterminal{ListDataDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListParDef}} & {\arrow} &{\nonterminal{ParDef}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{ParDef}} {\terminal{;}} {\nonterminal{ListParDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListPrintDef}} & {\arrow} &{\nonterminal{PrintDef}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{PrintDef}} {\terminal{;}} {\nonterminal{ListPrintDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListFlagDef}} & {\arrow} &{\nonterminal{FlagDef}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{FlagDef}} {\terminal{;}} {\nonterminal{ListFlagDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListParConstr}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{ParConstr}} \\ + & {\delimit} &{\nonterminal{ParConstr}} {\terminal{{$|$}}} {\nonterminal{ListParConstr}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListIdent}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{,}} {\nonterminal{ListIdent}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Name}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{[}} {\nonterminal{Ident}} {\terminal{]}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListName}} & {\arrow} &{\nonterminal{Name}} \\ + & {\delimit} &{\nonterminal{Name}} {\terminal{,}} {\nonterminal{ListName}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{LocDef}} & {\arrow} &{\nonterminal{ListIdent}} {\terminal{:}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{ListIdent}} {\terminal{{$=$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{ListIdent}} {\terminal{:}} {\nonterminal{Exp}} {\terminal{{$=$}}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListLocDef}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{LocDef}} \\ + & {\delimit} &{\nonterminal{LocDef}} {\terminal{;}} {\nonterminal{ListLocDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Exp4}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{\{}} {\nonterminal{Ident}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{\%}} {\nonterminal{Ident}} {\terminal{\%}} \\ + & {\delimit} &{\nonterminal{Sort}} \\ + & {\delimit} &{\nonterminal{String}} \\ + & {\delimit} &{\nonterminal{Integer}} \\ + & {\delimit} &{\terminal{?}} \\ + & {\delimit} &{\terminal{[}} {\terminal{]}} \\ + & {\delimit} &{\terminal{data}} \\ + & {\delimit} &{\terminal{[}} {\nonterminal{Ident}} {\nonterminal{Exps}} {\terminal{]}} \\ + & {\delimit} &{\terminal{[}} {\nonterminal{String}} {\terminal{]}} \\ + & {\delimit} &{\terminal{\{}} {\nonterminal{ListLocDef}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{{$<$}}} {\nonterminal{ListTupleComp}} {\terminal{{$>$}}} \\ + & {\delimit} &{\terminal{(}} {\terminal{in}} {\nonterminal{Ident}} {\terminal{)}} \\ + & {\delimit} &{\terminal{{$<$}}} {\nonterminal{Exp}} {\terminal{:}} {\nonterminal{Exp}} {\terminal{{$>$}}} \\ + & {\delimit} &{\terminal{(}} {\nonterminal{Exp}} {\terminal{)}} \\ + & {\delimit} &{\nonterminal{LString}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Exp3}} & {\arrow} &{\nonterminal{Exp3}} {\terminal{.}} {\nonterminal{Label}} \\ + & {\delimit} &{\terminal{\{}} {\nonterminal{Ident}} {\terminal{.}} {\nonterminal{Ident}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{\%}} {\nonterminal{Ident}} {\terminal{.}} {\nonterminal{Ident}} {\terminal{\%}} \\ + & {\delimit} &{\nonterminal{Exp4}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Exp2}} & {\arrow} &{\nonterminal{Exp2}} {\nonterminal{Exp3}} \\ + & {\delimit} &{\terminal{table}} {\terminal{\{}} {\nonterminal{ListCase}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{table}} {\nonterminal{Exp4}} {\terminal{\{}} {\nonterminal{ListCase}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{table}} {\nonterminal{Exp4}} {\terminal{[}} {\nonterminal{ListExp}} {\terminal{]}} \\ + & {\delimit} &{\terminal{case}} {\nonterminal{Exp}} {\terminal{of}} {\terminal{\{}} {\nonterminal{ListCase}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{variants}} {\terminal{\{}} {\nonterminal{ListExp}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{pre}} {\terminal{\{}} {\nonterminal{Exp}} {\terminal{;}} {\nonterminal{ListAltern}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{strs}} {\terminal{\{}} {\nonterminal{ListExp}} {\terminal{\}}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{@}} {\nonterminal{Exp4}} \\ + & {\delimit} &{\nonterminal{Exp3}} \\ + & {\delimit} &{\terminal{Lin}} {\nonterminal{Ident}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Exp1}} & {\arrow} &{\nonterminal{Exp1}} {\terminal{!}} {\nonterminal{Exp2}} \\ + & {\delimit} &{\nonterminal{Exp1}} {\terminal{*}} {\nonterminal{Exp2}} \\ + & {\delimit} &{\nonterminal{Exp1}} {\terminal{**}} {\nonterminal{Exp2}} \\ + & {\delimit} &{\nonterminal{Exp2}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Exp}} & {\arrow} &{\terminal{$\backslash$}} {\nonterminal{ListBind}} {\terminal{{$-$}{$>$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\terminal{$\backslash$}} {\terminal{$\backslash$}} {\nonterminal{ListBind}} {\terminal{{$=$}{$>$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Decl}} {\terminal{{$-$}{$>$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Exp1}} {\terminal{{$=$}{$>$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Exp1}} {\terminal{{$+$}{$+$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Exp1}} {\terminal{{$+$}}} {\nonterminal{Exp}} \\ + & {\delimit} &{\terminal{let}} {\terminal{\{}} {\nonterminal{ListLocDef}} {\terminal{\}}} {\terminal{in}} {\nonterminal{Exp}} \\ + & {\delimit} &{\terminal{let}} {\nonterminal{ListLocDef}} {\terminal{in}} {\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Exp1}} {\terminal{where}} {\terminal{\{}} {\nonterminal{ListLocDef}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{fn}} {\terminal{\{}} {\nonterminal{ListEquation}} {\terminal{\}}} \\ + & {\delimit} &{\nonterminal{Exp1}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListExp}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Exp}} \\ + & {\delimit} &{\nonterminal{Exp}} {\terminal{;}} {\nonterminal{ListExp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Exps}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Exp4}} {\nonterminal{Exps}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Patt1}} & {\arrow} &{\terminal{\_}} \\ + & {\delimit} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{\{}} {\nonterminal{Ident}} {\terminal{\}}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{.}} {\nonterminal{Ident}} \\ + & {\delimit} &{\nonterminal{Integer}} \\ + & {\delimit} &{\nonterminal{String}} \\ + & {\delimit} &{\terminal{\{}} {\nonterminal{ListPattAss}} {\terminal{\}}} \\ + & {\delimit} &{\terminal{{$<$}}} {\nonterminal{ListPattTupleComp}} {\terminal{{$>$}}} \\ + & {\delimit} &{\terminal{(}} {\nonterminal{Patt}} {\terminal{)}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Patt}} & {\arrow} &{\nonterminal{Ident}} {\nonterminal{ListPatt}} \\ + & {\delimit} &{\nonterminal{Ident}} {\terminal{.}} {\nonterminal{Ident}} {\nonterminal{ListPatt}} \\ + & {\delimit} &{\nonterminal{Patt1}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{PattAss}} & {\arrow} &{\nonterminal{ListIdent}} {\terminal{{$=$}}} {\nonterminal{Patt}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Label}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{\$}} {\nonterminal{Integer}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Sort}} & {\arrow} &{\terminal{Type}} \\ + & {\delimit} &{\terminal{PType}} \\ + & {\delimit} &{\terminal{Tok}} \\ + & {\delimit} &{\terminal{Str}} \\ + & {\delimit} &{\terminal{Strs}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListPattAss}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{PattAss}} \\ + & {\delimit} &{\nonterminal{PattAss}} {\terminal{;}} {\nonterminal{ListPattAss}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{PattAlt}} & {\arrow} &{\nonterminal{Patt}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListPatt}} & {\arrow} &{\nonterminal{Patt1}} \\ + & {\delimit} &{\nonterminal{Patt1}} {\nonterminal{ListPatt}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListPattAlt}} & {\arrow} &{\nonterminal{PattAlt}} \\ + & {\delimit} &{\nonterminal{PattAlt}} {\terminal{{$|$}}} {\nonterminal{ListPattAlt}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Bind}} & {\arrow} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{\_}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListBind}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Bind}} \\ + & {\delimit} &{\nonterminal{Bind}} {\terminal{,}} {\nonterminal{ListBind}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Decl}} & {\arrow} &{\terminal{(}} {\nonterminal{ListBind}} {\terminal{:}} {\nonterminal{Exp}} {\terminal{)}} \\ + & {\delimit} &{\nonterminal{Exp2}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{TupleComp}} & {\arrow} &{\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{PattTupleComp}} & {\arrow} &{\nonterminal{Patt}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListTupleComp}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{TupleComp}} \\ + & {\delimit} &{\nonterminal{TupleComp}} {\terminal{,}} {\nonterminal{ListTupleComp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListPattTupleComp}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{PattTupleComp}} \\ + & {\delimit} &{\nonterminal{PattTupleComp}} {\terminal{,}} {\nonterminal{ListPattTupleComp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Case}} & {\arrow} &{\nonterminal{ListPattAlt}} {\terminal{{$=$}{$>$}}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListCase}} & {\arrow} &{\nonterminal{Case}} \\ + & {\delimit} &{\nonterminal{Case}} {\terminal{;}} {\nonterminal{ListCase}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Equation}} & {\arrow} &{\nonterminal{ListPatt}} {\terminal{{$-$}{$>$}}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListEquation}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Equation}} \\ + & {\delimit} &{\nonterminal{Equation}} {\terminal{;}} {\nonterminal{ListEquation}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Altern}} & {\arrow} &{\nonterminal{Exp}} {\terminal{/}} {\nonterminal{Exp}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListAltern}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{Altern}} \\ + & {\delimit} &{\nonterminal{Altern}} {\terminal{;}} {\nonterminal{ListAltern}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{DDecl}} & {\arrow} &{\terminal{(}} {\nonterminal{ListBind}} {\terminal{:}} {\nonterminal{Exp}} {\terminal{)}} \\ + & {\delimit} &{\nonterminal{Exp4}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListDDecl}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\nonterminal{DDecl}} {\nonterminal{ListDDecl}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{OldGrammar}} & {\arrow} &{\nonterminal{Include}} {\nonterminal{ListTopDef}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{Include}} & {\arrow} &{\emptyP} \\ + & {\delimit} &{\terminal{include}} {\nonterminal{ListFileName}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{FileName}} & {\arrow} &{\nonterminal{String}} \\ + & {\delimit} &{\nonterminal{Ident}} \\ + & {\delimit} &{\terminal{/}} {\nonterminal{FileName}} \\ + & {\delimit} &{\terminal{.}} {\nonterminal{FileName}} \\ + & {\delimit} &{\terminal{{$-$}}} {\nonterminal{FileName}} \\ + & {\delimit} &{\nonterminal{Ident}} {\nonterminal{FileName}} \\ +\end{tabular}\\ + +\begin{tabular}{lll} +{\nonterminal{ListFileName}} & {\arrow} &{\nonterminal{FileName}} {\terminal{;}} \\ + & {\delimit} &{\nonterminal{FileName}} {\terminal{;}} {\nonterminal{ListFileName}} \\ +\end{tabular}\\ + + + +\end{document} + diff --git a/deprecated/doc/German.png b/deprecated/doc/German.png new file mode 100644 index 000000000..7c6303897 Binary files /dev/null and b/deprecated/doc/German.png differ diff --git a/deprecated/doc/Grammar.dot b/deprecated/doc/Grammar.dot new file mode 100644 index 000000000..cb2998eb3 --- /dev/null +++ b/deprecated/doc/Grammar.dot @@ -0,0 +1,75 @@ +digraph { + +size = "12,8" ; + +Lang [style = "solid", shape = "ellipse", URL = "Lang.gf"]; + +Lang -> Grammar [style = "solid"]; +Lang -> Lexicon [style = "solid"]; + +Grammar [style = "solid", shape = "ellipse", URL = "Lang.gf"]; + + +Grammar -> Noun [style = "solid"]; +Grammar -> Verb [style = "solid"]; +Grammar -> Adjective [style = "solid"]; +Grammar -> Adverb [style = "solid"]; +Grammar -> Numeral [style = "solid"]; +Grammar -> Sentence [style = "solid"]; +Grammar -> Question [style = "solid"]; +Grammar -> Relative [style = "solid"]; +Grammar -> Conjunction [style = "solid"]; +Grammar -> Phrase [style = "solid"]; +Grammar -> Text [style = "solid"]; +Grammar -> Idiom [style = "solid"]; +Grammar -> Structural [style = "solid"]; + + +Noun [style = "solid", shape = "ellipse", URL = "Noun.gf"]; +Noun -> Cat [style = "solid"]; + +Verb [style = "solid", shape = "ellipse", URL = "Verb.gf"]; +Verb -> Cat [style = "solid"]; + +Adjective [style = "solid", shape = "ellipse", URL = "Adjective.gf"]; +Adjective -> Cat [style = "solid"]; + +Adverb [style = "solid", shape = "ellipse", URL = "Adverb.gf"]; +Adverb -> Cat [style = "solid"]; + +Numeral [style = "solid", shape = "ellipse", URL = "Numeral.gf"]; +Numeral -> Cat [style = "solid"]; + +Sentence [style = "solid", shape = "ellipse", URL = "Sentence.gf"]; +Sentence -> Cat [style = "solid"]; + +Question [style = "solid", shape = "ellipse", URL = "Question.gf"]; +Question -> Cat [style = "solid"]; + +Relative [style = "solid", shape = "ellipse", URL = "Relative.gf"]; +Relative -> Cat [style = "solid"]; + +Conjunction [style = "solid", shape = "ellipse", URL = "Conjunction.gf"]; +Conjunction -> Cat [style = "solid"]; + +Phrase [style = "solid", shape = "ellipse", URL = "Phrase.gf"]; +Phrase -> Cat [style = "solid"]; + +Text [style = "solid", shape = "ellipse", URL = "Phrase.gf"]; +Text -> Cat [style = "solid"]; + +Idiom [style = "solid", shape = "ellipse", URL = "Phrase.gf"]; +Idiom -> Cat [style = "solid"]; + +Structural [style = "solid", shape = "ellipse", URL = "Structural.gf"]; +Structural -> Cat [style = "solid"]; + +Lexicon [style = "solid", shape = "ellipse", URL = "Lexicon.gf"]; +Lexicon -> Cat [style = "solid"]; + +Cat [style = "solid", shape = "ellipse", URL = "Cat.gf"]; +Cat -> Common [style = "solid"]; + +Common [style = "solid", shape = "ellipse", URL = "Tense.gf"]; + +} diff --git a/deprecated/doc/Grammar.png b/deprecated/doc/Grammar.png new file mode 100644 index 000000000..ada2847d7 Binary files /dev/null and b/deprecated/doc/Grammar.png differ diff --git a/deprecated/doc/TODO b/deprecated/doc/TODO new file mode 100644 index 000000000..c92f4c8fa --- /dev/null +++ b/deprecated/doc/TODO @@ -0,0 +1,231 @@ + +* Some notes on the syntax of this file, making it possible to use todoo-mode.el: + +- Items start with "* " +- Sub-items start with "- " +- It should be noted somewhere in the item, who has reported the item + Suggestion: Add "[who]" at the beginning of the item title + (then one can use "assign item" in todoo-mode) +- Each item should have a priority + Suggestion: Add "URGENT", "IMPORTANT" or "WISH" at the beginning of + the item title +- Sort the items in priority order + (todoo-mode can move an item up or down) + +---------------------------------------------------------------------- + + +* [peb] URGENT: Error messages for syntax errors + + When a syntax error is reported, it should be noted which file it + is. Otherwise it is impossible to know where the error is + (if one uses the -s flag): + + > i -s Domain/MP3/Domain_MP_Semantics.gf + syntax error at line 33 before ve , Proposition , + + There's no problem with other kinds of errors: + + > i -s Domain/MP3/Domain_MP_Semantics.gf + checking module Godis_Semantics + Happened in linearization of userMove : + product expected instead of { + pl : Str + } + + +* [peb] IMPORTANT: Add the -path of a module to daughter modules + + Then the main module does not have to know where all grandchildren are: + + file A.gf: + abstract A = B ** {...} + + file B.gf: + --# -path=./resource + abstract B = Lang ** {...} + + I.e.: the file A.gf should not need to know that B.gf uses the + resource library. + + +* [peb] IMPORTANT: incomplete concrete and interfaces + +- The following works in GF: + + incomplete concrete TestDI of TestA = open (C=TestCI) in { + lincat A = TestCI.A ** {p : Str}; + lin f = TestCI.f ** {p = "f"}; + g = TestCI.g ** {p = "g"}; + } + + > i -src TestDE.gf + +- BUT, if we exchange "TestCI" for "C" we get an error: + + incomplete concrete TestDI of TestA = open (C=TestCI) in { + lincat A = C.A ** {p : Str}; + lin f = C.f ** {p = "f"}; + g = C.g ** {p = "g"}; + } + + > i -src TestDE.gf + compiling TestDE.gf... failed to find C + OCCURRED IN + atomic term C given TestCE TestCI TestCE TestDE + OCCURRED IN + renaming definition of f + OCCURRED IN + renaming module TestDE + +- the other modules: + + abstract TestA = { + cat A; + fun f, g : A; + } + + instance TestBE of TestBI = { + oper hello = "hello"; + bye = "bye"; + } + + interface TestBI = { + oper hello : Str; + bye : Str; + } + + concrete TestCE of TestA = TestCI with (TestBI = TestBE); + + incomplete concrete TestCI of TestA = open TestBI in { + lincat A = {s : Str}; + lin f = {s = hello}; + g = {s = bye}; + } + + concrete TestDE of TestA = TestDI with (TestCI = TestCE); + +* [peb] IMPORTANT: Missing things in the help command + + > h -printer + (the flag -printer=cfgm is missing) + + > h -cat + WARNING: invalid option: cat + + > h -lang + WARNING: invalid option: lang + + > h -language + WARNING: invalid option: language + + > h -parser + WARNING: invalid option: parser + + > h -aslkdjaslkdjss + WARNING: invalid option: aslkdjaslkdjss + Command not found. + (it should note: "option not found") + + > h -optimize + WARNING: invalid option: optimize + + > h -startcat + WARNING: invalid option: startcat + + > h h + h, help: h Command? + (it should also mention "h -option") + + +* [peb] IMPORTANT: Set GF_LIb-PATH within GF + + > sf libpath=~/GF/lib + + +* [peb] IMPORTANT: Set the starting category with "sf" + + > sf startcat=X + + +* [peb] IMPORTANT: import-flags + +- There are some inconsistencies when importing grammars: + + 1. when doing "pg -printer=cfg", one must have used "i -conversion=finite", + since "pg" doesn't care about the flags that are set in the grammar file + + 2. when doing "pm -printer=cfgm", one must have set the flag + "conversion=finite" within the grammar file, since "pm" doesn't + care about the flags to the import command + + (I guess it's me (peb) who should fix this, but I don't know where + the different flags reside...) + +- Also, it must be decided in what cases flags can override other flags: + + a) in the grammar file, e.g. "flags conversion=finite;" + b) on the command line, e.g. "> sf conversion=finite" + c) as argument to a command, e.g. "> i -conversion=finite file.gf" + +- A related issue is to decide the scope of flags: + + Some flags are (or should be) local to the module + (e.g. -coding and -path) + Other flags override daughter flags for daughter modules + (e.g. -startcat and -conversion) + +* [bringert] IMPORTANT: get right startcat flag when printing CFGM + GF.CFGM.PrintCFGrammar.prCanonAsCFGM currently only gets the startcat + flag from the top-level concrete module. This might be easier + to fix if the multi grammar printers had access to more than just + the CanonGrammar. + +* [peb] WISH: generalizing incomplete concrete + + I want to be able to open an incomplete concrete module + inside another incomplete conrete. + Then I can instantiate both incompletes at the same time. + +* [peb] WISH: _tmpi, _tmpo + + The files _tmpi and _tmpo are never removed when quitting GF. + Further suggestion: put them in /tmp or similar. + + peb: n�r man anv�nder "|" till ett systemanrop, t.ex: + pg | ! sort + s� skapas filerna _tmpi och _tmpo. Men de tas aldrig bort. + + peb: �nnu b�ttre: ta bort filerna efter�t. + + aarne: Sant: n�r GF quittas (om detta inte sker onormalt). + Eller n�r kommandot har k�rt f�rdigt (om det terminerar). + + peb: B�st(?): skapa filerna i /tmp eller liknande. + + aarne: Ibland f�r man skrivr�ttighetsproblem - och det �r + inte kul om man m�ste ange en tmp-path. Och olika + anv�ndare och gf-processer m�ste ha unika filnamn. + Och vet inte hur det funkar p� windows... + + aarne: Ett till alternativ skulle vara att anv�nda handles + utan n�gra tmp-filer alls. Men jag har inte hunnit + ta reda p� hur det g�r till. + + bj�rn: Lite slumpm�ssiga tankar: + + man kan anv�nda System.Directory.getTemporaryDirectory, s� slipper man iaf bry sig om olika plattformsproblem. + + sen kan man anv�nda System.IO.openTempFile f�r att skapa en tempor�r fil. Den tas dock inte bort n�r programmet avslutas, s� det f�r man fixa sj�lv. + + System.Posix.Temp.mkstemp g�r n�t liknande, men dokumentationen �r d�lig. + + biblioteket HsShellScript har lite funktioner f�r s�nt h�r, se + http://www.volker-wysk.de/hsshellscript/apidoc/HsShellScript.html#16 + + +* [peb] WISH: Hierarchic modules + + Suggestion by peb: + The module A.B.C is located in the file A/B/C.gf + + Main advantage: you no longer need to state "--# -path=..." in + modules + +- How can this be combined with several modules inside one file? diff --git a/deprecated/doc/compiling-gf.txt b/deprecated/doc/compiling-gf.txt new file mode 100644 index 000000000..9e438f40f --- /dev/null +++ b/deprecated/doc/compiling-gf.txt @@ -0,0 +1,750 @@ +Compiling GF +Aarne Ranta +Proglog meeting, 1 November 2006 + +% to compile: txt2tags -thtml compiling-gf.txt ; htmls compiling-gf.html + +%!target:html +%!postproc(html): #NEW + +#NEW + +==The compilation task== + +GF is a grammar formalism, i.e. a special purpose programming language +for writing grammars. + +Other grammar formalisms: +- BNF, YACC, Happy (grammars for programming languages); +- PATR, HPSG, LFG (grammars for natural languages). + + +The grammar compiler prepares a GF grammar for two computational tasks: +- linearization: take syntax trees to strings +- parsing: take strings to syntax trees + + +The grammar gives a declarative description of these functionalities, +on a high abstraction level that improves grammar writing +productivity. + +For efficiency, the grammar is compiled to lower-level formats. + +Type checking is another essential compilation phase. Its purpose is +twofold, as usual: +- checking the correctness of the grammar +- type-annotating expressions for code generation + + +#NEW + +==Characteristics of GF language== + +Functional language with types, both built-in and user-defined. +``` + Str : Type + + param Number = Sg | Pl + + param AdjForm = ASg Gender | APl + + Noun : Type = {s : Number => Str ; g : Gender} +``` +Pattern matching. +``` + svart_A = table { + ASg _ => "svart" ; + _ => "svarta" + } +``` +Higher-order functions. + +Dependent types. +``` + flip : (a, b, c : Type) -> (a -> b -> c) -> b -> a -> c = + \_,_,_,f,y,x -> f x y ; +``` + + +#NEW + +==The module system of GF== + +Main division: abstract syntax and concrete syntax +``` + abstract Greeting = { + cat Greet ; + fun Hello : Greet ; + } + + concrete GreetingEng of Greeting = { + lincat Greet = {s : Str} ; + lin Hello = {s = "hello"} ; + } + + concrete GreetingIta of Greeting = { + param Politeness = Familiar | Polite ; + lincat Greet = {s : Politeness => Str} ; + lin Hello = {s = table { + Familiar => "ciao" ; + Polite => "buongiorno" + } ; + } +``` +Other features of the module system: +- extension and opening +- parametrized modules (cf. ML: signatures, structures, functors) + + + + +#NEW + +==GF vs. Haskell== + +Some things that (standard) Haskell hasn't: +- records and record subtyping +- regular expression patterns +- dependent types +- ML-style modules + + +Some things that GF hasn't: +- infinite (recursive) data types +- recursive functions +- classes, polymorphism + + +#NEW + +==GF vs. most linguistic grammar formalisms== + +GF separates abstract syntax from concrete syntax. + +GF has a module system with separate compilation. + +GF is generation-oriented (as opposed to parsing). + +GF has unidirectional matching (as opposed to unification). + +GF has a static type system (as opposed to a type-free universe). + +"I was - and I still am - firmly convinced that a program composed +out of statically type-checked parts is more likely to faithfully +express a well-thought-out design than a program relying on +weakly-typed interfaces or dynamically-checked interfaces." +(B. Stroustrup, 1994, p. 107) + + + +#NEW + +==The computation model: abstract syntax== + +An abstract syntax defines a free algebra of trees (using +dependent types, recursion, higher-order abstract syntax: +GF includes a complete Logical Framework). +``` + cat C (x_1 : A_1)...(x_n : A_n) + a_1 : A_1 + ... + a_n : A_n{x_1 : A_1,...,x_n-1 : A_n-1} + ---------------------------------------------------- + (C a_1 ... a_n) : Type + + + fun f : (x_1 : A_1) -> ... -> (x_n : A_n) -> A + a_1 : A_1 + ... + a_n : A_n{x_1 : A_1,...,x_n-1 : A_n-1} + ---------------------------------------------------- + (f a_1 ... a_n) : A{x_1 : A_1,...,x_n : A_n} + + + A : Type x : A |- B : Type x : A |- b : B f : (x : A) -> B a : A + ---------------------------- ---------------------- ------------------------ + (x : A) -> B : Type \x -> b : (x : A) -> B f a : B{x := A} +``` +Notice that all syntax trees are in eta-long form. + + +#NEW + +==The computation model: concrete syntax== + +A concrete syntax defines a homomorphism (compositional mapping) +from the abstract syntax to a system of concrete syntax objects. +``` + cat C _ + -------------------- + lincat C = C* : Type + + fun f : (x_1 : A_1) -> ... -> (x_n : A_n) -> A + ----------------------------------------------- + lin f = f* : A_1* -> ... -> A_n* -> A* + + (f a_1 ... a_n)* = f* a_1* ... a_n* +``` +The homomorphism can as such be used as linearization function. + +It is a functional program, but a restricted one, since it works +in the end on finite data structures only. + +But a more efficient program is obtained via compilation to +GFC = Canonical GF: the "machine code" of GF. + +The parsing problem of GFC can be reduced to that of MPCFG (Multiple +Parallel Context Free Grammars), see P. Ljunglöf's thesis (2004). + + + +#NEW + +==The core type system of concrete syntax: basic types== + +``` + param P P : PType + PType : Type --------- --------- + P : PType P : Type + + s : Str t : Str + Str : type "foo" : Str [] : Str ---------------- + s ++ t : Str +``` + + +#NEW + +==The core type system of concrete syntax: functions and tables== + +``` + A : Type x : A |- B : Type x : A |- b : B f : (x : A) -> B a : A + ---------------------------- ---------------------- ------------------------ + (x : A) -> B : Type \x -> b : (x : A) -> B f a : B{x := A} + + + P : PType A : Type t : P => A p : p + -------------------- ----------------- + P => A : Type t ! p : A + + v_1,...,v_n : A + ---------------------------------------------- P = {C_1,...,C_n} + table {C_1 => v_1 ; ... ; C_n => v_n} : P => A +``` +Pattern matching is treated as an abbreviation for tables. Notice that +``` + case e of {...} == table {...} ! e +``` + + +#NEW + +==The core type system of concrete syntax: records== + +``` + A_1,...,A_n : Type + ------------------------------------ n >= 0 + {r_1 : A_1 ; ... ; r_n : A_n} : Type + + + a_1 : A_1 ... a_n : A_n + ------------------------------------------------------------ + {r_1 = a_1 ; ... ; r_n = a_n} : {r_1 : A_1 ; ... ; r_n : A_n} + + + r : {r_1 : A_1 ; ... ; r_n : A_n} + ----------------------------------- i = 1,...,n + r.r_1 : A_1 +``` +Subtyping: if ``r : R`` then ``r : R ** {r : A}`` + + + +#NEW + +==Computation rules== + +``` + (\x -> b) a = b{x := a} + + (table {C_1 => v_1 ; ... ; C_n => v_n} : P => A) ! C_i = v_i + + {r_1 = a_1 ; ... ; r_n = a_n}.r_i = a_i +``` + + + +#NEW + +==Canonical GF== + +Concrete syntax type system: +``` + A_1 : Type ... A_n : Type + Str : Type Int : Type ------------------------- $i : A + [A_1, ..., A_n] : Type + + + a_1 : A_1 ... a_n : A_n t : [A_1, ..., A_n] + --------------------------------- ------------------- i = 1,..,n + [a_1, ..., a_n] : [A_1, ..., A_n] t ! i : A_i +``` +Tuples represent both records and tables. + +There are no functions. + +Linearization: +``` + lin f = f* + + (f a_1 ... a_n)* = f*{$1 = a_1*, ..., $n = a_n*} +``` + + +#NEW + +==The compilation task, again== + +1. From a GF source grammar, derive a canonical GF grammar. + +2. From the canonical GF grammar derive an MPCFG grammar + +The canonical GF grammar can be used for linearization, with +linear time complexity (w.r.t. the size of the tree). + +The MPCFG grammar can be used for parsing, with (unbounded) +polynomial time complexity (w.r.t. the size of the string). + +For these target formats, we have also built interpreters in +different programming languages (C, C++, Haskell, Java, Prolog). + +Moreover, we generate supplementary formats such as grammars +required by various speech recognition systems. + + +#NEW + +==An overview of compilation phases== + +Legend: +- ellipse node: representation saved in a file +- plain text node: internal representation +- solid arrow or ellipse: essential phare or format +- dashed arrow or ellipse: optional phase or format +- arrow label: the module implementing the phase + + +[gf-compiler.png] + + +#NEW + +==Using the compiler== + +Batch mode (cf. GHC). + +Interactive mode, building the grammar incrementally from +different files, with the possibility of testing them +(cf. GHCI). + +The interactive mode was first, built on the model of ALF-2 +(L. Magnusson), and there was no file output of compiled +grammars. + + +#NEW + +==Modules and separate compilation== + +The above diagram shows what happens to each module. +(But not quite, since some of the back-end formats must be +built for sets of modules: GFCC and the parser formats.) + +When the grammar compiler is called, it has a main module as its +argument. It then builds recursively a dependency graph with all +the other modules, and decides which ones must be recompiled. +The behaviour is rather similar to GHC. + +Separate compilation is //extremely important// when developing +big grammars, especially when using grammar libraries. Example: compiling +the GF resource grammar library takes 5 minutes, whereas reading +in the compiled image takes 10 seconds. + + +#NEW + +==Module dependencies and recompilation== + +(For later use, not for the Proglog talk) + +For each module M, there are 3 kinds of files: +- M.gf, source file +- M.gfc, compiled file ("object file") +- M.gfr, type-checked and optimized source file (for resource modules only) + + +The compiler reads gf files and writes gfc files (and gfr files if appropriate) + +The Main module is the one used as argument when calling GF. + +A module M (immediately) depends on the module K, if either +- M is a concrete of K +- M is an instance of K +- M extends K +- M opens K +- M is a completion of K with something +- M is a completion of some module with K instantiated with something + + +A module M (transitively) depends on the module K, if either +- M immediately depends on K +- M depends on some L such that L immediately depends on K + + +Immediate dependence is readable from the module header without parsing +the whole module. + +The compiler reads recursively the headers of all modules that Main depends on. + +These modules are arranged in a dependency graph, which is checked to be acyclic. + +To decide whether a module M has to be compiled, do: ++ Get the time stamps t() of M.gf and M.gfc (if a file doesn't exist, its + time is minus infinity). ++ If t(M.gf) > t(M.gfc), M must be compiled. ++ If M depends on K and K must be compiled, then M must be compiled. ++ If M depends on K and t(K.gf) > t(M.gfc), then M must be compiled. + + +Decorate the dependency graph by information on whether the gf or the gfc (and gfr) +format is to be read. + +Topologically sort the decorated graph, and read each file in the chosen format. + +The gfr file is generated for these module types only: +- resource +- instance + + +When reading K.gfc, also K.gfr is read if some M depending on K has to be compiled. +In other cases, it is enough to read K.gfc. + +In an interactive GF session, some modules may be in memory already. +When read to the memory, each module M is given time stamp t(M.m). +The additional rule now is: +- If M.gfc is to be read, and t(M.m) > t(M.gfc), don't read M.gfc. + + + + +#NEW + +==Techniques used== + +The compiler is written in Haskell, with some C foreign function calls +in the interactive version (readline, killing threads). + +BNFC is used for generating both the parsers and printers. +This has helped to make the formats portable. + +"Almost compositional functions" (``composOp``) are used in +many compiler passes, making them easier to write and understand. +A ``grep`` on the sources reveals 40 uses (outside the definition +of ``composOp`` itself). + +The key algorithmic ideas are +- type-driven partial evaluation in GF-to-GFC generation +- common subexpression elimination as back-end optimization +- some ideas in GFC-to-MCFG encoding + + +#NEW + +==Type-driven partial evaluation== + +Each abstract syntax category in GF has a corresponding linearization type: +``` + cat C + lincat C = T +``` +The general form of a GF rule pair is +``` + fun f : C1 -> ... -> Cn -> C + lin f = t +``` +with the typing condition following the ``lincat`` definitions +``` + t : T1 -> ... -> Tn -> T +``` +The term ``t`` is in general built by using abstraction methods such +as pattern matching, higher-order functions, local definitions, +and library functions. + +The compilation technique proceeds as follows: +- use eta-expansion on ``t`` to determine the canonical form of the term +``` + \ $C1, ...., $Cn -> (t $C1 .... $Cn) +``` +with unique variables ``$C1 .... $Cn`` for the arguments; repeat this +inside the term for records and tables +- evaluate the resulting term using the computation rules of GF +- what remains is a canonical term with ``$C1 .... $Cn`` the only +variables (the run-time input of the linearization function) + + +#NEW + +==Eta-expanding records and tables== + +For records that are valied via subtyping, eta expansion +eliminates superfluous fields: +``` + {r1 = t1 ; r2 = t2} : {r1 : T1} ----> {r1 = t1} +``` +For tables, the effect is always expansion, since +pattern matching can be used to represent tables +compactly: +``` + table {n => "fish"} : Number => Str ---> + + table { + Sg => "fish" ; + Pl => "fish" + } +``` +This can be helped by back-end optimizations (see below). + + +#NEW + +==Eliminating functions== + +"Everything is finite": parameter types, records, tables; +finite number of string tokens per grammar. + +But "inifinite types" such as function types are useful when +writing grammars, to enable abstractions. + +Since function types do not appear in linearization types, +we want functions to be eliminated from linearization terms. + +This is similar to the **subformula property** in logic. +Also the main problem is similar: function depending on +a run-time variable, +``` + (table {P => f ; Q = g} ! x) a +``` +This is not a redex, but we can make it closer to one by moving +the application inside the table, +``` + table {P => f a ; Q = g a} ! x +``` +This transformation is the same as Prawitz's (1965) elimination +of maximal segments in natural deduction: +``` + A B + C -> D C C -> D C + A B --------- --------- + A v B C -> D C -> D A v B D D + --------------------- ===> ------------------------- + C -> D C D + -------------------- + D +``` + + + +#NEW + +==Size effects of partial evaluation== + +Irrelevant table branches are thrown away, which can reduce the size. + +But, since tables are expanded and auxiliary functions are inlined, +the size can grow exponentially. + +How can we keep the first property and eliminate the second? + + +#NEW + +==Parametrization of tables== + +Algorithm: for each branch in a table, consider replacing the +argument by a variable: +``` + table { table { + P => t ; ---> x => t[P->x] ; + Q => u x => u[Q->x] + } } +``` +If the resulting branches are all equal, you can replace the table +by a lambda abstract +``` + \\x => t[P->x] +``` +If each created variable ``x`` is unique in the grammar, computation +with the lambda abstract is efficient. + + + +#NEW + +==Course-of-values tables== + +By maintaining a canonical order of parameters in a type, we can +eliminate the left hand sides of branches. +``` + table { table T [ + P => t ; ---> t ; + Q => u u + } ] +``` +The treatment is similar to ``Enum`` instances in Haskell. + +In the end, all parameter types can be translated to +initial segments of integers. + + +#NEW + +==Common subexpression elimination== + +Algorithm: ++ Go through all terms and subterms in a module, creating + a symbol table mapping terms to the number of occurrences. ++ For each subterm appearing at least twice, create a fresh + constant defined as that subterm. ++ Go through all rules (incl. rules for the new constants), + replacing largest possible subterms with such new constants. + + +This algorithm, in a way, creates the strongest possible abstractions. + +In general, the new constants have open terms as definitions. +But since all variables (and constants) are unique, they can +be computed by simple replacement. + + + +#NEW + +==Size effects of optimizations== + +Example: the German resource grammar +``LangGer`` + +|| optimization | lines | characters | size % | blow-up | +| none | 5394 | 3208435 | 100 | 25 | +| all | 5394 | 750277 | 23 | 6 | +| none_subs | 5772 | 1290866 | 40 | 10 | +| all_subs | 5644 | 414119 | 13 | 3 | +| gfcc | 3279 | 190004 | 6 | 1.5 | +| gf source | 3976 | 121939 | 4 | 1 | + + +Optimization "all" means parametrization + course-of-values. + +The source code size is an estimate, since it includes +potentially irrelevant library modules, and comments. + +The GFCC format is not reusable in separate compilation. + + + +#NEW + +==The shared prefix optimization== + +This is currently performed in GFCC only. + +The idea works for languages that have a rich morphology +based on suffixes. Then we can replace a course of values +with a pair of a prefix and a suffix set: +``` + ["apa", "apan", "apor", "aporna"] ---> + ("ap" + ["a", "an", "or", "orna"]) +``` +The real gain comes via common subexpression elimination: +``` + _34 = ["a", "an", "or", "orna"] + apa = ("ap" + _34) + blomma = ("blomm" + _34) + flicka = ("flick" + _34) +``` +Notice that it now matters a lot how grammars are written. +For instance, if German verbs are treated as a one-dimensional +table, +``` + ["lieben", "liebe", "liebst", ...., "geliebt", "geliebter",...] +``` +no shared prefix optimization is possible. A better form is +separate tables for non-"ge" and "ge" forms: +``` + [["lieben", "liebe", "liebst", ....], ["geliebt", "geliebter",...]] +``` + + +#NEW + +==Reuse of grammars as libraries== + +The idea of resource grammars: take care of all aspects of +surface grammaticality (inflection, agreement, word order). + +Reuse in application grammar: via translations +``` + cat C ---> oper C : Type = T + lincat C = T + + fun f : A ---> oper f : A* = t + lin f = t +``` +The user only needs to know the type signatures (abstract syntax). + +However, this does not quite guarantee grammaticality, because +different categories can have the same lincat: +``` + lincat Conj = {s : Str} + lincat Adv = {s : Str} +``` +Thus someone may by accident use "and" as an adverb! + + +#NEW + +==Forcing the type checker to act as a grammar checker== + +We just have to make linearization types unique for each category. + +The technique is reminiscent of Haskell's ``newtype`` but uses +records instead: we add **lock fields** e.g. +``` + lincat Conj = {s : Str ; lock_Conj : {}} + lincat Adv = {s : Str ; lock_Adv : {}} +``` +Thanks to record subtyping, the translation is simple: +``` + fun f : C1 -> ... -> Cn -> C + lin f = t + + ---> + + oper f : C1* -> ... -> Cn* -> C* = + \x1,...,xn -> (t x1 ... xn) ** {lock_C = {}} +``` + +#NEW + +==Things to do== + +Better compression of gfc file format. + +Type checking of dependent-type pattern matching in abstract syntax. + +Compilation-related modules that need rewriting +- ``ReadFiles``: clarify the logic of dependencies +- ``Compile``: clarify the logic of what to do with each module +- ``Compute``: make the evaluation more efficient +- ``Parsing/*``, ``OldParsing/*``, ``Conversion/*``: reduce the number + of parser formats and algorithms diff --git a/deprecated/doc/eu-langs.dot b/deprecated/doc/eu-langs.dot new file mode 100644 index 000000000..115ce0040 --- /dev/null +++ b/deprecated/doc/eu-langs.dot @@ -0,0 +1,79 @@ +graph{ + +size = "7,7" ; + +overlap = scale ; + +"Abs" [label = "Abstract Syntax", style = "solid", shape = "rectangle"] ; + +"1" [label = "Bulgarian", style = "solid", shape = "ellipse", color = "green"] ; +"1" -- "Abs" [style = "solid"]; + +"2" [label = "Czech", style = "solid", shape = "ellipse", color = "red"] ; +"2" -- "Abs" [style = "solid"]; + +"3" [label = "Danish", style = "solid", shape = "ellipse", color = "green"] ; +"3" -- "Abs" [style = "solid"]; + +"4" [label = "German", style = "solid", shape = "ellipse", color = "green"] ; +"4" -- "Abs" [style = "solid"]; + +"5" [label = "Estonian", style = "solid", shape = "ellipse", color = "red"] ; +"5" -- "Abs" [style = "solid"]; + +"6" [label = "Greek", style = "solid", shape = "ellipse", color = "red"] ; +"6" -- "Abs" [style = "solid"]; + +"7" [label = "English", style = "solid", shape = "ellipse", color = "green"] ; +"7" -- "Abs" [style = "solid"]; + +"8" [label = "Spanish", style = "solid", shape = "ellipse", color = "green"] ; +"8" -- "Abs" [style = "solid"]; + +"9" [label = "French", style = "solid", shape = "ellipse", color = "green"] ; +"9" -- "Abs" [style = "solid"]; + +"10" [label = "Italian", style = "solid", shape = "ellipse", color = "green"] ; +"10" -- "Abs" [style = "solid"]; + +"11" [label = "Latvian", style = "solid", shape = "ellipse", color = "red"] ; +"11" -- "Abs" [style = "solid"]; + +"12" [label = "Lithuanian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "12" [style = "solid"]; + +"13" [label = "Irish", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "13" [style = "solid"]; + +"14" [label = "Hungarian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "14" [style = "solid"]; + +"15" [label = "Maltese", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "15" [style = "solid"]; + +"16" [label = "Dutch", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "16" [style = "solid"]; + +"17" [label = "Polish", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "17" [style = "solid"]; + +"18" [label = "Portuguese", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "18" [style = "solid"]; + +"19" [label = "Slovak", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "19" [style = "solid"]; + +"20" [label = "Slovene", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "20" [style = "solid"]; + +"21" [label = "Romanian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "21" [style = "solid"]; + +"22" [label = "Finnish", style = "solid", shape = "ellipse", color = "green"] ; +"Abs" -- "22" [style = "solid"]; + +"23" [label = "Swedish", style = "solid", shape = "ellipse", color = "green"] ; +"Abs" -- "23" [style = "solid"]; + + +} diff --git a/deprecated/doc/eu-langs.png b/deprecated/doc/eu-langs.png new file mode 100644 index 000000000..8c46a19db Binary files /dev/null and b/deprecated/doc/eu-langs.png differ diff --git a/deprecated/doc/food-translet.png b/deprecated/doc/food-translet.png new file mode 100644 index 000000000..dd622a4bf Binary files /dev/null and b/deprecated/doc/food-translet.png differ diff --git a/deprecated/doc/food1.png b/deprecated/doc/food1.png new file mode 100644 index 000000000..767069dab Binary files /dev/null and b/deprecated/doc/food1.png differ diff --git a/deprecated/doc/food2.png b/deprecated/doc/food2.png new file mode 100644 index 000000000..b36a01b22 Binary files /dev/null and b/deprecated/doc/food2.png differ diff --git a/deprecated/doc/gf-compiler.dot b/deprecated/doc/gf-compiler.dot new file mode 100644 index 000000000..f8ce1aaae --- /dev/null +++ b/deprecated/doc/gf-compiler.dot @@ -0,0 +1,88 @@ +digraph { + + gfe [label = "file.gfe", style = "dashed", shape = "ellipse"]; + gfe -> gf1 [label = " MkConcrete", style = "dashed"]; + +gf1 [label = "file.gf", style = "solid", shape = "ellipse"]; +gf1 -> gf2 [label = " LexGF", style = "solid"]; + +gf2 [label = "token list", style = "solid", shape = "plaintext"]; +gf2 -> gf3 [label = " ParGF", style = "solid"]; + +gf3 [label = "source tree", style = "solid", shape = "plaintext"]; +gf3 -> gf4 [label = " SourceToGrammar", style = "solid"]; + + cf [label = "file.cf", style = "dashed", shape = "ellipse"]; + cf -> gf4 [label = " CF.PPrCF", style = "dashed"]; + + ebnf [label = "file.ebnf", style = "dashed", shape = "ellipse"]; + ebnf -> gf4 [label = " CF.EBNF", style = "dashed"]; + + +gf4 [label = "GF tree", style = "solid", shape = "plaintext"]; +gf4 -> gf5 [label = " Extend", style = "solid"]; + +gf5 [label = "inheritance-linked GF tree", style = "solid", shape = "plaintext"]; +gf5 -> gf6 [label = " Rename", style = "solid"]; + +gf6 [label = "name-resolved GF tree", style = "solid", shape = "plaintext"]; +gf6 -> gf7 [label = " CheckGrammar", style = "solid"]; + +gf7 [label = "type-annotated GF tree", style = "solid", shape = "plaintext"]; +gf7 -> gf8 [label = " Optimize", style = "solid"]; + +gf8 [label = "optimized GF tree", style = "solid", shape = "plaintext"]; +gf8 -> gf9 [label = " GrammarToCanon", style = "solid"]; + +gf9 [label = "GFC tree", style = "solid", shape = "plaintext"]; +gf9 -> gfc [label = " BackOpt", style = "solid"]; + +gfc [label = "optimized GFC tree", style = "solid", shape = "box"]; +gfc -> gf11 [label = " PrintGFC", style = "solid"]; + +gf11 [label = "file.gfc", style = "solid", shape = "ellipse"]; + + + gfcc [label = "file.gfcc", style = "solid", shape = "ellipse"]; + gfc -> gfcc [label = " CanonToGFCC", style = "solid"]; + + mcfg [label = "file.gfcm", style = "dashed", shape = "ellipse"]; + gfc -> mcfg [label = " PrintGFC", style = "dashed"]; + + bnf [label = "file.cf", style = "dashed", shape = "ellipse"]; + gfc -> bnf [label = " CF.PrLBNF", style = "dashed"]; + + happy [label = "file.y (Happy)", style = "dashed", shape = "ellipse"]; + bnf -> happy [label = " bnfc", style = "dashed"]; + + bison [label = "file.y (Bison)", style = "dashed", shape = "ellipse"]; + bnf -> bison [label = " bnfc", style = "dashed"]; + + cup [label = "parser.java (CUP)", style = "dashed", shape = "ellipse"]; + bnf -> cup [label = " bnfc", style = "dashed"]; + + xml [label = "file.dtd (XML)", style = "dashed", shape = "ellipse"]; + bnf -> xml [label = " bnfc", style = "dashed"]; + + cfg [label = "CFG tree", style = "solid", shape = "plaintext"]; + gfc -> cfg [label = " Conversions.GFC", style = "dashed"]; + + cfgm [label = "file.cfgm", style = "dashed", shape = "ellipse"]; + cfg -> cfgm [label = " Conversions.GFC", style = "dashed"]; + + srg [label = "Non-LR CFG", style = "solid", shape = "plaintext"]; + cfg -> srg [label = " Speech.SRG", style = "dashed"]; + + gsl [label = "file.gsl", style = "dashed", shape = "ellipse"]; + srg -> gsl [label = " Speech.PrGSL", style = "dashed"]; + + jsgf [label = "file.jsgf", style = "dashed", shape = "ellipse"]; + srg -> jsgf [label = " Speech.PrJSGF", style = "dashed"]; + + fa [label = "DFA", style = "solid", shape = "plaintext"]; + cfg -> fa [label = " Speech.CFGToFiniteState", style = "dashed"]; + + slf [label = "file.slf", style = "dashed", shape = "ellipse"]; + fa -> slf [label = " Speech.PrSLF", style = "dashed"]; + +} diff --git a/deprecated/doc/gf-compiler.png b/deprecated/doc/gf-compiler.png new file mode 100644 index 000000000..6949c37b5 Binary files /dev/null and b/deprecated/doc/gf-compiler.png differ diff --git a/deprecated/doc/gf-formalism.html b/deprecated/doc/gf-formalism.html new file mode 100644 index 000000000..52d9256aa --- /dev/null +++ b/deprecated/doc/gf-formalism.html @@ -0,0 +1,350 @@ + + + + +A Birds-Eye View of GF as a Grammar Formalism + +

A Birds-Eye View of GF as a Grammar Formalism

+ +Author: Aarne Ranta
+Last update: Thu Feb 2 14:16:01 2006 + + +

GF in a few words +
History of GF +
Some key ingredients of GF in other grammar formalisms +
Examples of descriptions in each formalism +
Lambda terms and records +
The structure of GF formalisms +
The expressivity of GF +
Grammars and parsing +
Grammars as software libraries +
Multilinguality +
Parametrized modules +

+ +

+Abstract. This document gives a general description of the +Grammatical Framework (GF), with comparisons to other grammar +formalisms such as CG, ACG, HPSG, and LFG. +

+ +

GF in a few words

+Grammatical Framework (GF) is a grammar formalism +based on constructive type theory. +

+GF makes a distinction between abstract syntax and concrete syntax. +

+The abstract syntax part of GF is a logical framework, with +dependent types and higher-order functions. +

+The concrete syntax is a system of records containing strings and features. +

+A GF grammar defines a reversible homomorphism from an abstract syntax to a +concrete syntax. +

+A multilingual GF grammar is a set of concrete syntaxes associated with +one abstract syntax. +

+GF grammars are written in a high-level functional programming language, +which is compiled into a core language (GFC). +

+GF grammars can be used as resources, i.e. as libraries for writing +new grammars; these are compiled and optimized by the method of +grammar composition. +

+GF has a module system that supports grammar engineering and separate +compilation. +

+ +

History of GF

+1988. Intuitionistic Categorial Grammar; type theory as abstract syntax, +playing the role of Montague's analysis trees. Grammars implemented in Prolog. +

+1994. Type-Theoretical Grammar. Abstract syntax organized as a system of +combinators. Grammars implemented in ALF. +

+1996. Multilingual Type-Theoretical Grammar. Rules for generating six +languages from the same abstract syntax. Grammars implemented in ALF, ML, and +Haskell. +

+1998. The first implementation of GF as a language of its own. +

+2000. New version of GF: high-level functional source language, records used +for concrete syntax. +

+2003. The module system. +

+2004. Ljunglöf's thesis Expressivity and Complexity of GF. +

+ +

Some key ingredients of GF in other grammar formalisms

[GF ]: Grammatical Framework +
[CG ]: categorial grammar +
[ACG ]: abstract categorial grammar +
[HPSG ]: head-driven phrase structure grammar +
[LFG ]: lexical functional grammar +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

/	GF	ACG	LFG	HPSG	CG
abstract vs concrete syntax	X	X	?	-	-
type theory	X	X	-	-	X
records and features	X	-	X	X	-

+ +

Examples of descriptions in each formalism

+To be written... +

+ +

Lambda terms and records

+In CS, abstract syntax is trees and concrete syntax is strings. +This works more or less for programming languages. +

+In CG, all syntax is lambda terms. +

+In Montague grammar, abstract syntax is lambda terms and +concrete syntax is trees. Abstract syntax as lambda terms +can be considered well-established. +

+In PATR and HPSG, concrete syntax it records. This can be considered +well-established for natural languages. +

+In ACG, both are lambda terms. This is more general than GF, +but reversibility requires linearity restriction, which can be +unnatural for grammar writing. +

+In GF, linearization from lambda terms to records is reversible, +and grammar writing is not restricted to linear terms. +

+Grammar composition in ACG is just function composition. In GF, +it is more restricted... +

+ +

The structure of GF formalisms

+The following diagram (to be drawn properly!) describes the +levels. +

+         |   programming language design
+         V
+    GF source language
+         |
+         |   type-directed partial evaluation
+         V
+    GFC assembly language
+         |
+         |   Ljunglöf's translation
+         V
+    MCFG parser
+

+The last two phases are nontrivial mathematica properties. +

+In most grammar formalisms, grammarians have to work on the GFC +(or MCFG) level. +

+Maybe they use macros - they are therefore like macro assemblers. But there +are no separately compiled library modules, no type checking, etc. +

+ +

The expressivity of GF

+Parsing complexity is the same as MCFG: polynomial, with +unrestricted exponent depending on grammar. +This is between TAG and HPSG. +

+If semantic well-formedness (type theory) is taken into account, +then arbitrary logic can be expressed. The well-formedness of +abstract syntax is decidable, but the well-formedness of a +concrete-syntax string can require an arbitrary proof construction +and is therefore undecidable. +

+Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF +has the goal of assigning intended trees for strings. This is +generalized to shared trees for different languages. +

+The high-level language strives after the properties of +writability and readability (programming language notions). +

+ +

Grammars and parsing

+In many projects, a grammar is just seen as a declarative parsing program. +

+For GF, a grammar is primarily the definition of a language. +

+Detaching grammars from parsers is a good idea, giving +

more efficient and robust parsing (statistical etc) +
cleaner grammars +

+ +

+Separating abstract from concrete syntax is a prerequisite for this: +we want parsers to return abstract syntax objects, and these must exist +independently of parse trees. +

+A possible radical approach to parsing: +use a grammar to generate a treebank and machine-learn +a statistical parser from this. +

+Comparison: Steedman in CCG has done something like this. +

+ +

Grammars as software libraries

+Reuse for different purposes. +

+Grammar composition. +

+ +

Multilinguality

+In application grammars, the AS is a semantic +model, and a CS covers domain terminology and idioms. +

+This can give publication-quality translation on +limited domains (e.g. the WebALT project). +

+Resource grammars with grammar composition lead to +compile-time transfer. +

+When is run-time transfer necessary? +

+Cf. CLE (Core Language Engine). +

+ +

Parametrized modules

+This notion comes from the ML language in the 1980's. +

+It can be used for sharing even more code between languages +than their AS. +

+Especially, for related languages (Scandinavian, Romance). +

+Cf. grammar porting in CLE: what they do with untyped +macro packages GF does with typable interfaces. +

+ + + + diff --git a/deprecated/doc/gf-formalism.txt b/deprecated/doc/gf-formalism.txt new file mode 100644 index 000000000..3b6963d11 --- /dev/null +++ b/deprecated/doc/gf-formalism.txt @@ -0,0 +1,279 @@ +A Birds-Eye View of GF as a Grammar Formalism +Author: Aarne Ranta +Last update: %%date(%c) + +% NOTE: this is a txt2tags file. +% Create an html file from this file using: +% txt2tags -thtml --toc gf-formalism.txt + +%!target:html + +%!postproc(html): #NEW + +[Logos/gf0.png] + +//Abstract. This document gives a general description of the// +//Grammatical Framework (GF), with comparisons to other grammar// +//formalisms such as CG, ACG, HPSG, and LFG.// + + +#NEW + +==Logical Frameworks and Grammar Formalisms== + +Logic - formalization of mathematics (mathematical language?) + +Linguistics - formalization of natural language + +Since math lang is a subset, we can expect similarities. + +But in natural language we have +- masses of empirical data +- no right of reform + + + +#NEW + +==High-level programming== + +We have to write a lot of program code when formalizing language. + +We need a language with proper abstractions. + +Cf. Paul Graham on Prolog: very high-level, but wrong abstractions. + +Typed functional languages work well in maths. + +We have developed one for linguistics +- some extra constructs, e.g. inflection tables +- constraint of reversibility (nontrivial math problem) + + +Writing a grammar of e.g. French clitics should not be a topic +on which one can write a paper - it should be easy to render in code +the known facts about languages! + + + +#NEW + +==GF in a few words== + +Grammatical Framework (GF) is a grammar formalism +based on **constructive type theory**. + +GF makes a distinction between **abstract syntax** and **concrete syntax**. + +The abstract syntax part of GF is a **logical framework**, with +dependent types and higher-order functions. + +The concrete syntax is a system of **records** containing strings and features. + +A GF grammar defines a **reversible homomorphism** from an abstract syntax to a +concrete syntax. + +A **multilingual GF grammar** is a set of concrete syntaxes associated with +one abstract syntax. + +GF grammars are written in a high-level **functional programming language**, +which is compiled into a **core language** (GFC). + +GF grammars can be used as **resources**, i.e. as libraries for writing +new grammars; these are compiled and optimized by the method of +**grammar composition**. + +GF has a **module system** that supports grammar engineering and separate +compilation. + + +#NEW + +==History of GF== + +1988. Intuitionistic Categorial Grammar; type theory as abstract syntax, +playing the role of Montague's analysis trees. Grammars implemented in Prolog. + +1994. Type-Theoretical Grammar. Abstract syntax organized as a system of +combinators. Grammars implemented in ALF. + +1996. Multilingual Type-Theoretical Grammar. Rules for generating six +languages from the same abstract syntax. Grammars implemented in ALF, ML, and +Haskell. + +1998. The first implementation of GF as a language of its own. + +2000. New version of GF: high-level functional source language, records used +for concrete syntax. + +2003. The module system. + +2004. Ljunglöf's thesis //Expressivity and Complexity of GF//. + + + +#NEW + +==Some key ingredients of GF in other grammar formalisms== + +- [GF ]: Grammatical Framework +- [CG ]: categorial grammar +- [ACG ]: abstract categorial grammar +- [HPSG ]: head-driven phrase structure grammar +- [LFG ]: lexical functional grammar + + +| / | GF | ACG | LFG | HPSG | CG | +| abstract vs concrete syntax | X | X | ? | - | - | +| type theory | X | X | - | - | X | +| records and features | X | - | X | X | - | + + +#NEW + +==Examples of descriptions in each formalism== + +To be written... + + +#NEW + +==Lambda terms and records== + +In CS, abstract syntax is trees and concrete syntax is strings. +This works more or less for programming languages. + +In CG, all syntax is lambda terms. + +In Montague grammar, abstract syntax is lambda terms and +concrete syntax is trees. Abstract syntax as lambda terms +can be considered well-established. + +In PATR and HPSG, concrete syntax it records. This can be considered +well-established for natural languages. + +In ACG, both are lambda terms. This is more general than GF, +but reversibility requires linearity restriction, which can be +unnatural for grammar writing. + +In GF, linearization from lambda terms to records is reversible, +and grammar writing is not restricted to linear terms. + +Grammar composition in ACG is just function composition. In GF, +it is more restricted... + + +#NEW + +==The structure of GF formalisms== + +The following diagram (to be drawn properly!) describes the +levels. +``` + | programming language design + V + GF source language + | + | type-directed partial evaluation + V + GFC assembly language + | + | Ljunglöf's translation + V + MCFG parser +``` +The last two phases are nontrivial mathematica properties. + +In most grammar formalisms, grammarians have to work on the GFC +(or MCFG) level. + +Maybe they use macros - they are therefore like macro assemblers. But there +are no separately compiled library modules, no type checking, etc. + + +#NEW + +==The expressivity of GF== + +Parsing complexity is the same as MCFG: polynomial, with +unrestricted exponent depending on grammar. +This is between TAG and HPSG. + +If semantic well-formedness (type theory) is taken into account, +then arbitrary logic can be expressed. The well-formedness of +abstract syntax is decidable, but the well-formedness of a +concrete-syntax string can require an arbitrary proof construction +and is therefore undecidable. + +Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF +has the goal of assigning intended trees for strings. This is +generalized to shared trees for different languages. + +The high-level language strives after the properties of +writability and readability (programming language notions). + + +#NEW + +==Grammars and parsing== + +In many projects, a grammar is just seen as a **declarative parsing program**. + +For GF, a grammar is primarily the **definition of a language**. + +Detaching grammars from parsers is a good idea, giving +- more efficient and robust parsing (statistical etc) +- cleaner grammars + + +Separating abstract from concrete syntax is a prerequisite for this: +we want parsers to return abstract syntax objects, and these must exist +independently of parse trees. + +A possible radical approach to parsing: +use a grammar to generate a treebank and machine-learn +a statistical parser from this. + +Comparison: Steedman in CCG has done something like this. + + +#NEW + +==Grammars as software libraries== + +Reuse for different purposes. + +Grammar composition. + + +#NEW + +==Multilinguality== + +In **application grammars**, the AS is a semantic +model, and a CS covers domain terminology and idioms. + +This can give publication-quality translation on +limited domains (e.g. the WebALT project). + +Resource grammars with grammar composition lead to +**compile-time transfer**. + +When is **run-time transfer** necessary? + +Cf. CLE (Core Language Engine). + + +#NEW + +==Parametrized modules== + +This notion comes from the ML language in the 1980's. + +It can be used for sharing even more code between languages +than their AS. + +Especially, for related languages (Scandinavian, Romance). + +Cf. grammar porting in CLE: what they do with untyped +macro packages GF does with typable interfaces. diff --git a/deprecated/doc/gf-ideas.html b/deprecated/doc/gf-ideas.html new file mode 100644 index 000000000..8119740fa --- /dev/null +++ b/deprecated/doc/gf-ideas.html @@ -0,0 +1,311 @@ + + + + + +GF Project Ideas + + +

+ +

GF Project Ideas

+ +Resource Grammars, Web Applications, etc
+contact: Aarne Ranta (aarne at chalmers dot se) + + +

Resource Grammar Implementations +
- Tasks +
- Who is qualified +
- The Summer School +
+
Other project ideas +
+
Dissemination and intellectual property +

+ +

Resource Grammar Implementations

+GF Resource Grammar Library is an open-source computational grammar resource +that currently covers 12 languages. +The Library is a collaborative effort to which programmers from many countries +have contributed. The next goal is to extend the library +to all of the 23 official EU languages. Also other languages +are welcome all the time. The following diagram show the current status of the +library. Each of the red and yellow ones are a potential project. +

+ +

+red=wanted, green=exists, orange=in-progress, solid=official-eu, dotted=non-eu +

+The linguistic coverage of the library includes the inflectional morphology +and basic syntax of each language. It can be used in GF applications +and also ported to other formats. It can also be used for building other +linguistic resources, such as morphological lexica and parsers. +The library is licensed under LGPL. +

+ +

Tasks

+Writing a grammar for a language is usually easier if other languages +from the same family already have grammars. The colours have the same +meaning as in the diagram above; in addition, we use boldface for the +red, still unimplemented languages and italics for the +orange languages in progress. Thus, in particular, each of the languages +coloured red below are possible programming projects. +

+Baltic: +

Latvian +
Lithuanian +

+ +

+Celtic: +

Irish +

+ +

+Fenno-Ugric: +

Estonian +
Finnish +
Hungarian +

+ +

+Germanic: +

Danish +
Dutch +
English +
German +
Norwegian +
Swedish +

+ +

+Hellenic: +

Greek +

+ +

+Indo-Iranian: +

Hindi +
Urdu +

+ +

+Romance: +

Catalan +
French +
Italian +
Portuguese +
Romanian +
Spanish +

+ +

+Semitic: +

Arabic +
Maltese +

+ +

+Slavonic: +

Bulgarian +
Czech +
Polish +
Russian +
Slovak +
Slovenian +

+ +

+Tai: +

Thai +

+ +

+Turkic: +

Turkish +

+ + +

Who is qualified

+Writing a resource grammar implementation requires good general programming +skills, and a good explicit knowledge of the grammar of the target language. +A typical participant could be +

native or fluent speaker of the target language +
interested in languages on the theoretical level, and preferably familiar + with many languages (to be able to think about them on an abstract level) +
familiar with functional programming languages such as ML or Haskell + (GF itself is a language similar to these) +
on Master's or PhD level in linguistics, computer science, or mathematics +

+ +

+But it is the quality of the assignment that is assessed, not any formal +requirements. The "typical participant" was described to give an idea of +who is likely to succeed in this. +

+ +

The Summer School

+A Summer School on resource grammars and applications will +be organized at the campus of Chalmers University of Technology in Gothenburg, +Sweden, on 17-28 August 2009. It can be seen as a natural checkpoint in +a resource grammar project; the participants are assumed to learn GF before +the Summer School, but how far they have come in their projects may vary. +

+More information on the Summer School web page: +

+http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-summerschool.html +

+ +

Other project ideas

+ +

GF interpreter in Java

+The idea is to write a run-time system for GF grammars in Java. This enables +the use of embedded grammars in Java applications. This project is +a fresh-up of earlier work, +now using the new run-time format PGF and addressing a new parsing algorithm. +

+Requirements: Java, Haskell, basics of compilers and parsing algorithms. +

+ +

GF interpreter in C#

+The idea is to write a run-time system for GF grammars in C#. This enables +the use of embedded grammars in C# applications. This project is +similar to earlier work +on Java, now addressing C# and using the new run-time format PGF. +

+Requirements: C#, Haskell, basics of compilers and parsing algorithms. +

+ +

GF localization library

+This is an idea for a software localization library using GF grammars. +The library should replace strings by grammar rules, which can be conceived +as very smart templates always guaranteeing grammatically correct output. +The library should be based on the +GF Resource Grammar Library, providing infrastructure +currently for 12 languages. +

+Requirements: GF, some natural languages, some localization platform +

+ +

Multilingual grammar applications for mobile phones

+GF grammars can be compiled into programs that can be run on different +platforms, such as web browsers and mobile phones. An example is a +numeral translator running on both these platforms. +

+The proposed project is rather open: find some cool applications of +the technology that are useful or entertaining for mobile phone users. A +part of the project is to investigate implementation issues such as making +the best use of the phone's resources. Possible applications have +something to do with translation; one suggestion is an sms editor/translator. +

+Requirements: GF, JavaScript, some phone application development tools +

+ +

Multilingual grammar applications for the web

+This project is rather open: find some cool applications of +the technology that are useful or entertaining on the web. Examples include +

translators: see demo +
multilingual wikis: see demo +
fridge magnets: see demo +

+ +

+Requirements: GF, JavaScript or Java and Google Web Toolkit, CGI +

+ +

GMail gadget for GF

+It is possible to add custom gadgets to GMail. If you are going to write +e-mail in a foreign language then you probably will need help from +dictonary or you may want to check something in the grammar. GF provides +all resources that you may need but you have to think about how to +design gadget that fits well in the GMail environment and what +functionality from GF you want to expose. +

+Requirements: GF, Google Web Toolkit +

+ +

Dissemination and intellectual property

+All code suggested here will be released under the LGPL just like +the current resource grammars and run-time GF libraries, +with the copyright held by respective authors. +

+As a rule, the code will be distributed via the GF web site. +

+ + + + diff --git a/deprecated/doc/gf-ideas.txt b/deprecated/doc/gf-ideas.txt new file mode 100644 index 000000000..3f62196b9 --- /dev/null +++ b/deprecated/doc/gf-ideas.txt @@ -0,0 +1,231 @@ +GF Project Ideas +Resource Grammars, Web Applications, etc +contact: Aarne Ranta (aarne at chalmers dot se) + +%!Encoding : iso-8859-1 + +%!target:html +%!postproc(html): #BECE +%!postproc(html): #ENCE +%!postproc(html): #GRAY +%!postproc(html): #EGRAY +%!postproc(html): #RED +%!postproc(html): #YELLOW +%!postproc(html): #ERED +%!postproc(html): #EYELLOW + +#BECE +[Logos/gf0.png] +#ENCE + + +==Resource Grammar Implementations== + +GF Resource Grammar Library is an open-source computational grammar resource +that currently covers 12 languages. +The Library is a collaborative effort to which programmers from many countries +have contributed. The next goal is to extend the library +to all of the 23 official EU languages. Also other languages +are welcome all the time. The following diagram show the current status of the +library. Each of the red and yellow ones are a potential project. + +#BECE +[school-langs.png] +#ENCE + + +//red=wanted, green=exists, orange=in-progress, solid=official-eu, dotted=non-eu// + +The linguistic coverage of the library includes the inflectional morphology +and basic syntax of each language. It can be used in GF applications +and also ported to other formats. It can also be used for building other +linguistic resources, such as morphological lexica and parsers. +The library is licensed under LGPL. + + +===Tasks=== + +Writing a grammar for a language is usually easier if other languages +from the same family already have grammars. The colours have the same +meaning as in the diagram above; in addition, we use boldface for the +red, still unimplemented languages and italics for the +orange languages in progress. Thus, in particular, each of the languages +coloured red below are possible programming projects. + +Baltic: +- #RED Latvian #ERED +- #RED Lithuanian #ERED + + +Celtic: +- #RED Irish #ERED + + +Fenno-Ugric: +- #RED Estonian #ERED +- #GRAY Finnish #EGRAY +- #RED Hungarian #ERED + + +Germanic: +- #GRAY Danish #EGRAY +- #RED Dutch #ERED +- #GRAY English #EGRAY +- #GRAY German #EGRAY +- #GRAY Norwegian #EGRAY +- #GRAY Swedish #EGRAY + + +Hellenic: +- #RED Greek #ERED + + +Indo-Iranian: +- #YELLOW Hindi #EYELLOW +- #YELLOW Urdu #EYELLOW + + +Romance: +- #GRAY Catalan #EGRAY +- #GRAY French #EGRAY +- #GRAY Italian #EGRAY +- #RED Portuguese #ERED +- #YELLOW Romanian #EYELLOW +- #GRAY Spanish #EGRAY + + +Semitic: +- #YELLOW Arabic #EYELLOW +- #RED Maltese #ERED + + +Slavonic: +- #GRAY Bulgarian #EGRAY +- #RED Czech #ERED +- #YELLOW Polish #EYELLOW +- #GRAY Russian #EGRAY +- #RED Slovak #ERED +- #RED Slovenian #ERED + + +Tai: +- #YELLOW Thai #EYELLOW + + +Turkic: +- #YELLOW Turkish #EYELLOW + + +===Who is qualified=== + +Writing a resource grammar implementation requires good general programming +skills, and a good explicit knowledge of the grammar of the target language. +A typical participant could be +- native or fluent speaker of the target language +- interested in languages on the theoretical level, and preferably familiar + with many languages (to be able to think about them on an abstract level) +- familiar with functional programming languages such as ML or Haskell + (GF itself is a language similar to these) +- on Master's or PhD level in linguistics, computer science, or mathematics + + +But it is the quality of the assignment that is assessed, not any formal +requirements. The "typical participant" was described to give an idea of +who is likely to succeed in this. + + +===The Summer School=== + +A Summer School on resource grammars and applications will +be organized at the campus of Chalmers University of Technology in Gothenburg, +Sweden, on 17-28 August 2009. It can be seen as a natural checkpoint in +a resource grammar project; the participants are assumed to learn GF before +the Summer School, but how far they have come in their projects may vary. + +More information on the Summer School web page: + +[``http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-summerschool.html`` http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-summerschool.html] + + +==Other project ideas== + +===GF interpreter in Java=== + +The idea is to write a run-time system for GF grammars in Java. This enables +the use of **embedded grammars** in Java applications. This project is +a fresh-up of [earlier work http://www.cs.chalmers.se/~bringert/gf/gf-java.html], +now using the new run-time format PGF and addressing a new parsing algorithm. + +Requirements: Java, Haskell, basics of compilers and parsing algorithms. + + +===GF interpreter in C#=== + +The idea is to write a run-time system for GF grammars in C#. This enables +the use of **embedded grammars** in C# applications. This project is +similar to [earlier work http://www.cs.chalmers.se/~bringert/gf/gf-java.html] +on Java, now addressing C# and using the new run-time format PGF. + +Requirements: C#, Haskell, basics of compilers and parsing algorithms. + + +===GF localization library=== + +This is an idea for a software localization library using GF grammars. +The library should replace strings by grammar rules, which can be conceived +as very smart templates always guaranteeing grammatically correct output. +The library should be based on the +[GF Resource Grammar Library http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/lib/resource/doc/synopsis.html], providing infrastructure +currently for 12 languages. + +Requirements: GF, some natural languages, some localization platform + + +===Multilingual grammar applications for mobile phones=== + +GF grammars can be compiled into programs that can be run on different +platforms, such as web browsers and mobile phones. An example is a +[numeral translator http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/demos/index-numbers.html] running on both these platforms. + +The proposed project is rather open: find some cool applications of +the technology that are useful or entertaining for mobile phone users. A +part of the project is to investigate implementation issues such as making +the best use of the phone's resources. Possible applications have +something to do with translation; one suggestion is an sms editor/translator. + +Requirements: GF, JavaScript, some phone application development tools + + +===Multilingual grammar applications for the web=== + +This project is rather open: find some cool applications of +the technology that are useful or entertaining on the web. Examples include +- translators: see [demo http://129.16.250.57:41296/translate] +- multilingual wikis: see [demo http://csmisc14.cs.chalmers.se/~meza/restWiki/wiki.cgi] +- fridge magnets: see [demo http://129.16.250.57:41296/fridge] + + +Requirements: GF, JavaScript or Java and Google Web Toolkit, CGI + + +===GMail gadget for GF=== + +It is possible to add custom gadgets to GMail. If you are going to write +e-mail in a foreign language then you probably will need help from +dictonary or you may want to check something in the grammar. GF provides +all resources that you may need but you have to think about how to +design gadget that fits well in the GMail environment and what +functionality from GF you want to expose. + +Requirements: GF, Google Web Toolkit + + + +==Dissemination and intellectual property== + +All code suggested here will be released under the LGPL just like +the current resource grammars and run-time GF libraries, +with the copyright held by respective authors. + +As a rule, the code will be distributed via the GF web site. + diff --git a/deprecated/doc/gf-statistics.txt b/deprecated/doc/gf-statistics.txt new file mode 100644 index 000000000..499ad7d09 --- /dev/null +++ b/deprecated/doc/gf-statistics.txt @@ -0,0 +1,289 @@ +(Adapted from KeY statistics by Vladimir Klebanov) + +This is GF right now: + +Total Physical Source Lines of Code (SLOC) = 42,467 + +Development Effort Estimate, Person-Years (Person-Months) = 10.24 (122.932) + (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05)) + +Schedule Estimate, Years (Months) = 1.30 (15.56) + (Basic COCOMO model, Months = 2.5 * (person-months**0.38)) + +Estimated Average Number of Developers (Effort/Schedule) = 7.90 + +Total Estimated Cost to Develop = $ 1,383,870 + (average salary = $56,286/year, overhead = 2.40). + +SLOCCount, Copyright (C) 2001-2004 David A. Wheeler + + + +----------- basis of counting: Haskell code + BNFC code - generated Happy parsers + +-- GF/src% wc -l *.hs GF/*.hs GF/*/*.hs GF/*/*/*.hs GF/*/*.cf JavaGUI/*.java +-- date Fri Jun 3 10:00:31 CEST 2005 + + 104 GF.hs + 402 GF/API.hs + 98 GF/GFModes.hs + 379 GF/Shell.hs + 4 GF/Today.hs + 43 GF/API/BatchTranslate.hs + 145 GF/API/GrammarToHaskell.hs + 77 GF/API/IOGrammar.hs + 25 GF/API/MyParser.hs + 177 GF/Canon/AbsGFC.hs + 37 GF/Canon/ByLine.hs + 192 GF/Canon/CanonToGrammar.hs + 293 GF/Canon/CMacros.hs + 79 GF/Canon/GetGFC.hs + 86 GF/Canon/GFC.hs + 291 GF/Canon/LexGFC.hs + 201 GF/Canon/Look.hs + 235 GF/Canon/MkGFC.hs + 46 GF/Canon/PrExp.hs + 352 GF/Canon/PrintGFC.hs + 147 GF/Canon/Share.hs + 207 GF/Canon/SkelGFC.hs + 46 GF/Canon/TestGFC.hs + 49 GF/Canon/Unlex.hs + 202 GF/CF/CanonToCF.hs + 213 GF/CF/CF.hs + 217 GF/CF/CFIdent.hs + 62 GF/CF/CFtoGrammar.hs + 47 GF/CF/CFtoSRG.hs + 206 GF/CF/ChartParser.hs + 191 GF/CF/EBNF.hs + 45 GF/CFGM/AbsCFG.hs + 312 GF/CFGM/LexCFG.hs + 157 GF/CFGM/PrintCFG.hs + 109 GF/CFGM/PrintCFGrammar.hs + 85 GF/CF/PPrCF.hs + 150 GF/CF/PrLBNF.hs + 106 GF/CF/Profile.hs + 141 GF/Compile/BackOpt.hs + 763 GF/Compile/CheckGrammar.hs + 337 GF/Compile/Compile.hs + 136 GF/Compile/Extend.hs + 124 GF/Compile/GetGrammar.hs + 282 GF/Compile/GrammarToCanon.hs + 93 GF/Compile/MkConcrete.hs + 128 GF/Compile/MkResource.hs + 83 GF/Compile/MkUnion.hs + 146 GF/Compile/ModDeps.hs + 294 GF/Compile/NewRename.hs + 227 GF/Compile/Optimize.hs + 76 GF/Compile/PGrammar.hs + 84 GF/Compile/PrOld.hs + 119 GF/Compile/Rebuild.hs + 63 GF/Compile/RemoveLiT.hs + 274 GF/Compile/Rename.hs + 535 GF/Compile/ShellState.hs + 135 GF/Compile/Update.hs + 129 GF/Conversion/GFC.hs + 149 GF/Conversion/GFCtoSimple.hs + 53 GF/Conversion/MCFGtoCFG.hs + 46 GF/Conversion/RemoveEpsilon.hs + 102 GF/Conversion/RemoveErasing.hs + 82 GF/Conversion/RemoveSingletons.hs + 137 GF/Conversion/SimpleToFinite.hs + 26 GF/Conversion/SimpleToMCFG.hs + 230 GF/Conversion/Types.hs + 143 GF/Data/Assoc.hs + 118 GF/Data/BacktrackM.hs + 20 GF/Data/ErrM.hs + 119 GF/Data/GeneralDeduction.hs + 30 GF/Data/Glue.hs + 67 GF/Data/IncrementalDeduction.hs + 61 GF/Data/Map.hs + 662 GF/Data/Operations.hs + 127 GF/Data/OrdMap2.hs + 120 GF/Data/OrdSet.hs + 193 GF/Data/Parsers.hs + 64 GF/Data/RedBlack.hs + 150 GF/Data/RedBlackSet.hs + 19 GF/Data/SharedString.hs + 127 GF/Data/SortedList.hs + 134 GF/Data/Str.hs + 120 GF/Data/Trie2.hs + 129 GF/Data/Trie.hs + 71 GF/Data/Utilities.hs + 243 GF/Data/Zipper.hs + 78 GF/Embed/EmbedAPI.hs + 113 GF/Embed/EmbedCustom.hs + 137 GF/Embed/EmbedParsing.hs + 50 GF/Formalism/CFG.hs + 51 GF/Formalism/GCFG.hs + 58 GF/Formalism/MCFG.hs + 246 GF/Formalism/SimpleGFC.hs + 349 GF/Formalism/Utilities.hs + 30 GF/Fudgets/ArchEdit.hs + 134 GF/Fudgets/CommandF.hs + 51 GF/Fudgets/EventF.hs + 59 GF/Fudgets/FudgetOps.hs + 37 GF/Fudgets/UnicodeF.hs + 86 GF/Grammar/AbsCompute.hs + 38 GF/Grammar/Abstract.hs + 149 GF/Grammar/AppPredefined.hs + 312 GF/Grammar/Compute.hs + 215 GF/Grammar/Grammar.hs + 46 GF/Grammar/Lockfield.hs + 189 GF/Grammar/LookAbs.hs + 182 GF/Grammar/Lookup.hs + 745 GF/Grammar/Macros.hs + 340 GF/Grammar/MMacros.hs + 115 GF/Grammar/PatternMatch.hs + 279 GF/Grammar/PrGrammar.hs + 121 GF/Grammar/Refresh.hs + 44 GF/Grammar/ReservedWords.hs + 251 GF/Grammar/TC.hs + 301 GF/Grammar/TypeCheck.hs + 96 GF/Grammar/Unify.hs + 101 GF/Grammar/Values.hs + 89 GF/Infra/CheckM.hs + 43 GF/Infra/Comments.hs + 152 GF/Infra/Ident.hs + 390 GF/Infra/Modules.hs + 358 GF/Infra/Option.hs + 179 GF/Infra/Print.hs + 331 GF/Infra/ReadFiles.hs + 337 GF/Infra/UseIO.hs + 153 GF/OldParsing/CFGrammar.hs + 283 GF/OldParsing/ConvertFiniteGFC.hs + 121 GF/OldParsing/ConvertFiniteSimple.hs + 34 GF/OldParsing/ConvertGFCtoMCFG.hs + 122 GF/OldParsing/ConvertGFCtoSimple.hs + 44 GF/OldParsing/ConvertGrammar.hs + 52 GF/OldParsing/ConvertMCFGtoCFG.hs + 30 GF/OldParsing/ConvertSimpleToMCFG.hs + 43 GF/OldParsing/GCFG.hs + 86 GF/OldParsing/GeneralChart.hs + 148 GF/OldParsing/GrammarTypes.hs + 50 GF/OldParsing/IncrementalChart.hs + 206 GF/OldParsing/MCFGrammar.hs + 43 GF/OldParsing/ParseCFG.hs + 82 GF/OldParsing/ParseCF.hs + 177 GF/OldParsing/ParseGFC.hs + 37 GF/OldParsing/ParseMCFG.hs + 161 GF/OldParsing/SimpleGFC.hs + 188 GF/OldParsing/Utilities.hs + 51 GF/Parsing/CFG.hs + 66 GF/Parsing/CF.hs + 151 GF/Parsing/GFC.hs + 64 GF/Parsing/MCFG.hs + 83 GF/Printing/PrintParser.hs + 127 GF/Printing/PrintSimplifiedTerm.hs + 190 GF/Shell/CommandL.hs + 556 GF/Shell/Commands.hs + 524 GF/Shell/HelpFile.hs + 79 GF/Shell/JGF.hs + 171 GF/Shell/PShell.hs + 221 GF/Shell/ShellCommands.hs + 66 GF/Shell/SubShell.hs + 87 GF/Shell/TeachYourself.hs + 296 GF/Source/AbsGF.hs + 229 GF/Source/GrammarToSource.hs + 312 GF/Source/LexGF.hs + 528 GF/Source/PrintGF.hs + 353 GF/Source/SkelGF.hs + 657 GF/Source/SourceToGrammar.hs + 58 GF/Source/TestGF.hs + 72 GF/Speech/PrGSL.hs + 65 GF/Speech/PrJSGF.hs + 128 GF/Speech/SRG.hs + 103 GF/Speech/TransformCFG.hs + 30 GF/System/ArchEdit.hs + 90 GF/System/Arch.hs + 27 GF/System/NoReadline.hs + 27 GF/System/Readline.hs + 73 GF/System/Tracing.hs + 25 GF/System/UseReadline.hs + 63 GF/Text/Arabic.hs + 97 GF/Text/Devanagari.hs + 72 GF/Text/Ethiopic.hs + 99 GF/Text/ExtendedArabic.hs + 37 GF/Text/ExtraDiacritics.hs + 172 GF/Text/Greek.hs + 53 GF/Text/Hebrew.hs + 95 GF/Text/Hiragana.hs + 69 GF/Text/LatinASupplement.hs + 47 GF/Text/OCSCyrillic.hs + 45 GF/Text/Russian.hs + 77 GF/Text/Tamil.hs + 125 GF/Text/Text.hs + 69 GF/Text/Unicode.hs + 47 GF/Text/UTF8.hs + 56 GF/Translate/GFT.hs + 427 GF/UseGrammar/Custom.hs + 435 GF/UseGrammar/Editing.hs + 180 GF/UseGrammar/Generate.hs + 71 GF/UseGrammar/GetTree.hs + 143 GF/UseGrammar/Information.hs + 228 GF/UseGrammar/Linear.hs + 130 GF/UseGrammar/Morphology.hs + 70 GF/UseGrammar/Paraphrases.hs + 157 GF/UseGrammar/Parsing.hs + 66 GF/UseGrammar/Randomized.hs + 170 GF/UseGrammar/Session.hs + 186 GF/UseGrammar/Tokenize.hs + 43 GF/UseGrammar/Transfer.hs + 122 GF/Visualization/NewVisualizationGrammar.hs + 123 GF/Visualization/VisualizeGrammar.hs + 63 GF/Conversion/SimpleToMCFG/Coercions.hs + 256 GF/Conversion/SimpleToMCFG/Nondet.hs + 129 GF/Conversion/SimpleToMCFG/Strict.hs + 71 GF/OldParsing/ConvertGFCtoMCFG/Coercions.hs + 281 GF/OldParsing/ConvertGFCtoMCFG/Nondet.hs + 277 GF/OldParsing/ConvertGFCtoMCFG/Old.hs + 189 GF/OldParsing/ConvertGFCtoMCFG/Strict.hs + 70 GF/OldParsing/ConvertSimpleToMCFG/Coercions.hs + 245 GF/OldParsing/ConvertSimpleToMCFG/Nondet.hs + 277 GF/OldParsing/ConvertSimpleToMCFG/Old.hs + 139 GF/OldParsing/ConvertSimpleToMCFG/Strict.hs + 83 GF/OldParsing/ParseCFG/General.hs + 142 GF/OldParsing/ParseCFG/Incremental.hs + 156 GF/OldParsing/ParseMCFG/Basic.hs + 103 GF/Parsing/CFG/General.hs + 150 GF/Parsing/CFG/Incremental.hs + 98 GF/Parsing/CFG/PInfo.hs + 226 GF/Parsing/MCFG/Active2.hs + 304 GF/Parsing/MCFG/Active.hs + 144 GF/Parsing/MCFG/Incremental2.hs + 163 GF/Parsing/MCFG/Incremental.hs + 128 GF/Parsing/MCFG/Naive.hs + 163 GF/Parsing/MCFG/PInfo.hs + 194 GF/Parsing/MCFG/Range.hs + 183 GF/Parsing/MCFG/ViaCFG.hs + 167 GF/Canon/GFC.cf + 36 GF/CFGM/CFG.cf + 321 GF/Source/GF.cf + 272 JavaGUI/DynamicTree2.java + 272 JavaGUI/DynamicTree.java + 2357 JavaGUI/GFEditor2.java + 1420 JavaGUI/GFEditor.java + 30 JavaGUI/GrammarFilter.java + 13 JavaGUI/LinPosition.java + 18 JavaGUI/MarkedArea.java + 1552 JavaGUI/Numerals.java + 22 JavaGUI/Utils.java + 5956 total + 48713 total + +- 2131 GF/Canon/ParGFC.hs + 3336 GF/Source/ParGF.hs + 779 GF/CFGM/ParCFG.hs + + 42467 total + +-------- + +sloccount sloc = + let + ksloc = sloc / 1000 + effort = 2.4 * (ksloc ** 1.05) + schedule = 2.5 * (effort ** 0.38) + develops = effort / schedule + cost = 56286 * (effort/12) * 2.4 + in + [sloc,ksloc,effort,effort/12,schedule,schedule/12,develops,cost] diff --git a/deprecated/doc/gf-summerschool.txt b/deprecated/doc/gf-summerschool.txt new file mode 100644 index 000000000..0acf9177d --- /dev/null +++ b/deprecated/doc/gf-summerschool.txt @@ -0,0 +1,533 @@ +GF Resource Grammar Summer School +Gothenburg, 17-28 August 2009 +Aarne Ranta (aarne at chalmers.se) + +%!Encoding : iso-8859-1 + +%!target:html +%!postproc(html): #BECE +%!postproc(html): #ENCE +%!postproc(html): #GRAY +%!postproc(html): #EGRAY +%!postproc(html): #RED +%!postproc(html): #YELLOW +%!postproc(html): #ERED + +#BECE +[school-langs.png] +#ENCE + + +//red=wanted, green=exists, orange=in-progress, solid=official-eu, dotted=non-eu// + + +==News== + +An on-line course //GF for Resource Grammar Writers// will start on +Monday 20 April at 15.30 CEST. The slides and recordings of the five +45-minute lectures will be made available via this web page. If requested, +the course may be repeated in the beginning of the summer school. + + +==Executive summary== + +GF Resource Grammar Library is an open-source computational grammar resource +that currently covers 12 languages. +The Summer School is a part of a collaborative effort to extend the library +to all of the 23 official EU languages. Also other languages +chosen by the participants are welcome. + +The missing EU languages are: +Czech, Dutch, Estonian, Greek, Hungarian, Irish, Latvian, Lithuanian, +Maltese, Portuguese, Slovak, and Slovenian. There is also more work to +be done on Polish and Romanian. + +The linguistic coverage of the library includes the inflectional morphology +and basic syntax of each language. It can be used in GF applications +and also ported to other formats. It can also be used for building other +linguistic resources, such as morphological lexica and parsers. +The library is licensed under LGPL. + +In the summer school, each language will be implemented by one or two students +working together. A morphology implementation will be credited +as a Chalmers course worth 7.5 ETCS points; adding a syntax implementation +will be worth more. The estimated total work load is 1-2 months for the +morphology, and 3-6 months for the whole grammar. + +Participation in the course is free. Registration is done via the courses's +Google group, [``groups.google.com/group/gf-resource-school-2009/`` http://groups.google.com/group/gf-resource-school-2009/]. The registration deadline is 15 June 2009. + +Some travel grants will be available. They are distributed on the basis of a +GF programming contest in April and May. + +The summer school will be held on 17-28 August 2009, at the campus of +Chalmers University of Technology in Gothenburg, Sweden. + + +[align6.png] + +//Word alignment produced by GF from the resource grammar in Bulgarian, English, Italian, German, Finnish, French, and Swedish.// + +==Introduction== + +Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this +document. There is a growing need of linguistic resources for these +languages, to help in tasks such as translation and information retrieval. +These resources should be **portable** and **freely accessible**. +Languages marked in red in the diagram are of particular interest for +the summer school, since they are those on which the effort will be concentrated. + +GF (Grammatical Framework, +[``digitalgrammars.com/gf`` http://digitalgrammars.com/gf]) +is a **functional programming language** designed for writing natural +language grammars. It provides an efficient platform for this task, due to +its modern characteristics: +- It is a functional programming language, similar to Haskell and ML. +- It has a static type system and type checker. +- It has a powerful module system supporting separate compilation + and data abstraction. +- It has an optimizing compiler to **Portable Grammar Format** (PGF). +- PGF can be further compiled to other formats, such as JavaScript and + speech recognition language models. +- GF has a **resource grammar library** giving access to the morphology and + basic syntax of 12 languages. + + +In addition to "ordinary" grammars for single languages, GF +supports **multilingual grammars**. A multilingual GF grammar consists of an +**abstract syntax** and a set of **concrete syntaxes**. +An abstract syntax is system of **trees**, serving as a semantic +model or an ontology. A concrete syntax is a mapping from abstract syntax +trees to strings of a particular language. + +These mappings defined in concrete syntax are **reversible**: they +can be used both for **generating** strings from trees, and for +**parsing** strings into trees. Combinations of generation and +parsing can be used for **translation**, where the abstract +syntax works as an **interlingua**. Thus GF has been used as a +framework for building translation systems in several areas +of application and large sets of languages. + + + +==The GF resource grammar library== + +The GF resource grammar library is a set of grammars usable as libraries when +building translation systems and other applications. +The library currently covers +the 9 languages coloured in green in the diagram above; in addition, +Catalan, Norwegian, and Russian are covered, and there is ongoing work on +Arabic, Hindi/Urdu, Polish, Romanian, and Thai. + +The purpose of the resource grammar library is to define the "low-level" structure +of a language: inflection, word order, agreement. This structure belongs to what +linguists call morphology and syntax. It can be very complex and requires +a lot of knowledge. Yet, when translating from one language to +another, knowing morphology and syntax is but a part of what is needed. +The translator (whether human +or machine) must understand the meaning of what is translated, and must also know +the idiomatic way to express the meaning in the target language. This knowledge +can be very domain-dependent and requires in general an expert in the field to +reach high quality: a mathematician in the field of mathematics, a meteorologist +in the field of weather reports, etc. + +The problem is to find a person who is an expert in both the domain of translation +and in the low-level linguistic details. It is the rareness of this combination +that has made it difficult to build interlingua-based translation systems. +The GF resource grammar library has the mission of helping in this task. +It encapsulates the low-level linguistics in program modules +accessed through easy-to-use interfaces. +Experts on different domains can build translation systems by using the library, +without knowing low-level linguistics. The idea is much the same as when a +programmer builds a graphical user interface (GUI) from high-level elements such as +buttons and menus, without having to care about pixels or geometrical forms. + + +===Missing EU languages, by the family=== + +Writing a grammar for a language is usually easier if other languages +from the same family already have grammars. The colours have the same +meaning as in the diagram above. + +Baltic: +#RED Latvian #ERED +#RED Lithuanian #ERED + +Celtic: +#RED Irish #ERED + +Fenno-Ugric: +#RED Estonian #ERED +#GRAY Finnish #EGRAY +#RED Hungarian #ERED + +Germanic: +#GRAY Danish #EGRAY +#RED Dutch #ERED +#GRAY English #EGRAY +#GRAY German #EGRAY +#GRAY Swedish #EGRAY + +Hellenic: +#RED Greek #ERED + +Romance: +#GRAY French #EGRAY +#GRAY Italian #EGRAY +#RED Portuguese #ERED +#YELLOW Romanian #ERED +#GRAY Spanish #EGRAY + +Semitic: +#RED Maltese #ERED + +Slavonic: +#GRAY Bulgarian #EGRAY +#RED Czech #ERED +#YELLOW Polish #ERED +#RED Slovak #ERED +#RED Slovenian #ERED + + + + + + +===Applications of the library=== + +In addition to translation, the library is also useful in **localization**, +that is, porting a piece of software to new languages. +The GF resource grammar library has been used in three major projects that need +interlingua-based translation or localization of systems to new languages: +- in KeY, + [``http://www.key-project.org/`` http://www.key-project.org/], + for writing formal and informal software specifications (3 languages) +- in WebALT, + [``http://webalt.math.helsinki.fi/content/index_eng.html`` http://webalt.math.helsinki.fi/content/index_eng.html], + for translating mathematical exercises to 7 languages +- in TALK [``http://www.talk-project.org`` http://www.talk-project.org], + where the library was used for localizing spoken dialogue systems + to six languages + + +The library is also a generic **linguistic resource**, +which can be used for tasks +such as language teaching and information retrieval. The liberal license (LGPL) +makes it usable for anyone and for any task. GF also has tools supporting the +use of grammars in programs written in other +programming languages: C, C++, Haskell, +Java, JavaScript, and Prolog. In connection with the TALK project, +support has also been +developed for translating GF grammars to language models used in speech +recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF). + + + +===The structure of the library=== + +The library has the following main parts: +- **Inflection paradigms**, covering the inflection of each language. +- **Core Syntax**, covering a large set of syntax rule that + can be implemented for all languages involved. +- **Common Test Lexicon**, giving ca. 500 common words that can be used for + testing the library. +- **Language-Specific Syntax Extensions**, covering syntax rules that are + not implementable for all languages. +- **Language-Specific Lexica**, word lists for each language, with + accurate morphological and syntactic information. + + +The goal of the summer school is to implement, for each language, at least +the first three components. The latter three are more open-ended in character. + + +==The summer school== + +The goal of the summer school is to extend the GF resource grammar library +to covering all 23 EU languages, which means we need 15 new languages. +We also welcome other languages than these 23, +if there are interested participants. + +The amount of work and skill is between a Master's thesis and a PhD thesis. +The Russian implementation was made by Janna Khegai as a part of her +PhD thesis; the thesis contains other material, too. +The Arabic implementation was started by Ali El Dada in his Master's thesis, +but the thesis does not cover the whole API. The realistic amount of work is +somewhere between 3 and 8 person months, +but this is very much language-dependent. +Dutch, for instance, can profit from previous implementations of German and +Scandinavian languages, and will probably require less work. +Latvian and Lithuanian are the first languages of the Baltic family and +will probably require more work. + +In any case, the proposed allocation of work power is 2 participants per +language. They will do 1 months' worth of home work, followed +by 2 weeks of summer school, followed by 4 months work at home. +Who are these participants? + + +===Selecting participants=== + +Persons interested to participate in the Summer School should sign up in +the **Google Group** of the course, + +[``groups.google.com/group/gf-resource-school-2009/`` http://groups.google.com/group/gf-resource-school-2009/] + +The registration deadline is 15 June 2009. + +Notice: you can sign up in the Google +group even if you are not planning to attend the summer school, but are +just interested in the topic. There will be a separate registration to the +school itself later. + +The participants are recommended to learn GF in advance, by self-study from the +[tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html]. +This should take a couple of weeks. An **on-line course** will be +arranged on 20-29 April to help in getting started with GF. + +At the end of the on-line course, a **programming assignment** will be published. +This assignment will test skills required in resource grammar programming. +Work on the assignment will take a couple of weeks. +Those who are interested in getting a travel grant will submit +their sample resource grammar fragment +to the Summer School Committee by 12 May. +The Committee then decides who is given a travel grant of up to 1000 EUR. + +Notice: you can participate in the summer school without following the on-line +course or participating in the contest. These things are required only if you +want a travel grant. If requested by enough many participants, the lectures of +the on-line course will be repeated in the beginning of the summer school. + +The summer school itself is devoted for working on resource grammars. +In addition to grammar writing itself, testing and evaluation is +performed. One way to do this is via adding new languages +to resource grammar applications - in particular, to the WebALT mathematical +exercise translator. + +The resource grammars are expected to be completed by December 2009. They will +be published at GF website and licensed under LGPL. + +The participants are encouraged to contact each other and even work in groups. + + + +===Who is qualified=== + +Writing a resource grammar implementation requires good general programming +skills, and a good explicit knowledge of the grammar of the target language. +A typical participant could be +- native or fluent speaker of the target language +- interested in languages on the theoretical level, and preferably familiar + with many languages (to be able to think about them on an abstract level) +- familiar with functional programming languages such as ML or Haskell + (GF itself is a language similar to these) +- on Master's or PhD level in linguistics, computer science, or mathematics + + +But it is the quality of the assignment that is assessed, not any formal +requirements. The "typical participant" was described to give an idea of +who is likely to succeed in this. + + +===Costs=== + +The summer school is free of charge. + +Some travel grants are given, on the basis of a programming contest, +to cover travel and accommodation costs up to 1000 EUR +per person. + +The number of grants will be decided during Spring 2009, and the grand +holders will be notified before the beginning of June. + +Special terms will apply to students in +[GSLT http://www.gslt.hum.gu.se/] and +[NGSLT http://ngslt.org/]. + + + + + +===Teachers=== + +A list of teachers will be published here later. Some of the local teachers +probably involved are the following: +- Krasimir Angelov +- Robin Cooper +- H�kan Burden +- Markus Forsberg +- Harald Hammarstr�m +- Peter Ljungl�f +- Aarne Ranta + + +More teachers are welcome! If you are interested, please contact us so that +we can discuss your involvement and travel arrangements. + +In addition to teachers, we will look for consultants who can help to assess +the results for each language. Please contact us! + + + +===The Summer School Committee=== + +This committee consists of a number of teachers and informants, +who will select the participants. It will be selected by April 2009. + + +===Time and Place=== + +The summer school will +be organized at the campus of Chalmers University of Technology in Gothenburg, +Sweden, on 17-28 August 2009. + +Time schedule: +- February: announcement of summer school +- 20-29 April: on-line course +- 12 May: submission deadline for assignment work +- 31 May: review of assignments, notifications of acceptance +- 15 June: **registration deadline** +- 17-28 August: Summer School +- September-December: homework on resource grammars +- December: release of the extended Resource Grammar Library + + +===Dissemination and intellectual property=== + +The new resource grammars will be released under the LGPL just like +the current resource grammars, +with the copyright held by respective authors. + +The grammars will be distributed via the GF web site. + + + +==Why I should participate== + +Seven reasons: ++ participation in a pioneering language technology work in an + enthusiastic atmosphere ++ work and fun with people from all over Europe and the world ++ job opportunities and business ideas ++ credits: the school project will be established as a course at Chalmers worth + 7.5 or 15 ETCS points per person, depending on the work accompliched; also + extensions to Master's thesis will be considered (special credit arrangements + for [GSLT http://www.gslt.hum.gu.se/] and [NGSLT http://ngslt.org/]) ++ merits: the resulting grammar can easily lead to a published paper (see below) ++ contribution to the multilingual and multicultural development of Europe and the + world ++ free trip and stay in Gothenburg (for travel grant students) + + +==More information== + +[Course Google Group http://groups.google.com/group/gf-resource-school-2009/] + +[GF web page http://digitalgrammars.com/gf/] + +[GF tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html] + +[GF resource synopsis http://digitalgrammars.com/gf/lib/resource/doc/synopsis.html] + +[Resource-HOWTO document http://digitalgrammars.com/gf/doc/Resource-HOWTO.html] + + +===Contact=== + +H�kan Burden: burden at chalmers se + +Aarne Ranta: aarne at chalmers se + + + +===Selected publications from earlier resource grammar projects=== + +K. Angelov. +Type-Theoretical Bulgarian Grammar. +In B. Nordstr�m and A. Ranta (eds), +//Advances in Natural Language Processing (GoTAL 2008)//, +LNCS/LNAI 5221, Springer, +2008. + +B. Bringert. +//Programming Language Techniques for Natural Language Applications//. +Phd thesis, Computer Science, University of Gothenburg, +2008. + +A. El Dada and A. Ranta. +Implementing an Open Source Arabic Resource Grammar in GF. +In M. Mughazy (ed), +//Perspectives on Arabic Linguistics XX. Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26// +John Benjamins Publishing Company. +2007. + +A. El Dada. +Implementation of the Arabic Numerals and their Syntax in GF. +Computational Approaches to Semitic Languages: Common Issues and Resources, + ACL-2007 Workshop, +June 28, 2007, Prague. +2007. + +H. Hammarstr�m and A. Ranta. +Cardinal Numerals Revisited in GF. +//Workshop on Numerals in the World's Languages//. +Dept. of Linguistics Max Planck Institute for Evolutionary Anthropology, Leipzig, +2004. + +M. Humayoun, H. Hammarstr�m, and A. Ranta. +Urdu Morphology, Orthography and Lexicon Extraction. +//CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages//, +July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University. +2007. + +K. Johannisson. +//Formal and Informal Software Specifications.// +Phd thesis, Computer Science, University of Gothenburg, +2005. + +J. Khegai. +GF parallel resource grammars and Russian. +In proceedings of ACL2006 + (The joint conference of the International Committee on Computational + Linguistics and the Association for Computational Linguistics) (pp. 475-482), + Sydney, Australia, July 2006. + +J. Khegai. +//Language engineering in Grammatical Framework (GF)//. +Phd thesis, Computer Science, Chalmers University of Technology, +2006. + +W. Ng'ang'a. +Multilingual content development for eLearning in Africa. +eLearning Africa: 1st Pan-African Conference on ICT for Development, + Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia. +2006. + +N. Perera and A. Ranta. +Dialogue System Localization with the GF Resource Grammar Library. +//SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing//, +June 29, 2007, Prague. +2007. + +A. Ranta. +Modular Grammar Engineering in GF. +//Research on Language and Computation//, +5:133-158, 2007. + +A. Ranta. +How predictable is Finnish morphology? An experiment on lexicon construction. +In J. Nivre, M. Dahll�f and B. Megyesi (eds), +//Resourceful Language Technology: Festschrift in Honor of Anna S�gvall Hein//, +University of Uppsala, +2008. + +A. Ranta. Grammars as Software Libraries. +To appear in +Y. Bertot, G. Huet, J-J. L�vy, and G. Plotkin (eds.), +//From Semantics to Computer Science//, +Cambridge University Press, Cambridge, 2009. + +A. Ranta and K. Angelov. +Implementing Controlled Languages in GF. +To appear in the proceedings of //CNL 2009//. + diff --git a/deprecated/doc/gf3-release.html b/deprecated/doc/gf3-release.html new file mode 100644 index 000000000..75557c94a --- /dev/null +++ b/deprecated/doc/gf3-release.html @@ -0,0 +1,73 @@ + + + + +GF 3.0 + +

GF 3.0

+ +Krasimir Angelov, Bj�rn Bringert, and Aarne Ranta
+Beta release, 27 June 2008 + + +

+GF Version 3.0 is a major revision of GF. The source language is a superset of the +language in 2.9, which means backward compatibility. But the target languages, the +compiler implementation, and the functionalities (e.g. the shell) have undergone +radical changes. +

New features

+Here is a summary of the main novelties visible to the user: +

Size: the source code and the executable binary size have gone + down to about the half of 2.9. +
Portability: the new back end format PGF (Portable Grammar Format) is + much simpler than the old GFC format, and therefore easier to port to new + platforms. +
Multilingual web page support: as an example of portability, GF 3.0 provides a + compiler from PGF to JavaScript. There are also JavaScript libraries for creating + translators and syntax editors as client-side web applications. +
Incremental parsing: there is a possibility of word completion when + input strings are sent to the parser. +
Application programmer's interfaces: both source-GF and PGF formats, + the shell, and the compiler are accessible via high-level APIs. +
Resource library version 1.4: more coverage, more languages; some of + the new GF language features are exploited. +
Uniform character encoding: UTF8 in generated files, user-definable in + source files +

+ +

Non-supported features

+There are some features of GF 2.9 that will not work in the 3.0 beta release. +

Java Editor GUI: we now see the JavaScript editor as the main form of + syntax editing. +
Pre-module multi-file grammar format: the grammar format of GF before version 2.0 + is still not yet supported. +
Context-free and EBNF input grammar formats. +
Probabilistic GF grammars. +
Some output formats: LBNF. +
Some GF shell commands: while the main ones will be supported with their familiar + syntax and options, some old commands have not been included. The GF shell + command help -changes gives the actual list. +

+ +

+Users who want to have these features are welcome to contact us, +and even more welcome to contribute code that restores them! +

GF language extensions

+Operations for defining patterns. +

+Inheritance of overload groups. +

+ + + + diff --git a/deprecated/doc/gf3-release.txt b/deprecated/doc/gf3-release.txt new file mode 100644 index 000000000..631752c90 --- /dev/null +++ b/deprecated/doc/gf3-release.txt @@ -0,0 +1,58 @@ +GF 3.0 +Krasimir Angelov, Bj�rn Bringert, and Aarne Ranta +Beta release, 27 June 2008 + + +GF Version 3.0 is a major revision of GF. The source language is a superset of the +language in 2.9, which means backward compatibility. But the target languages, the +compiler implementation, and the functionalities (e.g. the shell) have undergone +radical changes. + + +==New features== + +Here is a summary of the main novelties visible to the user: +- **Size**: the source code and the executable binary size have gone + down to about the half of 2.9. +- **Portability**: the new back end format PGF (Portable Grammar Format) is + much simpler than the old GFC format, and therefore easier to port to new + platforms. +- **Multilingual web page support**: as an example of portability, GF 3.0 provides a + compiler from PGF to JavaScript. There are also JavaScript libraries for creating + translators and syntax editors as client-side web applications. +- **Incremental parsing**: there is a possibility of word completion when + input strings are sent to the parser. +- **Application programmer's interfaces**: both source-GF and PGF formats, + the shell, and the compiler are accessible via high-level APIs. +- **Resource library version 1.4**: more coverage, more languages; some of + the new GF language features are exploited. +- **Uniform character encoding**: UTF8 in generated files, user-definable in + source files + + +==Non-supported features== + +There are some features of GF 2.9 that will //not// work in the 3.0 beta release. +- Java Editor GUI: we now see the JavaScript editor as the main form of + syntax editing. +- Pre-module multi-file grammar format: the grammar format of GF before version 2.0 + is still not yet supported. +- Context-free and EBNF input grammar formats. +- Probabilistic GF grammars. +- Some output formats: LBNF. +- Some GF shell commands: while the main ones will be supported with their familiar + syntax and options, some old commands have not been included. The GF shell + command ``help -changes`` gives the actual list. + + +Users who want to have these features are welcome to contact us, +and even more welcome to contribute code that restores them! + + +==GF language extensions== + +Operations for defining patterns. + +Inheritance of overload groups. + + diff --git a/deprecated/doc/school-langs.dot b/deprecated/doc/school-langs.dot new file mode 100644 index 000000000..88e0a9c96 --- /dev/null +++ b/deprecated/doc/school-langs.dot @@ -0,0 +1,106 @@ +graph{ + +size = "8,8" ; + +overlap = scale ; + +"Abs" [label = "Abstract Syntax", style = "solid", shape = "rectangle"] ; + +"1" [label = "Bulgarian", style = "solid", shape = "ellipse", color = "green"] ; +"1" -- "Abs" [style = "solid"]; + +"2" [label = "Czech", style = "solid", shape = "ellipse", color = "red"] ; +"2" -- "Abs" [style = "solid"]; + +"3" [label = "Danish", style = "solid", shape = "ellipse", color = "green"] ; +"3" -- "Abs" [style = "solid"]; + +"4" [label = "German", style = "solid", shape = "ellipse", color = "green"] ; +"4" -- "Abs" [style = "solid"]; + +"5" [label = "Estonian", style = "solid", shape = "ellipse", color = "red"] ; +"5" -- "Abs" [style = "solid"]; + +"6" [label = "Greek", style = "solid", shape = "ellipse", color = "red"] ; +"6" -- "Abs" [style = "solid"]; + +"7" [label = "English", style = "solid", shape = "ellipse", color = "green"] ; +"7" -- "Abs" [style = "solid"]; + +"8" [label = "Spanish", style = "solid", shape = "ellipse", color = "green"] ; +"8" -- "Abs" [style = "solid"]; + +"9" [label = "French", style = "solid", shape = "ellipse", color = "green"] ; +"9" -- "Abs" [style = "solid"]; + +"10" [label = "Italian", style = "solid", shape = "ellipse", color = "green"] ; +"10" -- "Abs" [style = "solid"]; + +"11" [label = "Latvian", style = "solid", shape = "ellipse", color = "red"] ; +"11" -- "Abs" [style = "solid"]; + +"12" [label = "Lithuanian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "12" [style = "solid"]; + +"13" [label = "Irish", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "13" [style = "solid"]; + +"14" [label = "Hungarian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "14" [style = "solid"]; + +"15" [label = "Maltese", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "15" [style = "solid"]; + +"16" [label = "Dutch", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "16" [style = "solid"]; + +"17" [label = "Polish", style = "solid", shape = "ellipse", color = "orange"] ; +"Abs" -- "17" [style = "solid"]; + +"18" [label = "Portuguese", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "18" [style = "solid"]; + +"19" [label = "Slovak", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "19" [style = "solid"]; + +"20" [label = "Slovene", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "20" [style = "solid"]; + +"21" [label = "Romanian", style = "solid", shape = "ellipse", color = "orange"] ; +"Abs" -- "21" [style = "solid"]; + +"22" [label = "Finnish", style = "solid", shape = "ellipse", color = "green"] ; +"Abs" -- "22" [style = "solid"]; + +"23" [label = "Swedish", style = "solid", shape = "ellipse", color = "green"] ; +"Abs" -- "23" [style = "solid"]; + +"24" [label = "Catalan", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "24" [style = "solid"]; + +"25" [label = "Norwegian", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "25" [style = "solid"]; + +"26" [label = "Russian", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "26" [style = "solid"]; + +"27" [label = "Interlingua", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "27" [style = "solid"]; + +"28" [label = "Latin", style = "dotted", shape = "ellipse", color = "orange"] ; +"Abs" -- "28" [style = "solid"]; +"29" [label = "Turkish", style = "dotted", shape = "ellipse", color = "orange"] ; +"Abs" -- "29" [style = "solid"]; +"30" [label = "Hindi", style = "dotted", shape = "ellipse", color = "orange"] ; +"Abs" -- "30" [style = "solid"]; +"31" [label = "Thai", style = "dotted", shape = "ellipse", color = "orange"] ; +"Abs" -- "31" [style = "solid"]; +"32" [label = "Urdu", style = "dotted", shape = "ellipse", color = "orange"] ; +"Abs" -- "32" [style = "solid"]; +"33" [label = "Telugu", style = "dotted", shape = "ellipse", color = "red"] ; +"Abs" -- "33" [style = "solid"]; +"34" [label = "Arabic", style = "dotted", shape = "ellipse", color = "orange"] ; +"Abs" -- "34" [style = "solid"]; + + +} diff --git a/deprecated/doc/school-langs.png b/deprecated/doc/school-langs.png new file mode 100644 index 000000000..7230e0bff Binary files /dev/null and b/deprecated/doc/school-langs.png differ diff --git a/deprecated/doc/summer-align.png b/deprecated/doc/summer-align.png new file mode 100644 index 000000000..796754408 Binary files /dev/null and b/deprecated/doc/summer-align.png differ diff --git a/deprecated/doc/summer-langs.png b/deprecated/doc/summer-langs.png new file mode 100644 index 000000000..729af722a Binary files /dev/null and b/deprecated/doc/summer-langs.png differ diff --git a/deprecated/doc/vr.html b/deprecated/doc/vr.html new file mode 100644 index 000000000..e5dee1885 --- /dev/null +++ b/deprecated/doc/vr.html @@ -0,0 +1,46 @@ + + + + +Library-Based Grammar Engineering + +

Library-Based Grammar Engineering

+ +VR Project 2006-2008
+ + +

Staff

+Lars Borin (co-leader) +

+Robin Cooper (co-leader) +

+Aarne Ranta (project responsible) +

+Sibylle Schupp (co-leader) +

Publications

+Ali El Dada, MSc Thesis +

+Muhammad Humayoun, MSc Thesis +

+Janna Khegai, +Language Engineering in GF, PhD Thesis, Chalmers. 2006. +

Links

+GF +

+Functional Morphology +

+ + + + diff --git a/deprecated/doc/vr.txt b/deprecated/doc/vr.txt new file mode 100644 index 000000000..9b5045978 --- /dev/null +++ b/deprecated/doc/vr.txt @@ -0,0 +1,32 @@ +Library-Based Grammar Engineering +VR Project 2006-2008 + + +=Staff= + +Lars Borin (co-leader) + +Robin Cooper (co-leader) + +Aarne Ranta (project responsible) + +Sibylle Schupp (co-leader) + + + +=Publications= + +Ali El Dada, MSc Thesis + +Muhammad Humayoun, MSc Thesis + +Janna Khegai, +Language Engineering in GF, PhD Thesis, Chalmers. 2006. + + + +=Links= + +[GF http://www.cs.chalmers.se/~aarne/GF/] + +[Functional Morphology http://www.cs.chalmers.se/~markus/FM/] -- cgit v1.2.3