From 0a2e24493ea62a6e2432fb8bd1396c01ecedf222 Mon Sep 17 00:00:00 2001 From: aarne Date: Wed, 29 Aug 2007 09:00:50 +0000 Subject: appendices to the book --- doc/tutorial/gf-book.txt | 3966 ++++++++++++++++++++++++++-------------------- 1 file changed, 2208 insertions(+), 1758 deletions(-) (limited to 'doc/tutorial/gf-book.txt') diff --git a/doc/tutorial/gf-book.txt b/doc/tutorial/gf-book.txt index 56141bece..738e57d0e 100644 --- a/doc/tutorial/gf-book.txt +++ b/doc/tutorial/gf-book.txt @@ -1,4 +1,4 @@ -Grammatical Framework: Tutorial, Advanced Applications, and Reference Manual +Grammatical Framework: Tutorial, Applications, and Reference Manual Author: Aarne Ranta aarne (at) cs.chalmers.se Last update: %%date(%c) @@ -28,9 +28,16 @@ Last update: %%date(%c) %!postproc(tex): #PARTone "part{Tutorial}" -%!postproc(tex): #PARTtwo "part{Advanced Applications}" +%!postproc(tex): #PARTtwo "part{Advanced Grammars and Applications}" %!postproc(tex): #PARTthree "part{Reference Manual}" +%!postproc(tex): #PARTbnf "include{DocGF}" +%!postproc(tex): #PARTquickref "chapter{Quick Reference}" +%!postproc(tex): #twocolumn "twocolumn" +%!postproc(tex): #smallsize "tiny" +%!postproc(tex): #startappendix "appendix" + + #LOGOPNG @@ -347,7 +354,7 @@ is given in the libraries. -==Who is the tutorial for== +==Who should read this tutorial== The tutorial part of this book is mainly for programmers who want to learn to write application grammars. @@ -357,7 +364,7 @@ linguistics, functional programming, and type theory. This knowledge will be introduced as a part of grammar writing practice. -Thus the book should be accessible to anyone who has some +Thus the tutorial should be accessible to anyone who has some previous programming experience from any programming language; the basics of using computers are also presupposed, e.g. the use of text editors and the management of files. @@ -1638,35 +1645,6 @@ same time: -===System commands=== - -To document your grammar, you may want to print the -graph into a file, e.g. a ``.png`` file that -can be included in an HTML document. You can do this -by first printing the graph into a file ``.dot`` and then -processing this file with the ``dot`` program (from the Graphviz package). -``` - > pm -printer=graph | wf Foodmarket.dot - > ! dot -Tpng Foodmarket.dot > Foodmarket.png -``` -The latter command is a Unix command, issued from GF by using the -shell escape symbol ``!``. The resulting graph was shown in the previous section. - -The command ``print_multi = pm`` is used for printing the current multilingual -grammar in various formats, of which the format ``-printer=graph`` just -shows the module dependencies. Use ``help`` to see what other formats -are available: -``` - > help pm - > help -printer - > help help -``` -Another form of system commands are those usable in GF pipes. The escape symbol -is then ``?``. -``` - > generate_trees | ? wc -``` - ===Division of labour=== @@ -1679,12 +1657,14 @@ available through resource grammar modules, whose users only need to pick the right operations and not to know their implementation details. -In the following sections, we will go through some +In the following Chapter, we will go through some such linguistic details. The programming constructs needed when doing this are useful for all GF programmers, even for those who don't hand-code the linguistics of their applications but get them -from libraries. And it is quite interesting to know something about the -linguistic concepts of inflection, agreement, and parts of speech. +from libraries. And it can be generally interesting to learn something about the +linguistic concepts of inflection, agreement, and parts of speech, in the +form of precise computer-executable code. + ==Summary of GF language features== @@ -2705,2182 +2685,2181 @@ now aiming for complete grammatical correctness by the use of parameters. -=Implementing morphology and syntax= +=Using the resource grammar library= -In this chapter, we will dig deeper into linguistic concepts than -so far. We will build an implementation of a linguistic motivated -fragment of English and Italian, covering basic morphology and syntax. -The result is a miniature of the GF resource library, which will -be covered in the next chapter. There are two main purposes -for this chapter: -- to understand the linguistic concepts underlying the resource - grammar library -- to get practice in the more advanced constructs of concrete syntax +In this chapter, we will take a look at the GF resource grammar library. +We will use the library to implement a slightly extended ``Food`` grammar +and port it to some new languages. +**Exercise**. Define the mini resource of the previous chapter by +using a functor over the full resource. -However, the reader who is not willing to work on an advanced level -of concrete syntax may just skim through the introductory parts of -each section, thus using the chapter in its first purpose only. +==The coverage of the library== -==Lexical vs. syntactic rules== +The GF Resource Grammar Library contains grammar rules for +10 languages (in addition, 2 languages are available as incomplete +implementations, and a few more are under construction). Its purpose +is to make these rules available for application programmers, +who can thereby concentrate on the semantic and stylistic +aspects of their grammars, without having to think about +grammaticality. The targeted level of application grammarians +is that of a skilled programmer with +a practical knowledge of the target languages, but without +theoretical knowledge about their grammars. +Such a combination of +skills is typical of programmers who, for instance, want to localize +software to new languages. -So far we have seen a grammar from a semantic point of view: -a grammar specifies a system of meanings (specified in the abstract syntax) and -tells how they are expressed in some language (as specified in a concrete syntax). -In resource grammars, as in linguistic tradition, the goal is to -specify the **grammatically correct combinations of words**, whatever their -meanings are. +The current resource languages are +- ``Ara``bic (incomplete) +- ``Cat``alan (incomplete) +- ``Dan``ish +- ``Eng``lish +- ``Fin``nish +- ``Fre``nch +- ``Ger``man +- ``Ita``lian +- ``Nor``wegian +- ``Rus``sian +- ``Spa``nish +- ``Swe``dish -Thus the grammar has two kinds of categories and two kinds of rules: -- lexical: - - lexical categories, to classify words - - lexical rules, to define words their properties +The first three letters (``Eng`` etc) are used in grammar module names. +The incomplete Arabic and Catalan implementations are +enough to be used in many applications; they both contain, amoung other +things, complete inflectional morphology. -- phrasal (combinatorial, syntactic): - - phrasal categories, to classify phrases of arbitrary size - - phrasal rules, to combine phrases into larger phrases +==The resource API== -Many grammar formalisms force a radical distinction between the lexical and syntactic -components; sometimes it is not even possible to express the two kinds of rules in -the same formalism. GF has no such restrictions. Nevertheless, it has turned out -to be a good discipline to maintain a distinction between the lexical and syntactic -components. +The resource library API is devided into language-specific +and language-independent parts. To put it roughly, +- the syntax API is language-independent, i.e. has the same types and functions for all + languages. + Its name is ``Syntax``//L// for each language //L// +- the morphology API is language-specific, i.e. has partly different types and functions + for different languages. + Its name is ``Paradigms``//L// for each language //L// +A full documentation of the API is available on-line in the +[resource synopsis ../../lib/resource-1.0/synopsis.html]. For our +examples, we will only need a fragment of the full API. -==The abstract syntax== +In the first examples, +we will make use of the following categories, from the module ``Syntax``. -Let us go through the abstract syntax contained in the module ``Syntax``. -It can be found in the file -[``examples/tutorial/syntax/Syntax.gf`` examples/tutorial/syntax/Syntax.gf]. +|| Category | Explanation | Example || +| ``Utt`` | sentence, question, word... | "be quiet" | +| ``Adv`` | verb-phrase-modifying adverb, | "in the house" | +| ``AdA`` | adjective-modifying adverb, | "very" | +| ``S`` | declarative sentence | "she lived here" | +| ``Cl`` | declarative clause, with all tenses | "she looks at this" | +| ``AP`` | adjectival phrase | "very warm" | +| ``CN`` | common noun (without determiner) | "red house" | +| ``NP`` | noun phrase (subject or object) | "the red house" | +| ``Det`` | determiner phrase | "those seven" | +| ``Predet`` | predeterminer | "only" | +| ``Quant`` | quantifier with both sg and pl | "this/these" | +| ``Prep`` | preposition, or just case | "in" | +| ``A`` | one-place adjective | "warm" | +| ``N`` | common noun | "house" | -===Lexical categories=== +We will need the following syntax rules from ``Syntax``. -Words are classified into two kinds of categories: **closed** and -**open**. The definining property of closed categories is that the -words of them can easily be enumerated; it is very seldom that any -new words are introduced in them. In general, closed categories -contain **structural words**, also known as **function words**. -In ``Syntax``, we have just two closed lexical categories: -``` - cat - Det ; -- determiner e.g. "this" - AdA ; -- adadjective e.g. "very" -``` -We have already used words of both categories in the ``Food`` -examples; they have just not been assigned a category, but -treated as **syncategorematic**. In GF, a syncategoramatic -word is one that is introduced in a linearization rule of -some construction alongside with some other expressions that -are combined; there is no abstract syntax tree for that word -alone. Thus in the rules -``` - fun That : Kind -> Item ; - lin That k = {"that" ++ k.s} ; -``` -the word //that// is syncategoramatic. In linguistically motivated -grammars, syncategorematic words are usually avoided, whereas in -semantically motivated grammars, structural words are often treated -as syncategoramatic. This is partly so because the concept expressed -by a structural word in one language is often expressed by some other -means than an individual word in another. For instance, the definite -article //the// is a determiner word in English, whereas Swedish expresses -determination by inflecting the determined noun: //the wine// is //vinet// -in Swedish. +|| Function | Type | Example || +| ``mkUtt`` | ``S -> Utt`` | //John walked// | +| ``mkUtt`` | ``Cl -> Utt`` | //John walks// | +| ``mkCl`` | ``NP -> AP -> Cl`` | //John is very old// | +| ``mkNP`` | ``Det -> CN -> NP`` | //the first old man// | +| ``mkNP`` | ``Predet -> NP -> NP`` | //only John// | +| ``mkDet`` | ``Quant -> Det`` | //this// | +| ``mkCN`` | ``N -> CN`` | //house// | +| ``mkCN`` | ``AP -> CN -> CN`` | //very big blue house// | +| ``mkAP`` | ``A -> AP`` | //old// | +| ``mkAP`` | ``AdA -> AP -> AP`` | //very very old// | -As for open classes, we will use four: -``` - cat - N ; -- noun e.g. "pizza" - A ; -- adjective e.g. "good" - V ; -- intransitive verb e.g. "boil" - V2 ; -- two-place verb e.g. "eat" -``` -Two-place verbs differ from intransitive verbs syntactically by -taking an object. In the lexicon, they must be equipped with information -on the //case// of the object in some languages (such as German and Latin), -and on the //preposition// in some languages (such as English). +We will also need the following structural words from ``Syntax``. +|| Function | Type | Example || +| ``all_Predet`` | ``Predet`` | //all// | +| ``defPlDet`` | ``Det`` | //the (houses)// | +| ``this_Quant`` | ``Quant`` | //this// | +| ``very_AdA`` | ``AdA`` | //very// | -===Lexical rules=== +For French, we will use the following part of ``ParadigmsFre``. -The words of closed categories can be listed once and for all in a -library. The ``Syntax`` module has the following: -``` - fun - this_Det, that_Det, these_Det, those_Det, - every_Det, theSg_Det, thePl_Det, indef_Det, plur_Det, two_Det : Det ; - very_AdA : AdA ; -``` -The naming convention for lexical rules is that we use a word followed by -the category. In this way we can for instance distinguish the determiner -//that// from the conjunction //that//. But there are also rules where this -does not quite suffice. English has no distinction between singular and -plural //the//; yet they behave differently as determiners, analogously to -//this// vs. //these//. The function //indef_Det// is the indefinite article -//a//, whereas //plur_Det// is semantically the plural indefinite article, -which has no separate word in English, as in some other languages, e.g. -//des// in French. +|| Function | Type || +| ``Gender`` | ``Type`` | +| ``masculine`` | ``Gender`` | +| ``feminine`` | ``Gender`` | +| ``mkN`` | ``(cheval : Str) -> N`` | +| ``mkN`` | ``(foie : Str) -> Gender -> N`` | +| ``mkA`` | ``(cher : Str) -> A`` | +| ``mkA`` | ``(sec,seche : Str) -> A`` | -Open lexical categories have no objects in ``Syntax``. However, we can -build lexical modules as extensions of ``Syntax``. An example is -[``examples/tutorial/syntax/Test.gf`` examples/tutorial/syntax/Test.gf], -which we use to test the syntax. Its vocabulary is from the food domain: -``` - abstract Test = Syntax ** { - fun - wine_N, cheese_N, fish_N, pizza_N, waiter_N, customer_N : N ; - fresh_A, warm_A, italian_A, expensive_A, delicious_A, boring_A : A ; - stink_V : V ; - eat_V2, love_V2, talk_V2 : V2 ; - } -``` -===Phrasal categories=== +For German, we will use the following part of ``ParadigmsGer``. -The topmost category in ``Syntax`` is ``Phr``, **phrase**, covering -all complete sentences, which have a punctuation mark and could be -used alone to make an utterance. In addition to **declarative sentences** -``S``, there are also **question sentences** ``QS``: -``` - cat - Phr ; -- any complete sentence e.g. "Is this pizza good?" - S ; -- declarative sentence e.g. "this pizza is good" - QS ; -- question sentence e.g. "is this pizza good" -``` -The main parts of a sentence are usually taken to be the **noun phrase** ``NP`` and -the **verb phrase** ``VP``. In analogy to noun phrases, we consider -**interrogative phrases**, which are used for forming question sentences. -``` - NP ; -- noun phrase e.g. "this pizza" - IP ; -- interrogative phrase e.g "which pizza" - VP ; -- verb phrase e.g. "is good" -``` -The "smallest" phrasal categories are **common nouns** ``CN`` and -**adjectival phrases** ``AP``: +|| Function | Type || +| ``Gender`` | ``Type`` | +| ``masculine`` | ``Gender`` | +| ``feminine`` | ``Gender`` | +| ``neuter`` | ``Gender`` | +| ``mkN`` | ``(Stufe : Str) -> N`` | +| ``mkN`` | ``(Bild,Bilder : Str) -> Gender -> N`` | +| ``mkA`` | ``(klein : Str) -> A`` | +| ``mkA`` | ``(gut,besser,beste : Str) -> A`` | + + +**Exercise**. Try out the morphological paradigms in different languages. Do +in this way: ``` - CN ; -- common noun phrase e.g. "very good pizza" - AP ; -- adjectival phrase e.g. "very good" + > i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr + > cc mkN "Farbe" + > cc mkA "gut" "besser" "beste" ``` -Common nouns are typically combined with determiners to build noun -phrases, whereas adjectival phrases are combined with the copula to -form verb phrases. -===Phrasal rules=== +==Example: French== -Phrasal rules specify how complex phrases are built from simpler ones. -At the bottom, there are **lexical insertion rules** telling how -words from each lexical category are "promoted" to phrases; i.e. how -the most elementary phrases are built. +We start with an abstract syntax that is like ``Food`` before, but +has a plural determiner (//all wines//) and some new nouns that will +need different genders in most languages. ``` + abstract Food = { + cat + S ; Item ; Kind ; Quality ; fun - UseN : N -> CN ; -- pizza - UseA : A -> AP ; -- be good - UseV : V -> VP ; -- stink + Is : Item -> Quality -> S ; + This, All : Kind -> Item ; + QKind : Quality -> Kind -> Kind ; + Wine, Cheese, Fish, Beer, Pizza : Kind ; + Very : Quality -> Quality ; + Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; + } +``` +The French implementation opens ``SyntaxFre`` and ``ParadigmsFre`` +to get access to the resource libraries needed. In order to find +the libraries, a ``path`` directive is prepended; it is interpreted +relative to the environment variable ``GF_LIB_PATH``. ``` -Structural words usually don't form phrases themselves; thus they -are at the first place used for promoting "lower" phrase categories -to "higher" ones, + --# -path=.:present:prelude + + concrete FoodFre of Food = open SyntaxFre,ParadigmsFre in { + lincat + S = Utt ; + Item = NP ; + Kind = CN ; + Quality = AP ; + lin + Is item quality = mkUtt (mkCl item quality) ; + This kind = mkNP (mkDet this_Quant) kind ; + All kind = mkNP all_Predet (mkNP defPlDet kind) ; + QKind quality kind = mkCN quality kind ; + Wine = mkCN (mkN "vin") ; + Beer = mkCN (mkN "bière") ; + Pizza = mkCN (mkN "pizza" feminine) ; + Cheese = mkCN (mkN "fromage" masculine) ; + Fish = mkCN (mkN "poisson") ; + Very quality = mkAP very_AdA quality ; + Fresh = mkAP (mkA "frais" "fraîche") ; + Warm = mkAP (mkA "chaud") ; + Italian = mkAP (mkA "italien") ; + Expensive = mkAP (mkA "cher") ; + Delicious = mkAP (mkA "délicieux") ; + Boring = mkAP (mkA "ennuyeux") ; + } ``` - DetCN : Det -> CN -> NP ; -- this pizza +The ``lincat`` definitions in ``FoodFre`` assign **resource categories** +to **application categories**. In a sense, the application categories +are **semantic**, as they correspond to concepts in the grammar application, +whereas the resource categories are **syntactic**: they give the linguistic +means to express concepts in any application. + +The ``lin`` definitions likewise assign resource functions to application +functions. Under the hood, there is a lot of matching with parameters to +take care of word order, inflection, and agreement. But the user of the +library sees nothing of this: the only parameters you need to give are +the genders of some nouns, which cannot be correctly inferred from the word. + +In French, for example, the one-argument ``mkN`` assigns the noun the feminine +gender if and only if it ends with an //e//. Therefore the words //fromage// and +//pizza// are given genders manually. +One can of course always give genders manually, to be on the safe side. + +As for inflection, the one-argument adjective pattern ``mkA`` takes care of +completely regular adjective such as //chaud-chaude//, but also of special +cases such as //italien-italienne//, //cher-chère//, and //délicieux-délicieuse//. +But it cannot form //frais-fraîche// properly. Once again, you can give more +forms to be on the safe side. You can also test the paradigms in the GF +system. + +**Exercise**. Compile the grammar ``FoodFre`` and generate and parse some sentences. + +**Exercise**. Write a concrete syntax of ``Food`` for English or some other language +included in the resource library. You can also compare the output with the hand-written +grammars presented earlier in this tutorial. + +**Exercise**. In particular, try to write a concrete syntax for Italian, even if +you don't know Italian. What you need to know is that "beer" is //birra// and +"pizza" is //pizza//, and that all the nouns and adjectives in the grammar +are regular. + + + +==Functor implementation of multilingual grammars== + +If you did the exercise of writing a concrete syntax of ``Food`` for some other +language, you probably noticed that much of the code looks exactly the same +as for French. The immediate reason for this is that the ``Syntax`` API is the +same for all languages; the deeper reason is that all languages (at least those +in the resource package) implement the same syntactic structures and tend to use them +in similar ways. Thus it is only the lexical parts of a concrete syntax that +you need to write anew for a new language. In brief, +- first copy the concrete syntax for one language +- then change the words (the strings and perhaps some paradigms) + + +But programming by copy-and-paste is not worthy of a functional programmer. +Can we write a function that takes care of the shared parts of grammar modules? +Yes, we can. It is not a function in the ``fun`` or ``oper`` sense, but +a function operating on modules, called a **functor**. This construct +is familiar from the functional languages ML and OCaml, but it does not +exist in Haskell. It also bears some resemblance to templates in C++. +Functors are also known as **parametrized modules**. + +In GF, a functor is a module that ``open``s one or more **interfaces**. +An ``interface`` is a module similar to a ``resource``, but it only +contains the types of ``oper``s, not their definitions. You can think +of an interface as a kind of a record type. Thus a functor is a kind +of a function taking records as arguments and producins a module +as value. + +Let us look at a functor implementation of the ``Food`` grammar. +Consider its module header first: ``` -or for recursively building more complex phrases: + incomplete concrete FoodI of Food = open Syntax, LexFood in ``` - AdAP : AdA -> AP -> AP ; -- very good +In the functor-function analogy, ``FoodI`` would be presented as a function +with the following type signature: ``` -In analogy to ``DetCN``, we could have a rule forming interrogative -noun phrases with interogative determiners such as //which//. In -``Syntax``, we however make a shortcut and just treat //which// -syncategorematically: + FoodI : instance of Syntax -> instance of LexFood -> concrete of Food ``` - WhichCN : CN -> IP ; +It takes as arguments two interfaces: +- ``Syntax``, the resource grammar interface +- ``LexFood``, the domain-specific lexicon interface + + +Functors opening ``Syntax`` and a domain lexicon interface are in fact +so typical in GF applications, that this structure could be called +a **design patter** +for GF grammars. The idea in this pattern is, again, that +the languages use the same syntactic structures but different words. + +Before going to the details of the module bodies, let us look at how functors +are concretely used. An interface has a header such as ``` -Starting from the top of the grammar, we need two rules promoting -sentences and questions into complete phrases: + interface LexFood = open Syntax in ``` - PhrS : S -> Phr ; -- This pizza is good. - PhrQS : QS -> Phr ; -- Is this pizza good? +To give an ``instance`` of it means that all ``oper``s are given definitione (of +appropriate types). For example, ``` -The most central rule in most grammars is the **predication rule**, -which combines a noun -phrase and a verb phrase into a sentence. In the present grammar, -though not in the full resource grammar library, we split this -rule into two: one for positive and one for negated sentences: + instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in ``` - PosVP, NegVP : NP -> VP -> S ; -- this pizza is/isn't good +Notice that when an interface opens an interface, such as ``Syntax``, +then its instance +opens an instance of it. But the instance may also open some other +resources - typically, +a domain lexicon instance opens a ``Paradigms`` module. + +In the function-functor analogy, we now have ``` -In the same way, question sentences can be formed with these two -**polarities**: + SyntaxGer : instance of Syntax + LexFoodGer : instance of LexFood ``` - QPosVP, QNegVP : NP -> VP -> QS ; -- is/isn't this pizza good +Thus we can complete the German implementation by "applying" the functor: ``` -Another form of questions are ones with interrogative noun phrases: + FoodI SyntaxGer LexFoodGer : concrete of Food ``` - IPPosVP, IPNegVP : IP -> VP -> QS ; -- which pizza is/isn't good +The GF syntax for doing so is ``` -Verb phrases can be built by **complementation**, where a two-place -verb needs a noun phrase complement, and the (syncategoriematic) copula -can take an adjectival phrase as complement: + concrete FoodGer of Food = FoodI with + (Syntax = SyntaxGer), + (LexFood = LexFoodGer) ; ``` - ComplV2 : V2 -> NP -> VP ; -- eat this pizza - ComplAP : AP -> VP ; -- be good +Notice that this is the //complete// module, not just a header of it. +The module body is received from ``FoodI``, by instantiating the +interface constants with their definitions given in the German +instances. + +A module of this form, characterized by the keyword ``with``, is +called a **functor instantiation**. + +Here is the complete code for the functor ``FoodI``: ``` -**Adjectival modification** is a recursive rule for forming common nouns: + incomplete concrete FoodI of Food = open Syntax, LexFood in { + lincat + S = Utt ; + Item = NP ; + Kind = CN ; + Quality = AP ; + lin + Is item quality = mkUtt (mkCl item quality) ; + This kind = mkNP (mkDet this_Quant) kind ; + All kind = mkNP all_Predet (mkNP defPlDet kind) ; + QKind quality kind = mkCN quality kind ; + Wine = mkCN wine_N ; + Beer = mkCN beer_N ; + Pizza = mkCN pizza_N ; + Cheese = mkCN cheese_N ; + Fish = mkCN fish_N ; + Very quality = mkAP very_AdA quality ; + Fresh = mkAP fresh_A ; + Warm = mkAP warm_A ; + Italian = mkAP italian_A ; + Expensive = mkAP expensive_A ; + Delicious = mkAP delicious_A ; + Boring = mkAP boring_A ; +} ``` - ModCN : AP -> CN -> CN ; -- warm pizza + + +==Interfaces and instances== + +Let us now define the ``LexFood`` interface: +``` + interface LexFood = open Syntax in { + oper + wine_N : N ; + beer_N : N ; + pizza_N : N ; + cheese_N : N ; + fish_N : N ; + fresh_A : A ; + warm_A : A ; + italian_A : A ; + expensive_A : A ; + delicious_A : A ; + boring_A : A ; +} +``` +In this interface, only lexical items are declared. In general, an +interface can declare any functions and also types. The ``Syntax`` +interface does so. + +Here is the German instance of the interface: +``` + instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in { + oper + wine_N = mkN "Wein" ; + beer_N = mkN "Bier" "Biere" neuter ; + pizza_N = mkN "Pizza" "Pizzen" feminine ; + cheese_N = mkN "Käse" "Käsen" masculine ; + fish_N = mkN "Fisch" ; + fresh_A = mkA "frisch" ; + warm_A = mkA "warm" "wärmer" "wärmste" ; + italian_A = mkA "italienisch" ; + expensive_A = mkA "teuer" ; + delicious_A = mkA "köstlich" ; + boring_A = mkA "langweilig" ; + } ``` -Finally, we have two special rules that are instances of so-called -**wh-movement**. The idea with this term is that a question such -as //which pizza do you eat// is a result of moving //which pizza// -from its "proper" place which is after the verb: //you eat which pizza//: +Just to complete the picture, we repeat the German functor instantiation +for ``FoodI``, this time with a path directive that makes it compilable. ``` - IPPosV2, IPNegV2 : IP -> NP -> V2 -> QS ; -- which pizza do/don't you eat + --# -path=.:present:prelude + + concrete FoodGer of Food = FoodI with + (Syntax = SyntaxGer), + (LexFood = LexFoodGer) ; ``` -The full resource grammar has a more general treatment of this phenomenon. -But these special cases are already quite useful; moreover, they illustrate -variation that is possible in English between -**pied piping** (//about which pizzza do you talk//) and -**preposition stranding** (//which pizzza do you talk about//). -==Concrete syntax: English morphology== +**Exercise**. Compile and test ``FoodGer``. -===Worst-case functions and data abstraction=== +**Exercise**. Refactor ``FoodFre`` into a functor instantiation. -Some English nouns, such as ``mouse``, are so irregular that -it makes no sense to see them as instances of a paradigm. Even -then, it is useful to perform **data abstraction** from the -definition of the type ``Noun``, and introduce a constructor -operation, a **worst-case function** for nouns: -``` - oper mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; -``` -Thus we can define -``` - lin Mouse = mkNoun "mouse" "mice" ; -``` -and -``` - oper regNoun : Str -> Noun = \x -> - mkNoun x (x + "s") ; -``` -instead of writing the inflection tables explicitly. -The grammar engineering advantage of worst-case functions is that -the author of the resource module may change the definitions of -``Noun`` and ``mkNoun``, and still retain the -interface (i.e. the system of type signatures) that makes it -correct to use these functions in concrete modules. In programming -terms, ``Noun`` is then treated as an **abstract datatype**. +==Adding languages to a functor implementation== -===A system of paradigms using predefined string operations=== +Once we have an application grammar defined by using a functor, +adding a new language is simple. Just two modules need to be written: +- a domain lexicon instance +- a functor instantiation -In addition to the completely regular noun paradigm ``regNoun``, -some other frequent noun paradigms deserve to be -defined, for instance, -``` - sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ; -``` -What about nouns like //fly//, with the plural //flies//? The already -available solution is to use the longest common prefix -//fl// (also known as the **technical stem**) as argument, and define -``` - yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ; -``` -But this paradigm would be very unintuitive to use, because the technical stem -is not an existing form of the word. A better solution is to use -the lemma and a string operator ``init``, which returns the initial segment (i.e. -all characters but the last) of a string: -``` - yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ; -``` -The operation ``init`` belongs to a set of operations in the -resource module ``Prelude``, which therefore has to be -``open``ed so that ``init`` can be used. -``` - > cc init "curry" - "curr" -``` -Its dual is ``last``: + +The functor instantiation is completely mechanical to write. +Here is one for Finnish: ``` - > cc last "curry" - "y" +--# -path=.:present:prelude + +concrete FoodFin of Food = FoodI with + (Syntax = SyntaxFin), + (LexFood = LexFoodFin) ; ``` -As generalizations of the library functions ``init`` and ``last``, GF has -two predefined funtions: -``Predef.dp``, which "drops" suffixes of any length, -and ``Predef.tk``, which "takes" a prefix -just omitting a number of characters from the end. For instance, +The domain lexicon instance requires some knowledge of the words of the +language: what words are used for which concepts, how the words are +inflected, plus features such as genders. Here is a lexicon instance for +Finnish: ``` - > cc Predef.tk 3 "worried" - "worr" - > cc Predef.dp 3 "worried" - "ied" + instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in { + oper + wine_N = mkN "viini" ; + beer_N = mkN "olut" ; + pizza_N = mkN "pizza" ; + cheese_N = mkN "juusto" ; + fish_N = mkN "kala" ; + fresh_A = mkA "tuore" ; + warm_A = mkA "lämmin" ; + italian_A = mkA "italialainen" ; + expensive_A = mkA "kallis" ; + delicious_A = mkA "herkullinen" ; + boring_A = mkA "tylsä" ; + } ``` -The prefix ``Predef`` is given to a handful of functions that could -not be defined internally in GF. They are available in all modules -without explicit ``open`` of the module ``Predef``. +**Exercise**. Instantiate the functor ``FoodI`` to some language of +your choice. -===An intelligent noun paradigm using pattern matching=== +==Division of labour revisited== -It may be hard for the user of a resource morphology to pick the right -inflection paradigm. A way to help this is to define a more intelligent -paradigm, which chooses the ending by first analysing the lemma. -The following variant for English regular nouns puts together all the -previously shown paradigms, and chooses one of them on the basis of -the final letter of the lemma (found by the prelude operation ``last``). -``` - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; -``` -The paradigms ``regNoun`` does not give the correct forms for -all nouns. For instance, //mouse - mice// and -//fish - fish// must be given by using ``mkNoun``. -Also the word //boy// would be inflected incorrectly; to prevent -this, either use ``mkNoun`` or modify -``regNoun`` so that the ``"y"`` case does not -apply if the second-last character is a vowel. +One purpose with the resource grammars was stated to be a division +of labour between linguists and application grammarians. We can now +reflect on what this means more precisely, by asking ourselves what +skills are required of grammarians working on different components. -**Exercise**. Extend the ``regNoun`` paradigm so that it takes care -of all variations there are in English. Test it with the nouns -//ax//, //bamboo//, //boy//, //bush//, //hero//, //match//. -**Hint**. The library functions ``Predef.dp`` and ``Predef.tk`` -are useful in this task. +Building a GF application starts from the abstract syntax. Writing +an abstract syntax requires +- understanding the semantic structure of the application domain +- knowledge of the GF fragment with categories and functions -**Exercise**. The same rules that form plural nouns in English also -apply in the formation of third-person singular verbs. -Write a regular verb paradigm that uses this idea, but first -rewrite ``regNoun`` so that the analysis needed to build //s//-forms -is factored out as a separate ``oper``, which is shared with -``regVerb``. +If the concrete syntax is written by means of a functor, the programmer +has to decide what parts of the implementation are put to the interface +and what parts are shared in the functor. This requires +- knowing how the domain concepts are expressed in natural language +- knowledge of the resource grammar library - the categories and combinators +- understanding what parts are likely to be expressed in language-dependent + ways, so that they must belong to the interface and not the functor +- knowledge of the GF fragment with function applications and strings -===Morphological resource modules=== -A common idiom is to -gather the ``oper`` and ``param`` definitions -needed for inflecting words in -a language into a morphology module. Here is a simple -example, [``MorphoEng`` resource/MorphoEng.gf]. -``` - --# -path=.:prelude +Instantiating a ready-made functor to a new language is less demanding. +It requires essentially +- knowing how the domain words are expressed in the language +- knowing, roughly, how these words are inflected +- knowledge of the paradigms available in the library +- knowledge of the GF fragment with function applications and strings - resource MorphoEng = open Prelude in { - param - Number = Sg | Pl ; +Notice that none of these tasks requires the use of GF records, tables, +or parameters. Thus only a small fragment of GF is needed; the rest of +GF is only relevant for those who write the libraries. - oper - Noun, Verb : Type = {s : Number => Str} ; +Of course, grammar writing is not always straightforward usage of libraries. +For example, GF can be used for other languages than just those in the +libraries - for both natural and formal languages. A knowledge of records +and tables can, unfortunately, also be needed for understanding GF's error +messages. - mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; +**Exercise**. Design a small grammar that can be used for controlling +an MP3 player. The grammar should be able to recognize commands such +as //play this song//, with the following variations: +- verbs: //play//, //remove// +- objects: //song//, //artist// +- determiners: //this//, //the previous// +- verbs without arguments: //stop//, //pause// - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; - mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ; +The implementation goes in the following phases: ++ abstract syntax ++ functor and lexicon interface ++ lexicon instance for the first language ++ functor instantiation for the first language ++ lexicon instance for the second language ++ functor instantiation for the second language ++ ... + + + +==Restricted inheritance== + +A functor implementation using the resource ``Syntax`` interface +works as long as all concepts are expressed by using the same structures +in all languages. If this is not the case, the deviant linearization can +be made into a parameter and moved to the domain lexicon interface. + +Let us take a slightly contrived example: assume that English has +no word for ``Pizza``, but has to use the paraphrase //Italian pie//. +This paraphrase is no longer a noun ``N``, but a complex phrase +in the category ``CN``. An obvious way to solve this problem is +to change interface ``LexEng`` so that the constant declared for +``Pizza`` gets a new type: +``` + oper pizza_CN : CN ; +``` +But this solution is unstable: we may end up changing the interface +and the function with each new language, and we must every time also +change the interface instances for the old languages to maintain +type correctness. + +A better solution is to use **restricted inheritance**: the English +instantiation inherits the functor implementation except for the +constant ``Pizza``. This is how we write: +``` + --# -path=.:present:prelude + + concrete FoodEng of Food = FoodI - [Pizza] with + (Syntax = SyntaxEng), + (LexFood = LexFoodEng) ** + open SyntaxEng, ParadigmsEng in { - regVerb : Str -> Verb = \s -> case last s of { - "s" | "z" => mkVerb s (s + "es") ; - "y" => mkVerb s (init s + "ies") ; - "o" => mkVerb s (s + "es") ; - _ => mkVerb s (s + "s") - } ; + lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; } ``` -The first line gives as a hint to the compiler the -**search path** needed to find all the other modules that the -module depends on. The directory ``prelude`` is a subdirectory of -``GF/lib``; to be able to refer to it in this simple way, you can -set the environment variable ``GF_LIB_PATH`` to point to this -directory. +Restricted inheritance is available for all inherited modules. One can for +instance exclude some mushrooms and pick up just some fruit in +the ``FoodMarket`` example: +``` + abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric] +``` +A concrete syntax of ``Foodmarket`` must then indicate the same inheritance +restrictions. + + +**Exercise**. Change ``FoodGer`` in such a way that it says, instead of +//X is Y//, the equivalent of //X must be Y// (//X muss Y sein//). +You will have to browse the full resource API to find all +the functions needed. -===Morphological analysis and morphology quiz=== +==Browsing the resource with GF commands== -Even though morphology is in GF -mostly used as an auxiliary for syntax, it -can also be useful on its own right. The command ``morpho_analyse = ma`` -can be used to read a text and return for each word the analyses that -it has in the current concrete syntax. +In addition to reading the +[resource synopsis ../../lib/resource-1.0/synopsis.html], you +can find resource function combinations by using the parser. This +is so because the resource library is in the end implemented as +a top-level ``abstract-concrete`` grammar, on which parsing +and linearization work. + +Unfortunately, only English and the Scandinavian languages can be +parsed within acceptable computer resource limits when the full +resource is used. + +To look for a syntax tree in the overload API by parsing, do like this: ``` - > rf bible.txt | morpho_analyse + > $GF_LIB_PATH + > i -path=alltenses:prelude alltenses/OverLangEng.gfc + > p -cat=S -overload "this grammar is too big" + mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A)) ``` -In the same way as translation exercises, morphological exercises can -be generated, by the command ``morpho_quiz = mq``. Usually, -the category is set to be something else than ``S``. For instance, +To view linearizations in all languages by parsing from English: ``` - > cd GF/lib/resource-1.0/ - > i french/IrregFre.gf - > morpho_quiz -cat=V - - Welcome to GF Morphology Quiz. - ... - - réapparaître : VFin VCondit Pl P2 - réapparaitriez - > No, not réapparaitriez, but - réapparaîtriez - Score 0/1 + > i alltenses/langs.gfcm + > p -cat=S -lang=LangEng "this grammar is too big" | tb + UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant) + NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A))))) + Den här grammatiken är för stor + Esta gramática es demasiado grande + (Cyrillic: eta grammatika govorit des'at' jazykov) + Denne grammatikken er for stor + Questa grammatica è troppo grande + Diese Grammatik ist zu groß + Cette grammaire est trop grande + Tämä kielioppi on liian suuri + This grammar is too big + Denne grammatik er for stor ``` -Finally, a list of morphological exercises can be generated -off-line and saved in a -file for later use, by the command ``morpho_list = ml`` +Unfortunately, the Russian grammar uses at the moment a different +character encoding than the rest and is therefore not displayed correctly +in a terminal window. However, the GF syntax editor does display all +examples correctly: ``` - > morpho_list -number=25 -cat=V | wf exx.txt + % gfeditor alltenses/langs.gfcm ``` -The ``number`` flag gives the number of exercises generated. - - +When you have constructed the tree, you will see the following screen: -==Concrete syntax: English phrase building== +#BCEN + [../../lib/resource-1.0/doc/10lang-small.png] -===Predication=== +#ECEN -===Complementization=== +**Exercise**. Find the resource grammar translations for the following +English phrases (parse in the category ``Phr``). You can first try to +build the terms manually. +//every man loves a woman// -===Determination=== +//this grammar speaks more than ten languages// +//which languages aren't in the grammar// -===Modification=== +//which languages did you want to speak// -===Putting the syntax together=== +=Refining semantics in abstract syntax= -==Concrete syntax for Italian== +==GF as a logical framework== +In this section, we will show how +to encode advanced semantic concepts in an abstract syntax. +We use concepts inherited from **type theory**. Type theory +is the basis of many systems known as **logical frameworks**, which are +used for representing mathematical theorems and their proofs on a computer. +In fact, GF has a logical framework as its proper part: +this part is the abstract syntax. -=Using the resource grammar library= +In a logical framework, the formalization of a mathematical theory +is a set of type and function declarations. The following is an example +of such a theory, represented as an ``abstract`` module in GF. +``` +abstract Arithm = { + cat + Prop ; -- proposition + Nat ; -- natural number + fun + Zero : Nat ; -- 0 + Succ : Nat -> Nat ; -- successor of x + Even : Nat -> Prop ; -- x is even + And : Prop -> Prop -> Prop ; -- A and B + } +``` -In this chapter, we will take a look at the GF resource grammar library. -We will use the library to implement a slightly extended ``Food`` grammar -and port it to some new languages. +**Exercise**. Give a concrete syntax of ``Arithm``, either from scatch or +by using the resource library. -**Exercise**. Define the mini resource of the previous chapter by -using a functor over the full resource. -==The coverage of the library== -The GF Resource Grammar Library contains grammar rules for -10 languages (in addition, 2 languages are available as incomplete -implementations, and a few more are under construction). Its purpose -is to make these rules available for application programmers, -who can thereby concentrate on the semantic and stylistic -aspects of their grammars, without having to think about -grammaticality. The targeted level of application grammarians -is that of a skilled programmer with -a practical knowledge of the target languages, but without -theoretical knowledge about their grammars. -Such a combination of -skills is typical of programmers who, for instance, want to localize -software to new languages. +==Dependent types== -The current resource languages are -- ``Ara``bic (incomplete) -- ``Cat``alan (incomplete) -- ``Dan``ish -- ``Eng``lish -- ``Fin``nish -- ``Fre``nch -- ``Ger``man -- ``Ita``lian -- ``Nor``wegian -- ``Rus``sian -- ``Spa``nish -- ``Swe``dish +**Dependent types** are a characteristic feature of GF, +inherited from the **constructive type theory** of Martin-Löf and +distinguishing GF from most other grammar formalisms and +functional programming languages. +Dependent types can be used for stating stronger +**conditions of well-formedness** than ordinary types. +A simple example is a "smart house" system, which +defines voice commands for household appliances. This example +is borrowed from the +[Regulus Book http://cslipublications.stanford.edu/site/1575865262.html] +(Rayner & al. 2006). -The first three letters (``Eng`` etc) are used in grammar module names. -The incomplete Arabic and Catalan implementations are -enough to be used in many applications; they both contain, amoung other -things, complete inflectional morphology. +One who enters a smart house can use speech to dim lights, switch +on the fan, etc. For each ``Kind`` of a device, there is a set of +``Actions`` that can be performed on it; thus one can dim the lights but + not the fan, for example. These dependencies can be expressed by +by making the type ``Action`` dependent on ``Kind``. We express this +as follows in ``cat`` declarations: +``` + cat + Command ; + Kind ; + Action Kind ; + Device Kind ; +``` +The crucial use of the dependencies is made in the rule for forming commands: +``` + fun CAction : (k : Kind) -> Action k -> Device k -> Command ; +``` +In other words: an action and a device can be combined into a command only +if they are of the same ``Kind`` ``k``. If we have the functions +``` + DKindOne : (k : Kind) -> Device k ; -- the light + light, fan : Kind ; + dim : Action light ; +``` +we can form the syntax tree +``` + CAction light dim (DKindOne light) +``` +but we cannot form the trees +``` + CAction light dim (DKindOne fan) + CAction fan dim (DKindOne light) + CAction fan dim (DKindOne fan) +``` +Linearization rules are written as usual: the concrete syntax does not +know if a category is a dependent type. In English, you can write as follows: +``` + lincat Action = {s : Str} ; + lin CAction kind act dev = {s = act.s ++ dev.s} ; +``` +Notice that the argument ``kind`` does not appear in the linearization. +The type checker will be able to reconstruct it from the ``dev`` argument. -==The resource API== +Parsing with dependent types is performed in two phases: ++ context-free parsing ++ filtering through type checker -The resource library API is devided into language-specific -and language-independent parts. To put it roughly, -- the syntax API is language-independent, i.e. has the same types and functions for all - languages. - Its name is ``Syntax``//L// for each language //L// -- the morphology API is language-specific, i.e. has partly different types and functions - for different languages. - Its name is ``Paradigms``//L// for each language //L// +If you just parse in the usual way, you don't enter the second phase, and +the ``kind`` argument is not found: +``` + > parse "dim the light" + CAction ? dim (DKindOne light) +``` +Moreover, type-incorrect commands are not rejected: +``` + > parse "dim the fan" + CAction ? dim (DKindOne fan) +``` +The question mark ``?`` is a **metavariable**, and is returned by the parser +for any subtree that is suppressed by a linearization rule. -A full documentation of the API is available on-line in the -[resource synopsis ../../lib/resource-1.0/synopsis.html]. For our -examples, we will only need a fragment of the full API. +To get rid of metavariables, you must feed the parse result into the +second phase of **solving** them. The ``solve`` process uses the dependent +type checker to restore the values of the metavariables. It is invoked by +the command ``put_tree = pt`` with the flag ``-transform=solve``: +``` + > parse "dim the light" | put_tree -transform=solve + CAction light dim (DKindOne light) +``` +The ``solve`` process may fail, in which case no tree is returned: +``` + > parse "dim the fan" | put_tree -transform=solve + no tree found +``` -In the first examples, -we will make use of the following categories, from the module ``Syntax``. -|| Category | Explanation | Example || -| ``Utt`` | sentence, question, word... | "be quiet" | -| ``Adv`` | verb-phrase-modifying adverb, | "in the house" | -| ``AdA`` | adjective-modifying adverb, | "very" | -| ``S`` | declarative sentence | "she lived here" | -| ``Cl`` | declarative clause, with all tenses | "she looks at this" | -| ``AP`` | adjectival phrase | "very warm" | -| ``CN`` | common noun (without determiner) | "red house" | -| ``NP`` | noun phrase (subject or object) | "the red house" | -| ``Det`` | determiner phrase | "those seven" | -| ``Predet`` | predeterminer | "only" | -| ``Quant`` | quantifier with both sg and pl | "this/these" | -| ``Prep`` | preposition, or just case | "in" | -| ``A`` | one-place adjective | "warm" | -| ``N`` | common noun | "house" | +**Exercise**. Write an abstract syntax module with above contents +and an appropriate English concrete syntax. Try to parse the commands +//dim the light// and //dim the fan//, with and without ``solve`` filtering. -We will need the following syntax rules from ``Syntax``. +**Exercise**. Perform random and exhaustive generation, with and without +``solve`` filtering. -|| Function | Type | Example || -| ``mkUtt`` | ``S -> Utt`` | //John walked// | -| ``mkUtt`` | ``Cl -> Utt`` | //John walks// | -| ``mkCl`` | ``NP -> AP -> Cl`` | //John is very old// | -| ``mkNP`` | ``Det -> CN -> NP`` | //the first old man// | -| ``mkNP`` | ``Predet -> NP -> NP`` | //only John// | -| ``mkDet`` | ``Quant -> Det`` | //this// | -| ``mkCN`` | ``N -> CN`` | //house// | -| ``mkCN`` | ``AP -> CN -> CN`` | //very big blue house// | -| ``mkAP`` | ``A -> AP`` | //old// | -| ``mkAP`` | ``AdA -> AP -> AP`` | //very very old// | +**Exercise**. Add some device kinds and actions to the grammar. -We will also need the following structural words from ``Syntax``. -|| Function | Type | Example || -| ``all_Predet`` | ``Predet`` | //all// | -| ``defPlDet`` | ``Det`` | //the (houses)// | -| ``this_Quant`` | ``Quant`` | //this// | -| ``very_AdA`` | ``AdA`` | //very// | +==Polymorphism== +Sometimes an action can be performed on all kinds of devices. It would be +possible to introduce separate ``fun`` constants for each kind-action pair, +but this would be tedious. Instead, one can use **polymorphic** actions, +i.e. actions that take a ``Kind`` as an argument and produce an ``Action`` +for that ``Kind``: +``` + fun switchOn, switchOff : (k : Kind) -> Action k ; +``` +Functions that are not polymorphic are **monomorphic**. However, the +dichotomy into monomorphism and full polymorphism is not always sufficien +for good semantic modelling: very typically, some actions are defined +for a proper subset of devices, but not just one. For instance, both doors and +windows can be opened, whereas lights cannot. +We will return to this problem by introducing the +concept of **restricted polymorphism** later, +after a chapter on proof objects. -For French, we will use the following part of ``ParadigmsFre``. -|| Function | Type || -| ``Gender`` | ``Type`` | -| ``masculine`` | ``Gender`` | -| ``feminine`` | ``Gender`` | -| ``mkN`` | ``(cheval : Str) -> N`` | -| ``mkN`` | ``(foie : Str) -> Gender -> N`` | -| ``mkA`` | ``(cher : Str) -> A`` | -| ``mkA`` | ``(sec,seche : Str) -> A`` | +==Dependent types and spoken language models== -For German, we will use the following part of ``ParadigmsGer``. +We have used dependent types to control semantic well-formedness +in grammars. This is important in traditional type theory +applications such as proof assistants, where only mathematically +meaningful formulas should be constructed. But semantic filtering has +also proved important in speech recognition, because it reduces the +ambiguity of the results. -|| Function | Type || -| ``Gender`` | ``Type`` | -| ``masculine`` | ``Gender`` | -| ``feminine`` | ``Gender`` | -| ``neuter`` | ``Gender`` | -| ``mkN`` | ``(Stufe : Str) -> N`` | -| ``mkN`` | ``(Bild,Bilder : Str) -> Gender -> N`` | -| ``mkA`` | ``(klein : Str) -> A`` | -| ``mkA`` | ``(gut,besser,beste : Str) -> A`` | +===Grammar-based language models=== -**Exercise**. Try out the morphological paradigms in different languages. Do -in this way: -``` - > i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr - > cc mkN "Farbe" - > cc mkA "gut" "besser" "beste" +The standard way of using GF in speech recognition is by building +**grammar-based language models**. To this end, GF comes with compilers +into several formats that are used in speech recognition systems. +One such format is GSL, used in the [Nuance speech recognizer www.nuance.com]. +It is produced from GF simply by printing a grammar with the flag +``-printer=gsl``. ``` + > import -conversion=finite SmartEng.gf + > print_grammar -printer=gsl + ;GSL2.0 + ; Nuance speech recognition grammar for SmartEng + ; Generated by GF -==Example: French== - -We start with an abstract syntax that is like ``Food`` before, but -has a plural determiner (//all wines//) and some new nouns that will -need different genders in most languages. -``` - abstract Food = { - cat - S ; Item ; Kind ; Quality ; - fun - Is : Item -> Quality -> S ; - This, All : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish, Beer, Pizza : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -``` -The French implementation opens ``SyntaxFre`` and ``ParadigmsFre`` -to get access to the resource libraries needed. In order to find -the libraries, a ``path`` directive is prepended; it is interpreted -relative to the environment variable ``GF_LIB_PATH``. -``` - --# -path=.:present:prelude + .MAIN SmartEng_2 - concrete FoodFre of Food = open SyntaxFre,ParadigmsFre in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN (mkN "vin") ; - Beer = mkCN (mkN "bière") ; - Pizza = mkCN (mkN "pizza" feminine) ; - Cheese = mkCN (mkN "fromage" masculine) ; - Fish = mkCN (mkN "poisson") ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP (mkA "frais" "fraîche") ; - Warm = mkAP (mkA "chaud") ; - Italian = mkAP (mkA "italien") ; - Expensive = mkAP (mkA "cher") ; - Delicious = mkAP (mkA "délicieux") ; - Boring = mkAP (mkA "ennuyeux") ; - } + SmartEng_0 [("switch" "off") ("switch" "on")] + SmartEng_1 ["dim" ("switch" "off") + ("switch" "on")] + SmartEng_2 [(SmartEng_0 SmartEng_3) + (SmartEng_1 SmartEng_4)] + SmartEng_3 ("the" SmartEng_5) + SmartEng_4 ("the" SmartEng_6) + SmartEng_5 "fan" + SmartEng_6 "light" ``` -The ``lincat`` definitions in ``FoodFre`` assign **resource categories** -to **application categories**. In a sense, the application categories -are **semantic**, as they correspond to concepts in the grammar application, -whereas the resource categories are **syntactic**: they give the linguistic -means to express concepts in any application. - -The ``lin`` definitions likewise assign resource functions to application -functions. Under the hood, there is a lot of matching with parameters to -take care of word order, inflection, and agreement. But the user of the -library sees nothing of this: the only parameters you need to give are -the genders of some nouns, which cannot be correctly inferred from the word. +Now, GSL is a context-free format, so how does it cope with dependent types? +In general, dependent types can give rise to infinitely many basic types +(exercise!), whereas a context-free grammar can by definition only have +finitely many nonterminals. -In French, for example, the one-argument ``mkN`` assigns the noun the feminine -gender if and only if it ends with an //e//. Therefore the words //fromage// and -//pizza// are given genders manually. -One can of course always give genders manually, to be on the safe side. +This is where the flag ``-conversion=finite`` is needed in the ``import`` +command. Its effect is to convert a GF grammar with dependent types to +one without, so that each instance of a dependent type is replaced by +an atomic type. This can then be used as a nonterminal in a context-free +grammar. The ``finite`` conversion presupposes that every +dependent type has only finitely many instances, which is in fact +the case in the ``Smart`` grammar. -As for inflection, the one-argument adjective pattern ``mkA`` takes care of -completely regular adjective such as //chaud-chaude//, but also of special -cases such as //italien-italienne//, //cher-chère//, and //délicieux-délicieuse//. -But it cannot form //frais-fraîche// properly. Once again, you can give more -forms to be on the safe side. You can also test the paradigms in the GF -system. -**Exercise**. Compile the grammar ``FoodFre`` and generate and parse some sentences. +**Exercise**. If you have access to the Nuance speech recognizer, +test it with GF-generated language models for ``SmartEng``. Do this +both with and without ``-conversion=finite``. -**Exercise**. Write a concrete syntax of ``Food`` for English or some other language -included in the resource library. You can also compare the output with the hand-written -grammars presented earlier in this tutorial. +**Exercise**. Construct an abstract syntax with infinitely many instances +of dependent types. -**Exercise**. In particular, try to write a concrete syntax for Italian, even if -you don't know Italian. What you need to know is that "beer" is //birra// and -"pizza" is //pizza//, and that all the nouns and adjectives in the grammar -are regular. +===Statistical language models=== +An alternative to grammar-based language models are +**statistical language models** (**SLM**s). An SLM is +built from a **corpus**, i.e. a set of utterances. It specifies the +probability of each **n-gram**, i.e. sequence of //n// words. The +typical value of //n// is 2 (bigrams) or 3 (trigrams). -==Functor implementation of multilingual grammars== +One advantage of SLMs over grammar-based models is that they are +**robust**, i.e. they can be used to recognize sequences that would +be out of the grammar or the corpus. Another advantage is that +an SLM can be built "for free" if a corpus is available. -If you did the exercise of writing a concrete syntax of ``Food`` for some other -language, you probably noticed that much of the code looks exactly the same -as for French. The immediate reason for this is that the ``Syntax`` API is the -same for all languages; the deeper reason is that all languages (at least those -in the resource package) implement the same syntactic structures and tend to use them -in similar ways. Thus it is only the lexical parts of a concrete syntax that -you need to write anew for a new language. In brief, -- first copy the concrete syntax for one language -- then change the words (the strings and perhaps some paradigms) +However, collecting a corpus can require a lot of work, and writing +a grammar can be less demanding, especially with tools such as GF or +Regulus. This advantage of grammars can be combined with robustness +by creating a back-up SLM from a **synthesized corpus**. This means +simply that the grammar is used for generating such a corpus. +In GF, this can be done with the ``generate_trees`` command. +As with grammar-based models, the quality of the SLM is better +if meaningless utterances are excluded from the corpus. Thus +a good way to generate an SLM from a GF grammar is by using +dependent types and filter the results through the type checker: +``` + > generate_trees | put_trees -transform=solve | linearize +``` -But programming by copy-and-paste is not worthy of a functional programmer. -Can we write a function that takes care of the shared parts of grammar modules? -Yes, we can. It is not a function in the ``fun`` or ``oper`` sense, but -a function operating on modules, called a **functor**. This construct -is familiar from the functional languages ML and OCaml, but it does not -exist in Haskell. It also bears some resemblance to templates in C++. -Functors are also known as **parametrized modules**. +**Exercise**. Measure the size of the corpus generated from +``SmartEng``, with and without type checker filtering. -In GF, a functor is a module that ``open``s one or more **interfaces**. -An ``interface`` is a module similar to a ``resource``, but it only -contains the types of ``oper``s, not their definitions. You can think -of an interface as a kind of a record type. Thus a functor is a kind -of a function taking records as arguments and producins a module -as value. -Let us look at a functor implementation of the ``Food`` grammar. -Consider its module header first: -``` - incomplete concrete FoodI of Food = open Syntax, LexFood in -``` -In the functor-function analogy, ``FoodI`` would be presented as a function -with the following type signature: -``` - FoodI : instance of Syntax -> instance of LexFood -> concrete of Food -``` -It takes as arguments two interfaces: -- ``Syntax``, the resource grammar interface -- ``LexFood``, the domain-specific lexicon interface +==Digression: dependent types in concrete syntax== -Functors opening ``Syntax`` and a domain lexicon interface are in fact -so typical in GF applications, that this structure could be called -a **design patter** -for GF grammars. The idea in this pattern is, again, that -the languages use the same syntactic structures but different words. +The **functional fragment** of GF +terms and types comprises function types, applications, lambda +abstracts, constants, and variables. This fragment is similar in +abstract and concrete syntax. In particular, +dependent types are also available in concrete syntax. +We have not made use of them yet, +but we will now look at one example of how they +can be used. -Before going to the details of the module bodies, let us look at how functors -are concretely used. An interface has a header such as -``` - interface LexFood = open Syntax in -``` -To give an ``instance`` of it means that all ``oper``s are given definitione (of -appropriate types). For example, -``` - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in +Those readers who are familiar with functional programming languages +like ML and Haskell, may already have missed **polymorphic** +functions. For instance, Haskell programmers have access to +the functions ``` -Notice that when an interface opens an interface, such as ``Syntax``, -then its instance -opens an instance of it. But the instance may also open some other -resources - typically, -a domain lexicon instance opens a ``Paradigms`` module. + const :: a -> b -> a + const c _ = c -In the function-functor analogy, we now have -``` - SyntaxGer : instance of Syntax - LexFoodGer : instance of LexFood -``` -Thus we can complete the German implementation by "applying" the functor: -``` - FoodI SyntaxGer LexFoodGer : concrete of Food + flip :: (a -> b -> c) -> b -> a -> c + flip f y x = f x y ``` -The GF syntax for doing so is +which can be used for any given types ``a``,``b``, and ``c``. + +The GF counterpart of polymorphic functions are **monomorphic** +functions with explicit **type variables**. Thus the above +definitions can be written ``` - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; + oper const :(a,b : Type) -> a -> b -> a = + \_,_,c,_ -> c ; + + oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c = + \_,_,_,f,x,y -> f y x ; ``` -Notice that this is the //complete// module, not just a header of it. -The module body is received from ``FoodI``, by instantiating the -interface constants with their definitions given in the German -instances. +When the operations are used, the type checker requires +them to be equipped with all their arguments; this may be a nuisance +for a Haskell or ML programmer. -A module of this form, characterized by the keyword ``with``, is -called a **functor instantiation**. -Here is the complete code for the functor ``FoodI``: + +==Proof objects== + +Perhaps the most well-known idea in constructive type theory is +the **Curry-Howard isomorphism**, also known as the +**propositions as types principle**. Its earliest formulations +were attempts to give semantics to the logical systems of +propositional and predicate calculus. In this section, we will consider +a more elementary example, showing how the notion of proof is useful +outside mathematics, as well. + +We first define the category of unary (also known as Peano-style) +natural numbers: ``` - incomplete concrete FoodI of Food = open Syntax, LexFood in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN wine_N ; - Beer = mkCN beer_N ; - Pizza = mkCN pizza_N ; - Cheese = mkCN cheese_N ; - Fish = mkCN fish_N ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP fresh_A ; - Warm = mkAP warm_A ; - Italian = mkAP italian_A ; - Expensive = mkAP expensive_A ; - Delicious = mkAP delicious_A ; - Boring = mkAP boring_A ; -} + cat Nat ; + fun Zero : Nat ; + fun Succ : Nat -> Nat ; ``` +The **successor function** ``Succ`` generates an infinite +sequence of natural numbers, beginning from ``Zero``. +We then define what it means for a number //x// to be //less than// +a number //y//. Our definition is based on two axioms: +- ``Zero`` is less than ``Succ`` //y// for any //y//. +- If //x// is less than //y//, then ``Succ`` //x// is less than ``Succ`` //y//. -==Interfaces and instances== -Let us now define the ``LexFood`` interface: +The most straightforward way of expressing these axioms in type theory +is as typing judgements that introduce objects of a type ``Less`` //x y//: ``` - interface LexFood = open Syntax in { - oper - wine_N : N ; - beer_N : N ; - pizza_N : N ; - cheese_N : N ; - fish_N : N ; - fresh_A : A ; - warm_A : A ; - italian_A : A ; - expensive_A : A ; - delicious_A : A ; - boring_A : A ; -} + cat Less Nat Nat ; + fun lessZ : (y : Nat) -> Less Zero (Succ y) ; + fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ; ``` -In this interface, only lexical items are declared. In general, an -interface can declare any functions and also types. The ``Syntax`` -interface does so. - -Here is the German instance of the interface: +Objects formed by ``lessZ`` and ``lessS`` are +called **proof objects**: they establish the truth of certain +mathematical propositions. +For instance, the fact that 2 is less that +4 has the proof object ``` - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in { - oper - wine_N = mkN "Wein" ; - beer_N = mkN "Bier" "Biere" neuter ; - pizza_N = mkN "Pizza" "Pizzen" feminine ; - cheese_N = mkN "Käse" "Käsen" masculine ; - fish_N = mkN "Fisch" ; - fresh_A = mkA "frisch" ; - warm_A = mkA "warm" "wärmer" "wärmste" ; - italian_A = mkA "italienisch" ; - expensive_A = mkA "teuer" ; - delicious_A = mkA "köstlich" ; - boring_A = mkA "langweilig" ; - } + lessS (Succ Zero) (Succ (Succ (Succ Zero))) + (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero))) ``` -Just to complete the picture, we repeat the German functor instantiation -for ``FoodI``, this time with a path directive that makes it compilable. +whose type is ``` - --# -path=.:present:prelude - - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; + Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero)))) ``` +which is the formalization of the proposition that 2 is less than 4. +GF grammars can be used to provide a **semantic control** of +well-formedness of expressions. We have already seen examples of this: +the grammar of well-formed actions on household devices. By introducing proof objects +we have now added a very powerful technique of expressing semantic conditions. -**Exercise**. Compile and test ``FoodGer``. - -**Exercise**. Refactor ``FoodFre`` into a functor instantiation. - - - -==Adding languages to a functor implementation== - -Once we have an application grammar defined by using a functor, -adding a new language is simple. Just two modules need to be written: -- a domain lexicon instance -- a functor instantiation - - -The functor instantiation is completely mechanical to write. -Here is one for Finnish: +A simple example of the use of proof objects is the definition of +well-formed //time spans//: a time span is expected to be from an earlier to +a later time: ``` ---# -path=.:present:prelude - -concrete FoodFin of Food = FoodI with - (Syntax = SyntaxFin), - (LexFood = LexFoodFin) ; + from 3 to 8 ``` -The domain lexicon instance requires some knowledge of the words of the -language: what words are used for which concepts, how the words are -inflected, plus features such as genders. Here is a lexicon instance for -Finnish: +is thus well-formed, whereas ``` - instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in { - oper - wine_N = mkN "viini" ; - beer_N = mkN "olut" ; - pizza_N = mkN "pizza" ; - cheese_N = mkN "juusto" ; - fish_N = mkN "kala" ; - fresh_A = mkA "tuore" ; - warm_A = mkA "lämmin" ; - italian_A = mkA "italialainen" ; - expensive_A = mkA "kallis" ; - delicious_A = mkA "herkullinen" ; - boring_A = mkA "tylsä" ; - } + from 8 to 3 +``` +is not. The following rules for spans impose this condition +by using the ``Less`` predicate: +``` + cat Span ; + fun span : (m,n : Nat) -> Less m n -> Span ; ``` -**Exercise**. Instantiate the functor ``FoodI`` to some language of -your choice. - +**Exercise**. Write an abstract and concrete syntax with the +concepts of this section, and experiment with it in GF. -==Division of labour revisited== -One purpose with the resource grammars was stated to be a division -of labour between linguists and application grammarians. We can now -reflect on what this means more precisely, by asking ourselves what -skills are required of grammarians working on different components. +**Exercise**. Define the notions of "even" and "odd" in terms +of proof objects. **Hint**. You need one function for proving +that 0 is even, and two other functions for propagating the +properties. -Building a GF application starts from the abstract syntax. Writing -an abstract syntax requires -- understanding the semantic structure of the application domain -- knowledge of the GF fragment with categories and functions -If the concrete syntax is written by means of a functor, the programmer -has to decide what parts of the implementation are put to the interface -and what parts are shared in the functor. This requires -- knowing how the domain concepts are expressed in natural language -- knowledge of the resource grammar library - the categories and combinators -- understanding what parts are likely to be expressed in language-dependent - ways, so that they must belong to the interface and not the functor -- knowledge of the GF fragment with function applications and strings +===Proof-carrying documents=== -Instantiating a ready-made functor to a new language is less demanding. -It requires essentially -- knowing how the domain words are expressed in the language -- knowing, roughly, how these words are inflected -- knowledge of the paradigms available in the library -- knowledge of the GF fragment with function applications and strings +Another possible application of proof objects is **proof-carrying documents**: +to be semantically well-formed, the abstract syntax of a document must contain a proof +of some property, although the proof is not shown in the concrete document. +Think, for instance, of small documents describing flight connections: +//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.// -Notice that none of these tasks requires the use of GF records, tables, -or parameters. Thus only a small fragment of GF is needed; the rest of -GF is only relevant for those who write the libraries. +The well-formedness of this text is partly expressible by dependent typing: +``` + cat + City ; + Flight City City ; + fun + Gothenburg, Frankfurt, Prague : City ; + LH3043 : Flight Gothenburg Frankfurt ; + OK0537 : Flight Frankfurt Prague ; +``` +This rules out texts saying //take OK0537 from Gothenburg to Prague//. +However, there is a +further condition saying that it must be possible to +change from LH3043 to OK0537 in Frankfurt. +This can be modelled as a proof object of a suitable type, +which is required by the constructor +that connects flights. +``` + cat + IsPossible (x,y,z : City)(Flight x y)(Flight y z) ; + fun + Connect : (x,y,z : City) -> + (u : Flight x y) -> (v : Flight y z) -> + IsPossible x y z u v -> Flight x z ; +``` -Of course, grammar writing is not always straightforward usage of libraries. -For example, GF can be used for other languages than just those in the -libraries - for both natural and formal languages. A knowledge of records -and tables can, unfortunately, also be needed for understanding GF's error -messages. -**Exercise**. Design a small grammar that can be used for controlling -an MP3 player. The grammar should be able to recognize commands such -as //play this song//, with the following variations: -- verbs: //play//, //remove// -- objects: //song//, //artist// -- determiners: //this//, //the previous// -- verbs without arguments: //stop//, //pause// +==Restricted polymorphism== +In the first version of the smart house grammar ``Smart``, +all Actions were either of +- **monomorphic**: defined for one Kind +- **polymorphic**: defined for all Kinds -The implementation goes in the following phases: -+ abstract syntax -+ functor and lexicon interface -+ lexicon instance for the first language -+ functor instantiation for the first language -+ lexicon instance for the second language -+ functor instantiation for the second language -+ ... +To make this scale up for new Kinds, we can refine this to +**restricted polymorphism**: defined for Kinds of a certain **class** -==Restricted inheritance== +The notion of class can be expressed in abstract syntax +by using the Curry-Howard isomorphism as follows: +- a class is a **predicate** of Kinds - i.e. a type depending of Kinds +- a Kind is in a class if there is a proof object of this type -A functor implementation using the resource ``Syntax`` interface -works as long as all concepts are expressed by using the same structures -in all languages. If this is not the case, the deviant linearization can -be made into a parameter and moved to the domain lexicon interface. -Let us take a slightly contrived example: assume that English has -no word for ``Pizza``, but has to use the paraphrase //Italian pie//. -This paraphrase is no longer a noun ``N``, but a complex phrase -in the category ``CN``. An obvious way to solve this problem is -to change interface ``LexEng`` so that the constant declared for -``Pizza`` gets a new type: -``` - oper pizza_CN : CN ; +Here is an example with switching and dimming. The classes are called +``switchable`` and ``dimmable``. ``` -But this solution is unstable: we may end up changing the interface -and the function with each new language, and we must every time also -change the interface instances for the old languages to maintain -type correctness. +cat + Switchable Kind ; + Dimmable Kind ; +fun + switchable_light : Switchable light ; + switchable_fan : Switchable fan ; + dimmable_light : Dimmable light ; -A better solution is to use **restricted inheritance**: the English -instantiation inherits the functor implementation except for the -constant ``Pizza``. This is how we write: + switchOn : (k : Kind) -> Switchable k -> Action k ; + dim : (k : Kind) -> Dimmable k -> Action k ; ``` - --# -path=.:present:prelude +One advantage of this formalization is that classes for new +actions can be added incrementally. - concrete FoodEng of Food = FoodI - [Pizza] with - (Syntax = SyntaxEng), - (LexFood = LexFoodEng) ** - open SyntaxEng, ParadigmsEng in { +**Exercise**. Write a new version of the ``Smart`` grammar with +classes, and test it in GF. - lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; - } -``` -Restricted inheritance is available for all inherited modules. One can for -instance exclude some mushrooms and pick up just some fruit in -the ``FoodMarket`` example: +**Exercise**. Add some actions, kinds, and classes to the grammar. +Try to port the grammar to a new language. You will probably find +out that restricted polymorphism works differently in different languages. +For instance, in Finnish not only doors but also TVs and radios +can be "opened", which means switching them on. + + +==Variable bindings== + +Mathematical notation and programming languages have +expressions that **bind** variables. For instance, +a universally quantifier proposition ``` - abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric] + (All x)B(x) ``` -A concrete syntax of ``Foodmarket`` must then indicate the same inheritance -restrictions. - +consists of the **binding** ``(All x)`` of the variable ``x``, +and the **body** ``B(x)``, where the variable ``x`` can have +**bound occurrences**. -**Exercise**. Change ``FoodGer`` in such a way that it says, instead of -//X is Y//, the equivalent of //X must be Y// (//X muss Y sein//). -You will have to browse the full resource API to find all -the functions needed. +Variable bindings appear in informal mathematical language as well, for +instance, +``` + for all x, x is equal to x + the function that for any numbers x and y returns the maximum of x+y + and x*y -==Browsing the resource with GF commands== + Let x be a natural number. Assume that x is even. Then x + 3 is odd. +``` +In type theory, variable-binding expression forms can be formalized +as functions that take functions as arguments. The universal +quantifier is defined +``` + fun All : (Ind -> Prop) -> Prop +``` +where ``Ind`` is the type of individuals and ``Prop``, +the type of propositions. If we have, for instance, the equality predicate +``` + fun Eq : Ind -> Ind -> Prop +``` +we may form the tree +``` + All (\x -> Eq x x) +``` +which corresponds to the ordinary notation +``` + (All x)(x = x). +``` +An abstract syntax where trees have functions as arguments, as in +the two examples above, has turned out to be precisely the right +thing for the semantics and computer implementation of +variable-binding expressions. The advantage lies in the fact that +only one variable-binding expression form is needed, the lambda abstract +``\x -> b``, and all other bindings can be reduced to it. +This makes it easier to implement mathematical theories and reason +about them, since variable binding is tricky to implement and +to reason about. The idea of using functions as arguments of +syntactic constructors is known as **higher-order abstract syntax**. -In addition to reading the -[resource synopsis ../../lib/resource-1.0/synopsis.html], you -can find resource function combinations by using the parser. This -is so because the resource library is in the end implemented as -a top-level ``abstract-concrete`` grammar, on which parsing -and linearization work. +The question now arises: how to define linearization rules +for variable-binding expressions? +Let us first consider universal quantification, +``` + fun All : (Ind -> Prop) -> Prop +``` +We write +``` + lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s} +``` +to obtain the form shown above. +This linearization rule brings in a new GF concept - the ``$0`` +field of ``B`` containing a bound variable symbol. +The general rule is that, if an argument type of a function is +itself a function type ``A -> C``, the linearization type of +this argument is the linearization type of ``C`` +together with a new field ``$0 : Str``. In the linearization rule +for ``All``, the argument ``B`` thus has the linearization +type +``` + {$0 : Str ; s : Str}, +``` +since the linearization type of ``Prop`` is +``` + {s : Str} +``` +In other words, the linearization of a function +consists of a linearization of the body together with a +field for a linearization of the bound variable. +Those familiar with type theory or lambda calculus +should notice that GF requires trees to be in +**eta-expanded** form in order to be linearizable: +any function of type +``` + A -> B +``` +always has a syntax tree of the form +``` + \x -> b +``` +where ``b : B`` under the assumption ``x : A``. +It is in this form that an expression can be analysed +as having a bound variable and a body. -Unfortunately, only English and the Scandinavian languages can be -parsed within acceptable computer resource limits when the full -resource is used. -To look for a syntax tree in the overload API by parsing, do like this: +Given the linearization rule ``` - > $GF_LIB_PATH - > i -path=alltenses:prelude alltenses/OverLangEng.gfc - > p -cat=S -overload "this grammar is too big" - mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A)) + lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"} ``` -To view linearizations in all languages by parsing from English: +the linearization of ``` - > i alltenses/langs.gfcm - > p -cat=S -lang=LangEng "this grammar is too big" | tb - UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant) - NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A))))) - Den här grammatiken är för stor - Esta gramática es demasiado grande - (Cyrillic: eta grammatika govorit des'at' jazykov) - Denne grammatikken er for stor - Questa grammatica è troppo grande - Diese Grammatik ist zu groß - Cette grammaire est trop grande - Tämä kielioppi on liian suuri - This grammar is too big - Denne grammatik er for stor + \x -> Eq x x ``` -Unfortunately, the Russian grammar uses at the moment a different -character encoding than the rest and is therefore not displayed correctly -in a terminal window. However, the GF syntax editor does display all -examples correctly: +is the record ``` - % gfeditor alltenses/langs.gfcm + {$0 = "x", s = ["( x = x )"]} ``` -When you have constructed the tree, you will see the following screen: - -#BCEN - - [../../lib/resource-1.0/doc/10lang-small.png] - -#ECEN - - -**Exercise**. Find the resource grammar translations for the following -English phrases (parse in the category ``Phr``). You can first try to -build the terms manually. - -//every man loves a woman// - -//this grammar speaks more than ten languages// - -//which languages aren't in the grammar// - -//which languages did you want to speak// - - - -=Refining semantics in abstract syntax= - -==GF as a logical framework== - -In this section, we will show how -to encode advanced semantic concepts in an abstract syntax. -We use concepts inherited from **type theory**. Type theory -is the basis of many systems known as **logical frameworks**, which are -used for representing mathematical theorems and their proofs on a computer. -In fact, GF has a logical framework as its proper part: -this part is the abstract syntax. +Thus we can compute the linearization of the formula, +``` + All (\x -> Eq x x) --> {s = "[( All x ) ( x = x )]"}. +``` +How did we get the //linearization// of the variable ``x`` +into the string ``"x"``? GF grammars have no rules for +this: it is just hard-wired in GF that variable symbols are +linearized into the same strings that represent them in +the print-out of the abstract syntax. -In a logical framework, the formalization of a mathematical theory -is a set of type and function declarations. The following is an example -of such a theory, represented as an ``abstract`` module in GF. +To be able to //parse// variable symbols, however, GF needs to know what +to look for (instead of e.g. trying to parse //any// +string as a variable). What strings are parsed as variable symbols +is defined in the lexical analysis part of GF parsing ``` -abstract Arithm = { - cat - Prop ; -- proposition - Nat ; -- natural number - fun - Zero : Nat ; -- 0 - Succ : Nat -> Nat ; -- successor of x - Even : Nat -> Prop ; -- x is even - And : Prop -> Prop -> Prop ; -- A and B - } + > p -cat=Prop -lexer=codevars "(All x)(x = x)" + All (\x -> Eq x x) ``` +(see more details on lexers below). If several variables are bound in the +same argument, the labels are ``$0, $1, $2``, etc. -**Exercise**. Give a concrete syntax of ``Arithm``, either from scatch or -by using the resource library. +**Exercise**. Write an abstract syntax of the whole +**predicate calculus**, with the +**connectives** "and", "or", "implies", and "not", and the +**quantifiers** "exists" and "for all". Use higher-order functions +to guarantee that unbounded variables do not occur. +**Exercise**. Write a concrete syntax for your favourite +notation of predicate calculus. Use Latex as target language +if you want nice output. You can also try producing Haskell boolean +expressions. Use as many parenthesis as you need to +guarantee non-ambiguity. -==Dependent types== -**Dependent types** are a characteristic feature of GF, -inherited from the **constructive type theory** of Martin-Löf and -distinguishing GF from most other grammar formalisms and -functional programming languages. +==Semantic definitions== -Dependent types can be used for stating stronger -**conditions of well-formedness** than ordinary types. -A simple example is a "smart house" system, which -defines voice commands for household appliances. This example -is borrowed from the -[Regulus Book http://cslipublications.stanford.edu/site/1575865262.html] -(Rayner & al. 2006). +We have seen that, +just like functional programming languages, GF has declarations +of functions, telling what the type of a function is. +But we have not yet shown how to **compute** +these functions: all we can do is provide them with arguments +and linearize the resulting terms. +Since our main interest is the well-formedness of expressions, +this has not yet bothered +us very much. As we will see, however, computation does play a role +even in the well-formedness of expressions when dependent types are +present. -One who enters a smart house can use speech to dim lights, switch -on the fan, etc. For each ``Kind`` of a device, there is a set of -``Actions`` that can be performed on it; thus one can dim the lights but - not the fan, for example. These dependencies can be expressed by -by making the type ``Action`` dependent on ``Kind``. We express this -as follows in ``cat`` declarations: +GF has a form of judgement for **semantic definitions**, +recognized by the key word ``def``. At its simplest, it is just +the definition of one constant, e.g. ``` - cat - Command ; - Kind ; - Action Kind ; - Device Kind ; + def one = Succ Zero ; ``` -The crucial use of the dependencies is made in the rule for forming commands: +We can also define a function with arguments, ``` - fun CAction : (k : Kind) -> Action k -> Device k -> Command ; + def Neg A = Impl A Abs ; ``` -In other words: an action and a device can be combined into a command only -if they are of the same ``Kind`` ``k``. If we have the functions +which is still a special case of the most general notion of +definition, that of a group of **pattern equations**: ``` - DKindOne : (k : Kind) -> Device k ; -- the light - - light, fan : Kind ; - dim : Action light ; + def + sum x Zero = x ; + sum x (Succ y) = Succ (Sum x y) ; ``` -we can form the syntax tree +To compute a term is, as in functional programming languages, +simply to follow a chain of reductions until no definition +can be applied. For instance, we compute ``` - CAction light dim (DKindOne light) + Sum one one --> + Sum (Succ Zero) (Succ Zero) --> + Succ (sum (Succ Zero) Zero) --> + Succ (Succ Zero) ``` -but we cannot form the trees +Computation in GF is performed with the ``pt`` command and the +``compute`` transformation, e.g. ``` - CAction light dim (DKindOne fan) - CAction fan dim (DKindOne light) - CAction fan dim (DKindOne fan) + > p -tr "1 + 1" | pt -transform=compute -tr | l + sum one one + Succ (Succ Zero) + s(s(0)) ``` -Linearization rules are written as usual: the concrete syntax does not -know if a category is a dependent type. In English, you can write as follows: + +The ``def`` definitions of a grammar induce a notion of +**definitional equality** among trees: two trees are +definitionally equal if they compute into the same tree. +Thus, trivially, all trees in a chain of computation +(such as the one above) +are definitionally equal to each other. So are the trees ``` - lincat Action = {s : Str} ; - lin CAction kind act dev = {s = act.s ++ dev.s} ; + sum Zero (Succ one) + Succ one + sum (sum Zero Zero) (sum (Succ Zero) one) ``` -Notice that the argument ``kind`` does not appear in the linearization. -The type checker will be able to reconstruct it from the ``dev`` argument. - -Parsing with dependent types is performed in two phases: -+ context-free parsing -+ filtering through type checker +and infinitely many other trees. +A fact that has to be emphasized about ``def`` definitions is that +they are //not// performed as a first step of linearization. +We say that **linearization is intensional**, which means that +the definitional equality of two trees does not imply that +they have the same linearizations. For instance, each of the seven terms +shown above has a different linearizations in arithmetic notation: +``` + 1 + 1 + s(0) + s(0) + s(s(0) + 0) + s(s(0)) + 0 + s(0) + s(1) + 0 + 0 + s(0) + 1 +``` +This notion of intensionality is +no more exotic than the intensionality of any **pretty-printing** +function of a programming language (function that shows +the expressions of the language as strings). It is vital for +pretty-printing to be intensional in this sense - if we want, +for instance, to trace a chain of computation by pretty-printing each +intermediate step, what we want to see is a sequence of different +expression, which are definitionally equal. -If you just parse in the usual way, you don't enter the second phase, and -the ``kind`` argument is not found: +What is more exotic is that GF has two ways of referring to the +abstract syntax objects. In the concrete syntax, the reference is intensional. +In the abstract syntax, the reference is extensional, since +**type checking is extensional**. The reason is that, +in the type theory with dependent types, types may depend on terms. +Two types depending on terms that are definitionally equal are +equal types. For instance, ``` - > parse "dim the light" - CAction ? dim (DKindOne light) + Proof (Odd one) + Proof (Odd (Succ Zero)) ``` -Moreover, type-incorrect commands are not rejected: +are equal types. Hence, any tree that type checks as a proof that +1 is odd also type checks as a proof that the successor of 0 is odd. +(Recall, in this connection, that the +arguments a category depends on never play any role +in the linearization of trees of that category, +nor in the definition of the linearization type.) + +In addition to computation, definitions impose a +**paraphrase** relation on expressions: +two strings are paraphrases if they +are linearizations of trees that are +definitionally equal. +Paraphrases are sometimes interesting for +translation: the **direct translation** +of a string, which is the linearization of the same tree +in the targer language, may be inadequate because it is e.g. +unidiomatic or ambiguous. In such a case, +the translation algorithm may be made to consider +translation by a paraphrase. + +To stress express the distinction between +**constructors** (=**canonical** functions) +and other functions, GF has a judgement form +``data`` to tell that certain functions are canonical, e.g. ``` - > parse "dim the fan" - CAction ? dim (DKindOne fan) + data Nat = Succ | Zero ; ``` -The question mark ``?`` is a **metavariable**, and is returned by the parser -for any subtree that is suppressed by a linearization rule. - -To get rid of metavariables, you must feed the parse result into the -second phase of **solving** them. The ``solve`` process uses the dependent -type checker to restore the values of the metavariables. It is invoked by -the command ``put_tree = pt`` with the flag ``-transform=solve``: +Unlike in Haskell, but similarly to ALF (where constructor functions +are marked with a flag ``C``), +new constructors can be added to +a type with new ``data`` judgements. The type signatures of constructors +are given separately, in ordinary ``fun`` judgements. +One can also write directly ``` - > parse "dim the light" | put_tree -transform=solve - CAction light dim (DKindOne light) + data Succ : Nat -> Nat ; ``` -The ``solve`` process may fail, in which case no tree is returned: +which is equivalent to the two judgements ``` - > parse "dim the fan" | put_tree -transform=solve - no tree found + fun Succ : Nat -> Nat ; + data Nat = Succ ; ``` +**Exercise**. Implement an interpreter of a small functional programming +language with natural numbers, lists, pairs, lambdas, etc. Use higher-order +abstract syntax with semantic definitions. As target language, use +your favourite programming language. -**Exercise**. Write an abstract syntax module with above contents -and an appropriate English concrete syntax. Try to parse the commands -//dim the light// and //dim the fan//, with and without ``solve`` filtering. +**Exercise**. To make your interpreted language look nice, use +**precedences** instead of putting parentheses everywhere. +You can use the [precedence library ../../lib/prelude/Precedence.gf] +of GF to facilitate this. -**Exercise**. Perform random and exhaustive generation, with and without -``solve`` filtering. -**Exercise**. Add some device kinds and actions to the grammar. +#PARTtwo +=Embedded grammars in Haskell= -==Polymorphism== +GF grammars can be used as parts of programs written in the +following languages. We will go through a skeleton application in +Haskell, while the next chapter will show how to build an +application in Java. -Sometimes an action can be performed on all kinds of devices. It would be -possible to introduce separate ``fun`` constants for each kind-action pair, -but this would be tedious. Instead, one can use **polymorphic** actions, -i.e. actions that take a ``Kind`` as an argument and produce an ``Action`` -for that ``Kind``: +We will show how to build a minimal resource grammar +application whose architecture scales up to much +larger applications. The application is run from the +shell by the command ``` - fun switchOn, switchOff : (k : Kind) -> Action k ; + math ``` -Functions that are not polymorphic are **monomorphic**. However, the -dichotomy into monomorphism and full polymorphism is not always sufficien -for good semantic modelling: very typically, some actions are defined -for a proper subset of devices, but not just one. For instance, both doors and -windows can be opened, whereas lights cannot. -We will return to this problem by introducing the -concept of **restricted polymorphism** later, -after a chapter on proof objects. - - +whereafter it reads user input in English and French. +To each input line, it answers by the truth value of +the sentence. +``` + ./math + zéro est pair + True + zero is odd + False + zero is even and zero is odd + False +``` +The source of the application consists of the following +files: +``` + LexEng.gf -- English instance of Lex + LexFre.gf -- French instance of Lex + Lex.gf -- lexicon interface + Makefile -- a makefile + MathEng.gf -- English instantiation of MathI + MathFre.gf -- French instantiation of MathI + Math.gf -- abstract syntax + MathI.gf -- concrete syntax functor for Math + Run.hs -- Haskell Main module +``` +The system was built in 22 steps explained below. -==Dependent types and spoken language models== -We have used dependent types to control semantic well-formedness -in grammars. This is important in traditional type theory -applications such as proof assistants, where only mathematically -meaningful formulas should be constructed. But semantic filtering has -also proved important in speech recognition, because it reduces the -ambiguity of the results. +==Writing GF grammars== +===Creating the first grammar=== -===Grammar-based language models=== +1. Write ``Math.gf``, which defines what you want to say. +``` + abstract Math = { + cat Prop ; Elem ; + fun + And : Prop -> Prop -> Prop ; + Even : Elem -> Prop ; + Zero : Elem ; + } +``` +2. Write ``Lex.gf``, which defines which language-dependent +parts are needed in the concrete syntax. These are mostly +words (lexicon), but can in fact be any operations. The definitions +only use resource abstract syntax, which is opened. +``` + interface Lex = open Syntax in { + oper + even_A : A ; + zero_PN : PN ; + } +``` +3. Write ``LexEng.gf``, the English implementation of ``Lex.gf`` +This module uses English resource libraries. +``` + instance LexEng of Lex = open GrammarEng, ParadigmsEng in { + oper + even_A = regA "even" ; + zero_PN = regPN "zero" ; -The standard way of using GF in speech recognition is by building -**grammar-based language models**. To this end, GF comes with compilers -into several formats that are used in speech recognition systems. -One such format is GSL, used in the [Nuance speech recognizer www.nuance.com]. -It is produced from GF simply by printing a grammar with the flag -``-printer=gsl``. + } ``` - > import -conversion=finite SmartEng.gf - > print_grammar -printer=gsl +4. Write ``MathI.gf``, a language-independent concrete syntax of +``Math.gf``. It opens interfaces. +which makes it an incomplete module, aka. parametrized module, aka. +functor. +``` + incomplete concrete MathI of Math = - ;GSL2.0 - ; Nuance speech recognition grammar for SmartEng - ; Generated by GF + open Syntax, Lex in { - .MAIN SmartEng_2 + flags startcat = Prop ; - SmartEng_0 [("switch" "off") ("switch" "on")] - SmartEng_1 ["dim" ("switch" "off") - ("switch" "on")] - SmartEng_2 [(SmartEng_0 SmartEng_3) - (SmartEng_1 SmartEng_4)] - SmartEng_3 ("the" SmartEng_5) - SmartEng_4 ("the" SmartEng_6) - SmartEng_5 "fan" - SmartEng_6 "light" + lincat + Prop = S ; + Elem = NP ; + lin + And x y = mkS and_Conj x y ; + Even x = mkS (mkCl x even_A) ; + Zero = mkNP zero_PN ; + } +``` +5. Write ``MathEng.gf``, which is just an instatiation of ``MathI.gf``, +replacing the interfaces by their English instances. This is the module +that will be used as a top module in GF, so it contains a path to +the libraries. +``` + instance LexEng of Lex = open SyntaxEng, ParadigmsEng in { + oper + even_A = mkA "even" ; + zero_PN = mkPN "zero" ; + } ``` -Now, GSL is a context-free format, so how does it cope with dependent types? -In general, dependent types can give rise to infinitely many basic types -(exercise!), whereas a context-free grammar can by definition only have -finitely many nonterminals. - -This is where the flag ``-conversion=finite`` is needed in the ``import`` -command. Its effect is to convert a GF grammar with dependent types to -one without, so that each instance of a dependent type is replaced by -an atomic type. This can then be used as a nonterminal in a context-free -grammar. The ``finite`` conversion presupposes that every -dependent type has only finitely many instances, which is in fact -the case in the ``Smart`` grammar. -**Exercise**. If you have access to the Nuance speech recognizer, -test it with GF-generated language models for ``SmartEng``. Do this -both with and without ``-conversion=finite``. +===Testing=== -**Exercise**. Construct an abstract syntax with infinitely many instances -of dependent types. +6. Test the grammar in GF by random generation and parsing. +``` + $ gf + > i MathEng.gf + > gr -tr | l -tr | p + And (Even Zero) (Even Zero) + zero is evenand zero is even + And (Even Zero) (Even Zero) +``` +When importing the grammar, you will fail if you haven't +- correctly defined your ``GF_LIB_PATH`` as ``GF/lib`` +- installed the resource package or + compiled the resource from source by ``make`` in ``GF/lib/resource-1.0`` -===Statistical language models=== -An alternative to grammar-based language models are -**statistical language models** (**SLM**s). An SLM is -built from a **corpus**, i.e. a set of utterances. It specifies the -probability of each **n-gram**, i.e. sequence of //n// words. The -typical value of //n// is 2 (bigrams) or 3 (trigrams). +===Adding a new language=== -One advantage of SLMs over grammar-based models is that they are -**robust**, i.e. they can be used to recognize sequences that would -be out of the grammar or the corpus. Another advantage is that -an SLM can be built "for free" if a corpus is available. +7. Now it is time to add a new language. Write a French lexicon ``LexFre.gf``: +``` + instance LexFre of Lex = open SyntaxFre, ParadigmsFre in { + oper + even_A = mkA "pair" ; + zero_PN = mkPN "zéro" ; + } +``` +8. You also need a French concrete syntax, ``MathFre.gf``: +``` + --# -path=.:present:prelude -However, collecting a corpus can require a lot of work, and writing -a grammar can be less demanding, especially with tools such as GF or -Regulus. This advantage of grammars can be combined with robustness -by creating a back-up SLM from a **synthesized corpus**. This means -simply that the grammar is used for generating such a corpus. -In GF, this can be done with the ``generate_trees`` command. -As with grammar-based models, the quality of the SLM is better -if meaningless utterances are excluded from the corpus. Thus -a good way to generate an SLM from a GF grammar is by using -dependent types and filter the results through the type checker: + concrete MathFre of Math = MathI with + (Syntax = SyntaxFre), + (Lex = LexFre) ; ``` - > generate_trees | put_trees -transform=solve | linearize +9. This time, you can test multilingual generation: +``` + > i MathFre.gf + > gr | tb + Even Zero + zéro est pair + zero is even ``` -**Exercise**. Measure the size of the corpus generated from -``SmartEng``, with and without type checker filtering. +===Extending the language=== +10. You want to add a predicate saying that a number is odd. +It is first added to ``Math.gf``: +``` + fun Odd : Elem -> Prop ; +``` +11. You need a new word in ``Lex.gf``. +``` + oper odd_A : A ; +``` +12. Then you can give a language-independent concrete syntax in +``MathI.gf``: +``` + lin Odd x = mkS (mkCl x odd_A) ; +``` +13. The new word is implemented in ``LexEng.gf``. +``` + oper odd_A = mkA "odd" ; +``` +14. The new word is implemented in ``LexFre.gf``. +``` + oper odd_A = mkA "impair" ; +``` +15. Now you can test with the extended lexicon. First empty +the environment to get rid of the old abstract syntax, then +import the new versions of the grammars. +``` + > e + > i MathEng.gf + > i MathFre.gf + > gr | tb + And (Odd Zero) (Even Zero) + zéro est impair et zéro est pair + zero is odd and zero is even +``` -==Digression: dependent types in concrete syntax== +==Building a user program== -The **functional fragment** of GF -terms and types comprises function types, applications, lambda -abstracts, constants, and variables. This fragment is similar in -abstract and concrete syntax. In particular, -dependent types are also available in concrete syntax. -We have not made use of them yet, -but we will now look at one example of how they -can be used. +===Producing a compiled grammar package=== -Those readers who are familiar with functional programming languages -like ML and Haskell, may already have missed **polymorphic** -functions. For instance, Haskell programmers have access to -the functions +16. Your grammar is going to be used by persons wh``MathEng.gf``o do not need +to compile it again. They may not have access to the resource library, +either. Therefore it is advisable to produce a multilingual grammar +package in a single file. We call this package ``math.gfcm`` and +produce it, when we have ``MathEng.gf`` and +``MathEng.gf`` in the GF state, by the command ``` - const :: a -> b -> a - const c _ = c - - flip :: (a -> b -> c) -> b -> a -> c - flip f y x = f x y + > pm | wf math.gfcm ``` -which can be used for any given types ``a``,``b``, and ``c``. -The GF counterpart of polymorphic functions are **monomorphic** -functions with explicit **type variables**. Thus the above -definitions can be written -``` - oper const :(a,b : Type) -> a -> b -> a = - \_,_,c,_ -> c ; - oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c = - \_,_,_,f,x,y -> f y x ; +===Writing the Haskell application=== + +17. Write the Haskell main file ``Run.hs``. It uses the ``EmbeddedAPI`` +module defining some basic functionalities such as parsing. +The answer is produced by an interpreter of trees returned by the parser. ``` -When the operations are used, the type checker requires -them to be equipped with all their arguments; this may be a nuisance -for a Haskell or ML programmer. +module Main where +import GSyntax +import GF.Embed.EmbedAPI +main :: IO () +main = do + gr <- file2grammar "math.gfcm" + loop gr -==Proof objects== +loop :: MultiGrammar -> IO () +loop gr = do + s <- getLine + interpret gr s + loop gr -Perhaps the most well-known idea in constructive type theory is -the **Curry-Howard isomorphism**, also known as the -**propositions as types principle**. Its earliest formulations -were attempts to give semantics to the logical systems of -propositional and predicate calculus. In this section, we will consider -a more elementary example, showing how the notion of proof is useful -outside mathematics, as well. +interpret :: MultiGrammar -> String -> IO () +interpret gr s = do + let tss = parseAll gr "Prop" s + case (concat tss) of + [] -> putStrLn "no parse" + t:_ -> print $ answer $ fg t -We first define the category of unary (also known as Peano-style) -natural numbers: +answer :: GProp -> Bool +answer p = case p of + (GOdd x1) -> odd (value x1) + (GEven x1) -> even (value x1) + (GAnd x1 x2) -> answer x1 && answer x2 + +value :: GElem -> Int +value e = case e of + GZero -> 0 ``` - cat Nat ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; + +18. The syntax trees manipulated by the interpreter are not raw +GF trees, but objects of the Haskell datatype ``GProp``. +From any GF grammar, a file ``GFSyntax.hs`` with +datatypes corresponding to its abstract +syntax can be produced by the command ``` -The **successor function** ``Succ`` generates an infinite -sequence of natural numbers, beginning from ``Zero``. + > pg -printer=haskell | wf GSyntax.hs +``` +The module also defines the overloaded functions +``gf`` and ``fg`` for translating from these types to +raw trees and back. -We then define what it means for a number //x// to be //less than// -a number //y//. Our definition is based on two axioms: -- ``Zero`` is less than ``Succ`` //y// for any //y//. -- If //x// is less than //y//, then ``Succ`` //x// is less than ``Succ`` //y//. +===Compiling the Haskell grammar=== -The most straightforward way of expressing these axioms in type theory -is as typing judgements that introduce objects of a type ``Less`` //x y//: -``` - cat Less Nat Nat ; - fun lessZ : (y : Nat) -> Less Zero (Succ y) ; - fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ; -``` -Objects formed by ``lessZ`` and ``lessS`` are -called **proof objects**: they establish the truth of certain -mathematical propositions. -For instance, the fact that 2 is less that -4 has the proof object +19. Before compiling ``Run.hs``, you must check that the +embedded GF modules are found. The easiest way to do this +is by two symbolic links to your GF source directories: ``` - lessS (Succ Zero) (Succ (Succ (Succ Zero))) - (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero))) + $ ln -s /home/aarne/GF/src/GF + $ ln -s /home/aarne/GF/src/Transfer/ ``` -whose type is + +20. Now you can run the GHC Haskell compiler to produce the program. ``` - Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero)))) + $ ghc --make -o math Run.hs ``` -which is the formalization of the proposition that 2 is less than 4. +The program can be tested with the command ``./math``. -GF grammars can be used to provide a **semantic control** of -well-formedness of expressions. We have already seen examples of this: -the grammar of well-formed actions on household devices. By introducing proof objects -we have now added a very powerful technique of expressing semantic conditions. -A simple example of the use of proof objects is the definition of -well-formed //time spans//: a time span is expected to be from an earlier to -a later time: -``` - from 3 to 8 -``` -is thus well-formed, whereas -``` - from 8 to 3 -``` -is not. The following rules for spans impose this condition -by using the ``Less`` predicate: -``` - cat Span ; - fun span : (m,n : Nat) -> Less m n -> Span ; -``` +===Building a distribution=== -**Exercise**. Write an abstract and concrete syntax with the -concepts of this section, and experiment with it in GF. +21. For a stand-alone binary-only distribution, only +the two files ``math`` and ``math.gfcm`` are needed. +For a source distribution, the files mentioned in +the beginning of this documents are needed. -**Exercise**. Define the notions of "even" and "odd" in terms -of proof objects. **Hint**. You need one function for proving -that 0 is even, and two other functions for propagating the -properties. +===Using a Makefile=== +22. As a part of the source distribution, a ``Makefile`` is +essential. The ``Makefile`` is also useful when developing the +application. It should always be possible to build an executable +from source by typing ``make``. Here is a minimal such ``Makefile``: +``` + all: + echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf + echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm + ghc --make -o math Run.hs +``` +==The Embedded GF Haskell API== -===Proof-carrying documents=== -Another possible application of proof objects is **proof-carrying documents**: -to be semantically well-formed, the abstract syntax of a document must contain a proof -of some property, although the proof is not shown in the concrete document. -Think, for instance, of small documents describing flight connections: -//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.// +=Embedded grammars in Java= -The well-formedness of this text is partly expressible by dependent typing: -``` - cat - City ; - Flight City City ; - fun - Gothenburg, Frankfurt, Prague : City ; - LH3043 : Flight Gothenburg Frankfurt ; - OK0537 : Flight Frankfurt Prague ; -``` -This rules out texts saying //take OK0537 from Gothenburg to Prague//. -However, there is a -further condition saying that it must be possible to -change from LH3043 to OK0537 in Frankfurt. -This can be modelled as a proof object of a suitable type, -which is required by the constructor -that connects flights. -``` - cat - IsPossible (x,y,z : City)(Flight x y)(Flight y z) ; - fun - Connect : (x,y,z : City) -> - (u : Flight x y) -> (v : Flight y z) -> - IsPossible x y z u v -> Flight x z ; -``` +Forthcoming; at the moment, the document + [``http://www.cs.chalmers.se/~bringert/gf/gf-java.html`` http://www.cs.chalmers.se/~bringert/gf/gf-java.html] -==Restricted polymorphism== +by Björn Bringert gives more information on Java. -In the first version of the smart house grammar ``Smart``, -all Actions were either of -- **monomorphic**: defined for one Kind -- **polymorphic**: defined for all Kinds +=Spoken language translators= + + +=Multimodal dialogue systems= -To make this scale up for new Kinds, we can refine this to -**restricted polymorphism**: defined for Kinds of a certain **class** +=Grammars of formal languages= -The notion of class can be expressed in abstract syntax -by using the Curry-Howard isomorphism as follows: -- a class is a **predicate** of Kinds - i.e. a type depending of Kinds -- a Kind is in a class if there is a proof object of this type +==Precedence and fixity== +==Higher-order abstract syntax== -Here is an example with switching and dimming. The classes are called -``switchable`` and ``dimmable``. -``` -cat - Switchable Kind ; - Dimmable Kind ; -fun - switchable_light : Switchable light ; - switchable_fan : Switchable fan ; - dimmable_light : Dimmable light ; +==Extensible natural-language interfaces== - switchOn : (k : Kind) -> Switchable k -> Action k ; - dim : (k : Kind) -> Dimmable k -> Action k ; -``` -One advantage of this formalization is that classes for new -actions can be added incrementally. -**Exercise**. Write a new version of the ``Smart`` grammar with -classes, and test it in GF. -**Exercise**. Add some actions, kinds, and classes to the grammar. -Try to port the grammar to a new language. You will probably find -out that restricted polymorphism works differently in different languages. -For instance, in Finnish not only doors but also TVs and radios -can be "opened", which means switching them on. +=Implementing morphology and syntax= +In this chapter, we will dig deeper into linguistic concepts than +so far. We will build an implementation of a linguistic motivated +fragment of English and Italian, covering basic morphology and syntax. +The result is a miniature of the GF resource library, whose internals will +be covered in the next chapter. There are two main purposes +for this chapter: +- to understand the linguistic concepts underlying the resource + grammar library +- to get practice in the more advanced constructs of concrete syntax -==Variable bindings== -Mathematical notation and programming languages have -expressions that **bind** variables. For instance, -a universally quantifier proposition -``` - (All x)B(x) -``` -consists of the **binding** ``(All x)`` of the variable ``x``, -and the **body** ``B(x)``, where the variable ``x`` can have -**bound occurrences**. -Variable bindings appear in informal mathematical language as well, for -instance, -``` - for all x, x is equal to x - the function that for any numbers x and y returns the maximum of x+y - and x*y +==Lexical vs. syntactic rules== - Let x be a natural number. Assume that x is even. Then x + 3 is odd. -``` -In type theory, variable-binding expression forms can be formalized -as functions that take functions as arguments. The universal -quantifier is defined -``` - fun All : (Ind -> Prop) -> Prop -``` -where ``Ind`` is the type of individuals and ``Prop``, -the type of propositions. If we have, for instance, the equality predicate -``` - fun Eq : Ind -> Ind -> Prop -``` -we may form the tree -``` - All (\x -> Eq x x) -``` -which corresponds to the ordinary notation -``` - (All x)(x = x). -``` -An abstract syntax where trees have functions as arguments, as in -the two examples above, has turned out to be precisely the right -thing for the semantics and computer implementation of -variable-binding expressions. The advantage lies in the fact that -only one variable-binding expression form is needed, the lambda abstract -``\x -> b``, and all other bindings can be reduced to it. -This makes it easier to implement mathematical theories and reason -about them, since variable binding is tricky to implement and -to reason about. The idea of using functions as arguments of -syntactic constructors is known as **higher-order abstract syntax**. +So far we have seen a grammar from a semantic point of view: +a grammar specifies a system of meanings (specified in the abstract syntax) and +tells how they are expressed in some language (as specified in a concrete syntax). +In resource grammars, as in linguistic tradition, the goal is to +specify the **grammatically correct combinations of words**, whatever their +meanings are. -The question now arises: how to define linearization rules -for variable-binding expressions? -Let us first consider universal quantification, -``` - fun All : (Ind -> Prop) -> Prop -``` -We write -``` - lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s} -``` -to obtain the form shown above. -This linearization rule brings in a new GF concept - the ``$0`` -field of ``B`` containing a bound variable symbol. -The general rule is that, if an argument type of a function is -itself a function type ``A -> C``, the linearization type of -this argument is the linearization type of ``C`` -together with a new field ``$0 : Str``. In the linearization rule -for ``All``, the argument ``B`` thus has the linearization -type -``` - {$0 : Str ; s : Str}, -``` -since the linearization type of ``Prop`` is -``` - {s : Str} -``` -In other words, the linearization of a function -consists of a linearization of the body together with a -field for a linearization of the bound variable. -Those familiar with type theory or lambda calculus -should notice that GF requires trees to be in -**eta-expanded** form in order to be linearizable: -any function of type -``` - A -> B -``` -always has a syntax tree of the form -``` - \x -> b -``` -where ``b : B`` under the assumption ``x : A``. -It is in this form that an expression can be analysed -as having a bound variable and a body. +Thus the grammar has two kinds of categories and two kinds of rules: +- lexical: + - lexical categories, to classify words + - lexical rules, to define words their properties -Given the linearization rule -``` - lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"} -``` -the linearization of -``` - \x -> Eq x x -``` -is the record -``` - {$0 = "x", s = ["( x = x )"]} -``` -Thus we can compute the linearization of the formula, -``` - All (\x -> Eq x x) --> {s = "[( All x ) ( x = x )]"}. -``` -How did we get the //linearization// of the variable ``x`` -into the string ``"x"``? GF grammars have no rules for -this: it is just hard-wired in GF that variable symbols are -linearized into the same strings that represent them in -the print-out of the abstract syntax. +- phrasal (combinatorial, syntactic): + - phrasal categories, to classify phrases of arbitrary size + - phrasal rules, to combine phrases into larger phrases -To be able to //parse// variable symbols, however, GF needs to know what -to look for (instead of e.g. trying to parse //any// -string as a variable). What strings are parsed as variable symbols -is defined in the lexical analysis part of GF parsing -``` - > p -cat=Prop -lexer=codevars "(All x)(x = x)" - All (\x -> Eq x x) -``` -(see more details on lexers below). If several variables are bound in the -same argument, the labels are ``$0, $1, $2``, etc. +Many grammar formalisms force a radical distinction between the lexical and syntactic +components; sometimes it is not even possible to express the two kinds of rules in +the same formalism. GF has no such restrictions. Nevertheless, it has turned out +to be a good discipline to maintain a distinction between the lexical and syntactic +components. -**Exercise**. Write an abstract syntax of the whole -**predicate calculus**, with the -**connectives** "and", "or", "implies", and "not", and the -**quantifiers** "exists" and "for all". Use higher-order functions -to guarantee that unbounded variables do not occur. -**Exercise**. Write a concrete syntax for your favourite -notation of predicate calculus. Use Latex as target language -if you want nice output. You can also try producing Haskell boolean -expressions. Use as many parenthesis as you need to -guarantee non-ambiguity. +==The abstract syntax== +Let us go through the abstract syntax contained in the module ``Syntax``. +It can be found in the file +[``examples/tutorial/syntax/Syntax.gf`` examples/tutorial/syntax/Syntax.gf]. -==Semantic definitions== -We have seen that, -just like functional programming languages, GF has declarations -of functions, telling what the type of a function is. -But we have not yet shown how to **compute** -these functions: all we can do is provide them with arguments -and linearize the resulting terms. -Since our main interest is the well-formedness of expressions, -this has not yet bothered -us very much. As we will see, however, computation does play a role -even in the well-formedness of expressions when dependent types are -present. +===Lexical categories=== -GF has a form of judgement for **semantic definitions**, -recognized by the key word ``def``. At its simplest, it is just -the definition of one constant, e.g. -``` - def one = Succ Zero ; -``` -We can also define a function with arguments, -``` - def Neg A = Impl A Abs ; -``` -which is still a special case of the most general notion of -definition, that of a group of **pattern equations**: -``` - def - sum x Zero = x ; - sum x (Succ y) = Succ (Sum x y) ; -``` -To compute a term is, as in functional programming languages, -simply to follow a chain of reductions until no definition -can be applied. For instance, we compute +Words are classified into two kinds of categories: **closed** and +**open**. The definining property of closed categories is that the +words of them can easily be enumerated; it is very seldom that any +new words are introduced in them. In general, closed categories +contain **structural words**, also known as **function words**. +In ``Syntax``, we have just two closed lexical categories: ``` - Sum one one --> - Sum (Succ Zero) (Succ Zero) --> - Succ (sum (Succ Zero) Zero) --> - Succ (Succ Zero) + cat + Det ; -- determiner e.g. "this" + AdA ; -- adadjective e.g. "very" ``` -Computation in GF is performed with the ``pt`` command and the -``compute`` transformation, e.g. +We have already used words of both categories in the ``Food`` +examples; they have just not been assigned a category, but +treated as **syncategorematic**. In GF, a syncategoramatic +word is one that is introduced in a linearization rule of +some construction alongside with some other expressions that +are combined; there is no abstract syntax tree for that word +alone. Thus in the rules ``` - > p -tr "1 + 1" | pt -transform=compute -tr | l - sum one one - Succ (Succ Zero) - s(s(0)) + fun That : Kind -> Item ; + lin That k = {"that" ++ k.s} ; ``` +the word //that// is syncategoramatic. In linguistically motivated +grammars, syncategorematic words are usually avoided, whereas in +semantically motivated grammars, structural words are often treated +as syncategoramatic. This is partly so because the concept expressed +by a structural word in one language is often expressed by some other +means than an individual word in another. For instance, the definite +article //the// is a determiner word in English, whereas Swedish expresses +determination by inflecting the determined noun: //the wine// is //vinet// +in Swedish. -The ``def`` definitions of a grammar induce a notion of -**definitional equality** among trees: two trees are -definitionally equal if they compute into the same tree. -Thus, trivially, all trees in a chain of computation -(such as the one above) -are definitionally equal to each other. So are the trees +As for open classes, we will use four: ``` - sum Zero (Succ one) - Succ one - sum (sum Zero Zero) (sum (Succ Zero) one) + cat + N ; -- noun e.g. "pizza" + A ; -- adjective e.g. "good" + V ; -- intransitive verb e.g. "boil" + V2 ; -- two-place verb e.g. "eat" ``` -and infinitely many other trees. +Two-place verbs differ from intransitive verbs syntactically by +taking an object. In the lexicon, they must be equipped with information +on the //case// of the object in some languages (such as German and Latin), +and on the //preposition// in some languages (such as English). -A fact that has to be emphasized about ``def`` definitions is that -they are //not// performed as a first step of linearization. -We say that **linearization is intensional**, which means that -the definitional equality of two trees does not imply that -they have the same linearizations. For instance, each of the seven terms -shown above has a different linearizations in arithmetic notation: + + +===Lexical rules=== + +The words of closed categories can be listed once and for all in a +library. The ``Syntax`` module has the following: ``` - 1 + 1 - s(0) + s(0) - s(s(0) + 0) - s(s(0)) - 0 + s(0) - s(1) - 0 + 0 + s(0) + 1 + fun + this_Det, that_Det, these_Det, those_Det, + every_Det, theSg_Det, thePl_Det, indef_Det, plur_Det, two_Det : Det ; + very_AdA : AdA ; ``` -This notion of intensionality is -no more exotic than the intensionality of any **pretty-printing** -function of a programming language (function that shows -the expressions of the language as strings). It is vital for -pretty-printing to be intensional in this sense - if we want, -for instance, to trace a chain of computation by pretty-printing each -intermediate step, what we want to see is a sequence of different -expression, which are definitionally equal. +The naming convention for lexical rules is that we use a word followed by +the category. In this way we can for instance distinguish the determiner +//that// from the conjunction //that//. But there are also rules where this +does not quite suffice. English has no distinction between singular and +plural //the//; yet they behave differently as determiners, analogously to +//this// vs. //these//. The function //indef_Det// is the indefinite article +//a//, whereas //plur_Det// is semantically the plural indefinite article, +which has no separate word in English, as in some other languages, e.g. +//des// in French. -What is more exotic is that GF has two ways of referring to the -abstract syntax objects. In the concrete syntax, the reference is intensional. -In the abstract syntax, the reference is extensional, since -**type checking is extensional**. The reason is that, -in the type theory with dependent types, types may depend on terms. -Two types depending on terms that are definitionally equal are -equal types. For instance, +Open lexical categories have no objects in ``Syntax``. However, we can +build lexical modules as extensions of ``Syntax``. An example is +[``examples/tutorial/syntax/Test.gf`` examples/tutorial/syntax/Test.gf], +which we use to test the syntax. Its vocabulary is from the food domain: ``` - Proof (Odd one) - Proof (Odd (Succ Zero)) + abstract Test = Syntax ** { + fun + wine_N, cheese_N, fish_N, pizza_N, waiter_N, customer_N : N ; + fresh_A, warm_A, italian_A, expensive_A, delicious_A, boring_A : A ; + stink_V : V ; + eat_V2, love_V2, talk_V2 : V2 ; + } ``` -are equal types. Hence, any tree that type checks as a proof that -1 is odd also type checks as a proof that the successor of 0 is odd. -(Recall, in this connection, that the -arguments a category depends on never play any role -in the linearization of trees of that category, -nor in the definition of the linearization type.) -In addition to computation, definitions impose a -**paraphrase** relation on expressions: -two strings are paraphrases if they -are linearizations of trees that are -definitionally equal. -Paraphrases are sometimes interesting for -translation: the **direct translation** -of a string, which is the linearization of the same tree -in the targer language, may be inadequate because it is e.g. -unidiomatic or ambiguous. In such a case, -the translation algorithm may be made to consider -translation by a paraphrase. +===Phrasal categories=== -To stress express the distinction between -**constructors** (=**canonical** functions) -and other functions, GF has a judgement form -``data`` to tell that certain functions are canonical, e.g. +The topmost category in ``Syntax`` is ``Phr``, **phrase**, covering +all complete sentences, which have a punctuation mark and could be +used alone to make an utterance. In addition to **declarative sentences** +``S``, there are also **question sentences** ``QS``: ``` - data Nat = Succ | Zero ; + cat + Phr ; -- any complete sentence e.g. "Is this pizza good?" + S ; -- declarative sentence e.g. "this pizza is good" + QS ; -- question sentence e.g. "is this pizza good" ``` -Unlike in Haskell, but similarly to ALF (where constructor functions -are marked with a flag ``C``), -new constructors can be added to -a type with new ``data`` judgements. The type signatures of constructors -are given separately, in ordinary ``fun`` judgements. -One can also write directly +The main parts of a sentence are usually taken to be the **noun phrase** ``NP`` and +the **verb phrase** ``VP``. In analogy to noun phrases, we consider +**interrogative phrases**, which are used for forming question sentences. ``` - data Succ : Nat -> Nat ; + NP ; -- noun phrase e.g. "this pizza" + IP ; -- interrogative phrase e.g "which pizza" + VP ; -- verb phrase e.g. "is good" ``` -which is equivalent to the two judgements +The "smallest" phrasal categories are **common nouns** ``CN`` and +**adjectival phrases** ``AP``: ``` - fun Succ : Nat -> Nat ; - data Nat = Succ ; + CN ; -- common noun phrase e.g. "very good pizza" + AP ; -- adjectival phrase e.g. "very good" ``` +Common nouns are typically combined with determiners to build noun +phrases, whereas adjectival phrases are combined with the copula to +form verb phrases. -**Exercise**. Implement an interpreter of a small functional programming -language with natural numbers, lists, pairs, lambdas, etc. Use higher-order -abstract syntax with semantic definitions. As target language, use -your favourite programming language. - -**Exercise**. To make your interpreted language look nice, use -**precedences** instead of putting parentheses everywhere. -You can use the [precedence library ../../lib/prelude/Precedence.gf] -of GF to facilitate this. - - - -#PARTtwo - -=Embedded grammars in Haskell= -GF grammars can be used as parts of programs written in the -following languages. We will go through a skeleton application in -Haskell, while the next chapter will show how to build an -application in Java. +===Phrasal rules=== -We will show how to build a minimal resource grammar -application whose architecture scales up to much -larger applications. The application is run from the -shell by the command +Phrasal rules specify how complex phrases are built from simpler ones. +At the bottom, there are **lexical insertion rules** telling how +words from each lexical category are "promoted" to phrases; i.e. how +the most elementary phrases are built. ``` - math + fun + UseN : N -> CN ; -- pizza + UseA : A -> AP ; -- be good + UseV : V -> VP ; -- stink ``` -whereafter it reads user input in English and French. -To each input line, it answers by the truth value of -the sentence. +Structural words usually don't form phrases themselves; thus they +are at the first place used for promoting "lower" phrase categories +to "higher" ones, ``` - ./math - zéro est pair - True - zero is odd - False - zero is even and zero is odd - False + DetCN : Det -> CN -> NP ; -- this pizza ``` -The source of the application consists of the following -files: +or for recursively building more complex phrases: ``` - LexEng.gf -- English instance of Lex - LexFre.gf -- French instance of Lex - Lex.gf -- lexicon interface - Makefile -- a makefile - MathEng.gf -- English instantiation of MathI - MathFre.gf -- French instantiation of MathI - Math.gf -- abstract syntax - MathI.gf -- concrete syntax functor for Math - Run.hs -- Haskell Main module + AdAP : AdA -> AP -> AP ; -- very good ``` -The system was built in 22 steps explained below. - - -==Writing GF grammars== - -===Creating the first grammar=== - -1. Write ``Math.gf``, which defines what you want to say. +In analogy to ``DetCN``, we could have a rule forming interrogative +noun phrases with interogative determiners such as //which//. In +``Syntax``, we however make a shortcut and just treat //which// +syncategorematically: ``` - abstract Math = { - cat Prop ; Elem ; - fun - And : Prop -> Prop -> Prop ; - Even : Elem -> Prop ; - Zero : Elem ; - } + WhichCN : CN -> IP ; ``` -2. Write ``Lex.gf``, which defines which language-dependent -parts are needed in the concrete syntax. These are mostly -words (lexicon), but can in fact be any operations. The definitions -only use resource abstract syntax, which is opened. +Starting from the top of the grammar, we need two rules promoting +sentences and questions into complete phrases: ``` - interface Lex = open Syntax in { - oper - even_A : A ; - zero_PN : PN ; - } + PhrS : S -> Phr ; -- This pizza is good. + PhrQS : QS -> Phr ; -- Is this pizza good? ``` -3. Write ``LexEng.gf``, the English implementation of ``Lex.gf`` -This module uses English resource libraries. +The most central rule in most grammars is the **predication rule**, +which combines a noun +phrase and a verb phrase into a sentence. In the present grammar, +though not in the full resource grammar library, we split this +rule into two: one for positive and one for negated sentences: ``` - instance LexEng of Lex = open GrammarEng, ParadigmsEng in { - oper - even_A = regA "even" ; - zero_PN = regPN "zero" ; - - } + PosVP, NegVP : NP -> VP -> S ; -- this pizza is/isn't good +``` +In the same way, question sentences can be formed with these two +**polarities**: +``` + QPosVP, QNegVP : NP -> VP -> QS ; -- is/isn't this pizza good +``` +Another form of questions are ones with interrogative noun phrases: ``` -4. Write ``MathI.gf``, a language-independent concrete syntax of -``Math.gf``. It opens interfaces. -which makes it an incomplete module, aka. parametrized module, aka. -functor. + IPPosVP, IPNegVP : IP -> VP -> QS ; -- which pizza is/isn't good ``` - incomplete concrete MathI of Math = - - open Syntax, Lex in { - - flags startcat = Prop ; - - lincat - Prop = S ; - Elem = NP ; - lin - And x y = mkS and_Conj x y ; - Even x = mkS (mkCl x even_A) ; - Zero = mkNP zero_PN ; - } +Verb phrases can be built by **complementation**, where a two-place +verb needs a noun phrase complement, and the (syncategoriematic) copula +can take an adjectival phrase as complement: ``` -5. Write ``MathEng.gf``, which is just an instatiation of ``MathI.gf``, -replacing the interfaces by their English instances. This is the module -that will be used as a top module in GF, so it contains a path to -the libraries. + ComplV2 : V2 -> NP -> VP ; -- eat this pizza + ComplAP : AP -> VP ; -- be good ``` - instance LexEng of Lex = open SyntaxEng, ParadigmsEng in { - oper - even_A = mkA "even" ; - zero_PN = mkPN "zero" ; - } +**Adjectival modification** is a recursive rule for forming common nouns: ``` - - -===Testing=== - -6. Test the grammar in GF by random generation and parsing. + ModCN : AP -> CN -> CN ; -- warm pizza ``` - $ gf - > i MathEng.gf - > gr -tr | l -tr | p - And (Even Zero) (Even Zero) - zero is evenand zero is even - And (Even Zero) (Even Zero) +Finally, we have two special rules that are instances of so-called +**wh-movement**. The idea with this term is that a question such +as //which pizza do you eat// is a result of moving //which pizza// +from its "proper" place which is after the verb: //you eat which pizza//: ``` -When importing the grammar, you will fail if you haven't -- correctly defined your ``GF_LIB_PATH`` as ``GF/lib`` -- installed the resource package or - compiled the resource from source by ``make`` in ``GF/lib/resource-1.0`` + IPPosV2, IPNegV2 : IP -> NP -> V2 -> QS ; -- which pizza do/don't you eat +``` +The full resource grammar has a more general treatment of this phenomenon. +But these special cases are already quite useful; moreover, they illustrate +variation that is possible in English between +**pied piping** (//about which pizzza do you talk//) and +**preposition stranding** (//which pizzza do you talk about//). +==Concrete syntax: English morphology== -===Adding a new language=== +===Worst-case functions and data abstraction=== -7. Now it is time to add a new language. Write a French lexicon ``LexFre.gf``: +Some English nouns, such as ``mouse``, are so irregular that +it makes no sense to see them as instances of a paradigm. Even +then, it is useful to perform **data abstraction** from the +definition of the type ``Noun``, and introduce a constructor +operation, a **worst-case function** for nouns: ``` - instance LexFre of Lex = open SyntaxFre, ParadigmsFre in { - oper - even_A = mkA "pair" ; - zero_PN = mkPN "zéro" ; - } + oper mkNoun : Str -> Str -> Noun = \x,y -> { + s = table { + Sg => x ; + Pl => y + } + } ; ``` -8. You also need a French concrete syntax, ``MathFre.gf``: +Thus we can define ``` - --# -path=.:present:prelude - - concrete MathFre of Math = MathI with - (Syntax = SyntaxFre), - (Lex = LexFre) ; + lin Mouse = mkNoun "mouse" "mice" ; ``` -9. This time, you can test multilingual generation: +and ``` - > i MathFre.gf - > gr | tb - Even Zero - zéro est pair - zero is even + oper regNoun : Str -> Noun = \x -> + mkNoun x (x + "s") ; ``` +instead of writing the inflection tables explicitly. + +The grammar engineering advantage of worst-case functions is that +the author of the resource module may change the definitions of +``Noun`` and ``mkNoun``, and still retain the +interface (i.e. the system of type signatures) that makes it +correct to use these functions in concrete modules. In programming +terms, ``Noun`` is then treated as an **abstract datatype**. -===Extending the language=== +===A system of paradigms using predefined string operations=== -10. You want to add a predicate saying that a number is odd. -It is first added to ``Math.gf``: +In addition to the completely regular noun paradigm ``regNoun``, +some other frequent noun paradigms deserve to be +defined, for instance, ``` - fun Odd : Elem -> Prop ; + sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ; ``` -11. You need a new word in ``Lex.gf``. +What about nouns like //fly//, with the plural //flies//? The already +available solution is to use the longest common prefix +//fl// (also known as the **technical stem**) as argument, and define ``` - oper odd_A : A ; + yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ; ``` -12. Then you can give a language-independent concrete syntax in -``MathI.gf``: +But this paradigm would be very unintuitive to use, because the technical stem +is not an existing form of the word. A better solution is to use +the lemma and a string operator ``init``, which returns the initial segment (i.e. +all characters but the last) of a string: ``` - lin Odd x = mkS (mkCl x odd_A) ; + yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ; ``` -13. The new word is implemented in ``LexEng.gf``. +The operation ``init`` belongs to a set of operations in the +resource module ``Prelude``, which therefore has to be +``open``ed so that ``init`` can be used. ``` - oper odd_A = mkA "odd" ; + > cc init "curry" + "curr" ``` -14. The new word is implemented in ``LexFre.gf``. +Its dual is ``last``: ``` - oper odd_A = mkA "impair" ; + > cc last "curry" + "y" ``` -15. Now you can test with the extended lexicon. First empty -the environment to get rid of the old abstract syntax, then -import the new versions of the grammars. +As generalizations of the library functions ``init`` and ``last``, GF has +two predefined funtions: +``Predef.dp``, which "drops" suffixes of any length, +and ``Predef.tk``, which "takes" a prefix +just omitting a number of characters from the end. For instance, ``` - > e - > i MathEng.gf - > i MathFre.gf - > gr | tb - And (Odd Zero) (Even Zero) - zéro est impair et zéro est pair - zero is odd and zero is even + > cc Predef.tk 3 "worried" + "worr" + > cc Predef.dp 3 "worried" + "ied" ``` +The prefix ``Predef`` is given to a handful of functions that could +not be defined internally in GF. They are available in all modules +without explicit ``open`` of the module ``Predef``. -==Building a user program== -===Producing a compiled grammar package=== +===An intelligent noun paradigm using pattern matching=== -16. Your grammar is going to be used by persons wh``MathEng.gf``o do not need -to compile it again. They may not have access to the resource library, -either. Therefore it is advisable to produce a multilingual grammar -package in a single file. We call this package ``math.gfcm`` and -produce it, when we have ``MathEng.gf`` and -``MathEng.gf`` in the GF state, by the command -``` - > pm | wf math.gfcm +It may be hard for the user of a resource morphology to pick the right +inflection paradigm. A way to help this is to define a more intelligent +paradigm, which chooses the ending by first analysing the lemma. +The following variant for English regular nouns puts together all the +previously shown paradigms, and chooses one of them on the basis of +the final letter of the lemma (found by the prelude operation ``last``). ``` - - -===Writing the Haskell application=== - -17. Write the Haskell main file ``Run.hs``. It uses the ``EmbeddedAPI`` -module defining some basic functionalities such as parsing. -The answer is produced by an interpreter of trees returned by the parser. + regNoun : Str -> Noun = \s -> case last s of { + "s" | "z" => mkNoun s (s + "es") ; + "y" => mkNoun s (init s + "ies") ; + _ => mkNoun s (s + "s") + } ; ``` -module Main where - -import GSyntax -import GF.Embed.EmbedAPI - -main :: IO () -main = do - gr <- file2grammar "math.gfcm" - loop gr +The paradigms ``regNoun`` does not give the correct forms for +all nouns. For instance, //mouse - mice// and +//fish - fish// must be given by using ``mkNoun``. +Also the word //boy// would be inflected incorrectly; to prevent +this, either use ``mkNoun`` or modify +``regNoun`` so that the ``"y"`` case does not +apply if the second-last character is a vowel. -loop :: MultiGrammar -> IO () -loop gr = do - s <- getLine - interpret gr s - loop gr +**Exercise**. Extend the ``regNoun`` paradigm so that it takes care +of all variations there are in English. Test it with the nouns +//ax//, //bamboo//, //boy//, //bush//, //hero//, //match//. +**Hint**. The library functions ``Predef.dp`` and ``Predef.tk`` +are useful in this task. -interpret :: MultiGrammar -> String -> IO () -interpret gr s = do - let tss = parseAll gr "Prop" s - case (concat tss) of - [] -> putStrLn "no parse" - t:_ -> print $ answer $ fg t +**Exercise**. The same rules that form plural nouns in English also +apply in the formation of third-person singular verbs. +Write a regular verb paradigm that uses this idea, but first +rewrite ``regNoun`` so that the analysis needed to build //s//-forms +is factored out as a separate ``oper``, which is shared with +``regVerb``. -answer :: GProp -> Bool -answer p = case p of - (GOdd x1) -> odd (value x1) - (GEven x1) -> even (value x1) - (GAnd x1 x2) -> answer x1 && answer x2 -value :: GElem -> Int -value e = case e of - GZero -> 0 -``` +===Morphological resource modules=== -18. The syntax trees manipulated by the interpreter are not raw -GF trees, but objects of the Haskell datatype ``GProp``. -From any GF grammar, a file ``GFSyntax.hs`` with -datatypes corresponding to its abstract -syntax can be produced by the command -``` - > pg -printer=haskell | wf GSyntax.hs -``` -The module also defines the overloaded functions -``gf`` and ``fg`` for translating from these types to -raw trees and back. +A common idiom is to +gather the ``oper`` and ``param`` definitions +needed for inflecting words in +a language into a morphology module. Here is a simple +example, [``MorphoEng`` resource/MorphoEng.gf]. +``` + --# -path=.:prelude + resource MorphoEng = open Prelude in { -===Compiling the Haskell grammar=== + param + Number = Sg | Pl ; -19. Before compiling ``Run.hs``, you must check that the -embedded GF modules are found. The easiest way to do this -is by two symbolic links to your GF source directories: -``` - $ ln -s /home/aarne/GF/src/GF - $ ln -s /home/aarne/GF/src/Transfer/ -``` + oper + Noun, Verb : Type = {s : Number => Str} ; -20. Now you can run the GHC Haskell compiler to produce the program. -``` - $ ghc --make -o math Run.hs -``` -The program can be tested with the command ``./math``. + mkNoun : Str -> Str -> Noun = \x,y -> { + s = table { + Sg => x ; + Pl => y + } + } ; + regNoun : Str -> Noun = \s -> case last s of { + "s" | "z" => mkNoun s (s + "es") ; + "y" => mkNoun s (init s + "ies") ; + _ => mkNoun s (s + "s") + } ; -===Building a distribution=== + mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ; -21. For a stand-alone binary-only distribution, only -the two files ``math`` and ``math.gfcm`` are needed. -For a source distribution, the files mentioned in -the beginning of this documents are needed. + regVerb : Str -> Verb = \s -> case last s of { + "s" | "z" => mkVerb s (s + "es") ; + "y" => mkVerb s (init s + "ies") ; + "o" => mkVerb s (s + "es") ; + _ => mkVerb s (s + "s") + } ; + } +``` +The first line gives as a hint to the compiler the +**search path** needed to find all the other modules that the +module depends on. The directory ``prelude`` is a subdirectory of +``GF/lib``; to be able to refer to it in this simple way, you can +set the environment variable ``GF_LIB_PATH`` to point to this +directory. -===Using a Makefile=== +===Morphological analysis and morphology quiz=== -22. As a part of the source distribution, a ``Makefile`` is -essential. The ``Makefile`` is also useful when developing the -application. It should always be possible to build an executable -from source by typing ``make``. Here is a minimal such ``Makefile``: +Even though morphology is in GF +mostly used as an auxiliary for syntax, it +can also be useful on its own right. The command ``morpho_analyse = ma`` +can be used to read a text and return for each word the analyses that +it has in the current concrete syntax. ``` - all: - echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf - echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm - ghc --make -o math Run.hs + > rf bible.txt | morpho_analyse +``` +In the same way as translation exercises, morphological exercises can +be generated, by the command ``morpho_quiz = mq``. Usually, +the category is set to be something else than ``S``. For instance, ``` + > cd GF/lib/resource-1.0/ + > i french/IrregFre.gf + > morpho_quiz -cat=V + Welcome to GF Morphology Quiz. + ... -==The Embedded GF Haskell API== + réapparaître : VFin VCondit Pl P2 + réapparaitriez + > No, not réapparaitriez, but + réapparaîtriez + Score 0/1 +``` +Finally, a list of morphological exercises can be generated +off-line and saved in a +file for later use, by the command ``morpho_list = ml`` +``` + > morpho_list -number=25 -cat=V | wf exx.txt +``` +The ``number`` flag gives the number of exercises generated. -=Embedded grammars in Java= +==Concrete syntax: English phrase building== -Forthcoming; at the moment, the document - [``http://www.cs.chalmers.se/~bringert/gf/gf-java.html`` http://www.cs.chalmers.se/~bringert/gf/gf-java.html] +===Predication=== -by Björn Bringert gives more information on Java. +===Complementization=== -=Spoken language translators= +===Determination=== -=Multimodal dialogue systems= +===Modification=== -=Grammar of formal languages= -==Precedence and ficity== +===Putting the syntax together=== -==Higher-order abstract syntax== -==Extensible natural-language interfaces== +==Concrete syntax for Italian== + @@ -4899,11 +4878,11 @@ by Bj #PARTthree -=Syntax and semantics of the GF grammar formalism= +=Syntax and semantics of the GF language= =The resource grammar API= -=The GFC format= +=The low-level GFC format= =The command language of the GF shell= @@ -4994,7 +4973,7 @@ Thus the most silent way to invoke GF is -==GFDoc== +=Documenting grammars with GFDoc= @@ -5017,3 +4996,474 @@ GF Homepage: [``http://www.cs.chalmers.se/~aarne/GF/doc`` ../..] + +#startappendix + +#PARTbnf + +#twocolumn + +#PARTquickref + +#smallsize + + +This is a quick reference on GF grammars. It aims to +cover all forms of expression available when writing +grammars. It assumes basic knowledge of GF, which +can be acquired from the Tutorial part of this book. +For the commands of the GF system, help is obtained on line by the +help command (``help``). Help on invoking +GF from the shell is obtained with (``gf -help``). + + +==A complete example== + +This is a complete example of a GF grammar divided +into three modules in files. The grammar recognizes the +phrases //one pizza// and //two pizzas//. + +File ``Order.gf``: +``` +abstract Order = { +cat + Order ; + Item ; +fun + One, Two : Item -> Order ; + Pizza : Item ; +} +``` +File ``OrderEng.gf`` (the top file): +``` +--# -path=.:prelude +concrete OrderEng of Order = + open Res, Prelude in { +flags startcat=Order ; +lincat + Order = SS ; + Item = {s : Num => Str} ; +lin + One it = ss ("one" ++ it.s ! Sg) ; + Two it = ss ("two" ++ it.s ! Pl) ; + Pizza = regNoun "pizza" ; +} +``` +File ``Res.gf``: +``` +resource Res = open Prelude in { +param Num = Sg | Pl ; +oper regNoun : Str -> {s : Num => Str} = + \dog -> {s = table { + Sg => dog ; + _ => dog + "s" + } + } ; +} +``` +To use this example, do +``` + % gf -- in shell: start GF + > i OrderEng.gf -- in GF: import grammar + > p "one pizza" -- parse string + > l Two Pizza -- linearize tree +``` + + + +==Modules and files== + +One module per file. +File named ``Foo.gf`` contains module named +``Foo``. + +Each module has the structure +``` +moduletypename = + Inherits ** -- optional + open Opens in -- optional + { Judgements } +``` +Inherits are names of modules of the same type. +Inheritance can be restricted: +``` + Mo[f,g], -- inherit only f,g from Mo + Lo-[f,g] -- inheris all but f,g from Lo +``` +Opens are possible in ``concrete`` and ``resource``. +They are names of modules of these two types, possibly +qualified: +``` + (M = Mo), -- refer to f as M.f or Mo.f + (Lo = Lo) -- refer to f as Lo.f +``` +Module types and judgements in them: +``` +abstract A -- cat, fun, def, data +concrete C of A -- lincat, lin, lindef, printname +resource R -- param, oper + +interface I -- like resource, but can have + oper f : T without definition +instance J of I -- like resource, defines opers + that I leaves undefined +incomplete -- functor: concrete that opens + concrete CI of A = one or more interfaces + open I in ... +concrete CJ of A = -- completion: concrete that + CI with instantiates a functor by + (I = J) instances of open interfaces +``` +The forms +``param``, ``oper`` +may appear in ``concrete`` as well, but are then +not inherited to extensions. + +All modules can moreover have ``flags`` and comments. +Comments have the forms +``` +-- till the end of line +{- any number of lines between -} +--# used for compiler pragmas +``` +A ``concrete`` can be opened like a ``resource``. +It is translated as follows: +``` +cat C ---> oper C : Type = +lincat C = T T ** {lock_C : {}} + +fun f : G -> C ---> oper f : A* -> C* = \g -> +lin f = t t g ** {lock_C = <>} +``` +An ``abstract`` can be opened like an ``interface``. +Any ``concrete`` of it then works as an ``instance``. + + + +==Judgements== + +``` +cat C -- declare category C +cat C (x:A)(y:B x) -- dependent category C +cat C A B -- same as C (x : A)(y : B) +fun f : T -- declare function f of type T +def f = t -- define f as t +def f p q = t -- define f by pattern matching +data C = f | g -- set f,g as constructors of C +data f : A -> C -- same as + fun f : A -> C; data C=f + +lincat C = T -- define lin.type of cat C +lin f = t -- define lin. of fun f +lin f x y = t -- same as lin f = \x y -> t +lindef C = \s -> t -- default lin. of cat C +printname fun f = s -- printname shown in menus +printname cat C = s -- printname shown in menus +printname f = s -- same as printname fun f = s + +param P = C | D Q R -- define parameter type P + with constructors + C : P, D : Q -> R -> P +oper h : T = t -- define oper h of type T +oper h = t -- omit type, if inferrable + +flags p=v -- set value of flag p +``` +Judgements are terminated by semicolons (``;``). +Subsequent judgments of the same form may share the +keyword: +``` +cat C ; D ; -- same as cat C ; cat D ; +``` +Judgements can also share RHS: +``` +fun f,g : A -- same as fun f : A ; g : A +``` + + +==Types== + +Abstract syntax (in ``fun``): +``` +C -- basic type, if cat C +C a b -- basic type for dep. category +(x : A) -> B -- dep. functions from A to B +(_ : A) -> B -- nondep. functions from A to B +(p,q : A) -> B -- same as (p : A)-> (q : A) -> B +A -> B -- same as (_ : A) -> B +Int -- predefined integer type +Float -- predefined float type +String -- predefined string type +``` +Concrete syntax (in ``lincat``): +``` +Str -- token lists +P -- parameter type, if param P +P => B -- table type, if P param. type +{s : Str ; p : P}-- record type +{s,t : Str} -- same as {s : Str ; t : Str} +{a : A} **{b : B}-- record type extension, same as + {a : A ; b : B} +A * B * C -- tuple type, same as + {p1 : A ; p2 : B ; p3 : C} +Ints n -- type of n first integers +``` +Resource (in ``oper``): all those of concrete, plus +``` +Tok -- tokens (subtype of Str) +A -> B -- functions from A to B +Int -- integers +Strs -- list of prefixes (for pre) +PType -- parameter type +Type -- any type +``` +As parameter types, one can use any finite type: +``P`` defined in ``param P``, +``Ints n``, and record types of parameter types. + + + +==Expressions== + +Syntax trees = full function applications +``` +f a b -- : C if fun f : A -> B -> C +1977 -- : Int +3.14 -- : Float +"foo" -- : String +``` +Higher-Order Abstract syntax (HOAS): functions as arguments: +``` +F a (\x -> c) -- : C if a : A, c : C (x : B), + fun F : A -> (B -> C) -> C +``` +Tokens and token lists +``` +"hello" -- : Tok, singleton Str +"hello" ++ "world" -- : Str +["hello world"] -- : Str, same as "hello" ++ "world" +"hello" + "world" -- : Tok, computes to "helloworld" +[] -- : Str, empty list +``` +Parameters +``` +Sg -- atomic constructor +VPres Sg P2 -- applied constructor +{n = Sg ; p = P3} -- record of parameters +``` +Tables +``` +table { -- by full branches + Sg => "mouse" ; + Pl => "mice" + } +table { -- by pattern matching + Pl => "mice" ; + _ => "mouse" -- wildcard pattern + } +table { + n => regn n "cat" -- variable pattern + } +table Num {...} -- table given with arg. type +table ["ox"; "oxen"] -- table as course of values +\\_ => "fish" -- same as table {_ => "fish"} +\\p,q => t -- same as \\p => \\q => t + +t ! p -- select p from table t +case e of {...} -- same as table {...} ! e +``` +Records +``` +{s = "Liz"; g = Fem} -- record in full form +{s,t = "et"} -- same as {s = "et";t= "et"} +{s = "Liz"} ** -- record extension: same as + {g = Fem} {s = "Liz" ; g = Fem} + + -- tuple, same as {p1=a;p2=b;p3=c} +``` +Functions +``` +\x -> t -- lambda abstract +\x,y -> t -- same as \x -> \y -> t +\x,_ -> t -- binding not in t +``` +Local definitions +``` +let x : A = d in t -- let definition +let x = d in t -- let defin, type inferred +let x=d ; y=e in t -- same as + let x=d in let y=e in t +let {...} in t -- same as let ... in t + +t where {...} -- same as let ... in t +``` +Free variation +``` +variants {x ; y} -- both x and y possible +variants {} -- nothing possible +``` +Prefix-dependent choices +``` +pre {"a" ; "an" / v} -- "an" before v, "a" otherw. +strs {"a" ; "i" ;"o"}-- list of condition prefixes +``` +Typed expression +``` + -- same as t, to help type inference +``` +Accessing bound variables in ``lin``: use fields ``$1, $2, $3,...``. +Example: +``` +fun F : (A : Set) -> (El A -> Prop) -> Prop ; +lin F A B = {s = ["for all"] ++ A.s ++ B.$1 ++ B.s} +``` + + +==Pattern matching== + +These patterns can be used in branches of ``table`` and +``case`` expressions. Patterns are matched in the order in +which they appear in the grammar. +``` +C -- atomic param constructor +C p q -- param constr. applied to patterns +x -- variable, matches anything +_ -- wildcard, matches anything +"foo" -- string +56 -- integer +{s = p ; y = q} -- record, matches extensions too + -- tuple, same as {p1=p ; p2=q} +p | q -- disjunction, binds to first match +x@p -- binds x to what p matches +- p -- negation +p + "s" -- sequence of two string patterns +p* -- repetition of a string pattern +``` + +==Sample library functions== + +``` +-- lib/prelude/Predef.gf +drop : Int -> Tok -> Tok -- drop prefix of length +take : Int -> Tok -> Tok -- take prefix of length +tk : Int -> Tok -> Tok -- drop suffix of length +dp : Int -> Tok -> Tok -- take suffix of length +occur : Tok -> Tok -> PBool -- test if substring +occurs : Tok -> Tok -> PBool -- test if any char occurs +show : (P:Type) -> P ->Tok -- param to string +read : (P:Type) -> Tok-> P -- string to param +toStr : (L:Type) -> L ->Str -- find "first" string + +-- lib/prelude/Prelude.gf +param Bool = True | False +oper + SS : Type -- the type {s : Str} + ss : Str -> SS -- construct SS + cc2 : (_,_ : SS) -> SS -- concat SS's + optStr : Str -> Str -- string or empty + strOpt : Str -> Str -- empty or string + bothWays : Str -> Str -> Str -- X++Y or Y++X + init : Tok -> Tok -- all but last char + last : Tok -> Tok -- last char + prefixSS : Str -> SS -> SS + postfixSS : Str -> SS -> SS + infixSS : Str -> SS -> SS -> SS + if_then_else : (A : Type) -> Bool -> A -> A -> A + if_then_Str : Bool -> Str -> Str -> Str +``` + + +==Flags== + +Flags can appear, with growing priority, +- in files, judgement ``flags`` and without dash (``-``) +- as flags to ``gf`` when invoked, with dash +- as flags to various GF commands, with dash + + +Some common flags used in grammars: +``` +startcat=cat use this category as default + +lexer=literals int and string literals recognized +lexer=code like program code +lexer=text like text: spacing, capitals +lexer=textlit text, unknowns as string lits + +unlexer=code like program code +unlexer=codelit code, remove string lit quotes +unlexer=text like text: punctuation, capitals +unlexer=textlit text, remove string lit quotes +unlexer=concat remove all spaces +unlexer=bind remove spaces around "&+" + +optimize=all_subs best for almost any concrete +optimize=values good for lexicon concrete +optimize=all usually good for resource +optimize=noexpand for resource, if =all too big +``` +For the full set of values for ``FLAG``, +use on-line ``h -FLAG``. + + + +==File paths== + +Colon-separated lists of directories searched in the +given order: +``` +--# -path=.:../abstract:../common:prelude +``` +This can be (in order of growing preference), as +first line in the top file, as flag to ``gf`` +when invoked, or as flag to the ``i`` command. +The prefix ``--#`` is used only in files. + +If the environment variabls ``GF_LIB_PATH`` is defined, its +value is automatically prefixed to each directory to +extend the original search path. + + +==Alternative grammar formats== + +**Old GF** (before GF 2.0): +all judgements in any kinds of modules, +division into files uses ``include``s. +A file ``Foo.gf`` is recognized as the old format +if it lacks a module header. + +**Context-free** (file ``foo.cf``). The form of rules is e.g. +``` +Fun. S ::= NP "is" AP ; +``` +If ``Fun`` is omitted, it is generated automatically. +Rules must be one per line. The RHS can be empty. + +**Extended BNF** (file ``foo.ebnf``). The form of rules is e.g. +``` +S ::= (NP+ ("is" | "was") AP | V NP*) ; +``` +where the RHS is a regular expression of categories +and quoted tokens: ``"foo", CAT, T U, T|U, T*, T+, T?``, or empty. +Rule labels are generated automatically. + + +**Probabilistic grammars** (not a separate format). +You can set the probability of a function ``f`` (in its value category) by +``` +--# prob f 0.009 +``` +These are put into a file given to GF using the ``probs=File`` flag +on command line. This file can be the grammar file itself. + +**Example-based grammars** (file ``foo.gfe``). Expressions of the form +``` +in Cat "example string" +``` +are preprocessed by using a parser given by the flag +``` +--# -resource=File +``` +and the result is written to ``foo.gf``. + + -- cgit v1.2.3