diff options
Diffstat (limited to 'doc/tutorial/gf-tutorial2.txt')
| -rw-r--r-- | doc/tutorial/gf-tutorial2.txt | 809 |
1 files changed, 392 insertions, 417 deletions
diff --git a/doc/tutorial/gf-tutorial2.txt b/doc/tutorial/gf-tutorial2.txt index 4b20f38cd..9c3ae71b2 100644 --- a/doc/tutorial/gf-tutorial2.txt +++ b/doc/tutorial/gf-tutorial2.txt @@ -1,5 +1,5 @@ Grammatical Framework Tutorial -Author: Aarne Ranta <aarne (at) cs.chalmers.se> +Author: Aarne Ranta aarne (at) cs.chalmers.se Last update: %%date(%c) % NOTE: this is a txt2tags file. @@ -20,7 +20,7 @@ Last update: %%date(%c) -[../gf-logo.gif] +[../gf-logo.png] @@ -552,7 +552,7 @@ module forms are %--! -===Record types, records, and ``Str``s=== +===Records and strings=== The linearization type of a category is a **record type**, with zero of more **fields** of different types. The simplest record @@ -922,7 +922,7 @@ The graph uses %--! -==System commands== +===System commands=== To document your grammar, you may want to print the graph into a file, e.g. a ``.png`` file that @@ -1385,7 +1385,8 @@ Why does the command also show the operations that form the same as the value of ``Noun``. -==Using morphology in concrete syntax== + +==Using parameters in concrete syntax== We can now enrich the concrete syntax definitions to comprise morphology. This will involve a more radical @@ -1596,8 +1597,246 @@ of a grammar application. %--! +===Free variation=== + +Sometimes there are many alternative ways to define a concrete syntax. +For instance, the verb negation in English can be expressed both by +//does not// and //doesn't//. In linguistic terms, these expressions +are in **free variation**. The ``variants`` construct of GF can +be used to give a list of strings in free variation. For example, +``` + NegVerb verb = {s = variants {["does not"] ; "doesn't} ++ verb.s ! Pl} ; +``` +An empty variant list +``` + variants {} +``` +can be used e.g. if a word lacks a certain form. + +In general, ``variants`` should be used cautiously. It is not +recommended for modules aimed to be libraries, because the +user of the library has no way to choose among the variants. + + +===Overloading of operations=== + +Large libraries, such as the GF Resource Grammar Library, may define +hundreds of names, which can be unpractical +for both the library writer and the user. The writer has to invent longer +and longer names which are not always intuitive, +and the user has to learn or at least be able to find all these names. +A solution to this problem, adopted by languages such as C++, is **overloading**: +the same name can be used for several functions. When such a name is used, the +compiler performs **overload resolution** to find out which of the possible functions +is meant. The resolution is based on the types of the functions: all functions that +have the same name must have different types. + +In C++, functions with the same name can be scattered everywhere in the program. +In GF, they must be grouped together in ``overload`` groups. Here is an example +of an overload group, defining four ways to define nouns in Italian: +``` + oper mkN = overload { + mkN : Str -> N = -- regular nouns + mkN : Str -> Gender -> N = -- regular nouns with unexpected gender + mkN : Str -> Str -> N = -- irregular nouns + mkN : Str -> Str -> Gender -> N = -- irregular nouns with unexpected gender + } +``` +All of the following uses of ``mkN`` are easy to resolve: +``` + lin Pizza = mkN "pizza" ; -- Str -> N + lin Hand = mkN "mano" Fem ; -- Str -> Gender -> N + lin Man = mkN "uomo" "uomini" ; -- Str -> Str -> N +``` + + + + + + + +%--! +==Using the resource grammar library TODO== + +A resource grammar is a grammar built on linguistic grounds, +to describe a language rather than a domain. +The GF resource grammar library, which contains resource grammars for +10 languages, is described more closely in the following +documents: +- [Resource library API documentation ../../lib/resource-1.0/doc/]: + for application grammarians using the resource. +- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]: + for resource grammarians developing the resource. + + +===Interfaces, instances, and functors=== + +===The simplest way=== + +The simplest way is to ``open`` a top-level ``Lang`` module +and a ``Paradigms`` module: +``` + abstract Foo = ... + + concrete FooEng = open LangEng, ParadigmsEng in ... + concrete FooSwe = open LangSwe, ParadigmsSwe in ... +``` +Here is an example. +``` +abstract Arithm = { + cat + Prop ; + Nat ; + fun + Zero : Nat ; + Succ : Nat -> Nat ; + Even : Nat -> Prop ; + And : Prop -> Prop -> Prop ; +} + +--# -path=.:alltenses:prelude + +concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in { + lincat + Prop = S ; + Nat = NP ; + lin + Zero = + UsePN (regPN "zero" nonhuman) ; + Succ n = + DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ; + Even n = + UseCl TPres ASimul PPos + (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ; + And x y = + ConjS and_Conj (BaseS x y) ; + +} + +--# -path=.:alltenses:prelude + +concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in { + lincat + Prop = S ; + Nat = NP ; + lin + Zero = + UsePN (regPN "noll" neutrum) ; + Succ n = + DetCN (DetSg (SgQuant DefArt) NoOrd) + (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare") + (mkPreposition "till")) n) ; + Even n = + UseCl TPres ASimul PPos + (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ; + And x y = + ConjS and_Conj (BaseS x y) ; +} +``` + + +===How to find resource functions=== + +The definitions in this example were found by parsing: +``` + > i LangEng.gf + + -- for Successor: + > p -cat=NP -mcfg -parser=topdown "the mother of Paris" + + -- for Even: + > p -cat=S -mcfg -parser=topdown "Paris is old" + + -- for And: + > p -cat=S -mcfg -parser=topdown "Paris is old and I am old" +``` +The use of parsing can be systematized by **example-based grammar writing**, +to which we will return later. + + +===A functor implementation=== + +The interesting thing now is that the +code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for +some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor", +"jämn" vs. "even"). How can we exploit the similarities and +actually share code between the languages? + +The solution is to use a functor: an ``incomplete`` module that opens +an ``abstract`` as an ``interface``, and then instantiate it to different +languages that implement the interface. The structure is as follows: +``` + abstract Foo ... + + incomplete concrete FooI = open Lang, Lex in ... + + concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ; + concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ; +``` +where ``Lex`` is an abstract lexicon that includes the vocabulary +specific to this application: +``` + abstract Lex = Cat ** ... + + concrete LexEng of Lex = CatEng ** open ParadigmsEng in ... + concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ... +``` +Here, again, a complete example (``abstract Arithm`` is as above): +``` +incomplete concrete ArithmI of Arithm = open Lang, Lex in { + lincat + Prop = S ; + Nat = NP ; + lin + Zero = + UsePN zero_PN ; + Succ n = + DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ; + Even n = + UseCl TPres ASimul PPos + (PredVP n (UseComp (CompAP (PositA even_A)))) ; + And x y = + ConjS and_Conj (BaseS x y) ; +} + +--# -path=.:alltenses:prelude +concrete ArithmEng of Arithm = ArithmI with + (Lang = LangEng), + (Lex = LexEng) ; + +--# -path=.:alltenses:prelude +concrete ArithmSwe of Arithm = ArithmI with + (Lang = LangSwe), + (Lex = LexSwe) ; + +abstract Lex = Cat ** { + fun + zero_PN : PN ; + successor_N2 : N2 ; + even_A : A ; +} + +concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in { + lin + zero_PN = regPN "noll" neutrum ; + successor_N2 = + mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ; + even_A = regA "jämn" ; +} +``` + +===Restricted inheritance and qualified opening=== + + + + +%--! ==More constructs for concrete syntax== +In this chapter, we go through constructs that are not necessary in simple grammars +or when the concrete syntax relies on libraries, but very useful when writing advanced +concrete syntax implementations, such as resource grammar libraries. + %--! ===Local definitions=== @@ -1621,48 +1860,6 @@ the same expression. Here is an example, from ``` - - -%--! -===Free variation=== - -Sometimes there are many alternative ways to define a concrete syntax. -For instance, the verb negation in English can be expressed both by -//does not// and //doesn't//. In linguistic terms, these expressions -are in **free variation**. The ``variants`` construct of GF can -be used to give a list of strings in free variation. For example, -``` - NegVerb verb = {s = variants {["does not"] ; "doesn't} ++ verb.s ! Pl} ; -``` -An empty variant list -``` - variants {} -``` -can be used e.g. if a word lacks a certain form. - -In general, ``variants`` should be used cautiously. It is not -recommended for modules aimed to be libraries, because the -user of the library has no way to choose among the variants. - -%Moreover, ``variants`` is only defined for basic types (``Str`` -%and parameter types). The grammar compiler will admit -%``variants`` for any types, but it will push it to the -%level of basic types in a way that may be unwanted. -%For instance, German has two words meaning "car", -%//Wagen//, which is Masculine, and //Auto//, which is Neuter. -%However, if one writes -%``` -% variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}} -%``` -%this will compute to -%``` -% {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}} -%``` -%which will also accept erroneous combinations of strings and genders. - - - - ===Record extension and subtyping=== Record types and records can be **extended** with new fields. For instance, @@ -1707,7 +1904,7 @@ A typical example is a record of agreement features, e.g. French oper Agr : PType = {g : Gender ; n : Number ; p : Person} ; ``` Notice the term ``PType`` rather than just ``Type`` referring to -parameter types. Every ``PType`` is also a ``Type``. +parameter types. Every ``PType`` is also a ``Type``, but not vice-versa. Pattern matching is done in the expected way, but it can moreover utilize partial records: the branch @@ -1728,6 +1925,7 @@ possible to write, slightly surprisingly, } ``` + %--! ===Regular expression patterns=== @@ -1742,8 +1940,22 @@ as in morphology, it is handy to use regular expression patterns: The last three apply to all types of patterns, the first two only to token strings. -Example: plural formation in Swedish 2nd declension -(//pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar//): +As an example, we give a rule for the formation of English word forms +ending with an //s// and used in the formation of both plural nouns and +third-person present-tense verbs. +``` + add_s : Str -> Str = \w -> case w of { + _ + "oo" => s + "s" ; -- bamboo + _ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero + _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy + x + "y" => x + "ies" ; -- fly + _ => w + "s" -- car + } ; +``` +Here is another example, the plural formation in Swedish 2nd declension. +The second branch uses a variable binding with ``@`` to cover the cases where an +unstressed pre-final vowel //e// disappears in the plural +(//nyckel-nycklar, seger-segrar, bil-bilar//): ``` plural2 : Str -> Str = \w -> case w of { pojk + "e" => pojk + "ar" ; @@ -1751,15 +1963,8 @@ Example: plural formation in Swedish 2nd declension bil => bil + "ar" } ; ``` -Another example: English noun plural formation. -``` - plural : Str -> Str = \w -> case w of { - _ + ("s" | "z" | "x" | "sh") => w + "es" ; - _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; - x + "y" => x + "ies" ; - _ => w + "s" - } ; -``` + + Semantics: variables are always bound to the **first match**, which is the first in the sequence of binding lists ``Match p v`` defined as follows. In the definition, ``p`` is a pattern and ``v`` is a value. @@ -1834,8 +2039,15 @@ FIXME: The linearization type is ``{s : Str}`` for all these categories. + ==More concepts of abstract syntax== +This section is about the use of the type theory part of GF for +including more semantics in grammars. Some of the subsections present +ideas that have not yet been used in real-world applications, and whose +tool support outside the GF program is not complete. + + ===GF as a logical framework=== In this section, we will show how @@ -1852,8 +2064,8 @@ of such a theory, represented as an ``abstract`` module in GF. ``` abstract Arithm = { cat - Prop ; -- proposition - Nat ; -- natural number + Prop ; -- proposition + Nat ; -- natural number fun Zero : Nat ; -- 0 Succ : Nat -> Nat ; -- successor of x @@ -1899,11 +2111,9 @@ abstract Address = { OxfordSt, ShaftesburyAve, BdRaspail, RueBlondel, AvAlsaceLorraine : Street ; } ``` -The linearization rules -are straightforward, +The linearization rules are straightforward, ``` lin - mkAddress country city street = ss (street.s ++ "," ++ city.s ++ "," ++ country.s) ; UK = ss ("U.K.") ; @@ -1917,13 +2127,11 @@ are straightforward, RueBlondel = ss ("rue" ++ "Blondel") ; AvAlsaceLorraine = ss ("avenue" ++ "Alsace-Lorraine") ; ``` -with the exception of ``mkAddress``, where we have +Notice that, in ``mkAddress``, we have reversed the order of the constituents. The type of ``mkAddress`` in the abstract syntax takes its arguments in a "logical" order, -with increasing precision. (This order is sometimes even used in the concrete -syntax of addresses, e.g. in Russia). - - +with increasing precision. (This order is sometimes even used in the +concrete syntax of addresses, e.g. in Russia). Both existing and non-existing addresses are recognized by this grammar. The non-existing ones in the following randomly generated @@ -1943,10 +2151,11 @@ Dependent types provide a way to guarantee that addresses are well-formed. What we do is to include **contexts** in ``cat`` judgements: ``` - cat Address ; - cat Country ; - cat City Country ; - cat Street (x : Country)(y : City x) ; + cat + Address ; + Country ; + City Country ; + Street (x : Country)(City x) ; ``` The first two judgements are as before, but the third one makes ``City`` dependent on ``Country``: there are no longer just cities, @@ -1966,19 +2175,18 @@ which is only possible if the subsequent context does not depend on ``x``. The ``fun`` judgements of the grammar are modified accordingly: ``` fun + mkAddress : (x : Country) -> (y : City x) -> Street x y -> Address ; - mkAddress : (x : Country) -> (y : City x) -> Street x y -> Address ; - - UK : Country ; - France : Country ; - Paris : City France ; - London : City UK ; - Grenoble : City France ; - OxfordSt : Street UK London ; - ShaftesburyAve : Street UK London ; - BdRaspail : Street France Paris ; - RueBlondel : Street France Paris ; - AvAlsaceLorraine : Street France Grenoble ; + UK : Country ; + France : Country ; + Paris : City France ; + London : City UK ; + Grenoble : City France ; + OxfordSt : Street UK London ; + ShaftesburyAve : Street UK London ; + BdRaspail : Street France Paris ; + RueBlondel : Street France Paris ; + AvAlsaceLorraine : Street France Grenoble ; ``` Since the type of ``mkAddress`` now has dependencies among its argument types, we have to use variables just like we used in @@ -2012,6 +2220,12 @@ sometimes shortens the code, since we can write e.g. ``` oper triple : (x,y,z : Str) -> Str = ... ``` +If a bound variable is not used, it can here, as elswhere in GF, be replaced by +a wildcard: +``` + oper triple : (_,_,_ : Str) -> Str = ... +``` + ===Dependent types in concrete syntax=== @@ -2074,8 +2288,8 @@ For instance, the sentence is syntactically well-formed but semantically ill-formed. It is well-formed because it combines a well-formed noun phrase ("the number 2") with a well-formed -verb phrase ("is equilateral") in accordance with the -rule that the verb phrase is inflected in the +verb phrase ("is equilateral") and satisfies the agreement +rule saying that the verb phrase is inflected in the number of the noun phrase: ``` fun PredVP : NP -> VP -> S ; @@ -2117,6 +2331,7 @@ but no proposition linearized to the number 2 is equilateral ``` since ``Equilateral two`` is not a well-formed type-theoretical object. +It is not even accepted by the context-free parser. When formalizing mathematics, e.g. in the purpose of computer-assisted theorem proving, we are certainly interested @@ -2147,70 +2362,32 @@ and dependencies of other categories on this: ``` cat S ; -- sentence - V1 Dom ; -- one-place verb - V2 Dom Dom ; -- two-place verb + V1 Dom ; -- one-place verb with specific subject type + V2 Dom Dom ; -- two-place verb with specific subject and object types A1 Dom ; -- one-place adjective A2 Dom Dom ; -- two-place adjective - PN Dom ; -- proper name - NP Dom ; -- noun phrase + NP Dom ; -- noun phrase for an object of specific type Conj ; -- conjunction Det ; -- determiner ``` -The number of ``Dom`` arguments depends on the semantic type -corresponding to the category: one-place verbs and adjectives -correspond to types of the form -``` - A -> Prop -``` -whereas two-place verbs and adjectives correspond to types of the form -``` - A -> B -> Prop -``` -where the domains ``A`` and ``B`` can be distinct. -Proper names correspond to types of the form -``` - A -``` -that is, individual objects of the domain ``A``. Noun phrases -correspond to -``` - (A -> Prop) -> Prop -``` -that is, **quantifiers** over the domain ``A``. -Sentences, conjunctions, and determiners correspond to -``` - Prop - Prop -> Prop -> Prop - (A : Dom) -> (A -> Prop) -> Prop -``` -respectively, -and are thus independent of domain. As for common nouns ``CN``, -the simplest semantics is that they correspond to -``` - Dom -``` -In this section, we will, in fact, write ``Dom`` instead of ``CN``. - Having thus parametrized categories on domains, we have to reformulate the rules of predication, etc, accordingly. This is straightforward: ``` fun PredV1 : (A : Dom) -> NP A -> V1 A -> S ; ComplV2 : (A,B : Dom) -> V2 A B -> NP B -> V1 A ; - UsePN : (A : Dom) -> PN A -> NP A ; DetCN : Det -> (A : Dom) -> NP A ; ModA1 : (A : Dom) -> A1 A -> Dom ; ConjS : Conj -> S -> S -> S ; ConjV1 : (A : Dom) -> Conj -> V1 A -> V1 A -> V1 A ; ``` In linearization rules, -we typically use wildcards for the domain arguments, -to get arities right: +we use wildcards for the domain arguments, +because they don't affect linearization: ``` lin PredV1 _ np vp = ss (np.s ++ vp.s) ; ComplV2 _ _ v2 np = ss (v2.s ++ np.s) ; - UsePN _ pn = pn ; DetCN det cn = ss (det.s ++ cn.s) ; ModA1 cn a1 = ss (a1.s ++ cn.s) ; ConjS conj s1 s2 = ss (s1.s ++ conj.s ++ s2.s) ; @@ -2232,20 +2409,19 @@ To explain the contrast, we introduce the functions human : Dom ; game : Dom ; play : V2 human game ; - John : PN human ; - Golf : PN game ; + John : NP human ; + Golf : NP game ; ``` Both sentences still pass the context-free parser, returning trees with lots of metavariables of type ``Dom``: ``` - PredV1 ?0 (UsePN ?1 John) (ComplV2 ?2 ?3 play (UsePN ?4 Golf)) - - PredV1 ?0 (UsePN ?1 Golf) (ComplV2 ?2 ?3 play (UsePN ?4 John)) + PredV1 ?0 John (ComplV2 ?1 ?2 play Golf) + PredV1 ?0 Golf (ComplV2 ?1 ?2 play John) ``` But only the former sentence passes the type checker, which moreover infers the domain arguments: ``` - PredV1 human (UsePN human John) (ComplV2 human game play (UsePN game Golf)) + PredV1 human John (ComplV2 human game play Golf) ``` To try this out in GF, use ``pt = put_term`` with the **tree transformation** that solves the metavariables by type checking: @@ -2261,7 +2437,7 @@ or less liberal. For instance, John loves Mary John loves golf ``` -both make sense, even though ``Mary`` and ``golf`` +should both make sense, even though ``Mary`` and ``golf`` are of different types. A natural solution in this case is to formalize ``love`` as a **polymorphic** verb, which takes a human as its first argument but an object of any type as its second @@ -2270,19 +2446,21 @@ argument: fun love : (A : Dom) -> V2 human A ; lin love _ = ss "loves" ; ``` -Problems remain, such as **subtyping** (e.g. what -is meaningful for a ``human`` is also meaningful for -a ``man`` and a ``woman``, but not the other way round) -and the **extended use** of expressions (e.g. a metaphoric use that -makes sense of "golf plays John"). +In other words, it is possible for a human to love anything. +A problem related to polymorphism is **subtyping**: what +is meaningful for a ``human`` is also meaningful for +a ``man`` and a ``woman``, but not the other way round. +One solution to this is **coercions**: functions that +"lift" objects from subtypes to supertypes. +===Case study: selectional restrictions and statistical language models TODO=== ===Proof objects=== -Perhaps the most well-known feature of constructive type theory is +Perhaps the most well-known idea in constructive type theory is the **Curry-Howard isomorphism**, also known as the **propositions as types principle**. Its earliest formulations were attempts to give semantics to the logical systems of @@ -2300,14 +2478,14 @@ natural numbers: The **successor function** ``Succ`` generates an infinite sequence of natural numbers, beginning from ``Zero``. -We then define what it means for a number //x// to be less than +We then define what it means for a number //x// to be //less than// a number //y//. Our definition is based on two axioms: -- ``Zero`` is less than ``Succ y`` for any ``y``. -- If ``x`` is less than ``y``, then``Succ x`` is less than ``Succ y``. +- ``Zero`` is less than ``Succ`` //y// for any //y//. +- If //x// is less than //y//, then``Succ`` //x// is less than ``Succ`` //y//. The most straightforward way of expressing these axioms in type theory -is as typing judgements that introduce objects of a type ``Less x y``: +is as typing judgements that introduce objects of a type ``Less`` //x y //: ``` cat Less Nat Nat ; fun lessZ : (y : Nat) -> Less Zero (Succ y) ; @@ -2326,14 +2504,13 @@ whose type is ``` Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero)))) ``` -which is the same thing as the proposition that 2 is less than 4. +which is the formalization of the proposition that 2 is less than 4. GF grammars can be used to provide a **semantic control** of well-formedness of expressions. We have already seen examples of this: the grammar of well-formed addresses and the grammar with selectional restrictions above. By introducing proof objects -we have now added a very powerful -technique of expressing semantic conditions. +we have now added a very powerful technique of expressing semantic conditions. A simple example of the use of proof objects is the definition of well-formed //time spans//: a time span is expected to be from an earlier to @@ -2351,7 +2528,35 @@ by using the ``Less`` predicate: cat Span ; fun span : (m,n : Nat) -> Less m n -> Span ; ``` +A possible practical application of this idea is **proof-carrying documents**: +to be semantically well-formed, the abstract syntax of a document must contain a proof +of some property, although the proof is not shown in the concrete document. +Think, for instance, of small documents describing flight connections: +//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.// + +The well-formedness of this text is partly expressible by dependent typing: +``` + cat + City ; + Flight City City ; + fun + Gothenburg, Frankfurt, Prague : City ; + LH3043 : Flight Gothenburg Frankfurt ; + OK0537 : Flight Frankfurt Prague ; +``` +This rules out texts saying //take OK0537 from Gothenburg to Prague//. However, there is a +further condition saying that it must be possible to change from LH3043 to OK0537 in Frankfurt. +This can be modelled as a proof object of a suitable type, which is required by the constructor +that connects flights. +``` + cat + IsPossible (x,y,z : City)(Flight x y)(Flight y z) ; + fun + Connect : (x,y,z : City) -> + (u : Flight x y) -> (v : Flight y z) -> + IsPossible x y z u v -> Flight x z ; +``` @@ -2364,8 +2569,8 @@ a universally quantifier proposition (All x)B(x) ``` consists of the **binding** ``(All x)`` of the variable ``x``, -and the **body** ``B(x)``, where the variable ``x`` is -said to occur bound. +and the **body** ``B(x)``, where the variable ``x`` can have +**bound occurrences**. Variable bindings appear in informal mathematical language as well, for instance, @@ -2433,7 +2638,6 @@ since the linearization type of ``Prop`` is ``` {s : Str} ``` -(we remind that the order of fields in a record does not matter). In other words, the linearization of a function consists of a linearization of the body together with a field for a linearization of the bound variable. @@ -2442,13 +2646,13 @@ should notice that GF requires trees to be in **eta-expanded** form in order to be linearizable: any function of type ``` - A -> C + A -> B ``` always has a syntax tree of the form ``` - \x -> c + \x -> b ``` -where ``c : C`` under the assumption ``x : A``. +where ``b : B`` under the assumption ``x : A``. It is in this form that an expression can be analysed as having a bound variable and a body. @@ -2477,8 +2681,7 @@ linearized into the same strings that represent them in the print-out of the abstract syntax. -To be able to -//parse// variable symbols, however, GF needs to know what +To be able to //parse// variable symbols, however, GF needs to know what to look for (instead of e.g. trying to parse //any// string as a variable). What strings are parsed as variable symbols is defined in the lexical analysis part of GF parsing @@ -2486,9 +2689,8 @@ is defined in the lexical analysis part of GF parsing > p -cat=Prop -lexer=codevars "(All x)(x = x)" All (\x -> Eq x x) ``` -(see more details on lexers below). -If several variables are bound in the same argument, the -labels are ``$0, $1, $2``, etc. +(see more details on lexers below). If several variables are bound in the +same argument, the labels are ``$0, $1, $2``, etc. @@ -2506,12 +2708,11 @@ us very much. As we will see, however, computation does play a role even in the well-formedness of expressions when dependent types are present. - GF has a form of judgement for **semantic definitions**, recognized by the key word ``def``. At its simplest, it is just the definition of one constant, e.g. ``` - def one = succ zero ; + def one = Succ Zero ; ``` We can also define a function with arguments, ``` @@ -2520,24 +2721,25 @@ We can also define a function with arguments, which is still a special case of the most general notion of definition, that of a group of **pattern equations**: ``` - def sum x zero = x ; - def sum x (succ y) = succ (sum x y) ; + def + sum x Zero = x ; + sum x (Succ y) = Succ (Sum x y) ; ``` To compute a term is, as in functional programming languages, simply to follow a chain of reductions until no definition can be applied. For instance, we compute ``` - sum one one --> - sum (succ zero) (succ zero) --> - succ (sum (succ zero) zero) --> - succ (succ zero) + Sum one one --> + Sum (Succ Zero) (Succ Zero) --> + Succ (sum (Succ Zero) Zero) --> + Succ (Succ Zero) ``` Computation in GF is performed with the ``pt`` command and the ``compute`` transformation, e.g. ``` > p -tr "1 + 1" | pt -transform=compute -tr | l sum one one - succ (succ zero) + Succ (Succ Zero) s(s(0)) ``` @@ -2548,9 +2750,9 @@ Thus, trivially, all trees in a chain of computation (such as the one above) are definitionally equal to each other. So are the trees ``` - sum zero (succ one) - succ one - sum (sum zero zero) (sum (succ zero) one) + sum Zero (Succ one) + Succ one + sum (sum Zero Zero) (sum (Succ Zero) one) ``` and infinitely many other trees. @@ -2558,8 +2760,8 @@ A fact that has to be emphasized about ``def`` definitions is that they are //not// performed as a first step of linearization. We say that **linearization is intensional**, which means that the definitional equality of two trees does not imply that -they have the same linearizations. For instance, the seven terms -above all have different linearizations in arithmetic notation: +they have the same linearizations. For instance, each of the seven terms +shown above has a different linearizations in arithmetic notation: ``` 1 + 1 s(0) + s(0) @@ -2587,7 +2789,7 @@ Two types depending on terms that are definitionally equal are equal types. For instance, ``` Proof (Odd one) - Proof (Odd (succ zero)) + Proof (Odd (Succ Zero)) ``` are equal types. Hence, any tree that type checks as a proof that 1 is odd also type checks as a proof that the successor of 0 is odd. @@ -2609,13 +2811,12 @@ unidiomatic or ambiguous. In such a case, the translation algorithm may be made to consider translation by a paraphrase. - To stress express the distinction between **constructors** (=**canonical** functions) and other functions, GF has a judgement form ``data`` to tell that certain functions are canonical, e.g. ``` - data Nat = succ | zero ; + data Nat = Succ | Zero ; ``` Unlike in Haskell, but similarly to ALF (where constructor functions are marked with a flag ``C``), @@ -2624,247 +2825,19 @@ a type with new ``data`` judgements. The type signatures of constructors are given separately, in ordinary ``fun`` judgements. One can also write directly ``` - data succ : Nat -> Nat ; + data Succ : Nat -> Nat ; ``` which is equivalent to the two judgements ``` - fun succ : Nat -> Nat ; - data Nat = succ ; -``` - - - - -%--! -==More features of the module system== - -===Interfaces, instances, and functors=== - - -===Resource grammars and their reuse=== - -A resource grammar is a grammar built on linguistic grounds, -to describe a language rather than a domain. -The GF resource grammar library, which contains resource grammars for -10 languages, is described more closely in the following -documents: -- [Resource library API documentation ../../lib/resource-1.0/doc/]: - for application grammarians using the resource. -- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]: - for resource grammarians developing the resource. - - -However, to give a flavour of both using and writing resource grammars, -we have created a miniature resource, which resides in the -subdirectory [``resource`` resource]. Its API consists of the following -three modules: - -[Syntax resource/Syntax.gf] - syntactic structures, language-independent: -``` - -``` -[LexEng resource/LexEng.gf] - lexical paradigms, English: -``` - -``` -[LexIta resource/LexIta.gf] - lexical paradigms, Italian: -``` - -``` - - -Only these three modules should be ``open``ed in applications. -The implementations of the resource are given in the following four modules: - -[MorphoEng resource/MorphoEng.gf], -``` - -``` -[MorphoIta resource/MorphoIta.gf]: low-level morphology -- [SyntaxEng resource/SyntaxEng.gf]. - [SyntaxIta resource/SyntaxIta.gf]: definitions of syntactic structures - - -An example use of the resource resides in the -subdirectory [``applications`` applications]. -It implements the abstract syntax -[``FoodComments`` applications/FoodComments.gf] for English and Italian. -The following diagram shows the module structure, indicating by -colours which modules are written by the grammarian. The two blue modules -form the abstract syntax. The three red modules form the concrete syntax. -The two green modules are trivial instantiations of a functor. -The rest of the modules (black) come from the resource. - -[Multi.png] - - - -===Restricted inheritance and qualified opening=== - - -==Using the standard resource library== - -The example files of this chapter can be found in -the directory [``arithm`` ./arithm]. - - -===The simplest way=== - -The simplest way is to ``open`` a top-level ``Lang`` module -and a ``Paradigms`` module: -``` - abstract Foo = ... - - concrete FooEng = open LangEng, ParadigmsEng in ... - concrete FooSwe = open LangSwe, ParadigmsSwe in ... -``` -Here is an example. -``` -abstract Arithm = { - cat - Prop ; - Nat ; - fun - Zero : Nat ; - Succ : Nat -> Nat ; - Even : Nat -> Prop ; - And : Prop -> Prop -> Prop ; -} - ---# -path=.:alltenses:prelude - -concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in { - lincat - Prop = S ; - Nat = NP ; - lin - Zero = - UsePN (regPN "zero" nonhuman) ; - Succ n = - DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ; - Even n = - UseCl TPres ASimul PPos - (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ; - And x y = - ConjS and_Conj (BaseS x y) ; - -} - ---# -path=.:alltenses:prelude - -concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in { - lincat - Prop = S ; - Nat = NP ; - lin - Zero = - UsePN (regPN "noll" neutrum) ; - Succ n = - DetCN (DetSg (SgQuant DefArt) NoOrd) - (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare") - (mkPreposition "till")) n) ; - Even n = - UseCl TPres ASimul PPos - (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ; - And x y = - ConjS and_Conj (BaseS x y) ; -} -``` - - -===How to find resource functions=== - -The definitions in this example were found by parsing: -``` - > i LangEng.gf - - -- for Successor: - > p -cat=NP -mcfg -parser=topdown "the mother of Paris" - - -- for Even: - > p -cat=S -mcfg -parser=topdown "Paris is old" - - -- for And: - > p -cat=S -mcfg -parser=topdown "Paris is old and I am old" -``` -The use of parsing can be systematized by **example-based grammar writing**, -to which we will return later. - - -===A functor implementation=== - -The interesting thing now is that the -code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for -some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor", -"jämn" vs. "even"). How can we exploit the similarities and -actually share code between the languages? - -The solution is to use a functor: an ``incomplete`` module that opens -an ``abstract`` as an ``interface``, and then instantiate it to different -languages that implement the interface. The structure is as follows: -``` - abstract Foo ... - - incomplete concrete FooI = open Lang, Lex in ... - - concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ; - concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ; -``` -where ``Lex`` is an abstract lexicon that includes the vocabulary -specific to this application: + fun Succ : Nat -> Nat ; + data Nat = Succ ; ``` - abstract Lex = Cat ** ... - concrete LexEng of Lex = CatEng ** open ParadigmsEng in ... - concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ... -``` -Here, again, a complete example (``abstract Arithm`` is as above): -``` -incomplete concrete ArithmI of Arithm = open Lang, Lex in { - lincat - Prop = S ; - Nat = NP ; - lin - Zero = - UsePN zero_PN ; - Succ n = - DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ; - Even n = - UseCl TPres ASimul PPos - (PredVP n (UseComp (CompAP (PositA even_A)))) ; - And x y = - ConjS and_Conj (BaseS x y) ; -} ---# -path=.:alltenses:prelude -concrete ArithmEng of Arithm = ArithmI with - (Lang = LangEng), - (Lex = LexEng) ; - ---# -path=.:alltenses:prelude -concrete ArithmSwe of Arithm = ArithmI with - (Lang = LangSwe), - (Lex = LexSwe) ; - -abstract Lex = Cat ** { - fun - zero_PN : PN ; - successor_N2 : N2 ; - even_A : A ; -} - -concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in { - lin - zero_PN = regPN "noll" neutrum ; - successor_N2 = - mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ; - even_A = regA "jämn" ; -} -``` +===Case study: representing anaphoric reference TODO=== - -==Transfer modules== +==Transfer modules TODO== Transfer means noncompositional tree-transforming operations. The command ``apply_transfer = at`` is typically used in a pipe: @@ -2880,7 +2853,7 @@ See the for more information. -==Practical issues== +==Practical issues TODO== ===Lexers and unlexers=== @@ -2913,7 +2886,6 @@ Given by ``help -lexer``, ``help -unlexer``: -unlexer=codelit like code, but remove string literal quotes -unlexer=concat remove all spaces -unlexer=bind like identity, but bind at "&+" - ``` @@ -2946,8 +2918,6 @@ Both Flite and ATK are freely available through the links above, but they are not distributed together with GF. - - ===Multilingual syntax editor=== The @@ -2956,7 +2926,7 @@ describes the use of the editor, which works for any multilingual GF grammar. Here is a snapshot of the editor: -[../quick-editor.gif] +[../quick-editor.png] The grammars of the snapshot are from the [Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter]. @@ -2999,7 +2969,7 @@ A summary is given in the following chart of GF grammar compiler phases: [../gf-compiler.png] -==Case studies== +==Larger case studies TODO== ===Interfacing formal and natural languages=== @@ -3011,3 +2981,8 @@ English and German. A simpler example will be explained here. + +===A multimodal dialogue system=== + +See TALK project deliverables, [TALK homepage http://www.talk-project.org] + |
