summaryrefslogtreecommitdiff
path: root/doc/tutorial/gf-tutorial2.txt
diff options
context:
space:
mode:
authoraarne <aarne@cs.chalmers.se>2007-05-30 19:29:24 +0000
committeraarne <aarne@cs.chalmers.se>2007-05-30 19:29:24 +0000
commit93b4888b7868205f09ee0002290675d86ed335d5 (patch)
treef638767b34d0a3e819e1ff41d5bd168d1ea469fe /doc/tutorial/gf-tutorial2.txt
parentebfbf323106e7a7f3afa62585b1c1bd3c795c352 (diff)
further work on tutorial
Diffstat (limited to 'doc/tutorial/gf-tutorial2.txt')
-rw-r--r--doc/tutorial/gf-tutorial2.txt809
1 files changed, 392 insertions, 417 deletions
diff --git a/doc/tutorial/gf-tutorial2.txt b/doc/tutorial/gf-tutorial2.txt
index 4b20f38cd..9c3ae71b2 100644
--- a/doc/tutorial/gf-tutorial2.txt
+++ b/doc/tutorial/gf-tutorial2.txt
@@ -1,5 +1,5 @@
Grammatical Framework Tutorial
-Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Author: Aarne Ranta aarne (at) cs.chalmers.se
Last update: %%date(%c)
% NOTE: this is a txt2tags file.
@@ -20,7 +20,7 @@ Last update: %%date(%c)
-[../gf-logo.gif]
+[../gf-logo.png]
@@ -552,7 +552,7 @@ module forms are
%--!
-===Record types, records, and ``Str``s===
+===Records and strings===
The linearization type of a category is a **record type**, with
zero of more **fields** of different types. The simplest record
@@ -922,7 +922,7 @@ The graph uses
%--!
-==System commands==
+===System commands===
To document your grammar, you may want to print the
graph into a file, e.g. a ``.png`` file that
@@ -1385,7 +1385,8 @@ Why does the command also show the operations that form
the same as the value of ``Noun``.
-==Using morphology in concrete syntax==
+
+==Using parameters in concrete syntax==
We can now enrich the concrete syntax definitions to
comprise morphology. This will involve a more radical
@@ -1596,8 +1597,246 @@ of a grammar application.
%--!
+===Free variation===
+
+Sometimes there are many alternative ways to define a concrete syntax.
+For instance, the verb negation in English can be expressed both by
+//does not// and //doesn't//. In linguistic terms, these expressions
+are in **free variation**. The ``variants`` construct of GF can
+be used to give a list of strings in free variation. For example,
+```
+ NegVerb verb = {s = variants {["does not"] ; "doesn't} ++ verb.s ! Pl} ;
+```
+An empty variant list
+```
+ variants {}
+```
+can be used e.g. if a word lacks a certain form.
+
+In general, ``variants`` should be used cautiously. It is not
+recommended for modules aimed to be libraries, because the
+user of the library has no way to choose among the variants.
+
+
+===Overloading of operations===
+
+Large libraries, such as the GF Resource Grammar Library, may define
+hundreds of names, which can be unpractical
+for both the library writer and the user. The writer has to invent longer
+and longer names which are not always intuitive,
+and the user has to learn or at least be able to find all these names.
+A solution to this problem, adopted by languages such as C++, is **overloading**:
+the same name can be used for several functions. When such a name is used, the
+compiler performs **overload resolution** to find out which of the possible functions
+is meant. The resolution is based on the types of the functions: all functions that
+have the same name must have different types.
+
+In C++, functions with the same name can be scattered everywhere in the program.
+In GF, they must be grouped together in ``overload`` groups. Here is an example
+of an overload group, defining four ways to define nouns in Italian:
+```
+ oper mkN = overload {
+ mkN : Str -> N = -- regular nouns
+ mkN : Str -> Gender -> N = -- regular nouns with unexpected gender
+ mkN : Str -> Str -> N = -- irregular nouns
+ mkN : Str -> Str -> Gender -> N = -- irregular nouns with unexpected gender
+ }
+```
+All of the following uses of ``mkN`` are easy to resolve:
+```
+ lin Pizza = mkN "pizza" ; -- Str -> N
+ lin Hand = mkN "mano" Fem ; -- Str -> Gender -> N
+ lin Man = mkN "uomo" "uomini" ; -- Str -> Str -> N
+```
+
+
+
+
+
+
+
+%--!
+==Using the resource grammar library TODO==
+
+A resource grammar is a grammar built on linguistic grounds,
+to describe a language rather than a domain.
+The GF resource grammar library, which contains resource grammars for
+10 languages, is described more closely in the following
+documents:
+- [Resource library API documentation ../../lib/resource-1.0/doc/]:
+ for application grammarians using the resource.
+- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
+ for resource grammarians developing the resource.
+
+
+===Interfaces, instances, and functors===
+
+===The simplest way===
+
+The simplest way is to ``open`` a top-level ``Lang`` module
+and a ``Paradigms`` module:
+```
+ abstract Foo = ...
+
+ concrete FooEng = open LangEng, ParadigmsEng in ...
+ concrete FooSwe = open LangSwe, ParadigmsSwe in ...
+```
+Here is an example.
+```
+abstract Arithm = {
+ cat
+ Prop ;
+ Nat ;
+ fun
+ Zero : Nat ;
+ Succ : Nat -> Nat ;
+ Even : Nat -> Prop ;
+ And : Prop -> Prop -> Prop ;
+}
+
+--# -path=.:alltenses:prelude
+
+concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
+ lincat
+ Prop = S ;
+ Nat = NP ;
+ lin
+ Zero =
+ UsePN (regPN "zero" nonhuman) ;
+ Succ n =
+ DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
+ Even n =
+ UseCl TPres ASimul PPos
+ (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
+ And x y =
+ ConjS and_Conj (BaseS x y) ;
+
+}
+
+--# -path=.:alltenses:prelude
+
+concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
+ lincat
+ Prop = S ;
+ Nat = NP ;
+ lin
+ Zero =
+ UsePN (regPN "noll" neutrum) ;
+ Succ n =
+ DetCN (DetSg (SgQuant DefArt) NoOrd)
+ (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
+ (mkPreposition "till")) n) ;
+ Even n =
+ UseCl TPres ASimul PPos
+ (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
+ And x y =
+ ConjS and_Conj (BaseS x y) ;
+}
+```
+
+
+===How to find resource functions===
+
+The definitions in this example were found by parsing:
+```
+ > i LangEng.gf
+
+ -- for Successor:
+ > p -cat=NP -mcfg -parser=topdown "the mother of Paris"
+
+ -- for Even:
+ > p -cat=S -mcfg -parser=topdown "Paris is old"
+
+ -- for And:
+ > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
+```
+The use of parsing can be systematized by **example-based grammar writing**,
+to which we will return later.
+
+
+===A functor implementation===
+
+The interesting thing now is that the
+code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
+some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
+"jämn" vs. "even"). How can we exploit the similarities and
+actually share code between the languages?
+
+The solution is to use a functor: an ``incomplete`` module that opens
+an ``abstract`` as an ``interface``, and then instantiate it to different
+languages that implement the interface. The structure is as follows:
+```
+ abstract Foo ...
+
+ incomplete concrete FooI = open Lang, Lex in ...
+
+ concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
+ concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
+```
+where ``Lex`` is an abstract lexicon that includes the vocabulary
+specific to this application:
+```
+ abstract Lex = Cat ** ...
+
+ concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
+ concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
+```
+Here, again, a complete example (``abstract Arithm`` is as above):
+```
+incomplete concrete ArithmI of Arithm = open Lang, Lex in {
+ lincat
+ Prop = S ;
+ Nat = NP ;
+ lin
+ Zero =
+ UsePN zero_PN ;
+ Succ n =
+ DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
+ Even n =
+ UseCl TPres ASimul PPos
+ (PredVP n (UseComp (CompAP (PositA even_A)))) ;
+ And x y =
+ ConjS and_Conj (BaseS x y) ;
+}
+
+--# -path=.:alltenses:prelude
+concrete ArithmEng of Arithm = ArithmI with
+ (Lang = LangEng),
+ (Lex = LexEng) ;
+
+--# -path=.:alltenses:prelude
+concrete ArithmSwe of Arithm = ArithmI with
+ (Lang = LangSwe),
+ (Lex = LexSwe) ;
+
+abstract Lex = Cat ** {
+ fun
+ zero_PN : PN ;
+ successor_N2 : N2 ;
+ even_A : A ;
+}
+
+concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
+ lin
+ zero_PN = regPN "noll" neutrum ;
+ successor_N2 =
+ mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
+ even_A = regA "jämn" ;
+}
+```
+
+===Restricted inheritance and qualified opening===
+
+
+
+
+%--!
==More constructs for concrete syntax==
+In this chapter, we go through constructs that are not necessary in simple grammars
+or when the concrete syntax relies on libraries, but very useful when writing advanced
+concrete syntax implementations, such as resource grammar libraries.
+
%--!
===Local definitions===
@@ -1621,48 +1860,6 @@ the same expression. Here is an example, from
```
-
-
-%--!
-===Free variation===
-
-Sometimes there are many alternative ways to define a concrete syntax.
-For instance, the verb negation in English can be expressed both by
-//does not// and //doesn't//. In linguistic terms, these expressions
-are in **free variation**. The ``variants`` construct of GF can
-be used to give a list of strings in free variation. For example,
-```
- NegVerb verb = {s = variants {["does not"] ; "doesn't} ++ verb.s ! Pl} ;
-```
-An empty variant list
-```
- variants {}
-```
-can be used e.g. if a word lacks a certain form.
-
-In general, ``variants`` should be used cautiously. It is not
-recommended for modules aimed to be libraries, because the
-user of the library has no way to choose among the variants.
-
-%Moreover, ``variants`` is only defined for basic types (``Str``
-%and parameter types). The grammar compiler will admit
-%``variants`` for any types, but it will push it to the
-%level of basic types in a way that may be unwanted.
-%For instance, German has two words meaning "car",
-%//Wagen//, which is Masculine, and //Auto//, which is Neuter.
-%However, if one writes
-%```
-% variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}}
-%```
-%this will compute to
-%```
-% {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}}
-%```
-%which will also accept erroneous combinations of strings and genders.
-
-
-
-
===Record extension and subtyping===
Record types and records can be **extended** with new fields. For instance,
@@ -1707,7 +1904,7 @@ A typical example is a record of agreement features, e.g. French
oper Agr : PType = {g : Gender ; n : Number ; p : Person} ;
```
Notice the term ``PType`` rather than just ``Type`` referring to
-parameter types. Every ``PType`` is also a ``Type``.
+parameter types. Every ``PType`` is also a ``Type``, but not vice-versa.
Pattern matching is done in the expected way, but it can moreover
utilize partial records: the branch
@@ -1728,6 +1925,7 @@ possible to write, slightly surprisingly,
}
```
+
%--!
===Regular expression patterns===
@@ -1742,8 +1940,22 @@ as in morphology, it is handy to use regular expression patterns:
The last three apply to all types of patterns, the first two only to token strings.
-Example: plural formation in Swedish 2nd declension
-(//pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar//):
+As an example, we give a rule for the formation of English word forms
+ending with an //s// and used in the formation of both plural nouns and
+third-person present-tense verbs.
+```
+ add_s : Str -> Str = \w -> case w of {
+ _ + "oo" => s + "s" ; -- bamboo
+ _ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero
+ _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy
+ x + "y" => x + "ies" ; -- fly
+ _ => w + "s" -- car
+ } ;
+```
+Here is another example, the plural formation in Swedish 2nd declension.
+The second branch uses a variable binding with ``@`` to cover the cases where an
+unstressed pre-final vowel //e// disappears in the plural
+(//nyckel-nycklar, seger-segrar, bil-bilar//):
```
plural2 : Str -> Str = \w -> case w of {
pojk + "e" => pojk + "ar" ;
@@ -1751,15 +1963,8 @@ Example: plural formation in Swedish 2nd declension
bil => bil + "ar"
} ;
```
-Another example: English noun plural formation.
-```
- plural : Str -> Str = \w -> case w of {
- _ + ("s" | "z" | "x" | "sh") => w + "es" ;
- _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ;
- x + "y" => x + "ies" ;
- _ => w + "s"
- } ;
-```
+
+
Semantics: variables are always bound to the **first match**, which is the first
in the sequence of binding lists ``Match p v`` defined as follows. In the definition,
``p`` is a pattern and ``v`` is a value.
@@ -1834,8 +2039,15 @@ FIXME: The linearization type is ``{s : Str}`` for all these categories.
+
==More concepts of abstract syntax==
+This section is about the use of the type theory part of GF for
+including more semantics in grammars. Some of the subsections present
+ideas that have not yet been used in real-world applications, and whose
+tool support outside the GF program is not complete.
+
+
===GF as a logical framework===
In this section, we will show how
@@ -1852,8 +2064,8 @@ of such a theory, represented as an ``abstract`` module in GF.
```
abstract Arithm = {
cat
- Prop ; -- proposition
- Nat ; -- natural number
+ Prop ; -- proposition
+ Nat ; -- natural number
fun
Zero : Nat ; -- 0
Succ : Nat -> Nat ; -- successor of x
@@ -1899,11 +2111,9 @@ abstract Address = {
OxfordSt, ShaftesburyAve, BdRaspail, RueBlondel, AvAlsaceLorraine : Street ;
}
```
-The linearization rules
-are straightforward,
+The linearization rules are straightforward,
```
lin
-
mkAddress country city street =
ss (street.s ++ "," ++ city.s ++ "," ++ country.s) ;
UK = ss ("U.K.") ;
@@ -1917,13 +2127,11 @@ are straightforward,
RueBlondel = ss ("rue" ++ "Blondel") ;
AvAlsaceLorraine = ss ("avenue" ++ "Alsace-Lorraine") ;
```
-with the exception of ``mkAddress``, where we have
+Notice that, in ``mkAddress``, we have
reversed the order of the constituents. The type of ``mkAddress``
in the abstract syntax takes its arguments in a "logical" order,
-with increasing precision. (This order is sometimes even used in the concrete
-syntax of addresses, e.g. in Russia).
-
-
+with increasing precision. (This order is sometimes even used in the
+concrete syntax of addresses, e.g. in Russia).
Both existing and non-existing addresses are recognized by this
grammar. The non-existing ones in the following randomly generated
@@ -1943,10 +2151,11 @@ Dependent types provide a way to guarantee that addresses are
well-formed. What we do is to include **contexts** in
``cat`` judgements:
```
- cat Address ;
- cat Country ;
- cat City Country ;
- cat Street (x : Country)(y : City x) ;
+ cat
+ Address ;
+ Country ;
+ City Country ;
+ Street (x : Country)(City x) ;
```
The first two judgements are as before, but the third one makes
``City`` dependent on ``Country``: there are no longer just cities,
@@ -1966,19 +2175,18 @@ which is only possible if the subsequent context does not depend on ``x``.
The ``fun`` judgements of the grammar are modified accordingly:
```
fun
+ mkAddress : (x : Country) -> (y : City x) -> Street x y -> Address ;
- mkAddress : (x : Country) -> (y : City x) -> Street x y -> Address ;
-
- UK : Country ;
- France : Country ;
- Paris : City France ;
- London : City UK ;
- Grenoble : City France ;
- OxfordSt : Street UK London ;
- ShaftesburyAve : Street UK London ;
- BdRaspail : Street France Paris ;
- RueBlondel : Street France Paris ;
- AvAlsaceLorraine : Street France Grenoble ;
+ UK : Country ;
+ France : Country ;
+ Paris : City France ;
+ London : City UK ;
+ Grenoble : City France ;
+ OxfordSt : Street UK London ;
+ ShaftesburyAve : Street UK London ;
+ BdRaspail : Street France Paris ;
+ RueBlondel : Street France Paris ;
+ AvAlsaceLorraine : Street France Grenoble ;
```
Since the type of ``mkAddress`` now has dependencies among
its argument types, we have to use variables just like we used in
@@ -2012,6 +2220,12 @@ sometimes shortens the code, since we can write e.g.
```
oper triple : (x,y,z : Str) -> Str = ...
```
+If a bound variable is not used, it can here, as elswhere in GF, be replaced by
+a wildcard:
+```
+ oper triple : (_,_,_ : Str) -> Str = ...
+```
+
===Dependent types in concrete syntax===
@@ -2074,8 +2288,8 @@ For instance, the sentence
is syntactically well-formed but semantically ill-formed.
It is well-formed because it combines a well-formed
noun phrase ("the number 2") with a well-formed
-verb phrase ("is equilateral") in accordance with the
-rule that the verb phrase is inflected in the
+verb phrase ("is equilateral") and satisfies the agreement
+rule saying that the verb phrase is inflected in the
number of the noun phrase:
```
fun PredVP : NP -> VP -> S ;
@@ -2117,6 +2331,7 @@ but no proposition linearized to
the number 2 is equilateral
```
since ``Equilateral two`` is not a well-formed type-theoretical object.
+It is not even accepted by the context-free parser.
When formalizing mathematics, e.g. in the purpose of
computer-assisted theorem proving, we are certainly interested
@@ -2147,70 +2362,32 @@ and dependencies of other categories on this:
```
cat
S ; -- sentence
- V1 Dom ; -- one-place verb
- V2 Dom Dom ; -- two-place verb
+ V1 Dom ; -- one-place verb with specific subject type
+ V2 Dom Dom ; -- two-place verb with specific subject and object types
A1 Dom ; -- one-place adjective
A2 Dom Dom ; -- two-place adjective
- PN Dom ; -- proper name
- NP Dom ; -- noun phrase
+ NP Dom ; -- noun phrase for an object of specific type
Conj ; -- conjunction
Det ; -- determiner
```
-The number of ``Dom`` arguments depends on the semantic type
-corresponding to the category: one-place verbs and adjectives
-correspond to types of the form
-```
- A -> Prop
-```
-whereas two-place verbs and adjectives correspond to types of the form
-```
- A -> B -> Prop
-```
-where the domains ``A`` and ``B`` can be distinct.
-Proper names correspond to types of the form
-```
- A
-```
-that is, individual objects of the domain ``A``. Noun phrases
-correspond to
-```
- (A -> Prop) -> Prop
-```
-that is, **quantifiers** over the domain ``A``.
-Sentences, conjunctions, and determiners correspond to
-```
- Prop
- Prop -> Prop -> Prop
- (A : Dom) -> (A -> Prop) -> Prop
-```
-respectively,
-and are thus independent of domain. As for common nouns ``CN``,
-the simplest semantics is that they correspond to
-```
- Dom
-```
-In this section, we will, in fact, write ``Dom`` instead of ``CN``.
-
Having thus parametrized categories on domains, we have to reformulate
the rules of predication, etc, accordingly. This is straightforward:
```
fun
PredV1 : (A : Dom) -> NP A -> V1 A -> S ;
ComplV2 : (A,B : Dom) -> V2 A B -> NP B -> V1 A ;
- UsePN : (A : Dom) -> PN A -> NP A ;
DetCN : Det -> (A : Dom) -> NP A ;
ModA1 : (A : Dom) -> A1 A -> Dom ;
ConjS : Conj -> S -> S -> S ;
ConjV1 : (A : Dom) -> Conj -> V1 A -> V1 A -> V1 A ;
```
In linearization rules,
-we typically use wildcards for the domain arguments,
-to get arities right:
+we use wildcards for the domain arguments,
+because they don't affect linearization:
```
lin
PredV1 _ np vp = ss (np.s ++ vp.s) ;
ComplV2 _ _ v2 np = ss (v2.s ++ np.s) ;
- UsePN _ pn = pn ;
DetCN det cn = ss (det.s ++ cn.s) ;
ModA1 cn a1 = ss (a1.s ++ cn.s) ;
ConjS conj s1 s2 = ss (s1.s ++ conj.s ++ s2.s) ;
@@ -2232,20 +2409,19 @@ To explain the contrast, we introduce the functions
human : Dom ;
game : Dom ;
play : V2 human game ;
- John : PN human ;
- Golf : PN game ;
+ John : NP human ;
+ Golf : NP game ;
```
Both sentences still pass the context-free parser,
returning trees with lots of metavariables of type ``Dom``:
```
- PredV1 ?0 (UsePN ?1 John) (ComplV2 ?2 ?3 play (UsePN ?4 Golf))
-
- PredV1 ?0 (UsePN ?1 Golf) (ComplV2 ?2 ?3 play (UsePN ?4 John))
+ PredV1 ?0 John (ComplV2 ?1 ?2 play Golf)
+ PredV1 ?0 Golf (ComplV2 ?1 ?2 play John)
```
But only the former sentence passes the type checker, which moreover
infers the domain arguments:
```
- PredV1 human (UsePN human John) (ComplV2 human game play (UsePN game Golf))
+ PredV1 human John (ComplV2 human game play Golf)
```
To try this out in GF, use ``pt = put_term`` with the **tree transformation**
that solves the metavariables by type checking:
@@ -2261,7 +2437,7 @@ or less liberal. For instance,
John loves Mary
John loves golf
```
-both make sense, even though ``Mary`` and ``golf``
+should both make sense, even though ``Mary`` and ``golf``
are of different types. A natural solution in this case is to
formalize ``love`` as a **polymorphic** verb, which takes
a human as its first argument but an object of any type as its second
@@ -2270,19 +2446,21 @@ argument:
fun love : (A : Dom) -> V2 human A ;
lin love _ = ss "loves" ;
```
-Problems remain, such as **subtyping** (e.g. what
-is meaningful for a ``human`` is also meaningful for
-a ``man`` and a ``woman``, but not the other way round)
-and the **extended use** of expressions (e.g. a metaphoric use that
-makes sense of "golf plays John").
+In other words, it is possible for a human to love anything.
+A problem related to polymorphism is **subtyping**: what
+is meaningful for a ``human`` is also meaningful for
+a ``man`` and a ``woman``, but not the other way round.
+One solution to this is **coercions**: functions that
+"lift" objects from subtypes to supertypes.
+===Case study: selectional restrictions and statistical language models TODO===
===Proof objects===
-Perhaps the most well-known feature of constructive type theory is
+Perhaps the most well-known idea in constructive type theory is
the **Curry-Howard isomorphism**, also known as the
**propositions as types principle**. Its earliest formulations
were attempts to give semantics to the logical systems of
@@ -2300,14 +2478,14 @@ natural numbers:
The **successor function** ``Succ`` generates an infinite
sequence of natural numbers, beginning from ``Zero``.
-We then define what it means for a number //x// to be less than
+We then define what it means for a number //x// to be //less than//
a number //y//. Our definition is based on two axioms:
-- ``Zero`` is less than ``Succ y`` for any ``y``.
-- If ``x`` is less than ``y``, then``Succ x`` is less than ``Succ y``.
+- ``Zero`` is less than ``Succ`` //y// for any //y//.
+- If //x// is less than //y//, then``Succ`` //x// is less than ``Succ`` //y//.
The most straightforward way of expressing these axioms in type theory
-is as typing judgements that introduce objects of a type ``Less x y``:
+is as typing judgements that introduce objects of a type ``Less`` //x y //:
```
cat Less Nat Nat ;
fun lessZ : (y : Nat) -> Less Zero (Succ y) ;
@@ -2326,14 +2504,13 @@ whose type is
```
Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero))))
```
-which is the same thing as the proposition that 2 is less than 4.
+which is the formalization of the proposition that 2 is less than 4.
GF grammars can be used to provide a **semantic control** of
well-formedness of expressions. We have already seen examples of this:
the grammar of well-formed addresses and the grammar with
selectional restrictions above. By introducing proof objects
-we have now added a very powerful
-technique of expressing semantic conditions.
+we have now added a very powerful technique of expressing semantic conditions.
A simple example of the use of proof objects is the definition of
well-formed //time spans//: a time span is expected to be from an earlier to
@@ -2351,7 +2528,35 @@ by using the ``Less`` predicate:
cat Span ;
fun span : (m,n : Nat) -> Less m n -> Span ;
```
+A possible practical application of this idea is **proof-carrying documents**:
+to be semantically well-formed, the abstract syntax of a document must contain a proof
+of some property, although the proof is not shown in the concrete document.
+Think, for instance, of small documents describing flight connections:
+//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.//
+
+The well-formedness of this text is partly expressible by dependent typing:
+```
+ cat
+ City ;
+ Flight City City ;
+ fun
+ Gothenburg, Frankfurt, Prague : City ;
+ LH3043 : Flight Gothenburg Frankfurt ;
+ OK0537 : Flight Frankfurt Prague ;
+```
+This rules out texts saying //take OK0537 from Gothenburg to Prague//. However, there is a
+further condition saying that it must be possible to change from LH3043 to OK0537 in Frankfurt.
+This can be modelled as a proof object of a suitable type, which is required by the constructor
+that connects flights.
+```
+ cat
+ IsPossible (x,y,z : City)(Flight x y)(Flight y z) ;
+ fun
+ Connect : (x,y,z : City) ->
+ (u : Flight x y) -> (v : Flight y z) ->
+ IsPossible x y z u v -> Flight x z ;
+```
@@ -2364,8 +2569,8 @@ a universally quantifier proposition
(All x)B(x)
```
consists of the **binding** ``(All x)`` of the variable ``x``,
-and the **body** ``B(x)``, where the variable ``x`` is
-said to occur bound.
+and the **body** ``B(x)``, where the variable ``x`` can have
+**bound occurrences**.
Variable bindings appear in informal mathematical language as well, for
instance,
@@ -2433,7 +2638,6 @@ since the linearization type of ``Prop`` is
```
{s : Str}
```
-(we remind that the order of fields in a record does not matter).
In other words, the linearization of a function
consists of a linearization of the body together with a
field for a linearization of the bound variable.
@@ -2442,13 +2646,13 @@ should notice that GF requires trees to be in
**eta-expanded** form in order to be linearizable:
any function of type
```
- A -> C
+ A -> B
```
always has a syntax tree of the form
```
- \x -> c
+ \x -> b
```
-where ``c : C`` under the assumption ``x : A``.
+where ``b : B`` under the assumption ``x : A``.
It is in this form that an expression can be analysed
as having a bound variable and a body.
@@ -2477,8 +2681,7 @@ linearized into the same strings that represent them in
the print-out of the abstract syntax.
-To be able to
-//parse// variable symbols, however, GF needs to know what
+To be able to //parse// variable symbols, however, GF needs to know what
to look for (instead of e.g. trying to parse //any//
string as a variable). What strings are parsed as variable symbols
is defined in the lexical analysis part of GF parsing
@@ -2486,9 +2689,8 @@ is defined in the lexical analysis part of GF parsing
> p -cat=Prop -lexer=codevars "(All x)(x = x)"
All (\x -> Eq x x)
```
-(see more details on lexers below).
-If several variables are bound in the same argument, the
-labels are ``$0, $1, $2``, etc.
+(see more details on lexers below). If several variables are bound in the
+same argument, the labels are ``$0, $1, $2``, etc.
@@ -2506,12 +2708,11 @@ us very much. As we will see, however, computation does play a role
even in the well-formedness of expressions when dependent types are
present.
-
GF has a form of judgement for **semantic definitions**,
recognized by the key word ``def``. At its simplest, it is just
the definition of one constant, e.g.
```
- def one = succ zero ;
+ def one = Succ Zero ;
```
We can also define a function with arguments,
```
@@ -2520,24 +2721,25 @@ We can also define a function with arguments,
which is still a special case of the most general notion of
definition, that of a group of **pattern equations**:
```
- def sum x zero = x ;
- def sum x (succ y) = succ (sum x y) ;
+ def
+ sum x Zero = x ;
+ sum x (Succ y) = Succ (Sum x y) ;
```
To compute a term is, as in functional programming languages,
simply to follow a chain of reductions until no definition
can be applied. For instance, we compute
```
- sum one one -->
- sum (succ zero) (succ zero) -->
- succ (sum (succ zero) zero) -->
- succ (succ zero)
+ Sum one one -->
+ Sum (Succ Zero) (Succ Zero) -->
+ Succ (sum (Succ Zero) Zero) -->
+ Succ (Succ Zero)
```
Computation in GF is performed with the ``pt`` command and the
``compute`` transformation, e.g.
```
> p -tr "1 + 1" | pt -transform=compute -tr | l
sum one one
- succ (succ zero)
+ Succ (Succ Zero)
s(s(0))
```
@@ -2548,9 +2750,9 @@ Thus, trivially, all trees in a chain of computation
(such as the one above)
are definitionally equal to each other. So are the trees
```
- sum zero (succ one)
- succ one
- sum (sum zero zero) (sum (succ zero) one)
+ sum Zero (Succ one)
+ Succ one
+ sum (sum Zero Zero) (sum (Succ Zero) one)
```
and infinitely many other trees.
@@ -2558,8 +2760,8 @@ A fact that has to be emphasized about ``def`` definitions is that
they are //not// performed as a first step of linearization.
We say that **linearization is intensional**, which means that
the definitional equality of two trees does not imply that
-they have the same linearizations. For instance, the seven terms
-above all have different linearizations in arithmetic notation:
+they have the same linearizations. For instance, each of the seven terms
+shown above has a different linearizations in arithmetic notation:
```
1 + 1
s(0) + s(0)
@@ -2587,7 +2789,7 @@ Two types depending on terms that are definitionally equal are
equal types. For instance,
```
Proof (Odd one)
- Proof (Odd (succ zero))
+ Proof (Odd (Succ Zero))
```
are equal types. Hence, any tree that type checks as a proof that
1 is odd also type checks as a proof that the successor of 0 is odd.
@@ -2609,13 +2811,12 @@ unidiomatic or ambiguous. In such a case,
the translation algorithm may be made to consider
translation by a paraphrase.
-
To stress express the distinction between
**constructors** (=**canonical** functions)
and other functions, GF has a judgement form
``data`` to tell that certain functions are canonical, e.g.
```
- data Nat = succ | zero ;
+ data Nat = Succ | Zero ;
```
Unlike in Haskell, but similarly to ALF (where constructor functions
are marked with a flag ``C``),
@@ -2624,247 +2825,19 @@ a type with new ``data`` judgements. The type signatures of constructors
are given separately, in ordinary ``fun`` judgements.
One can also write directly
```
- data succ : Nat -> Nat ;
+ data Succ : Nat -> Nat ;
```
which is equivalent to the two judgements
```
- fun succ : Nat -> Nat ;
- data Nat = succ ;
-```
-
-
-
-
-%--!
-==More features of the module system==
-
-===Interfaces, instances, and functors===
-
-
-===Resource grammars and their reuse===
-
-A resource grammar is a grammar built on linguistic grounds,
-to describe a language rather than a domain.
-The GF resource grammar library, which contains resource grammars for
-10 languages, is described more closely in the following
-documents:
-- [Resource library API documentation ../../lib/resource-1.0/doc/]:
- for application grammarians using the resource.
-- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
- for resource grammarians developing the resource.
-
-
-However, to give a flavour of both using and writing resource grammars,
-we have created a miniature resource, which resides in the
-subdirectory [``resource`` resource]. Its API consists of the following
-three modules:
-
-[Syntax resource/Syntax.gf] - syntactic structures, language-independent:
-```
-
-```
-[LexEng resource/LexEng.gf] - lexical paradigms, English:
-```
-
-```
-[LexIta resource/LexIta.gf] - lexical paradigms, Italian:
-```
-
-```
-
-
-Only these three modules should be ``open``ed in applications.
-The implementations of the resource are given in the following four modules:
-
-[MorphoEng resource/MorphoEng.gf],
-```
-
-```
-[MorphoIta resource/MorphoIta.gf]: low-level morphology
-- [SyntaxEng resource/SyntaxEng.gf].
- [SyntaxIta resource/SyntaxIta.gf]: definitions of syntactic structures
-
-
-An example use of the resource resides in the
-subdirectory [``applications`` applications].
-It implements the abstract syntax
-[``FoodComments`` applications/FoodComments.gf] for English and Italian.
-The following diagram shows the module structure, indicating by
-colours which modules are written by the grammarian. The two blue modules
-form the abstract syntax. The three red modules form the concrete syntax.
-The two green modules are trivial instantiations of a functor.
-The rest of the modules (black) come from the resource.
-
-[Multi.png]
-
-
-
-===Restricted inheritance and qualified opening===
-
-
-==Using the standard resource library==
-
-The example files of this chapter can be found in
-the directory [``arithm`` ./arithm].
-
-
-===The simplest way===
-
-The simplest way is to ``open`` a top-level ``Lang`` module
-and a ``Paradigms`` module:
-```
- abstract Foo = ...
-
- concrete FooEng = open LangEng, ParadigmsEng in ...
- concrete FooSwe = open LangSwe, ParadigmsSwe in ...
-```
-Here is an example.
-```
-abstract Arithm = {
- cat
- Prop ;
- Nat ;
- fun
- Zero : Nat ;
- Succ : Nat -> Nat ;
- Even : Nat -> Prop ;
- And : Prop -> Prop -> Prop ;
-}
-
---# -path=.:alltenses:prelude
-
-concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
- lincat
- Prop = S ;
- Nat = NP ;
- lin
- Zero =
- UsePN (regPN "zero" nonhuman) ;
- Succ n =
- DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
- Even n =
- UseCl TPres ASimul PPos
- (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
- And x y =
- ConjS and_Conj (BaseS x y) ;
-
-}
-
---# -path=.:alltenses:prelude
-
-concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
- lincat
- Prop = S ;
- Nat = NP ;
- lin
- Zero =
- UsePN (regPN "noll" neutrum) ;
- Succ n =
- DetCN (DetSg (SgQuant DefArt) NoOrd)
- (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
- (mkPreposition "till")) n) ;
- Even n =
- UseCl TPres ASimul PPos
- (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
- And x y =
- ConjS and_Conj (BaseS x y) ;
-}
-```
-
-
-===How to find resource functions===
-
-The definitions in this example were found by parsing:
-```
- > i LangEng.gf
-
- -- for Successor:
- > p -cat=NP -mcfg -parser=topdown "the mother of Paris"
-
- -- for Even:
- > p -cat=S -mcfg -parser=topdown "Paris is old"
-
- -- for And:
- > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
-```
-The use of parsing can be systematized by **example-based grammar writing**,
-to which we will return later.
-
-
-===A functor implementation===
-
-The interesting thing now is that the
-code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
-some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
-"jämn" vs. "even"). How can we exploit the similarities and
-actually share code between the languages?
-
-The solution is to use a functor: an ``incomplete`` module that opens
-an ``abstract`` as an ``interface``, and then instantiate it to different
-languages that implement the interface. The structure is as follows:
-```
- abstract Foo ...
-
- incomplete concrete FooI = open Lang, Lex in ...
-
- concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
- concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
-```
-where ``Lex`` is an abstract lexicon that includes the vocabulary
-specific to this application:
+ fun Succ : Nat -> Nat ;
+ data Nat = Succ ;
```
- abstract Lex = Cat ** ...
- concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
- concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
-```
-Here, again, a complete example (``abstract Arithm`` is as above):
-```
-incomplete concrete ArithmI of Arithm = open Lang, Lex in {
- lincat
- Prop = S ;
- Nat = NP ;
- lin
- Zero =
- UsePN zero_PN ;
- Succ n =
- DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
- Even n =
- UseCl TPres ASimul PPos
- (PredVP n (UseComp (CompAP (PositA even_A)))) ;
- And x y =
- ConjS and_Conj (BaseS x y) ;
-}
---# -path=.:alltenses:prelude
-concrete ArithmEng of Arithm = ArithmI with
- (Lang = LangEng),
- (Lex = LexEng) ;
-
---# -path=.:alltenses:prelude
-concrete ArithmSwe of Arithm = ArithmI with
- (Lang = LangSwe),
- (Lex = LexSwe) ;
-
-abstract Lex = Cat ** {
- fun
- zero_PN : PN ;
- successor_N2 : N2 ;
- even_A : A ;
-}
-
-concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
- lin
- zero_PN = regPN "noll" neutrum ;
- successor_N2 =
- mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
- even_A = regA "jämn" ;
-}
-```
+===Case study: representing anaphoric reference TODO===
-
-==Transfer modules==
+==Transfer modules TODO==
Transfer means noncompositional tree-transforming operations.
The command ``apply_transfer = at`` is typically used in a pipe:
@@ -2880,7 +2853,7 @@ See the
for more information.
-==Practical issues==
+==Practical issues TODO==
===Lexers and unlexers===
@@ -2913,7 +2886,6 @@ Given by ``help -lexer``, ``help -unlexer``:
-unlexer=codelit like code, but remove string literal quotes
-unlexer=concat remove all spaces
-unlexer=bind like identity, but bind at "&+"
-
```
@@ -2946,8 +2918,6 @@ Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
-
-
===Multilingual syntax editor===
The
@@ -2956,7 +2926,7 @@ describes the use of the editor, which works for any multilingual GF grammar.
Here is a snapshot of the editor:
-[../quick-editor.gif]
+[../quick-editor.png]
The grammars of the snapshot are from the
[Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter].
@@ -2999,7 +2969,7 @@ A summary is given in the following chart of GF grammar compiler phases:
[../gf-compiler.png]
-==Case studies==
+==Larger case studies TODO==
===Interfacing formal and natural languages===
@@ -3011,3 +2981,8 @@ English and German.
A simpler example will be explained here.
+
+===A multimodal dialogue system===
+
+See TALK project deliverables, [TALK homepage http://www.talk-project.org]
+