From 032531c6a690edbb377ff11ee2a743a30c5bf500 Mon Sep 17 00:00:00 2001 From: aarne Date: Fri, 27 Jun 2008 11:27:40 +0000 Subject: rm old tutorials --- doc/tutorial/gf-tutorial2_1.html | 3504 -------------------------------------- 1 file changed, 3504 deletions(-) delete mode 100644 doc/tutorial/gf-tutorial2_1.html (limited to 'doc/tutorial/gf-tutorial2_1.html') diff --git a/doc/tutorial/gf-tutorial2_1.html b/doc/tutorial/gf-tutorial2_1.html deleted file mode 100644 index 5576428b5..000000000 --- a/doc/tutorial/gf-tutorial2_1.html +++ /dev/null @@ -1,3504 +0,0 @@ - - - - - -Grammatical Framework Tutorial - -

Grammatical Framework Tutorial

- -Author: Aarne Ranta aarne (at) cs.chalmers.se
-Last update: Wed May 30 21:26:11 2007 -
- -

-
-

- - -

-
-

-

- -

- -

Introduction

- -

GF = Grammatical Framework

-

-The term GF is used for different things: -

- - -

-This tutorial is primarily about the GF program and -the GF programming language. -It will guide you -

- - - -

What are GF grammars used for

-

-A grammar is a definition of a language. -From this definition, different language processing components -can be derived: -

- - -

-A GF grammar can be seen as a declarative program from which these -processing tasks can be automatically derived. In addition, many -other tasks are readily available for GF grammars: -

- - -

-A typical GF application is based on a multilingual grammar involving -translation on a special domain. Existing applications of this idea include -

- - -

-The specialization of a grammar to a domain makes it possible to -obtain much better translations than in an unlimited machine translation -system. This is due to the well-defined semantics of such domains. -Grammars having this character are called application grammars. -They are different from most grammars written by linguists just -because they are multilingual and domain-specific. -

-

-However, there is another kind of grammars, which we call resource grammars. -These are large, comprehensive grammars that can be used on any domain. -The GF Resource Grammar Library has resource grammars for 10 languages. -These grammars can be used as libraries to define application grammars. -In this way, it is possible to write a high-quality grammar without -knowing about linguistics: in general, to write an application grammar -by using the resource library just requires practical knowledge of -the target language. and all theoretical knowledge about its grammar -is given by the libraries. -

- -

Who is this tutorial for

-

-This tutorial is mainly for programmers who want to learn to write -application grammars. It will go through GF's programming concepts -without entering too deep into linguistics. Thus it should -be accessible to anyone who has some previous programming experience. -

-

-A separate document is being written on how to write resource grammars. -This includes the ways in which linguistic problems posed by different -languages are solved in GF. -

- -

The coverage of the tutorial

-

-The tutorial gives a hands-on introduction to grammar writing. -We start by building a small grammar for the domain of food: -in this grammar, you can say things like -

-
-    this Italian cheese is delicious
-
-

-in English and Italian. -

-

-The first English grammar -food.cf -is written in a context-free -notation (also known as BNF). The BNF format is often a good -starting point for GF grammar development, because it is -simple and widely used. However, the BNF format is not -good for multilingual grammars. While it is possible to -"translate" by just changing the words contained in a -BNF grammar to words of some other -language, proper translation usually involves more. -For instance, the order of words may have to be changed: -

-
-    Italian cheese ===> formaggio italiano
-
-

-The full GF grammar format is designed to support such -changes, by separating between the abstract syntax -(the logical structure) and the concrete syntax (the -sequence of words) of expressions. -

-

-There is more than words and word order that makes languages -different. Words can have different forms, and which forms -they have vary from language to language. For instance, -Italian adjectives usually have four forms where English -has just one: -

-
-    delicious (wine, wines, pizza, pizzas)
-    vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
-
-

-The morphology of a language describes the -forms of its words. While the complete description of morphology -belongs to resource grammars, this tutorial will explain the -programming concepts involved in morphology. This will moreover -make it possible to grow the fragment covered by the food example. -The tutorial will in fact build a miniature resource grammar in order -to illustrate the module structure of library-based application -grammar writing. -

-

-Thus it is by elaborating the initial food.cf example that -the tutorial makes a guided tour through all concepts of GF. -While the constructs of the GF language are the main focus, -also the commands of the GF system are introduced as they -are needed. -

-

-To learn how to write GF grammars is not the only goal of -this tutorial. To learn the commands of the GF system means -that simple applications of grammars, such as translation and -quiz systems, can be built simply by writing scripts for the -system. More complicated applications, such as natural-language -interfaces and dialogue systems, also require programming in -some general-purpose language. We will briefly explain how -GF grammars are used as components of Haskell, Java, Javascript, -and Prolog grammars. The tutorial concludes with a couple of -case studies showing how such complete systems can be built. -

- -

Getting the GF program

-

-The GF program is open-source free software, which you can download via the -GF Homepage: -http://www.cs.chalmers.se/~aarne/GF -

-

-There you can download -

- - -

-If you want to compile GF from source, you need Haskell and Java -compilers. But normally you don't have to compile, and you definitely -don't need to know Haskell or Java to use GF. -

-

-To start the GF program, assuming you have installed it, just type -

-
-    % gf
-
-

-in the shell. You will see GF's welcome message and the prompt >. -The command -

-
-    > help
-
-

-will give you a list of available commands. -

-

-As a common convention in this Tutorial, we will use -

- - -

-Thus you should not type these prompts, but only the lines that -follow them. -

- -

The .cf grammar format

-

-Now you are ready to try out your first grammar. -We start with one that is not written in the GF language, but -in the much more common BNF notation (Backus Naur Form). The GF -program understands a variant of this notation and translates it -internally to GF's own representation. -

-

-To get started, type (or copy) the following lines into a file named -food.cf: -

-
-  Is.        S       ::= Item "is" Quality ;
-  That.      Item    ::= "that" Kind ;
-  This.      Item    ::= "this" Kind ;
-  QKind.     Kind    ::= Quality Kind ;
-  Cheese.    Kind    ::= "cheese" ;
-  Fish.      Kind    ::= "fish" ;
-  Wine.      Kind    ::= "wine" ;
-  Italian.   Quality ::= "Italian" ;
-  Boring.    Quality ::= "boring" ;
-  Delicious. Quality ::= "delicious" ;
-  Expensive. Quality ::= "expensive" ;
-  Fresh.     Quality ::= "fresh" ;
-  Very.      Quality ::= "very" Quality ;
-  Warm.      Quality ::= "warm" ;
-
-

-For those who know ordinary BNF, the -notation we use includes one extra element: a label appearing -as the first element of each rule and terminated by a full stop. -

-

-The grammar we wrote defines a set of phrases usable for speaking about food. -It builds sentences (S) by assigning Qualitys to -Items. Items are build from Kinds by prepending the -word "this" or "that". Kinds are either atomic, such as -"cheese" and "wine", or formed by prepending a Quality to a -Kind. A Quality is either atomic, such as "Italian" and "boring", -or built by another Quality by prepending "very". Those familiar with -the context-free grammar notation will notice that, for instance, the -following sentence can be built using this grammar: -

-
-    this delicious Italian wine is very very expensive
-
-

- -

Importing grammars and parsing strings

-

-The first GF command needed when using a grammar is to import it. -The command has a long name, import, and a short name, i. -You can type either -

-
-    > import food.cf
-
-

-or -

-
-    > i food.cf
-
-

-to get the same effect. -The effect is that the GF program compiles your grammar into an internal -representation, and shows a new prompt when it is ready. -

-

-You can now use GF for parsing: -

-
-    > parse "this cheese is delicious"
-    Is (This Cheese) Delicious
-  
-    > p "that wine is very very Italian"
-    Is (That Wine) (Very (Very Italian))
-
-

-The parse (= p) command takes a string -(in double quotes) and returns an abstract syntax tree - the thing -beginning with Is. Trees are built from the rule labels given in the -grammar, and record the ways in which the rules are used to produce the -strings. A tree is, in general, something easier than a string -for a machine to understand and to process further. -

-

-Strings that return a tree when parsed do so in virtue of the grammar -you imported. Try parsing something else, and you fail -

-
-    > p "hello world"
-    No success in cf parsing hello world
-    no tree found
-
-

- -

Generating trees and strings

-

-You can also use GF for linearizing -(linearize = l). This is the inverse of -parsing, taking trees into strings: -

-
-    > linearize Is (That Wine) Warm
-    that wine is warm
-
-

-What is the use of this? Typically not that you type in a tree at -the GF prompt. The utility of linearization comes from the fact that -you can obtain a tree from somewhere else. One way to do so is -random generation (generate_random = gr): -

-
-    > generate_random
-    Is (This (QKind Italian Fish)) Fresh
-
-

-Now you can copy the tree and paste it to the linearize command. -Or, more conveniently, feed random generation into linearization by using -a pipe. -

-
-    > gr | l
-    this Italian fish is fresh
-
-

- -

Visualizing trees

-

-The gibberish code with parentheses returned by the parser does not -look like trees. Why is it called so? From the abstract mathematical -point of view, trees are a data structure that -represents nesting: trees are branching entities, and the branches -are themselves trees. Parentheses give a linear representation of trees, -useful for the computer. But the human eye may prefer to see a visualization; -for this purpose, GF provides the command visualizre_tree = vt, to which -parsing (and any other tree-producing command) can be piped: -

-
-    parse "this delicious cheese is very Italian" | vt
-
-

-

- -

- -

Some random-generated sentences

-

-Random generation is a good way to test a grammar; it can also -be quite amusing. So you may want to -generate ten strings with one and the same command: -

-
-    > gr -number=10 | l
-    that wine is boring
-    that fresh cheese is fresh
-    that cheese is very boring
-    this cheese is Italian
-    that expensive cheese is expensive
-    that fish is fresh
-    that wine is very Italian
-    this wine is Italian
-    this cheese is boring
-    this fish is boring
-
-

- -

Systematic generation

-

-To generate all sentence that a grammar -can generate, use the command generate_trees = gt. -

-
-    > generate_trees | l
-    that cheese is very Italian
-    that cheese is very boring
-    that cheese is very delicious
-    that cheese is very expensive
-    that cheese is very fresh
-    ...
-    this wine is expensive
-    this wine is fresh
-    this wine is warm
-  
-
-

-You get quite a few trees but not all of them: only up to a given -depth of trees. To see how you can get more, use the -help = h command, -

-
-    help gt
-
-

-Quiz. If the command gt generated all -trees in your grammar, it would never terminate. Why? -

- -

More on pipes; tracing

-

-A pipe of GF commands can have any length, but the "output type" -(either string or tree) of one command must always match the "input type" -of the next command. -

-

-The intermediate results in a pipe can be observed by putting the -tracing flag -tr to each command whose output you -want to see: -

-
-    > gr -tr | l -tr | p
-  
-    Is (This Cheese) Boring
-    this cheese is boring
-    Is (This Cheese) Boring  
-
-

-This facility is good for test purposes: for instance, you -may want to see if a grammar is ambiguous, i.e. -contains strings that can be parsed in more than one way. -

- -

Writing and reading files

-

-To save the outputs of GF commands into a file, you can -pipe it to the write_file = wf command, -

-
-    > gr -number=10 | l | write_file exx.tmp
-
-

-You can read the file back to GF with the -read_file = rf command, -

-
-    > read_file exx.tmp | p -lines
-
-

-Notice the flag -lines given to the parsing -command. This flag tells GF to parse each line of -the file separately. Without the flag, the grammar could -not recognize the string in the file, because it is not -a sentence but a sequence of ten sentences. -

- -

The .gf grammar format

-

-To see GF's internal representation of a grammar -that you have imported, you can give the command -print_grammar = pg, -

-
-    > print_grammar
-
-

-The output is quite unreadable at this stage, and you may feel happy that -you did not need to write the grammar in that notation, but that the -GF grammar compiler produced it. -

-

-However, we will now start the demonstration -how GF's own notation gives you -much more expressive power than the .cf -format. We will introduce the .gf format by presenting -another way of defining the same grammar as in -food.cf. -Then we will show how the full GF grammar format enables you -to do things that are not possible in the context-free format. -

- -

Abstract and concrete syntax

-

-A GF grammar consists of two main parts: -

- - -

-The context-free format fuses these two things together, but it is always -possible to take them apart. For instance, the sentence formation rule -

-
-    Is. S ::= Item "is" Quality ;
-
-

-is interpreted as the following pair of GF rules: -

-
-    fun Is : Item -> Quality -> S ;
-    lin Is item quality = {s = item.s ++ "is" ++ quality.s} ;
-
-

-The former rule, with the keyword fun, belongs to the abstract syntax. -It defines the function -Is which constructs syntax trees of form -(Is item quality). -

-

-The latter rule, with the keyword lin, belongs to the concrete syntax. -It defines the linearization function for -syntax trees of form (Is item quality). -

- -

Judgement forms

-

-Rules in a GF grammar are called judgements, and the keywords -fun and lin are used for distinguishing between two -judgement forms. Here is a summary of the most important -judgement forms: -

- - - - - - - - - - - - - - - -
formreading
cat CC is a category
fun f : Af is a function of type A
- - - - - - - - - - - - - - - - -
formreading
lincat C = Tcategory C has linearization type T
lin f = tfunction f has linearization t
- -

-We return to the precise meanings of these judgement forms later. -First we will look at how judgements are grouped into modules, and -show how the food grammar is -expressed by using modules and judgements. -

- -

Module types

-

-A GF grammar consists of modules, -into which judgements are grouped. The most important -module forms are -

- - - -

Records and strings

-

-The linearization type of a category is a record type, with -zero of more fields of different types. The simplest record -type used for linearization in GF is -

-
-    {s : Str}
-
-

-which has one field, with label s and type Str. -

-

-Examples of records of this type are -

-
-    {s = "foo"}
-    {s = "hello" ++ "world"}
-
-

-

-Whenever a record r of type {s : Str} is given, -r.s is an object of type Str. This is -a special case of the projection rule, allowing the extraction -of fields from a record: -

- - -

-The type Str is really the type of token lists, but -most of the time one can conveniently think of it as the type of strings, -denoted by string literals in double quotes. -

-

-Notice that -

-
-   "hello world"
-
-

-is not recommended as an expression of type Str. It denotes -a token with a space in it, and will usually -not work with the lexical analysis that precedes parsing. A shorthand -exemplified by -

-
-   ["hello world and people"]  === "hello" ++ "world" ++ "and" ++ "people"
-
-

-can be used for lists of tokens. The expression -

-
-   []
-
-

-denotes the empty token list. -

- -

An abstract syntax example

-

-To express the abstract syntax of food.cf in -a file Food.gf, we write two kinds of judgements: -

- - -
-    abstract Food = {
-  
-    cat
-      S ; Item ; Kind ; Quality ;
-  
-    fun
-      Is : Item -> Quality -> S ;
-      This, That : Kind -> Item ;
-      QKind : Quality -> Kind -> Kind ;
-      Wine, Cheese, Fish : Kind ;
-      Very : Quality -> Quality ;
-      Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
-    }
-
-

-Notice the use of shorthands permitting the sharing of -the keyword in subsequent judgements, -

-
-    cat S ; Item ;   ===   cat S ; cat Item ; 
-
-

-and of the type in subsequent fun judgements, -

-
-    fun Wine, Fish : Kind ;            ===
-    fun Wine : Kind ; Fish : Kind ;    ===
-    fun Wine : Kind ; fun Fish : Kind ;
-
-

-The order of judgements in a module is free. -

- -

A concrete syntax example

-

-Each category introduced in Food.gf is -given a lincat rule, and each -function is given a lin rule. Similar shorthands -apply as in abstract modules. -

-
-    concrete FoodEng of Food = {
-  
-    lincat
-      S, Item, Kind, Quality = {s : Str} ;
-  
-    lin
-      Is item quality = {s = item.s ++ "is" ++ quality.s} ;
-      This kind = {s = "this" ++ kind.s} ;
-      That kind = {s = "that" ++ kind.s} ;
-      QKind quality kind = {s = quality.s ++ kind.s} ;
-      Wine = {s = "wine"} ;
-      Cheese = {s = "cheese"} ;
-      Fish = {s = "fish"} ;
-      Very quality = {s = "very" ++ quality.s} ;
-      Fresh = {s = "fresh"} ;
-      Warm = {s = "warm"} ;
-      Italian = {s = "Italian"} ;
-      Expensive = {s = "expensive"} ;
-      Delicious = {s = "delicious"} ;
-      Boring = {s = "boring"} ;
-    }
-
-

- -

Modules and files

-

-Source files: Module name + .gf = file name -

-

-Target files: each module is compiled into a .gfc file. -

-

-Import FoodEng.gf and see what happens -

-
-    > i FoodEng.gf
-
-

-The GF program does not only read the file -FoodEng.gf, but also all other files that it -depends on - in this case, Food.gf. -

-

-For each file that is compiled, a .gfc file -is generated. The GFC format (="GF Canonical") is the -"machine code" of GF, which is faster to process than -GF source files. When reading a module, GF decides whether -to use an existing .gfc file or to generate -a new one, by looking at modification times. -

- -

Multilingual grammars and translation

-

-The main advantage of separating abstract from concrete syntax is that -one abstract syntax can be equipped with many concrete syntaxes. -A system with this property is called a multilingual grammar. -

-

-Multilingual grammars can be used for applications such as -translation. Let us build an Italian concrete syntax for -Food and then test the resulting -multilingual grammar. -

- -

An Italian concrete syntax

-
-  concrete FoodIta of Food = {
-  
-    lincat
-      S, Item, Kind, Quality = {s : Str} ;
-  
-    lin
-      Is item quality = {s = item.s ++ "è" ++ quality.s} ;
-      This kind = {s = "questo" ++ kind.s} ;
-      That kind = {s = "quello" ++ kind.s} ;
-      QKind quality kind = {s = kind.s ++ quality.s} ;
-      Wine = {s = "vino"} ;
-      Cheese = {s = "formaggio"} ;
-      Fish = {s = "pesce"} ;
-      Very quality = {s = "molto" ++ quality.s} ;
-      Fresh = {s = "fresco"} ;
-      Warm = {s = "caldo"} ;
-      Italian = {s = "italiano"} ;
-      Expensive = {s = "caro"} ;
-      Delicious = {s = "delizioso"} ;
-      Boring = {s = "noioso"} ;
-  
-  }
-  
-
-

- -

Using a multilingual grammar

-

-Import the two grammars in the same GF session. -

-
-    > i FoodEng.gf
-    > i FoodIta.gf
-
-

-Try generation now: -

-
-    > gr | l
-    quello formaggio molto noioso è italiano
-  
-    > gr | l -lang=FoodEng
-    this fish is warm
-
-

-Translate by using a pipe: -

-
-    > p -lang=FoodEng "this cheese is very delicious" | l -lang=FoodIta
-    questo formaggio è molto delizioso
-
-

-The lang flag tells GF which concrete syntax to use in parsing and -linearization. By default, the flag is set to the last-imported grammar. -To see what grammars are in scope and which is the main one, use the command -print_options = po: -

-
-    > print_options
-    main abstract :     Food
-    main concrete :     FoodIta
-    actual concretes :  FoodIta FoodEng
-
-

- -

Translation session

-

-If translation is what you want to do with a set of grammars, a convenient -way to do it is to open a translation_session = ts. In this session, -you can translate between all the languages that are in scope. -A dot . terminates the translation session. -

-
-    > ts
-  
-    trans> that very warm cheese is boring
-    quello formaggio molto caldo è noioso
-    that very warm cheese is boring
-  
-    trans> questo vino molto italiano è molto delizioso
-    questo vino molto italiano è molto delizioso
-    this very Italian wine is very delicious
-  
-    trans> .
-    >
-
-

- -

Translation quiz

-

-This is a simple language exercise that can be automatically -generated from a multilingual grammar. The system generates a set of -random sentences, displays them in one language, and checks the user's -answer given in another language. The command translation_quiz = tq -makes this in a subshell of GF. -

-
-    > translation_quiz FoodEng FoodIta
-  
-    Welcome to GF Translation Quiz.
-    The quiz is over when you have done at least 10 examples
-    with at least 75 % success.
-    You can interrupt the quiz by entering a line consisting of a dot ('.').
-  
-    this fish is warm
-    questo pesce è caldo
-    > Yes.
-    Score 1/1
-  
-    this cheese is Italian
-    questo formaggio è noioso
-    > No, not questo formaggio è noioso, but
-    questo formaggio è italiano
-  
-    Score 1/2
-    this fish is expensive
-
-

-You can also generate a list of translation exercises and save it in a -file for later use, by the command translation_list = tl -

-
-    > translation_list -number=25 FoodEng FoodIta
-
-

-The number flag gives the number of sentences generated. -

- -

Grammar architecture

- -

Extending a grammar

-

-The module system of GF makes it possible to extend a -grammar in different ways. The syntax of extension is -shown by the following example. We extend Food by -adding a category of questions and two new functions. -

-
-    abstract Morefood = Food ** {
-      cat
-        Question ;
-      fun
-        QIs : Item -> Quality -> Question ;
-        Pizza : Kind ;
-        
-    }
-
-

-Parallel to the abstract syntax, extensions can -be built for concrete syntaxes: -

-
-    concrete MorefoodEng of Morefood = FoodEng ** {
-      lincat
-        Question = {s : Str} ;
-      lin
-        QIs item quality = {s = "is" ++ item.s ++ quality.s} ;
-        Pizza = {s = "pizza"} ;
-    }
-
-

-The effect of extension is that all of the contents of the extended -and extending module are put together. -

- -

Multiple inheritance

-

-Specialized vocabularies can be represented as small grammars that -only do "one thing" each. For instance, the following are grammars -for fruit and mushrooms -

-
-    abstract Fruit = {
-      cat Fruit ;
-      fun Apple, Peach : Fruit ;
-    }
-  
-    abstract Mushroom = {
-      cat Mushroom ;
-      fun Cep, Agaric : Mushroom ;
-    }
-
-

-They can afterwards be combined into bigger grammars by using -multiple inheritance, i.e. extension of several grammars at the -same time: -

-
-    abstract Foodmarket = Food, Fruit, Mushroom ** {
-      fun 
-        FruitKind    : Fruit    -> Kind ;
-        MushroomKind : Mushroom -> Kind ;
-      }
-
-

-At this point, you would perhaps like to go back to -Food and take apart Wine to build a special -Drink module. -

- -

Visualizing module structure

-

-When you have created all the abstract syntaxes and -one set of concrete syntaxes needed for Foodmarket, -your grammar consists of eight GF modules. To see how their -dependences look like, you can use the command -visualize_graph = vg, -

-
-    > visualize_graph
-
-

-and the graph will pop up in a separate window. -

-

-The graph uses -

- - -

- -

- -

System commands

-

-To document your grammar, you may want to print the -graph into a file, e.g. a .png file that -can be included in an HTML document. You can do this -by first printing the graph into a file .dot and then -processing this file with the dot program. -

-
-    > pm -printer=graph | wf Foodmarket.dot
-    > ! dot -Tpng Foodmarket.dot > Foodmarket.png
-
-

-The latter command is a Unix command, issued from GF by using the -shell escape symbol !. The resulting graph was shown in the previous section. -

-

-The command print_multi = pm is used for printing the current multilingual -grammar in various formats, of which the format -printer=graph just -shows the module dependencies. Use help to see what other formats -are available: -

-
-    > help pm
-    > help -printer
-
-

- -

Resource modules

- -

The golden rule of functional programming

-

-In comparison to the .cf format, the .gf format looks rather -verbose, and demands lots more characters to be written. You have probably -done this by the copy-paste-modify method, which is a common way to -avoid repeating work. -

-

-However, there is a more elegant way to avoid repeating work than the copy-and-paste -method. The golden rule of functional programming says that -

- - -

-A function separates the shared parts of different computations from the -changing parts, parameters. In functional programming languages, such as -Haskell, it is possible to share much more than in -languages such as C and Java. -

- -

Operation definitions

-

-GF is a functional programming language, not only in the sense that -the abstract syntax is a system of functions (fun), but also because -functional programming can be used to define concrete syntax. This is -done by using a new form of judgement, with the keyword oper (for -operation), distinct from fun for the sake of clarity. -Here is a simple example of an operation: -

-
-    oper ss : Str -> {s : Str} = \x -> {s = x} ;
-
-

-The operation can be applied to an argument, and GF will -compute the application into a value. For instance, -

-
-    ss "boy"  --->  {s = "boy"}
-
-

-(We use the symbol ---> to indicate how an expression is -computed into a value; this symbol is not a part of GF) -

-

-Thus an oper judgement includes the name of the defined operation, -its type, and an expression defining it. As for the syntax of the defining -expression, notice the lambda abstraction form \x -> t of -the function. -

- -

The ``resource`` module type

-

-Operator definitions can be included in a concrete syntax. -But they are not really tied to a particular set of linearization rules. -They should rather be seen as resources -usable in many concrete syntaxes. -

-

-The resource module type can be used to package -oper definitions into reusable resources. Here is -an example, with a handful of operations to manipulate -strings and records. -

-
-    resource StringOper = {
-      oper
-        SS : Type = {s : Str} ;
-        ss : Str -> SS = \x -> {s = x} ;
-        cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
-        prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
-    }
-
-

-Resource modules can extend other resource modules, in the -same way as modules of other types can extend modules of the -same type. Thus it is possible to build resource hierarchies. -

- -

Opening a ``resource``

-

-Any number of resource modules can be -opened in a concrete syntax, which -makes definitions contained -in the resource usable in the concrete syntax. Here is -an example, where the resource StringOper is -opened in a new version of FoodEng. -

-
-    concrete Food2Eng of Food = open StringOper in {
-  
-    lincat
-      S, Item, Kind, Quality = SS ;
-  
-    lin
-      Is item quality = cc item (prefix "is" quality) ;
-      This = prefix "this" ;
-      That = prefix "that" ;
-      QKind = cc ;
-      Wine = ss "wine" ;
-      Cheese = ss "cheese" ;
-      Fish = ss "fish" ;
-      Very = prefix "very" ;
-      Fresh = ss "fresh" ;
-      Warm = ss "warm" ;
-      Italian = ss "Italian" ;
-      Expensive = ss "expensive" ;
-      Delicious = ss "delicious" ;
-      Boring = ss "boring" ;
-  
-    }
-
-

-The same string operations could be used to write FoodIta -more concisely. -

- -

Division of labour

-

-Using operations defined in resource modules is a -way to avoid repetitive code. -In addition, it enables a new kind of modularity -and division of labour in grammar writing: grammarians familiar with -the linguistic details of a language can make this knowledge -available through resource grammar modules, whose users only need -to pick the right operations and not to know their implementation -details. -

- -

Morphology

-

-Suppose we want to say, with the vocabulary included in -Food.gf, things like -

-
-    all Italian wines are delicious
-
-

-The new grammatical facility we need are the plural forms -of nouns and verbs (wines, are), as opposed to their -singular forms. -

-

-The introduction of plural forms requires two things: -

- - -

-Different languages have different rules of inflection and agreement. -For instance, Italian has also agreement in gender (masculine vs. feminine). -We want to express such special features of languages in the -concrete syntax while ignoring them in the abstract syntax. -

-

-To be able to do all this, we need one new judgement form -and many new expression forms. -We also need to generalize linearization types -from strings to more complex types. -

- -

Parameters and tables

-

-We define the parameter type of number in Englisn by -using a new form of judgement: -

-
-    param Number = Sg | Pl ;
-
-

-To express that Kind expressions in English have a linearization -depending on number, we replace the linearization type {s : Str} -with a type where the s field is a table depending on number: -

-
-    lincat Kind = {s : Number => Str} ;
-
-

-The table type Number => Str is in many respects similar to -a function type (Number -> Str). The main difference is that the -argument type of a table type must always be a parameter type. This means -that the argument-value pairs can be listed in a finite table. The following -example shows such a table: -

-
-    lin Cheese = {s = table {
-      Sg => "cheese" ;
-      Pl => "cheeses"
-      }
-    } ;
-
-

-The table consists of branches, where a pattern on the -left of the arrow => is assigned a value on the right. -

-

-The application of a table to a parameter is done by the selection -operator !. For instance, -

-
-     table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
-
-

-is a selection that computes into the value "cheeses". -This computation is performed by pattern matching: return -the value from the first branch whose pattern matches the -selection argument. -

- -

Inflection tables, paradigms, and ``oper`` definitions

-

-All English common nouns are inflected in number, most of them in the -same way: the plural form is obtained from the singular by adding the -ending s. This rule is an example of -a paradigm - a formula telling how the inflection -forms of a word are formed. -

-

-From the GF point of view, a paradigm is a function that takes a lemma - -also known as a dictionary form - and returns an inflection -table of desired type. Paradigms are not functions in the sense of the -fun judgements of abstract syntax (which operate on trees and not -on strings), but operations defined in oper judgements. -The following operation defines the regular noun paradigm of English: -

-
-    oper regNoun : Str -> {s : Number => Str} = \x -> {
-      s = table {
-        Sg => x ;
-        Pl => x + "s"
-        }
-      } ;
-
-

-The gluing operator + tells that -the string held in the variable x and the ending "s" -are written together to form one token. Thus, for instance, -

-
-    (regNoun "cheese").s ! Pl  ---> "cheese" + "s"  --->  "cheeses"
-
-

- -

Worst-case functions and data abstraction

-

-Some English nouns, such as mouse, are so irregular that -it makes no sense to see them as instances of a paradigm. Even -then, it is useful to perform data abstraction from the -definition of the type Noun, and introduce a constructor -operation, a worst-case function for nouns: -

-
-    oper mkNoun : Str -> Str -> Noun = \x,y -> {
-      s = table {
-        Sg => x ;
-        Pl => y
-        }
-      } ;
-
-

-Thus we could define -

-
-    lin Mouse = mkNoun "mouse" "mice" ;
-
-

-and -

-
-    oper regNoun : Str -> Noun = \x -> 
-      mkNoun x (x + "s") ;
-
-

-instead of writing the inflection table explicitly. -

-

-The grammar engineering advantage of worst-case functions is that -the author of the resource module may change the definitions of -Noun and mkNoun, and still retain the -interface (i.e. the system of type signatures) that makes it -correct to use these functions in concrete modules. In programming -terms, Noun is then treated as an abstract datatype. -

- -

A system of paradigms using Prelude operations

-

-In addition to the completely regular noun paradigm regNoun, -some other frequent noun paradigms deserve to be -defined, for instance, -

-
-    sNoun : Str -> Noun = \kiss  -> mkNoun kiss  (kiss  + "es") ;
-
-

-What about nouns like fly, with the plural flies? The already -available solution is to use the longest common prefix -fl (also known as the technical stem) as argument, and define -

-
-    yNoun : Str -> Noun = \fl -> mkNoun (fl  + "y") (fl  + "ies") ;
-
-

-But this paradigm would be very unintuitive to use, because the technical stem -is not an existing form of the word. A better solution is to use -the lemma and a string operator init, which returns the initial segment (i.e. -all characters but the last) of a string: -

-
-    yNoun : Str -> Noun = \fly -> mkNoun fly (init fly  + "ies") ;  
-
-

-The operation init belongs to a set of operations in the -resource module Prelude, which therefore has to be -opened so that init can be used. -

- -

An intelligent noun paradigm using ``case`` expressions

-

-It may be hard for the user of a resource morphology to pick the right -inflection paradigm. A way to help this is to define a more intelligent -paradigm, which chooses the ending by first analysing the lemma. -The following variant for English regular nouns puts together all the -previously shown paradigms, and chooses one of them on the basis of -the final letter of the lemma (found by the prelude operator last). -

-
-    regNoun : Str -> Noun = \s -> case last s of {
-      "s" | "z" => mkNoun s (s + "es") ;
-      "y"       => mkNoun s (init s + "ies") ;
-      _         => mkNoun s (s + "s")
-      } ;
-
-

-This definition displays many GF expression forms not shown befores; -these forms are explained in the next section. -

-

-The paradigms regNoun does not give the correct forms for -all nouns. For instance, mouse - mice and -fish - fish must be given by using mkNoun. -Also the word boy would be inflected incorrectly; to prevent -this, either use mkNoun or modify -regNoun so that the "y" case does not -apply if the second-last character is a vowel. -

- -

Pattern matching

-

-We have so far built all expressions of the table form -from branches whose patterns are constants introduced in -param definitions, as well as constant strings. -But there are more expressive patterns. Here is a summary of the possible forms: -

- - -

-Pattern matching is performed in the order in which the branches -appear in the table: the branch of the first matching pattern is followed. -

-

-As syntactic sugar, one-branch tables can be written concisely, -

-
-    \\P,...,Q => t  ===  table {P => ... table {Q => t} ...}
-
-

-Finally, the case expressions common in functional -programming languages are syntactic sugar for table selections: -

-
-    case e of {...} ===  table {...} ! e
-
-

- -

Morphological resource modules

-

-A common idiom is to -gather the oper and param definitions -needed for inflecting words in -a language into a morphology module. Here is a simple -example, MorphoEng. -

-
-    --# -path=.:prelude
-  
-    resource MorphoEng = open Prelude in {
-  
-      param
-        Number = Sg | Pl ;
-  
-      oper
-        Noun, Verb : Type = {s : Number => Str} ;
-  
-        mkNoun : Str -> Str -> Noun = \x,y -> {
-          s = table {
-            Sg => x ;
-            Pl => y
-            }
-          } ;
-  
-        regNoun : Str -> Noun = \s -> case last s of {
-          "s" | "z" => mkNoun s (s + "es") ;
-          "y"       => mkNoun s (init s + "ies") ;
-          _         => mkNoun s (s + "s")
-          } ;
-  
-        mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ;
-  
-        regVerb : Str -> Verb = \s -> case last s of {
-          "s" | "z" => mkVerb s (s + "es") ;
-          "y"       => mkVerb s (init s + "ies") ;
-          "o"       => mkVerb s (s + "es") ;
-          _         => mkVerb s (s + "s")
-          } ;
-    }
-
-

-The first line gives as a hint to the compiler the -search path needed to find all the other modules that the -module depends on. The directory prelude is a subdirectory of -GF/lib; to be able to refer to it in this simple way, you can -set the environment variable GF_LIB_PATH to point to this -directory. -

- -

Testing resource modules

-

-To test a resource module independently, you must import it -with the flag -retain, which tells GF to retain oper definitions -in the memory; the usual behaviour is that oper definitions -are just applied to compile linearization rules -(this is called inlining) and then thrown away. -

-
-    > i -retain MorphoEng.gf
-
-

-The command compute_concrete = cc computes any expression -formed by operations and other GF constructs. For example, -

-
-    > cc regVerb "echo"
-    {s : Number => Str = table Number {
-      Sg => "echoes" ;
-      Pl => "echo"
-      }
-    }
-
-

-

-The command show_operations = so` shows the type signatures -of all operations returning a given value type: -

-
-    > so Verb
-    MorphoEng.mkNoun : Str -> Str -> {s : {MorphoEng.Number} => Str}
-    MorphoEng.mkVerb : Str -> Str -> {s : {MorphoEng.Number} => Str}
-    MorphoEng.regNoun : Str -> {s : {MorphoEng.Number} => Str}
-    MorphoEng.regVerb : Str -> { s : {MorphoEng.Number} => Str}
-
-

-Why does the command also show the operations that form -Nouns? The reason is that the type expression -Verb is first computed, and its value happens to be -the same as the value of Noun. -

- -

Using parameters in concrete syntax

-

-We can now enrich the concrete syntax definitions to -comprise morphology. This will involve a more radical -variation between languages (e.g. English and Italian) -then just the use of different words. In general, -parameters and linearization types are different in -different languages - but this does not prevent the -use of a common abstract syntax. -

- -

Parametric vs. inherent features, agreement

-

-The rule of subject-verb agreement in English says that the verb -phrase must be inflected in the number of the subject. This -means that a noun phrase (functioning as a subject), inherently -has a number, which it passes to the verb. The verb does not -have a number, but must be able to receive whatever number the -subject has. This distinction is nicely represented by the -different linearization types of noun phrases and verb phrases: -

-
-    lincat NP = {s : Str ; n : Number} ;
-    lincat VP = {s : Number => Str} ;
-
-

-We say that the number of NP is an inherent feature, -whereas the number of NP is a variable feature (or a -parametric feature). -

-

-The agreement rule itself is expressed in the linearization rule of -the predication function: -

-
-    lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
-
-

-The following section will present -FoodsEng, assuming the abstract syntax Foods -that is similar to Food but also has the -plural determiners These and Those. -The reader is invited to inspect the way in which agreement works in -the formation of sentences. -

- -

English concrete syntax with parameters

-

-The grammar uses both -Prelude and -MorphoEng. -We will later see how to make the grammar even -more high-level by using a resource grammar library -and parametrized modules. -

-
-  --# -path=.:resource:prelude
-  
-  concrete FoodsEng of Foods = open Prelude, MorphoEng in {
-  
-    lincat
-      S, Quality = SS ; 
-      Kind = {s : Number => Str} ; 
-      Item = {s : Str ; n : Number} ; 
-  
-    lin
-      Is item quality = ss (item.s ++ (mkVerb "are" "is").s ! item.n ++ quality.s) ;
-      This  = det Sg "this" ;
-      That  = det Sg "that" ;
-      These = det Pl "these" ;
-      Those = det Pl "those" ;
-      QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ;
-      Wine = regNoun "wine" ;
-      Cheese = regNoun "cheese" ;
-      Fish = mkNoun "fish" "fish" ;
-      Very = prefixSS "very" ;
-      Fresh = ss "fresh" ;
-      Warm = ss "warm" ;
-      Italian = ss "Italian" ;
-      Expensive = ss "expensive" ;
-      Delicious = ss "delicious" ;
-      Boring = ss "boring" ;
-  
-    oper
-      det : Number -> Str -> Noun -> {s : Str ; n : Number} = \n,d,cn -> {
-        s = d ++ cn.s ! n ;
-        n = n
-        } ;
-  
-  }
-
-

- -

Hierarchic parameter types

-

-The reader familiar with a functional programming language such as -Haskell must have noticed the similarity -between parameter types in GF and algebraic datatypes (data definitions -in Haskell). The GF parameter types are actually a special case of algebraic -datatypes: the main restriction is that in GF, these types must be finite. -(It is this restriction that makes it possible to invert linearization rules into -parsing methods.) -

-

-However, finite is not the same thing as enumerated. Even in GF, parameter -constructors can take arguments, provided these arguments are from other -parameter types - only recursion is forbidden. Such parameter types impose a -hierarchic order among parameters. They are often needed to define -the linguistically most accurate parameter systems. -

-

-To give an example, Swedish adjectives -are inflected in number (singular or plural) and -gender (uter or neuter). These parameters would suggest 2*2=4 different -forms. However, the gender distinction is done only in the singular. Therefore, -it would be inaccurate to define adjective paradigms using the type -Gender => Number => Str. The following hierarchic definition -yields an accurate system of three adjectival forms. -

-
-    param AdjForm = ASg Gender | APl ;
-    param Gender  = Utr | Neutr ;
-
-

-Here is an example of pattern matching, the paradigm of regular adjectives. -

-
-    oper regAdj : Str -> AdjForm => Str = \fin -> table {
-      ASg Utr   => fin ;
-      ASg Neutr => fin + "t" ;
-      APl       => fin + "a" ;
-      }
-
-

-A constructor can be used as a pattern that has patterns as arguments. For instance, -the adjectival paradigm in which the two singular forms are the same, -can be defined -

-
-    oper plattAdj : Str -> AdjForm => Str = \platt -> table {
-      ASg _ => platt ;
-      APl   => platt + "a" ;
-      }
-
-

- -

Morphological analysis and morphology quiz

-

-Even though morphology is in GF -mostly used as an auxiliary for syntax, it -can also be useful on its own right. The command morpho_analyse = ma -can be used to read a text and return for each word the analyses that -it has in the current concrete syntax. -

-
-    > rf bible.txt | morpho_analyse
-
-

-In the same way as translation exercises, morphological exercises can -be generated, by the command morpho_quiz = mq. Usually, -the category is set to be something else than S. For instance, -

-
-    > i lib/resource/french/VerbsFre.gf
-    > morpho_quiz -cat=V
-  
-    Welcome to GF Morphology Quiz.
-    ...
-  
-    réapparaître : VFin VCondit  Pl  P2
-    réapparaitriez
-    > No, not réapparaitriez, but
-    réapparaîtriez
-    Score 0/1
-
-

-Finally, a list of morphological exercises can be generated -off-line and saved in a -file for later use, by the command morpho_list = ml -

-
-    > morpho_list -number=25 -cat=V | wf exx.txt
-
-

-The number flag gives the number of exercises generated. -

- -

Discontinuous constituents

-

-A linearization type may contain more strings than one. -An example of where this is useful are English particle -verbs, such as switch off. The linearization of -a sentence may place the object between the verb and the particle: -he switched it off. -

-

-The following judgement defines transitive verbs as -discontinuous constituents, i.e. as having a linearization -type with two strings and not just one. -

-
-    lincat TV = {s : Number => Str ; part : Str} ;
-
-

-This linearization rule -shows how the constituents are separated by the object in complementization. -

-
-    lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;
-
-

-There is no restriction in the number of discontinuous constituents -(or other fields) a lincat may contain. The only condition is that -the fields must be of finite types, i.e. built from records, tables, -parameters, and Str, and not functions. -

-

-A mathematical result -about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. This is -potentially a reason to avoid discontinuous constituents. -Moreover, the parsing and linearization commands only give accurate -results for categories whose linearization type has a unique Str -valued field labelled s. Therefore, discontinuous constituents -are not a good idea in top-level categories accessed by the users -of a grammar application. -

- -

Free variation

-

-Sometimes there are many alternative ways to define a concrete syntax. -For instance, the verb negation in English can be expressed both by -does not and doesn't. In linguistic terms, these expressions -are in free variation. The variants construct of GF can -be used to give a list of strings in free variation. For example, -

-
-    NegVerb verb = {s = variants {["does not"] ; "doesn't} ++ verb.s ! Pl} ;
-
-

-An empty variant list -

-
-    variants {}
-
-

-can be used e.g. if a word lacks a certain form. -

-

-In general, variants should be used cautiously. It is not -recommended for modules aimed to be libraries, because the -user of the library has no way to choose among the variants. -

- -

Overloading of operations

-

-Large libraries, such as the GF Resource Grammar Library, may define -hundreds of names, which can be unpractical -for both the library writer and the user. The writer has to invent longer -and longer names which are not always intuitive, -and the user has to learn or at least be able to find all these names. -A solution to this problem, adopted by languages such as C++, is overloading: -the same name can be used for several functions. When such a name is used, the -compiler performs overload resolution to find out which of the possible functions -is meant. The resolution is based on the types of the functions: all functions that -have the same name must have different types. -

-

-In C++, functions with the same name can be scattered everywhere in the program. -In GF, they must be grouped together in overload groups. Here is an example -of an overload group, defining four ways to define nouns in Italian: -

-
-    oper mkN = overload {
-      mkN : Str -> N                  = -- regular nouns
-      mkN : Str -> Gender -> N        = -- regular nouns with unexpected gender
-      mkN : Str -> Str -> N           = -- irregular nouns
-      mkN : Str -> Str -> Gender -> N = -- irregular nouns with unexpected gender
-    }
-
-

-All of the following uses of mkN are easy to resolve: -

-
-    lin Pizza = mkN "pizza" ;         -- Str -> N
-    lin Hand  = mkN "mano" Fem ;      -- Str -> Gender -> N
-    lin Man   = mkN "uomo" "uomini" ; -- Str -> Str -> N
-
-

- -

Using the resource grammar library TODO

-

-A resource grammar is a grammar built on linguistic grounds, -to describe a language rather than a domain. -The GF resource grammar library, which contains resource grammars for -10 languages, is described more closely in the following -documents: -

- - - -

Interfaces, instances, and functors

- -

The simplest way

-

-The simplest way is to open a top-level Lang module -and a Paradigms module: -

-
-    abstract Foo = ...
-  
-    concrete FooEng = open LangEng, ParadigmsEng in ...
-    concrete FooSwe = open LangSwe, ParadigmsSwe in ...
-
-

-Here is an example. -

-
-  abstract Arithm = {
-    cat
-      Prop ;
-      Nat ;
-    fun
-      Zero : Nat ;
-      Succ : Nat -> Nat ;
-      Even : Nat -> Prop ;
-      And  : Prop -> Prop -> Prop ;
-  }
-  
-  --# -path=.:alltenses:prelude
-  
-  concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
-    lincat
-      Prop = S ;
-      Nat  = NP ;
-    lin
-      Zero = 
-        UsePN (regPN "zero" nonhuman) ;
-      Succ n = 
-        DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
-      Even n = 
-        UseCl TPres ASimul PPos 
-          (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
-      And x y = 
-        ConjS and_Conj (BaseS x y) ;
-  
-  }
-  
-  --# -path=.:alltenses:prelude
-  
-  concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
-    lincat
-      Prop = S ;
-      Nat  = NP ;
-    lin
-      Zero = 
-        UsePN (regPN "noll" neutrum) ;
-      Succ n = 
-        DetCN (DetSg (SgQuant DefArt) NoOrd) 
-          (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare") 
-             (mkPreposition "till")) n) ;
-      Even n = 
-        UseCl TPres ASimul PPos 
-          (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
-      And x y = 
-        ConjS and_Conj (BaseS x y) ;
-  }
-
-

- -

How to find resource functions

-

-The definitions in this example were found by parsing: -

-
-    > i LangEng.gf
-  
-    -- for Successor:
-    > p -cat=NP -mcfg -parser=topdown "the mother of Paris"
-  
-    -- for Even:
-    > p -cat=S -mcfg -parser=topdown "Paris is old"
-  
-    -- for And:
-    > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
-
-

-The use of parsing can be systematized by example-based grammar writing, -to which we will return later. -

- -

A functor implementation

-

-The interesting thing now is that the -code in ArithmSwe is similar to the code in ArithmEng, except for -some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor", -"jämn" vs. "even"). How can we exploit the similarities and -actually share code between the languages? -

-

-The solution is to use a functor: an incomplete module that opens -an abstract as an interface, and then instantiate it to different -languages that implement the interface. The structure is as follows: -

-
-    abstract Foo ...
-  
-    incomplete concrete FooI = open Lang, Lex in ...
-  
-    concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
-    concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
-
-

-where Lex is an abstract lexicon that includes the vocabulary -specific to this application: -

-
-    abstract Lex = Cat ** ...
-  
-    concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
-    concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...  
-
-

-Here, again, a complete example (abstract Arithm is as above): -

-
-  incomplete concrete ArithmI of Arithm = open Lang, Lex in {
-    lincat
-      Prop = S ;
-      Nat  = NP ;
-    lin
-      Zero = 
-        UsePN zero_PN ;
-      Succ n = 
-        DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
-      Even n = 
-        UseCl TPres ASimul PPos 
-          (PredVP n (UseComp (CompAP (PositA even_A)))) ;
-      And x y = 
-        ConjS and_Conj (BaseS x y) ;
-  }
-  
-  --# -path=.:alltenses:prelude
-  concrete ArithmEng of Arithm = ArithmI with
-    (Lang = LangEng),
-    (Lex = LexEng) ;
-  
-  --# -path=.:alltenses:prelude
-  concrete ArithmSwe of Arithm = ArithmI with
-    (Lang = LangSwe),
-    (Lex = LexSwe) ;
-  
-  abstract Lex = Cat ** {
-    fun
-      zero_PN : PN ;
-      successor_N2 : N2 ;  
-      even_A : A ;
-  }
-  
-  concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
-    lin 
-      zero_PN = regPN "noll" neutrum ;
-      successor_N2 = 
-        mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
-      even_A = regA "jämn" ;
-  }
-
-

- -

Restricted inheritance and qualified opening

- -

More constructs for concrete syntax

-

-In this chapter, we go through constructs that are not necessary in simple grammars -or when the concrete syntax relies on libraries, but very useful when writing advanced -concrete syntax implementations, such as resource grammar libraries. -

- -

Local definitions

-

-Local definitions ("let expressions") are used in functional -programming for two reasons: to structure the code into smaller -expressions, and to avoid repeated computation of one and -the same expression. Here is an example, from -MorphoIta: -

-
-    oper regNoun : Str -> Noun = \vino -> 
-          let 
-            vin = init vino ;
-            o   = last vino
-          in
-          case o of {
-            "a"       => mkNoun Fem  vino (vin + "e") ;
-            "o" | "e" => mkNoun Masc vino (vin + "i") ;
-            _         => mkNoun Masc vino vino         
-            } ;
-
-

- -

Record extension and subtyping

-

-Record types and records can be extended with new fields. For instance, -in German it is natural to see transitive verbs as verbs with a case. -The symbol ** is used for both constructs. -

-
-    lincat TV = Verb ** {c : Case} ;
-  
-    lin Follow = regVerb "folgen" ** {c = Dative} ; 
-
-

-To extend a record type or a record with a field whose label it -already has is a type error. -

-

-A record type T is a subtype of another one R, if T has -all the fields of R and possibly other fields. For instance, -an extension of a record type is always a subtype of it. -

-

-If T is a subtype of R, an object of T can be used whenever -an object of R is required. For instance, a transitive verb can -be used whenever a verb is required. -

-

-Contravariance means that a function taking an R as argument -can also be applied to any object of a subtype T. -

- -

Tuples and product types

-

-Product types and tuples are syntactic sugar for record types and records: -

-
-    T1 * ... * Tn   ===   {p1 : T1 ; ... ; pn : Tn}
-    <t1, ...,  tn>  ===   {p1 = T1 ; ... ; pn = Tn}
-
-

-Thus the labels p1, p2,... are hard-coded. -

- -

Record and tuple patterns

-

-Record types of parameter types are also parameter types. -A typical example is a record of agreement features, e.g. French -

-
-    oper Agr : PType = {g : Gender ; n : Number ; p : Person} ;
-
-

-Notice the term PType rather than just Type referring to -parameter types. Every PType is also a Type, but not vice-versa. -

-

-Pattern matching is done in the expected way, but it can moreover -utilize partial records: the branch -

-
-    {g = Fem} => t
-
-

-in a table of type Agr => T means the same as -

-
-    {g = Fem ; n = _ ; p = _} => t
-
-

-Tuple patterns are translated to record patterns in the -same way as tuples to records; partial patterns make it -possible to write, slightly surprisingly, -

-
-    case <g,n,p> of {
-      <Fem> => t
-      ...
-      }
-
-

- -

Regular expression patterns

-

-To define string operations computed at compile time, such -as in morphology, it is handy to use regular expression patterns: -

- - -

-The last three apply to all types of patterns, the first two only to token strings. -As an example, we give a rule for the formation of English word forms -ending with an s and used in the formation of both plural nouns and -third-person present-tense verbs. -

-
-    add_s : Str -> Str = \w -> case w of {
-      _ + "oo"                           => s + "s" ;   -- bamboo
-      _ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ;  -- bus, hero
-      _ + ("a" | "o" | "u" | "e") + "y"  => w + "s" ;   -- boy
-      x + "y"                            => x + "ies" ; -- fly
-      _                                  => w + "s"     -- car
-      } ;
-
-

-Here is another example, the plural formation in Swedish 2nd declension. -The second branch uses a variable binding with @ to cover the cases where an -unstressed pre-final vowel e disappears in the plural -(nyckel-nycklar, seger-segrar, bil-bilar): -

-
-    plural2 : Str -> Str = \w -> case w of {
-      pojk + "e"                       => pojk + "ar" ;
-      nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
-      bil                              => bil + "ar"
-      } ;
-
-

-

-Semantics: variables are always bound to the first match, which is the first -in the sequence of binding lists Match p v defined as follows. In the definition, -p is a pattern and v is a value. -

-
-    Match (p1|p2) v = Match p1 v ++ Match p2 v
-    Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | 
-                         i <- [0..length s], (s1,s2) = splitAt i s]
-    Match p*      s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= []
-    Match -p      v = [[]] if Match p v = []
-    Match c       v = [[]] if c == v  -- for constant and literal patterns c
-    Match x       v = [[(x,v)]]       -- for variable patterns x
-    Match x@p     v = [[(x,v)]] + M   if M = Match p v /= []
-    Match p       v = [] otherwise    -- failure
-
-

-Examples: -

- - - -

Prefix-dependent choices

-

-Sometimes a token has different forms depending on the token -that follows. An example is the English indefinite article, -which is an if a vowel follows, a otherwise. -Which form is chosen can only be decided at run time, i.e. -when a string is actually build. GF has a special construct for -such tokens, the pre construct exemplified in -

-
-    oper artIndef : Str = 
-      pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
-
-

-Thus -

-
-    artIndef ++ "cheese"  --->  "a" ++ "cheese"
-    artIndef ++ "apple"   --->  "an" ++ "apple"
-
-

-This very example does not work in all situations: the prefix -u has no general rules, and some problematic words are -euphemism, one-eyed, n-gram. It is possible to write -

-
-    oper artIndef : Str = 
-      pre {"a" ; 
-           "a"  / strs {"eu" ; "one"} ;
-           "an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"}
-          } ;
-
-

- -

Predefined types and operations

-

-GF has the following predefined categories in abstract syntax: -

-
-    cat Int ;     -- integers, e.g. 0, 5, 743145151019
-    cat Float ;   -- floats,   e.g. 0.0, 3.1415926
-    cat String ;  -- strings,  e.g. "", "foo", "123"
-
-

-The objects of each of these categories are literals -as indicated in the comments above. No fun definition -can have a predefined category as its value type, but -they can be used as arguments. For example: -

-
-    fun StreetAddress : Int -> String -> Address ;
-    lin StreetAddress number street = {s = number.s ++ street.s} ;
-  
-    -- e.g. (StreetAddress 10 "Downing Street") : Address
-
-

-FIXME: The linearization type is {s : Str} for all these categories. -

- -

More concepts of abstract syntax

-

-This section is about the use of the type theory part of GF for -including more semantics in grammars. Some of the subsections present -ideas that have not yet been used in real-world applications, and whose -tool support outside the GF program is not complete. -

- -

GF as a logical framework

-

-In this section, we will show how -to encode advanced semantic concepts in an abstract syntax. -We use concepts inherited from type theory. Type theory -is the basis of many systems known as logical frameworks, which are -used for representing mathematical theorems and their proofs on a computer. -In fact, GF has a logical framework as its proper part: -this part is the abstract syntax. -

-

-In a logical framework, the formalization of a mathematical theory -is a set of type and function declarations. The following is an example -of such a theory, represented as an abstract module in GF. -

-
-  abstract Arithm = {
-    cat
-      Prop ;                        -- proposition
-      Nat ;                         -- natural number
-    fun
-      Zero : Nat ;                  -- 0
-      Succ : Nat -> Nat ;           -- successor of x
-      Even : Nat -> Prop ;          -- x is even
-      And  : Prop -> Prop -> Prop ; -- A and B
-      } 
-
-

-A concrete syntax is given below, as an example of using the resource grammar -library. -

- -

Dependent types

-

-Dependent types are a characteristic feature of GF, -inherited from the -constructive type theory of Martin-Löf and -distinguishing GF from most other grammar formalisms and -functional programming languages. -The initial main motivation for developing GF was, indeed, -to have a grammar formalism with dependent types. -As can be inferred from the fact that we introduce them only now, -after having written lots of grammars without them, -dependent types are no longer the only motivation for GF. -But they are still important and interesting. -

-

-Dependent types can be used for stating stronger -conditions of well-formedness than non-dependent types. -A simple example is postal addresses. Ignoring the other details, -let us take a look at addresses consisting of -a street, a city, and a country. -

-
-  abstract Address = {
-    cat 
-      Address ; Country ; City ; Street ;
-  
-    fun
-      mkAddress : Country -> City -> Street -> Address ;
-  
-      UK, France : Country ;
-      Paris, London, Grenoble : City ;
-      OxfordSt, ShaftesburyAve, BdRaspail, RueBlondel, AvAlsaceLorraine : Street ;
-    }
-
-

-The linearization rules are straightforward, -

-
-    lin
-      mkAddress country city street = 
-        ss (street.s ++ "," ++ city.s ++ "," ++ country.s) ;
-      UK = ss ("U.K.") ;
-      France = ss ("France") ;
-      Paris = ss ("Paris") ;
-      London = ss ("London") ;
-      Grenoble = ss ("Grenoble") ;
-      OxfordSt = ss ("Oxford" ++ "Street") ;
-      ShaftesburyAve = ss ("Shaftesbury" ++ "Avenue") ;
-      BdRaspail = ss ("boulevard" ++ "Raspail") ;
-      RueBlondel = ss ("rue" ++ "Blondel") ;
-      AvAlsaceLorraine = ss ("avenue" ++ "Alsace-Lorraine") ;
-
-

-Notice that, in mkAddress, we have -reversed the order of the constituents. The type of mkAddress -in the abstract syntax takes its arguments in a "logical" order, -with increasing precision. (This order is sometimes even used in the -concrete syntax of addresses, e.g. in Russia). -

-

-Both existing and non-existing addresses are recognized by this -grammar. The non-existing ones in the following randomly generated -list have afterwards been marked by *: -

-
-    > gr -cat=Address -number=7 | l
-  
-    * Oxford Street , Paris , France
-    * Shaftesbury Avenue , Grenoble , U.K.
-    boulevard Raspail , Paris , France
-    * rue Blondel , Grenoble , U.K.
-    * Shaftesbury Avenue , Grenoble , France
-    * Oxford Street , London , France
-    * Shaftesbury Avenue , Grenoble , France
-
-

-Dependent types provide a way to guarantee that addresses are -well-formed. What we do is to include contexts in -cat judgements: -

-
-    cat 
-      Address ; 
-      Country ; 
-      City Country ; 
-      Street (x : Country)(City x) ;
-
-

-The first two judgements are as before, but the third one makes -City dependent on Country: there are no longer just cities, -but cities of the U.K. and cities of France. The fourth judgement -makes Street dependent on City; but since -City is itself dependent on Country, we must -include them both in the context, moreover guaranteeing that -the city is one of the given country. Since the context itself -is built by using a dependent type, we have to use variables -to indicate the dependencies. The judgement we used for City -is actually shorthand for -

-
-    cat City (x : Country)
-
-

-which is only possible if the subsequent context does not depend on x. -

-

-The fun judgements of the grammar are modified accordingly: -

-
-    fun
-      mkAddress : (x : Country) -> (y : City x) -> Street x y -> Address ;
-    
-      UK : Country ;
-      France : Country ;
-      Paris : City France ; 
-      London : City UK ; 
-      Grenoble : City France ;
-      OxfordSt : Street UK London ; 
-      ShaftesburyAve : Street UK London ;
-      BdRaspail : Street France Paris ; 
-      RueBlondel : Street France Paris ; 
-      AvAlsaceLorraine : Street France Grenoble ;
-
-

-Since the type of mkAddress now has dependencies among -its argument types, we have to use variables just like we used in -the context of Street above. What we claimed to be the -"logical" order of the arguments is now forced by the type system -of GF: a variable must be declared (=bound) before it can be -referenced (=used). -

-

-The effect of dependent types is that the *-marked addresses above are -no longer well-formed. What the GF parser actually does is that it -initially accepts them (by using a context-free parsing algorithm) -and then rejects them (by type checking). The random generator does not produce -illegal addresses (this could be useful in bulk mailing!). -The linearization algorithm does not care about type dependencies; -actually, since the categories (ignoring their arguments) -are the same in both abstract syntaxes, -we use the same concrete syntax -for both of them. -

-

-Remark. Function types without -variables are actually a shorthand notation: writing -

-
-    fun PredV1 : NP -> V1 -> S
-
-

-is shorthand for -

-
-    fun PredV1 : (x : NP) -> (y : V1) -> S
-
-

-or any other naming of the variables. Actually the use of variables -sometimes shortens the code, since we can write e.g. -

-
-    oper triple : (x,y,z : Str) -> Str = ...
-
-

-If a bound variable is not used, it can here, as elswhere in GF, be replaced by -a wildcard: -

-
-    oper triple : (_,_,_ : Str) -> Str = ...
-
-

- -

Dependent types in concrete syntax

-

-The functional fragment of GF -terms and types comprises function types, applications, lambda -abstracts, constants, and variables. This fragment is similar in -abstract and concrete syntax. In particular, -dependent types are also available in concrete syntax. -We have not made use of them yet, -but we will now look at one example of how they -can be used. -

-

-Those readers who are familiar with functional programming languages -like ML and Haskell, may already have missed polymorphic -functions. For instance, Haskell programmers have access to -the functions -

-
-    const :: a -> b -> a
-    const c _ = c
-  
-    flip :: (a -> b -> c) -> b -> a -> c
-    flip f y x = f x y
-
-

-which can be used for any given types a,b, and c. -

-

-The GF counterpart of polymorphic functions are monomorphic -functions with explicit type variables. Thus the above -definitions can be written -

-
-    oper const :(a,b : Type) -> a -> b -> a =
-      \_,_,c,_ -> c ;
-  
-    oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c =
-      \_,_,_,f,x,y -> f y x ;
-
-

-When the operations are used, the type checker requires -them to be equipped with all their arguments; this may be a nuisance -for a Haskell or ML programmer. -

- -

Expressing selectional restrictions

-

-This section introduces a way of using dependent types to -formalize a notion known as selectional restrictions -in linguistics. We first present a mathematical model -of the notion, and then integrate it in a linguistically -motivated syntax. -

-

-In linguistics, a -grammar is usually thought of as being about syntactic well-formedness -in a rather liberal sense: an expression can be well-formed without -being meaningful, in other words, without being -semantically well-formed. -For instance, the sentence -

-
-    the number 2 is equilateral
-
-

-is syntactically well-formed but semantically ill-formed. -It is well-formed because it combines a well-formed -noun phrase ("the number 2") with a well-formed -verb phrase ("is equilateral") and satisfies the agreement -rule saying that the verb phrase is inflected in the -number of the noun phrase: -

-
-    fun PredVP : NP -> VP -> S ;
-    lin PredVP np v = {s = np.s ++ vp.s ! np.n} ;
-
-

-It is ill-formed because the predicate "is equilateral" -is only defined for triangles, not for numbers. -

-

-In a straightforward type-theoretical formalization of -mathematics, domains of mathematical objects -are defined as types. In GF, we could write -

-
-    cat Nat ;
-    cat Triangle ;
-    cat Prop ;
-
-

-for the types of natural numbers, triangles, and propositions, -respectively. -Noun phrases are typed as objects of basic types other than -Prop, whereas verb phrases are functions from basic types -to Prop. For instance, -

-
-    fun two : Nat ;
-    fun Even : Nat -> Prop ;
-    fun Equilateral : Triangle -> Prop ;
-
-

-With these judgements, and the linearization rules -

-
-    lin two = ss ["the number 2"] ;
-    lin Even x = ss (x.s ++ ["is even"]) ;
-    lin Equilateral x = ss (x.s ++ ["is equilateral"]) ;
-
-

-we can form the proposition Even two -

-
-    the number 2 is even
-
-

-but no proposition linearized to -

-
-    the number 2 is equilateral
-
-

-since Equilateral two is not a well-formed type-theoretical object. -It is not even accepted by the context-free parser. -

-

-When formalizing mathematics, e.g. in the purpose of -computer-assisted theorem proving, we are certainly interested -in semantic well-formedness: we want to be sure that a proposition makes -sense before we make the effort of proving it. The straightforward typing -of nouns and predicates shown above is the way in which this -is guaranteed in various proof systems based on type theory. -(Notice that it is still possible to form false propositions, -e.g. "the number 3 is even". -False and meaningless are different things.) -

-

-As shown by the linearization rules for two, Even, -etc, it is possible to use straightforward mathematical typings -as the abstract syntax of a grammar. However, this syntax is not very -expressive linguistically: for instance, there is no distinction between -adjectives and verbs. It is hard to give rules for structures like -adjectival modification ("even number") and conjunction of predicates -("even or odd"). -

-

-By using dependent types, it is simple to combine a linguistically -motivated system of categories with mathematically motivated -type restrictions. What we need is a category of domains of -individual objects, -

-
-    cat Dom
-
-

-and dependencies of other categories on this: -

-
-    cat 
-      S ;            -- sentence
-      V1 Dom ;       -- one-place verb with specific subject type
-      V2 Dom Dom ;   -- two-place verb with specific subject and object types
-      A1 Dom ;       -- one-place adjective
-      A2 Dom Dom ;   -- two-place adjective
-      NP Dom ;       -- noun phrase for an object of specific type
-      Conj ;         -- conjunction
-      Det ;          -- determiner
-
-

-Having thus parametrized categories on domains, we have to reformulate -the rules of predication, etc, accordingly. This is straightforward: -

-
-    fun
-      PredV1  : (A   : Dom) -> NP A -> V1 A -> S ;
-      ComplV2 : (A,B : Dom) -> V2 A B -> NP B -> V1 A ;
-      DetCN   :                Det -> (A : Dom) -> NP A ;
-      ModA1   :                (A : Dom) -> A1 A -> Dom ;
-      ConjS   :                Conj -> S -> S -> S ;
-      ConjV1  : (A   : Dom) -> Conj -> V1 A -> V1 A -> V1 A ;
-
-

-In linearization rules, -we use wildcards for the domain arguments, -because they don't affect linearization: -

-
-    lin
-      PredV1 _ np vp = ss (np.s ++ vp.s) ;
-      ComplV2 _ _ v2 np = ss (v2.s ++ np.s) ;
-      DetCN det cn = ss (det.s ++ cn.s) ;
-      ModA1 cn a1 = ss (a1.s ++ cn.s) ;
-      ConjS conj s1 s2 = ss (s1.s ++ conj.s ++ s2.s) ;
-      ConjV1 _ conj v1 v2 = ss (v1.s ++ conj.s ++ v2.s) ;
-
-

-The domain arguments thus get suppressed in linearization. -Parsing initially returns metavariables for them, -but type checking can usually restore them -by inference from those arguments that are not suppressed. -

-

-One traditional linguistic example of domain restrictions -(= selectional restrictions) is the contrast between the two sentences -

-
-    John plays golf
-    golf plays John
-
-

-To explain the contrast, we introduce the functions -

-
-    human : Dom ; 
-    game : Dom ;
-    play : V2 human game ;
-    John : NP human ;
-    Golf : NP game ;
-
-

-Both sentences still pass the context-free parser, -returning trees with lots of metavariables of type Dom: -

-
-    PredV1 ?0 John (ComplV2 ?1 ?2 play Golf)
-    PredV1 ?0 Golf (ComplV2 ?1 ?2 play John)
-
-

-But only the former sentence passes the type checker, which moreover -infers the domain arguments: -

-
-    PredV1 human John (ComplV2 human game play Golf)
-
-

-To try this out in GF, use pt = put_term with the tree transformation -that solves the metavariables by type checking: -

-
-    > p -tr "John plays golf" | pt -transform=solve
-    > p -tr "golf plays John" | pt -transform=solve
-
-

-In the latter case, no solutions are found. -

-

-A known problem with selectional restrictions is that they can be more -or less liberal. For instance, -

-
-    John loves Mary
-    John loves golf
-
-

-should both make sense, even though Mary and golf -are of different types. A natural solution in this case is to -formalize love as a polymorphic verb, which takes -a human as its first argument but an object of any type as its second -argument: -

-
-    fun love : (A : Dom) -> V2 human A ;
-    lin love _ = ss "loves" ;
-
-

-In other words, it is possible for a human to love anything. -

-

-A problem related to polymorphism is subtyping: what -is meaningful for a human is also meaningful for -a man and a woman, but not the other way round. -One solution to this is coercions: functions that -"lift" objects from subtypes to supertypes. -

- -

Case study: selectional restrictions and statistical language models TODO

- -

Proof objects

-

-Perhaps the most well-known idea in constructive type theory is -the Curry-Howard isomorphism, also known as the -propositions as types principle. Its earliest formulations -were attempts to give semantics to the logical systems of -propositional and predicate calculus. In this section, we will consider -a more elementary example, showing how the notion of proof is useful -outside mathematics, as well. -

-

-We first define the category of unary (also known as Peano-style) -natural numbers: -

-
-    cat Nat ; 
-    fun Zero : Nat ;
-    fun Succ : Nat -> Nat ;
-
-

-The successor function Succ generates an infinite -sequence of natural numbers, beginning from Zero. -

-

-We then define what it means for a number x to be less than -a number y. Our definition is based on two axioms: -

- - -

-The most straightforward way of expressing these axioms in type theory -is as typing judgements that introduce objects of a type Less //x y //: -

-
-    cat Less Nat Nat ; 
-    fun lessZ : (y : Nat) -> Less Zero (Succ y) ;
-    fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ;
-
-

-Objects formed by lessZ and lessS are -called proof objects: they establish the truth of certain -mathematical propositions. -For instance, the fact that 2 is less that -4 has the proof object -

-
-    lessS (Succ Zero) (Succ (Succ (Succ Zero)))
-          (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero)))
-
-

-whose type is -

-
-    Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero))))
-
-

-which is the formalization of the proposition that 2 is less than 4. -

-

-GF grammars can be used to provide a semantic control of -well-formedness of expressions. We have already seen examples of this: -the grammar of well-formed addresses and the grammar with -selectional restrictions above. By introducing proof objects -we have now added a very powerful technique of expressing semantic conditions. -

-

-A simple example of the use of proof objects is the definition of -well-formed time spans: a time span is expected to be from an earlier to -a later time: -

-
-    from 3 to 8
-
-

-is thus well-formed, whereas -

-
-    from 8 to 3
-
-

-is not. The following rules for spans impose this condition -by using the Less predicate: -

-
-    cat Span ;
-    fun span : (m,n : Nat) -> Less m n -> Span ;
-
-

-A possible practical application of this idea is proof-carrying documents: -to be semantically well-formed, the abstract syntax of a document must contain a proof -of some property, although the proof is not shown in the concrete document. -Think, for instance, of small documents describing flight connections: -

-

-To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague. -

-

-The well-formedness of this text is partly expressible by dependent typing: -

-
-    cat
-      City ;
-      Flight City City ;
-    fun
-      Gothenburg, Frankfurt, Prague : City ;
-      LH3043 : Flight Gothenburg Frankfurt ;
-      OK0537 : Flight Frankfurt Prague ;
-
-

-This rules out texts saying take OK0537 from Gothenburg to Prague. However, there is a -further condition saying that it must be possible to change from LH3043 to OK0537 in Frankfurt. -This can be modelled as a proof object of a suitable type, which is required by the constructor -that connects flights. -

-
-    cat
-      IsPossible (x,y,z : City)(Flight x y)(Flight y z) ;
-    fun
-      Connect : (x,y,z : City) -> 
-        (u : Flight x y) -> (v : Flight y z) -> 
-          IsPossible x y z u v -> Flight x z ;
-
-

- -

Variable bindings

-

-Mathematical notation and programming languages have lots of -expressions that bind variables. For instance, -a universally quantifier proposition -

-
-    (All x)B(x)
-
-

-consists of the binding (All x) of the variable x, -and the body B(x), where the variable x can have -bound occurrences. -

-

-Variable bindings appear in informal mathematical language as well, for -instance, -

-
-    for all x, x is equal to x
-  
-    the function that for any numbers x and y returns the maximum of x+y
-    and x*y
-
-

-In type theory, variable-binding expression forms can be formalized -as functions that take functions as arguments. The universal -quantifier is defined -

-
-    fun All : (Ind -> Prop) -> Prop
-
-

-where Ind is the type of individuals and Prop, -the type of propositions. If we have, for instance, the equality predicate -

-
-    fun Eq : Ind -> Ind -> Prop
-
-

-we may form the tree -

-
-    All (\x -> Eq x x)
-
-

-which corresponds to the ordinary notation -

-
-    (All x)(x = x).
-
-

-

-An abstract syntax where trees have functions as arguments, as in -the two examples above, has turned out to be precisely the right -thing for the semantics and computer implementation of -variable-binding expressions. The advantage lies in the fact that -only one variable-binding expression form is needed, the lambda abstract -\x -> b, and all other bindings can be reduced to it. -This makes it easier to implement mathematical theories and reason -about them, since variable binding is tricky to implement and -to reason about. The idea of using functions as arguments of -syntactic constructors is known as higher-order abstract syntax. -

-

-The question now arises: how to define linearization rules -for variable-binding expressions? -Let us first consider universal quantification, -

-
-    fun All : (Ind -> Prop) -> Prop
-
-

-We write -

-
-    lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s}
-
-

-to obtain the form shown above. -This linearization rule brings in a new GF concept - the $0 -field of B containing a bound variable symbol. -The general rule is that, if an argument type of a function is -itself a function type A -> C, the linearization type of -this argument is the linearization type of C -together with a new field $0 : Str. In the linearization rule -for All, the argument B thus has the linearization -type -

-
-    {$0 : Str ; s : Str},
-
-

-since the linearization type of Prop is -

-
-    {s : Str}
-
-

-In other words, the linearization of a function -consists of a linearization of the body together with a -field for a linearization of the bound variable. -Those familiar with type theory or lambda calculus -should notice that GF requires trees to be in -eta-expanded form in order to be linearizable: -any function of type -

-
-    A -> B
-
-

-always has a syntax tree of the form -

-
-    \x -> b
-
-

-where b : B under the assumption x : A. -It is in this form that an expression can be analysed -as having a bound variable and a body. -

-

-Given the linearization rule -

-
-    lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"}
-
-

-the linearization of -

-
-    \x -> Eq x x
-
-

-is the record -

-
-    {$0 = "x", s = ["( x = x )"]}
-
-

-Thus we can compute the linearization of the formula, -

-
-    All (\x -> Eq x x)  --> {s = "[( All x ) ( x = x )]"}.
-
-

-

-How did we get the linearization of the variable x -into the string "x"? GF grammars have no rules for -this: it is just hard-wired in GF that variable symbols are -linearized into the same strings that represent them in -the print-out of the abstract syntax. -

-

-To be able to parse variable symbols, however, GF needs to know what -to look for (instead of e.g. trying to parse any -string as a variable). What strings are parsed as variable symbols -is defined in the lexical analysis part of GF parsing -

-
-    > p -cat=Prop -lexer=codevars "(All x)(x = x)"
-    All (\x -> Eq x x)
-
-

-(see more details on lexers below). If several variables are bound in the -same argument, the labels are $0, $1, $2, etc. -

- -

Semantic definitions

-

-We have seen that, -just like functional programming languages, GF has declarations -of functions, telling what the type of a function is. -But we have not yet shown how to compute -these functions: all we can do is provide them with arguments -and linearize the resulting terms. -Since our main interest is the well-formedness of expressions, -this has not yet bothered -us very much. As we will see, however, computation does play a role -even in the well-formedness of expressions when dependent types are -present. -

-

-GF has a form of judgement for semantic definitions, -recognized by the key word def. At its simplest, it is just -the definition of one constant, e.g. -

-
-    def one = Succ Zero ;
-
-

-We can also define a function with arguments, -

-
-    def Neg A = Impl A Abs ;
-
-

-which is still a special case of the most general notion of -definition, that of a group of pattern equations: -

-
-    def 
-      sum x Zero = x ;
-      sum x (Succ y) = Succ (Sum x y) ;
-
-

-To compute a term is, as in functional programming languages, -simply to follow a chain of reductions until no definition -can be applied. For instance, we compute -

-
-    Sum one one -->
-    Sum (Succ Zero) (Succ Zero) -->
-    Succ (sum (Succ Zero) Zero) -->
-    Succ (Succ Zero)
-
-

-Computation in GF is performed with the pt command and the -compute transformation, e.g. -

-
-    > p -tr "1 + 1" | pt -transform=compute -tr | l
-    sum one one
-    Succ (Succ Zero)
-    s(s(0))
-
-

-

-The def definitions of a grammar induce a notion of -definitional equality among trees: two trees are -definitionally equal if they compute into the same tree. -Thus, trivially, all trees in a chain of computation -(such as the one above) -are definitionally equal to each other. So are the trees -

-
-    sum Zero (Succ one)
-    Succ one
-    sum (sum Zero Zero) (sum (Succ Zero) one)
-
-

-and infinitely many other trees. -

-

-A fact that has to be emphasized about def definitions is that -they are not performed as a first step of linearization. -We say that linearization is intensional, which means that -the definitional equality of two trees does not imply that -they have the same linearizations. For instance, each of the seven terms -shown above has a different linearizations in arithmetic notation: -

-
-    1 + 1
-    s(0) + s(0)
-    s(s(0) + 0)
-    s(s(0))
-    0 + s(0)
-    s(1)
-    0 + 0 + s(0) + 1
-
-

-This notion of intensionality is -no more exotic than the intensionality of any pretty-printing -function of a programming language (function that shows -the expressions of the language as strings). It is vital for -pretty-printing to be intensional in this sense - if we want, -for instance, to trace a chain of computation by pretty-printing each -intermediate step, what we want to see is a sequence of different -expression, which are definitionally equal. -

-

-What is more exotic is that GF has two ways of referring to the -abstract syntax objects. In the concrete syntax, the reference is intensional. -In the abstract syntax, the reference is extensional, since -type checking is extensional. The reason is that, -in the type theory with dependent types, types may depend on terms. -Two types depending on terms that are definitionally equal are -equal types. For instance, -

-
-    Proof (Odd one)
-    Proof (Odd (Succ Zero))
-
-

-are equal types. Hence, any tree that type checks as a proof that -1 is odd also type checks as a proof that the successor of 0 is odd. -(Recall, in this connection, that the -arguments a category depends on never play any role -in the linearization of trees of that category, -nor in the definition of the linearization type.) -

-

-In addition to computation, definitions impose a -paraphrase relation on expressions: -two strings are paraphrases if they -are linearizations of trees that are -definitionally equal. -Paraphrases are sometimes interesting for -translation: the direct translation -of a string, which is the linearization of the same tree -in the targer language, may be inadequate because it is e.g. -unidiomatic or ambiguous. In such a case, -the translation algorithm may be made to consider -translation by a paraphrase. -

-

-To stress express the distinction between -constructors (=canonical functions) -and other functions, GF has a judgement form -data to tell that certain functions are canonical, e.g. -

-
-    data Nat = Succ | Zero ;
-
-

-Unlike in Haskell, but similarly to ALF (where constructor functions -are marked with a flag C), -new constructors can be added to -a type with new data judgements. The type signatures of constructors -are given separately, in ordinary fun judgements. -One can also write directly -

-
-    data Succ : Nat -> Nat ;
-
-

-which is equivalent to the two judgements -

-
-    fun Succ : Nat -> Nat ;
-    data Nat = Succ ;
-
-

- -

Case study: representing anaphoric reference TODO

- -

Transfer modules TODO

-

-Transfer means noncompositional tree-transforming operations. -The command apply_transfer = at is typically used in a pipe: -

-
-    > p "John walks and John runs" | apply_transfer aggregate | l
-    John walks and runs
-
-

-See the -sources of this example. -

-

-See the -transfer language documentation -for more information. -

- -

Practical issues TODO

- -

Lexers and unlexers

-

-Lexers and unlexers can be chosen from -a list of predefined ones, using the flags-lexer and `` -unlexer`` either -in the grammar file or on the GF command line. -

-

-Given by help -lexer, help -unlexer: -

-
-      The default is words.
-      -lexer=words         tokens are separated by spaces or newlines
-      -lexer=literals      like words, but GF integer and string literals recognized
-      -lexer=vars          like words, but "x","x_...","$...$" as vars, "?..." as meta
-      -lexer=chars         each character is a token
-      -lexer=code          use Haskell's lex
-      -lexer=codevars      like code, but treat unknown words as variables, ?? as meta
-      -lexer=text          with conventions on punctuation and capital letters
-      -lexer=codelit       like code, but treat unknown words as string literals
-      -lexer=textlit       like text, but treat unknown words as string literals
-      -lexer=codeC         use a C-like lexer
-      -lexer=ignore        like literals, but ignore unknown words
-      -lexer=subseqs       like ignore, but then try all subsequences from longest
-  
-      The default is unwords.
-      -unlexer=unwords     space-separated token list (like unwords)
-      -unlexer=text        format as text: punctuation, capitals, paragraph <p>
-      -unlexer=code        format as code (spacing, indentation)
-      -unlexer=textlit     like text, but remove string literal quotes
-      -unlexer=codelit     like code, but remove string literal quotes
-      -unlexer=concat      remove all spaces
-      -unlexer=bind        like identity, but bind at "&+"
-
-

- -

Efficiency of grammars

-

-Issues: -

- - - -

Speech input and output

-

-Thespeak_aloud = sa command sends a string to the speech -synthesizer -Flite. -It is typically used via a pipe: -

-
-   generate_random | linearize | speak_aloud
-
-

-The result is only satisfactory for English. -

-

-The speech_input = si command receives a string from a -speech recognizer that requires the installation of -ATK. -It is typically used to pipe input to a parser: -

-
-   speech_input -tr | parse
-
-

-The method words only for grammars of English. -

-

-Both Flite and ATK are freely available through the links -above, but they are not distributed together with GF. -

- -

Multilingual syntax editor

-

-The -Editor User Manual -describes the use of the editor, which works for any multilingual GF grammar. -

-

-Here is a snapshot of the editor: -

-

- -

-

-The grammars of the snapshot are from the -Letter grammar package. -

- -

Interactive Development Environment (IDE)

-

-Forthcoming. -

- -

Communicating with GF

-

-Other processes can communicate with the GF command interpreter, -and also with the GF syntax editor. Useful flags when invoking GF are -

- - - -

Embedded grammars in Haskell, Java, and Prolog

-

-GF grammars can be used as parts of programs written in the -following languages. The links give more documentation. -

- - - -

Alternative input and output grammar formats

-

-A summary is given in the following chart of GF grammar compiler phases: - -

- -

Larger case studies TODO

- -

Interfacing formal and natural languages

-

-Formal and Informal Software Specifications, -PhD Thesis by -Kristofer Johannisson, is an extensive example of this. -The system is based on a multilingual grammar relating the formal language OCL with -English and German. -

-

-A simpler example will be explained here. -

- -

A multimodal dialogue system

-

-See TALK project deliverables, TALK homepage -

- - - - -- cgit v1.2.3