Diffstat (limited to 'doc/tutorial/gf-tutorial2.html')
| -rw-r--r-- | doc/tutorial/gf-tutorial2.html | 4854 |
1 files changed, 0 insertions, 4854 deletions
diff --git a/doc/tutorial/gf-tutorial2.html b/doc/tutorial/gf-tutorial2.html deleted file mode 100644 index 702e2dfb9..000000000 --- a/doc/tutorial/gf-tutorial2.html +++ /dev/null @@ -1,4854 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> -<HTML> -<HEAD> -<META NAME="generator" CONTENT="http://txt2tags.sf.net"> -<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> -<TITLE>Grammatical Framework Tutorial</TITLE> -</HEAD><BODY BGCOLOR="white" TEXT="black"> -<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1> -<FONT SIZE="4"> -<I>Author: Aarne Ranta aarne (at) cs.chalmers.se</I><BR> -Last update: Sun Jul 8 18:36:23 2007 -</FONT></CENTER> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> - <UL> - <LI><A HREF="#toc1">Introduction</A> - <UL> - <LI><A HREF="#toc2">GF = Grammatical Framework</A> - <LI><A HREF="#toc3">What are GF grammars used for</A> - <LI><A HREF="#toc4">Who is this tutorial for</A> - <LI><A HREF="#toc5">The coverage of the tutorial</A> - <LI><A HREF="#toc6">Getting the GF program</A> - <LI><A HREF="#toc7">Running the GF program</A> - </UL> - <LI><A HREF="#toc8">The .cf grammar format</A> - <UL> - <LI><A HREF="#toc9">Importing grammars and parsing strings</A> - <LI><A HREF="#toc10">Generating trees and strings</A> - <LI><A HREF="#toc11">Visualizing trees</A> - <LI><A HREF="#toc12">Some random-generated sentences</A> - <LI><A HREF="#toc13">Systematic generation</A> - <LI><A HREF="#toc14">More on pipes; tracing</A> - <LI><A HREF="#toc15">Writing and reading files</A> - </UL> - <LI><A HREF="#toc16">The .gf grammar format</A> - <UL> - <LI><A HREF="#toc17">Abstract and concrete syntax</A> - <LI><A HREF="#toc18">Judgement forms</A> - <LI><A HREF="#toc19">Module types</A> - <LI><A HREF="#toc20">Basic types and function types</A> - <LI><A HREF="#toc21">Records and strings</A> - <LI><A HREF="#toc22">An abstract syntax example</A> - <LI><A HREF="#toc23">A concrete syntax example</A> - <LI><A HREF="#toc24">Modules and 
files</A> - </UL> - <LI><A HREF="#toc25">Multilingual grammars and translation</A> - <UL> - <LI><A HREF="#toc26">An Italian concrete syntax</A> - <LI><A HREF="#toc27">Using a multilingual grammar</A> - <LI><A HREF="#toc28">Translation session</A> - <LI><A HREF="#toc29">Translation quiz</A> - </UL> - <LI><A HREF="#toc30">Grammar architecture</A> - <UL> - <LI><A HREF="#toc31">Extending a grammar</A> - <LI><A HREF="#toc32">Multiple inheritance</A> - <LI><A HREF="#toc33">Visualizing module structure</A> - <LI><A HREF="#toc34">System commands</A> - </UL> - <LI><A HREF="#toc35">Resource modules</A> - <UL> - <LI><A HREF="#toc36">The golden rule of functional programming</A> - <LI><A HREF="#toc37">Operation definitions</A> - <LI><A HREF="#toc38">The ``resource`` module type</A> - <LI><A HREF="#toc39">Opening a resource</A> - <LI><A HREF="#toc40">Partial application</A> - <LI><A HREF="#toc41">Testing resource modules</A> - <LI><A HREF="#toc42">Division of labour</A> - </UL> - <LI><A HREF="#toc43">Morphology</A> - <UL> - <LI><A HREF="#toc44">Parameters and tables</A> - <LI><A HREF="#toc45">Inflection tables and paradigms</A> - <LI><A HREF="#toc46">Worst-case functions and data abstraction</A> - <LI><A HREF="#toc47">A system of paradigms using Prelude operations</A> - <LI><A HREF="#toc48">Pattern matching</A> - <LI><A HREF="#toc49">An intelligent noun paradigm using pattern matching</A> - <LI><A HREF="#toc50">Morphological resource modules</A> - </UL> - <LI><A HREF="#toc51">Using parameters in concrete syntax</A> - <UL> - <LI><A HREF="#toc52">Parametric vs. 
inherent features, agreement</A> - <LI><A HREF="#toc53">English concrete syntax with parameters</A> - <LI><A HREF="#toc54">Hierarchic parameter types</A> - <LI><A HREF="#toc55">Morphological analysis and morphology quiz</A> - <LI><A HREF="#toc56">Discontinuous constituents</A> - <LI><A HREF="#toc57">Free variation</A> - <LI><A HREF="#toc58">Overloading of operations</A> - </UL> - <LI><A HREF="#toc59">More constructs for concrete syntax</A> - <UL> - <LI><A HREF="#toc60">Local definitions</A> - <LI><A HREF="#toc61">Record extension and subtyping</A> - <LI><A HREF="#toc62">Tuples and product types</A> - <LI><A HREF="#toc63">Record and tuple patterns</A> - <LI><A HREF="#toc64">Regular expression patterns</A> - <LI><A HREF="#toc65">Prefix-dependent choices</A> - <LI><A HREF="#toc66">Predefined types</A> - </UL> - <LI><A HREF="#toc67">Using the resource grammar library</A> - <UL> - <LI><A HREF="#toc68">The coverage of the library</A> - <LI><A HREF="#toc69">The resource API</A> - <LI><A HREF="#toc70">Example: French</A> - <LI><A HREF="#toc71">Functor implementation of multilingual grammars</A> - <LI><A HREF="#toc72">Interfaces and instances</A> - <LI><A HREF="#toc73">Adding languages to a functor implementation</A> - <LI><A HREF="#toc74">Division of labour revisited</A> - <LI><A HREF="#toc75">Restricted inheritance</A> - <LI><A HREF="#toc76">Browsing the resource with GF commands</A> - </UL> - <LI><A HREF="#toc77">More concepts of abstract syntax</A> - <UL> - <LI><A HREF="#toc78">GF as a logical framework</A> - <LI><A HREF="#toc79">Dependent types</A> - <LI><A HREF="#toc80">Polymorphism</A> - <LI><A HREF="#toc81">Dependent types and spoken language models</A> - <UL> - <LI><A HREF="#toc82">Grammar-based language models</A> - <LI><A HREF="#toc83">Statistical language models</A> - </UL> - <LI><A HREF="#toc84">Digression: dependent types in concrete syntax</A> - <UL> - <LI><A HREF="#toc85">Variables in function types</A> - <LI><A HREF="#toc86">Polymorphism in concrete 
syntax</A> - </UL> - <LI><A HREF="#toc87">Proof objects</A> - <UL> - <LI><A HREF="#toc88">Proof-carrying documents</A> - </UL> - <LI><A HREF="#toc89">Restricted polymorphism</A> - <LI><A HREF="#toc90">Variable bindings</A> - <LI><A HREF="#toc91">Semantic definitions</A> - </UL> - <LI><A HREF="#toc92">Practical issues</A> - <UL> - <LI><A HREF="#toc93">Lexers and unlexers</A> - <LI><A HREF="#toc94">Speech input and output</A> - <LI><A HREF="#toc95">Multilingual syntax editor</A> - <LI><A HREF="#toc96">Communicating with GF</A> - </UL> - <LI><A HREF="#toc97">Embedded grammars in Haskell and Java</A> - <UL> - <LI><A HREF="#toc98">Writing GF grammars</A> - <UL> - <LI><A HREF="#toc99">Creating the first grammar</A> - <LI><A HREF="#toc100">Testing</A> - <LI><A HREF="#toc101">Adding a new language</A> - <LI><A HREF="#toc102">Extending the language</A> - </UL> - <LI><A HREF="#toc103">Building a user program</A> - <UL> - <LI><A HREF="#toc104">Producing a compiled grammar package</A> - <LI><A HREF="#toc105">Writing the Haskell application</A> - <LI><A HREF="#toc106">Compiling the Haskell grammar</A> - <LI><A HREF="#toc107">Building a distribution</A> - <LI><A HREF="#toc108">Using a Makefile</A> - </UL> - </UL> - <LI><A HREF="#toc109">Embedded grammars in Java</A> - <LI><A HREF="#toc110">Further reading</A> - </UL> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> -<P> -<IMG ALIGN="middle" SRC="../gf-logo.png" BORDER="0" ALT=""> -</P> -<A NAME="toc1"></A> -<H1>Introduction</H1> -<A NAME="toc2"></A> -<H2>GF = Grammatical Framework</H2> -<P> -The term GF is used for different things: -</P> -<UL> -<LI>a <B>program</B> used for working with grammars -<LI>a <B>programming language</B> in which grammars can be written -<LI>a <B>theory</B> about grammars and languages -</UL> - -<P> -This tutorial is primarily about the GF program and -the GF programming language. 
-It will guide you -</P> -<UL> -<LI>to use the GF program -<LI>to write GF grammars -<LI>to write programs in which GF grammars are used as components -</UL> - -<A NAME="toc3"></A> -<H2>What are GF grammars used for</H2> -<P> -A grammar is a definition of a language. -From this definition, different language processing components -can be derived: -</P> -<UL> -<LI><B>parsing</B>: to analyse the language -<LI><B>linearization</B>: to generate the language -<LI><B>translation</B>: to analyse one language and generate another -</UL> - -<P> -A GF grammar can be seen as a declarative program from which these -processing tasks can be automatically derived. In addition, many -other tasks are readily available for GF grammars: -</P> -<UL> -<LI><B>morphological analysis</B>: find out the possible inflection forms of words -<LI><B>morphological synthesis</B>: generate all inflection forms of words -<LI><B>random generation</B>: generate random expressions -<LI><B>corpus generation</B>: generate all expressions -<LI><B>treebank generation</B>: generate a list of trees with multiple linearizations -<LI><B>teaching quizzes</B>: train morphology and translation -<LI><B>multilingual authoring</B>: create a document in many languages simultaneously -<LI><B>speech input</B>: optimize a speech recognition system for your grammar -</UL> - -<P> -A typical GF application is based on a <B>multilingual grammar</B> involving -translation on a special domain. 
Existing applications of this idea include -</P> -<UL> -<LI><A HREF="http://www.cs.chalmers.se/~hallgren/Alfa/Tutorial/GFplugin.html">Alfa</A>: - a natural-language interface to a proof editor - (languages: English, French, Swedish) -<LI><A HREF="http://www.key-project.org/">KeY</A>: - a multilingual authoring system for creating software specifications - (languages: OCL, English, German) -<LI><A HREF="http://www.talk-project.org">TALK</A>: - multilingual and multimodal dialogue systems - (languages: English, Finnish, French, German, Italian, Spanish, Swedish) -<LI><A HREF="http://webalt.math.helsinki.fi/content/index_eng.html">WebALT</A>: - a multilingual translator of mathematical exercises - (languages: Catalan, English, Finnish, French, Spanish, Swedish) -<LI><A HREF="http://www.cs.chalmers.se/~bringert/gf/translate/">Numeral translator</A>: - number words from 1 to 999,999 - (88 languages) -</UL> - -<P> -The specialization of a grammar to a domain makes it possible to -obtain much better translations than in an unlimited machine translation -system. This is due to the well-defined semantics of such domains. -Grammars having this character are called <B>application grammars</B>. -They differ from most grammars written by linguists precisely -because they are multilingual and domain-specific. -</P> -<P> -However, there is another kind of grammar, which we call <B>resource grammars</B>. -These are large, comprehensive grammars that can be used in any domain. -The GF Resource Grammar Library has resource grammars for 10 languages. -These grammars can be used as <B>libraries</B> to define application grammars. -In this way, it is possible to write a high-quality grammar without -knowing about linguistics: in general, writing an application grammar -by using the resource library requires only practical knowledge of -the target language, and all theoretical knowledge about its grammar -is given by the libraries.
-</P> -<A NAME="toc4"></A> -<H2>Who is this tutorial for</H2> -<P> -This tutorial is mainly for programmers who want to learn to write -application grammars. It will go through GF's programming concepts -without entering too deep into linguistics. Thus it should -be accessible to anyone who has some previous programming experience. -</P> -<P> -A separate document has been written on how to write resource grammars: the -<A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html">Resource HOWTO</A>. -In this tutorial, we will just cover the programming concepts that are used for -solving linguistic problems in the resource grammars. -</P> -<P> -The easiest way to use GF is probably via the interactive syntax editor. -Its use does not require any knowledge of the GF formalism. There is -a separate -<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">Editor User Manual</A> -by Janna Khegai, covering the use of the editor. The editor is also a platform for many -kinds of GF applications, implementing the slogan -</P> -<P> -<I>write a document in a language you don't know, while seeing it in a language you know</I>. -</P> -<A NAME="toc5"></A> -<H2>The coverage of the tutorial</H2> -<P> -The tutorial gives a hands-on introduction to grammar writing. -We start by building a small grammar for the domain of food: -in this grammar, you can say things like -</P> -<PRE> - this Italian cheese is delicious -</PRE> -<P> -in English and Italian. -</P> -<P> -The first English grammar -<A HREF="food.cf"><CODE>food.cf</CODE></A> -is written in a context-free -notation (also known as BNF). The BNF format is often a good -starting point for GF grammar development, because it is -simple and widely used. However, the BNF format is not -good for multilingual grammars. While it is possible to -"translate" by just changing the words contained in a -BNF grammar to words of some other -language, proper translation usually involves more. 
-For instance, the order of words may have to be changed: -</P> -<PRE> - Italian cheese ===> formaggio italiano -</PRE> -<P> -The full GF grammar format is designed to support such -changes, by separating the <B>abstract syntax</B> -(the logical structure) from the <B>concrete syntax</B> (the -sequence of words) of expressions. -</P> -<P> -It is not only words and word order that make languages -different. Words can have different forms, and which forms -they have varies from language to language. For instance, -Italian adjectives usually have four forms where English -has just one: -</P> -<PRE> - delicious (wine, wines, pizza, pizzas) - vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose -</PRE> -<P> -The <B>morphology</B> of a language describes the -forms of its words. While the complete description of morphology -belongs to resource grammars, this tutorial will explain the -programming concepts involved in morphology. This will moreover -make it possible to grow the fragment covered by the food example. -The tutorial will in fact build a miniature resource grammar in order -to give an introduction to linguistically oriented grammar writing. -</P> -<P> -Thus it is by elaborating the initial <CODE>food.cf</CODE> example that -the tutorial makes a guided tour through all concepts of GF. -While the constructs of the GF language are the main focus, -the commands of the GF system are also introduced as they -are needed. -</P> -<P> -To learn how to write GF grammars is not the only goal of -this tutorial. We will also explain the most important -commands of the GF system. With these commands, -simple applications of grammars, such as translation and -quiz systems, can be built simply by writing scripts for the -system. -</P> -<P> -More complicated applications, such as natural-language -interfaces and dialogue systems, also require programming in -some general-purpose language.
Thus we will briefly explain how -GF grammars are used as components of Haskell programs. -Chapters on using them in Java and JavaScript programs are -forthcoming; a comprehensive manual on GF embedded in Java, by Björn Bringert, is -available at -<A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html"><CODE>http://www.cs.chalmers.se/~bringert/gf/gf-java.html</CODE></A>. -</P> -<A NAME="toc6"></A> -<H2>Getting the GF program</H2> -<P> -The GF program is open-source free software, which you can download via the -GF Homepage: -</P> -<P> -<A HREF="http://www.cs.chalmers.se/~aarne/GF"><CODE>http://www.cs.chalmers.se/~aarne/GF</CODE></A> -</P> -<P> -There you can download -</P> -<UL> -<LI>binaries for Linux, Mac OS X, and Windows -<LI>source code and documentation -<LI>grammar libraries and examples -</UL> - -<P> -If you want to compile GF from source, you need a Haskell compiler. -To compile the interactive editor, you also need a Java compiler. -But normally you don't have to compile, and you definitely -don't need to know Haskell or Java to use GF. -</P> -<P> -We assume the availability of a Unix shell. Linux and Mac OS X users -have it automatically, the latter under the name "terminal". -Windows users are advised to install Cygwin, the free Unix shell for Windows. -</P> -<A NAME="toc7"></A> -<H2>Running the GF program</H2> -<P> -To start the GF program, assuming you have installed it, just type -</P> -<PRE> - % gf -</PRE> -<P> -in the shell. -You will see GF's welcome message and the prompt <CODE>></CODE>. -The command -</P> -<PRE> - > help -</PRE> -<P> -will give you a list of available commands. -</P> -<P> -As a common convention in this tutorial, we will use -</P> -<UL> -<LI><CODE>%</CODE> as a prompt that marks system commands -<LI><CODE>></CODE> as a prompt that marks GF commands -</UL> - -<P> -Thus you should not type these prompts, but only the lines that -follow them.
-</P> -<A NAME="toc8"></A> -<H1>The .cf grammar format</H1> -<P> -Now you are ready to try out your first grammar. -We start with one that is not written in the GF language, but -in the much more common BNF notation (Backus-Naur Form). The GF -program understands a variant of this notation and translates it -internally to GF's own representation. -</P> -<P> -To get started, type (or copy) the following lines into a file named -<CODE>food.cf</CODE>: -</P> -<PRE> - Is. S ::= Item "is" Quality ; - That. Item ::= "that" Kind ; - This. Item ::= "this" Kind ; - QKind. Kind ::= Quality Kind ; - Cheese. Kind ::= "cheese" ; - Fish. Kind ::= "fish" ; - Wine. Kind ::= "wine" ; - Italian. Quality ::= "Italian" ; - Boring. Quality ::= "boring" ; - Delicious. Quality ::= "delicious" ; - Expensive. Quality ::= "expensive" ; - Fresh. Quality ::= "fresh" ; - Very. Quality ::= "very" Quality ; - Warm. Quality ::= "warm" ; -</PRE> -<P> -For those who know ordinary BNF, the -notation we use includes one extra element: a <B>label</B> appearing -as the first element of each rule and terminated by a full stop. -</P> -<P> -The grammar we wrote defines a set of phrases usable for speaking about food. -It builds <B>sentences</B> (<CODE>S</CODE>) by assigning <CODE>Quality</CODE>s to -<CODE>Item</CODE>s. <CODE>Item</CODE>s are built from <CODE>Kind</CODE>s by prepending the -word "this" or "that". <CODE>Kind</CODE>s are either <B>atomic</B>, such as -"cheese" and "wine", or formed by prepending a <CODE>Quality</CODE> to a -<CODE>Kind</CODE>. A <CODE>Quality</CODE> is either atomic, such as "Italian" and "boring", -or built from another <CODE>Quality</CODE> by prepending "very".
Those familiar with -the context-free grammar notation will notice that, for instance, the -following sentence can be built using this grammar: -</P> -<PRE> - this delicious Italian wine is very very expensive -</PRE> -<P></P> -<A NAME="toc9"></A> -<H2>Importing grammars and parsing strings</H2> -<P> -The first GF command needed when using a grammar is to <B>import</B> it. -The command has a long name, <CODE>import</CODE>, and a short name, <CODE>i</CODE>. -You can type either -</P> -<PRE> - > import food.cf -</PRE> -<P> -or -</P> -<PRE> - > i food.cf -</PRE> -<P> -to get the same effect. -The effect is that the GF program <B>compiles</B> your grammar into an internal -representation, and shows a new prompt when it is ready. It will also show how much -CPU time was consumed: -</P> -<PRE> - > i food.cf - - parsing cf food.cf 12 msec - 16 msec - > -</PRE> -<P> -You can now use GF for <B>parsing</B>: -</P> -<PRE> - > parse "this cheese is delicious" - Is (This Cheese) Delicious - - > p "that wine is very very Italian" - Is (That Wine) (Very (Very Italian)) -</PRE> -<P> -The <CODE>parse</CODE> (= <CODE>p</CODE>) command takes a <B>string</B> -(in double quotes) and returns an <B>abstract syntax tree</B> - the thing -beginning with <CODE>Is</CODE>. Trees are built from the rule labels given in the -grammar, and record the ways in which the rules are used to produce the -strings. A tree is, in general, easier than a string -for a machine to understand and to process further. -</P> -<P> -Strings that return a tree when parsed do so in virtue of the grammar -you imported. Try parsing something else, and the parser fails: -</P> -<PRE> - > p "hello world" - Unknown words: hello world -</PRE> -<P></P> -<P> -<B>Exercise</B>. Extend the grammar <CODE>food.cf</CODE> by ten new food kinds and -qualities, and run the parser with new kinds of examples. -</P> -<P> -<B>Exercise</B>. Add a rule that enables questions of the form -<I>is this cheese Italian</I>.
-</P> -<P> -<B>Exercise</B>. Add the rule -</P> -<PRE> - IsVery. S ::= Item "is" "very" Quality ; -</PRE> -<P> -and see what happens when parsing <CODE>this wine is very very Italian</CODE>. -You have just made the grammar <B>ambiguous</B>: it now assigns several -trees to some strings. -</P> -<P> -<B>Exercise</B>. Modify the grammar so that at most one <CODE>Quality</CODE> may -attach to a given <CODE>Kind</CODE>. Thus <I>boring Italian fish</I> will no longer -be recognized. -</P> -<A NAME="toc10"></A> -<H2>Generating trees and strings</H2> -<P> -You can also use GF for <B>linearizing</B> -(<CODE>linearize = l</CODE>). This is the inverse of -parsing, mapping trees to strings: -</P> -<PRE> - > linearize Is (That Wine) Warm - that wine is warm -</PRE> -<P> -What is the use of this? Typically you do not type in a tree at -the GF prompt. The utility of linearization comes from the fact that -you can obtain a tree from somewhere else. One way to do so is -<B>random generation</B> (<CODE>generate_random = gr</CODE>): -</P> -<PRE> - > generate_random - Is (This (QKind Italian Fish)) Fresh -</PRE> -<P> -Now you can copy the tree and paste it to the <CODE>linearize</CODE> command. -Or, more conveniently, feed random generation into linearization by using -a <B>pipe</B>. -</P> -<PRE> - > gr | l - this Italian fish is fresh -</PRE> -<P> -Pipes in GF work much the same way as Unix pipes: they feed the output -of one command into another command as its input. -</P> -<A NAME="toc11"></A> -<H2>Visualizing trees</H2> -<P> -The parenthesized code returned by the parser does not -look like a tree. Why is it called one? From the abstract mathematical -point of view, trees are a data structure that -represents <B>nesting</B>: trees are branching entities, and the branches -are themselves trees. Parentheses give a linear representation of trees, -useful for the computer.
But the human eye may prefer to see a visualization; -for this purpose, GF provides the command <CODE>visualize_tree = vt</CODE>, to which -parsing (and any other tree-producing command) can be piped: -</P> -<PRE> - > parse "this delicious cheese is very Italian" | vt -</PRE> -<P></P> -<P> -<IMG ALIGN="middle" SRC="Tree2.png" BORDER="0" ALT=""> -</P> -<P> -This command uses the programs Graphviz and Ghostview, which you -might not have, but which are freely available on the web. -</P> -<A NAME="toc12"></A> -<H2>Some random-generated sentences</H2> -<P> -Random generation is a good way to test a grammar; it can also -be fun. So you may want to -generate ten strings with one and the same command: -</P> -<PRE> - > gr -number=10 | l - that wine is boring - that fresh cheese is fresh - that cheese is very boring - this cheese is Italian - that expensive cheese is expensive - that fish is fresh - that wine is very Italian - this wine is Italian - this cheese is boring - this fish is boring -</PRE> -<P></P> -<A NAME="toc13"></A> -<H2>Systematic generation</H2> -<P> -To generate <I>all</I> sentences that a grammar -can generate, use the command <CODE>generate_trees = gt</CODE>. -</P> -<PRE> - > generate_trees | l - that cheese is very Italian - that cheese is very boring - that cheese is very delicious - that cheese is very expensive - that cheese is very fresh - ... - this wine is expensive - this wine is fresh - this wine is warm - -</PRE> -<P> -You get quite a few trees but not all of them: only up to a given -<B>depth</B> of trees. To see how you can get more, use the -<CODE>help = h</CODE> command: -</P> -<PRE> - > help gt -</PRE> -<P></P> -<P> -<B>Exercise</B>. If the command <CODE>gt</CODE> generated all -trees in your grammar, it would never terminate. Why? -</P> -<P> -<B>Exercise</B>. Measure how many trees the grammar gives with depths 4 and 5, -respectively. You can use the Unix <B>word count</B> command <CODE>wc</CODE> to count lines. -<B>Hint</B>.
You can pipe the output of a GF command into a Unix command by -using the escape <CODE>?</CODE>, as follows: -</P> -<PRE> - > generate_trees | ? wc -</PRE> -<P></P> -<A NAME="toc14"></A> -<H2>More on pipes; tracing</H2> -<P> -A pipe of GF commands can have any length, but the "output type" -(either string or tree) of one command must always match the "input type" -of the next command. -</P> -<P> -The intermediate results in a pipe can be observed by adding the -<B>tracing</B> flag <CODE>-tr</CODE> to each command whose output you -want to see: -</P> -<PRE> - > gr -tr | l -tr | p - - Is (This Cheese) Boring - this cheese is boring - Is (This Cheese) Boring -</PRE> -<P> -This facility is useful for testing: for instance, you -may want to see if a grammar is <B>ambiguous</B>, i.e. -contains strings that can be parsed in more than one way. -</P> -<P> -<B>Exercise</B>. Extend the grammar <CODE>food.cf</CODE> so that it produces ambiguous strings, -and try out the ambiguity test. -</P> -<A NAME="toc15"></A> -<H2>Writing and reading files</H2> -<P> -To save the output of GF commands in a file, you can -pipe it to the <CODE>write_file = wf</CODE> command: -</P> -<PRE> - > gr -number=10 | l | write_file exx.tmp -</PRE> -<P> -You can read the file back into GF with the -<CODE>read_file = rf</CODE> command: -</P> -<PRE> - > read_file exx.tmp | p -lines -</PRE> -<P> -Notice the flag <CODE>-lines</CODE> given to the parsing -command. This flag tells GF to parse each line of -the file separately. Without the flag, the grammar could -not recognize the string in the file, because it is not -a sentence but a sequence of ten sentences.
-</P> -<A NAME="toc16"></A> -<H1>The .gf grammar format</H1> -<P> -To see GF's internal representation of a grammar -that you have imported, you can give the command -<CODE>print_grammar = pg</CODE>: -</P> -<PRE> - > print_grammar -</PRE> -<P> -The output is quite unreadable at this stage, and you may be glad that -you did not need to write the grammar in that notation yourself, since the -GF grammar compiler produced it. -</P> -<P> -However, we will now start to demonstrate -how GF's own notation gives you -much more expressive power than the <CODE>.cf</CODE> -format. We will introduce the <CODE>.gf</CODE> format by presenting -another way of defining the same grammar as in -<CODE>food.cf</CODE>. -Then we will show how the full GF grammar format enables you -to do things that are not possible in the context-free format. -</P> -<A NAME="toc17"></A> -<H2>Abstract and concrete syntax</H2> -<P> -A GF grammar consists of two main parts: -</P> -<UL> -<LI><B>abstract syntax</B>, defining what syntax trees there are -<LI><B>concrete syntax</B>, defining how trees are linearized into strings -</UL> - -<P> -The context-free format fuses these two things together, but it is always -possible to take them apart. For instance, the sentence formation rule -</P> -<PRE> - Is. S ::= Item "is" Quality ; -</PRE> -<P> -is interpreted as the following pair of GF rules: -</P> -<PRE> - fun Is : Item -> Quality -> S ; - lin Is item quality = {s = item.s ++ "is" ++ quality.s} ; -</PRE> -<P> -The former rule, with the keyword <CODE>fun</CODE>, belongs to the abstract syntax. -It defines the <B>function</B> -<CODE>Is</CODE> which constructs syntax trees of the form -(<CODE>Is</CODE> <I>item</I> <I>quality</I>). -</P> -<P> -The latter rule, with the keyword <CODE>lin</CODE>, belongs to the concrete syntax. -It defines the <B>linearization function</B> for -syntax trees of the form (<CODE>Is</CODE> <I>item</I> <I>quality</I>).
-</P> -<A NAME="toc18"></A> -<H2>Judgement forms</H2> -<P> -Rules in a GF grammar are called <B>judgements</B>, and the keywords -<CODE>fun</CODE> and <CODE>lin</CODE> are used for distinguishing between two -<B>judgement forms</B>. Here is a summary of the most important -judgement forms: -</P> - <UL> - <LI>abstract syntax - <P></P> - </UL> - -<TABLE ALIGN="center" CELLPADDING="4" BORDER="1"> -<TR> -<TD>form</TD> -<TD>reading</TD> -</TR> -<TR> -<TD><CODE>cat</CODE> C</TD> -<TD>C is a category</TD> -</TR> -<TR> -<TD><CODE>fun</CODE> f <CODE>:</CODE> A</TD> -<TD>f is a function of type A</TD> -</TR> -</TABLE> - -<P></P> - <UL> - <LI>concrete syntax - <P></P> - </UL> - -<TABLE ALIGN="center" CELLPADDING="4" BORDER="1"> -<TR> -<TD>form</TD> -<TD>reading</TD> -</TR> -<TR> -<TD><CODE>lincat</CODE> C <CODE>=</CODE> T</TD> -<TD>category C has linearization type T</TD> -</TR> -<TR> -<TD><CODE>lin</CODE> f <CODE>=</CODE> t</TD> -<TD>function f has linearization t</TD> -</TR> -</TABLE> - -<P></P> -<P> -We return to the precise meanings of these judgement forms later. -First we will look at how judgements are grouped into modules, and -show how the food grammar is -expressed by using modules and judgements. -</P> -<A NAME="toc19"></A> -<H2>Module types</H2> -<P> -A GF grammar consists of <B>modules</B>, -into which judgements are grouped. The most important -module forms are -</P> - <UL> - <LI><CODE>abstract</CODE> A <CODE>=</CODE> M, abstract syntax A with judgements in - the module body M. - <LI><CODE>concrete</CODE> C <CODE>of</CODE> A <CODE>=</CODE> M, concrete syntax C of the - abstract syntax A, with judgements in the module body M. - </UL> - -<A NAME="toc20"></A> -<H2>Basic types and function types</H2> -<P> -The nonterminals of a context-free grammar, i.e. categories, -are called <B>basic types</B> in the type system of GF. 
In addition -to them, there are <B>function types</B> such as -</P> -<PRE> - Item -> Quality -> S -</PRE> -<P> -This type is read "a function from items and qualities to sentences". -The last type in the arrow-separated sequence is the <B>value type</B> -of the function type; the earlier types are its <B>argument types</B>. -</P> -<A NAME="toc21"></A> -<H2>Records and strings</H2> -<P> -The linearization type of a category is a <B>record type</B>, with -zero or more <B>fields</B> of different types. The simplest record -type used for linearization in GF is -</P> -<PRE> - {s : Str} -</PRE> -<P> -which has one field, with <B>label</B> <CODE>s</CODE> and type <CODE>Str</CODE>. -</P> -<P> -Examples of records of this type are -</P> -<PRE> - {s = "foo"} - {s = "hello" ++ "world"} -</PRE> -<P></P> -<P> -Whenever a record <CODE>r</CODE> of type <CODE>{s : Str}</CODE> is given, -<CODE>r.s</CODE> is an object of type <CODE>Str</CODE>. This is -a special case of the <B>projection</B> rule, which allows the extraction -of fields from a record: -</P> -<UL> -<LI>if <I>r</I> : <CODE>{</CODE> ... <I>p</I> : <I>T</I> ... <CODE>}</CODE> then <I>r.p</I> : <I>T</I> -</UL> - -<P> -The type <CODE>Str</CODE> is really the type of <B>token lists</B>, but -most of the time one can conveniently think of it as the type of strings, -denoted by string literals in double quotes. -</P> -<P> -Notice that -</P> -<PRE> - "hello world" -</PRE> -<P> -is not recommended as an expression of type <CODE>Str</CODE>. It denotes -a token with a space in it, and will usually -not work with the lexical analysis that precedes parsing. A shorthand -exemplified by -</P> -<PRE> - ["hello world and people"] === "hello" ++ "world" ++ "and" ++ "people" -</PRE> -<P> -can be used for lists of tokens. The expression -</P> -<PRE> - [] -</PRE> -<P> -denotes the empty token list.
-</P> -<A NAME="toc22"></A> -<H2>An abstract syntax example</H2> -<P> -To express the abstract syntax of <CODE>food.cf</CODE> in -a file <CODE>Food.gf</CODE>, we write two kinds of judgements: -</P> -<UL> -<LI>Each category is introduced by a <CODE>cat</CODE> judgement. -<LI>Each rule label is introduced by a <CODE>fun</CODE> judgement, - with the type formed from the nonterminals of the rule. -</UL> - -<PRE> - abstract Food = { - - cat - S ; Item ; Kind ; Quality ; - - fun - Is : Item -> Quality -> S ; - This, That : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -</PRE> -<P> -Notice the use of shorthands permitting the sharing of -the keyword in subsequent judgements, -</P> -<PRE> - cat S ; Item ; === cat S ; cat Item ; -</PRE> -<P> -and of the type in subsequent <CODE>fun</CODE> judgements, -</P> -<PRE> - fun Wine, Fish : Kind ; === - fun Wine : Kind ; Fish : Kind ; === - fun Wine : Kind ; fun Fish : Kind ; -</PRE> -<P> -The order of judgements in a module is free. -</P> -<P> -<B>Exercise</B>. Extend the abstract syntax <CODE>Food</CODE> with ten new -kinds and qualities, and with questions of the form -<I>is this wine Italian</I>. -</P> -<A NAME="toc23"></A> -<H2>A concrete syntax example</H2> -<P> -Each category introduced in <CODE>Food.gf</CODE> is -given a <CODE>lincat</CODE> rule, and each -function is given a <CODE>lin</CODE> rule. Similar shorthands -apply as in <CODE>abstract</CODE> modules. 
-</P> -<PRE> - concrete FoodEng of Food = { - - lincat - S, Item, Kind, Quality = {s : Str} ; - - lin - Is item quality = {s = item.s ++ "is" ++ quality.s} ; - This kind = {s = "this" ++ kind.s} ; - That kind = {s = "that" ++ kind.s} ; - QKind quality kind = {s = quality.s ++ kind.s} ; - Wine = {s = "wine"} ; - Cheese = {s = "cheese"} ; - Fish = {s = "fish"} ; - Very quality = {s = "very" ++ quality.s} ; - Fresh = {s = "fresh"} ; - Warm = {s = "warm"} ; - Italian = {s = "Italian"} ; - Expensive = {s = "expensive"} ; - Delicious = {s = "delicious"} ; - Boring = {s = "boring"} ; - } -</PRE> -<P></P> -<P> -<B>Exercise</B>. Extend the concrete syntax <CODE>FoodEng</CODE> so that it -matches the abstract syntax defined in the exercise of the previous -section. What happens if the concrete syntax lacks some of the -new functions? -</P> -<A NAME="toc24"></A> -<H2>Modules and files</H2> -<P> -GF uses suffixes to recognize different file formats. The most -important ones are: -</P> -<UL> -<LI>Source files: Module name + <CODE>.gf</CODE> = file name -<LI>Target files: each module is compiled into a <CODE>.gfc</CODE> file. -</UL> - -<P> -Import <CODE>FoodEng.gf</CODE> and see what happens: -</P> -<PRE> - > i FoodEng.gf - - compiling Food.gf... wrote file Food.gfc 16 msec - - compiling FoodEng.gf... wrote file FoodEng.gfc 20 msec -</PRE> -<P> -The GF program does not only read the file -<CODE>FoodEng.gf</CODE>, but also all other files that it -depends on - in this case, <CODE>Food.gf</CODE>. -</P> -<P> -For each file that is compiled, a <CODE>.gfc</CODE> file -is generated. The GFC format (="GF Canonical") is the -"machine code" of GF, which is faster to process than -GF source files. When reading a module, GF decides whether -to use an existing <CODE>.gfc</CODE> file or to generate -a new one, by looking at modification times. -</P> -<P> -<B>Exercise</B>. What happens when you import <CODE>FoodEng.gf</CODE> for -a second time? 
Try this in different situations: -</P> -<UL> -<LI>Right after importing it the first time (the modules are kept in - the memory of GF and need no reloading). -<LI>After issuing the command <CODE>empty</CODE> (<CODE>e</CODE>), which clears the memory - of GF. -<LI>After making a small change in <CODE>FoodEng.gf</CODE>, be it only an added space. -<LI>After making a change in <CODE>Food.gf</CODE>. -</UL> - -<A NAME="toc25"></A> -<H1>Multilingual grammars and translation</H1> -<P> -The main advantage of separating abstract from concrete syntax is that -one abstract syntax can be equipped with many concrete syntaxes. -A system with this property is called a <B>multilingual grammar</B>. -</P> -<P> -Multilingual grammars can be used for applications such as -translation. Let us build an Italian concrete syntax for -<CODE>Food</CODE> and then test the resulting -multilingual grammar. -</P> -<A NAME="toc26"></A> -<H2>An Italian concrete syntax</H2> -<PRE> - concrete FoodIta of Food = { - - lincat - S, Item, Kind, Quality = {s : Str} ; - - lin - Is item quality = {s = item.s ++ "è" ++ quality.s} ; - This kind = {s = "questo" ++ kind.s} ; - That kind = {s = "quello" ++ kind.s} ; - QKind quality kind = {s = kind.s ++ quality.s} ; - Wine = {s = "vino"} ; - Cheese = {s = "formaggio"} ; - Fish = {s = "pesce"} ; - Very quality = {s = "molto" ++ quality.s} ; - Fresh = {s = "fresco"} ; - Warm = {s = "caldo"} ; - Italian = {s = "italiano"} ; - Expensive = {s = "caro"} ; - Delicious = {s = "delizioso"} ; - Boring = {s = "noioso"} ; - - } -</PRE> -<P></P> -<P> -<B>Exercise</B>. Write a concrete syntax of <CODE>Food</CODE> for some other language. -You will probably end up with grammatically incorrect output - but don't -worry about this yet. -</P> -<P> -<B>Exercise</B>.
If you have written <CODE>Food</CODE> for German, Swedish, or some -other language, test with random or exhaustive generation which constructs -come out incorrect, and prepare a list of those that cannot be fixed -with the currently available fragment of GF. -</P> -<A NAME="toc27"></A> -<H2>Using a multilingual grammar</H2> -<P> -Import the two grammars in the same GF session. -</P> -<PRE> - > i FoodEng.gf - > i FoodIta.gf -</PRE> -<P> -Try generation now: -</P> -<PRE> - > gr | l - quello formaggio molto noioso è italiano - - > gr | l -lang=FoodEng - this fish is warm -</PRE> -<P> -Translate by using a pipe: -</P> -<PRE> - > p -lang=FoodEng "this cheese is very delicious" | l -lang=FoodIta - questo formaggio è molto delizioso -</PRE> -<P> -Generate a <B>multilingual treebank</B>, i.e. a set of trees with their -translations in different languages: -</P> -<PRE> - > gr -number=2 | tree_bank - - Is (That Cheese) (Very Boring) - quello formaggio è molto noioso - that cheese is very boring - - Is (That Cheese) Fresh - quello formaggio è fresco - that cheese is fresh -</PRE> -<P> -The <CODE>lang</CODE> flag tells GF which concrete syntax to use in parsing and -linearization. By default, the flag is set to the last-imported grammar. -To see what grammars are in scope and which is the main one, use the command -<CODE>print_options = po</CODE>: -</P> -<PRE> - > print_options - main abstract : Food - main concrete : FoodIta - actual concretes : FoodIta FoodEng -</PRE> -<P> -You can change the main grammar with the command <CODE>change_main = cm</CODE>: -</P> -<PRE> - > change_main FoodEng - main abstract : Food - main concrete : FoodEng - actual concretes : FoodIta FoodEng -</PRE> -<P></P> -<A NAME="toc28"></A> -<H2>Translation session</H2> -<P> -If translation is what you want to do with a set of grammars, a convenient -way to do it is to open a <CODE>translation_session = ts</CODE>. In this session, -you can translate between all the languages that are in scope.
-A dot <CODE>.</CODE> terminates the translation session. -</P> -<PRE> - > ts - - trans> that very warm cheese is boring - quello formaggio molto caldo è noioso - that very warm cheese is boring - - trans> questo vino molto italiano è molto delizioso - questo vino molto italiano è molto delizioso - this very Italian wine is very delicious - - trans> . - > -</PRE> -<P></P> -<A NAME="toc29"></A> -<H2>Translation quiz</H2> -<P> -This is a simple language exercise that can be automatically -generated from a multilingual grammar. The system generates a set of -random sentences, displays them in one language, and checks the user's -answer given in another language. The command <CODE>translation_quiz = tq</CODE> -does this in a subshell of GF. -</P> -<PRE> - > translation_quiz FoodEng FoodIta - - Welcome to GF Translation Quiz. - The quiz is over when you have done at least 10 examples - with at least 75 % success. - You can interrupt the quiz by entering a line consisting of a dot ('.'). - - this fish is warm - questo pesce è caldo - > Yes. - Score 1/1 - - this cheese is Italian - questo formaggio è noioso - > No, not questo formaggio è noioso, but - questo formaggio è italiano - - Score 1/2 - this fish is expensive -</PRE> -<P> -You can also generate a list of translation exercises and save it in a -file for later use, with the command <CODE>translation_list = tl</CODE>: -</P> -<PRE> - > translation_list -number=25 FoodEng FoodIta | write_file transl.txt -</PRE> -<P> -The <CODE>number</CODE> flag gives the number of sentences generated. -</P> -<A NAME="toc30"></A> -<H1>Grammar architecture</H1> -<A NAME="toc31"></A> -<H2>Extending a grammar</H2> -<P> -The module system of GF makes it possible to <B>extend</B> a -grammar in different ways. The syntax of extension is -shown by the following example. We extend <CODE>Food</CODE> by -adding a category of questions and two new functions.
-</P> -<PRE> - abstract Morefood = Food ** { - cat - Question ; - fun - QIs : Item -> Quality -> Question ; - Pizza : Kind ; - - } -</PRE> -<P> -Parallel to the abstract syntax, extensions can -be built for concrete syntaxes: -</P> -<PRE> - concrete MorefoodEng of Morefood = FoodEng ** { - lincat - Question = {s : Str} ; - lin - QIs item quality = {s = "is" ++ item.s ++ quality.s} ; - Pizza = {s = "pizza"} ; - } -</PRE> -<P> -The effect of extension is that all of the contents of the extended -and extending modules are put together. We also say that the new -module <B>inherits</B> the contents of the old module. -</P> -<A NAME="toc32"></A> -<H2>Multiple inheritance</H2> -<P> -Specialized vocabularies can be represented as small grammars that -only do "one thing" each. For instance, the following are grammars -for fruit and mushrooms: -</P> -<PRE> - abstract Fruit = { - cat Fruit ; - fun Apple, Peach : Fruit ; - } - - abstract Mushroom = { - cat Mushroom ; - fun Cep, Agaric : Mushroom ; - } -</PRE> -<P> -They can afterwards be combined into bigger grammars by using -<B>multiple inheritance</B>, i.e. extension of several grammars at the -same time: -</P> -<PRE> - abstract Foodmarket = Food, Fruit, Mushroom ** { - fun - FruitKind : Fruit -> Kind ; - MushroomKind : Mushroom -> Kind ; - } -</PRE> -<P> -At this point, you would perhaps like to go back to -<CODE>Food</CODE> and take apart <CODE>Wine</CODE> to build a special -<CODE>Drink</CODE> module. -</P> -<A NAME="toc33"></A> -<H2>Visualizing module structure</H2> -<P> -When you have created all the abstract syntaxes and -one set of concrete syntaxes needed for <CODE>Foodmarket</CODE>, -your grammar consists of eight GF modules. To see what their -dependencies look like, you can use the command -<CODE>visualize_graph = vg</CODE>, -</P> -<PRE> - > visualize_graph -</PRE> -<P> -and the graph will pop up in a separate window.
-</P> -<P> -The graph uses -</P> -<UL> -<LI>oval boxes for abstract modules -<LI>square boxes for concrete modules -<LI>black-headed arrows for inheritance -<LI>white-headed arrows for the concrete-of-abstract relation -</UL> - -<P> -<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT=""> -</P> -<P> -Just as with the <CODE>visualize_tree = vt</CODE> command, the open source tools -Ghostview and Graphviz are needed. -</P> -<A NAME="toc34"></A> -<H2>System commands</H2> -<P> -To document your grammar, you may want to print the -graph into a file, e.g. a <CODE>.png</CODE> file that -can be included in an HTML document. You can do this -by first printing the graph into a <CODE>.dot</CODE> file and then -processing this file with the <CODE>dot</CODE> program (from the Graphviz package). -</P> -<PRE> - > pm -printer=graph | wf Foodmarket.dot - > ! dot -Tpng Foodmarket.dot > Foodmarket.png -</PRE> -<P> -The latter command is a Unix command, issued from GF by using the -shell escape symbol <CODE>!</CODE>. The resulting graph was shown in the previous section. -</P> -<P> -The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual -grammar in various formats, of which the format <CODE>-printer=graph</CODE> just -shows the module dependencies. Use <CODE>help</CODE> to see what other formats -are available: -</P> -<PRE> - > help pm - > help -printer - > help help -</PRE> -<P> -Another form of system command is one that is used in GF pipes. The escape symbol -is then <CODE>?</CODE>. -</P> -<PRE> - > generate_trees | ? wc -</PRE> -<P></P> -<A NAME="toc35"></A> -<H1>Resource modules</H1> -<A NAME="toc36"></A> -<H2>The golden rule of functional programming</H2> -<P> -In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format looks rather -verbose, and requires many more characters to be written. You have probably -done this by the copy-paste-modify method, which is a common way to -avoid repeating work.
-</P> -<P> -However, there is a more elegant way to avoid repeating work than the copy-and-paste -method. The <B>golden rule of functional programming</B> says that -</P> -<UL> -<LI>whenever you find yourself programming by copy-and-paste, write a function instead. -</UL> - -<P> -A function separates the shared parts of different computations from the -changing parts, its <B>arguments</B>, or <B>parameters</B>. -In functional programming languages, such as -<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more -code with functions than in imperative languages such as C and Java. -</P> -<A NAME="toc37"></A> -<H2>Operation definitions</H2> -<P> -GF is a functional programming language, not only in the sense that -the abstract syntax is a system of functions (<CODE>fun</CODE>), but also because -functional programming can be used to define concrete syntax. This is -done by using a new form of judgement, with the keyword <CODE>oper</CODE> (for -<B>operation</B>), distinct from <CODE>fun</CODE> for the sake of clarity. -Here is a simple example of an operation: -</P> -<PRE> - oper ss : Str -> {s : Str} = \x -> {s = x} ; -</PRE> -<P> -The operation can be <B>applied</B> to an argument, and GF will -<B>compute</B> the application into a value. For instance, -</P> -<PRE> - ss "boy" ===> {s = "boy"} -</PRE> -<P> -(We use the symbol <CODE>===></CODE> to indicate how an expression is -computed into a value; this symbol is not a part of GF.) -</P> -<P> -Thus an <CODE>oper</CODE> judgement includes the name of the defined operation, -its type, and an expression defining it. As for the syntax of the defining -expression, notice the <B>lambda abstraction</B> form <CODE>\x -> t</CODE> of -the function. -</P> -<A NAME="toc38"></A> -<H2>The <CODE>resource</CODE> module type</H2> -<P> -Operation definitions can be included in a concrete syntax. -But they are not really tied to a particular set of linearization rules.
-They should rather be seen as <B>resources</B> -usable in many concrete syntaxes. -</P> -<P> -The <CODE>resource</CODE> module type can be used to package -<CODE>oper</CODE> definitions into reusable resources. Here is -an example, with a handful of operations to manipulate -strings and records. -</P> -<PRE> - resource StringOper = { - oper - SS : Type = {s : Str} ; - ss : Str -> SS = \x -> {s = x} ; - cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; - prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; - } -</PRE> -<P> -Resource modules can extend other resource modules, in the -same way as modules of other types can extend modules of the -same type. Thus it is possible to build resource hierarchies. -</P> -<A NAME="toc39"></A> -<H2>Opening a resource</H2> -<P> -Any number of <CODE>resource</CODE> modules can be -<B>opened</B> in a <CODE>concrete</CODE> syntax, which -makes definitions contained -in the resource usable in the concrete syntax. Here is -an example, where the resource <CODE>StringOper</CODE> is -opened in a new version of <CODE>FoodEng</CODE>. -</P> -<PRE> - concrete Food2Eng of Food = open StringOper in { - - lincat - S, Item, Kind, Quality = SS ; - - lin - Is item quality = cc item (prefix "is" quality) ; - This k = prefix "this" k ; - That k = prefix "that" k ; - QKind k q = cc k q ; - Wine = ss "wine" ; - Cheese = ss "cheese" ; - Fish = ss "fish" ; - Very = prefix "very" ; - Fresh = ss "fresh" ; - Warm = ss "warm" ; - Italian = ss "Italian" ; - Expensive = ss "expensive" ; - Delicious = ss "delicious" ; - Boring = ss "boring" ; - - } -</PRE> -<P> -<B>Exercise</B>. Use the same string operations to write <CODE>FoodIta</CODE> -more concisely. -</P> -<A NAME="toc40"></A> -<H2>Partial application</H2> -<P> -GF, like Haskell, permits <B>partial application</B> of -functions. 
An example of this is the rule -</P> -<PRE> - lin This k = prefix "this" k ; -</PRE> -<P> -which can be written more concisely -</P> -<PRE> - lin This = prefix "this" ; -</PRE> -<P> -The first form is perhaps more intuitive to write -but, once you get used to partial application, you will appreciate its -conciseness and elegance. The logic of partial application -is known as <B>currying</B>, with a reference to Haskell B. Curry. -The idea is that any <I>n</I>-place function can be defined as a 1-place -function whose value is an <I>n-</I>1 -place function. Thus -</P> -<PRE> - oper prefix : Str -> SS -> SS ; -</PRE> -<P> -can be used as a 1-place function that takes a <CODE>Str</CODE> into a -function <CODE>SS -> SS</CODE>. The expected linearization of <CODE>This</CODE> is exactly -a function of such a type, operating on an argument of type <CODE>Kind</CODE> -whose linearization is of type <CODE>SS</CODE>. Thus we can define the -linearization directly as <CODE>prefix "this"</CODE>. -</P> -<P> -<B>Exercise</B>. Define an operation <CODE>infix</CODE> analogous to <CODE>prefix</CODE>, -such that it allows you to write -</P> -<PRE> - lin Is = infix "is" ; -</PRE> -<P></P> -<A NAME="toc41"></A> -<H2>Testing resource modules</H2> -<P> -To test a <CODE>resource</CODE> module independently, you must import it -with the flag <CODE>-retain</CODE>, which tells GF to retain <CODE>oper</CODE> definitions -in the memory; the usual behaviour is that <CODE>oper</CODE> definitions -are just applied to compile linearization rules -(this is called <B>inlining</B>) and then thrown away. -</P> -<PRE> - > i -retain StringOper.gf -</PRE> -<P> -The command <CODE>compute_concrete = cc</CODE> computes any expression -formed by operations and other GF constructs. 
For example, -</P> -<PRE> - > compute_concrete prefix "in" (ss "addition") - { - s : Str = "in" ++ "addition" - } -</PRE> -<P></P> -<A NAME="toc42"></A> -<H2>Division of labour</H2> -<P> -Using operations defined in resource modules is a -way to avoid repetitive code. -In addition, it enables a new kind of modularity -and division of labour in grammar writing: grammarians familiar with -the linguistic details of a language can make their knowledge -available through resource grammar modules, whose users only need -to pick the right operations and not to know their implementation -details. -</P> -<P> -In the following sections, we will go through some -such linguistic details. The programming constructs needed when -doing this are useful for all GF programmers, even if they don't -hand-code the linguistics of their applications but get them -from libraries. It is also useful to know something about the -linguistic concepts of inflection, agreement, and parts of speech. -</P> -<A NAME="toc43"></A> -<H1>Morphology</H1> -<P> -Suppose we want to say, with the vocabulary included in -<CODE>Food.gf</CODE>, things like -</P> -<PRE> - all Italian wines are delicious -</PRE> -<P> -The new grammatical facilities we need are the plural forms -of nouns and verbs (<I>wines, are</I>), as opposed to their -singular forms. -</P> -<P> -The introduction of plural forms requires two things: -</P> -<UL> -<LI>the <B>inflection</B> of nouns and verbs in singular and plural -<LI>the <B>agreement</B> of the verb with the subject: - the verb must have the same number as the subject -</UL> - -<P> -Different languages have different rules of inflection and agreement. -For instance, Italian also has agreement in gender (masculine vs. feminine). -We want to express such special features of languages in the -concrete syntax while ignoring them in the abstract syntax. -</P> -<P> -To be able to do all this, we need one new judgement form -and many new expression forms.
-We also need to generalize linearization types -from strings to more complex types. -</P> -<P> -<B>Exercise</B>. Make a list of the possible forms that nouns, -adjectives, and verbs can have in some languages that you know. -</P> -<A NAME="toc44"></A> -<H2>Parameters and tables</H2> -<P> -We define the <B>parameter type</B> of number in English by -using a new form of judgement: -</P> -<PRE> - param Number = Sg | Pl ; -</PRE> -<P> -To express that <CODE>Kind</CODE> expressions in English have a linearization -depending on number, we replace the linearization type <CODE>{s : Str}</CODE> -with a type where the <CODE>s</CODE> field is a <B>table</B> depending on number: -</P> -<PRE> - lincat Kind = {s : Number => Str} ; -</PRE> -<P> -The <B>table type</B> <CODE>Number => Str</CODE> is in many respects similar to -a function type (<CODE>Number -> Str</CODE>). The main difference is that the -argument type of a table type must always be a parameter type. This means -that the argument-value pairs can be listed in a finite table. The following -example shows such a table: -</P> -<PRE> - lin Cheese = {s = table { - Sg => "cheese" ; - Pl => "cheeses" - } - } ; -</PRE> -<P> -The table consists of <B>branches</B>, where a <B>pattern</B> on the -left of the arrow <CODE>=></CODE> is assigned a <B>value</B> on the right. -</P> -<P> -The application of a table to a parameter is done by the <B>selection</B> -operator <CODE>!</CODE>. For instance, -</P> -<PRE> - table {Sg => "cheese" ; Pl => "cheeses"} ! Pl -</PRE> -<P> -is a selection that computes into the value <CODE>"cheeses"</CODE>. -This computation is performed by <B>pattern matching</B>: return -the value from the first branch whose pattern matches the -selection argument. Thus -</P> -<PRE> - table {Sg => "cheese" ; Pl => "cheeses"} ! Pl - ===> "cheeses" -</PRE> -<P></P> -<P> -<B>Exercise</B>.
In a previous exercise, you made a list of the possible -forms that nouns, adjectives, and verbs can have in some languages that -you know. Now take some of the results and implement them by -using parameter type definitions and tables. Write them into a <CODE>resource</CODE> -module, which you can test by using the command <CODE>compute_concrete</CODE>. -</P> -<A NAME="toc45"></A> -<H2>Inflection tables and paradigms</H2> -<P> -All English common nouns are inflected in number, most of them in the -same way: the plural form is obtained from the singular by adding the -ending <I>s</I>. This rule is an example of -a <B>paradigm</B> - a formula telling how the inflection -forms of a word are formed. -</P> -<P> -From the GF point of view, a paradigm is a function that takes a <B>lemma</B> - -also known as a <B>dictionary form</B> - and returns an inflection -table of the desired type. Paradigms are not functions in the sense of the -<CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not -on strings), but operations defined in <CODE>oper</CODE> judgements. -The following operation defines the regular noun paradigm of English: -</P> -<PRE> - oper regNoun : Str -> {s : Number => Str} = \x -> { - s = table { - Sg => x ; - Pl => x + "s" - } - } ; -</PRE> -<P> -The <B>gluing</B> operator <CODE>+</CODE> states that -the string held in the variable <CODE>x</CODE> and the ending <CODE>"s"</CODE> -are written together to form one <B>token</B>. Thus, for instance, -</P> -<PRE> - (regNoun "cheese").s ! Pl ===> "cheese" + "s" ===> "cheeses" -</PRE> -<P></P> -<P> -<B>Exercise</B>. Identify cases in which the <CODE>regNoun</CODE> paradigm does not -apply in English, and implement some alternative paradigms. -</P> -<P> -<B>Exercise</B>. Implement a paradigm for regular verbs in English. -</P> -<P> -<B>Exercise</B>. Implement some regular paradigms for other languages you have -considered in earlier exercises.
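-</P>
-<P>
-As a starting point for these exercises, the regular noun paradigm and the selection of a form from its table can be placed in a small <CODE>resource</CODE> module and tested with <CODE>compute_concrete</CODE>. This is only a sketch under assumptions: the module name <CODE>NounDemo</CODE>, the file name <CODE>NounDemo.gf</CODE>, and the constant <CODE>wines</CODE> are hypothetical.
-</P>
-<PRE>
-  resource NounDemo = {
-    param
-      Number = Sg | Pl ;
-    oper
-      -- the regular noun paradigm of this section
-      regNoun : Str -> {s : Number => Str} = \x -> {
-        s = table {
-          Sg => x ;
-          Pl => x + "s"
-          }
-        } ;
-      -- selection of the plural form from the inflection table
-      wines : Str = (regNoun "wine").s ! Pl ;
-  }
-</PRE>
-<P>
-After importing the file with <CODE>i -retain NounDemo.gf</CODE>, the command <CODE>cc wines</CODE> should perform the same kind of computation as the <I>cheeses</I> example of this section.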
-</P> -<A NAME="toc46"></A> -<H2>Worst-case functions and data abstraction</H2> -<P> -Some English nouns, such as <CODE>mouse</CODE>, are so irregular that -it makes no sense to see them as instances of a paradigm. Even -then, it is useful to perform <B>data abstraction</B> from the -definition of the type <CODE>Noun</CODE>, and introduce a constructor -operation, a <B>worst-case function</B> for nouns: -</P> -<PRE> - oper mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; -</PRE> -<P> -Thus we can define -</P> -<PRE> - lin Mouse = mkNoun "mouse" "mice" ; -</PRE> -<P> -and -</P> -<PRE> - oper regNoun : Str -> Noun = \x -> - mkNoun x (x + "s") ; -</PRE> -<P> -instead of writing the inflection tables explicitly. -</P> -<P> -The grammar engineering advantage of worst-case functions is that -the author of the resource module may change the definitions of -<CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the -interface (i.e. the system of type signatures) that makes it -correct to use these functions in concrete modules. In programming -terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>. -</P> -<A NAME="toc47"></A> -<H2>A system of paradigms using Prelude operations</H2> -<P> -In addition to the completely regular noun paradigm <CODE>regNoun</CODE>, -some other frequent noun paradigms deserve to be -defined, for instance, -</P> -<PRE> - sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ; -</PRE> -<P> -What about nouns like <I>fly</I>, with the plural <I>flies</I>? The already -available solution is to use the longest common prefix -<I>fl</I> (also known as the <B>technical stem</B>) as argument, and define -</P> -<PRE> - yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ; -</PRE> -<P> -But this paradigm would be very unintuitive to use, because the technical stem -is not an existing form of the word. 
A better solution is to use -the lemma and a string operator <CODE>init</CODE>, which returns the initial segment (i.e. -all characters but the last) of a string: -</P> -<PRE> - yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ; -</PRE> -<P> -The operation <CODE>init</CODE> belongs to a set of operations in the -resource module <CODE>Prelude</CODE>, which therefore has to be -<CODE>open</CODE>ed so that <CODE>init</CODE> can be used. Its dual is <CODE>last</CODE>: -</P> -<PRE> - > cc init "curry" - "curr" - - > cc last "curry" - "y" -</PRE> -<P> -As generalizations of the library functions <CODE>init</CODE> and <CODE>last</CODE>, GF has -two predefined functions: -<CODE>Predef.dp</CODE>, which "drops" a prefix and keeps a suffix of the given length, -and <CODE>Predef.tk</CODE>, which "takes" a prefix -by omitting the given number of characters from the end. For instance, -</P> -<PRE> - > cc Predef.tk 3 "worried" - "worr" - > cc Predef.dp 3 "worried" - "ied" -</PRE> -<P> -The prefix <CODE>Predef</CODE> is given to a handful of functions that could -not be defined internally in GF. They are available in all modules -without explicit <CODE>open</CODE> of the module <CODE>Predef</CODE>. -</P> -<A NAME="toc48"></A> -<H2>Pattern matching</H2> -<P> -We have so far built all expressions of the <CODE>table</CODE> form -from branches whose patterns are constants introduced in -<CODE>param</CODE> definitions, as well as constant strings. -But there are more expressive patterns. Here is a summary of the possible forms: -</P> -<UL> -<LI>a variable pattern (identifier other than constant parameter) matches anything -<LI>the wild card <CODE>_</CODE> matches anything -<LI>a string literal pattern, e.g. <CODE>"s"</CODE>, matches the same string -<LI>a disjunctive pattern <CODE>P | ... | Q</CODE> matches anything that - one of the disjuncts matches -</UL> - -<P> -Pattern matching is performed in the order in which the branches -appear in the table: the branch of the first matching pattern is followed.
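-</P>
-<P>
-A minimal sketch of how branch order interacts with the wild card, assuming a hypothetical module <CODE>PatternDemo</CODE> with a hypothetical constant <CODE>cheese</CODE>: since the <CODE>Sg</CODE> branch comes first, the wild card branch is reached only when <CODE>Sg</CODE> does not match.
-</P>
-<PRE>
-  resource PatternDemo = {
-    param
-      Number = Sg | Pl ;
-    oper
-      cheese : Number => Str = table {
-        Sg => "cheese" ;    -- tried first
-        _  => "cheeses"     -- wild card: reached only if Sg did not match
-        } ;
-  }
-</PRE>
-<P>
-In this sketch, the selection <CODE>cheese ! Pl</CODE> follows the second branch.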
-</P> -<P> -As syntactic sugar, one-branch tables can be written concisely, -</P> -<PRE> - \\P,...,Q => t === table {P => ... table {Q => t} ...} -</PRE> -<P> -Finally, the <CODE>case</CODE> expressions common in functional -programming languages are syntactic sugar for table selections: -</P> -<PRE> - case e of {...} === table {...} ! e -</PRE> -<P></P> -<A NAME="toc49"></A> -<H2>An intelligent noun paradigm using pattern matching</H2> -<P> -It may be hard for the user of a resource morphology to pick the right -inflection paradigm. One way to help is to define a more intelligent -paradigm, which chooses the ending by first analysing the lemma. -The following variant for English regular nouns puts together all the -previously shown paradigms, and chooses one of them on the basis of -the final letter of the lemma (found by the prelude operator <CODE>last</CODE>). -</P> -<PRE> - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; -</PRE> -<P> -This definition uses the expression forms explained in the previous section: -a <CODE>case</CODE> expression, a disjunctive pattern, and the wild card. -</P> -<P> -The paradigm <CODE>regNoun</CODE> does not give the correct forms for -all nouns. For instance, <I>mouse - mice</I> and -<I>fish - fish</I> must be given by using <CODE>mkNoun</CODE>. -Also the word <I>boy</I> would be inflected incorrectly; to prevent -this, either use <CODE>mkNoun</CODE> or modify -<CODE>regNoun</CODE> so that the <CODE>"y"</CODE> case does not -apply if the second-last character is a vowel. -</P> -<P> -<B>Exercise</B>. Extend the <CODE>regNoun</CODE> paradigm so that it takes care -of all variations there are in English. Test it with the nouns -<I>ax</I>, <I>bamboo</I>, <I>boy</I>, <I>bush</I>, <I>hero</I>, <I>match</I>. -<B>Hint</B>. The library functions <CODE>Predef.dp</CODE> and <CODE>Predef.tk</CODE> -are useful in this task. -</P> -<P> -<B>Exercise</B>.
The same rules that form plural nouns in English also -apply in the formation of third-person singular verbs. -Write a regular verb paradigm that uses this idea, but first -rewrite <CODE>regNoun</CODE> so that the analysis needed to build <I>s</I>-forms -is factored out as a separate <CODE>oper</CODE>, which is shared with -<CODE>regVerb</CODE>. -</P> -<A NAME="toc50"></A> -<H2>Morphological resource modules</H2> -<P> -A common idiom is to -gather the <CODE>oper</CODE> and <CODE>param</CODE> definitions -needed for inflecting words in -a language into a morphology module. Here is a simple -example, <A HREF="resource/MorphoEng.gf"><CODE>MorphoEng</CODE></A>. -</P> -<PRE> - --# -path=.:prelude - - resource MorphoEng = open Prelude in { - - param - Number = Sg | Pl ; - - oper - Noun, Verb : Type = {s : Number => Str} ; - - mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; - - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; - - mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ; - - regVerb : Str -> Verb = \s -> case last s of { - "s" | "z" => mkVerb s (s + "es") ; - "y" => mkVerb s (init s + "ies") ; - "o" => mkVerb s (s + "es") ; - _ => mkVerb s (s + "s") - } ; - } -</PRE> -<P> -The first line gives as a hint to the compiler the -<B>search path</B> needed to find all the other modules that the -module depends on. The directory <CODE>prelude</CODE> is a subdirectory of -<CODE>GF/lib</CODE>; to be able to refer to it in this simple way, you can -set the environment variable <CODE>GF_LIB_PATH</CODE> to point to this -directory. -</P> -<A NAME="toc51"></A> -<H1>Using parameters in concrete syntax</H1> -<P> -We can now enrich the concrete syntax definitions to -comprise morphology. This will involve a more radical -variation between languages (e.g. English and Italian) -than just the use of different words.
In general, -parameters and linearization types are different in -different languages - but this does not prevent the -use of a common abstract syntax. -</P> -<A NAME="toc52"></A> -<H2>Parametric vs. inherent features, agreement</H2> -<P> -The rule of subject-verb agreement in English says that the verb -phrase must be inflected in the number of the subject. This -means that a noun phrase (functioning as a subject) inherently -<I>has</I> a number, which it passes to the verb. The verb does not -<I>have</I> a number, but must be able to <I>receive</I> whatever number the -subject has. This distinction is nicely represented by the -different linearization types of <B>noun phrases</B> and <B>verb phrases</B>: -</P> -<PRE> - lincat NP = {s : Str ; n : Number} ; - lincat VP = {s : Number => Str} ; -</PRE> -<P> -We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>, -whereas the number of <CODE>VP</CODE> is a <B>variable feature</B> (or a -<B>parametric feature</B>). -</P> -<P> -The agreement rule itself is expressed in the linearization rule of -the predication function: -</P> -<PRE> - lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ; -</PRE> -<P> -The following section will present -<CODE>FoodsEng</CODE>, assuming the abstract syntax <CODE>Foods</CODE> -that is similar to <CODE>Food</CODE> but also has the -plural determiners <CODE>These</CODE> and <CODE>Those</CODE>. -The reader is invited to inspect the way in which agreement works in -the formation of sentences. -</P> -<A NAME="toc53"></A> -<H2>English concrete syntax with parameters</H2> -<P> -The grammar uses both -<A HREF="../../lib/prelude/Prelude.gf"><CODE>Prelude</CODE></A> and -<A HREF="resource/MorphoEng"><CODE>MorphoEng</CODE></A>. -We will later see how to make the grammar even -more high-level by using a resource grammar library -and parametrized modules.
-</P> -<PRE> - --# -path=.:resource:prelude - - concrete FoodsEng of Foods = open Prelude, MorphoEng in { - - lincat - S, Quality = SS ; - Kind = {s : Number => Str} ; - Item = {s : Str ; n : Number} ; - - lin - Is item quality = ss (item.s ++ (mkVerb "are" "is").s ! item.n ++ quality.s) ; - This = det Sg "this" ; - That = det Sg "that" ; - These = det Pl "these" ; - Those = det Pl "those" ; - QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ; - Wine = regNoun "wine" ; - Cheese = regNoun "cheese" ; - Fish = mkNoun "fish" "fish" ; - Very = prefixSS "very" ; - Fresh = ss "fresh" ; - Warm = ss "warm" ; - Italian = ss "Italian" ; - Expensive = ss "expensive" ; - Delicious = ss "delicious" ; - Boring = ss "boring" ; - - oper - det : Number -> Str -> Noun -> {s : Str ; n : Number} = \n,d,cn -> { - s = d ++ cn.s ! n ; - n = n - } ; - } -</PRE> -<P></P> -<A NAME="toc54"></A> -<H2>Hierarchic parameter types</H2> -<P> -The reader familiar with a functional programming language such as -<A HREF="http://www.haskell.org">Haskell</A> must have noticed the similarity -between parameter types in GF and <B>algebraic datatypes</B> (<CODE>data</CODE> definitions -in Haskell). The GF parameter types are actually a special case of algebraic -datatypes: the main restriction is that in GF, these types must be finite. -(It is this restriction that makes it possible to invert linearization rules into -parsing methods.) -</P> -<P> -However, finite is not the same thing as enumerated. Even in GF, parameter -constructors can take arguments, provided these arguments are from other -parameter types - only recursion is forbidden. Such parameter types impose a -hierarchic order among parameters. They are often needed to define -the linguistically most accurate parameter systems. -</P> -<P> -To give an example, Swedish adjectives -are inflected in number (singular or plural) and -gender (uter or neuter). These parameters would suggest 2*2=4 different -forms. 
However, the gender distinction is made only in the singular. Therefore, -it would be inaccurate to define adjective paradigms using the type -<CODE>Gender => Number => Str</CODE>. The following hierarchic definition -yields an accurate system of three adjectival forms. -</P> -<PRE> - param AdjForm = ASg Gender | APl ; - param Gender = Utr | Neutr ; -</PRE> -<P> -Here is an example of pattern matching, the paradigm of regular adjectives. -</P> -<PRE> - oper regAdj : Str -> AdjForm => Str = \fin -> table { - ASg Utr => fin ; - ASg Neutr => fin + "t" ; - APl => fin + "a" ; - } -</PRE> -<P> -A constructor can be used as a pattern that has patterns as arguments. For instance, -the adjectival paradigm in which the two singular forms are the same -can be defined as follows: -</P> -<PRE> - oper plattAdj : Str -> AdjForm => Str = \platt -> table { - ASg _ => platt ; - APl => platt + "a" ; - } -</PRE> -<P></P> -<A NAME="toc55"></A> -<H2>Morphological analysis and morphology quiz</H2> -<P> -Even though morphology is in GF -mostly used as an auxiliary for syntax, it -can also be useful in its own right. The command <CODE>morpho_analyse = ma</CODE> -can be used to read a text and return for each word the analyses that -it has in the current concrete syntax. -</P> -<PRE> - > rf bible.txt | morpho_analyse -</PRE> -<P> -In the same way as translation exercises, morphological exercises can -be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually, -the category is set to be something other than <CODE>S</CODE>. For instance, -</P> -<PRE> - > cd GF/lib/resource-1.0/ - > i french/IrregFre.gf - > morpho_quiz -cat=V - - Welcome to GF Morphology Quiz. - ... 
- - réapparaître : VFin VCondit Pl P2 - réapparaitriez - > No, not réapparaitriez, but - réapparaîtriez - Score 0/1 -</PRE> -<P> -Finally, a list of morphological exercises can be generated -off-line and saved in a -file for later use, by the command <CODE>morpho_list = ml</CODE>. -</P> -<PRE> - > morpho_list -number=25 -cat=V | wf exx.txt -</PRE> -<P> -The <CODE>number</CODE> flag gives the number of exercises generated. -</P> -<A NAME="toc56"></A> -<H2>Discontinuous constituents</H2> -<P> -A linearization type may contain more than one string. -English particle verbs, such as <I>switch off</I>, are an -example of where this is useful. The linearization of -a sentence may place the object between the verb and the particle: -<I>he switched it off</I>. -</P> -<P> -The following judgement defines transitive verbs as -<B>discontinuous constituents</B>, i.e. as having a linearization -type with two strings and not just one. -</P> -<PRE> - lincat TV = {s : Number => Str ; part : Str} ; -</PRE> -<P> -The following linearization rule -shows how the two parts are separated by the object in complementation. -</P> -<PRE> - lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ; -</PRE> -<P> -There is no restriction on the number of discontinuous constituents -(or other fields) a <CODE>lincat</CODE> may contain. The only condition is that -the fields must be of finite types, i.e. built from records, tables, -parameters, and <CODE>Str</CODE>, and not functions. -</P> -<P> -A mathematical result -about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. This is -potentially a reason to avoid discontinuous constituents. -Moreover, the parsing and linearization commands only give accurate -results for categories whose linearization type has a unique <CODE>Str</CODE> -valued field labelled <CODE>s</CODE>. 
Therefore, discontinuous constituents -are not a good idea in top-level categories accessed by the users -of a grammar application. -</P> -<A NAME="toc57"></A> -<H2>Free variation</H2> -<P> -Sometimes there are many alternative ways to define a concrete syntax. -For instance, verb negation in English can be expressed both by -<I>does not</I> and <I>doesn't</I>. In linguistic terms, these expressions -are in <B>free variation</B>. The <CODE>variants</CODE> construct of GF can -be used to give a list of strings in free variation. For example, -</P> -<PRE> - NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s ! Pl} ; -</PRE> -<P> -An empty variant list -</P> -<PRE> - variants {} -</PRE> -<P> -can be used e.g. if a word lacks a certain form. -</P> -<P> -In general, <CODE>variants</CODE> should be used cautiously. It is not -recommended for modules intended as libraries, because the -user of the library has no way to choose among the variants. -</P> -<A NAME="toc58"></A> -<H2>Overloading of operations</H2> -<P> -Large libraries, such as the GF Resource Grammar Library, may define -hundreds of names, which can be impractical -for both the library writer and the user. The writer has to invent longer -and longer names which are not always intuitive, -and the user has to learn or at least be able to find all these names. -A solution to this problem, adopted by languages such as C++, is <B>overloading</B>: -the same name can be used for several functions. When such a name is used, the -compiler performs <B>overload resolution</B> to find out which of the possible functions -is meant. The resolution is based on the types of the functions: all functions that -have the same name must have different types. -</P> -<P> -In C++, functions with the same name can be scattered everywhere in the program. -In GF, they must be grouped together in <CODE>overload</CODE> groups. 
Here is an example -of an overload group, defining four ways to define nouns in Italian: -</P> -<PRE> - oper mkN = overload { - mkN : Str -> N = -- regular nouns - mkN : Str -> Gender -> N = -- regular nouns with unexpected gender - mkN : Str -> Str -> N = -- irregular nouns - mkN : Str -> Str -> Gender -> N = -- irregular nouns with unexpected gender - } -</PRE> -<P> -All of the following uses of <CODE>mkN</CODE> are easy to resolve: -</P> -<PRE> - lin Pizza = mkN "pizza" ; -- Str -> N - lin Hand = mkN "mano" Fem ; -- Str -> Gender -> N - lin Man = mkN "uomo" "uomini" ; -- Str -> Str -> N -</PRE> -<P></P> -<A NAME="toc59"></A> -<H1>More constructs for concrete syntax</H1> -<P> -In this chapter, we go through constructs that are not necessary in simple grammars -or when the concrete syntax relies on libraries. But they are useful when -writing advanced concrete syntax implementations, such as resource grammar libraries. -This chapter can safely be skipped if the reader prefers to continue to the -chapter on using libraries. -</P> -<A NAME="toc60"></A> -<H2>Local definitions</H2> -<P> -Local definitions ("<CODE>let</CODE> expressions") are used in functional -programming for two reasons: to structure the code into smaller -expressions, and to avoid repeated computation of one and -the same expression. Here is an example, from -<A HREF="resource/MorphoIta.gf"><CODE>MorphoIta</CODE></A>: -</P> -<PRE> - oper regNoun : Str -> Noun = \vino -> - let - vin = init vino ; - o = last vino - in - case o of { - "a" => mkNoun Fem vino (vin + "e") ; - "o" | "e" => mkNoun Masc vino (vin + "i") ; - _ => mkNoun Masc vino vino - } ; -</PRE> -<P></P> -<A NAME="toc61"></A> -<H2>Record extension and subtyping</H2> -<P> -Record types and records can be <B>extended</B> with new fields. For instance, -in German it is natural to see transitive verbs as verbs with a case. -The symbol <CODE>**</CODE> is used for both constructs. 
-</P> -<PRE> - lincat TV = Verb ** {c : Case} ; - - lin Follow = regVerb "folgen" ** {c = Dative} ; -</PRE> -<P> -To extend a record type or a record with a field whose label it -already has is a type error. -</P> -<P> -A record type <I>T</I> is a <B>subtype</B> of another one <I>R</I>, if <I>T</I> has -all the fields of <I>R</I> and possibly other fields. For instance, -an extension of a record type is always a subtype of it. -</P> -<P> -If <I>T</I> is a subtype of <I>R</I>, an object of <I>T</I> can be used wherever -an object of <I>R</I> is required. For instance, a transitive verb can -be used wherever a verb is required. -</P> -<P> -<B>Contravariance</B> means that a function taking an <I>R</I> as argument -can also be applied to any object of a subtype <I>T</I>. -</P> -<A NAME="toc62"></A> -<H2>Tuples and product types</H2> -<P> -Product types and tuples are syntactic sugar for record types and records: -</P> -<PRE> - T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn} - <t1, ..., tn> === {p1 = t1 ; ... ; pn = tn} -</PRE> -<P> -Thus the labels <CODE>p1, p2,...</CODE> are hard-coded. -</P> -<A NAME="toc63"></A> -<H2>Record and tuple patterns</H2> -<P> -Record types of parameter types are also parameter types. -A typical example is a record of agreement features, e.g. in French -</P> -<PRE> - oper Agr : PType = {g : Gender ; n : Number ; p : Person} ; -</PRE> -<P> -Notice the term <CODE>PType</CODE> rather than just <CODE>Type</CODE> referring to -parameter types. Every <CODE>PType</CODE> is also a <CODE>Type</CODE>, but not vice-versa. 
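-</P>
-<P>
-As a hedged illustration of how such a record type can serve as the argument type
-of a table, here is a minimal sketch. The module name <CODE>AgrDemo</CODE>, the parameter
-constructors, and the operation <CODE>subjPron</CODE> are assumptions made for this example
-only; they are not taken from the French resource grammar.
-</P>
-<PRE>
-  resource AgrDemo = {
-
-    param
-      Gender = Masc | Fem ;
-      Number = Sg | Pl ;
-      Person = P1 | P2 | P3 ;
-
-    oper
-      -- a record type built from parameter types is itself a parameter type
-      Agr : PType = {g : Gender ; n : Number ; p : Person} ;
-
-      -- hypothetical table of French subject pronouns, indexed by Agr;
-      -- the eight branches together cover all 2*2*3 = 12 values of Agr
-      subjPron : Agr => Str = table {
-        {g = _    ; n = Sg ; p = P1} => "je" ;
-        {g = _    ; n = Sg ; p = P2} => "tu" ;
-        {g = Masc ; n = Sg ; p = P3} => "il" ;
-        {g = Fem  ; n = Sg ; p = P3} => "elle" ;
-        {g = _    ; n = Pl ; p = P1} => "nous" ;
-        {g = _    ; n = Pl ; p = P2} => "vous" ;
-        {g = Masc ; n = Pl ; p = P3} => "ils" ;
-        {g = Fem  ; n = Pl ; p = P3} => "elles"
-        } ;
-  }
-</PRE>
-<P>
-With these definitions, the selection <CODE>subjPron ! {g = Fem ; n = Sg ; p = P3}</CODE>
-picks the branch for <I>elle</I>. Gender is matched with a wildcard wherever it makes
-no difference to the form.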
-</P> -<P> -Pattern matching is done in the expected way, but it can moreover -utilize partial records: the branch -</P> -<PRE> - {g = Fem} => t -</PRE> -<P> -in a table of type <CODE>Agr => T</CODE> means the same as -</P> -<PRE> - {g = Fem ; n = _ ; p = _} => t -</PRE> -<P> -Tuple patterns are translated to record patterns in the -same way as tuples to records; partial patterns make it -possible to write, slightly surprisingly, -</P> -<PRE> - case <g,n,p> of { - <Fem> => t - ... - } -</PRE> -<P></P> -<A NAME="toc64"></A> -<H2>Regular expression patterns</H2> -<P> -To define string operations computed at compile time, such -as in morphology, it is handy to use regular expression patterns: -</P> - <UL> - <LI><I>p</I> <CODE>+</CODE> <I>q</I> : token consisting of <I>p</I> followed by <I>q</I> - <LI><I>p</I> <CODE>*</CODE> : token <I>p</I> repeated 0 or more times - (max the length of the string to be matched) - <LI><CODE>-</CODE> <I>p</I> : matches anything that <I>p</I> does not match - <LI><I>x</I> <CODE>@</CODE> <I>p</I> : bind to <I>x</I> what <I>p</I> matches - <LI><I>p</I> <CODE>|</CODE> <I>q</I> : matches what either <I>p</I> or <I>q</I> matches - </UL> - -<P> -The last three apply to all types of patterns, the first two only to token strings. -As an example, we give a rule for the formation of English word forms -ending with an <I>s</I> and used in the formation of both plural nouns and -third-person present-tense verbs. -</P> -<PRE> - add_s : Str -> Str = \w -> case w of { - _ + "oo" => w + "s" ; -- bamboo - _ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero - _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy - x + "y" => x + "ies" ; -- fly - _ => w + "s" -- car - } ; -</PRE> -<P> -Here is another example, the plural formation in Swedish 2nd declension. 
-The second branch uses a variable binding with <CODE>@</CODE> to cover the cases where an -unstressed pre-final vowel <I>e</I> disappears in the plural -(<I>nyckel-nycklar, seger-segrar, bil-bilar</I>): -</P> -<PRE> - plural2 : Str -> Str = \w -> case w of { - pojk + "e" => pojk + "ar" ; - nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ; - bil => bil + "ar" - } ; -</PRE> -<P></P> -<P> -Semantics: variables are always bound to the <B>first match</B>, which is the first -in the sequence of binding lists <CODE>Match p v</CODE> defined as follows. In the definition, -<CODE>p</CODE> is a pattern and <CODE>v</CODE> is a value. The semantics is given in Haskell notation. -</P> -<PRE> - Match (p1|p2) v = Match p1 v ++ Match p2 v - Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | - i <- [0..length s], let (s1,s2) = splitAt i s] - Match p* s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= [] - Match -p v = [[]] if Match p v = [] - Match c v = [[]] if c == v -- for constant and literal patterns c - Match x v = [[(x,v)]] -- for variable patterns x - Match x@p v = [[(x,v)]] + M if M = Match p v /= [] - Match p v = [] otherwise -- failure -</PRE> -<P> -Examples: -</P> -<UL> -<LI><CODE>x + "e" + y</CODE> matches <CODE>"peter"</CODE> with <CODE>x = "p", y = "ter"</CODE> -<LI><CODE>x + "er"*</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg"</CODE> -</UL> - -<P> -<B>Exercise</B>. Implement the German <B>Umlaut</B> operation on word stems. -The operation changes the vowel of the stressed stem syllable as follows: -<I>a</I> to <I>ä</I>, <I>au</I> to <I>äu</I>, <I>o</I> to <I>ö</I>, and <I>u</I> to <I>ü</I>. You -can assume that the operation only takes syllables as arguments. Test the -operation to see whether it correctly changes <I>Arzt</I> to <I>Ärzt</I>, -<I>Baum</I> to <I>Bäum</I>, <I>Topf</I> to <I>Töpf</I>, and <I>Kuh</I> to <I>Küh</I>. 
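-</P>
-<P>
-Operations defined with such patterns can be tried out in the GF shell before they are
-used in a grammar. The following session is only a sketch: it assumes that
-<CODE>add_s</CODE> and <CODE>plural2</CODE> from this section have been placed in a resource
-module stored in a file called <CODE>Patterns.gf</CODE>, which is a hypothetical name.
-</P>
-<PRE>
-  > i -retain Patterns.gf
-  > cc add_s "bamboo"
-  > cc add_s "hero"
-  > cc add_s "fly"
-  > cc plural2 "nyckel"
-  > cc plural2 "bil"
-</PRE>
-<P>
-Reading the branches in order, the definitions give <I>bamboos</I> (first branch of
-<CODE>add_s</CODE>), <I>heroes</I> (second branch), <I>flies</I> (fourth branch),
-<I>nycklar</I> (second branch of <CODE>plural2</CODE>), and <I>bilar</I> (last branch).
-The exact way in which the shell prints these values is not shown here.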
-</P> -<A NAME="toc65"></A> -<H2>Prefix-dependent choices</H2> -<P> -Sometimes a token has different forms depending on the token -that follows. An example is the English indefinite article, -which is <I>an</I> if a vowel follows, <I>a</I> otherwise. -Which form is chosen can only be decided at run time, i.e. -when a string is actually built. GF has a special construct for -such tokens, the <CODE>pre</CODE> construct exemplified in -</P> -<PRE> - oper artIndef : Str = - pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ; -</PRE> -<P> -Thus -</P> -<PRE> - artIndef ++ "cheese" ---> "a" ++ "cheese" - artIndef ++ "apple" ---> "an" ++ "apple" -</PRE> -<P> -This simple example does not work in all situations: words beginning with -<I>u</I> follow no general rule, and some problematic words are -<I>euphemism, one-eyed, n-gram</I>. It is possible to write -</P> -<PRE> - oper artIndef : Str = - pre {"a" ; - "a" / strs {"eu" ; "one"} ; - "an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"} - } ; -</PRE> -<P></P> -<A NAME="toc66"></A> -<H2>Predefined types</H2> -<P> -GF has the following predefined categories in abstract syntax: -</P> -<PRE> - cat Int ; -- integers, e.g. 0, 5, 743145151019 - cat Float ; -- floats, e.g. 0.0, 3.1415926 - cat String ; -- strings, e.g. "", "foo", "123" -</PRE> -<P> -The objects of each of these categories are <B>literals</B> -as indicated in the comments above. No <CODE>fun</CODE> definition -can have a predefined category as its value type, but -predefined categories can be used as argument types. For example: -</P> -<PRE> - fun StreetAddress : Int -> String -> Address ; - lin StreetAddress number street = {s = number.s ++ street.s} ; - - -- e.g. (StreetAddress 10 "Downing Street") : Address -</PRE> -<P> -The linearization type is <CODE>{s : Str}</CODE> for all these categories. -</P> -<A NAME="toc67"></A> -<H1>Using the resource grammar library</H1> -<P> -In this chapter, we will take a look at the GF resource grammar library. 
-We will use the library to implement a slightly extended <CODE>Food</CODE> grammar -and port it to some new languages. -</P> -<A NAME="toc68"></A> -<H2>The coverage of the library</H2> -<P> -The GF Resource Grammar Library contains grammar rules for -10 languages (in addition, 2 languages are available as incomplete -implementations, and a few more are under construction). Its purpose -is to make these rules available to application programmers, -who can thereby concentrate on the semantic and stylistic -aspects of their grammars, without having to think about -grammaticality. The targeted level of application grammarians -is that of a skilled programmer with -a practical knowledge of the target languages, but without -theoretical knowledge about their grammars. -Such a combination of -skills is typical of programmers who, for instance, want to localize -software to new languages. -</P> -<P> -The current resource languages are -</P> -<UL> -<LI><CODE>Ara</CODE>bic (incomplete) -<LI><CODE>Cat</CODE>alan (incomplete) -<LI><CODE>Dan</CODE>ish -<LI><CODE>Eng</CODE>lish -<LI><CODE>Fin</CODE>nish -<LI><CODE>Fre</CODE>nch -<LI><CODE>Ger</CODE>man -<LI><CODE>Ita</CODE>lian -<LI><CODE>Nor</CODE>wegian -<LI><CODE>Rus</CODE>sian -<LI><CODE>Spa</CODE>nish -<LI><CODE>Swe</CODE>dish -</UL> - -<P> -The first three letters (<CODE>Eng</CODE> etc) are used in grammar module names. -The incomplete Arabic and Catalan implementations are -sufficient for many applications; they both contain, among other -things, complete inflectional morphology. -</P> -<A NAME="toc69"></A> -<H2>The resource API</H2> -<P> -The resource library API is divided into language-specific -and language-independent parts. To put it roughly, -</P> -<UL> -<LI>the syntax API is language-independent, i.e. has the same types and functions for all - languages. - Its name is <CODE>Syntax</CODE><I>L</I> for each language <I>L</I> -<LI>the morphology API is language-specific, i.e. 
has partly different types and functions - for different languages. - Its name is <CODE>Paradigms</CODE><I>L</I> for each language <I>L</I> -</UL> - -<P> -A full documentation of the API is available on-line in the -<A HREF="../../lib/resource-1.0/synopsis.html">resource synopsis</A>. For our -examples, we will only need a fragment of the full API. -</P> -<P> -In the first examples, -we will make use of the following categories, from the module <CODE>Syntax</CODE>. -</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Category</TH> -<TH>Explanation</TH> -<TH COLSPAN="2">Example</TH> -</TR> -<TR> -<TD><CODE>Utt</CODE></TD> -<TD>sentence, question, word...</TD> -<TD>"be quiet"</TD> -</TR> -<TR> -<TD><CODE>Adv</CODE></TD> -<TD>verb-phrase-modifying adverb,</TD> -<TD>"in the house"</TD> -</TR> -<TR> -<TD><CODE>AdA</CODE></TD> -<TD>adjective-modifying adverb,</TD> -<TD>"very"</TD> -</TR> -<TR> -<TD><CODE>S</CODE></TD> -<TD>declarative sentence</TD> -<TD>"she lived here"</TD> -</TR> -<TR> -<TD><CODE>Cl</CODE></TD> -<TD>declarative clause, with all tenses</TD> -<TD>"she looks at this"</TD> -</TR> -<TR> -<TD><CODE>AP</CODE></TD> -<TD>adjectival phrase</TD> -<TD>"very warm"</TD> -</TR> -<TR> -<TD><CODE>CN</CODE></TD> -<TD>common noun (without determiner)</TD> -<TD>"red house"</TD> -</TR> -<TR> -<TD><CODE>NP</CODE></TD> -<TD>noun phrase (subject or object)</TD> -<TD>"the red house"</TD> -</TR> -<TR> -<TD><CODE>Det</CODE></TD> -<TD>determiner phrase</TD> -<TD>"those seven"</TD> -</TR> -<TR> -<TD><CODE>Predet</CODE></TD> -<TD>predeterminer</TD> -<TD>"only"</TD> -</TR> -<TR> -<TD><CODE>Quant</CODE></TD> -<TD>quantifier with both sg and pl</TD> -<TD>"this/these"</TD> -</TR> -<TR> -<TD><CODE>Prep</CODE></TD> -<TD>preposition, or just case</TD> -<TD>"in"</TD> -</TR> -<TR> -<TD><CODE>A</CODE></TD> -<TD>one-place adjective</TD> -<TD>"warm"</TD> -</TR> -<TR> -<TD><CODE>N</CODE></TD> -<TD>common noun</TD> -<TD>"house"</TD> -</TR> -</TABLE> - -<P></P> -<P> -We will need the following 
syntax rules from <CODE>Syntax</CODE>. -</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Function</TH> -<TH>Type</TH> -<TH COLSPAN="2">Example</TH> -</TR> -<TR> -<TD><CODE>mkUtt</CODE></TD> -<TD><CODE>S -> Utt</CODE></TD> -<TD><I>John walked</I></TD> -</TR> -<TR> -<TD><CODE>mkUtt</CODE></TD> -<TD><CODE>Cl -> Utt</CODE></TD> -<TD><I>John walks</I></TD> -</TR> -<TR> -<TD><CODE>mkCl</CODE></TD> -<TD><CODE>NP -> AP -> Cl</CODE></TD> -<TD><I>John is very old</I></TD> -</TR> -<TR> -<TD><CODE>mkNP</CODE></TD> -<TD><CODE>Det -> CN -> NP</CODE></TD> -<TD><I>the first old man</I></TD> -</TR> -<TR> -<TD><CODE>mkNP</CODE></TD> -<TD><CODE>Predet -> NP -> NP</CODE></TD> -<TD><I>only John</I></TD> -</TR> -<TR> -<TD><CODE>mkDet</CODE></TD> -<TD><CODE>Quant -> Det</CODE></TD> -<TD><I>this</I></TD> -</TR> -<TR> -<TD><CODE>mkCN</CODE></TD> -<TD><CODE>N -> CN</CODE></TD> -<TD><I>house</I></TD> -</TR> -<TR> -<TD><CODE>mkCN</CODE></TD> -<TD><CODE>AP -> CN -> CN</CODE></TD> -<TD><I>very big blue house</I></TD> -</TR> -<TR> -<TD><CODE>mkAP</CODE></TD> -<TD><CODE>A -> AP</CODE></TD> -<TD><I>old</I></TD> -</TR> -<TR> -<TD><CODE>mkAP</CODE></TD> -<TD><CODE>AdA -> AP -> AP</CODE></TD> -<TD><I>very very old</I></TD> -</TR> -</TABLE> - -<P></P> -<P> -We will also need the following structural words from <CODE>Syntax</CODE>. -</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Function</TH> -<TH>Type</TH> -<TH COLSPAN="2">Example</TH> -</TR> -<TR> -<TD><CODE>all_Predet</CODE></TD> -<TD><CODE>Predet</CODE></TD> -<TD><I>all</I></TD> -</TR> -<TR> -<TD><CODE>defPlDet</CODE></TD> -<TD><CODE>Det</CODE></TD> -<TD><I>the (houses)</I></TD> -</TR> -<TR> -<TD><CODE>this_Quant</CODE></TD> -<TD><CODE>Quant</CODE></TD> -<TD><I>this</I></TD> -</TR> -<TR> -<TD><CODE>very_AdA</CODE></TD> -<TD><CODE>AdA</CODE></TD> -<TD><I>very</I></TD> -</TR> -</TABLE> - -<P></P> -<P> -For French, we will use the following part of <CODE>ParadigmsFre</CODE>. 
-</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Function</TH> -<TH>Type</TH> -<TH COLSPAN="2">Example</TH> -</TR> -<TR> -<TD><CODE>Gender</CODE></TD> -<TD><CODE>Type</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>masculine</CODE></TD> -<TD><CODE>Gender</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>feminine</CODE></TD> -<TD><CODE>Gender</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkN</CODE></TD> -<TD><CODE>(cheval : Str) -> N</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkN</CODE></TD> -<TD><CODE>(foie : Str) -> Gender -> N</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkA</CODE></TD> -<TD><CODE>(cher : Str) -> A</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkA</CODE></TD> -<TD><CODE>(sec,seche : Str) -> A</CODE></TD> -<TD>-</TD> -</TR> -</TABLE> - -<P></P> -<P> -For German, we will use the following part of <CODE>ParadigmsGer</CODE>. -</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Function</TH> -<TH>Type</TH> -<TH COLSPAN="2">Example</TH> -</TR> -<TR> -<TD><CODE>Gender</CODE></TD> -<TD><CODE>Type</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>masculine</CODE></TD> -<TD><CODE>Gender</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>feminine</CODE></TD> -<TD><CODE>Gender</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>neuter</CODE></TD> -<TD><CODE>Gender</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkN</CODE></TD> -<TD><CODE>(Stufe : Str) -> N</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkN</CODE></TD> -<TD><CODE>(Bild,Bilder : Str) -> Gender -> N</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkA</CODE></TD> -<TD><CODE>Str -> A</CODE></TD> -<TD>-</TD> -</TR> -<TR> -<TD><CODE>mkA</CODE></TD> -<TD><CODE>(gut,besser,beste : Str) -> A</CODE></TD> -<TD><I>gut,besser,beste</I></TD> -</TR> -</TABLE> - -<P></P> -<P> -<B>Exercise</B>. Try out the morphological paradigms in different languages. 
Proceed -as follows: -</P> -<PRE> - > i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr - > cc mkN "Farbe" - > cc mkA "gut" "besser" "beste" -</PRE> -<P></P> -<A NAME="toc70"></A> -<H2>Example: French</H2> -<P> -We start with an abstract syntax that is like the earlier <CODE>Food</CODE>, but -has a plural determiner (<I>all wines</I>) and some new nouns that will -need different genders in most languages. -</P> -<PRE> - abstract Food = { - cat - S ; Item ; Kind ; Quality ; - fun - Is : Item -> Quality -> S ; - This, All : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish, Beer, Pizza : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -</PRE> -<P> -The French implementation opens <CODE>SyntaxFre</CODE> and <CODE>ParadigmsFre</CODE> -to get access to the resource libraries needed. In order to find -the libraries, a <CODE>path</CODE> directive is prepended; it is interpreted -relative to the environment variable <CODE>GF_LIB_PATH</CODE>. -</P> -<PRE> - --# -path=.:present:prelude - - concrete FoodFre of Food = open SyntaxFre,ParadigmsFre in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN (mkN "vin") ; - Beer = mkCN (mkN "bière") ; - Pizza = mkCN (mkN "pizza" feminine) ; - Cheese = mkCN (mkN "fromage" masculine) ; - Fish = mkCN (mkN "poisson") ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP (mkA "frais" "fraîche") ; - Warm = mkAP (mkA "chaud") ; - Italian = mkAP (mkA "italien") ; - Expensive = mkAP (mkA "cher") ; - Delicious = mkAP (mkA "délicieux") ; - Boring = mkAP (mkA "ennuyeux") ; - } -</PRE> -<P> -The <CODE>lincat</CODE> definitions in <CODE>FoodFre</CODE> assign <B>resource categories</B> -to <B>application categories</B>. 
In a sense, the application categories -are <B>semantic</B>, as they correspond to concepts in the grammar application, -whereas the resource categories are <B>syntactic</B>: they give the linguistic -means to express concepts in any application. -</P> -<P> -The <CODE>lin</CODE> definitions likewise assign resource functions to application -functions. Under the hood, there is a lot of matching with parameters to -take care of word order, inflection, and agreement. But the user of the -library sees nothing of this: the only parameters you need to give are -the genders of some nouns, which cannot be correctly inferred from the word. -</P> -<P> -In French, for example, the one-argument <CODE>mkN</CODE> assigns the noun the feminine -gender if and only if it ends with an <I>e</I>. Therefore the words <I>fromage</I> -(masculine despite its final <I>e</I>) and <I>pizza</I> (feminine without a final <I>e</I>) -are given explicit genders. One can of course always give genders manually, to -be on the safe side. -</P> -<P> -As for inflection, the one-argument adjective pattern <CODE>mkA</CODE> takes care of -completely regular adjectives such as <I>chaud-chaude</I>, but also of special -cases such as <I>italien-italienne</I>, <I>cher-chère</I>, and <I>délicieux-délicieuse</I>. -But it cannot form <I>frais-fraîche</I> properly. Once again, you can give more -forms to be on the safe side. You can also test the paradigms in the GF -program. -</P> -<P> -<B>Exercise</B>. Compile the grammar <CODE>FoodFre</CODE> and generate and parse some sentences. -</P> -<P> -<B>Exercise</B>. Write a concrete syntax of <CODE>Food</CODE> for English or some other language -included in the resource library. You can also compare the output with the hand-written -grammars presented earlier in this tutorial. -</P> -<P> -<B>Exercise</B>. In particular, try to write a concrete syntax for Italian, even if -you don't know Italian. What you need to know is that "beer" is <I>birra</I> and -"pizza" is <I>pizza</I>, and that all the nouns and adjectives in the grammar -are regular. 
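-</P>
-<P>
-The remarks above about the one-argument <CODE>mkN</CODE> and <CODE>mkA</CODE> can be checked
-in the GF shell. The following session is a sketch that mirrors the German session shown
-earlier in this chapter; the file name <CODE>alltenses/ParadigmsFre.gfr</CODE> is an assumption
-made by analogy with <CODE>ParadigmsGer.gfr</CODE> and may differ in your installation.
-</P>
-<PRE>
-  > i -path=alltenses:prelude -retain alltenses/ParadigmsFre.gfr
-  > cc mkN "fromage"
-  > cc mkN "fromage" masculine
-  > cc mkA "frais"
-  > cc mkA "frais" "fraîche"
-</PRE>
-<P>
-Comparing the one-argument result with the result of the call that gives the gender or
-the feminine form explicitly shows where the default paradigm is not enough.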
-</P> -<A NAME="toc71"></A> -<H2>Functor implementation of multilingual grammars</H2> -<P> -If you did the exercise of writing a concrete syntax of <CODE>Food</CODE> for some other -language, you probably noticed that much of the code looks exactly the same -as for French. The immediate reason for this is that the <CODE>Syntax</CODE> API is the -same for all languages; the deeper reason is that all languages (at least those -in the resource package) implement the same syntactic structures and tend to use them -in similar ways. Thus it is only the lexical parts of a concrete syntax that -you need to write anew for a new language. In brief, -</P> -<UL> -<LI>first copy the concrete syntax for one language -<LI>then change the words (the strings and perhaps some paradigms) -</UL> - -<P> -But programming by copy-and-paste is not worthy of a functional programmer. -Can we write a function that takes care of the shared parts of grammar modules? -Yes, we can. It is not a function in the <CODE>fun</CODE> or <CODE>oper</CODE> sense, but -a function operating on modules, called a <B>functor</B>. This construct -is familiar from the functional languages ML and OCaml, but it does not -exist in Haskell. It also bears some resemblance to templates in C++. -Functors are also known as <B>parametrized modules</B>. -</P> -<P> -In GF, a functor is a module that <CODE>open</CODE>s one or more <B>interfaces</B>. -An <CODE>interface</CODE> is a module similar to a <CODE>resource</CODE>, but it only -contains the types of <CODE>oper</CODE>s, not their definitions. You can think -of an interface as a kind of record type. Thus a functor is a kind -of function taking records as arguments and producing a module -as value. -</P> -<P> -Let us look at a functor implementation of the <CODE>Food</CODE> grammar. 
-Consider its module header first: -</P> -<PRE> - incomplete concrete FoodI of Food = open Syntax, LexFood in -</PRE> -<P> -In the functor-function analogy, <CODE>FoodI</CODE> would be presented as a function -with the following type signature: -</P> -<PRE> - FoodI : instance of Syntax -> instance of LexFood -> concrete of Food -</PRE> -<P> -It takes as arguments instances of two interfaces: -</P> -<UL> -<LI><CODE>Syntax</CODE>, the resource grammar interface -<LI><CODE>LexFood</CODE>, the domain-specific lexicon interface -</UL> - -<P> -Functors opening <CODE>Syntax</CODE> and a domain lexicon interface are in fact -so typical in GF applications that this structure could be called a <B>design pattern</B> -for GF grammars. The idea in this pattern is, again, that -the languages use the same syntactic structures but different words. -</P> -<P> -Before going to the details of the module bodies, let us look at how functors -are concretely used. An interface has a header such as -</P> -<PRE> - interface LexFood = open Syntax in -</PRE> -<P> -To give an <CODE>instance</CODE> of it means that all <CODE>oper</CODE>s are given definitions (of -appropriate types). For example, -</P> -<PRE> - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in -</PRE> -<P> -Notice that when an interface opens an interface, such as <CODE>Syntax</CODE>, then its instance -opens an instance of it. But the instance may also open some resources - typically, -a domain lexicon instance opens a <CODE>Paradigms</CODE> module. 
-</P> -<P> -In the function-functor analogy, we now have -</P> -<PRE> - SyntaxGer : instance of Syntax - LexFoodGer : instance of LexFood -</PRE> -<P> -Thus we can complete the German implementation by "applying" the functor: -</P> -<PRE> - FoodI SyntaxGer LexFoodGer : concrete of Food -</PRE> -<P> -The GF syntax for doing so is -</P> -<PRE> - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; -</PRE> -<P> -Notice that this is the <I>complete</I> module, not just a header of it. -The module body is received from <CODE>FoodI</CODE>, by instantiating the -interface constants with their definitions given in the German -instances. -</P> -<P> -A module of this form, characterized by the keyword <CODE>with</CODE>, is -called a <B>functor instantiation</B>. -</P> -<P> -Here is the complete code for the functor <CODE>FoodI</CODE>: -</P> -<PRE> - incomplete concrete FoodI of Food = open Syntax, LexFood in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN wine_N ; - Beer = mkCN beer_N ; - Pizza = mkCN pizza_N ; - Cheese = mkCN cheese_N ; - Fish = mkCN fish_N ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP fresh_A ; - Warm = mkAP warm_A ; - Italian = mkAP italian_A ; - Expensive = mkAP expensive_A ; - Delicious = mkAP delicious_A ; - Boring = mkAP boring_A ; - } -</PRE> -<P></P> -<A NAME="toc72"></A> -<H2>Interfaces and instances</H2> -<P> -Let us now define the <CODE>LexFood</CODE> interface: -</P> -<PRE> - interface LexFood = open Syntax in { - oper - wine_N : N ; - beer_N : N ; - pizza_N : N ; - cheese_N : N ; - fish_N : N ; - fresh_A : A ; - warm_A : A ; - italian_A : A ; - expensive_A : A ; - delicious_A : A ; - boring_A : A ; - } -</PRE> -<P> -In this interface, only lexical items are declared. 
In general, an -interface can declare any functions and also types. The <CODE>Syntax</CODE> -interface does so. -</P> -<P> -Here is the German instance of the interface: -</P> -<PRE> - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in { - oper - wine_N = mkN "Wein" ; - beer_N = mkN "Bier" "Biere" neuter ; - pizza_N = mkN "Pizza" "Pizzen" feminine ; - cheese_N = mkN "Käse" "Käsen" masculine ; - fish_N = mkN "Fisch" ; - fresh_A = mkA "frisch" ; - warm_A = mkA "warm" "wärmer" "wärmste" ; - italian_A = mkA "italienisch" ; - expensive_A = mkA "teuer" ; - delicious_A = mkA "köstlich" ; - boring_A = mkA "langweilig" ; - } -</PRE> -<P> -Just to complete the picture, we repeat the German functor instantiation -for <CODE>FoodI</CODE>, this time with a path directive that makes it compilable. -</P> -<PRE> - --# -path=.:present:prelude - - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; -</PRE> -<P></P> -<P> -<B>Exercise</B>. Compile and test <CODE>FoodGer</CODE>. -</P> -<P> -<B>Exercise</B>. Refactor <CODE>FoodFre</CODE> into a functor instantiation. -</P> -<A NAME="toc73"></A> -<H2>Adding languages to a functor implementation</H2> -<P> -Once we have an application grammar defined by using a functor, -adding a new language is simple. Just two modules need to be written: -</P> -<UL> -<LI>a domain lexicon instance -<LI>a functor instantiation -</UL> - -<P> -The functor instantiation is completely mechanical to write. -Here is one for Finnish: -</P> -<PRE> - --# -path=.:present:prelude - - concrete FoodFin of Food = FoodI with - (Syntax = SyntaxFin), - (LexFood = LexFoodFin) ; -</PRE> -<P> -The domain lexicon instance requires some knowledge of the words of the -language: what words are used for which concepts, how the words are -inflected, plus features such as genders. 
Here is a lexicon instance for -Finnish: -</P> -<PRE> - instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in { - oper - wine_N = mkN "viini" ; - beer_N = mkN "olut" ; - pizza_N = mkN "pizza" ; - cheese_N = mkN "juusto" ; - fish_N = mkN "kala" ; - fresh_A = mkA "tuore" ; - warm_A = mkA "lämmin" ; - italian_A = mkA "italialainen" ; - expensive_A = mkA "kallis" ; - delicious_A = mkA "herkullinen" ; - boring_A = mkA "tylsä" ; - } -</PRE> -<P></P> -<P> -<B>Exercise</B>. Instantiate the functor <CODE>FoodI</CODE> to some language of -your choice. -</P> -<A NAME="toc74"></A> -<H2>Division of labour revisited</H2> -<P> -One purpose with the resource grammars was stated to be a division -of labour between linguists and application grammarians. We can now -reflect on what this means more precisely, by asking ourselves what -skills are required of grammarians working on different components. -</P> -<P> -Building a GF application starts from the abstract syntax. Writing -an abstract syntax requires -</P> -<UL> -<LI>understanding the semantic structure of the application domain -<LI>knowledge of the GF fragment with categories and functions -</UL> - -<P> -If the concrete syntax is written by means of a functor, the programmer -has to decide what parts of the implementation are put to the interface -and what parts are shared in the functor. This requires -</P> -<UL> -<LI>knowing how the domain concepts are expressed in natural language -<LI>knowledge of the resource grammar library - the categories and combinators -<LI>understanding what parts are likely to be expressed in language-dependent - ways, so that they must belong to the interface and not the functor -<LI>knowledge of the GF fragment with function applications and strings -</UL> - -<P> -Instantiating a ready-made functor to a new language is less demanding. 
-It requires essentially -</P> -<UL> -<LI>knowing how the domain words are expressed in the language -<LI>knowing, roughly, how these words are inflected -<LI>knowledge of the paradigms available in the library -<LI>knowledge of the GF fragment with function applications and strings -</UL> - -<P> -Notice that none of these tasks requires the use of GF records, tables, -or parameters. Thus only a small fragment of GF is needed; the rest of -GF is only relevant for those who write the libraries. -</P> -<P> -Of course, grammar writing is not always straightforward usage of libraries. -For example, GF can be used for other languages than just those in the -libraries - for both natural and formal languages. A knowledge of records -and tables can, unfortunately, also be needed for understanding GF's error -messages. -</P> -<P> -<B>Exercise</B>. Design a small grammar that can be used for controlling -an MP3 player. The grammar should be able to recognize commands such -as <I>play this song</I>, with the following variations: -</P> -<UL> -<LI>verbs: <I>play</I>, <I>remove</I> -<LI>objects: <I>song</I>, <I>artist</I> -<LI>determiners: <I>this</I>, <I>the previous</I> -<LI>verbs without arguments: <I>stop</I>, <I>pause</I> -</UL> - -<P> -The implementation goes in the following phases: -</P> -<OL> -<LI>abstract syntax -<LI>functor and lexicon interface -<LI>lexicon instance for the first language -<LI>functor instantiation for the first language -<LI>lexicon instance for the second language -<LI>functor instantiation for the second language -<LI>... -</OL> - -<A NAME="toc75"></A> -<H2>Restricted inheritance</H2> -<P> -A functor implementation using the resource <CODE>Syntax</CODE> interface -works as long as all concepts are expressed by using the same structures -in all languages. If this is not the case, the deviant linearization can -be made into a parameter and moved to the domain lexicon interface. 
-</P> -<P> -Let us take a slightly contrived example: assume that English has -no word for <CODE>Pizza</CODE>, but has to use the paraphrase <I>Italian pie</I>. -This paraphrase is no longer a noun <CODE>N</CODE>, but a complex phrase -in the category <CODE>CN</CODE>. An obvious way to solve this problem is -to change interface <CODE>LexFood</CODE> so that the constant declared for -<CODE>Pizza</CODE> gets a new type: -</P> -<PRE> - oper pizza_CN : CN ; -</PRE> -<P> -But this solution is unstable: we may end up changing the interface -and the functor with each new language, and we must every time also -change the interface instances for the old languages to maintain -type correctness. -</P> -<P> -A better solution is to use <B>restricted inheritance</B>: the English -instantiation inherits the functor implementation except for the -constant <CODE>Pizza</CODE>. This is how we write: -</P> -<PRE> - --# -path=.:present:prelude - - concrete FoodEng of Food = FoodI - [Pizza] with - (Syntax = SyntaxEng), - (LexFood = LexFoodEng) ** - open SyntaxEng, ParadigmsEng in { - - lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; - } -</PRE> -<P> -Restricted inheritance is available for all inherited modules. One can for -instance exclude some mushrooms and pick up just some fruit in -the <CODE>Foodmarket</CODE> example: -</P> -<PRE> - abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric] -</PRE> -<P> -A concrete syntax of <CODE>Foodmarket</CODE> must then indicate the same inheritance -restrictions. -</P> -<P> -<B>Exercise</B>. Change <CODE>FoodGer</CODE> in such a way that it says, instead of -<I>X is Y</I>, the equivalent of <I>X must be Y</I> (<I>X muss Y sein</I>). -You will have to browse the full resource API to find all -the functions needed. 
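-</P>
-<P>
-Returning to the <CODE>Foodmarket</CODE> example above: a concrete syntax repeats
-the restrictions of the abstract syntax, module by module. The following header is only
-a sketch. The module names <CODE>FruitEng</CODE> and <CODE>MushroomEng</CODE> are
-hypothetical; they assume that English concrete syntaxes of <CODE>Fruit</CODE> and
-<CODE>Mushroom</CODE> exist under these names.
-</P>
-<PRE>
-  -- header only; FruitEng and MushroomEng are assumed module names
-  concrete FoodmarketEng of Foodmarket =
-    FoodEng, FruitEng [Peach], MushroomEng - [Agaric]
-</PRE>
-<P>
-As in the abstract syntax, <CODE>[Peach]</CODE> picks up just the listed constant,
-whereas <CODE>- [Agaric]</CODE> inherits everything except the listed constant.
-</P>
-<P>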
-</P> -<A NAME="toc76"></A> -<H2>Browsing the resource with GF commands</H2> -<P> -In addition to reading the -<A HREF="../../lib/resource-1.0/synopsis.html">resource synopsis</A>, you -can find resource function combinations by using the parser. This -works because the resource library is in the end implemented as -a top-level <CODE>abstract-concrete</CODE> grammar, on which parsing -and linearization work. -</P> -<P> -Unfortunately, only English and the Scandinavian languages can be -parsed within acceptable computer resource limits when the full -resource is used. -</P> -<P> -To look for a syntax tree in the overload API by parsing, proceed as follows: -</P> -<PRE> - > $GF_LIB_PATH - > i -path=alltenses:prelude alltenses/OverLangEng.gfc - > p -cat=S -overload "this grammar is too big" - mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A)) -</PRE> -<P> -To view linearizations in all languages by parsing from English: -</P> -<PRE> - > i alltenses/langs.gfcm - > p -cat=S -lang=LangEng "this grammar is too big" | tb - UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant) - NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A))))) - Den här grammatiken är för stor - Esta gramática es demasiado grande - (Cyrillic: eta grammatika govorit des'at' jazykov) - Denne grammatikken er for stor - Questa grammatica è troppo grande - Diese Grammatik ist zu groß - Cette grammaire est trop grande - Tämä kielioppi on liian suuri - This grammar is too big - Denne grammatik er for stor -</PRE> -<P> -Unfortunately, the Russian grammar currently uses a different -character encoding from the rest and is therefore not displayed correctly -in a terminal window. 
However, the GF syntax editor does display all -examples correctly: -</P> -<PRE> - % gfeditor alltenses/langs.gfcm -</PRE> -<P> -When you have constructed the tree, you will see the following screen: -</P> -<P> -<center> -</P> -<P> - <IMG ALIGN="right" SRC="../../lib/resource-1.0/doc/10lang-small.png" BORDER="0" ALT=""> -</P> -<P> -</center> -</P> -<P> -<B>Exercise</B>. Find the resource grammar translations for the following -English phrases (parse in the category <CODE>Phr</CODE>). You can first try to -build the terms manually. -</P> -<P> -<I>every man loves a woman</I> -</P> -<P> -<I>this grammar speaks more than ten languages</I> -</P> -<P> -<I>which languages aren't in the grammar</I> -</P> -<P> -<I>which languages did you want to speak</I> -</P> -<A NAME="toc77"></A> -<H1>More concepts of abstract syntax</H1> -<A NAME="toc78"></A> -<H2>GF as a logical framework</H2> -<P> -In this section, we will show how -to encode advanced semantic concepts in an abstract syntax. -We use concepts inherited from <B>type theory</B>. Type theory -is the basis of many systems known as <B>logical frameworks</B>, which are -used for representing mathematical theorems and their proofs on a computer. -In fact, GF has a logical framework as its proper part: -this part is the abstract syntax. -</P> -<P> -In a logical framework, the formalization of a mathematical theory -is a set of type and function declarations. The following is an example -of such a theory, represented as an <CODE>abstract</CODE> module in GF. -</P> -<PRE> - abstract Arithm = { - cat - Prop ; -- proposition - Nat ; -- natural number - fun - Zero : Nat ; -- 0 - Succ : Nat -> Nat ; -- successor of x - Even : Nat -> Prop ; -- x is even - And : Prop -> Prop -> Prop ; -- A and B - } -</PRE> -<P></P> -<P> -<B>Exercise</B>. Give a concrete syntax of <CODE>Arithm</CODE>, either from scratch or -by using the resource library. 
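-</P>
-<P>
-As a possible starting point for the "from scratch" variant, here is a minimal sketch of
-a string-based concrete syntax for <CODE>Arithm</CODE>. The module name
-<CODE>ArithmEng</CODE> and the English wordings are assumptions made for illustration;
-only the categories and functions come from the abstract module above.
-</P>
-<PRE>
-  -- a sketch only: ArithmEng and the chosen wordings are illustrative
-  concrete ArithmEng of Arithm = {
-    lincat
-      Prop = {s : Str} ;
-      Nat  = {s : Str} ;
-    lin
-      Zero    = {s = "zero"} ;
-      Succ n  = {s = "the" ++ "successor" ++ "of" ++ n.s} ;
-      Even n  = {s = n.s ++ "is" ++ "even"} ;
-      And a b = {s = a.s ++ "and" ++ b.s} ;
-  }
-</PRE>
-<P>
-With these rules, the tree <CODE>Even (Succ Zero)</CODE> would be linearized as
-<I>the successor of zero is even</I>. Nested uses of <CODE>And</CODE> are ambiguous
-in this notation, since no parentheses are produced.
-</P>
-<P>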
-</P> -<A NAME="toc79"></A> -<H2>Dependent types</H2> -<P> -<B>Dependent types</B> are a characteristic feature of GF, -inherited from the <B>constructive type theory</B> of Martin-Löf and -distinguishing GF from most other grammar formalisms and -functional programming languages. -</P> -<P> -Dependent types can be used for stating stronger -<B>conditions of well-formedness</B> than ordinary types. -A simple example is a "smart house" system, which -defines voice commands for household appliances. This example -is borrowed from the -<A HREF="http://cslipublications.stanford.edu/site/1575865262.html">Regulus Book</A> -(Rayner & al. 2006). -</P> -<P> -One who enters a smart house can use speech to dim lights, switch -on the fan, etc. For each <CODE>Kind</CODE> of a device, there is a set of -<CODE>Actions</CODE> that can be performed on it; thus one can dim the lights but - not the fan, for example. These dependencies can be expressed -by making the type <CODE>Action</CODE> dependent on <CODE>Kind</CODE>. We express this -as follows in <CODE>cat</CODE> declarations: -</P> -<PRE> - cat - Command ; - Kind ; - Action Kind ; - Device Kind ; -</PRE> -<P> -The crucial use of the dependencies is made in the rule for forming commands: -</P> -<PRE> - fun CAction : (k : Kind) -> Action k -> Device k -> Command ; -</PRE> -<P> -In other words: an action and a device can be combined into a command only -if they are of the same <CODE>Kind</CODE> <CODE>k</CODE>. If we have the functions -</P> -<PRE> - DKindOne : (k : Kind) -> Device k ; -- the light - - light, fan : Kind ; - dim : Action light ; -</PRE> -<P> -we can form the syntax tree -</P> -<PRE> - CAction light dim (DKindOne light) -</PRE> -<P> -but we cannot form the trees -</P> -<PRE> - CAction light dim (DKindOne fan) - CAction fan dim (DKindOne light) - CAction fan dim (DKindOne fan) -</PRE> -<P> -Linearization rules are written as usual: the concrete syntax does not -know if a category is a dependent type. 
In English, you can write as follows: -</P> -<PRE> - lincat Action = {s : Str} ; - lin CAction kind act dev = {s = act.s ++ dev.s} ; -</PRE> -<P> -Notice that the argument <CODE>kind</CODE> does not appear in the linearization. -The type checker will be able to reconstruct it from the <CODE>dev</CODE> argument. -</P> -<P> -Parsing with dependent types is performed in two phases: -</P> -<OL> -<LI>context-free parsing -<LI>filtering through type checker -</OL> - -<P> -If you just parse in the usual way, you don't enter the second phase, and -the <CODE>kind</CODE> argument is not found: -</P> -<PRE> - > parse "dim the light" - CAction ? dim (DKindOne light) -</PRE> -<P> -Moreover, type-incorrect commands are not rejected: -</P> -<PRE> - > parse "dim the fan" - CAction ? dim (DKindOne fan) -</PRE> -<P> -The question mark <CODE>?</CODE> is a <B>metavariable</B>, and is returned by the parser -for any subtree that is suppressed by a linearization rule. -</P> -<P> -To get rid of metavariables, you must feed the parse result into the -second phase of <B>solving</B> them. The <CODE>solve</CODE> process uses the dependent -type checker to restore the values of the metavariables. It is invoked by -the command <CODE>put_tree = pt</CODE> with the flag <CODE>-transform=solve</CODE>: -</P> -<PRE> - > parse "dim the light" | put_tree -transform=solve - CAction light dim (DKindOne light) -</PRE> -<P> -The <CODE>solve</CODE> process may fail, in which case no tree is returned: -</P> -<PRE> - > parse "dim the fan" | put_tree -transform=solve - no tree found -</PRE> -<P></P> -<P> -<B>Exercise</B>. Write an abstract syntax module with above contents -and an appropriate English concrete syntax. Try to parse the commands -<I>dim the light</I> and <I>dim the fan</I>, with and without <CODE>solve</CODE> filtering. -</P> -<P> -<B>Exercise</B>. Perform random and exhaustive generation, with and without -<CODE>solve</CODE> filtering. -</P> -<P> -<B>Exercise</B>. 
Add some device kinds and actions to the grammar. -</P> -<A NAME="toc80"></A> -<H2>Polymorphism</H2> -<P> -Sometimes an action can be performed on all kinds of devices. It would be -possible to introduce separate <CODE>fun</CODE> constants for each kind-action pair, -but this would be tedious. Instead, one can use <B>polymorphic</B> actions, -i.e. actions that take a <CODE>Kind</CODE> as an argument and produce an <CODE>Action</CODE> -for that <CODE>Kind</CODE>: -</P> -<PRE> - fun switchOn, switchOff : (k : Kind) -> Action k ; -</PRE> -<P> -Functions that are not polymorphic are <B>monomorphic</B>. However, the -dichotomy into monomorphism and full polymorphism is not always sufficient -for good semantic modelling: very typically, some actions are defined -for a proper subset of devices, but not just one. For instance, both doors and -windows can be opened, whereas lights cannot. -We will return to this problem by introducing the -concept of <B>restricted polymorphism</B> later, -after a chapter on proof objects. -</P> -<A NAME="toc81"></A> -<H2>Dependent types and spoken language models</H2> -<P> -We have used dependent types to control semantic well-formedness -in grammars. This is important in traditional type theory -applications such as proof assistants, where only mathematically -meaningful formulas should be constructed. But semantic filtering has -also proved important in speech recognition, because it reduces the -ambiguity of the results. -</P> -<A NAME="toc82"></A> -<H3>Grammar-based language models</H3> -<P> -The standard way of using GF in speech recognition is by building -<B>grammar-based language models</B>. To this end, GF comes with compilers -into several formats that are used in speech recognition systems. -One such format is GSL, used in the <A HREF="http://www.nuance.com">Nuance speech recognizer</A>. -It is produced from GF simply by printing a grammar with the flag -<CODE>-printer=gsl</CODE>. 
-</P> -<PRE> - > import -conversion=finite SmartEng.gf - > print_grammar -printer=gsl - - ;GSL2.0 - ; Nuance speech recognition grammar for SmartEng - ; Generated by GF - - .MAIN SmartEng_2 - - SmartEng_0 [("switch" "off") ("switch" "on")] - SmartEng_1 ["dim" ("switch" "off") - ("switch" "on")] - SmartEng_2 [(SmartEng_0 SmartEng_3) - (SmartEng_1 SmartEng_4)] - SmartEng_3 ("the" SmartEng_5) - SmartEng_4 ("the" SmartEng_6) - SmartEng_5 "fan" - SmartEng_6 "light" -</PRE> -<P> -Now, GSL is a context-free format, so how does it cope with dependent types? -In general, dependent types can give rise to infinitely many basic types -(exercise!), whereas a context-free grammar can by definition only have -finitely many nonterminals. -</P> -<P> -This is where the flag <CODE>-conversion=finite</CODE> is needed in the <CODE>import</CODE> -command. Its effect is to convert a GF grammar with dependent types to -one without, so that each instance of a dependent type is replaced by -an atomic type. This can then be used as a nonterminal in a context-free -grammar. The <CODE>finite</CODE> conversion presupposes that every -dependent type has only finitely many instances, which is in fact -the case in the <CODE>Smart</CODE> grammar. -</P> -<P> -<B>Exercise</B>. If you have access to the Nuance speech recognizer, -test it with GF-generated language models for <CODE>SmartEng</CODE>. Do this -both with and without <CODE>-conversion=finite</CODE>. -</P> -<P> -<B>Exercise</B>. Construct an abstract syntax with infinitely many instances -of dependent types. -</P> -<A NAME="toc83"></A> -<H3>Statistical language models</H3> -<P> -An alternative to grammar-based language models are -<B>statistical language models</B> (<B>SLM</B>s). An SLM is -built from a <B>corpus</B>, i.e. a set of utterances. It specifies the -probability of each <B>n-gram</B>, i.e. sequence of <I>n</I> words. The -typical value of <I>n</I> is 2 (bigrams) or 3 (trigrams). 
-</P> -<P> -One advantage of SLMs over grammar-based models is that they are -<B>robust</B>, i.e. they can be used to recognize sequences that would -be out of the grammar or the corpus. Another advantage is that -an SLM can be built "for free" if a corpus is available. -</P> -<P> -However, collecting a corpus can require a lot of work, and writing -a grammar can be less demanding, especially with tools such as GF or -Regulus. This advantage of grammars can be combined with robustness -by creating a back-up SLM from a <B>synthesized corpus</B>. This means -simply that the grammar is used for generating such a corpus. -In GF, this can be done with the <CODE>generate_trees</CODE> command. -As with grammar-based models, the quality of the SLM is better -if meaningless utterances are excluded from the corpus. Thus -a good way to generate an SLM from a GF grammar is by using -dependent types and filtering the results through the type checker: -</P> -<PRE> - > generate_trees | put_tree -transform=solve | linearize -</PRE> -<P></P> -<P> -<B>Exercise</B>. Measure the size of the corpus generated from -<CODE>SmartEng</CODE>, with and without type checker filtering. -</P> -<A NAME="toc84"></A> -<H2>Digression: dependent types in concrete syntax</H2> -<A NAME="toc85"></A> -<H3>Variables in function types</H3> -<P> -A dependent function type needs to introduce a variable for -its argument type, as in -</P> -<PRE> - switchOff : (k : Kind) -> Action k -</PRE> -<P> -Function types <I>without</I> -variables are actually a shorthand notation: writing -</P> -<PRE> - fun PredVP : NP -> VP -> S -</PRE> -<P> -is shorthand for -</P> -<PRE> - fun PredVP : (x : NP) -> (y : VP) -> S -</PRE> -<P> -or any other naming of the variables. 
Actually the use of variables -sometimes shortens the code, since they can share a type: -</P> -<PRE> - octuple : (x,y,z,u,v,w,s,t : Str) -> Str -</PRE> -<P> -If a bound variable is not used, it can here, as elsewhere in GF, be replaced by -a wildcard: -</P> -<PRE> - octuple : (_,_,_,_,_,_,_,_ : Str) -> Str -</PRE> -<P> -A good practice for functions with many arguments of the same type -is to indicate the number of arguments: -</P> -<PRE> - octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str -</PRE> -<P> -One can also use the variables to document what each argument is expected -to provide, as is done in inflection paradigms in the resource grammar. -</P> -<PRE> - mkV : (drink,drank,drunk : Str) -> V -</PRE> -<P></P> -<A NAME="toc86"></A> -<H3>Polymorphism in concrete syntax</H3> -<P> -The <B>functional fragment</B> of GF -terms and types comprises function types, applications, lambda -abstracts, constants, and variables. This fragment is similar in -abstract and concrete syntax. In particular, -dependent types are also available in concrete syntax. -We have not made use of them yet, -but we will now look at one example of how they -can be used. -</P> -<P> -Those readers who are familiar with functional programming languages -like ML and Haskell, may already have missed <B>polymorphic</B> -functions. For instance, Haskell programmers have access to -the functions -</P> -<PRE> - const :: a -> b -> a - const c _ = c - - flip :: (a -> b -> c) -> b -> a -> c - flip f y x = f x y -</PRE> -<P> -which can be used for any given types <CODE>a</CODE>,<CODE>b</CODE>, and <CODE>c</CODE>. -</P> -<P> -The GF counterpart of polymorphic functions are <B>monomorphic</B> -functions with explicit <B>type variables</B>. 
Thus the above -definitions can be written -</P> -<PRE> - oper const :(a,b : Type) -> a -> b -> a = - \_,_,c,_ -> c ; - - oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c = - \_,_,_,f,x,y -> f y x ; -</PRE> -<P> -When the operations are used, the type checker requires -them to be equipped with all their arguments; this may be a nuisance -for a Haskell or ML programmer. -</P> -<A NAME="toc87"></A> -<H2>Proof objects</H2> -<P> -Perhaps the most well-known idea in constructive type theory is -the <B>Curry-Howard isomorphism</B>, also known as the -<B>propositions as types principle</B>. Its earliest formulations -were attempts to give semantics to the logical systems of -propositional and predicate calculus. In this section, we will consider -a more elementary example, showing how the notion of proof is useful -outside mathematics, as well. -</P> -<P> -We first define the category of unary (also known as Peano-style) -natural numbers: -</P> -<PRE> - cat Nat ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; -</PRE> -<P> -The <B>successor function</B> <CODE>Succ</CODE> generates an infinite -sequence of natural numbers, beginning from <CODE>Zero</CODE>. -</P> -<P> -We then define what it means for a number <I>x</I> to be <I>less than</I> -a number <I>y</I>. Our definition is based on two axioms: -</P> -<UL> -<LI><CODE>Zero</CODE> is less than <CODE>Succ</CODE> <I>y</I> for any <I>y</I>. -<LI>If <I>x</I> is less than <I>y</I>, then <CODE>Succ</CODE> <I>x</I> is less than <CODE>Succ</CODE> <I>y</I>. 
-</UL> - -<P> -The most straightforward way of expressing these axioms in type theory -is as typing judgements that introduce objects of a type <CODE>Less</CODE> <I>x y</I>: -</P> -<PRE> - cat Less Nat Nat ; - fun lessZ : (y : Nat) -> Less Zero (Succ y) ; - fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ; -</PRE> -<P> -Objects formed by <CODE>lessZ</CODE> and <CODE>lessS</CODE> are -called <B>proof objects</B>: they establish the truth of certain -mathematical propositions. -For instance, the fact that 2 is less than -4 has the proof object -</P> -<PRE> - lessS (Succ Zero) (Succ (Succ (Succ Zero))) - (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero))) -</PRE> -<P> -whose type is -</P> -<PRE> - Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero)))) -</PRE> -<P> -which is the formalization of the proposition that 2 is less than 4. -</P> -<P> -GF grammars can be used to provide a <B>semantic control</B> of -well-formedness of expressions. We have already seen examples of this: -the grammar of well-formed actions on household devices. By introducing proof objects -we have now added a very powerful technique of expressing semantic conditions. -</P> -<P> -A simple example of the use of proof objects is the definition of -well-formed <I>time spans</I>: a time span is expected to be from an earlier to -a later time: -</P> -<PRE> - from 3 to 8 -</PRE> -<P> -is thus well-formed, whereas -</P> -<PRE> - from 8 to 3 -</PRE> -<P> -is not. The following rules for spans impose this condition -by using the <CODE>Less</CODE> predicate: -</P> -<PRE> - cat Span ; - fun span : (m,n : Nat) -> Less m n -> Span ; -</PRE> -<P></P> -<P> -<B>Exercise</B>. Write an abstract and concrete syntax with the -concepts of this section, and experiment with it in GF. -</P> -<P> -<B>Exercise</B>. Define the notions of "even" and "odd" in terms -of proof objects. <B>Hint</B>. You need one function for proving -that 0 is even, and two other functions for propagating the -properties. 
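-</P>
-<P>
-To see how the proof argument of <CODE>span</CODE> works in practice, consider the
-span from 1 to 3. The following tree is a worked sketch that uses only the rules
-declared in this section, with the unary numerals written out in full:
-</P>
-<PRE>
-  -- the third argument is a proof object of type
-  --   Less (Succ Zero) (Succ (Succ (Succ Zero)))
-  span (Succ Zero) (Succ (Succ (Succ Zero)))
-       (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero)))
-</PRE>
-<P>
-Here <CODE>lessZ (Succ Zero)</CODE> proves that 0 is less than 2, and
-<CODE>lessS</CODE> lifts this to a proof that 1 is less than 3. For the reversed span,
-from 3 to 1, no tree can be built: both <CODE>lessZ</CODE> and <CODE>lessS</CODE>
-produce types whose second argument has the form <CODE>Succ</CODE> <I>y</I>, so the
-required subproof of <CODE>Less (Succ (Succ Zero)) Zero</CODE> does not exist.
-</P>
-<P>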
-</P> -<A NAME="toc88"></A> -<H3>Proof-carrying documents</H3> -<P> -Another possible application of proof objects is <B>proof-carrying documents</B>: -to be semantically well-formed, the abstract syntax of a document must contain a proof -of some property, although the proof is not shown in the concrete document. -Think, for instance, of small documents describing flight connections: -</P> -<P> -<I>To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.</I> -</P> -<P> -The well-formedness of this text is partly expressible by dependent typing: -</P> -<PRE> - cat - City ; - Flight City City ; - fun - Gothenburg, Frankfurt, Prague : City ; - LH3043 : Flight Gothenburg Frankfurt ; - OK0537 : Flight Frankfurt Prague ; -</PRE> -<P> -This rules out texts saying <I>take OK0537 from Gothenburg to Prague</I>. -However, there is a -further condition saying that it must be possible to -change from LH3043 to OK0537 in Frankfurt. -This can be modelled as a proof object of a suitable type, -which is required by the constructor -that connects flights. -</P> -<PRE> - cat - IsPossible (x,y,z : City)(Flight x y)(Flight y z) ; - fun - Connect : (x,y,z : City) -> - (u : Flight x y) -> (v : Flight y z) -> - IsPossible x y z u v -> Flight x z ; -</PRE> -<P></P> -<A NAME="toc89"></A> -<H2>Restricted polymorphism</H2> -<P> -In the first version of the smart house grammar <CODE>Smart</CODE>, -all Actions were either of -</P> -<UL> -<LI><B>monomorphic</B>: defined for one Kind -<LI><B>polymorphic</B>: defined for all Kinds -</UL> - -<P> -To make this scale up for new Kinds, we can refine this to -<B>restricted polymorphism</B>: defined for Kinds of a certain <B>class</B> -</P> -<P> -The notion of class can be expressed in abstract syntax -by using the Curry-Howard isomorphism as follows: -</P> -<UL> -<LI>a class is a <B>predicate</B> of Kinds - i.e. 
a type depending on Kinds -<LI>a Kind is in a class if there is a proof object of this type -</UL> - -<P> -Here is an example with switching and dimming. The classes are called -<CODE>Switchable</CODE> and <CODE>Dimmable</CODE>. -</P> -<PRE> - cat - Switchable Kind ; - Dimmable Kind ; - fun - switchable_light : Switchable light ; - switchable_fan : Switchable fan ; - dimmable_light : Dimmable light ; - - switchOn : (k : Kind) -> Switchable k -> Action k ; - dim : (k : Kind) -> Dimmable k -> Action k ; -</PRE> -<P> -One advantage of this formalization is that classes for new -actions can be added incrementally. -</P> -<P> -<B>Exercise</B>. Write a new version of the <CODE>Smart</CODE> grammar with -classes, and test it in GF. -</P> -<P> -<B>Exercise</B>. Add some actions, kinds, and classes to the grammar. -Try to port the grammar to a new language. You will probably find -out that restricted polymorphism works differently in different languages. -For instance, in Finnish not only doors but also TVs and radios -can be "opened", which means switching them on. -</P> -<A NAME="toc90"></A> -<H2>Variable bindings</H2> -<P> -Mathematical notation and programming languages have -expressions that <B>bind</B> variables. For instance, -a universally quantified proposition -</P> -<PRE> - (All x)B(x) -</PRE> -<P> -consists of the <B>binding</B> <CODE>(All x)</CODE> of the variable <CODE>x</CODE>, -and the <B>body</B> <CODE>B(x)</CODE>, where the variable <CODE>x</CODE> can have -<B>bound occurrences</B>. -</P> -<P> -Variable bindings appear in informal mathematical language as well, for -instance, -</P> -<PRE> - for all x, x is equal to x - - the function that for any numbers x and y returns the maximum of x+y - and x*y - - Let x be a natural number. Assume that x is even. Then x + 3 is odd. -</PRE> -<P> -In type theory, variable-binding expression forms can be formalized -as functions that take functions as arguments. 
The universal -quantifier is defined -</P> -<PRE> - fun All : (Ind -> Prop) -> Prop -</PRE> -<P> -where <CODE>Ind</CODE> is the type of individuals and <CODE>Prop</CODE>, -the type of propositions. If we have, for instance, the equality predicate -</P> -<PRE> - fun Eq : Ind -> Ind -> Prop -</PRE> -<P> -we may form the tree -</P> -<PRE> - All (\x -> Eq x x) -</PRE> -<P> -which corresponds to the ordinary notation -</P> -<PRE> - (All x)(x = x). -</PRE> -<P> -An abstract syntax where trees have functions as arguments, as in -the two examples above, has turned out to be precisely the right -thing for the semantics and computer implementation of -variable-binding expressions. The advantage lies in the fact that -only one variable-binding expression form is needed, the lambda abstract -<CODE>\x -> b</CODE>, and all other bindings can be reduced to it. -This makes it easier to implement mathematical theories and reason -about them, since variable binding is tricky to implement and -to reason about. The idea of using functions as arguments of -syntactic constructors is known as <B>higher-order abstract syntax</B>. -</P> -<P> -The question now arises: how to define linearization rules -for variable-binding expressions? -Let us first consider universal quantification, -</P> -<PRE> - fun All : (Ind -> Prop) -> Prop -</PRE> -<P> -We write -</P> -<PRE> - lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s} -</PRE> -<P> -to obtain the form shown above. -This linearization rule brings in a new GF concept - the <CODE>$0</CODE> -field of <CODE>B</CODE> containing a bound variable symbol. -The general rule is that, if an argument type of a function is -itself a function type <CODE>A -> C</CODE>, the linearization type of -this argument is the linearization type of <CODE>C</CODE> -together with a new field <CODE>$0 : Str</CODE>. 
In the linearization rule -for <CODE>All</CODE>, the argument <CODE>B</CODE> thus has the linearization -type -</P> -<PRE> - {$0 : Str ; s : Str}, -</PRE> -<P> -since the linearization type of <CODE>Prop</CODE> is -</P> -<PRE> - {s : Str} -</PRE> -<P> -In other words, the linearization of a function -consists of a linearization of the body together with a -field for a linearization of the bound variable. -Those familiar with type theory or lambda calculus -should notice that GF requires trees to be in -<B>eta-expanded</B> form in order to be linearizable: -any function of type -</P> -<PRE> - A -> B -</PRE> -<P> -always has a syntax tree of the form -</P> -<PRE> - \x -> b -</PRE> -<P> -where <CODE>b : B</CODE> under the assumption <CODE>x : A</CODE>. -It is in this form that an expression can be analysed -as having a bound variable and a body. -</P> -<P> -Given the linearization rule -</P> -<PRE> - lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"} -</PRE> -<P> -the linearization of -</P> -<PRE> - \x -> Eq x x -</PRE> -<P> -is the record -</P> -<PRE> - {$0 = "x" ; s = ["( x = x )"]} -</PRE> -<P> -Thus we can compute the linearization of the formula, -</P> -<PRE> - All (\x -> Eq x x) --> {s = ["( All x ) ( x = x )"]}. -</PRE> -<P> -How did we get the <I>linearization</I> of the variable <CODE>x</CODE> -into the string <CODE>"x"</CODE>? GF grammars have no rules for -this: it is just hard-wired in GF that variable symbols are -linearized into the same strings that represent them in -the print-out of the abstract syntax. -</P> -<P> -To be able to <I>parse</I> variable symbols, however, GF needs to know what -to look for (instead of e.g. trying to parse <I>any</I> -string as a variable). What strings are parsed as variable symbols -is defined in the lexical analysis part of GF parsing: -</P> -<PRE> - > p -cat=Prop -lexer=codevars "(All x)(x = x)" - All (\x -> Eq x x) -</PRE> -<P> -(see more details on lexers below). 
If several variables are bound in the -same argument, the labels are <CODE>$0, $1, $2</CODE>, etc. -</P> -<P> -<B>Exercise</B>. Write an abstract syntax of the whole -<B>predicate calculus</B>, with the -<B>connectives</B> "and", "or", "implies", and "not", and the -<B>quantifiers</B> "exists" and "for all". Use higher-order functions -to guarantee that unbound variables do not occur. -</P> -<P> -<B>Exercise</B>. Write a concrete syntax for your favourite -notation of predicate calculus. Use LaTeX as target language -if you want nice output. You can also try producing Haskell boolean -expressions. Use as many parentheses as you need to -guarantee non-ambiguity. -</P> -<A NAME="toc91"></A> -<H2>Semantic definitions</H2> -<P> -We have seen that, -just like functional programming languages, GF has declarations -of functions, telling what the type of a function is. -But we have not yet shown how to <B>compute</B> -these functions: all we can do is provide them with arguments -and linearize the resulting terms. -Since our main interest is the well-formedness of expressions, -this has not yet bothered -us very much. As we will see, however, computation does play a role -even in the well-formedness of expressions when dependent types are -present. -</P> -<P> -GF has a form of judgement for <B>semantic definitions</B>, -recognized by the keyword <CODE>def</CODE>. At its simplest, it is just -the definition of one constant, e.g. -</P> -<PRE> - def one = Succ Zero ; -</PRE> -<P> -We can also define a function with arguments, -</P> -<PRE> - def Neg A = Impl A Abs ; -</PRE> -<P> -which is still a special case of the most general notion of -definition, that of a group of <B>pattern equations</B>: -</P> -<PRE> - def - sum x Zero = x ; - sum x (Succ y) = Succ (sum x y) ; -</PRE> -<P> -To compute a term is, as in functional programming languages, -simply to follow a chain of reductions until no definition -can be applied.
For instance, we compute -</P> -<PRE> - sum one one --> - sum (Succ Zero) (Succ Zero) --> - Succ (sum (Succ Zero) Zero) --> - Succ (Succ Zero) -</PRE> -<P> -Computation in GF is performed with the <CODE>pt</CODE> command and the -<CODE>compute</CODE> transformation, e.g. -</P> -<PRE> - > p -tr "1 + 1" | pt -transform=compute -tr | l - sum one one - Succ (Succ Zero) - s(s(0)) -</PRE> -<P></P> -<P> -The <CODE>def</CODE> definitions of a grammar induce a notion of -<B>definitional equality</B> among trees: two trees are -definitionally equal if they compute into the same tree. -Thus, trivially, all trees in a chain of computation -(such as the one above) -are definitionally equal to each other. So are the trees -</P> -<PRE> - sum Zero (Succ one) - Succ one - sum (sum Zero Zero) (sum (Succ Zero) one) -</PRE> -<P> -and infinitely many other trees. -</P> -<P> -A fact that has to be emphasized about <CODE>def</CODE> definitions is that -they are <I>not</I> performed as a first step of linearization. -We say that <B>linearization is intensional</B>, which means that -the definitional equality of two trees does not imply that -they have the same linearizations. For instance, each of the seven terms -shown above has a different linearization in arithmetic notation: -</P> -<PRE> - 1 + 1 - s(0) + s(0) - s(s(0) + 0) - s(s(0)) - 0 + s(0) - s(1) - 0 + 0 + s(0) + 1 -</PRE> -<P> -This notion of intensionality is -no more exotic than the intensionality of any <B>pretty-printing</B> -function of a programming language (a function that shows -the expressions of the language as strings). It is vital for -pretty-printing to be intensional in this sense - if we want, -for instance, to trace a chain of computation by pretty-printing each -intermediate step, what we want to see is a sequence of different -expressions, which are definitionally equal. -</P> -<P> -What is more exotic is that GF has two ways of referring to the -abstract syntax objects.
In the concrete syntax, the reference is intensional. -In the abstract syntax, the reference is extensional, since -<B>type checking is extensional</B>. The reason is that, -in the type theory with dependent types, types may depend on terms. -Two types depending on terms that are definitionally equal are -equal types. For instance, -</P> -<PRE> - Proof (Odd one) - Proof (Odd (Succ Zero)) -</PRE> -<P> -are equal types. Hence, any tree that type checks as a proof that -1 is odd also type checks as a proof that the successor of 0 is odd. -(Recall, in this connection, that the -arguments a category depends on never play any role -in the linearization of trees of that category, -nor in the definition of the linearization type.) -</P> -<P> -In addition to computation, definitions impose a -<B>paraphrase</B> relation on expressions: -two strings are paraphrases if they -are linearizations of trees that are -definitionally equal. -Paraphrases are sometimes interesting for -translation: the <B>direct translation</B> -of a string, which is the linearization of the same tree -in the target language, may be inadequate because it is e.g. -unidiomatic or ambiguous. In such a case, -the translation algorithm may be made to consider -translation by a paraphrase. -</P> -<P> -To express the distinction between -<B>constructors</B> (=<B>canonical</B> functions) -and other functions, GF has a judgement form -<CODE>data</CODE> to tell that certain functions are canonical, e.g. -</P> -<PRE> - data Nat = Succ | Zero ; -</PRE> -<P> -Unlike in Haskell, but similarly to ALF (where constructor functions -are marked with a flag <CODE>C</CODE>), -new constructors can be added to -a type with new <CODE>data</CODE> judgements. The type signatures of constructors -are given separately, in ordinary <CODE>fun</CODE> judgements.
-One can also write directly -</P> -<PRE> - data Succ : Nat -> Nat ; -</PRE> -<P> -which is equivalent to the two judgements -</P> -<PRE> - fun Succ : Nat -> Nat ; - data Nat = Succ ; -</PRE> -<P></P> -<P> -<B>Exercise</B>. Implement an interpreter of a small functional programming -language with natural numbers, lists, pairs, lambdas, etc. Use higher-order -abstract syntax with semantic definitions. As target language, use -your favourite programming language. -</P> -<P> -<B>Exercise</B>. To make your interpreted language look nice, use -<B>precedences</B> instead of putting parentheses everywhere. -You can use the <A HREF="../../lib/prelude/Precedence.gf">precedence library</A> -of GF to facilitate this. -</P> -<A NAME="toc92"></A> -<H1>Practical issues</H1> -<A NAME="toc93"></A> -<H2>Lexers and unlexers</H2> -<P> -Lexers and unlexers can be chosen from -a list of predefined ones, using the flags <CODE>-lexer</CODE> and <CODE>-unlexer</CODE>, either -in the grammar file or on the GF command line. Here are some often-used lexers -and unlexers: -</P> -<PRE> - The default is words. - -lexer=words tokens are separated by spaces or newlines - -lexer=literals like words, but GF integer and string literals recognized - -lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta - -lexer=chars each character is a token - -lexer=code use Haskell's lex - -lexer=codevars like code, but treat unknown words as variables, ?? as meta - -lexer=text with conventions on punctuation and capital letters - -lexer=codelit like code, but treat unknown words as string literals - -lexer=textlit like text, but treat unknown words as string literals - - The default is unwords.
- -unlexer=unwords space-separated token list (like unwords) - -unlexer=text format as text: punctuation, capitals, paragraph &lt;p&gt; - -unlexer=code format as code (spacing, indentation) - -unlexer=textlit like text, but remove string literal quotes - -unlexer=codelit like code, but remove string literal quotes - -unlexer=concat remove all spaces -</PRE> -<P> -More options can be found by <CODE>help -lexer</CODE> and <CODE>help -unlexer</CODE>. -</P> -<A NAME="toc94"></A> -<H2>Speech input and output</H2> -<P> -The <CODE>speak_aloud = sa</CODE> command sends a string to the speech -synthesizer -<A HREF="http://www.speech.cs.cmu.edu/flite/doc/">Flite</A>. -It is typically used via a pipe: -</P> -<PRE> - generate_random | linearize | speak_aloud -</PRE> -<P> -The result is only satisfactory for English. -</P> -<P> -The <CODE>speech_input = si</CODE> command receives a string from a -speech recognizer that requires the installation of -<A HREF="http://mi.eng.cam.ac.uk/~sjy/software.htm">ATK</A>. -It is typically used to pipe input to a parser: -</P> -<PRE> - speech_input -tr | parse -</PRE> -<P> -The method works only for grammars of English. -</P> -<P> -Both Flite and ATK are freely available through the links -above, but they are not distributed together with GF. -</P> -<A NAME="toc95"></A> -<H2>Multilingual syntax editor</H2> -<P> -The -<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">Editor User Manual</A> -describes the use of the editor, which works for any multilingual GF grammar. -</P> -<P> -Here is a snapshot of the editor: -</P> -<P> -<center> -</P> -<P> -<IMG ALIGN="middle" SRC="../quick-editor.png" BORDER="0" ALT=""> -</P> -<P> -</center> -</P> -<P> -The grammars of the snapshot are from the -<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>.
-</P> -<A NAME="toc96"></A> -<H2>Communicating with GF</H2> -<P> -Other processes can communicate with the GF command interpreter, -and also with the GF syntax editor. Useful flags when invoking GF are -</P> -<UL> -<LI><CODE>-batch</CODE> suppresses the prompts and structures the communication with XML tags. -<LI><CODE>-s</CODE> suppresses non-output non-error messages and XML tags. -<LI><CODE>-nocpu</CODE> suppresses CPU time indication. -</UL> - -<P> -Thus the most silent way to invoke GF is -</P> -<PRE> - gf -batch -s -nocpu -</PRE> -<P></P> -<A NAME="toc97"></A> -<H1>Embedded grammars in Haskell and Java</H1> -<P> -GF grammars can be used as parts of programs written in the -following languages. We will go through a skeleton application in -Haskell, while the next chapter will show how to build an -application in Java. -</P> -<P> -We will show how to build a minimal resource grammar -application whose architecture scales up to much -larger applications. The application is run from the -shell by the command -</P> -<PRE> - math -</PRE> -<P> -after which it reads user input in English and French. -It answers each input line with the truth value of -the sentence. -</P> -<PRE> - ./math - zéro est pair - True - zero is odd - False - zero is even and zero is odd - False -</PRE> -<P> -The source of the application consists of the following -files: -</P> -<PRE> - LexEng.gf -- English instance of Lex - LexFre.gf -- French instance of Lex - Lex.gf -- lexicon interface - Makefile -- a makefile - MathEng.gf -- English instantiation of MathI - MathFre.gf -- French instantiation of MathI - Math.gf -- abstract syntax - MathI.gf -- concrete syntax functor for Math - Run.hs -- Haskell Main module -</PRE> -<P> -The system was built in 22 steps explained below. -</P> -<A NAME="toc98"></A> -<H2>Writing GF grammars</H2> -<A NAME="toc99"></A> -<H3>Creating the first grammar</H3> -<P> -1. Write <CODE>Math.gf</CODE>, which defines what you want to say.
-</P> -<PRE> - abstract Math = { - cat Prop ; Elem ; - fun - And : Prop -> Prop -> Prop ; - Even : Elem -> Prop ; - Zero : Elem ; - } -</PRE> -<P> -2. Write <CODE>Lex.gf</CODE>, which defines which language-dependent -parts are needed in the concrete syntax. These are mostly -words (lexicon), but can in fact be any operations. The definitions -only use resource abstract syntax, which is opened. -</P> -<PRE> - interface Lex = open Syntax in { - oper - even_A : A ; - zero_PN : PN ; - } -</PRE> -<P> -3. Write <CODE>LexEng.gf</CODE>, the English implementation of <CODE>Lex.gf</CODE>. -This module uses English resource libraries. -</P> -<PRE> - instance LexEng of Lex = open SyntaxEng, ParadigmsEng in { - oper - even_A = mkA "even" ; - zero_PN = mkPN "zero" ; - } -</PRE> -<P> -4. Write <CODE>MathI.gf</CODE>, a language-independent concrete syntax of -<CODE>Math.gf</CODE>. It opens interfaces, -which makes it an incomplete module, aka. parametrized module, aka. -functor. -</P> -<PRE> - incomplete concrete MathI of Math = - - open Syntax, Lex in { - - flags startcat = Prop ; - - lincat - Prop = S ; - Elem = NP ; - lin - And x y = mkS and_Conj x y ; - Even x = mkS (mkCl x even_A) ; - Zero = mkNP zero_PN ; - } -</PRE> -<P> -5. Write <CODE>MathEng.gf</CODE>, which is just an instantiation of <CODE>MathI.gf</CODE>, -replacing the interfaces by their English instances. This is the module -that will be used as a top module in GF, so it contains a path to -the libraries. -</P> -<PRE> - --# -path=.:present:prelude - - concrete MathEng of Math = MathI with - (Syntax = SyntaxEng), - (Lex = LexEng) ; -</PRE> -<P></P> -<A NAME="toc100"></A> -<H3>Testing</H3> -<P> -6. Test the grammar in GF by random generation and parsing.
-</P> -<PRE> - $ gf - > i MathEng.gf - > gr -tr | l -tr | p - And (Even Zero) (Even Zero) - zero is even and zero is even - And (Even Zero) (Even Zero) -</PRE> -<P> -Importing the grammar will fail if you haven't -</P> -<UL> -<LI>correctly defined your <CODE>GF_LIB_PATH</CODE> as <CODE>GF/lib</CODE> -<LI>installed the resource package or - compiled the resource from source by <CODE>make</CODE> in <CODE>GF/lib/resource-1.0</CODE> -</UL> - -<A NAME="toc101"></A> -<H3>Adding a new language</H3> -<P> -7. Now it is time to add a new language. Write a French lexicon <CODE>LexFre.gf</CODE>: -</P> -<PRE> - instance LexFre of Lex = open SyntaxFre, ParadigmsFre in { - oper - even_A = mkA "pair" ; - zero_PN = mkPN "zéro" ; - } -</PRE> -<P> -8. You also need a French concrete syntax, <CODE>MathFre.gf</CODE>: -</P> -<PRE> - --# -path=.:present:prelude - - concrete MathFre of Math = MathI with - (Syntax = SyntaxFre), - (Lex = LexFre) ; -</PRE> -<P> -9. This time, you can test multilingual generation: -</P> -<PRE> - > i MathFre.gf - > gr | tb - Even Zero - zéro est pair - zero is even -</PRE> -<P></P> -<A NAME="toc102"></A> -<H3>Extending the language</H3> -<P> -10. You want to add a predicate saying that a number is odd. -It is first added to <CODE>Math.gf</CODE>: -</P> -<PRE> - fun Odd : Elem -> Prop ; -</PRE> -<P> -11. You need a new word in <CODE>Lex.gf</CODE>. -</P> -<PRE> - oper odd_A : A ; -</PRE> -<P> -12. Then you can give a language-independent concrete syntax in -<CODE>MathI.gf</CODE>: -</P> -<PRE> - lin Odd x = mkS (mkCl x odd_A) ; -</PRE> -<P> -13. The new word is implemented in <CODE>LexEng.gf</CODE>. -</P> -<PRE> - oper odd_A = mkA "odd" ; -</PRE> -<P> -14. The new word is implemented in <CODE>LexFre.gf</CODE>. -</P> -<PRE> - oper odd_A = mkA "impair" ; -</PRE> -<P> -15. Now you can test with the extended lexicon. First empty -the environment to get rid of the old abstract syntax, then -import the new versions of the grammars.
-</P> -<PRE> - > e - > i MathEng.gf - > i MathFre.gf - > gr | tb - And (Odd Zero) (Even Zero) - zéro est impair et zéro est pair - zero is odd and zero is even -</PRE> -<P></P> -<A NAME="toc103"></A> -<H2>Building a user program</H2> -<A NAME="toc104"></A> -<H3>Producing a compiled grammar package</H3> -<P> -16. Your grammar is going to be used by persons who do not need -to compile it again. They may not have access to the resource library, -either. Therefore it is advisable to produce a multilingual grammar -package in a single file. We call this package <CODE>math.gfcm</CODE> and -produce it, when we have <CODE>MathEng.gf</CODE> and -<CODE>MathFre.gf</CODE> in the GF state, by the command -</P> -<PRE> - > pm | wf math.gfcm -</PRE> -<P></P> -<A NAME="toc105"></A> -<H3>Writing the Haskell application</H3> -<P> -17. Write the Haskell main file <CODE>Run.hs</CODE>. It uses the <CODE>EmbedAPI</CODE> -module defining some basic functionalities such as parsing. -The answer is produced by an interpreter of trees returned by the parser. -</P> -<PRE> - module Main where - - import GSyntax - import GF.Embed.EmbedAPI - - main :: IO () - main = do - gr <- file2grammar "math.gfcm" - loop gr - - loop :: MultiGrammar -> IO () - loop gr = do - s <- getLine - interpret gr s - loop gr - - interpret :: MultiGrammar -> String -> IO () - interpret gr s = do - let tss = parseAll gr "Prop" s - case (concat tss) of - [] -> putStrLn "no parse" - t:_ -> print $ answer $ fg t - - answer :: GProp -> Bool - answer p = case p of - (GOdd x1) -> odd (value x1) - (GEven x1) -> even (value x1) - (GAnd x1 x2) -> answer x1 && answer x2 - - value :: GElem -> Int - value e = case e of - GZero -> 0 -</PRE> -<P></P> -<P> -18. The syntax trees manipulated by the interpreter are not raw -GF trees, but objects of the Haskell datatype <CODE>GProp</CODE>.
-From any GF grammar, a file <CODE>GSyntax.hs</CODE> with -datatypes corresponding to its abstract -syntax can be produced by the command -</P> -<PRE> - > pg -printer=haskell | wf GSyntax.hs -</PRE> -<P> -The module also defines the overloaded functions -<CODE>gf</CODE> and <CODE>fg</CODE> for translating from these types to -raw trees and back. -</P> -<A NAME="toc106"></A> -<H3>Compiling the Haskell grammar</H3> -<P> -19. Before compiling <CODE>Run.hs</CODE>, you must check that the -embedded GF modules are found. The easiest way to do this -is by two symbolic links to your GF source directories: -</P> -<PRE> - $ ln -s /home/aarne/GF/src/GF - $ ln -s /home/aarne/GF/src/Transfer/ -</PRE> -<P></P> -<P> -20. Now you can run the GHC Haskell compiler to produce the program. -</P> -<PRE> - $ ghc --make -o math Run.hs -</PRE> -<P> -The program can be tested with the command <CODE>./math</CODE>. -</P> -<A NAME="toc107"></A> -<H3>Building a distribution</H3> -<P> -21. For a stand-alone binary-only distribution, only -the two files <CODE>math</CODE> and <CODE>math.gfcm</CODE> are needed. -For a source distribution, the files mentioned in -the beginning of this document are needed. -</P> -<A NAME="toc108"></A> -<H3>Using a Makefile</H3> -<P> -22. As a part of the source distribution, a <CODE>Makefile</CODE> is -essential. The <CODE>Makefile</CODE> is also useful when developing the -application. It should always be possible to build an executable -from source by typing <CODE>make</CODE>.
Here is such a minimal <CODE>Makefile</CODE>: -</P> -<PRE> - all: - echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf - echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm - ghc --make -o math Run.hs -</PRE> -<P></P> -<A NAME="toc109"></A> -<H1>Embedded grammars in Java</H1> -<P> -Forthcoming; at the moment, the document -</P> -<P> - <A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html"><CODE>http://www.cs.chalmers.se/~bringert/gf/gf-java.html</CODE></A> -</P> -<P> -by Björn Bringert gives more information on Java. -</P> -<A NAME="toc110"></A> -<H1>Further reading</H1> -<P> -Syntax Editor User Manual: -</P> -<P> -<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm"><CODE>http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm</CODE></A> -</P> -<P> -Resource Grammar Synopsis (on using resource grammars): -</P> -<P> -<A HREF="../../lib/resource-1.0/synopsis.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/synopsis.html</CODE></A> -</P> -<P> -Resource Grammar HOWTO (on writing resource grammars): -</P> -<P> -<A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html"><CODE>http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html</CODE></A> -</P> -<P> -GF Homepage: -</P> -<P> -<A HREF="../.."><CODE>http://www.cs.chalmers.se/~aarne/GF/doc</CODE></A> -</P> - -<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) --> -<!-- cmdline: txt2tags -thtml -\-toc gf-tutorial2.txt --> -</BODY></HTML> |
