| author | aarne <aarne@cs.chalmers.se> | 2006-06-15 23:05:42 +0000 |
|---|---|---|
| committer | aarne <aarne@cs.chalmers.se> | 2006-06-15 23:05:42 +0000 |
| commit | cb3dfbd9bf54f9b3cf403ba5e1629bf7fff132f4 (patch) | |
| tree | b3987feacd3b9ca8db55bd906de0698ac6582f77 /doc/tutorial/gf-tutorial2.html | |
| parent | a25c73cb1ae11c5a249ccd1466bf91bc2965f145 (diff) | |
updated tutorial and resource howto
Diffstat (limited to 'doc/tutorial/gf-tutorial2.html')
| -rw-r--r-- | doc/tutorial/gf-tutorial2.html | 496 |
1 files changed, 378 insertions, 118 deletions
diff --git a/doc/tutorial/gf-tutorial2.html b/doc/tutorial/gf-tutorial2.html index 00caa1d58..d657f7cc8 100644 --- a/doc/tutorial/gf-tutorial2.html +++ b/doc/tutorial/gf-tutorial2.html @@ -7,7 +7,7 @@ <P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1> <FONT SIZE="4"> <I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR> -Last update: Wed Jan 25 16:03:03 2006 +Last update: Fri Jun 16 01:02:28 2006 </FONT></CENTER> <P></P> @@ -34,7 +34,7 @@ Last update: Wed Jan 25 16:03:03 2006 <LI><A HREF="#toc15">Labelled context-free grammars</A> <LI><A HREF="#toc16">The labelled context-free format</A> </UL> - <LI><A HREF="#toc17">The ``.gf`` grammar format</A> + <LI><A HREF="#toc17">The .gf grammar format</A> <UL> <LI><A HREF="#toc18">Abstract and concrete syntax</A> <LI><A HREF="#toc19">Judgement forms</A> @@ -70,8 +70,8 @@ Last update: Wed Jan 25 16:03:03 2006 <UL> <LI><A HREF="#toc42">Parameters and tables</A> <LI><A HREF="#toc43">Inflection tables, paradigms, and ``oper`` definitions</A> - <LI><A HREF="#toc44">Worst-case macros and data abstraction</A> - <LI><A HREF="#toc45">A system of paradigms using ``Prelude`` operations</A> + <LI><A HREF="#toc44">Worst-case functions and data abstraction</A> + <LI><A HREF="#toc45">A system of paradigms using Prelude operations</A> <LI><A HREF="#toc46">An intelligent noun paradigm using ``case`` expressions</A> <LI><A HREF="#toc47">Pattern matching</A> <LI><A HREF="#toc48">Morphological ``resource`` modules</A> @@ -96,34 +96,41 @@ Last update: Wed Jan 25 16:03:03 2006 <LI><A HREF="#toc63">Prefix-dependent choices</A> <LI><A HREF="#toc64">Predefined types and operations</A> </UL> - <LI><A HREF="#toc65">More features of the module system</A> + <LI><A HREF="#toc65">More concepts of abstract syntax</A> <UL> - <LI><A HREF="#toc66">Interfaces, instances, and functors</A> - <LI><A HREF="#toc67">Resource grammars and their reuse</A> - <LI><A HREF="#toc68">Restricted inheritance and qualified opening</A> + <LI><A 
HREF="#toc66">GF as a logical framework</A> + <LI><A HREF="#toc67">Dependent types</A> + <LI><A HREF="#toc68">Higher-order abstract syntax</A> + <LI><A HREF="#toc69">Semantic definitions</A> + <LI><A HREF="#toc70">List categories</A> </UL> - <LI><A HREF="#toc69">More concepts of abstract syntax</A> + <LI><A HREF="#toc71">More features of the module system</A> <UL> - <LI><A HREF="#toc70">Dependent types</A> - <LI><A HREF="#toc71">Higher-order abstract syntax</A> - <LI><A HREF="#toc72">Semantic definitions</A> - <LI><A HREF="#toc73">List categories</A> + <LI><A HREF="#toc72">Interfaces, instances, and functors</A> + <LI><A HREF="#toc73">Resource grammars and their reuse</A> + <LI><A HREF="#toc74">Restricted inheritance and qualified opening</A> </UL> - <LI><A HREF="#toc74">Transfer modules</A> - <LI><A HREF="#toc75">Practical issues</A> + <LI><A HREF="#toc75">Using the standard resource library</A> <UL> - <LI><A HREF="#toc76">Lexers and unlexers</A> - <LI><A HREF="#toc77">Efficiency of grammars</A> - <LI><A HREF="#toc78">Speech input and output</A> - <LI><A HREF="#toc79">Multilingual syntax editor</A> - <LI><A HREF="#toc80">Interactive Development Environment (IDE)</A> - <LI><A HREF="#toc81">Communicating with GF</A> - <LI><A HREF="#toc82">Embedded grammars in Haskell, Java, and Prolog</A> - <LI><A HREF="#toc83">Alternative input and output grammar formats</A> + <LI><A HREF="#toc76">The simplest way</A> + <LI><A HREF="#toc77">How to find resource functions</A> + <LI><A HREF="#toc78">A functor implementation</A> </UL> - <LI><A HREF="#toc84">Case studies</A> + <LI><A HREF="#toc79">Transfer modules</A> + <LI><A HREF="#toc80">Practical issues</A> <UL> - <LI><A HREF="#toc85">Interfacing formal and natural languages</A> + <LI><A HREF="#toc81">Lexers and unlexers</A> + <LI><A HREF="#toc82">Efficiency of grammars</A> + <LI><A HREF="#toc83">Speech input and output</A> + <LI><A HREF="#toc84">Multilingual syntax editor</A> + <LI><A HREF="#toc85">Interactive Development 
Environment (IDE)</A> + <LI><A HREF="#toc86">Communicating with GF</A> + <LI><A HREF="#toc87">Embedded grammars in Haskell, Java, and Prolog</A> + <LI><A HREF="#toc88">Alternative input and output grammar formats</A> + </UL> + <LI><A HREF="#toc89">Case studies</A> + <UL> + <LI><A HREF="#toc90">Interfacing formal and natural languages</A> </UL> </UL> @@ -222,7 +229,8 @@ These grammars can be used as <B>libraries</B> to define application grammars. In this way, it is possible to write a high-quality grammar without knowing about linguistics: in general, to write an application grammar by using the resource library just requires practical knowledge of -the target language. +the target language. and all theoretical knowledge about its grammar +is given by the libraries. </P> <A NAME="toc4"></A> <H3>Who is this tutorial for</H3> @@ -258,9 +266,10 @@ notation (also known as BNF). The BNF format is often a good starting point for GF grammar development, because it is simple and widely used. However, the BNF format is not good for multilingual grammars. While it is possible to -translate the words contained in a BNF grammar to another -language, proper translation usually involves more, e.g. -changing the word order in +"translate" by just changing the words contained in a +BNF grammar to words of some other +language, proper translation usually involves more. +For instance, the order of words may have to be changed: </P> <PRE> Italian cheese ===> formaggio italiano @@ -279,14 +288,14 @@ Italian adjectives usually have four forms where English has just one: </P> <PRE> - delicious (wine | wines | pizza | pizzas) + delicious (wine, wines, pizza, pizzas) vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose </PRE> <P> The <B>morphology</B> of a language describes the forms of its words. While the complete description of morphology -belongs to resource grammars, the tutorial will explain the -main programming concepts involved. 
This will moreover +belongs to resource grammars, this tutorial will explain the +programming concepts involved in morphology. This will moreover make it possible to grow the fragment covered by the food example. The tutorial will in fact build a toy resource grammar in order to illustrate the module structure of library-based application @@ -584,7 +593,7 @@ a sentence but a sequence of ten sentences. <H3>Labelled context-free grammars</H3> <P> The syntax trees returned by GF's parser in the previous examples -are not so nice to look at. The identifiers of form <CODE>Mks</CODE> +are not so nice to look at. The identifiers that form the tree are <B>labels</B> of the BNF rules. To see which label corresponds to which rule, you can use the <CODE>print_grammar = pg</CODE> command with the <CODE>printer</CODE> flag set to <CODE>cf</CODE> (which means context-free): @@ -631,7 +640,7 @@ labels to each rule. In files with the suffix <CODE>.cf</CODE>, you can prefix rules with labels that you provide yourself - these may be more useful than the automatically generated ones. The following is a possible -labelling of <CODE>paleolithic.cf</CODE> with nicer-looking labels. +labelling of <CODE>food.cf</CODE> with nicer-looking labels. </P> <PRE> Is. S ::= Item "is" Quality ; @@ -661,7 +670,7 @@ With this grammar, the trees look as follows: <IMG ALIGN="middle" SRC="Tree2.png" BORDER="0" ALT=""> </P> <A NAME="toc17"></A> -<H2>The ``.gf`` grammar format</H2> +<H2>The .gf grammar format</H2> <P> To see what there is in GF's shell state when a grammar has been imported, you can give the plain command @@ -696,7 +705,7 @@ A GF grammar consists of two main parts: </UL> <P> -The EBNF and CF formats fuse these two things together, but it is possible +The CF format fuses these two things together, but it is possible to take them apart. 
For instance, the sentence formation rule </P> <PRE> @@ -773,7 +782,7 @@ judgement forms: <P> We return to the precise meanings of these judgement forms later. First we will look at how judgements are grouped into modules, and -show how the paleolithic grammar is +show how the food grammar is expressed by using modules and judgements. </P> <A NAME="toc20"></A> @@ -950,7 +959,7 @@ A system with this property is called a <B>multilingual grammar</B>. </P> <P> Multilingual grammars can be used for applications such as -translation. Let us buid an Italian concrete syntax for +translation. Let us build an Italian concrete syntax for <CODE>Food</CODE> and then test the resulting multilingual grammar. </P> @@ -1179,10 +1188,11 @@ The graph uses <LI>square boxes for concrete modules <LI>black-headed arrows for inheritance <LI>white-headed arrows for the concrete-of-abstract relation -<P></P> -<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT=""> </UL> +<P> +<IMG ALIGN="middle" SRC="Foodmarket.png" BORDER="0" ALT=""> +</P> <A NAME="toc34"></A> <H2>System commands</H2> <P> @@ -1203,7 +1213,7 @@ shell escape symbol <CODE>!</CODE>. The resulting graph was shown in the previou <P> The command <CODE>print_multi = pm</CODE> is used for printing the current multilingual grammar in various formats, of which the format <CODE>-printer=graph</CODE> just -shows the module dependencies. Use the <CODE>help</CODE> to see what other formats +shows the module dependencies. Use <CODE>help</CODE> to see what other formats are available: </P> <PRE> @@ -1216,9 +1226,9 @@ are available: <A NAME="toc36"></A> <H3>The golden rule of functional programming</H3> <P> -In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format still looks rather +In comparison to the <CODE>.cf</CODE> format, the <CODE>.gf</CODE> format looks rather verbose, and demands lots more characters to be written. 
You have probably -done this by the copy-paste-modify method, which is a standard way to +done this by the copy-paste-modify method, which is a common way to avoid repeating work. </P> <P> @@ -1232,8 +1242,8 @@ method. The <B>golden rule of functional programming</B> says that <P> A function separates the shared parts of different computations from the changing parts, parameters. In functional programming languages, such as -<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share muc more than in -the languages such as C and Java. +<A HREF="http://www.haskell.org">Haskell</A>, it is possible to share much more than in +languages such as C and Java. </P> <A NAME="toc37"></A> <H3>Operation definitions</H3> @@ -1283,11 +1293,8 @@ strings and records. resource StringOper = { oper SS : Type = {s : Str} ; - ss : Str -> SS = \x -> {s = x} ; - cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; - prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; } </PRE> @@ -1433,7 +1440,7 @@ forms of a word are formed. </P> <P> From GF point of view, a paradigm is a function that takes a <B>lemma</B> - -a string also known as a <B>dictionary form</B> - and returns an inflection +also known as a <B>dictionary form</B> - and returns an inflection table of desired type. Paradigms are not functions in the sense of the <CODE>fun</CODE> judgements of abstract syntax (which operate on trees and not on strings), but operations defined in <CODE>oper</CODE> judgements. @@ -1457,13 +1464,13 @@ are written together to form one <B>token</B>. Thus, for instance, </PRE> <P></P> <A NAME="toc44"></A> -<H3>Worst-case macros and data abstraction</H3> +<H3>Worst-case functions and data abstraction</H3> <P> Some English nouns, such as <CODE>mouse</CODE>, are so irregular that it makes no sense to see them as instances of a paradigm. 
Even then, it is useful to perform <B>data abstraction</B> from the definition of the type <CODE>Noun</CODE>, and introduce a constructor -operation, a <B>worst-case macro</B> for nouns: +operation, a <B>worst-case function</B> for nouns: </P> <PRE> oper mkNoun : Str -> Str -> Noun = \x,y -> { @@ -1490,7 +1497,7 @@ and instead of writing the inflection table explicitly. </P> <P> -The grammar engineering advantage of worst-case macros is that +The grammar engineering advantage of worst-case functions is that the author of the resource module may change the definitions of <CODE>Noun</CODE> and <CODE>mkNoun</CODE>, and still retain the interface (i.e. the system of type signatures) that makes it @@ -1498,7 +1505,7 @@ correct to use these functions in concrete modules. In programming terms, <CODE>Noun</CODE> is then treated as an <B>abstract datatype</B>. </P> <A NAME="toc45"></A> -<H3>A system of paradigms using ``Prelude`` operations</H3> +<H3>A system of paradigms using Prelude operations</H3> <P> In addition to the completely regular noun paradigm <CODE>regNoun</CODE>, some other frequent noun paradigms deserve to be @@ -1707,7 +1714,7 @@ The rule of subject-verb agreement in English says that the verb phrase must be inflected in the number of the subject. This means that a noun phrase (functioning as a subject), inherently <I>has</I> a number, which it passes to the verb. The verb does not -<I>have</I> a number, but must be able to receive whatever number the +<I>have</I> a number, but must be able to <I>receive</I> whatever number the subject has. This distinction is nicely represented by the different linearization types of <B>noun phrases</B> and <B>verb phrases</B>: </P> @@ -1717,7 +1724,8 @@ different linearization types of <B>noun phrases</B> and <B>verb phrases</B>: </PRE> <P> We say that the number of <CODE>NP</CODE> is an <B>inherent feature</B>, -whereas the number of <CODE>NP</CODE> is <B>parametric</B>. 
+whereas the number of <CODE>NP</CODE> is a <B>variable feature</B> (or a +<B>parametric feature</B>). </P> <P> The agreement rule itself is expressed in the linearization rule of @@ -1823,7 +1831,7 @@ Here is an example of pattern matching, the paradigm of regular adjectives. } </PRE> <P> -A constructor can have patterns as arguments. For instance, +A constructor can be used as a pattern that has patterns as arguments. For instance, the adjectival paradigm in which the two singular forms are the same, can be defined </P> @@ -1837,9 +1845,9 @@ can be defined <A NAME="toc54"></A> <H3>Morphological analysis and morphology quiz</H3> <P> -Even though in GF morphology -is mostly seen as an auxiliary of syntax, a morphology once defined -can be used on its own right. The command <CODE>morpho_analyse = ma</CODE> +Even though morphology is in GF +mostly used as an auxiliary for syntax, it +can also be useful on its own right. The command <CODE>morpho_analyse = ma</CODE> can be used to read a text and return for each word the analyses that it has in the current concrete syntax. </P> @@ -1865,11 +1873,12 @@ the category is set to be something else than <CODE>S</CODE>. For instance, Score 0/1 </PRE> <P> -Finally, a list of morphological exercises and save it in a +Finally, a list of morphological exercises can be generated +off-line saved in a file for later use, by the command <CODE>morpho_list = ml</CODE> </P> <PRE> - > morpho_list -number=25 -cat=V + > morpho_list -number=25 -cat=V | wf exx.txt </PRE> <P> The <CODE>number</CODE> flag gives the number of exercises generated. @@ -1884,25 +1893,36 @@ a sentence may place the object between the verb and the particle: <I>he switched it off</I>. </P> <P> -The first of the following judgements defines transitive verbs as +The following judgement defines transitive verbs as <B>discontinuous constituents</B>, i.e. as having a linearization -type with two strings and not just one. 
The second judgement +type with two strings and not just one. +</P> +<PRE> + lincat TV = {s : Number => Str ; part : Str} ; +</PRE> +<P> +This linearization rule shows how the constituents are separated by the object in complementization. </P> <PRE> - lincat TV = {s : Number => Str ; part : Str} ; lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ; </PRE> <P> There is no restriction in the number of discontinuous constituents (or other fields) a <CODE>lincat</CODE> may contain. The only condition is that the fields must be of finite types, i.e. built from records, tables, -parameters, and <CODE>Str</CODE>, and not functions. A mathematical result +parameters, and <CODE>Str</CODE>, and not functions. +</P> +<P> +A mathematical result about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. Moreover, -the parsing and linearization commands only give reliable results -for categories whose linearization type has a unique <CODE>Str</CODE> valued -field labelled <CODE>s</CODE>. +increases with the number of discontinuous constituents. This is +potentially a reason to avoid discontinuous constituents. +Moreover, the parsing and linearization commands only give accurate +results for categories whose linearization type has a unique <CODE>Str</CODE> +valued field labelled <CODE>s</CODE>. Therefore, discontinuous constituents +are not a good idea in top-level categories accessed by the users +of a grammar application. </P> <A NAME="toc56"></A> <H2>More constructs for concrete syntax</H2> @@ -1953,8 +1973,25 @@ can be used e.g. if a word lacks a certain form. In general, <CODE>variants</CODE> should be used cautiously. It is not recommended for modules aimed to be libraries, because the user of the library has no way to choose among the variants. -Moreover, even though <CODE>variants</CODE> admits lists of any type, -its semantics for complex types can cause surprises. 
+Moreover, <CODE>variants</CODE> is only defined for basic types (<CODE>Str</CODE> +and parameter types). The grammar compiler will admit +<CODE>variants</CODE> for any types, but it will push it to the +level of basic types in a way that may be unwanted. +For instance, German has two words meaning "car", +<I>Wagen</I>, which is Masculine, and <I>Auto</I>, which is Neuter. +However, if one writes +</P> +<PRE> + variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}} +</PRE> +<P> +this will compute to +</P> +<PRE> + {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}} +</PRE> +<P> +which will also accept erroneous combinations of strings and genders. </P> <A NAME="toc59"></A> <H3>Record extension and subtyping</H3> @@ -2039,9 +2076,6 @@ possible to write, slightly surprisingly, <A NAME="toc62"></A> <H3>Regular expression patterns</H3> <P> -(New since 7 January 2006.) -</P> -<P> To define string operations computed at compile time, such as in morphology, it is handy to use regular expression patterns: </P> @@ -2076,7 +2110,6 @@ Another example: English noun plural formation. x + "y" => x + "ies" ; _ => w + "s" } ; - </PRE> <P> Semantics: variables are always bound to the <B>first match</B>, which is the first @@ -2085,8 +2118,10 @@ in the sequence of binding lists <CODE>Match p v</CODE> defined as follows. In t </P> <PRE> Match (p1|p2) v = Match p1 v ++ Match p2 v - Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s] - Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ... + Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | + i <- [0..length s], (s1,s2) = splitAt i s] + Match p* s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... 
/= [] + Match -p v = [[]] if Match p v = [] Match c v = [[]] if c == v -- for constant and literal patterns c Match x v = [[(x,v)]] -- for variable patterns x Match x@p v = [[(x,v)]] + M if M = Match p v /= [] @@ -2097,14 +2132,18 @@ Examples: </P> <UL> <LI><CODE>x + "e" + y</CODE> matches <CODE>"peter"</CODE> with <CODE>x = "p", y = "ter"</CODE> -<LI><CODE>x@("foo"*)</CODE> matches any token with <CODE>x = ""</CODE> -<LI><CODE>x + y@("er"*)</CODE> matches <CODE>"burgerer"</CODE> with <CODE>x = "burg", y = "erer"</CODE> +<LI><CODE>x + "er"*</CODE> matches <CODE>"burgerer"</CODE> with ``x = "burg" </UL> <A NAME="toc63"></A> <H3>Prefix-dependent choices</H3> <P> -The construct exemplified in +Sometimes a token has different forms depending on the token +that follows. An example is the English indefinite article, +which is <I>an</I> if a vowel follows, <I>a</I> otherwise. +Which form is chosen can only be decided at run time, i.e. +when a string is actually build. GF has a special construct for +such tokens, the <CODE>pre</CODE> construct exemplified in </P> <PRE> oper artIndef : Str = @@ -2152,22 +2191,61 @@ they can be used as arguments. For example: -- e.g. (StreetAddress 10 "Downing Street") : Address </PRE> -<P></P> +<P> +The linearization type is <CODE>{s : Str}</CODE> for all these categories. +</P> <A NAME="toc65"></A> -<H2>More features of the module system</H2> +<H2>More concepts of abstract syntax</H2> <A NAME="toc66"></A> -<H3>Interfaces, instances, and functors</H3> +<H3>GF as a logical framework</H3> +<P> +In this section, we will show how +to encode advanced semantic concepts in an abstract syntax. +We use concepts inherited from <B>type theory</B>. Type theory +is the basis of many systems known as <B>logical frameworks</B>, which are +used for representing mathematical theorems and their proofs on a computer. +In fact, GF has a logical framework as its proper part: +this part is the abstract syntax. 
+</P> +<P> +In a logical framework, the formalization of a mathematical theory +is a set of type and function declarations. The following is an example +of such a theory, represented as an <CODE>abstract</CODE> module in GF. +</P> +<PRE> + abstract Geometry = { + cat + Line ; Point ; Circle ; -- basic types of figures + Prop ; -- proposition + fun + Parallel : Line -> Line -> Prop ; -- x is parallel to y + Centre : Circle -> Point ; -- the centre of c + } +</PRE> +<P></P> <A NAME="toc67"></A> +<H3>Dependent types</H3> +<A NAME="toc68"></A> +<H3>Higher-order abstract syntax</H3> +<A NAME="toc69"></A> +<H3>Semantic definitions</H3> +<A NAME="toc70"></A> +<H3>List categories</H3> +<A NAME="toc71"></A> +<H2>More features of the module system</H2> +<A NAME="toc72"></A> +<H3>Interfaces, instances, and functors</H3> +<A NAME="toc73"></A> <H3>Resource grammars and their reuse</H3> <P> A resource grammar is a grammar built on linguistic grounds, to describe a language rather than a domain. -The GF resource grammar library contains resource grammars for +The GF resource grammar library, which contains resource grammars for 10 languages, is described more closely in the following documents: </P> <UL> -<LI><A HREF="../../lib/resource/doc/gf-resource.html">Resource library API documentation</A>: +<LI><A HREF="../../lib/resource-1.0/doc/">Resource library API documentation</A>: for application grammarians using the resource. <LI><A HREF="../../lib/resource-1.0/doc/Resource-HOWTO.html">Resource writing HOWTO</A>: for resource grammarians developing the resource. @@ -2177,21 +2255,41 @@ documents: However, to give a flavour of both using and writing resource grammars, we have created a miniature resource, which resides in the subdirectory <A HREF="resource"><CODE>resource</CODE></A>. 
Its API consists of the following -modules: +three modules: </P> -<UL> -<LI><A HREF="resource/Syntax.gf">Syntax</A>: syntactic structures, language-independent -<LI><A HREF="resource/LexEng.gf">LexEng</A>: lexical paradigms, English -<LI><A HREF="resource/LexIta.gf">LexIta</A>: lexical paradigms, Italian -</UL> - +<P> +<A HREF="resource/Syntax.gf">Syntax</A> - syntactic structures, language-independent: +</P> +<PRE> + +</PRE> +<P> +<A HREF="resource/LexEng.gf">LexEng</A> - lexical paradigms, English: +</P> +<PRE> + +</PRE> +<P> +<A HREF="resource/LexIta.gf">LexIta</A> - lexical paradigms, Italian: +</P> +<PRE> + +</PRE> +<P></P> <P> Only these three modules should be <CODE>open</CODE>ed in applications. The implementations of the resource are given in the following four modules: </P> +<P> +<A HREF="resource/MorphoEng.gf">MorphoEng</A>, +</P> +<PRE> + +</PRE> +<P> +<A HREF="resource/MorphoIta.gf">MorphoIta</A>: low-level morphology +</P> <UL> -<LI><A HREF="resource/MorphoEng.gf">MorphoEng</A>, - <A HREF="resource/MorphoIta.gf">MorphoIta</A>: low-level morphology <LI><A HREF="resource/SyntaxEng.gf">SyntaxEng</A>. <A HREF="resource/SyntaxIta.gf">SyntaxIta</A>: definitions of syntactic structures </UL> @@ -2210,19 +2308,181 @@ The rest of the modules (black) come from the resource. <P> <IMG ALIGN="middle" SRC="Multi.png" BORDER="0" ALT=""> </P> -<A NAME="toc68"></A> -<H3>Restricted inheritance and qualified opening</H3> -<A NAME="toc69"></A> -<H2>More concepts of abstract syntax</H2> -<A NAME="toc70"></A> -<H3>Dependent types</H3> -<A NAME="toc71"></A> -<H3>Higher-order abstract syntax</H3> -<A NAME="toc72"></A> -<H3>Semantic definitions</H3> -<A NAME="toc73"></A> -<H3>List categories</H3> <A NAME="toc74"></A> +<H3>Restricted inheritance and qualified opening</H3> +<A NAME="toc75"></A> +<H2>Using the standard resource library</H2> +<P> +The example files of this chapter can be found in +the directory <A HREF="./arithm"><CODE>arithm</CODE></A>. 
+</P> +<A NAME="toc76"></A> +<H3>The simplest way</H3> +<P> +The simplest way is to <CODE>open</CODE> a top-level <CODE>Lang</CODE> module +and a <CODE>Paradigms</CODE> module: +</P> +<PRE> + abstract Foo = ... + + concrete FooEng = open LangEng, ParadigmsEng in ... + concrete FooSwe = open LangSwe, ParadigmsSwe in ... +</PRE> +<P> +Here is an example. +</P> +<PRE> + abstract Arithm = { + cat + Prop ; + Nat ; + fun + Zero : Nat ; + Succ : Nat -> Nat ; + Even : Nat -> Prop ; + And : Prop -> Prop -> Prop ; + } + + --# -path=.:alltenses:prelude + + concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in { + lincat + Prop = S ; + Nat = NP ; + lin + Zero = + UsePN (regPN "zero" nonhuman) ; + Succ n = + DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ; + Even n = + UseCl TPres ASimul PPos + (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ; + And x y = + ConjS and_Conj (BaseS x y) ; + + } + + --# -path=.:alltenses:prelude + + concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in { + lincat + Prop = S ; + Nat = NP ; + lin + Zero = + UsePN (regPN "noll" neutrum) ; + Succ n = + DetCN (DetSg (SgQuant DefArt) NoOrd) + (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare") + (mkPreposition "till")) n) ; + Even n = + UseCl TPres ASimul PPos + (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ; + And x y = + ConjS and_Conj (BaseS x y) ; + } +</PRE> +<P></P> +<A NAME="toc77"></A> +<H3>How to find resource functions</H3> +<P> +The definitions in this example were found by parsing: +</P> +<PRE> + > i LangEng.gf + + -- for Successor: + > p -cat=NP -mcfg -parser=topdown "the mother of Paris" + + -- for Even: + > p -cat=S -mcfg -parser=topdown "Paris is old" + + -- for And: + > p -cat=S -mcfg -parser=topdown "Paris is old and I am old" +</PRE> +<P> +The use of parsing can be systematized by <B>example-based grammar writing</B>, +to which we will return later. 
+</P> +<A NAME="toc78"></A> +<H3>A functor implementation</H3> +<P> +The interesting thing now is that the +code in <CODE>ArithmSwe</CODE> is similar to the code in <CODE>ArithmEng</CODE>, except for +some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor", +"jämn" vs. "even"). How can we exploit the similarities and +actually share code between the languages? +</P> +<P> +The solution is to use a functor: an <CODE>incomplete</CODE> module that opens +an <CODE>abstract</CODE> as an <CODE>interface</CODE>, and then instantiate it to different +languages that implement the interface. The structure is as follows: +</P> +<PRE> + abstract Foo ... + + incomplete concrete FooI = open Lang, Lex in ... + + concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ; + concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ; +</PRE> +<P> +where <CODE>Lex</CODE> is an abstract lexicon that includes the vocabulary +specific to this application: +</P> +<PRE> + abstract Lex = Cat ** ... + + concrete LexEng of Lex = CatEng ** open ParadigmsEng in ... + concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ... 
+</PRE> +<P> +Here, again, a complete example (<CODE>abstract Arithm</CODE> is as above): +</P> +<PRE> + incomplete concrete ArithmI of Arithm = open Lang, Lex in { + lincat + Prop = S ; + Nat = NP ; + lin + Zero = + UsePN zero_PN ; + Succ n = + DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ; + Even n = + UseCl TPres ASimul PPos + (PredVP n (UseComp (CompAP (PositA even_A)))) ; + And x y = + ConjS and_Conj (BaseS x y) ; + } + + --# -path=.:alltenses:prelude + concrete ArithmEng of Arithm = ArithmI with + (Lang = LangEng), + (Lex = LexEng) ; + + --# -path=.:alltenses:prelude + concrete ArithmSwe of Arithm = ArithmI with + (Lang = LangSwe), + (Lex = LexSwe) ; + + abstract Lex = Cat ** { + fun + zero_PN : PN ; + successor_N2 : N2 ; + even_A : A ; + } + + concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in { + lin + zero_PN = regPN "noll" neutrum ; + successor_N2 = + mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ; + even_A = regA "jämn" ; + } +</PRE> +<P></P> +<A NAME="toc79"></A> <H2>Transfer modules</H2> <P> Transfer means noncompositional tree-transforming operations. @@ -2241,9 +2501,9 @@ See the <A HREF="../transfer.html">transfer language documentation</A> for more information. </P> -<A NAME="toc75"></A> +<A NAME="toc80"></A> <H2>Practical issues</H2> -<A NAME="toc76"></A> +<A NAME="toc81"></A> <H3>Lexers and unlexers</H3> <P> Lexers and unlexers can be chosen from @@ -2279,7 +2539,7 @@ Given by <CODE>help -lexer</CODE>, <CODE>help -unlexer</CODE>: </PRE> <P></P> -<A NAME="toc77"></A> +<A NAME="toc82"></A> <H3>Efficiency of grammars</H3> <P> Issues: @@ -2290,7 +2550,7 @@ Issues: <LI>parsing efficiency: <CODE>-mcfg</CODE> vs. others </UL> -<A NAME="toc78"></A> +<A NAME="toc83"></A> <H3>Speech input and output</H3> <P> The<CODE>speak_aloud = sa</CODE> command sends a string to the speech @@ -2320,7 +2580,7 @@ The method words only for grammars of English. 
Both Flite and ATK are freely available through the links above, but they are not distributed together with GF. </P> -<A NAME="toc79"></A> +<A NAME="toc84"></A> <H3>Multilingual syntax editor</H3> <P> The @@ -2337,12 +2597,12 @@ Here is a snapshot of the editor: The grammars of the snapshot are from the <A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>. </P> -<A NAME="toc80"></A> +<A NAME="toc85"></A> <H3>Interactive Development Environment (IDE)</H3> <P> Forthcoming. </P> -<A NAME="toc81"></A> +<A NAME="toc86"></A> <H3>Communicating with GF</H3> <P> Other processes can communicate with the GF command interpreter, @@ -2359,7 +2619,7 @@ Thus the most silent way to invoke GF is </PRE> </UL> -<A NAME="toc82"></A> +<A NAME="toc87"></A> <H3>Embedded grammars in Haskell, Java, and Prolog</H3> <P> GF grammars can be used as parts of programs written in the @@ -2371,15 +2631,15 @@ following languages. The links give more documentation. <LI><A HREF="http://www.cs.chalmers.se/~peb/software.html">Prolog</A> </UL> -<A NAME="toc83"></A> +<A NAME="toc88"></A> <H3>Alternative input and output grammar formats</H3> <P> A summary is given in the following chart of GF grammar compiler phases: <IMG ALIGN="middle" SRC="../gf-compiler.png" BORDER="0" ALT=""> </P> -<A NAME="toc84"></A> +<A NAME="toc89"></A> <H2>Case studies</H2> -<A NAME="toc85"></A> +<A NAME="toc90"></A> <H3>Interfacing formal and natural languages</H3> <P> <A HREF="http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf">Formal and Informal Software Specifications</A>, @@ -2392,6 +2652,6 @@ English and German. A simpler example will be explained here. </P> -<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) --> +<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) --> <!-- cmdline: txt2tags -\-toc gf-tutorial2.txt --> </BODY></HTML> |
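Note on the `variants` hunk in the diff above: the added text says that a `variants` over records is pushed down to the level of the fields, which then also accepts wrong combinations of strings and genders. The following is a minimal Python sketch of that effect. It is not GF code and not the grammar compiler's algorithm; the helper names `push_to_fields` and `expand` are hypothetical, and the field-wise distribution is assumed from the two examples (the `Wagen`/`Auto` records) shown in the added lines.

```python
from itertools import product

# Intended reading: a choice between two whole records
# (string field s, gender field g), as in the added German example.
intended = [
    {"s": "Wagen", "g": "Masc"},
    {"s": "Auto", "g": "Neutr"},
]


def push_to_fields(records):
    """Turn a choice between records into independent per-field choices.

    Hypothetical helper: mirrors the shape of the computed record in the diff,
    {s = variants {...} ; g = variants {...}}.
    """
    fields = {}
    for record in records:
        for label, value in record.items():
            alternatives = fields.setdefault(label, [])
            if value not in alternatives:
                alternatives.append(value)
    return fields


def expand(fields):
    """List every record obtainable by picking one alternative per field."""
    labels = list(fields)
    choices = [fields[label] for label in labels]
    return [dict(zip(labels, combo)) for combo in product(*choices)]


pushed = push_to_fields(intended)
accepted = expand(pushed)

# Records that the field-wise form admits but the author never meant.
spurious = [record for record in accepted if record not in intended]

if __name__ == "__main__":
    for record in spurious:
        print(record)
```

Under these assumptions, `accepted` has four records while `intended` has two; `spurious` holds the pairs `Wagen`/`Neutr` and `Auto`/`Masc`, which is the kind of erroneous combination the added paragraph warns about.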
