author aarne <aarne@cs.chalmers.se> 2007-05-31 13:43:46 +0000
committer aarne <aarne@cs.chalmers.se> 2007-05-31 13:43:46 +0000
commit e7b7def3130881852ff4acd1845dd31266c166fe
tree acbe4524f10c1798261a355520660d47090cdf34 /doc
parent 76268417db7dc617aaaae0214b0515d990a5c471
resource doc in tutorial
Diffstat (limited to 'doc')
-rw-r--r-- doc/resource.txt 4
-rw-r--r-- doc/tutorial/gf-tutorial2.txt 448
2 files changed, 315 insertions, 137 deletions
diff --git a/doc/resource.txt b/doc/resource.txt
index cfad3000e..a1c855fb7 100644
--- a/doc/resource.txt
+++ b/doc/resource.txt
@@ -1,5 +1,5 @@
-The GF Resource Grammar Library
-Author: Aarne Ranta, Ali El Dada, and Janna Khegai
+The GF Resource Grammar Library, Version 1.2
+Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert
Last update: %%date(%c)
% NOTE: this is a txt2tags file.
diff --git a/doc/tutorial/gf-tutorial2.txt b/doc/tutorial/gf-tutorial2.txt
index 9c3ae71b2..3ca7414d9 100644
--- a/doc/tutorial/gf-tutorial2.txt
+++ b/doc/tutorial/gf-tutorial2.txt
@@ -1658,174 +1658,352 @@ All of the following uses of ``mkN`` are easy to resolve:
%--!
==Using the resource grammar library TODO==
-A resource grammar is a grammar built on linguistic grounds,
-to describe a language rather than a domain.
-The GF resource grammar library, which contains resource grammars for
-10 languages, is described more closely in the following
-documents:
-- [Resource library API documentation ../../lib/resource-1.0/doc/]:
- for application grammarians using the resource.
-- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
- for resource grammarians developing the resource.
+===Coverage===
+
+The GF Resource Grammar Library contains grammar rules for
+10 languages (in addition, 2 languages are available as incomplete
+implementations, and a few more are under construction). Its purpose
+is to make these rules available for application programmers,
+who can thereby concentrate on the semantic and stylistic
+aspects of their grammars, without having to think about
+grammaticality. The targeted level of application grammarians
+is that of a skilled programmer with
+a practical knowledge of the target languages, but without
+theoretical knowledge about their grammars.
+Such a combination of
+skills is typical of programmers who want to localize
+software to new languages.
+
+The current resource languages are
+- ``Ara``bic
+- ``Cat``alan
+- ``Dan``ish
+- ``Eng``lish
+- ``Fin``nish
+- ``Fre``nch
+- ``Ger``man
+- ``Ita``lian
+- ``Nor``wegian
+- ``Rus``sian
+- ``Spa``nish
+- ``Swe``dish
+
+
+The first three letters (``Eng`` etc.) are used in grammar module names.
+The Arabic and Catalan implementations are still incomplete, but
+enough to be used in many applications.
+
+To give an example application, consider
+music playing devices. In the application,
+we may have a semantic category ``Kind``, examples
+of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
+is linearized into the noun "Lied", but knowing this is not
+enough to make the application work, because the noun must be
+produced in both singular and plural, and in four different
+cases. By using the resource grammar library, it is enough to
+write
+```
+ lin Song = mkN "Lied" "Lieder" neuter
+```
+and the eight forms are correctly generated. The resource grammar
+library contains a complete set of inflectional paradigms (such as
+``mkN`` here), enabling the definition of any lexical items.
+
+The resource grammar library is not only about inflectional paradigms - it
+also has syntax rules. The music player application
+might also want to modify songs with properties, such as "American",
+"old", "good". The German grammar for adjectival modifications is
+particularly complex, because adjectives have to agree in gender,
+number, and case, and also depend on what determiner is used
+("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
+variation is taken care of by the resource grammar function
+```
+ fun AdjCN : AP -> CN -> CN
+```
+(see the tables at the end of this document for the list of all resource grammar
+functions). The resource grammar implementation of the rule adding properties
+to kinds is
+```
+ lin PropKind kind prop = AdjCN prop kind
+```
+given that
+```
+ lincat Prop = AP
+ lincat Kind = CN
+```
+The resource library API is divided into language-specific
+and language-independent parts. To put it roughly,
+- the lexicon API is language-specific
+- the syntax API is language-independent
+
+
+Thus, to render the above example in French instead of German, we need to
+pick a different linearization of ``Song``,
+```
+ lin Song = mkN "chanson" feminine
+```
+But to linearize ``PropKind``, we can use the very same rule as in German.
+The resource function ``AdjCN`` has different implementations in the two
+languages (e.g. a different word order in French),
+but the application programmer need not care about the difference.
+
+
+===Note on APIs===
+
+From version 1.1 onwards, the resource library is available via two
+APIs:
+- original ``fun`` and ``oper`` definitions
+- overloaded ``oper`` definitions
+
+
+Introducing overloading in GF version 2.7 has been a success in improving
+the accessibility of libraries. It has also created a layer of abstraction
+between the writers and users of libraries, and thereby makes the library
+easier to modify. We shall therefore use the overloaded API
+in this document. The original function names are mainly interesting
+for those who want to write or modify libraries.
+
+
+===A complete example===
-===Interfaces, instances, and functors===
-
-===The simplest way===
+To summarize the example, and also give a template for a programmer to work on,
+here is the complete implementation of a small system with songs and properties.
+The abstract syntax defines a "domain ontology":
+```
+ abstract Music = {
+ cat
+ Kind,
+ Property ;
+ fun
+ PropKind : Kind -> Property -> Kind ;
+ Song : Kind ;
+ American : Property ;
+ }
+```
+The concrete syntax is defined by a functor (parametrized module),
+independently of language, by opening
+two interfaces: the resource ``Grammar`` and an application lexicon.
+```
+ incomplete concrete MusicI of Music = open Grammar, MusicLex in {
+ lincat
+ Kind = CN ;
+ Property = AP ;
+ lin
+ PropKind k p = AdjCN p k ;
+ Song = UseN song_N ;
+ American = PositA american_A ;
+ }
+```
+The application lexicon ``MusicLex`` has an abstract syntax that extends
+the resource category system ``Cat``.
+```
+ abstract MusicLex = Cat ** {
+ fun
+ song_N : N ;
+ american_A : A ;
+ }
+```
+Each language has its own concrete syntax, which opens the
+inflectional paradigms module for that language:
+```
+ concrete MusicLexGer of MusicLex =
+ CatGer ** open ParadigmsGer in {
+ lin
+ song_N = reg2N "Lied" "Lieder" neuter ;
+ american_A = regA "amerikanisch" ;
+ }
-The simplest way is to ``open`` a top-level ``Lang`` module
-and a ``Paradigms`` module:
+ concrete MusicLexFre of MusicLex =
+ CatFre ** open ParadigmsFre in {
+ lin
+ song_N = regGenN "chanson" feminine ;
+ american_A = regA "américain" ;
+ }
+```
+The top-level ``Music`` grammars are obtained by
+instantiating the two interfaces of ``MusicI``:
```
- abstract Foo = ...
+ concrete MusicGer of Music = MusicI with
+ (Grammar = GrammarGer),
+ (MusicLex = MusicLexGer) ;
- concrete FooEng = open LangEng, ParadigmsEng in ...
- concrete FooSwe = open LangSwe, ParadigmsSwe in ...
+ concrete MusicFre of Music = MusicI with
+ (Grammar = GrammarFre),
+ (MusicLex = MusicLexFre) ;
```
-Here is an example.
+Both of these files can use the same ``path``, defined as
```
-abstract Arithm = {
- cat
- Prop ;
- Nat ;
- fun
- Zero : Nat ;
- Succ : Nat -> Nat ;
- Even : Nat -> Prop ;
- And : Prop -> Prop -> Prop ;
-}
+ --# -path=.:present:prelude
+```
+The ``present`` directory contains the compiled resources, restricted to
+present tense; ``alltenses`` has the full resources.
---# -path=.:alltenses:prelude
+To localize the music player system to a new language,
+all that is needed is two modules,
+one implementing ``MusicLex`` and the other
+instantiating ``Music``. The latter is
+completely trivial, whereas the former one involves the choice of correct
+vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
+```
+ concrete MusicLexFin of MusicLex =
+ CatFin ** open ParadigmsFin in {
+ lin
+ song_N = regN "kappale" ;
+ american_A = regA "amerikkalainen" ;
+ }
-concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
- lincat
- Prop = S ;
- Nat = NP ;
- lin
- Zero =
- UsePN (regPN "zero" nonhuman) ;
- Succ n =
- DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
- Even n =
- UseCl TPres ASimul PPos
- (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
- And x y =
- ConjS and_Conj (BaseS x y) ;
+ concrete MusicFin of Music = MusicI with
+ (Grammar = GrammarFin),
+ (MusicLex = MusicLexFin) ;
+```
+More work is of course needed if the language-independent linearizations in
+``MusicI`` are not satisfactory for some language. The resource grammar guarantees
+that the linearizations are possible in all languages, in the sense of being grammatical,
+but they might of course be inadequate for stylistic reasons. Assume,
+for the sake of argument, that adjectival modification does not sound good in
+English, but that a relative clause would be preferable. One can then start as
+before,
+```
+ concrete MusicLexEng of MusicLex =
+ CatEng ** open ParadigmsEng in {
+ lin
+ song_N = regN "song" ;
+ american_A = regA "American" ;
+ }
-}
+ concrete MusicEng0 of Music = MusicI with
+ (Grammar = GrammarEng),
+ (MusicLex = MusicLexEng) ;
+```
+The module ``MusicEng0`` would not be used on the top level, however, but
+another module would be built on top of it, with a restricted import from
+``MusicEng0``. ``MusicEng`` inherits everything from ``MusicEng0``
+except ``PropKind``, and
+gives its own definition of this function:
+```
+ concrete MusicEng of Music =
+ MusicEng0 - [PropKind] ** open GrammarEng in {
+ lin
+ PropKind k p =
+ RelCN k (UseRCl TPres ASimul PPos
+ (RelVP IdRP (UseComp (CompAP p)))) ;
+ }
+```
---# -path=.:alltenses:prelude
+===To find rules in the resource grammar library===
-concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
- lincat
- Prop = S ;
- Nat = NP ;
- lin
- Zero =
- UsePN (regPN "noll" neutrum) ;
- Succ n =
- DetCN (DetSg (SgQuant DefArt) NoOrd)
- (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
- (mkPreposition "till")) n) ;
- Even n =
- UseCl TPres ASimul PPos
- (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
- And x y =
- ConjS and_Conj (BaseS x y) ;
-}
+====Inflection paradigms====
+
+Inflection paradigms are defined separately for each language //L//
+in the module ``Paradigms``//L//. To test them, the command
+``cc`` (= ``compute_concrete``)
+can be used:
```
+ > i -retain german/ParadigmsGer.gf
+ > cc mkN "Schlange"
+ {
+ s : Number => Case => Str = table Number {
+ Sg => table Case {
+ Nom => "Schlange" ;
+ Acc => "Schlange" ;
+ Dat => "Schlange" ;
+ Gen => "Schlange"
+ } ;
+ Pl => table Case {
+ Nom => "Schlangen" ;
+ Acc => "Schlangen" ;
+ Dat => "Schlangen" ;
+ Gen => "Schlangen"
+ }
+ } ;
+ g : Gender = Fem
+ }
+```
+For the sake of convenience, every language implements these five paradigms:
+```
+ oper
+ mkN : Str -> N ; -- regular nouns
+ mkA : Str -> A ; -- regular adjectives
+ mkV : Str -> V ; -- regular verbs
+ mkPN : Str -> PN ; -- regular proper names
+ mkV2 : V -> V2 ; -- direct transitive verbs
+```
+It is often possible to initialize a lexicon by just using these functions,
+and later revise it by using the more involved paradigms. For instance, in
+German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
+masculine noun with the plural form ``"Liede"``.
+The individual ``Paradigms`` modules
+tell what cases are covered by the regular heuristics.
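+
+The draft-then-revise workflow can be sketched on the German lexicon of the
+music example. This is only an illustration, assuming that ``mkN`` and ``mkA``
+are exported by ``ParadigmsGer`` with the one-string types listed above, and
+reusing ``reg2N`` as in the ``MusicLexGer`` module shown earlier:
+```
+ concrete MusicLexGer of MusicLex =
+ CatGer ** open ParadigmsGer in {
+ lin
+ -- first draft, regular heuristics only
+ -- (gives the wrong gender and plural for "Lied"):
+ -- song_N = mkN "Lied" ;
+ song_N = reg2N "Lied" "Lieder" neuter ; -- revised entry
+ american_A = mkA "amerikanisch" ; -- first-draft entry kept as it is
+ }
+```
+Only the entry that the regular heuristics get wrong is rewritten; the rest of
+the lexicon stays in its first-draft form.
+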
-===How to find resource functions===
+As a limiting case, one could even initialize the lexicon for a new language
+by copying the English (or some other already existing) lexicon. This would
+produce language with correct grammar but with content words directly borrowed from
+English - maybe not so strange in certain technical domains.
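+
+A minimal sketch of this limiting case, assuming that ``regN`` and ``regA`` in
+``ParadigmsFin`` accept arbitrary strings, as their use in the ``MusicLexFin``
+module shown earlier suggests: the Finnish lexicon starts out with the English
+words and is corrected entry by entry later.
+```
+ concrete MusicLexFin of MusicLex =
+ CatFin ** open ParadigmsFin in {
+ lin
+ song_N = regN "song" ; -- English word, Finnish paradigm
+ american_A = regA "American" ; -- later replaced by "amerikkalainen"
+ }
+```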
-The definitions in this example were found by parsing:
-```
- > i LangEng.gf
- -- for Successor:
- > p -cat=NP -mcfg -parser=topdown "the mother of Paris"
- -- for Even:
- > p -cat=S -mcfg -parser=topdown "Paris is old"
+====Syntax rules====
- -- for And:
- > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
-```
-The use of parsing can be systematized by **example-based grammar writing**,
-to which we will return later.
+Syntax rules should be looked for in the module ``Constructors``.
+Below this top-level module exposing overloaded constructors,
+there are around 10 abstract modules, each defining constructors for
+a group of one or more related categories. For instance, the module
+``Noun`` defines how to construct common nouns, noun phrases, and determiners.
+But these special modules are seldom needed by the users of the library.
+TODO: when are they needed?
-===A functor implementation===
+Browsing the libraries is helped by the gfdoc-generated HTML pages,
+whose LaTeX versions are included in the present document.
-The interesting thing now is that the
-code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
-some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
-"jämn" vs. "even"). How can we exploit the similarities and
-actually share code between the languages?
-The solution is to use a functor: an ``incomplete`` module that opens
-an ``abstract`` as an ``interface``, and then instantiate it to different
-languages that implement the interface. The structure is as follows:
-```
- abstract Foo ...
- incomplete concrete FooI = open Lang, Lex in ...
+====Browsing by the parser====
- concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
- concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
-```
-where ``Lex`` is an abstract lexicon that includes the vocabulary
-specific to this application:
+An alternative to browsing the library documentation is
+to use the parser.
+Even though parsing is not an intended end-user application
+of resource grammars, it is a useful technique for application grammarians
+to browse the library. To find out which resource function implements
+a particular structure, one can just parse a string that exemplifies this
+structure. For instance, to find out how sentences are built using
+transitive verbs, write
```
- abstract Lex = Cat ** ...
+ > i english/LangEng.gf
+
+ > p -cat=Cl -fcfg "she loves him"
- concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
- concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
-```
-Here, again, a complete example (``abstract Arithm`` is as above):
+ PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
```
-incomplete concrete ArithmI of Arithm = open Lang, Lex in {
- lincat
- Prop = S ;
- Nat = NP ;
- lin
- Zero =
- UsePN zero_PN ;
- Succ n =
- DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
- Even n =
- UseCl TPres ASimul PPos
- (PredVP n (UseComp (CompAP (PositA even_A)))) ;
- And x y =
- ConjS and_Conj (BaseS x y) ;
-}
-
---# -path=.:alltenses:prelude
-concrete ArithmEng of Arithm = ArithmI with
- (Lang = LangEng),
- (Lex = LexEng) ;
+The parser returns original constructors, not overloaded ones.
---# -path=.:alltenses:prelude
-concrete ArithmSwe of Arithm = ArithmI with
- (Lang = LangSwe),
- (Lex = LexSwe) ;
+Parsing with the English resource grammar has an acceptable speed, but
+with most languages it takes too many resources even to build the
+parser. However, examples parsed in one language can always be linearized into
+other languages:
+```
+ > i italian/LangIta.gf
-abstract Lex = Cat ** {
- fun
- zero_PN : PN ;
- successor_N2 : N2 ;
- even_A : A ;
-}
+ > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
-concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
- lin
- zero_PN = regPN "noll" neutrum ;
- successor_N2 =
- mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
- even_A = regA "jämn" ;
-}
+ lo ama
+```
+Therefore, one can use the English parser to write an Italian grammar, and also
+to write a language-independent (incomplete) grammar. One can also parse strings
+that are bizarre in English but the intended way of expression in another language.
+For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
+This can be built by parsing "I have beer" in ``LangEng`` and then writing
+```
+ lin IamHungry =
+ let beer_N = regGenN "fame" feminine
+ in
+ PredVP (UsePron i_Pron) (ComplV2 have_V2
+ (DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
```
+which uses ``ParadigmsIta.regGenN``.
-===Restricted inheritance and qualified opening===