started next version of tutorial

author: aarne <aarne@cs.chalmers.se> 2007-07-03 15:28:50 +0000
committer: aarne <aarne@cs.chalmers.se> 2007-07-03 15:28:50 +0000
commit: 064df9267c125a878c0f41c9ffb5ed373b02f927 (patch)
tree: d82ae350b63da3bf0025b726145971ff3f71d8ad
parent: e0071bc69c1fef54d5a99db6d43dc00375850f09 (diff)
2 files changed, 806 insertions, 0 deletions
diff --git a/doc/intro-resource.txt b/doc/intro-resource.txt
new file mode 100644
index 000000000..74a366d87
--- /dev/null
+++ b/doc/intro-resource.txt
@@ -0,0 +1,506 @@
+==Coverage==
+
+The GF Resource Grammar Library contains grammar rules for
+10 languages (in addition, 2 languages are available as incomplete
+implementations, and a few more are under construction). Its purpose
+is to make these rules available for application programmers,
+who can thereby concentrate on the semantic and stylistic
+aspects of their grammars, without having to think about 
+grammaticality. The targeted level of application grammarians
+is that of a skilled programmer with
+a practical knowledge of the target languages, but without
+theoretical knowledge about their grammars.
+Such a combination of
+skills is typical of programmers who, for instance, want to localize
+software to new languages.
+
+The current resource languages are
+- ``Ara``bic (incomplete)
+- ``Cat``alan (incomplete)
+- ``Dan``ish
+- ``Eng``lish
+- ``Fin``nish
+- ``Fre``nch
+- ``Ger``man
+- ``Ita``lian
+- ``Nor``wegian
+- ``Rus``sian
+- ``Spa``nish
+- ``Swe``dish
+
+
+The first three letters (``Eng`` etc) are used in grammar module names.
+The incomplete Arabic and Catalan implementations are 
+enough to be used in many applications; they both contain, among other 
+things, complete inflectional morphology.
+
+
+
+==A first example==
+
+To give an example application, consider a system for steering
+music playing devices by voice commands. In the application,
+we may have a semantic category ``Kind``, examples
+of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song`` 
+is linearized into the noun "Lied", but knowing this is not
+enough to make the application work, because the noun must be 
+produced in both singular and plural, and in four different
+cases. By using the resource grammar library, it is enough to
+write
+```
+  lin Song = mkN "Lied" "Lieder" neuter
+```
+and the eight forms are correctly generated. The resource grammar
+library contains a complete set of inflectional paradigms (such as
+``mkN`` here), enabling the definition of any lexical items.
+
+The resource grammar library is not only about inflectional paradigms - it
+also has syntax rules. The music player application
+might also want to modify songs with properties, such as "American",
+"old", "good". The German grammar for adjectival modifications is
+particularly complex, because adjectives have to agree in gender,
+number, and case, and also depend on what determiner is used
+("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
+variation is taken care of by the resource grammar function
+```
+  mkCN : AP -> CN -> CN
+```
+(see the table at the end of this document for the list of all resource grammar
+functions). The resource grammar implementation of the rule adding properties
+to kinds is
+```
+  lin PropKind kind prop = mkCN prop kind
+```
+given that 
+```
+  lincat Prop = AP
+  lincat Kind = CN
+```
+The resource library API is divided into language-specific 
+and language-independent parts. To put it roughly,
+- the lexicon API is language-specific
+- the syntax API is language-independent
+
+
+Thus, to render the above example in French instead of German, we need to
+pick a different linearization of ``Song``,
+```
+  lin Song = mkN "chanson" feminine
+```
+But to linearize ``PropKind``, we can use the very same rule as in German.
+The resource function ``mkCN`` has different implementations in the two
+languages (e.g. a different word order in French), 
+but the application programmer need not care about the difference.
+
+
+
+==Note on APIs==
+
+From version 1.1 onwards, the resource library is available via two
+APIs:
+- original ``fun`` and ``oper`` definitions
+- overloaded ``oper`` definitions
+
+
+Introducing overloading in GF version 2.7 has been a success in improving
+the accessibility of libraries. It has also created a layer of abstraction
+between the writers and users of libraries, and thereby makes the library
+easier to modify. We shall therefore use the overloaded API
+in this document. The original function names are mainly interesting
+for those who want to write or modify libraries.
+
+
+
+==A complete example==
+
+To summarize the example, and also give a template for a programmer to work on,
+here is the complete implementation of a small system with songs and properties.
+The abstract syntax defines a "domain ontology":
+```
+  abstract Music = {
+    
+  cat 
+    Kind, 
+    Property ;
+  fun 
+    PropKind : Kind -> Property -> Kind ; 
+    Song : Kind ;
+    American : Property ;
+  }
+```
+The concrete syntax is defined by a functor (parametrized module),
+independently of language, by opening
+two interfaces: the resource ``Syntax`` and an application lexicon.
+```
+  incomplete concrete MusicI of Music = 
+      open Syntax, MusicLex in {
+  lincat 
+    Kind = CN ;
+    Property = AP ;
+  lin
+    PropKind k p = mkCN p k ;
+    Song = mkCN song_N ;
+    American = mkAP american_A ;
+  }
+```
+The application lexicon ``MusicLex`` has an abstract syntax that extends
+the resource category system ``Cat``.
+```
+  abstract MusicLex = Cat ** {
+    
+  fun
+    song_N : N ;
+    american_A : A ;
+  }
+```
+Each language has its own concrete syntax, which opens the 
+inflectional paradigms module for that language:
+```
+  concrete MusicLexGer of MusicLex = 
+      CatGer ** open ParadigmsGer in {    
+  lin
+    song_N = mkN "Lied" "Lieder" neuter ;
+    american_A = mkA "amerikanisch" ;
+  }
+
+  concrete MusicLexFre of MusicLex = 
+      CatFre ** open ParadigmsFre in {
+  lin
+    song_N = mkN "chanson" feminine ;
+    american_A = mkA "américain" ;
+  }
+```
+The top-level ``Music`` grammars are obtained by 
+instantiating the two interfaces of ``MusicI``:
+```
+  concrete MusicGer of Music = MusicI with
+    (Syntax = SyntaxGer),
+    (MusicLex = MusicLexGer) ;
+
+  concrete MusicFre of Music = MusicI with
+    (Syntax = SyntaxFre),
+    (MusicLex = MusicLexFre) ;
+```
+Both of these files can use the same ``path``, defined as
+```
+  --# -path=.:present:prelude
+```
+The ``present`` directory contains the compiled resources, restricted to
+present tense; ``alltenses`` has the full resources.
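+
+Once the modules are in place, the result can be tried out in the GF shell.
+The following is only a sketch: it assumes that each module is saved in a file
+named after it (the text does not fix the file names), and it uses the
+commands ``i`` (import) and ``l`` (linearize) that also appear later in this document.
+The output is not shown, since it depends on the GF version.
+```
+  > i MusicGer.gf
+
+  > l PropKind Song American
+```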
+
+To localize the music player system to a new language, 
+all that is needed is two modules,
+one implementing ``MusicLex`` and the other 
+instantiating ``Music``. The latter is
+completely trivial, whereas the former one involves the choice of correct
+vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
+```
+  concrete MusicLexFin of MusicLex = 
+      CatFin ** open ParadigmsFin in {
+  lin
+    song_N = mkN "kappale" ;
+    american_A = mkA "amerikkalainen" ;
+  }
+
+  concrete MusicFin of Music = MusicI with
+    (Syntax = SyntaxFin),
+    (MusicLex = MusicLexFin) ;
+```
+More work is of course needed if the language-independent linearizations in
+``MusicI`` are not satisfactory for some language. The resource grammar guarantees
+that the linearizations are possible in all languages, in the sense of being grammatical,
+but they might of course be inadequate for stylistic reasons. Assume, 
+for the sake of argument, that adjectival modification does not sound good in
+English, but that a relative clause would be preferable. One can then use
+restricted inheritance of the functor:
+```
+  concrete MusicEng of Music = 
+    MusicI - [PropKind] 
+      with
+        (Syntax = SyntaxEng),
+        (MusicLex = MusicLexEng) ** 
+    open SyntaxEng in {
+  lin
+    PropKind k p = mkCN k (mkRS (mkRCl which_RP (mkVP p))) ;
+  }
+```
+The lexicon is as expected:
+```
+  concrete MusicLexEng of MusicLex = 
+      CatEng ** open ParadigmsEng in {
+  lin
+    song_N = mkN "song" ;
+    american_A = mkA "American" ;
+  }
+```
+
+
+==Lock fields==
+
+//This section is only relevant as a guide to error messages that have to do with lock fields, and can be skipped otherwise.//
+
+FIXME: this section may become obsolete.
+
+When the categories of the resource grammar are used
+in applications, a **lock field** is added to their linearization types.
+The lock field for a category ``C`` is a record field 
+```
+  lock_C : {}
+```
+with the only possible value
+```
+  lock_C = <>
+```
+The lock field carries no information, but its presence
+makes the linearization type of ``C``
+unique, so that categories
+with the same implementation are not confused with each other.
+(This is inspired by the ``newtype`` discipline in Haskell.)
+
+For example, the lincats of adverbs and conjunctions are the same
+in ``CatEng`` (and therefore in ``GrammarEng``, which inherits it):
+```
+  lincat Adv  = {s : Str} ;
+  lincat Conj = {s : Str} ;
+```
+But when these category symbols are used to denote their linearization 
+types in an application, these definitions are translated to
+```
+  oper Adv  : Type = {s : Str  ; lock_Adv  : {}} ;
+  oper Conj : Type = {s : Str  ; lock_Conj : {}} ;
+```
+In this way, the user of a resource grammar cannot confuse adverbs with
+conjunctions. In other words, the lock fields force the type checker
+to function as grammaticality checker.
+
+When the resource grammar is ``open``ed in an application grammar, 
+and only functions from the resource are used in a type-correct way, the
+lock fields are never seen (except possibly in type error messages).
+If an application grammarian has to write lock fields herself,
+it is a sign that the guarantees given by the resource grammar 
+no longer hold. But since the resource may be incomplete, the
+application grammarian may occasionally have to provide the dummy
+values of lock fields (always ``<>``, the empty record). 
+Here is an example:
+```
+  mkUtt : Str -> Utt ;
+  mkUtt s = {s = s ; lock_Utt = <>} ;
+```
+Currently, missing lock fields produce warnings rather than errors,
+but this behaviour of GF may change in the future.
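+
+The same pattern applies to any other category. As a sketch, assuming the English
+linearization type of ``Adv`` shown above, an adverb could be built directly
+from a string as follows (``strAdv`` is a hypothetical name, not a library function):
+```
+  oper
+    -- the lock field is given its only possible value, the empty record
+    strAdv : Str -> Adv = \s -> {s = s ; lock_Adv = <>} ;
+```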
+
+
+==Parsing with resource grammars?==
+
+The intended use of the resource grammar is as a library for writing
+application grammars. It is not designed for parsing e.g. newspaper text. There
+are several reasons why this is not practical:
+- Efficiency: the resource grammar uses complex data structures, in
+particular, discontinuous constituents, which make parsing slow and the
+parser size huge.
+- Completeness: the resource grammar does not necessarily cover all rules
+of the language - only enough of them to be able to express everything
+in one way or another.
+- Lexicon: the resource grammar has a very small lexicon, only meant for test
+purposes.
+- Semantics: the resource grammar has very little semantic control, and may
+accept strange input or deliver strange interpretations.
+- Ambiguity: parsing in the resource grammar may return lots of results, many
+of which are implausible.
+
+
+All of these problems should be solved in application grammars. 
+The task of resource grammars is just to take care of low-level linguistic 
+details such as inflection, agreement, and word order.
+
+It is for the same reasons that resource grammars are not adequate for translation.
+That the syntax API is implemented for different languages of course makes
+it possible to translate via it - but there is no guarantee of translation
+equivalence. Of course, the use of functor implementations such as ``MusicI``
+above only extends to those cases where the syntax API does give translation
+equivalence - but this must be seen as a limiting case, and bigger applications
+will often use only restricted inheritance of ``MusicI``.
+
+
+
+=To find rules in the resource grammar library=
+
+==Inflection paradigms==
+
+Inflection paradigms are defined separately for each language //L//
+in the module ``Paradigms``//L//. To test them, the command 
+``cc`` (= ``compute_concrete``)
+can be used:
+```
+  > i -retain german/ParadigmsGer.gf
+
+  > cc mkN "Schlange"
+  {
+    s : Number => Case => Str = table Number {
+      Sg => table Case {
+        Nom => "Schlange" ;
+        Acc => "Schlange" ;
+        Dat => "Schlange" ;
+        Gen => "Schlange"
+        } ;
+      Pl => table Case {
+        Nom => "Schlangen" ;
+        Acc => "Schlangen" ;
+        Dat => "Schlangen" ;
+        Gen => "Schlangen"
+        }
+      } ;
+    g : Gender = Fem
+  }
+```
+For the sake of convenience, every language implements these five paradigms:
+```
+  oper
+    mkN  : Str -> N ;   -- regular nouns
+    mkA  : Str -> A ;   -- regular adjectives
+    mkV  : Str -> V ;   -- regular verbs
+    mkPN : Str -> PN ;  -- regular proper names
+    mkV2 : V   -> V2 ;  -- direct transitive verbs
+```
+It is often possible to initialize a lexicon by just using these functions,
+and later revise it by using the more involved paradigms. For instance, in
+German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
+masculine noun with the plural form ``"Liede"``. 
+The individual ``Paradigms`` modules
+tell what cases are covered by the regular heuristics.
+
+As a limiting case, one could even initialize the lexicon for a new language
+by copying the English (or some other already existing) lexicon. This would
+produce a language with correct grammar but with content words directly borrowed from
+English - maybe not so strange in certain technical domains.
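+
+The two-stage workflow can be sketched with the German entry used earlier
+in this document. The first version, shown as a comment, relies on the regular
+paradigm; the revised version replaces it once the error is noticed:
+```
+  lin
+    -- first version, using the regular paradigm (wrong gender and plural):
+    -- song_N = mkN "Lied" ;
+
+    -- revised version, giving the plural form and the gender explicitly:
+    song_N = mkN "Lied" "Lieder" neuter ;
+```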
+
+
+
+==Syntax rules==
+
+Syntax rules should be looked for in the module ``Constructors``.
+Below this top-level module exposing overloaded constructors,
+there are around 10 abstract modules, each defining constructors for
+a group of one or more related categories. For instance, the module
+``Noun`` defines how to construct common nouns, noun phrases, and determiners.
+But these special modules are seldom or never needed by the users of the library.
+
+TODO: when are they needed?
+
+Browsing the libraries is helped by the gfdoc-generated HTML pages,
+whose LaTeX versions are included in the present document.
+
+
+==Special-purpose APIs==
+
+To give an analogy with the well-known typesetting software, GF can be compared
+with TeX and the resource grammar library with LaTeX. 
+Just as TeX frees the author
+from thinking about low-level problems of page layout, so GF frees the grammarian
+from writing parsing and generation algorithms. But quite a lot of knowledge of
+//how// to write grammars is still needed, and the resource grammar library helps
+GF grammarians in a way similar to how the LaTeX macro package helps TeX authors.
+
+But even LaTeX is often too detailed and low-level, and users are encouraged to
+develop their own macro packages. The same applies to GF resource grammars:
+the application grammarian might not need all the choices that the resource
+provides, but would prefer less writing and higher-level programming.
+To this end, application grammarians may want to write their own views on the
+resource grammar. 
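+
+As a sketch of such a view for the music example, the following module wraps the two
+ways of modifying a kind that are used in this document into named operations.
+The module name ``MusicViewEng`` and the operation names ``adjKind`` and ``relKind``
+are hypothetical; the right-hand sides are taken from the ``PropKind`` rules shown earlier.
+```
+  resource MusicViewEng = open SyntaxEng in {
+  oper
+    -- adjectival modification
+    adjKind : AP -> CN -> CN = \p,k -> mkCN p k ;
+
+    -- modification by a relative clause
+    relKind : AP -> CN -> CN = \p,k -> 
+      mkCN k (mkRS (mkRCl which_RP (mkVP p))) ;
+  }
+```
+An application grammar that opens this module can then choose between the two
+operations without repeating the underlying resource expressions.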
+
+
+==Browsing by the parser==
+
+An alternative to browsing the library documentation is
+to use the parser.
+Even though parsing is not an intended end-user application 
+of resource grammars, it is a useful technique for application grammarians
+to browse the library. To find out which resource function implements
+a particular structure, one can just parse a string that exemplifies this 
+structure. For instance, to find out how sentences are built using 
+transitive verbs, write
+```
+  > i english/LangEng.gf
+ 
+  > p -cat=Cl "she loves him"
+  PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
+```
+The parser returns original constructors, not overloaded ones. Overloaded
+constructors can be returned, so far with experimental heuristics, by using
+the grammar ``api/toplevel/OverLangEng.gf`` and a special flag:
+```
+  > i api/toplevel/OverLangEng.gf
+
+  > p -cat=Cl -overload "she loves him"
+  mkCl (mkNP she_Pron) love_V2 (mkNP he_Pron)
+```
+Parsing with the English resource grammar has an acceptable speed, but
+with most languages it takes too many resources even to build the
+parser. However, examples parsed in one language can always be linearized into
+other languages:
+```
+  > i italian/LangIta.gf
+
+  > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
+  lo ama
+```
+Therefore, one can use the English parser to write an Italian grammar, and also
+to write a language-independent (incomplete) grammar. One can also parse strings
+that are bizarre in English but the intended way of expression in another language.
+For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
+This can be built by parsing "I have beer" in ``OverLangEng`` and then writing
+```
+  lin IamHungry = 
+    let beer_N = mkN "fame" feminine 
+    in
+    mkCl (mkNP i_Pron) have_V2 (mkNP massQuant beer_N)
+```
+which uses ``ParadigmsIta.mkN``. 
+
+
+
+==Example-based grammar writing==
+
+The technique of parsing with the resource grammar can be used in GF source files,
+endowed with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess
+the file by replacing all expressions of the form
+```
+  in Module.Cat "example string"
+```
+by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``.
+For instance,
+```
+  lin IamHungry = 
+    let beer_N = mkN "fame" feminine 
+    in
+    (in LangEng.Cl "I have beer") ;
+```
+will result in the rule displayed in the previous section. The normal binding rules
+of functional programming (and GF) guarantee that local bindings of identifiers
+take precedence over constants of the same forms. Thus it is also possible to
+linearize functions taking arguments in this way:
+```
+  lin
+    PropKind car_N old_A = in LangEng.CN "old car" ;
+```
+However, the technique of example-based grammar writing has some limitations:
+- Ambiguity. If a string has several parses, the first one is returned, and
+it may not be the intended one. The other parses are shown in a comment, from
+where they must/can be picked manually.
+- Lexicality. The arguments of a function must be atomic identifiers, and are thus
+not available for categories that have no lexical items. 
+For instance, the ``PropKind`` rule above gives the result
+```
+  lin
+    PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;  
+```
+However, it is possible to write a special lexicon that gives atomic rules for
+all those categories that can be used as arguments, for instance,
+```
+  fun
+    cat_CN : CN ;
+    old_AP : AP ;
+```
+and then use this lexicon instead of the standard one included in ``Lang``.
+
+
diff --git a/doc/overview-resource.txt b/doc/overview-resource.txt
new file mode 100644
index 000000000..2f9b2cd04
--- /dev/null
+++ b/doc/overview-resource.txt
@@ -0,0 +1,300 @@
+==Texts, phrases, and utterances==
+
+The outermost linguistic structure is ``Text``. ``Text``s are composed
+from Phrases (``Phr``) followed by punctuation marks - one of ".", "?" or
+"!" (with their proper variants in Spanish and Arabic). Here is an 
+example of a ``Text`` string.
+```
+  John walks. Why? He doesn't want to sleep!
+```
+Phrases are mostly built from Utterances (``Utt``), which in turn are
+declarative sentences, questions, or imperatives - but there
+are also "one-word utterances" consisting of noun phrases
+or other subsentential phrases. Some Phrases are atomic,
+for instance "yes" and "no". Here are some examples of Phrases.
+```
+  yes
+  come on, John
+  but John walks
+  give me the stick please
+  don't you know that he is sleeping
+  a glass of wine
+  a glass of wine please
+```
+There is no connection between the punctuation marks and the
+types of utterances. This reflects the fact that the punctuation
+mark in a real text is selected as a function of the speech act
+rather than the grammatical form of an utterance. The following
+text is thus well-formed.
+```
+  John walks. John walks? John walks!
+```
+What is the difference between Phrase and Utterance? Just technical:
+a Phrase is an Utterance with an optional leading conjunction ("but")
+and an optional trailing vocative ("John", "please").
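+
+In terms of the constructors listed in Figure 1 of this document, the three parts
+of a Phrase are the arguments of ``PhrUtt``. As a sketch, the Phrase "but John walks"
+would be the following tree, obtained from the tree in Figure 1 by replacing
+``NoPConj`` with ``but_PConj``:
+```
+  PhrUtt 
+    but_PConj 
+    (UttS (UseCl TPres ASimul PPos (PredVP (UsePN john_PN) (UseV walk_V))))
+    NoVoc
+```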
+
+
+==Sentences and clauses==
+
+TODO: use overloaded operations in the examples.
+
+The richest of the categories below Utterance is ``S``, Sentence. A Sentence
+is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity.
+For example, each of the following strings has a distinct syntax tree
+in the category Sentence:
+```
+  John walks
+  John doesn't walk
+  John walked
+  John didn't walk
+  John has walked
+  John hasn't walked
+  John will walk
+  John won't walk
+  ...
+```
+whereas in the category Clause all of them are just different forms of
+the same tree.
+The difference between Sentence and Clause is thus also rather technical.
+It may not correspond exactly to any standard usage of the terms
+"clause" and "sentence".
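+
+Using the constructors of Figure 1 below, the difference can be sketched as follows:
+the Clause subtree stays the same, and only the Tense, Anteriority, and Polarity
+arguments of ``UseCl`` vary. The strings are given as comments.
+```
+  UseCl TPres ASimul PPos (PredVP (UsePN john_PN) (UseV walk_V))  -- John walks
+  UseCl TPres ASimul PNeg (PredVP (UsePN john_PN) (UseV walk_V))  -- John doesn't walk
+  UseCl TPast ASimul PPos (PredVP (UsePN john_PN) (UseV walk_V))  -- John walked
+  UseCl TPres AAnter PPos (PredVP (UsePN john_PN) (UseV walk_V))  -- John has walked
+```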
+
+Figure 1 shows a type-annotated syntax tree of the Text "John walks." 
+and gives an overview of the structural levels.
+
+#BFIG
+
+```
+Node Constructor             Value type  Other constructors
+-----------------------------------------------------------
+ 1.  TFullStop               Text        TQuestMark
+ 2.    (PhrUtt               Phr             
+ 3.      NoPConj             PConj       but_PConj
+ 4.      (UttS               Utt         UttQS
+ 5.        (UseCl            S           UseQCl
+ 6.           TPres          Tense       TPast
+ 7.           ASimul         Anter       AAnter
+ 8.           PPos           Pol         PNeg
+ 9.           (PredVP        Cl              
+10.             (UsePN       NP          UsePron, DetCN
+11.               john_PN)   PN          mary_PN
+12.             (UseV        VP          ComplV2, ComplV3
+13.               walk_V)))) V           sleep_V
+14.      NoVoc)              Voc         please_Voc
+15.    TEmpty                Text            
+```
+
+#BCENTER
+Figure 1. Type-annotated syntax tree of the Text "John walks."
+#ECENTER
+
+#EFIG
+
+Here are some examples of the results of changing constructors.
+```
+ 1. TFullStop -> TQuestMark   John walks?
+ 3. NoPConj   -> but_PConj    But John walks.
+ 6. TPres     -> TPast        John walked.
+ 7. ASimul    -> AAnter       John has walked.
+ 8. PPos      -> PNeg         John doesn't walk.
+11. john_PN   -> mary_PN      Mary walks.
+13. walk_V    -> sleep_V      John sleeps.
+14. NoVoc     -> please_Voc   John walks please.
+```
+Of course, not all constructors can be changed so freely, because the
+resulting tree would not remain well-typed. Here are some changes involving
+many constructors:
+```
+ 4- 5. UttS (UseCl ...) -> 
+         UttQS (UseQCl (... QuestCl ...)) Does John walk?
+10-11. UsePN john_PN    -> 
+         UsePron we_Pron                  We walk.
+12-13. UseV walk_V      -> 
+         ComplV2 love_V2 this_NP          John loves this. 
+```
+
+
+==Parts of sentences==
+
+The linguistic phenomena mostly discussed in both traditional grammars and modern
+syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
+to Sentences, lines 5-13. At this level, the major categories are
+``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically 
+consists of just an ``NP`` and a ``VP``. 
+The internal structure of both ``NP`` and ``VP`` can be very complex,
+and these categories are mutually recursive: not only can a ``VP`` 
+contain an ``NP``,
+```
+  [VP loves [NP Mary]]
+```
+but also an ``NP`` can contain a ``VP``
+```
+  [NP every man [RS who [VP walks]]]
+```
+(a labelled bracketing like this is of course just a rough approximation of
+a GF syntax tree, but still a useful device of exposition).
+
+Most of the resource modules thus define functions that are used inside
+NPs and VPs. Here is a brief overview:
+
+**Noun**. How to construct NPs. The main three mechanisms 
+for constructing NPs are
+- from proper names: "John"
+- from pronouns: "we"
+- from common nouns by determiners: "this man"
+
+
+The ``Noun`` module also defines the construction of common nouns. 
+The most frequent ways are
+- lexical noun items: "man"
+- adjectival modification: "old man"
+- relative clause modification: "man who sleeps"
+- application of relational nouns: "successor of the number"
+
+
+**Verb**. 
+How to construct VPs. The main mechanism is verbs with their arguments, 
+for instance,
+- one-place verbs: "walks"
+- two-place verbs: "loves Mary"
+- three-place verbs: "gives her a kiss"
+- sentence-complement verbs: "says that it is cold"
+- VP-complement verbs: "wants to give her a kiss"
+
+
+A special verb is the copula, "be" in English but not even realized 
+by a verb in all languages.
+A copula can take different kinds of complement: 
+- an adjectival phrase: "(John is) old"
+- an adverb: "(John is) here"
+- a noun phrase: "(John is) a man"
+
+
+**Adjective**. 
+How to construct ``AP``s. The main ways are
+- positive forms of adjectives: "old"
+- comparative forms with object of comparison: "older than John"
+
+
+**Adverb**. 
+How to construct ``Adv``s. The main ways are
+- from adjectives: "slowly"
+- as prepositional phrases: "in the car"
+
+
+==Modules and their names==
+
+This section is not necessary for users of the library.
+
+TODO: explain the overloaded API.
+
+The resource modules are named after the kind of 
+phrases that are constructed in them,
+and they can be roughly classified by the "level" or "size" of expressions that are
+formed in them:
+- Larger than sentence: ``Text``, ``Phrase`` 
+- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
+- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
+- Cross-cut (coordination): ``Conjunction``
+
+
+Because of mutual recursion such as in embedded sentences, this classification is
+not a complete order. However, no mutual dependence is needed between the 
+modules themselves - they can all be compiled separately. This is due
+to the module ``Cat``, which defines the type system common to the other modules.
+For instance, the types ``NP`` and ``VP`` are defined in ``Cat``, 
+and the module ``Verb`` only
+needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
+a rule such as
+```
+  Verb.ComplV2 : V2 -> NP -> VP
+```
+it is enough to know the linearization type of ``NP`` 
+(as well as those of ``V2`` and ``VP``, all
+given in ``Cat``). It is not necessary to know what
+ways there are to build ``NP``s (given in ``Noun``), since all these ways must 
+conform to the linearization type defined in ``Cat``. Thus the format of
+category-specific modules is as follows:
+```
+  abstract Adjective = Cat ** {...}
+  abstract Noun      = Cat ** {...}
+  abstract Verb      = Cat ** {...}
+```
+
+
+==Top-level grammar and lexicon==
+
+The module ``Grammar`` collects all the category-specific modules into
+a complete grammar:
+```
+  abstract Grammar = 
+    Adjective, Noun, Verb, ..., Structural, Idiom
+```
+The module ``Structural`` is a lexicon of structural words (function words),
+such as determiners.
+
+The module ``Idiom`` is a collection of idiomatic structures whose
+implementation is very language-dependent. An example is existential
+structures ("there is", "es gibt", "il y a", etc).
+
+The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of 
+ca. 350 content words:
+```
+  abstract Lang = Grammar, Lexicon
+```
+Using ``Lang`` instead of ``Grammar`` as a library may give 
+for free some words needed in an application. But its main purpose is to
+help test the resource library, rather than to serve as a resource itself. 
+It does not even seem realistic to develop
+a general-purpose multilingual resource lexicon. 
+
+The diagram in Figure 2 shows the structure of the API.
+
+#BFIG
+
+#GRAMMAR
+
+#BCENTER
+Figure 2. The resource syntax API. 
+#ECENTER
+
+#EFIG
+
+==Language-specific syntactic structures==
+
+The API collected in ``Grammar`` has been designed to be implementable for
+all languages in the resource package. It does contain some rules that
+are strange or superfluous in some languages; for instance, the distinction
+between definite and indefinite articles does not apply to Finnish and Russian.
+But such rules are still easy to implement: they only create some superfluous
+ambiguity in the languages in question.
+
+But the library makes no claim that all languages should have exactly the same
+abstract syntax. The common API is therefore extended by language-dependent
+rules. The top level of each language looks as follows (with English as example):
+```
+  abstract English = Grammar, ExtraEngAbs, DictEngAbs
+```
+where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
+and ``DictEngAbs`` is an English dictionary 
+(at the moment, it consists of ``IrregEngAbs``,
+the irregular verbs of English). Each of these language-specific grammars has 
+the potential to grow into a full-scale grammar of the language. These grammars
+can also be used as libraries, but the possibility of using functors is lost.
+
+To give a better overview of language-specific structures, 
+modules like ``ExtraEngAbs``
+are built from a language-independent module ``ExtraAbs`` 
+by restricted inheritance:
+```
+  abstract ExtraEngAbs = Extra [f,g,...]
+```
+Thus any category and function in ``Extra`` may be shared by a subset of all
+languages. One can see this set-up as a matrix, which tells 
+what ``Extra`` structures
+are implemented in what languages. For the common API in ``Grammar``, the matrix
+is filled with 1's (everything is implemented in every language).
+
+Language-specific extensions and the use of restricted
+inheritance are recent additions to the resource grammar library, and
+have only been exploited on a very small scale so far.