Diffstat (limited to 'doc/resource.txt')
| -rw-r--r-- | doc/resource.txt | 1258 |
1 files changed, 0 insertions, 1258 deletions
diff --git a/doc/resource.txt b/doc/resource.txt deleted file mode 100644 index 13f1f4798..000000000 --- a/doc/resource.txt +++ /dev/null @@ -1,1258 +0,0 @@ -The GF Resource Grammar Library, Version 1.2 -Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert -Last update: %%date(%c) - -% NOTE: this is a txt2tags file. -% Create an latex file from this file using: -% txt2tags -ttex --toc resource.txt -%!style(tex) : isolatin1 -%!postproc: "section*{" "section{" -%!postproc(tex): "#SMALL" "scriptsize" -%!postproc(tex): "#BFIG" "begin{figure}" -%!postproc(tex): "#GRAMMAR" "includegraphics[width=4in]{Grammar.epsi}" -%!postproc(tex): "#EFIG" "end{figure}" -%!postproc(tex): "#BCENTER" "begin{center}" -%!postproc(tex): "#ECENTER" "end{center}" -%!postproc(tex): "#CAPTION" "caption{" -%!postproc(tex): "#RBRACE" "end{figure}" -%!postproc(tex): "#CLEARPAGE" "clearpage" -%!postproc(tex): "#PARADIGMSRUS" "input{ParadigmsRus.tex}" -%!target:tex - -#CLEARPAGE - -%%toc - -#CLEARPAGE - -This document is a guide for using the -GF Resource Grammar Library. It presupposes knowledge of GF and its -module system, knowledge that can be acquired e.g. from the -GF tutorial. -We start with an introduction to the library, and proceed to -details with the goal of covering all that one needs to know -in order to use the library. - -How to //write// one's own resource grammar (i.e. to implement the API for -a new language), is covered by a separate Resource-HOWTO document (available in -the www address below). - -The main part of the document (the API documentation) is generated -from the actual GF code by using the ``gfdoc`` tool. This documentation -is also available on-line in HTML format in - -[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.2/doc/`` http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.2/doc/]. 
- - -=Motivation= - -The GF Resource Grammar Library contains grammar rules for -10 languages (in addition, 2 languages are available as incomplete -implementations, and a few more are under construction). Its purpose -is to make these rules available for application programmers, -who can thereby concentrate on the semantic and stylistic -aspects of their grammars, without having to think about -grammaticality. The targeted level of application grammarians -is that of a skilled programmer with -a practical knowledge of the target languages, but without -theoretical knowledge about their grammars. -Such a combination of -skills is typical of programmers who, for instance, want to localize -software to new languages. - -The current resource languages are -- ``Ara``bic -- ``Cat``alan -- ``Dan``ish -- ``Eng``lish -- ``Fin``nish -- ``Fre``nch -- ``Ger``man -- ``Ita``lian -- ``Nor``wegian -- ``Rus``sian -- ``Spa``nish -- ``Swe``dish - - -The first three letters (``Eng`` etc) are used in grammar module names. -The Arabic and Catalan implementations are still incomplete, but -enough to be used in many applications. - - - -==A first example== - -To give an example application, consider a system for steering -music playing devices by voice commands. In the application, -we may have a semantical category ``Kind``, examples -of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song`` -is linearized into the noun "Lied", but knowing this is not -enough to make the application work, because the noun must be -produced in both singular and plural, and in four different -cases. By using the resource grammar library, it is enough to -write -``` - lin Song = mkN "Lied" "Lieder" neuter -``` -and the eight forms are correctly generated. The resource grammar -library contains a complete set of inflectional paradigms (such as -``mkN`` here), enabling the definition of any lexical items. 
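As a rough sketch of how such paradigm calls might be collected for experimentation, the fragment below puts two German nouns into a small resource module. The module name ``DemoLexGer`` and the constant names are hypothetical, and opening ``CatGer`` alongside ``ParadigmsGer`` is an assumption made here so that the type ``N`` is in scope.
```
  -- Hypothetical test module, not part of the library.
  resource DemoLexGer = open CatGer, ParadigmsGer in {
    oper
      -- singular, plural, and gender given explicitly
      lied_N : N = mkN "Lied" "Lieder" neuter ;
      -- one-argument paradigm: the remaining forms and the gender
      -- are left to the regular heuristics
      schlange_N : N = mkN "Schlange" ;
  }
```
The one-argument form is convenient for a first draft of a lexicon; the multi-argument form is the fallback whenever the regular heuristics would pick the wrong plural or gender.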
- -The resource grammar library is not only about inflectional paradigms - it -also has syntax rules. The music player application -might also want to modify songs with properties, such as "American", -"old", "good". The German grammar for adjectival modifications is -particularly complex, because adjectives have to agree in gender, -number, and case, and also depend on what determiner is used -("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this -variation is taken care of by the resource grammar function -``` - mkCN : AP -> CN -> CN -``` -(see the table at the end of this document for the list of all resource grammar -functions). The resource grammar implementation of the rule adding properties -to kinds is -``` - lin PropKind kind prop = mkCN prop kind -``` -given that -``` - lincat Prop = AP - lincat Kind = CN -``` -The resource library API is divided into language-specific -and language-independent parts. To put it roughly, -- the lexicon API is language-specific -- the syntax API is language-independent - - -Thus, to render the above example in French instead of German, we need to -pick a different linearization of ``Song``, -``` - lin Song = mkN "chanson" feminine -``` -But to linearize ``PropKind``, we can use the very same rule as in German. -The resource function ``mkCN`` has different implementations in the two -languages (e.g. a different word order in French), -but the application programmer need not care about the difference. - - - -==Note on APIs== - -From version 1.1 onwards, the resource library is available via two -APIs: -- original ``fun`` and ``oper`` definitions -- overloaded ``oper`` definitions - - -Introducing overloading in GF version 2.7 has been a success in improving -the accessibility of libraries. It has also created a layer of abstraction -between the writers and users of libraries, and thereby makes the library -easier to modify. We shall therefore use the overloaded API -in this document. 
The original function names are mainly interesting -for those who want to write or modify libraries. - - - -==A complete example== - -To summarize the example, and also give a template for a programmer to work on, -here is the complete implementation of a small system with songs and properties. -The abstract syntax defines a "domain ontology": -``` - abstract Music = { - - cat - Kind, - Property ; - fun - PropKind : Kind -> Property -> Kind ; - Song : Kind ; - American : Property ; - } -``` -The concrete syntax is defined by a functor (parametrized module), -independently of language, by opening -two interfaces: the resource ``Syntax`` and an application lexicon. -``` - incomplete concrete MusicI of Music = - open Syntax, MusicLex in { - lincat - Kind = CN ; - Property = AP ; - lin - PropKind k p = mkCN p k ; - Song = mkCN song_N ; - American = mkAP american_A ; - } -``` -The application lexicon ``MusicLex`` has an abstract syntax that extends -the resource category system ``Cat``. -``` - abstract MusicLex = Cat ** { - - fun - song_N : N ; - american_A : A ; - } -``` -Each language has its own concrete syntax, which opens the -inflectional paradigms module for that language: -``` - concrete MusicLexGer of MusicLex = - CatGer ** open ParadigmsGer in { - lin - song_N = mkN "Lied" "Lieder" neuter ; - american_A = mkA "amerikanisch" ; - } - - concrete MusicLexFre of MusicLex = - CatFre ** open ParadigmsFre in { - lin - song_N = mkN "chanson" feminine ; - american_A = mkA "américain" ; - } -``` -The top-level ``Music`` grammars are obtained by -instantiating the two interfaces of ``MusicI``: -``` - concrete MusicGer of Music = MusicI with - (Syntax = SyntaxGer), - (MusicLex = MusicLexGer) ; - - concrete MusicFre of Music = MusicI with - (Syntax = SyntaxFre), - (MusicLex = MusicLexFre) ; -``` -Both of these files can use the same ``path``, defined as -``` - --# -path=.:present:prelude -``` -The ``present`` category contains the compiled resources, restricted to -present 
tense; ``alltenses`` has the full resources. - -To localize the music player system to a new language, -all that is needed is two modules, -one implementing ``MusicLex`` and the other -instantiating ``Music``. The latter is -completely trivial, whereas the former one involves the choice of correct -vocabulary and inflectional paradigms. For instance, Finnish is added as follows: -``` - concrete MusicLexFin of MusicLex = - CatFin ** open ParadigmsFin in { - lin - song_N = mkN "kappale" ; - american_A = mkA "amerikkalainen" ; - } - - concrete MusicFin of Music = MusicI with - (Syntax = SyntaxFin), - (MusicLex = MusicLexFin) ; -``` -More work is of course needed if the language-independent linearizations in -MusicI are not satisfactory for some language. The resource grammar guarantees -that the linearizations are possible in all languages, in the sense of being grammatical, -but they might of course be inadequate for stylistic reasons. Assume, -for the sake of argument, that adjectival modification does not sound good in -English, but that a relative clause would be preferable. One can then use -restricted inheritance of the functor: -``` - concrete MusicEng of Music = - MusicI - [PropKind] - with - (Syntax = SyntaxEng), - (MusicLex = MusicLexEng) ** - open SyntaxEng in { - lin - PropKind k p = mkCN k (mkRS (mkRCl which_RP (mkVP p))) ; - } -``` -The lexicon is as expected: -``` - concrete MusicLexEng of MusicLex = - CatEng ** open ParadigmsEng in { - lin - song_N = mkN "song" ; - american_A = mkA "American" ; - } -``` - - -==Lock fields== - -//This section is only relevant as a guide to error messages that have to do with lock fields, and can be skipped otherwise.// - -FIXME: this section may become obsolete. - -When the categories of the resource grammar are used -in applications, a **lock field** is added to their linearization types. 
-The lock field for a category ``C`` is a record field -``` - lock_C : {} -``` -with the only possible value -``` - lock_C = <> -``` -The lock field carries no information, but its presence -makes the linearization type of ``C`` -unique, so that categories -with the same implementation are not confused with each other. -(This is inspired by the ``newtype`` discipline in Haskell.) - -For example, the lincats of adverbs and conjunctions are the same -in ``CatEng`` (and therefore in ``GrammarEng``, which inherits it): -``` - lincat Adv = {s : Str} ; - lincat Conj = {s : Str} ; -``` -But when these category symbols are used to denote their linearization -types in an application, these definitions are translated to -``` - oper Adv : Type = {s : Str ; lock_Adv : {}} ; - oper Conj : Type = {s : Str ; lock_Conj : {}} ; -``` -In this way, the user of a resource grammar cannot confuse adverbs with -conjunctions. In other words, the lock fields force the type checker -to function as a grammaticality checker. - -When the resource grammar is ``open``ed in an application grammar, -and only functions from the resource are used in a type-correct way, the -lock fields are never seen (except possibly in type error messages). -If an application grammarian has to write lock fields herself, -it is a sign that the guarantees given by the resource grammar -no longer hold. But since the resource may be incomplete, the -application grammarian may occasionally have to provide the dummy -values of lock fields (always ``<>``, the empty record). -Here is an example: -``` - mkUtt : Str -> Utt ; - mkUtt s = {s = s ; lock_Utt = <>} ; -``` -Currently, missing lock fields produce warnings rather than errors, -but this behaviour of GF may change in the future. - - -==Parsing with resource grammars?== - -The intended use of the resource grammar is as a library for writing -application grammars. It is not designed for parsing e.g. newspaper text. 
There -are several reasons why this is not practical: -- Efficiency: the resource grammar uses complex data structures, in -particular, discontinuous constituents, which make parsing slow and the -parser size huge. -- Completeness: the resource grammar does not necessarily cover all rules -of the language - only enough of them to be able to express everything -in one way or another. -- Lexicon: the resource grammar has a very small lexicon, only meant for test -purposes. -- Semantics: the resource grammar has very little semantic control, and may -accept strange input or deliver strange interpretations. -- Ambiguity: parsing in the resource grammar may return lots of results, many -of which are implausible. - - -All of these problems should be solved in application grammars. -The task of resource grammars is just to take care of low-level linguistic -details such as inflection, agreement, and word order. - -It is for the same reasons that resource grammars are not adequate for translation. -That the syntax API is implemented for different languages of course makes -it possible to translate via it - but there is no guarantee of translation -equivalence. Of course, the use of functor implementations such as ``MusicI`` -above only extends to those cases where the syntax API does give translation -equivalence - but this must be seen as a limiting case, and bigger applications -will often use only restricted inheritance of ``MusicI``. - - - -=To find rules in the resource grammar library= - -==Inflection paradigms== - -Inflection paradigms are defined separately for each language //L// -in the module ``Paradigms``//L//. 
To test them, the command -``cc`` (= ``compute_concrete``) -can be used: -``` - > i -retain german/ParadigmsGer.gf - - > cc mkN "Schlange" - { - s : Number => Case => Str = table Number { - Sg => table Case { - Nom => "Schlange" ; - Acc => "Schlange" ; - Dat => "Schlange" ; - Gen => "Schlange" - } ; - Pl => table Case { - Nom => "Schlangen" ; - Acc => "Schlangen" ; - Dat => "Schlangen" ; - Gen => "Schlangen" - } - } ; - g : Gender = Fem - } -``` -For the sake of convenience, every language implements these five paradigms: -``` - oper - mkN : Str -> N ; -- regular nouns - mkA : Str -> A ; -- regular adjectives - mkV : Str -> V ; -- regular verbs - mkPN : Str -> PN ; -- regular proper names - mkV2 : V -> V2 ; -- direct transitive verbs -``` -It is often possible to initialize a lexicon by just using these functions, -and later revise it by using the more involved paradigms. For instance, in -German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a -masculine noun with the plural form ``"Liede"``. -The individual ``Paradigms`` modules -tell what cases are covered by the regular heuristics. - -As a limiting case, one could even initialize the lexicon for a new language -by copying the English (or some other already existing) lexicon. This would -produce a language with correct grammar but with content words directly borrowed from -English - maybe not so strange in certain technical domains. - - - -==Syntax rules== - -Syntax rules should be looked for in the module ``Constructors``. -Below this top-level module exposing overloaded constructors, -there are around 10 abstract modules, each defining constructors for -a group of one or more related categories. For instance, the module -``Noun`` defines how to construct common nouns, noun phrases, and determiners. -But these special modules are seldom or never needed by the users of the library. - -TODO: when are they needed? 
- -Browsing the libraries is helped by the gfdoc-generated HTML pages, -whose LaTeX versions are included in the present document. - - -==Special-purpose APIs== - -To give an analogy with the well-known type setting software, GF can be compared -with TeX and the resource grammar library with LaTeX. -Just like TeX frees the author -from thinking about low-level problems of page layout, so GF frees the grammarian -from writing parsing and generation algorithms. But quite a lot of knowledge of -//how// to write grammars is still needed, and the resource grammar library helps -GF grammarians in a way similar to how the LaTeX macro package helps TeX authors. - -But even LaTeX is often too detailed and low-level, and users are encouraged to -develop their own macro packages. The same applies to GF resource grammars: -the application grammarian might not need all the choices that the resource -provides, but would prefer less writing and higher-level programming. -To this end, application grammarians may want to write their own views on the -resource grammar. One example of this is the overloaded predication -operation ``pred`` available in ``api/Combinators``. -Instead of the ``NP-VP`` structure, it permits clause construction directly from -verbs and adjectives and their arguments: -``` - pred : V -> NP -> Cl ; -- x converges - pred : V2 -> NP -> NP -> Cl ; -- x intersects y - pred : V3 -> NP -> NP -> NP -> Cl ; -- x intersects y at z - pred : V -> NP -> NP -> Cl ; -- x and y intersect - pred : A -> NP -> Cl ; -- x is even - pred : A2 -> NP -> NP -> Cl ; -- x is divisible by y - pred : A -> NP -> NP -> Cl ; -- x and y are equal -``` - - -==Browsing by the parser== - -A method alternative to browsing library documentation is -to use the parser. -Even though parsing is not an intended end-user application -of resource grammars, it is a useful technique for application grammarians -to browse the library. 
To find out which resource function implements -a particular structure, one can just parse a string that exemplifies this -structure. For instance, to find out how sentences are built using -transitive verbs, write -``` - > i english/LangEng.gf - - > p -cat=Cl -fcfg "she loves him" - - PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron)) -``` -The parser returns original constructors, not overloaded ones. - -Parsing with the English resource grammar has an acceptable speed, but -with most languages it takes just too many resources even to build the -parser. However, examples parsed in one language can always be linearized into -other languages: -``` - > i italian/LangIta.gf - - > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron)) - - lo ama -``` -Therefore, one can use the English parser to write an Italian grammar, and also -to write a language-independent (incomplete) grammar. One can also parse strings -that are bizarre in English but are the intended way of expression in another language. -For instance, the phrase for "I am hungry" in Italian is literally "I have hunger". -This can be built by parsing "I have beer" in LangEng and then writing -``` - lin IamHungry = - let beer_N = regGenN "fame" feminine - in - PredVP (UsePron i_Pron) (ComplV2 have_V2 - (DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ; -``` -which uses ParadigmsIta.regGenN. - - - -==Example-based grammar writing== - -The technique of parsing with the resource grammar can be used in GF source files, -endowed with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess -the file by replacing all expressions of the form -``` - in Module.Cat "example string" -``` -by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``. -For instance, -``` - lin IamHungry = - let beer_N = regGenN "fame" feminine - in - (in LangEng.Cl "I have beer") ; -``` -will result in the rule displayed in the previous section. 
The normal binding rules -of functional programming (and GF) guarantee that local bindings of identifiers -take precedence over constants of the same form. Thus it is also possible to -linearize functions taking arguments in this way: -``` - lin - PropKind car_N old_A = in LangEng.CN "old car" ; -``` -However, the technique of example-based grammar writing has some limitations: -- Ambiguity. If a string has several parses, the first one is returned, and -it may not be the intended one. The other parses are shown in a comment, from -where they must/can be picked manually. -- Lexicality. The arguments of a function must be atomic identifiers, and are thus -not available for categories that have no lexical items. -For instance, the ``PropKind`` rule above gives the result -``` - lin - PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ; -``` -However, it is possible to write a special lexicon that gives atomic rules for -all those categories that can be used as arguments, for instance, -``` - fun - cat_CN : CN ; - old_AP : AP ; -``` -and then use this lexicon instead of the standard one included in ``Lang``. - - -=Overview of syntactic structures= - -==Texts, phrases, and utterances== - -The outermost linguistic structure is ``Text``. ``Text``s are composed -from Phrases (``Phr``) followed by punctuation marks - one of ".", "?" or -"!" (with their proper variants in Spanish and Arabic). Here is an -example of a ``Text`` string. -``` - John walks. Why? He doesn't want to sleep! -``` -Phrases are mostly built from Utterances (``Utt``), which in turn are -declarative sentences, questions, or imperatives - but there -are also "one-word utterances" consisting of noun phrases -or other subsentential phrases. Some Phrases are atomic, -for instance "yes" and "no". Here are some examples of Phrases. 
-``` - yes - come on, John - but John walks - give me the stick please - don't you know that he is sleeping - a glass of wine - a glass of wine please -``` -There is no connection between the punctuation marks and the -types of utterances. This reflects the fact that the punctuation -mark in a real text is selected as a function of the speech act -rather than the grammatical form of an utterance. The following -text is thus well-formed. -``` - John walks. John walks? John walks! -``` -What is the difference between Phrase and Utterance? Just technical: -a Phrase is an Utterance with an optional leading conjunction ("but") -and an optional tailing vocative ("John", "please"). - - -==Sentences and clauses== - -TODO: use overloaded operations in the examples. - -The richest of the categories below Utterance is ``S``, Sentence. A Sentence -is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity. -For example, each of the following strings has a distinct syntax tree -in the category Sentence: -``` - John walks - John doesn't walk - John walked - John didn't walk - John has walked - John hasn't walked - John will walk - John won't walk - ... -``` -whereas in the category Clause all of them are just different forms of -the same tree. -The difference between Sentence and Clause is thus also rather technical. -It may not correspond exactly to any standard usage of the terms -"clause" and "sentence". - -Figure 1 shows a type-annotated syntax tree of the Text "John walks." -and gives an overview of the structural levels. - -#BFIG - -``` -Node Constructor Value type Other constructors ------------------------------------------------------------ - 1. TFullStop Text TQuestMark - 2. (PhrUtt Phr - 3. NoPConj PConj but_PConj - 4. (UttS Utt UttQS - 5. (UseCl S UseQCl - 6. TPres Tense TPast - 7. ASimul Anter AAnter - 8. PPos Pol PNeg - 9. (PredVP Cl -10. (UsePN NP UsePron, DetCN -11. john_PN) PN mary_PN -12. (UseV VP ComplV2, ComplV3 -13. 
walk_V)))) V sleep_V -14. NoVoc) Voc please_Voc -15. TEmpty Text -``` - -#BCENTER -Figure 1. Type-annotated syntax tree of the Text "John walks." -#ECENTER - -#EFIG - -Here are some examples of the results of changing constructors. -``` - 1. TFullStop -> TQuestMark John walks? - 3. NoPConj -> but_PConj But John walks. - 6. TPres -> TPast John walked. - 7. ASimul -> AAnter John has walked. - 8. PPos -> PNeg John doesn't walk. -11. john_PN -> mary_PN Mary walks. -13. walk_V -> sleep_V John sleeps. -14. NoVoc -> please_Voc John sleeps please. -``` -All constructors cannot of course be changed so freely, because the -resulting tree would not remain well-typed. Here are some changes involving -many constructors: -``` - 4- 5. UttS (UseCl ...) -> - UttQS (UseQCl (... QuestCl ...)) Does John walk? -10-11. UsePN john_PN -> - UsePron we_Pron We walk. -12-13. UseV walk_V -> - ComplV2 love_V2 this_NP John loves this. -``` - - -==Parts of sentences== - -The linguistic phenomena mostly discussed in both traditional grammars and modern -syntax belong to the level of Clauses, that is, lines 9-13, and occasionally -to Sentences, lines 5-13. At this level, the major categories are -``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically -consists of just an ``NP`` and a ``VP``. -The internal structure of both ``NP`` and ``VP`` can be very complex, -and these categories are mutually recursive: not only can a ``VP`` -contain an ``NP``, -``` - [VP loves [NP Mary]] -``` -but also an ``NP`` can contain a ``VP`` -``` - [NP every man [RS who [VP walks]]] -``` -(a labelled bracketing like this is of course just a rough approximation of -a GF syntax tree, but still a useful device of exposition). - -Most of the resource modules thus define functions that are used inside -NPs and VPs. Here is a brief overview: - -**Noun**. How to construct NPs. 
The three main mechanisms -for constructing NPs are -- from proper names: "John" -- from pronouns: "we" -- from common nouns by determiners: "this man" - - -The ``Noun`` module also defines the construction of common nouns. -The most frequent ways are -- lexical noun items: "man" -- adjectival modification: "old man" -- relative clause modification: "man who sleeps" -- application of relational nouns: "successor of the number" - - -**Verb**. -How to construct VPs. The main mechanism is verbs with their arguments, -for instance, -- one-place verbs: "walks" -- two-place verbs: "loves Mary" -- three-place verbs: "gives her a kiss" -- sentence-complement verbs: "says that it is cold" -- VP-complement verbs: "wants to give her a kiss" - - -A special verb is the copula, "be" in English but not even realized -by a verb in all languages. -A copula can take different kinds of complement: -- an adjectival phrase: "(John is) old" -- an adverb: "(John is) here" -- a noun phrase: "(John is) a man" - - -**Adjective**. -How to construct ``AP``s. The main ways are -- positive forms of adjectives: "old" -- comparative forms with object of comparison: "older than John" - - -**Adverb**. -How to construct ``Adv``s. The main ways are -- from adjectives: "slowly" -- as prepositional phrases: "in the car" - - -==Modules and their names== - -This section is not necessary for users of the library. - -TODO: explain the overloaded API. - -The resource modules are named after the kind of -phrases that are constructed in them, -and they can be roughly classified by the "level" or "size" of expressions that are -formed in them: -- Larger than sentence: ``Text``, ``Phrase`` -- Same level as sentence: ``Sentence``, ``Question``, ``Relative`` -- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb`` -- Cross-cut (coordination): ``Conjunction`` - - -Because of mutual recursion such as in embedded sentences, this classification is -not a complete order. 
However, no mutual dependence is needed between the -modules themselves - they can all be compiled separately. This is due -to the module ``Cat``, which defines the type system common to the other modules. -For instance, the types ``NP`` and ``VP`` are defined in ``Cat``, -and the module ``Verb`` only -needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement -a rule such as -``` - Verb.ComplV2 : V2 -> NP -> VP -``` -it is enough to know the linearization type of ``NP`` -(as well as those of ``V2`` and ``VP``, all -given in ``Cat``). It is not necessary to know what -ways there are to build ``NP``s (given in ``Noun``), since all these ways must -conform to the linearization type defined in ``Cat``. Thus the format of -category-specific modules is as follows: -``` - abstract Adjective = Cat ** {...} - abstract Noun = Cat ** {...} - abstract Verb = Cat ** {...} -``` - - -==Top-level grammar and lexicon== - -The module ``Grammar`` collects all the category-specific modules into -a complete grammar: -``` - abstract Grammar = - Adjective, Noun, Verb, ..., Structural, Idiom -``` -The module ``Structural`` is a lexicon of structural words (function words), -such as determiners. - -The module ``Idiom`` is a collection of idiomatic structures whose -implementation is very language-dependent. An example is existential -structures ("there is", "es gibt", "il y a", etc). - -The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of -ca. 350 content words: -``` - abstract Lang = Grammar, Lexicon -``` -Using ``Lang`` instead of ``Grammar`` as a library may give -for free some words needed in an application. But its main purpose is to -help testing the resource library, rather than as a resource itself. -It does not even seem realistic to develop -a general-purpose multilingual resource lexicon. - -The diagram in Figure 2 shows the structure of the API. - -#BFIG - -#GRAMMAR - -#BCENTER -Figure 2. The resource syntax API. 
-#ECENTER - -#EFIG - -==Language-specific syntactic structures== - -The API collected in ``Grammar`` has been designed to be implementable for -all languages in the resource package. It does contain some rules that -are strange or superfluous in some languages; for instance, the distinction -between definite and indefinite articles does not apply to Finnish and Russian. -But such rules are still easy to implement: they only create some superfluous -ambiguity in the languages in question. - -But the library makes no claim that all languages should have exactly the same -abstract syntax. The common API is therefore extended by language-dependent -rules. The top level of each language looks as follows (with English as an example): -``` - abstract English = Grammar, ExtraEngAbs, DictEngAbs -``` -where ``ExtraEngAbs`` is a collection of syntactic structures specific to English, -and ``DictEngAbs`` is an English dictionary -(at the moment, it consists of ``IrregEngAbs``, -the irregular verbs of English). Each of these language-specific grammars has -the potential to grow into a full-scale grammar of the language. These grammars -can also be used as libraries, but the possibility of using functors is lost. - -To give a better overview of language-specific structures, -modules like ``ExtraEngAbs`` -are built from a language-independent module ``ExtraAbs`` -by restricted inheritance: -``` - abstract ExtraEngAbs = Extra [f,g,...] -``` -Thus any category and function in ``Extra`` may be shared by a subset of all -languages. One can see this set-up as a matrix, which tells -what ``Extra`` structures -are implemented in what languages. For the common API in ``Grammar``, the matrix -is filled with 1's (everything is implemented in every language). - -Language-specific extensions and the use of restricted -inheritance are a recent addition to the resource grammar library, and -have only been exploited on a very small scale so far. 
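The matrix view of restricted inheritance can be illustrated with a small, purely hypothetical sketch. The module ``ExtraDemo`` and its two functions are invented for this illustration and are not claimed to be actual contents of ``Extra``; only the categories ``NP``, ``Quant``, and ``CN`` are taken from ``Cat``. Listing ``Cat`` explicitly in the language-specific modules is a cautious assumption, made so that the categories are in scope regardless of how the restriction treats inherited modules.
```
  -- Hypothetical language-independent collection of extra structures.
  abstract ExtraDemo = Cat ** {
    fun
      DemoGenQuant : NP -> Quant ;  -- a noun phrase used as a genitive quantifier
      DemoPartNP   : CN -> NP ;     -- a partitive noun phrase
  }

  -- One row of the matrix per language: each module inherits
  -- only the functions that the language is assumed to implement.
  abstract ExtraDemoEng = Cat, ExtraDemo [DemoGenQuant] ;
  abstract ExtraDemoFin = Cat, ExtraDemo [DemoGenQuant, DemoPartNP] ;
  abstract ExtraDemoFre = Cat, ExtraDemo [DemoPartNP] ;
```
Each module would live in its own file. Read column by column, ``DemoGenQuant`` is shared by the English and Finnish sketches and ``DemoPartNP`` by the Finnish and French ones, so a functor could still be written over any subset of languages that shares the functions it uses.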
- - -=API Documentation= - -==Top-level modules== - -===Grammar: the Main Module of the Resource Grammar=== - -%!include: ../lib/resource-1.0/abstract/Grammar.txt - -===Lang: a Test Module for the Resource Grammar=== - -%!include: ../lib/resource-1.0/abstract/Lang.txt - - -==Type system== - -===Cat: the Category System=== - -%!include: ../lib/resource-1.0/abstract/Cat.txt - -===Common: Structures with Common Implementations=== - -%!include: ../lib/resource-1.0/abstract/Common.txt - - -==Syntax rule modules== - -===Adjective: Adjectives and Adjectival Phrases=== - -%!include: ../lib/resource-1.0/abstract/Adjective.txt - -===Adverb: Adverbs and Adverbial Phrases=== - -%!include: ../lib/resource-1.0/abstract/Adverb.txt - -===Conjunction: Coordination=== - -%!include: ../lib/resource-1.0/abstract/Conjunction.txt - -===Idiom: Idiomatic Expressions=== - -%!include: ../lib/resource-1.0/abstract/Idiom.txt - -===Noun: Nouns, Noun Phrases, and Determiners=== - -%!include: ../lib/resource-1.0/abstract/Noun.txt - -===Numeral: Cardinal and Ordinal Numerals=== - -%!include: ../lib/resource-1.0/abstract/Numeral.txt - -===Phrase: Phrases and Utterances=== - -%!include: ../lib/resource-1.0/abstract/Phrase.txt - -===Question: Questions and Interrogative Pronouns=== - -%!include: ../lib/resource-1.0/abstract/Question.txt - -===Relative: Relative Clauses and Relative Pronouns=== - -%!include: ../lib/resource-1.0/abstract/Relative.txt - -===Sentence: Sentences, Clauses, and Imperatives=== - -%!include: ../lib/resource-1.0/abstract/Sentence.txt - -===Structural: Structural Words=== - -%!include: ../lib/resource-1.0/abstract/Structural.txt - -===Text: Texts=== - -%!include: ../lib/resource-1.0/abstract/Text.txt - -===Verb: Verb Phrases=== - -%!include: ../lib/resource-1.0/abstract/Verb.txt - - -==Inflectional paradigms== - -===Arabic=== - -%!include: ../lib/resource-1.0/arabic/ParadigmsAra.txt - -===Danish=== - -%!include: ../lib/resource-1.0/danish/ParadigmsDan.txt - -===English=== - 
-%!include: ../lib/resource-1.0/english/ParadigmsEng.txt - -===Finnish=== - -%!include: ../lib/resource-1.0/finnish/ParadigmsFin.txt - -===French=== - -%!include: ../lib/resource-1.0/french/ParadigmsFre.txt - -===German=== - -%!include: ../lib/resource-1.0/german/ParadigmsGer.txt - -===Italian=== - -%!include: ../lib/resource-1.0/italian/ParadigmsIta.txt - -===Norwegian=== - -%!include: ../lib/resource-1.0/norwegian/ParadigmsNor.txt - -===Russian=== - -% %!include: ../lib/resource-1.0/russian/ParadigmsRus.txt - -% %!include: ""./ParadigmsRus.tex"" - -#PARADIGMSRUS - -===Spanish=== - -%!include: ../lib/resource-1.0/spanish/ParadigmsSpa.txt - -===Swedish=== - -%!include: ../lib/resource-1.0/swedish/ParadigmsSwe.txt - - -#CLEARPAGE - -=Summary of Categories and Functions= - -These tables show all categories and functions in ``Grammar``, -except the functions in ``Structural``. -All example strings can be parsed in ``LangEng`` and therefore -translated to the other ``Lang`` languages. - - -==Categories== - - -#SMALL - -|| Category | Module | Explanation | Example | -| A2 | Cat | two place adjective | "married" -| A | Cat | one place adjective | "old" -| AdA | Common | adjective modifying adverb, | "very" -| AdN | Common | numeral modifying adverb, | "more than" -| AdV | Common | adverb directly attached to verb | "always" -| Adv | Common | verb phrase modifying adverb, | "in the house" -| Ant | Common | anteriority | simultaneous -| AP | Cat | adjectival phrase | "very old" -| CAdv | Common | comparative adverb | "more" -| Cl | Cat | declarative clause, with all tenses | "she walks" -| CN | Cat | common noun (without determiner) | "red house" -| Comp | Cat | complement of copula, such as AP | "very warm" -| Conj | Cat | conjunction, | "and" -| DConj | Cat | distributed conj. 
| "both" - "and" -| Det | Cat | determiner phrase | "these seven" -| Digit | Numeral | digit from 2 to 9 | "4" -| IAdv | Common | interrogative adverb | "why" -| IComp | Cat | interrogative complement of copula | "where" -| IDet | Cat | interrogative determiner | "which" -| Imp | Cat | imperative | "look at this" -| IP | Cat | interrogative pronoun | "who" -| N2 | Cat | relational noun | "brother" -| N3 | Cat | three place relational noun | "connection" -| N | Cat | common noun | "house" -| NP | Cat | noun phrase (subject or object) | "the red house" -| Num | Cat | cardinal number (used with QuantPl) | "seven" -| Numeral | Cat | cardinal or ordinal, | "five" / "fifth" -| Ord | Cat | ordinal number (used in Det) | "seventh" -| PConj | Common | phrase beginning conj. | "therefore" -| Phr | Common | phrase in a text | "but look at this please" -| PN | Cat | proper name | "Paris" -| Pol | Common | polarity | positive -| Predet | Cat | predeterminer (prefixed Quant) | "all" -| Prep | Cat | preposition, or just case | "in" -| Pron | Cat | personal pronoun | "she" -| QCl | Cat | question clause, with all tenses | "why does she walk" -| QS | Cat | question | "where did she walk" -| Quant | Cat | quantifier with both sg and pl | "this"/"these" -| QuantPl | Cat | quantifier ('nucleus' of plur. Det) | "many" -| QuantSg | Cat | quantifier ('nucleus' of sing. 
Det) | "every" -| RCl | Cat | relative clause, with all tenses | "in which she walks" -| RP | Cat | relative pronoun | "in which" -| RS | Cat | relative | "that she loves" -| S | Cat | declarative sentence | "she was here" -| SC | Common | embedded sentence or question | "that it rains" -| Slash | Cat | clause missing NP (S/NP in GPSG) | "she loves" - - -|| Category | Module | Explanation | Example | -| Sub10 | Numeral | numeral under 10 | "9" -| Sub100 | Numeral | numeral under 100 | "99" -| Sub1000 | Numeral | numeral under 1000 | "999" -| Sub1000000 | Numeral | numeral under million | 123456 -| Subj | Cat | subjunction, | "if" -| Tense | Common | tense | present -| Text | Common | text consisting of several phrases | "He is here. Why?" -| Utt | Common | sentence, question, word... | "be quiet" -| V2A | Cat | verb with NP and AP complement | "paint" -| V2 | Cat | two place verb | "love" -| V3 | Cat | three place verb | "show" -| VA | Cat | adjective complement verb | "look" -| V | Cat | one place verb | "sleep" -| Voc | Common | vocative or | "please" "my darling" -| VP | Cat | verb phrase | "is very warm" -| VQ | Cat | question complement verb | "ask" -| VS | Cat | sentence complement verb | "claim" -| VV | Cat | verb phrase complement verb | "want" -| [Adv] | Conjunction | adverb list | "here, oddly" -| [AP] | Conjunction | adjectival phrase list | "even, very odd" -| [NP] | Conjunction | noun phrase list | "John, all women" -| [S] | Conjunction | sentence list | "I walk, you run" - - -==Functions== - -|| Function | Module | Type | Example | -| AAnter | Common | Ant | "" -| ASimul | Common | Ant | "" -| AdAdv | Adverb | AdA -> Adv -> Adv | "very" -| AdAP | Adjective | AdA -> AP -> AP | "very old" -| AdjCN | Noun | AP -> CN -> CN | "big house" -| AdnCAdv | Adverb | CAdv -> AdN | "more than" -| AdNum | Noun | AdN -> Num -> Num | "almost ten" -| AdvCN | Noun | CN -> Adv -> CN | "house on the mountain" -| AdvIP | Question | IP -> Adv -> IP | "who in Paris" -| AdvNP 
| Noun | NP -> Adv -> NP | "Paris without wine" -| AdvSC | Adverb | SC -> Adv | "that he sleeps" -| AdvSlash | Sentence | Slash -> Adv -> Slash | "she sees here" -| AdVVP | Verb | AdV -> VP -> VP | "always sleep" -| AdvVP | Verb | VP -> Adv -> VP | "sleep here" -| ApposCN | Noun | CN -> NP -> CN | "number x" -| BaseAdv | Conjunction | Adv -> Adv -> [Adv] | "here" - "today" -| BaseAP | Conjunction | AP -> AP -> [AP] | "even" - "odd" -| BaseNP | Conjunction | NP -> NP -> [NP] | "the car" - "the house" -| BaseS | Conjunction | S -> S -> [S] | "I walk" - "you run" -| CleftAdv | Idiom | Adv -> S -> Cl | "it is here that she sleeps" -| CleftNP | Idiom | NP -> RS -> Cl | "it is she who sleeps" -| CompAdv | Verb | Adv -> Comp | "here" -| CompAP | Verb | AP -> Comp | "old" -| ComparA | Adjective | A -> NP -> AP | "warmer than the house" -| ComparAdvAdj | Adverb | CAdv -> A -> NP -> Adv | "more heavily than Paris" -| ComparAdvAdjS | Adverb | CAdv -> A -> S -> Adv | "more heavily than she sleeps" - - -|| Function | Module | Type | Example | -| CompIAdv | Question | IAdv -> IComp | "where" -| ComplA2 | Adjective | A2 -> NP -> AP | "married to her" -| ComplN2 | Noun | N2 -> NP -> CN | "brother of the woman" -| ComplN3 | Noun | N3 -> NP -> N2 | "connection from that city to Paris" -| ComplV2A | Verb | V2A -> NP -> AP -> VP | "paint the house red" -| ComplV2 | Verb | V2 -> NP -> VP | "love it" -| ComplV3 | Verb | V3 -> NP -> NP -> VP | "send flowers to us" -| ComplVA | Verb | VA -> AP -> VP | "become red" -| ComplVQ | Verb | VQ -> QS -> VP | "ask if she runs" -| ComplVS | Verb | VS -> S -> VP | "say that she runs" -| ComplVV | Verb | VV -> VP -> VP | "want to run" -| CompNP | Verb | NP -> Comp | "a man" -| ConjAdv | Conjunction | Conj -> [Adv] -> Adv | "here or in the car" -| ConjAP | Conjunction | Conj -> [AP] -> AP | "warm or cold" -| ConjNP | Conjunction | Conj -> [NP] -> NP | "the man or the woman" -| ConjS | Conjunction | Conj -> [S] -> S | "he walks or she runs" -| ConsAdv 
| Conjunction | Adv -> [Adv] -> [Adv] | "here" - "without them, with us" -| ConsAP | Conjunction | AP -> [AP] -> [AP] | "warm" - "red, old" -| ConsNP | Conjunction | NP -> [NP] -> [NP] | "she" - "you, I" -| ConsS | Conjunction | S -> [S] -> [S] | "I walk" - "she runs, he sleeps" -| DConjAdv | Conjunction | DConj -> [Adv] -> Adv | "either here or there" -| DConjAP | Conjunction | DConj -> [AP] -> AP | "either warm or cold" -| DConjNP | Conjunction | DConj -> [NP] -> NP | "either the man or the woman" -| DConjS | Conjunction | DConj -> [S] -> S | "either he walks or she runs" -| DefArt | Noun | Quant | "the" -| DetCN | Noun | Det -> CN -> NP | "the man" -| DetPl | Noun | QuantPl -> Num -> Ord -> Det | "the five best" -| DetSg | Noun | QuantSg -> Ord -> Det | "this" -| EmbedQS | Sentence | QS -> SC | "whom she loves" -| EmbedS | Sentence | S -> SC | "that you go" -| EmbedVP | Sentence | VP -> SC | "to love it" -| ExistIP | Idiom | IP -> QCl | "which cars are there" -| ExistNP | Idiom | NP -> Cl | "there is a car" -| FunRP | Relative | Prep -> NP -> RP -> RP | "all houses in which" -| GenericCl | Idiom | VP -> Cl | "one sleeps" -| IDetCN | Question | IDet -> Num -> Ord -> CN -> IP | "which five hottest songs" -| IdRP | Relative | RP | "which" -| ImpersCl | Idiom | VP -> Cl | "it rains" -| ImpPl1 | Idiom | VP -> Utt | "let's go" -| ImpVP | Sentence | VP -> Imp | "go to the house" -| IndefArt | Noun | Quant | "a" -| MassDet | Noun | QuantSg | ("beer") -| NoNum | Noun | Num | "" -| NoOrd | Noun | Ord | "" -| NoPConj | Phrase | PConj | "" -| NoVoc | Phrase | Voc | "" -| NumInt | Noun | Int -> Num | "51" -| NumNumeral | Noun | Numeral -> Num | "five hundred" -| OrdInt | Noun | Int -> Ord | "13 th" -| OrdNumeral | Noun | Numeral -> Ord | "thirteenth" -| OrdSuperl | Noun | A -> Ord | "hottest" -| PassV2 | Verb | V2 -> VP | "be seen" -| PConjConj | Phrase | Conj -> PConj | "and" -| PhrUtt | Phrase | PConj -> Utt -> Voc -> Phr | "but come here please" -| PlQuant | Noun | Quant 
-> QuantPl | "these" -| PositA | Adjective | A -> AP | "warm" -| PositAdvAdj | Adverb | A -> Adv | "warmly" - -|| Function | Module | Type | Example | -| PossPron | Noun | Pron -> Quant | "my" -| PPartNP | Noun | NP -> V2 -> NP | "the city seen" -| PNeg | Common | Pol | "" -| PPos | Common | Pol | "" -| PredetNP | Noun | Predet -> NP -> NP | "only the man" -| PredSCVP | Sentence | SC -> VP -> Cl | "that she sleeps is good" -| PredVP | Sentence | NP -> VP -> Cl | "she walks" -| PrepIP | Question | Prep -> IP -> IAdv | "with whom" -| PrepNP | Adverb | Prep -> NP -> Adv | "in the house" -| ProgrVP | Idiom | VP -> VP | "be sleeping" -| QuestCl | Question | Cl -> QCl | "does she walk" -| QuestIAdv | Question | IAdv -> Cl -> QCl | "why does she walk" -| QuestIComp | Question | IComp -> NP -> QCl | "where is she" -| QuestSlash | Question | IP -> Slash -> QCl | "whom does she love" -| QuestVP | Question | IP -> VP -> QCl | "who walks" -| ReflA2 | Adjective | A2 -> AP | "married to itself" -| ReflV2 | Verb | V2 -> VP | "see himself" -| RelCl | Relative | Cl -> RCl | "such that she loves him" -| RelCN | Noun | CN -> RS -> CN | "house that she buys" -| RelSlash | Relative | RP -> Slash -> RCl | "that she loves" -| RelVP | Relative | RP -> VP -> RCl | "that loves her" -| SentAP | Adjective | AP -> SC -> AP | "good that she came" -| SentCN | Noun | CN -> SC -> CN | "fact that she smokes" -| SgQuant | Noun | Quant -> QuantSg | "this" -| SlashPrep | Sentence | Cl -> Prep -> Slash | (with whom) "he walks" -| SlashV2 | Sentence | NP -> V2 -> Slash | (whom) "he sees" -| SlashVVV2 | Sentence | NP -> VV -> V2 -> Slash | (whom) "he wants to see" -| SubjS | Adverb | Subj -> S -> Adv | "when he came" -| TCond | Common | Tense | "" -| TEmpty | Text | Text | "" -| TFut | Common | Tense | "" -| TExclMark | Text | Phr -> Text -> Text | "She walks!" -| TFullStop | Text | Phr -> Text -> Text | "She walks." 
-| TPast | Common | Tense | "" -| TPres | Common | Tense | "" -| TQuestMark | Text | Phr -> Text -> Text | "Does she walk?" -| UseA2 | Adjective | A2 -> A | "married" -| UseCl | Sentence | Tense -> Ant -> Pol -> Cl -> S | "she wouldn't have walked" -| UseComp | Verb | Comp -> VP | "be warm" -| UseN2 | Noun | N2 -> CN | "brother" -| UseN3 | Noun | N3 -> CN | "connection" -| UseN | Noun | N -> CN | "house" -| UsePN | Noun | PN -> NP | "Paris" -| UsePron | Noun | Pron -> NP | "she" -| UseQCl | Sentence | Tense -> Ant -> Pol -> QCl -> QS | "where hadn't she walked" -| UseRCl | Sentence | Tense -> Ant -> Pol -> RCl -> RS | "that she hadn't seen" -| UseVQ | Verb | VQ -> V2 | "ask" (a question) -| UseVS | Verb | VS -> V2 | "know" (a secret) -| UseV | Verb | V -> VP | "sleep" -| UttAdv | Phrase | Adv -> Utt | "here" -| UttIAdv | Phrase | IAdv -> Utt | "why" -| UttImpPl | Phrase | Pol -> Imp -> Utt | "love yourselves" -| UttImpSg | Phrase | Pol -> Imp -> Utt | "love yourself" -| UttIP | Phrase | IP -> Utt | "who" -| UttNP | Phrase | NP -> Utt | "this man" -| UttQS | Phrase | QS -> Utt | "is it good" -| UttS | Phrase | S -> Utt | "she walks" -| UttVP | Phrase | VP -> Utt | "to sleep" -| VocNP | Phrase | NP -> Voc | "my brother" - -|| Function | Module | Type | Example | -| num | Numeral | Sub1000000 -> Numeral | "2" -| n2 | Numeral | Digit | "2" -| n3 | Numeral | Digit | "3" -| n4 | Numeral | Digit | "4" -| n5 | Numeral | Digit | "5" -| n6 | Numeral | Digit | "6" -| n7 | Numeral | Digit | "7" -| n8 | Numeral | Digit | "8" -| n9 | Numeral | Digit | "9" -| pot01 | Numeral | Sub10 | "1" -| pot0 | Numeral | Digit -> Sub10 | "3" -| pot110 | Numeral | Sub100 | "10" -| pot111 | Numeral | Sub100 | "11" -| pot1to19 | Numeral | Digit -> Sub100 | "18" -| pot0as1 | Numeral | Sub10 -> Sub100 | "3" -| pot1 | Numeral | Digit -> Sub100 | "50" -| pot1plus | Numeral | Digit -> Sub10 -> Sub100 | "54" -| pot1as2 | Numeral | Sub100 -> Sub1000 | "99" -| pot2 | Numeral | Sub10 -> Sub1000 | "600" 
-| pot2plus | Numeral | Sub10 -> Sub100 -> Sub1000 | "623" -| pot2as3 | Numeral | Sub1000 -> Sub1000000 | "999" -| pot3 | Numeral | Sub1000 -> Sub1000000 | "53000" -| pot3plus | Numeral | Sub1000 -> Sub1000 -> Sub1000000 | "53201" - |
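
The numeral functions in the last table compose bottom-up, from digits to numbers under a million. The following Python sketch mirrors their type signatures and interprets each one as plain integer arithmetic. That interpretation is an assumption inferred from the Example column; in the library itself these functions are linearized to strings in each language, not to integers.

```python
# Assumed integer reading of the Numeral functions. Every category
# (Digit, Sub10, Sub100, Sub1000, Sub1000000) is represented as an int.

n2, n3, n4, n5, n6, n7, n8, n9 = range(2, 10)  # Digit: 2 to 9

pot01 = 1    # Sub10
pot110 = 10  # Sub100
pot111 = 11  # Sub100


def pot0(d):         # Digit -> Sub10
    return d


def pot1to19(d):     # Digit -> Sub100, e.g. n8 gives 18
    return 10 + d


def pot0as1(n):      # Sub10 -> Sub100
    return n


def pot1(d):         # Digit -> Sub100, e.g. n5 gives 50
    return 10 * d


def pot1plus(d, n):  # Digit -> Sub10 -> Sub100
    return 10 * d + n


def pot1as2(n):      # Sub100 -> Sub1000
    return n


def pot2(n):         # Sub10 -> Sub1000
    return 100 * n


def pot2plus(n, m):  # Sub10 -> Sub100 -> Sub1000
    return 100 * n + m


def pot2as3(n):      # Sub1000 -> Sub1000000
    return n


def pot3(n):         # Sub1000 -> Sub1000000
    return 1000 * n


def pot3plus(n, m):  # Sub1000 -> Sub1000 -> Sub1000000
    return 1000 * n + m


def num(n):          # Sub1000000 -> Numeral
    return n


if __name__ == "__main__":
    thousands = pot1as2(pot1plus(n5, pot0(n3)))  # 53
    rest = pot2plus(pot0(n2), pot0as1(pot01))    # 201
    print(num(pot3plus(thousands, rest)))        # prints 53201
```

The coercions ``pot0as1``, ``pot1as2`` and ``pot2as3`` do no arithmetic; they only lift a smaller number into the next category so that the types line up.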
