Diffstat (limited to 'doc/intro-resource.txt')
| -rw-r--r-- | doc/intro-resource.txt | 511 |
1 files changed, 0 insertions, 511 deletions
diff --git a/doc/intro-resource.txt b/doc/intro-resource.txt
deleted file mode 100644
index c4c292fca..000000000
--- a/doc/intro-resource.txt
+++ /dev/null
@@ -1,511 +0,0 @@
-
-
-==Coverage==
-
-The GF Resource Grammar Library contains grammar rules for
-10 languages (in addition, 2 languages are available as incomplete
-implementations, and a few more are under construction). Its purpose
-is to make these rules available for application programmers,
-who can thereby concentrate on the semantic and stylistic
-aspects of their grammars, without having to think about
-grammaticality. The targeted level of application grammarians
-is that of a skilled programmer with
-a practical knowledge of the target languages, but without
-theoretical knowledge about their grammars.
-Such a combination of
-skills is typical of programmers who, for instance, want to localize
-software to new languages.
-
-The current resource languages are
-- ``Ara``bic (incomplete)
-- ``Cat``alan (incomplete)
-- ``Dan``ish
-- ``Eng``lish
-- ``Fin``nish
-- ``Fre``nch
-- ``Ger``man
-- ``Ita``lian
-- ``Nor``wegian
-- ``Rus``sian
-- ``Spa``nish
-- ``Swe``dish
-
-
-The first three letters (``Eng`` etc) are used in grammar module names.
-The incomplete Arabic and Catalan implementations are
-enough to be used in many applications; they both contain, among other
-things, complete inflectional morphology.
-
-
-
-==A first example==
-
-To give an example application, consider a system for steering
-music playing devices by voice commands. In the application,
-we may have a semantic category ``Kind``, examples
-of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
-is linearized into the noun "Lied", but knowing this is not
-enough to make the application work, because the noun must be
-produced in both singular and plural, and in four different
-cases.
-By using the resource grammar library, it is enough to
-write
-```
-  lin Song = mkN "Lied" "Lieder" neuter
-```
-and the eight forms are correctly generated. The resource grammar
-library contains a complete set of inflectional paradigms (such as
-``mkN`` here), enabling the definition of any lexical items.
-
-The resource grammar library is not only about inflectional paradigms - it
-also has syntax rules. The music player application
-might also want to modify songs with properties, such as "American",
-"old", "good". The German grammar for adjectival modifications is
-particularly complex, because adjectives have to agree in gender,
-number, and case, and also depend on what determiner is used
-("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
-variation is taken care of by the resource grammar function
-```
-  mkCN : AP -> CN -> CN
-```
-(see the table at the end of this document for the list of all resource grammar
-functions). The resource grammar implementation of the rule adding properties
-to kinds is
-```
-  lin PropKind kind prop = mkCN prop kind
-```
-given that
-```
-  lincat Prop = AP
-  lincat Kind = CN
-```
-The resource library API is divided into language-specific
-and language-independent parts. To put it roughly,
-- the lexicon API is language-specific
-- the syntax API is language-independent
-
-
-Thus, to render the above example in French instead of German, we need to
-pick a different linearization of ``Song``,
-```
-  lin Song = mkN "chanson" feminine
-```
-But to linearize ``PropKind``, we can use the very same rule as in German.
-The resource function ``mkCN`` has different implementations in the two
-languages (e.g. a different word order in French),
-but the application programmer need not care about the difference.
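The split between a language-specific lexicon and language-independent syntax can be sketched by writing the German and French rules side by side. This is only an illustration under stated assumptions: the module names ``MusicDirectGer`` and ``MusicDirectFre`` are hypothetical, the abstract syntax ``Music`` is the one given in the section "A complete example", and the ``mkCN``/``mkAP`` wrapping of lexical items follows that section. Each module would go in its own file, with the resource library on the path.

```
  -- Hypothetical module names; assumes the abstract syntax Music
  -- (cat Kind, Property ; fun PropKind, Song, American).
  concrete MusicDirectGer of Music = open SyntaxGer, ParadigmsGer in {
    lincat
      Kind = CN ;
      Property = AP ;
    lin
      Song = mkCN (mkN "Lied" "Lieder" neuter) ;   -- language-specific lexicon
      American = mkAP (mkA "amerikanisch") ;       -- language-specific lexicon
      PropKind kind prop = mkCN prop kind ;        -- language-independent syntax
  }

  concrete MusicDirectFre of Music = open SyntaxFre, ParadigmsFre in {
    lincat
      Kind = CN ;
      Property = AP ;
    lin
      Song = mkCN (mkN "chanson" feminine) ;       -- only the lexical rules change
      American = mkAP (mkA "américain") ;
      PropKind kind prop = mkCN prop kind ;        -- same rule text as in German
  }
```

The two ``PropKind`` rules are textually identical; only the lexical rules differ between the modules.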
-
-
-
-==Note on APIs==
-
-From version 1.1 onwards, the resource library is available via two
-APIs:
-- original ``fun`` and ``oper`` definitions
-- overloaded ``oper`` definitions
-
-
-Introducing overloading in GF version 2.7 has been a success in improving
-the accessibility of libraries. It has also created a layer of abstraction
-between the writers and users of libraries, and thereby makes the library
-easier to modify. We shall therefore use the overloaded API
-in this document. The original function names are mainly interesting
-for those who want to write or modify libraries.
-
-
-
-==A complete example==
-
-To summarize the example, and also give a template for a programmer to work on,
-here is the complete implementation of a small system with songs and properties.
-The abstract syntax defines a "domain ontology":
-```
-  abstract Music = {
-
-  cat
-    Kind,
-    Property ;
-  fun
-    PropKind : Kind -> Property -> Kind ;
-    Song : Kind ;
-    American : Property ;
-  }
-```
-The concrete syntax is defined by a functor (parametrized module),
-independently of language, by opening
-two interfaces: the resource ``Syntax`` and an application lexicon.
-```
-  incomplete concrete MusicI of Music =
-    open Syntax, MusicLex in {
-  lincat
-    Kind = CN ;
-    Property = AP ;
-  lin
-    PropKind k p = mkCN p k ;
-    Song = mkCN song_N ;
-    American = mkAP american_A ;
-  }
-```
-The application lexicon ``MusicLex`` is an interface
-extending the resource category system ``Cat``.
-```
-  interface MusicLex = Cat ** {
-  oper
-    song_N : N ;
-    american_A : A ;
-  }
-```
-It could also be an abstract syntax that extends ``Cat``, but
-this would limit the kind of constructions that are possible in
-the interface.
-
-Each language has its own instance of the lexicon interface, which opens the
-inflectional paradigms module for that language:
-```
-  instance MusicLexGer of MusicLex =
-    CatGer ** open ParadigmsGer in {
-  oper
-    song_N = mkN "Lied" "Lieder" neuter ;
-    american_A = mkA "amerikanisch" ;
-  }
-
-  instance MusicLexFre of MusicLex =
-    CatFre ** open ParadigmsFre in {
-  oper
-    song_N = mkN "chanson" feminine ;
-    american_A = mkA "américain" ;
-  }
-```
-The top-level ``Music`` grammars are obtained by
-instantiating the two interfaces of ``MusicI``:
-```
-  concrete MusicGer of Music = MusicI with
-    (Syntax = SyntaxGer),
-    (MusicLex = MusicLexGer) ;
-
-  concrete MusicFre of Music = MusicI with
-    (Syntax = SyntaxFre),
-    (MusicLex = MusicLexFre) ;
-```
-Both of these files can use the same ``path``, defined as
-```
-  --# -path=.:present:prelude
-```
-The ``present`` directory contains the compiled resources, restricted to
-present tense; ``alltenses`` has the full resources.
-
-To localize the music player system to a new language,
-all that is needed is two modules,
-one implementing ``MusicLex`` and the other
-instantiating ``MusicI``. The latter is
-completely trivial, whereas the former one involves the choice of correct
-vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
-```
-  instance MusicLexFin of MusicLex =
-    CatFin ** open ParadigmsFin in {
-  oper
-    song_N = mkN "kappale" ;
-    american_A = mkA "amerikkalainen" ;
-  }
-
-  concrete MusicFin of Music = MusicI with
-    (Syntax = SyntaxFin),
-    (MusicLex = MusicLexFin) ;
-```
-More work is of course needed if the language-independent linearizations in
-MusicI are not satisfactory for some language.
-The resource grammar guarantees
-that the linearizations are possible in all languages, in the sense of being grammatical,
-but they might of course be inadequate for stylistic reasons. Assume,
-for the sake of argument, that adjectival modification does not sound good in
-English, but that a relative clause would be preferable. One can then use
-restricted inheritance of the functor:
-```
-  concrete MusicEng of Music =
-    MusicI - [PropKind]
-    with
-      (Syntax = SyntaxEng),
-      (MusicLex = MusicLexEng) **
-        open SyntaxEng in {
-  lin
-    PropKind k p = mkCN k (mkRS (mkRCl which_RP (mkVP p))) ;
-  }
-```
-The lexicon is as expected:
-```
-  instance MusicLexEng of MusicLex =
-    CatEng ** open ParadigmsEng in {
-  oper
-    song_N = mkN "song" ;
-    american_A = mkA "American" ;
-  }
-```
-
-
-==Lock fields==
-
-//This section is only relevant as a guide to error messages that have to do with lock fields, and can be skipped otherwise.//
-
-FIXME: this section may become obsolete.
-
-When the categories of the resource grammar are used
-in applications, a **lock field** is added to their linearization types.
-The lock field for a category ``C`` is a record field
-```
-  lock_C : {}
-```
-with the only possible value
-```
-  lock_C = <>
-```
-The lock field carries no information, but its presence
-makes the linearization type of ``C``
-unique, so that categories
-with the same implementation are not confused with each other.
-(This is inspired by the ``newtype`` discipline in Haskell.)
-
-For example, the lincats of adverbs and conjunctions are the same
-in ``CatEng`` (and therefore in ``GrammarEng``, which inherits it):
-```
-  lincat Adv = {s : Str} ;
-  lincat Conj = {s : Str} ;
-```
-But when these category symbols are used to denote their linearization
-types in an application, these definitions are translated to
-```
-  oper Adv : Type = {s : Str ; lock_Adv : {}} ;
-  oper Conj : Type = {s : Str ; lock_Conj : {}} ;
-```
-In this way, the user of a resource grammar cannot confuse adverbs with
-conjunctions. In other words, the lock fields force the type checker
-to function as a grammaticality checker.
-
-When the resource grammar is ``open``ed in an application grammar,
-and only functions from the resource are used in a type-correct way, the
-lock fields are never seen (except possibly in type error messages).
-If an application grammarian has to write lock fields herself,
-it is a sign that the guarantees given by the resource grammar
-no longer hold. But since the resource may be incomplete, the
-application grammarian may occasionally have to provide the dummy
-values of lock fields (always ``<>``, the empty record).
-Here is an example:
-```
-  mkUtt : Str -> Utt ;
-  mkUtt s = {s = s ; lock_Utt = <>} ;
-```
-Currently, missing lock fields produce warnings rather than errors,
-but this behaviour of GF may change in future.
-
-
-==Parsing with resource grammars?==
-
-The intended use of the resource grammar is as a library for writing
-application grammars. It is not designed for parsing e.g. newspaper text. There
-are several reasons why this is not practical:
-- Efficiency: the resource grammar uses complex data structures, in
-particular, discontinuous constituents, which make parsing slow and the
-parser size huge.
-- Completeness: the resource grammar does not necessarily cover all rules
-of the language - only enough of them to be able to express everything
-in one way or another.
-- Lexicon: the resource grammar has a very small lexicon, only meant for test
-purposes.
-- Semantics: the resource grammar has very little semantic control, and may
-accept strange input or deliver strange interpretations.
-- Ambiguity: parsing in the resource grammar may return lots of results, many
-of which are implausible.
-
-
-All of these problems should be solved in application grammars.
-The task of resource grammars is just to take care of low-level linguistic
-details such as inflection, agreement, and word order.
-
-It is for the same reasons that resource grammars are not adequate for translation.
-That the syntax API is implemented for different languages of course makes
-it possible to translate via it - but there is no guarantee of translation
-equivalence. Of course, the use of functor implementations such as ``MusicI``
-above only extends to those cases where the syntax API does give translation
-equivalence - but this must be seen as a limiting case, and bigger applications
-will often use only restricted inheritance of ``MusicI``.
-
-
-
-=To find rules in the resource grammar library=
-
-==Inflection paradigms==
-
-Inflection paradigms are defined separately for each language //L//
-in the module ``Paradigms``//L//.
-To test them, the command
-``cc`` (= ``compute_concrete``)
-can be used:
-```
-  > i -retain german/ParadigmsGer.gf
-
-  > cc mkN "Schlange"
-  {
-    s : Number => Case => Str = table Number {
-      Sg => table Case {
-        Nom => "Schlange" ;
-        Acc => "Schlange" ;
-        Dat => "Schlange" ;
-        Gen => "Schlange"
-        } ;
-      Pl => table Case {
-        Nom => "Schlangen" ;
-        Acc => "Schlangen" ;
-        Dat => "Schlangen" ;
-        Gen => "Schlangen"
-        }
-      } ;
-    g : Gender = Fem
-  }
-```
-For the sake of convenience, every language implements these five paradigms:
-```
-  oper
-    mkN  : Str -> N ;   -- regular nouns
-    mkA  : Str -> A ;   -- regular adjectives
-    mkV  : Str -> V ;   -- regular verbs
-    mkPN : Str -> PN ;  -- regular proper names
-    mkV2 : V -> V2 ;    -- direct transitive verbs
-```
-It is often possible to initialize a lexicon by just using these functions,
-and later revise it by using the more involved paradigms. For instance, in
-German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
-masculine noun with the plural form ``"Liede"``.
-The individual ``Paradigms`` modules
-tell what cases are covered by the regular heuristics.
-
-As a limiting case, one could even initialize the lexicon for a new language
-by copying the English (or some other already existing) lexicon. This would
-produce a language with correct grammar but with content words directly borrowed from
-English - maybe not so strange in certain technical domains.
-
-
-
-==Syntax rules==
-
-Syntax rules should be looked for in the module ``Constructors``.
-Below this top-level module exposing overloaded constructors,
-there are around 10 abstract modules, each defining constructors for
-a group of one or more related categories. For instance, the module
-``Noun`` defines how to construct common nouns, noun phrases, and determiners.
-But these special modules are seldom or never needed by the users of the library.
-
-TODO: when are they needed?
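What "overloaded constructors" means in practice can be sketched with the one name ``mkCN`` applied to arguments of different types. This is a minimal illustration, not part of the library: the module name ``OverloadDemo`` is hypothetical, and it assumes that ``SyntaxEng`` and ``ParadigmsEng`` are on the path and that ``mkCN`` has the two types used elsewhere in this document (``N -> CN`` and ``AP -> CN -> CN``).

```
  -- Hypothetical resource module, for experimentation only.
  resource OverloadDemo = open SyntaxEng, ParadigmsEng in {
  oper
    song_N : N = mkN "song" ;
    american_A : A = mkA "American" ;

    -- mkCN : N -> CN
    song_CN : CN = mkCN song_N ;

    -- mkCN : AP -> CN -> CN ; the argument types select the rule
    american_song_CN : CN = mkCN (mkAP american_A) song_CN ;
  }
```

If the assumptions hold, the result can be inspected in the same way as the paradigms above, by importing the module with ``i -retain`` and evaluating ``cc american_song_CN``.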
-
-Browsing the libraries is helped by the gfdoc-generated HTML pages,
-whose LaTeX versions are included in the present document.
-
-
-==Special-purpose APIs==
-
-To give an analogy with the well-known typesetting software, GF can be compared
-with TeX and the resource grammar library with LaTeX.
-Just as TeX frees the author
-from thinking about low-level problems of page layout, so GF frees the grammarian
-from writing parsing and generation algorithms. But quite a lot of knowledge of
-//how// to write grammars is still needed, and the resource grammar library helps
-GF grammarians in a way similar to how the LaTeX macro package helps TeX authors.
-
-But even LaTeX is often too detailed and low-level, and users are encouraged to
-develop their own macro packages. The same applies to GF resource grammars:
-the application grammarian might not need all the choices that the resource
-provides, but would prefer less writing and higher-level programming.
-To this end, application grammarians may want to write their own views on the
-resource grammar.
-
-
-==Browsing by the parser==
-
-An alternative to browsing library documentation is
-to use the parser.
-Even though parsing is not an intended end-user application
-of resource grammars, it is a useful technique for application grammarians
-to browse the library. To find out which resource function implements
-a particular structure, one can just parse a string that exemplifies this
-structure. For instance, to find out how sentences are built using
-transitive verbs, write
-```
-  > i english/LangEng.gf
-
-  > p -cat=Cl "she loves him"
-  PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
-```
-The parser returns original constructors, not overloaded ones.
-Overloaded
-constructors can be returned, so far with experimental heuristics, by using
-the grammar ``api/toplevel/OverLangEng.gf`` and a special flag:
-```
-  > i api/toplevel/OverLangEng.gf
-
-  > p -cat=Cl -overload "she loves him"
-  mkCl (mkNP she_Pron) love_V2 (mkNP he_Pron)
-```
-Parsing with the English resource grammar has an acceptable speed, but
-with most languages it takes just too many resources even to build the
-parser. However, examples parsed in one language can always be linearized into
-other languages:
-```
-  > i italian/LangIta.gf
-
-  > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
-  lo ama
-```
-Therefore, one can use the English parser to write an Italian grammar, and also
-to write a language-independent (incomplete) grammar. One can also parse strings
-that are bizarre in English but are the intended way of expression in another language.
-For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
-This can be built by parsing "I have beer" in ``OverLangEng`` and then writing
-```
-  lin IamHungry =
-    let beer_N = mkN "fame" feminine
-    in
-    mkCl (mkNP i_Pron) have_V2 (mkNP massQuant beer_N)
-```
-which uses ``ParadigmsIta.mkN``.
-
-
-
-==Example-based grammar writing==
-
-The technique of parsing with the resource grammar can be used in GF source files,
-endowed with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess
-the file by replacing all expressions of the form
-```
-  in Module.Cat "example string"
-```
-by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``.
-For instance,
-```
-  lin IamHungry =
-    let beer_N = mkN "fame" feminine
-    in
-    (in LangEng.Cl "I have beer") ;
-```
-will result in the rule displayed in the previous section. The normal binding rules
-of functional programming (and GF) guarantee that local bindings of identifiers
-take precedence over constants of the same form.
-Thus it is also possible to
-linearize functions taking arguments in this way:
-```
-  lin
-    PropKind car_N old_A = in LangEng.CN "old car" ;
-```
-However, the technique of example-based grammar writing has some limitations:
-- Ambiguity. If a string has several parses, the first one is returned, and
-it may not be the intended one. The other parses are shown in a comment, from
-which they can be picked manually.
-- Lexicality. The arguments of a function must be atomic identifiers, and are thus
-not available for categories that have no lexical items.
-For instance, the ``PropKind`` rule above gives the result
-```
-  lin
-    PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
-```
-However, it is possible to write a special lexicon that gives atomic rules for
-all those categories that can be used as arguments, for instance,
-```
-  fun
-    cat_CN : CN ;
-    old_AP : AP ;
-```
-and then use this lexicon instead of the standard one included in ``Lang``.
-
-
