| author | aarne <aarne@cs.chalmers.se> | 2008-06-27 11:32:49 +0000 |
|---|---|---|
| committer | aarne <aarne@cs.chalmers.se> | 2008-06-27 11:32:49 +0000 |
| commit | 64d2a981a99c8f48f85c4efd0cecd1db1e5ce93a (patch) | |
| tree | 8ec777785ae6b99e4ade6ab7c97a7653317b82ad /doc/overview-resource.txt | |
| parent | 032531c6a690edbb377ff11ee2a743a30c5bf500 (diff) | |
more rm in doc
Diffstat (limited to 'doc/overview-resource.txt')
| -rw-r--r-- | doc/overview-resource.txt | 300 |
1 files changed, 0 insertions, 300 deletions
diff --git a/doc/overview-resource.txt b/doc/overview-resource.txt
deleted file mode 100644
index 2f9b2cd04..000000000
--- a/doc/overview-resource.txt
+++ /dev/null
@@ -1,300 +0,0 @@
-==Texts, phrases, and utterances==
-
-The outermost linguistic structure is ``Text``. ``Text``s are composed
-from Phrases (``Phr``) followed by punctuation marks - one of ".", "?" or
-"!" (with their proper variants in Spanish and Arabic). Here is an
-example of a ``Text`` string.
-```
-  John walks. Why? He doesn't want to sleep!
-```
-Phrases are mostly built from Utterances (``Utt``), which in turn are
-declarative sentences, questions, or imperatives - but there
-are also "one-word utterances" consisting of noun phrases
-or other subsentential phrases. Some Phrases are atomic,
-for instance "yes" and "no". Here are some examples of Phrases.
-```
-  yes
-  come on, John
-  but John walks
-  give me the stick please
-  don't you know that he is sleeping
-  a glass of wine
-  a glass of wine please
-```
-There is no connection between the punctuation marks and the
-types of utterances. This reflects the fact that the punctuation
-mark in a real text is selected as a function of the speech act
-rather than the grammatical form of an utterance. The following
-text is thus well-formed.
-```
-  John walks. John walks? John walks!
-```
-What is the difference between Phrase and Utterance? Just technical:
-a Phrase is an Utterance with an optional leading conjunction ("but")
-and an optional trailing vocative ("John", "please").
-
-
-==Sentences and clauses==
-
-TODO: use overloaded operations in the examples.
-
-The richest of the categories below Utterance is ``S``, Sentence. A Sentence
-is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity.
-For example, each of the following strings has a distinct syntax tree
-in the category Sentence:
-```
-  John walks
-  John doesn't walk
-  John walked
-  John didn't walk
-  John has walked
-  John hasn't walked
-  John will walk
-  John won't walk
-  ...
-```
-whereas in the category Clause all of them are just different forms of
-the same tree.
-The difference between Sentence and Clause is thus also rather technical.
-It may not correspond exactly to any standard usage of the terms
-"clause" and "sentence".
-
-Figure 1 shows a type-annotated syntax tree of the Text "John walks."
-and gives an overview of the structural levels.
-
-#BFIG
-
-```
-Node Constructor             Value type  Other constructors
------------------------------------------------------------
- 1.  TFullStop               Text        TQuestMark
- 2.    (PhrUtt               Phr
- 3.      NoPConj             PConj       but_PConj
- 4.      (UttS               Utt         UttQS
- 5.        (UseCl            S           UseQCl
- 6.          TPres           Tense       TPast
- 7.          ASimul          Anter       AAnter
- 8.          PPos            Pol         PNeg
- 9.          (PredVP         Cl
-10.            (UsePN        NP          UsePron, DetCN
-11.              john_PN)    PN          mary_PN
-12.            (UseV         VP          ComplV2, ComplV3
-13.              walk_V))))  V           sleep_V
-14.      NoVoc)              Voc         please_Voc
-15.    TEmpty                Text
-```
-
-#BCENTER
-Figure 1. Type-annotated syntax tree of the Text "John walks."
-#ECENTER
-
-#EFIG
-
-Here are some examples of the results of changing constructors.
-```
- 1. TFullStop -> TQuestMark    John walks?
- 3. NoPConj   -> but_PConj     But John walks.
- 6. TPres     -> TPast         John walked.
- 7. ASimul    -> AAnter        John has walked.
- 8. PPos      -> PNeg          John doesn't walk.
-11. john_PN   -> mary_PN       Mary walks.
-13. walk_V    -> sleep_V       John sleeps.
-14. NoVoc     -> please_Voc    John walks please.
-```
-Of course, not all constructors can be changed so freely, because the
-resulting tree would not remain well-typed. Here are some changes involving
-several constructors:
-```
- 4- 5. UttS (UseCl ...) ->
-         UttQS (UseQCl (... QuestCl ...))   Does John walk?
-10-11. UsePN john_PN ->
-         UsePron we_Pron                    We walk.
-12-13. UseV walk_V ->
-         ComplV2 love_V2 this_NP            John loves this.
-```
-
-
-==Parts of sentences==
-
-The linguistic phenomena mostly discussed in both traditional grammars and modern
-syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
-to Sentences, lines 5-13. At this level, the major categories are
-``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically
-consists of just an ``NP`` and a ``VP``.
-The internal structure of both ``NP`` and ``VP`` can be very complex,
-and these categories are mutually recursive: not only can a ``VP``
-contain an ``NP``,
-```
-  [VP loves [NP Mary]]
-```
-but also an ``NP`` can contain a ``VP``
-```
-  [NP every man [RS who [VP walks]]]
-```
-(a labelled bracketing like this is of course just a rough approximation of
-a GF syntax tree, but still a useful device of exposition).
-
-Most of the resource modules thus define functions that are used inside
-NPs and VPs. Here is a brief overview:
-
-**Noun**. How to construct NPs. The three main mechanisms
-for constructing NPs are
-- from proper names: "John"
-- from pronouns: "we"
-- from common nouns by determiners: "this man"
-
-
-The ``Noun`` module also defines the construction of common nouns.
-The most frequent ways are
-- lexical noun items: "man"
-- adjectival modification: "old man"
-- relative clause modification: "man who sleeps"
-- application of relational nouns: "successor of the number"
-
-
-**Verb**.
-How to construct VPs. The main mechanism is verbs with their arguments,
-for instance,
-- one-place verbs: "walks"
-- two-place verbs: "loves Mary"
-- three-place verbs: "gives her a kiss"
-- sentence-complement verbs: "says that it is cold"
-- VP-complement verbs: "wants to give her a kiss"
-
-
-A special verb is the copula, "be" in English but not even realized
-by a verb in all languages.
-A copula can take different kinds of complement:
-- an adjectival phrase: "(John is) old"
-- an adverb: "(John is) here"
-- a noun phrase: "(John is) a man"
-
-
-**Adjective**.
-How to construct ``AP``s. The main ways are
-- positive forms of adjectives: "old"
-- comparative forms with object of comparison: "older than John"
-
-
-**Adverb**.
-How to construct ``Adv``s. The main ways are
-- from adjectives: "slowly"
-- as prepositional phrases: "in the car"
-
-
-==Modules and their names==
-
-This section is not necessary for users of the library.
-
-TODO: explain the overloaded API.
-
-The resource modules are named after the kind of
-phrases that are constructed in them,
-and they can be roughly classified by the "level" or "size" of expressions that are
-formed in them:
-- Larger than sentence: ``Text``, ``Phrase``
-- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
-- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
-- Cross-cut (coordination): ``Conjunction``
-
-
-Because of mutual recursion such as in embedded sentences, this classification is
-not a complete order. However, no mutual dependence is needed between the
-modules themselves - they can all be compiled separately. This is due
-to the module ``Cat``, which defines the type system common to the other modules.
-For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
-and the module ``Verb`` only
-needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
-a rule such as
-```
-  Verb.ComplV2 : V2 -> NP -> VP
-```
-it is enough to know the linearization type of ``NP``
-(as well as those of ``V2`` and ``VP``, all
-given in ``Cat``). It is not necessary to know what
-ways there are to build ``NP``s (given in ``Noun``), since all these ways must
-conform to the linearization type defined in ``Cat``. Thus the format of
-category-specific modules is as follows:
-```
-  abstract Adjective = Cat ** {...}
-  abstract Noun = Cat ** {...}
-  abstract Verb = Cat ** {...}
-```
-
-
-==Top-level grammar and lexicon==
-
-The module ``Grammar`` collects all the category-specific modules into
-a complete grammar:
-```
-  abstract Grammar =
-    Adjective, Noun, Verb, ..., Structural, Idiom
-```
-The module ``Structural`` is a lexicon of structural words (function words),
-such as determiners.
-
-The module ``Idiom`` is a collection of idiomatic structures whose
-implementation is very language-dependent. An example is existential
-structures ("there is", "es gibt", "il y a", etc.).
-
-The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
-ca. 350 content words:
-```
-  abstract Lang = Grammar, Lexicon
-```
-Using ``Lang`` instead of ``Grammar`` as a library may give
-for free some words needed in an application. But its main purpose is to
-help test the resource library, rather than to serve as a resource itself.
-It does not even seem realistic to develop
-a general-purpose multilingual resource lexicon.
-
-The diagram in Figure 2 shows the structure of the API.
-
-#BFIG
-
-#GRAMMAR
-
-#BCENTER
-Figure 2. The resource syntax API.
-#ECENTER
-
-#EFIG
-
-==Language-specific syntactic structures==
-
-The API collected in ``Grammar`` has been designed to be implementable for
-all languages in the resource package. It does contain some rules that
-are strange or superfluous in some languages; for instance, the distinction
-between definite and indefinite articles does not apply to Finnish and Russian.
-But such rules are still easy to implement: they only create some superfluous
-ambiguity in the languages in question.
-
-But the library makes no claim that all languages should have exactly the same
-abstract syntax. The common API is therefore extended by language-dependent
-rules. The top level of each language looks as follows (with English as example):
-```
-  abstract English = Grammar, ExtraEngAbs, DictEngAbs
-```
-where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
-and ``DictEngAbs`` is an English dictionary
-(at the moment, it consists of ``IrregEngAbs``,
-the irregular verbs of English). Each of these language-specific grammars has
-the potential to grow into a full-scale grammar of the language. These grammars
-can also be used as libraries, but the possibility of using functors is lost.
-
-To give a better overview of language-specific structures,
-modules like ``ExtraEngAbs``
-are built from a language-independent module ``ExtraAbs``
-by restricted inheritance:
-```
-  abstract ExtraEngAbs = Extra [f,g,...]
-```
-Thus any category and function in ``Extra`` may be shared by a subset of all
-languages. One can see this set-up as a matrix, which tells
-what ``Extra`` structures
-are implemented in what languages. For the common API in ``Grammar``, the matrix
-is filled with 1's (everything is implemented in every language).
-
-Language-specific extensions and the use of restricted
-inheritance are a recent addition to the resource grammar library, and
-have only been exploited on a very small scale so far.
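
Note on the removed text: the remark that constructors cannot be swapped freely "because the resulting tree would not remain well-typed" can be illustrated with a small sketch. The Python below is not GF and not part of the library. The signatures are an assumption, read off the "Value type" column of Figure 1 and the one-for-one swaps listed after it, and the helper names ``type_of`` and ``rename`` are hypothetical.

```python
# Constructor signatures: name -> (argument types, value type).
# Assumption: reconstructed from Figure 1 of the deleted document, not
# copied from the resource library's own source.
SIGNATURES = {
    "TFullStop":  (["Phr", "Text"], "Text"),
    "TQuestMark": (["Phr", "Text"], "Text"),
    "TEmpty":     ([], "Text"),
    "PhrUtt":     (["PConj", "Utt", "Voc"], "Phr"),
    "NoPConj":    ([], "PConj"),
    "but_PConj":  ([], "PConj"),
    "UttS":       (["S"], "Utt"),
    "UseCl":      (["Tense", "Anter", "Pol", "Cl"], "S"),
    "TPres":      ([], "Tense"),
    "TPast":      ([], "Tense"),
    "ASimul":     ([], "Anter"),
    "AAnter":     ([], "Anter"),
    "PPos":       ([], "Pol"),
    "PNeg":       ([], "Pol"),
    "PredVP":     (["NP", "VP"], "Cl"),
    "UsePN":      (["PN"], "NP"),
    "john_PN":    ([], "PN"),
    "mary_PN":    ([], "PN"),
    "UseV":       (["V"], "VP"),
    "walk_V":     ([], "V"),
    "sleep_V":    ([], "V"),
    "NoVoc":      ([], "Voc"),
    "please_Voc": ([], "Voc"),
}


def type_of(tree):
    """Return the value type of a tree; raise TypeError if it is ill-typed.

    A tree is either a constructor name (a string, for atoms) or a tuple
    (constructor name, child tree, ...).
    """
    name, args = (tree, ()) if isinstance(tree, str) else (tree[0], tree[1:])
    arg_types, result = SIGNATURES[name]
    actual = [type_of(a) for a in args]
    if actual != arg_types:
        raise TypeError(f"{name} expects {arg_types}, got {actual}")
    return result


def rename(tree, old, new):
    """Return a copy of tree with constructor `old` renamed to `new`."""
    if isinstance(tree, str):
        return new if tree == old else tree
    head = new if tree[0] == old else tree[0]
    return (head,) + tuple(rename(a, old, new) for a in tree[1:])


# The tree of Figure 1, "John walks."
JOHN_WALKS = (
    "TFullStop",
    ("PhrUtt",
     "NoPConj",
     ("UttS", ("UseCl", "TPres", "ASimul", "PPos",
               ("PredVP", ("UsePN", "john_PN"), ("UseV", "walk_V")))),
     "NoVoc"),
    "TEmpty",
)

if __name__ == "__main__":
    print(type_of(JOHN_WALKS))                                   # Text
    print(type_of(rename(JOHN_WALKS, "john_PN", "mary_PN")))     # Text
    try:
        type_of(rename(JOHN_WALKS, "john_PN", "walk_V"))
    except TypeError as err:
        print("ill-typed:", err)
```

Swapping ``john_PN`` for ``mary_PN`` keeps the tree well-typed because both have value type ``PN``. Putting ``walk_V`` in the same slot fails, since ``UsePN`` expects a ``PN`` and receives a ``V``.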
