| Field | Value | Date |
|---|---|---|
| author | aarne <aarne@cs.chalmers.se> | 2009-02-16 15:12:23 +0000 |
| committer | aarne <aarne@cs.chalmers.se> | 2009-02-16 15:12:23 +0000 |
| commit | 086f861b5e807f049d4fd3159eeb2acdbc542348 | |
| tree | 7a70a85722d9d0eb3e8243f02c5166b560d6db31 | |
| parent | e6fd01066bb50626a10cdc93c344488077e5183f | |
new school web page
| Mode | File | Changes |
|---|---|---|
| -rw-r--r-- | doc/gf-summerschool.html | 378 |
| -rw-r--r-- | doc/gf-summerschool.txt | 314 |
| -rw-r--r-- | doc/school-langs.dot | 100 |
| -rw-r--r-- | doc/school-langs.png | bin 0 -> 137688 bytes |
| -rw-r--r-- | next-lib/src/arabic/ParadigmsAra.gf | 24 |
5 files changed, 584 insertions, 232 deletions
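The patch below rewrites the page's introduction around GF multilingual grammars: an abstract syntax of trees, one concrete syntax per language, and mappings that can be used both to generate strings from trees and to parse strings into trees, so that parsing followed by generation gives translation with the abstract syntax acting as an interlingua. As a rough illustration of that idea only, here is a toy sketch in plain Python, not GF. The tree type `Pred`, the vocabulary, and the functions `lin_eng`, `lin_swe`, `parse`, and `translate` are all hypothetical, and parsing is done by naive generate-and-test over a finite set of trees, which is not how GF's own parser works.

```python
from dataclasses import dataclass
from itertools import product

# --- Abstract syntax (hypothetical toy, plain Python, not GF code) ---
# A tree states that some item has some quality, e.g. Pred("Cheese", "Good").
@dataclass(frozen=True)
class Pred:
    item: str
    quality: str

ITEMS = ["Wine", "Cheese"]
QUALITIES = ["Good", "Expensive"]

def all_trees():
    """Enumerate every tree of the (finite) toy abstract syntax."""
    return [Pred(i, q) for i, q in product(ITEMS, QUALITIES)]

# --- Concrete syntax 1: English ---
ENG_NOUNS = {"Wine": "wine", "Cheese": "cheese"}
ENG_ADJS = {"Good": "good", "Expensive": "expensive"}

def lin_eng(t: Pred) -> str:
    """Map a tree to an English string."""
    return f"the {ENG_NOUNS[t.item]} is {ENG_ADJS[t.quality]}"

# --- Concrete syntax 2: Swedish, where the adjective agrees in gender ---
SWE_NOUNS = {"Wine": ("vinet", "neuter"), "Cheese": ("osten", "common")}
SWE_ADJS = {
    "Good": {"common": "god", "neuter": "gott"},
    "Expensive": {"common": "dyr", "neuter": "dyrt"},
}

def lin_swe(t: Pred) -> str:
    """Map a tree to a Swedish string, choosing the adjective form by gender."""
    noun, gender = SWE_NOUNS[t.item]
    return f"{noun} är {SWE_ADJS[t.quality][gender]}"

# --- Reversibility: naive parsing by generate-and-test ---
def parse(lin, sentence: str):
    """Return all trees whose linearization with `lin` equals the sentence."""
    return [t for t in all_trees() if lin(t) == sentence]

def translate(src_lin, tgt_lin, sentence: str):
    """Parse with one concrete syntax, then linearize with another."""
    return [tgt_lin(t) for t in parse(src_lin, sentence)]

if __name__ == "__main__":
    print(translate(lin_eng, lin_swe, "the cheese is good"))
    print(translate(lin_swe, lin_eng, "vinet är dyrt"))
```

In this toy, the Swedish concrete syntax must pick `god` or `gott` depending on the noun's gender; that kind of low-level agreement is what the page describes the resource grammar library as encapsulating, so that application writers only deal with the trees.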
diff --git a/doc/gf-summerschool.html b/doc/gf-summerschool.html index 6fac73612..8f47f3671 100644 --- a/doc/gf-summerschool.html +++ b/doc/gf-summerschool.html @@ -3,21 +3,56 @@ <HEAD> <META NAME="generator" CONTENT="http://txt2tags.sf.net"> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> -<TITLE>European Resource Grammar Summer School</TITLE> +<TITLE>GF Resource Grammar Summer School</TITLE> </HEAD><BODY BGCOLOR="white" TEXT="black"> -<P ALIGN="center"><CENTER><H1>European Resource Grammar Summer School</H1> +<P ALIGN="center"><CENTER><H1>GF Resource Grammar Summer School</H1> <FONT SIZE="4"> <I>Gothenburg, 17-28 August 2009</I><BR> Aarne Ranta (aarne at chalmers.se) </FONT></CENTER> +<P></P> +<HR NOSHADE SIZE=1> +<P></P> + <UL> + <LI><A HREF="#toc1">Executive summary</A> + <LI><A HREF="#toc2">Introduction</A> + <LI><A HREF="#toc3">The GF resource grammar library</A> + <UL> + <LI><A HREF="#toc4">Applications of the library</A> + <LI><A HREF="#toc5">The structure of the library</A> + </UL> + <LI><A HREF="#toc6">The summer school</A> + <UL> + <LI><A HREF="#toc7">Selecting participants</A> + <LI><A HREF="#toc8">Who is qualified</A> + <LI><A HREF="#toc9">Costs</A> + <LI><A HREF="#toc10">Teachers</A> + <LI><A HREF="#toc11">The Summer School Committee</A> + <LI><A HREF="#toc12">Time and Place</A> + <LI><A HREF="#toc13">Dissemination and intellectual property</A> + </UL> + <LI><A HREF="#toc14">Why I should participate</A> + <LI><A HREF="#toc15">More information</A> + <UL> + <LI><A HREF="#toc16">Contaxt</A> + <LI><A HREF="#toc17">Selected publications from earlier resource grammar projects</A> + </UL> + </UL> + +<P></P> +<HR NOSHADE SIZE=1> +<P></P> <P> -<I>preliminary version, 17 November 2008</I> +<center> +<IMG ALIGN="middle" SRC="school-langs.png" BORDER="0" ALT=""> +</center> </P> <P> -<IMG ALIGN="middle" SRC="eu-langs.png" BORDER="0" ALT=""> +<I>red=wanted, green=exists, yellow=in-progress, solid=official-eu, dotted=non-eu</I> </P> 
-<H3>Executive summary</H3> +<A NAME="toc1"></A> +<H2>Executive summary</H2> <P> We plan to organize a summer school with the goal of implementing the GF resource grammar library for 15 new languages, so that the library will @@ -32,91 +67,76 @@ and also ported to other formats. The library is licensed under LGPL. </P> <P> Each language is implemented by one or two students working together. -Travel grants will be available for students selected on the basis of +Travel grants will be available for some students selected on the basis of pre-conference assignments. </P> <P> -The official announcement will be in January 2009, and the summer school -itself on 17-28 August 2009, at the campus of Chalmers University of -Technology in Gothenburg, Sweden. +The summer school will be held on 17-28 August 2009, at the campus of +Chalmers University of Technology in Gothenburg, Sweden. </P> +<A NAME="toc2"></A> <H2>Introduction</H2> <P> Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this -document. -There is a growing need of translation between -these languages. The traditional language-to-language method requires 23*22 = 506 -translators (humans or computer programs) to cover all possible translation needs. -</P> -<P> -An alternative to language-to-language translation is the use of an <B>interlingua</B>: -a language-independent representation such that all translation problems can -be reduced to translating to and from the interlingua. With 23 languages, -only 2*23 = 46 translators are needed. -</P> -<P> -Interlingua sounds too good to be true. In a sense, it is. All attempts to -create an interlingua that would solve all translation problems have failed. -However, interlinguas for restricted applications have shown more -success. For instance, mathematical texts and weather reports can be translated -by using interlinguas tailor-made for the domains of mathematics and weather reports, -respectively. +document. 
There is a growing need of linguistic resources for these +languages, to help in tasks such as translation and information retrieval. +These resources should be <B>portable</B> and <B>freely accessible</B>. +Languages marked in red in the diagram are of particular interest for +the summer school, since they are those on which the effort will be concentrated. </P> <P> -What is required of an interlingua is +GF (Grammatical Framework, +<A HREF="http://digitalgrammars.com/gf"><CODE>digitalgrammars.com/gf</CODE></A>) +is a <B>functional programming language</B> designed for writing natural +language grammars. It provides an efficient platform for this task, due to +its modern characteristics: </P> <UL> -<LI>semantic accuracy: correspondence to what you want to say in the application -<LI>language-independence: abstraction from individual languages +<LI>It is a functional programming language, similar to Haskell and ML. +<LI>It has a static type system and type checker. +<LI>It has a powerful module system supporting separate compilation + and data abstraction. +<LI>It has an optimizing compiler to <B>Portable Grammar Format</B> (PGF). +<LI>PGF can be further compiled to other formats, such as JavaScript and + speech recognition language models. +<LI>GF has a <B>resource grammar library</B> giving access to the morphology and + basic syntax of 12 languages. </UL> <P> -Thus, for instance, an interlingua for mathematical texts may be based on -mathematical logic, which at the same time gives semantic accuracy and -language independence. In other domains, something else than mathematical -logic may be needed; the <B>ontologies</B> defined within the semantic -web technology are often good starting points for interlinguas. +In addition to "ordinary" grammars for single languages, GF +supports <B>multilingual grammars</B>. A multilingual GF grammar consists of an +<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>. 
+An abstract syntax is system of <B>trees</B>, serving as a semantic +model or an ontology. A concrete syntax is a mapping from abstract syntax +trees to strings of a particular language. </P> -<H2>GF: a framework for multilingual grammars</H2> <P> -The interlingua is just one part of a translation system. We also need -the mappings between the interlingua and the involved languages. As the -number of languages increases, this part grows while the interlingua remains -constant. -</P> -<P> -GF (Grammatical Framework, -<A HREF="http://digitalgrammars.com/gf"><CODE>digitalgrammars.com/gf</CODE></A>) -is a programming language designed to support interlingua-based translation. -A "program" in GF is a <B>multilingual grammar</B>, which consists of an -<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>. A concrete -syntaxes is a mapping from the abstract syntax to a particular language. -These mappings are <B>reversible</B>, which means that they can be used for -translating in both directions. This means that creating an interlingua-based -translator for 23 languages just requires 1 + 23 = 24 grammar modules (the abstract -syntax and the concrete syntaxes). -</P> -<P> -The diagram first in this document shows an interlingua -system covering the 23 EU languages. -Languages marked in -red are of particular interest for the summer school, since they are those -on which the effort will be concentrated. +These mappings defined in concrete syntax are <B>reversible</B>: they +can be used both for <B>generating</B> strings from trees, and for +<B>parsing</B> strings into trees. Combinations of generation and +parsing can be used for <B>translation</B>, where the abstract +syntax works as an <B>interlingua</B>. Thus GF has been used as a +framework for building translation systems in several areas +of application and large sets of languages. 
</P> +<A NAME="toc3"></A> <H2>The GF resource grammar library</H2> <P> -The GF resource grammar library is a set of grammars used as libraries when -building interlingua-based translation systems. The library currently covers +The GF resource grammar library is a set of grammars usable as libraries when +building translation systems and other applications. +The library currently covers the 9 languages coloured in green in the diagram above; in addition, Catalan, Norwegian, and Russian are covered, and there is ongoing work on -Arabic, Hindi/Urdu, and Thai. +Arabic, Hindi/Urdu, Polish, Romanian, and Thai. </P> <P> The purpose of the resource grammar library is to define the "low-level" structure of a language: inflection, word order, agreement. This structure belongs to what linguists call morphology and syntax. It can be very complex and requires -a lot of knowledge. Yet, when translating from one language to another, knowing -morphology and syntax is but a part of what is needed. The translator (whether human +a lot of knowledge. Yet, when translating from one language to +another, knowing morphology and syntax is but a part of what is needed. +The translator (whether human or machine) must understand the meaning of what is translated, and must also know the idiomatic way to express the meaning in the target language. This knowledge can be very domain-dependent and requires in general an expert in the field to @@ -127,13 +147,15 @@ in the field of weather reports, etc. The problem is to find a person who is an expert in both the domain of translation and in the low-level linguistic details. It is the rareness of this combination that has made it difficult to build interlingua-based translation systems. -The GF resource grammar library has the mission of helping in this task. It encapsulates -the low-level linguistics in program modules accessed through easy-to-use interfaces. +The GF resource grammar library has the mission of helping in this task. 
+It encapsulates the low-level linguistics in program modules +accessed through easy-to-use interfaces. Experts on different domains can build translation systems by using the library, without knowing low-level linguistics. The idea is much the same as when a programmer builds a graphical user interface (GUI) from high-level elements such as buttons and menus, without having to care about pixels or geometrical forms. </P> +<A NAME="toc4"></A> <H3>Applications of the library</H3> <P> In addition to translation, the library is also useful in <B>localization</B>, @@ -149,25 +171,29 @@ interlingua-based translation or localization of systems to new languages: <A HREF="http://webalt.math.helsinki.fi/content/index_eng.html"><CODE>http://webalt.math.helsinki.fi/content/index_eng.html</CODE></A>, for translating mathematical exercises to 7 languages <LI>in TALK <A HREF="http://www.talk-project.org"><CODE>http://www.talk-project.org</CODE></A>, - where the library was used for localizing spoken dialogue systems to six languages + where the library was used for localizing spoken dialogue systems + to six languages </UL> <P> The library is also a generic linguistic resource, which can be used for tasks such as language teaching and information retrieval. The liberal license (LGPL) makes it usable for anyone and for any task. GF also has tools supporting the -use of grammars in programs written in other programming languages: C, C++, Haskell, -Java, JavaScript, and Prolog. In connection with the TALK project, support has also been +use of grammars in programs written in other +programming languages: C, C++, Haskell, +Java, JavaScript, and Prolog. In connection with the TALK project, +support has also been developed for translating GF grammars to language models used in speech recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF). 
</P> +<A NAME="toc5"></A> <H3>The structure of the library</H3> <P> The library has the following main parts: </P> <UL> <LI><B>Inflection paradigms</B>, covering the inflection of each language. -<LI><B>Common Syntax API</B>, covering a large set of syntax rule that +<LI><B>Core Syntax</B>, covering a large set of syntax rule that can be implemented for all languages involved. <LI><B>Common Test Lexicon</B>, giving ca. 500 common words that can be used for testing the library. @@ -181,11 +207,13 @@ The library has the following main parts: The goal of the summer school is to implement, for each language, at least the first three components. The latter three are more open-ended in character. </P> +<A NAME="toc6"></A> <H2>The summer school</H2> <P> The goal of the summer school is to extend the GF resource grammar library to covering all 23 EU languages, which means we need 15 new languages. -We also welcome other languages, if there are interested participants. +We also welcome other languages than these 23, +if there are interested participants. </P> <P> The amount of work and skill is between a Master's thesis and a PhD thesis. @@ -201,50 +229,52 @@ will probably require more work. </P> <P> In any case, the proposed allocation of work power is 2 participants per -language. They will have 6 months to work at home, followed -by 2 weeks of summer school. Who are these participants? +language. They will do 2 months' worth of home work, followed +by 2 weeks of summer school, followed by 4 months work at home. +Who are these participants? 
</P> +<A NAME="toc7"></A> <H3>Selecting participants</H3> <P> -After the call has been published, persons interested to participate in -the project are expected to learn GF by self-study from the +Persons interested to participate in the Summer School should sign up in +the <B>Google Group</B> of the course, +</P> +<P> +<A HREF="http://groups.google.com/group/gf-resource-school-2009/"><CODE>groups.google.com/group/gf-resource-school-2009/</CODE></A> +</P> +<P> +The participants are expected to learn GF by self-study from the <A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">tutorial</A>. -This should take a couple of weeks. Also an on-line course will be -arranged to help in getting started with GF. +This should take a couple of weeks. An <B>on-line course</B> will be +arranged in April to help in getting started with GF. </P> <P> -Participants should continue to -implement selected parts of the resource grammar, following the advice from -the -<A HREF="http://digitalgrammars.com/gf/doc/Resource-HOWTO.html">Resource-HOWTO document</A>. -What parts exactly are selected will be announced later. -This work will take another couple of weeks. +After the on-line course, a <B>programming assignment</B> will be published. +This assignment will test skills required in resource grammar programming. +Work on the assignment will take a couple of weeks. </P> <P> Those who are interested in getting a travel grant will submit their sample resource grammar fragment -to the Summer School Committee in the beginning of May. +to the Summer School Committee by 12 May. The Committee then decides who is invited to represent which language in the summer school. </P> <P> -After the Committee decision, the participants have around three months -to work on their languages. The work is completed in the summer school -itself. 
It is also thoroughly tested by using it to add new languages -to applications - in particular, to the WebALT mathematical +The summer school itself is devoted for working on resource grammars. +In addition to grammar writing itself, testing and evaluation is +performed. One way to do this is via adding new languages +to resource grammar applications - in particular, to the WebALT mathematical exercise translator. </P> <P> -Depending on the quality of submitted work, and on the demands of different -languages, the Committee may decide to select another number than 2 participants -for a language. We will also consider accepting participants who want to -pay their own expenses. +The resource grammars are expected to be completed by December 2009. They will +be published at GF website and licensed under LGPL. </P> <P> -To keep track on who is working on which language, we will establish a Wiki page -soon after the call is published. The participants are encouraged -to contact each other and even work in groups. +The participants are encouraged to contact each other and even work in groups. </P> +<A NAME="toc8"></A> <H3>Who is qualified</H3> <P> Writing a resource grammar implementation requires good general programming @@ -265,6 +295,7 @@ But it is the quality of the assignment that is assessed, not any formal requirements. The "typical participant" was described to give an idea of who is likely to succeed in this. </P> +<A NAME="toc9"></A> <H3>Costs</H3> <P> Our aim is to make the summer school free of charge for the participants @@ -273,8 +304,15 @@ we plan to cover their travel and accommodation costs, up to 1000 EUR per person. </P> <P> -We try to get the funding question settled by mid-February 2009. +The number of grants will be decided during Spring 2009, so that grand +holders can be notified before the beginning of June. 
</P> +<P> +Special terms will apply to students in +<A HREF="http://www.gslt.hum.gu.se/">GSLT</A> and +<A HREF="http://ngslt.org/">NGSLT</A>. +</P> +<A NAME="toc10"></A> <H3>Teachers</H3> <P> A list of teachers will be published here later. Some of the local teachers @@ -298,11 +336,13 @@ we can discuss your involvement and travel arrangements. In addition to teachers, we will look for consultants who can help to assess the results for each language. Please contact us! </P> +<A NAME="toc11"></A> <H3>The Summer School Committee</H3> <P> -This committee consists of a number of teachers and consultants, -who will select the participants. It will be selected by February 2009. +This committee consists of a number of teachers and informants, +who will select the participants. It will be selected by April 2009. </P> +<A NAME="toc12"></A> <H3>Time and Place</H3> <P> The summer school will @@ -313,15 +353,16 @@ Sweden, on 17-28 August 2009. Time schedule: </P> <UL> -<LI>February: announcement of summer school and the grammar - writing contest to get participants -<LI>March-April: on-line course, work on the contest assignment (ca 1 month) -<LI>May: submission deadline and notification of acceptance -<LI>June-July: more work on the grammars -<LI>August: summer school -<LI>September-December: more homework if necessary +<LI>February: announcement of summer school +<LI>April: on-line course, work on the contest assignment +<LI>12 May: submission deadline for assignment work +<LI>31 May: review of assignments, notifications of acceptance +<LI>17-28 August: Summer School +<LI>September-December: homework on resource grammars +<LI>December: release of the extended Resource Grammar Library </UL> +<A NAME="toc13"></A> <H3>Dissemination and intellectual property</H3> <P> The new resource grammars will be released under the LGPL just like @@ -331,28 +372,137 @@ with the copyright held by respective authors. <P> The grammars will be distributed via the GF web site. 
</P> -<P> -The WebALT-specific grammars will have special licenses agreed between the -authors and WebALT Inc. -</P> +<A NAME="toc14"></A> <H2>Why I should participate</H2> <P> Seven reasons: </P> <OL> -<LI>participation in a pioneering language technology work in an enthusiastic atmosphere +<LI>participation in a pioneering language technology work in an + enthusiastic atmosphere <LI>work and fun with people from all over Europe and the world <LI>job opportunities and business ideas <LI>credits: the school project will be established as a course at Chalmers worth - 15 ETCS points per person, but extensions to Master's thesis will - also be considered -<LI>merits: the resulting grammar can easily lead to a published paper + 7.5 or 15 ETCS points per person, depending on the work accompliched; also + extensions to Master's thesis will be considered (special credit arrangements + for <A HREF="http://www.gslt.hum.gu.se/">GSLT</A> and <A HREF="http://ngslt.org/">NGSLT</A>) +<LI>merits: the resulting grammar can easily lead to a published paper (see below) <LI>contribution to the multilingual and multicultural development of Europe and the world <LI>free trip and stay in Gothenburg (for travel grant students) </OL> +<A NAME="toc15"></A> +<H2>More information</H2> +<P> +<A HREF="http://groups.google.com/group/gf-resource-school-2009/">Course Google Group</A> +</P> +<P> +<A HREF="http://digitalgrammars.com/gf/">GF web page</A> +</P> +<P> +<A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">GF tutorial</A> +</P> +<P> +<A HREF="http://digitalgrammars.com/gf/doc/Resource-HOWTO.html">Resource-HOWTO document</A> +</P> +<P> +Forthcoming: survey article "The GF Resource Grammar Library" +</P> +<P> +Forthcoming: book about GF +</P> +<A NAME="toc16"></A> +<H3>Contaxt</H3> +<P> +Håkan Burden: burden at chalmers se +</P> +<P> +Aarne Ranta: aarne at chalmers se +</P> +<A NAME="toc17"></A> +<H3>Selected publications from earlier resource grammar projects</H3> +<P> +K. Angelov.
+Type-Theoretical Bulgarian Grammar. +In B. Nordström and A. Ranta (eds), +<I>Advances in Natural Language Processing (GoTAL 2008)</I>, +LNCS/LNAI 5221, Springer, +2008. +</P> +<P> +A. El Dada and A. Ranta. +Implementing an Open Source Arabic Resource Grammar in GF. +In M. Mughazy (ed), +<I>Perspectives on Arabic Linguistics XX. Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26</I> +John Benjamins Publishing Company. +2007. +</P> +<P> +A. El Dada. +Implementation of the Arabic Numerals and their Syntax in GF. +Computational Approaches to Semitic Languages: Common Issues and Resources, + ACL-2007 Workshop, +June 28, 2007, Prague. +2007. +</P> +<P> +H. Hammarström and A. Ranta. +Cardinal Numerals Revisited in GF. +<I>Workshop on Numerals in the World's Languages</I>. +Dept. of Linguistics Max Planck Institute for Evolutionary Anthropology, Leipzig, +2004. +</P> +<P> +M. Humayoun, H. Hammarström, and A. Ranta. +Urdu Morphology, Orthography and Lexicon Extraction. +<I>CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages</I>, +July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University. +2007. +</P> +<P> +J Khegai. +GF parallel resource grammars and Russian. +In proceedings of ACL2006 + (The joint conference of the International Committee on Computational + Linguistics and the Association for Computational Linguistics) (pp. 475-482), + Sydney, Australia, July 2006. +</P> +<P> +J. Khegai. +Language engineering in Grammatical Framework (GF). +Phd thesis, Computer Science, Chalmers University of Technology, +2006. +</P> +<P> +W. Ng'ang'a. +Multilingual content development for eLearning in Africa. +eLearning Africa: 1st Pan-African Conference on ICT for Development, + Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia. +2006. +</P> +<P> +N. Perera and A. Ranta. +Dialogue System Localization with the GF Resource Grammar Library.
+<I>SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing</I>, +June 29, 2007, Prague. +2007. +</P> +<P> +A. Ranta. +Modular Grammar Engineering in GF. +<I>Research on Language and Computation</I>, +5:133-158, 2007. +</P> +<P> +A. Ranta. +How predictable is Finnish morphology? An experiment on lexicon construction. +In J. Nivre, M. Dahllöf and B. Megyesi (eds), +<I>Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein</I>, +University of Uppsala, +2008. +</P> <!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) --> -<!-- cmdline: txt2tags gf-summerschool.txt --> +<!-- cmdline: txt2tags -\-toc gf-summerschool.txt --> </BODY></HTML> diff --git a/doc/gf-summerschool.txt b/doc/gf-summerschool.txt index a083d9e01..8d77b58e7 100644 --- a/doc/gf-summerschool.txt +++ b/doc/gf-summerschool.txt @@ -1,17 +1,22 @@ -European Resource Grammar Summer School +GF Resource Grammar Summer School Gothenburg, 17-28 August 2009 Aarne Ranta (aarne at chalmers.se) %!Encoding : iso-8859-1 %!target:html +%!postproc(html): #BECE <center> +%!postproc(html): #ENCE </center> -//preliminary version, 17 November 2008// +#BECE +[school-langs.png] +#ENCE -[eu-langs.png] +//red=wanted, green=exists, yellow=in-progress, solid=official-eu, dotted=non-eu// -===Executive summary=== + +==Executive summary== We plan to organize a summer school with the goal of implementing the GF resource grammar library for 15 new languages, so that the library will @@ -24,89 +29,71 @@ and basic syntax of each language. It can be used in GF applications and also ported to other formats. The library is licensed under LGPL. Each language is implemented by one or two students working together. -Travel grants will be available for students selected on the basis of +Travel grants will be available for some students selected on the basis of pre-conference assignments.
-The official announcement will be in January 2009, and the summer school -itself on 17-28 August 2009, at the campus of Chalmers University of -Technology in Gothenburg, Sweden. +The summer school will be held on 17-28 August 2009, at the campus of +Chalmers University of Technology in Gothenburg, Sweden. ==Introduction== Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this -document. -%[``http://ec.europa.eu/education/policies/lang/languages/index_en.html`` -%http://ec.europa.eu/education/policies/lang/languages/index_en.html]. -There is a growing need of translation between -these languages. The traditional language-to-language method requires 23*22 = 506 -translators (humans or computer programs) to cover all possible translation needs. - -An alternative to language-to-language translation is the use of an **interlingua**: -a language-independent representation such that all translation problems can -be reduced to translating to and from the interlingua. With 23 languages, -only 2*23 = 46 translators are needed. - -Interlingua sounds too good to be true. In a sense, it is. All attempts to -create an interlingua that would solve all translation problems have failed. -However, interlinguas for restricted applications have shown more -success. For instance, mathematical texts and weather reports can be translated -by using interlinguas tailor-made for the domains of mathematics and weather reports, -respectively. - -What is required of an interlingua is -- semantic accuracy: correspondence to what you want to say in the application -- language-independence: abstraction from individual languages - - -Thus, for instance, an interlingua for mathematical texts may be based on -mathematical logic, which at the same time gives semantic accuracy and -language independence. 
In other domains, something else than mathematical -logic may be needed; the **ontologies** defined within the semantic -web technology are often good starting points for interlinguas. - - -==GF: a framework for multilingual grammars== - -The interlingua is just one part of a translation system. We also need -the mappings between the interlingua and the involved languages. As the -number of languages increases, this part grows while the interlingua remains -constant. +document. There is a growing need of linguistic resources for these +languages, to help in tasks such as translation and information retrieval. +These resources should be **portable** and **freely accessible**. +Languages marked in red in the diagram are of particular interest for +the summer school, since they are those on which the effort will be concentrated. GF (Grammatical Framework, [``digitalgrammars.com/gf`` http://digitalgrammars.com/gf]) -is a programming language designed to support interlingua-based translation. -A "program" in GF is a **multilingual grammar**, which consists of an -**abstract syntax** and a set of **concrete syntaxes**. A concrete -syntaxes is a mapping from the abstract syntax to a particular language. -These mappings are **reversible**, which means that they can be used for -translating in both directions. This means that creating an interlingua-based -translator for 23 languages just requires 1 + 23 = 24 grammar modules (the abstract -syntax and the concrete syntaxes). - -The diagram first in this document shows an interlingua -system covering the 23 EU languages. -Languages marked in -red are of particular interest for the summer school, since they are those -on which the effort will be concentrated. - +is a **functional programming language** designed for writing natural +language grammars. It provides an efficient platform for this task, due to +its modern characteristics: +- It is a functional programming language, similar to Haskell and ML. 
+- It has a static type system and type checker. +- It has a powerful module system supporting separate compilation + and data abstraction. +- It has an optimizing compiler to **Portable Grammar Format** (PGF). +- PGF can be further compiled to other formats, such as JavaScript and + speech recognition language models. +- GF has a **resource grammar library** giving access to the morphology and + basic syntax of 12 languages. + + +In addition to "ordinary" grammars for single languages, GF +supports **multilingual grammars**. A multilingual GF grammar consists of an +**abstract syntax** and a set of **concrete syntaxes**. +An abstract syntax is system of **trees**, serving as a semantic +model or an ontology. A concrete syntax is a mapping from abstract syntax +trees to strings of a particular language. + +These mappings defined in concrete syntax are **reversible**: they +can be used both for **generating** strings from trees, and for +**parsing** strings into trees. Combinations of generation and +parsing can be used for **translation**, where the abstract +syntax works as an **interlingua**. Thus GF has been used as a +framework for building translation systems in several areas +of application and large sets of languages. ==The GF resource grammar library== -The GF resource grammar library is a set of grammars used as libraries when -building interlingua-based translation systems. The library currently covers +The GF resource grammar library is a set of grammars usable as libraries when +building translation systems and other applications. +The library currently covers the 9 languages coloured in green in the diagram above; in addition, Catalan, Norwegian, and Russian are covered, and there is ongoing work on -Arabic, Hindi/Urdu, and Thai. +Arabic, Hindi/Urdu, Polish, Romanian, and Thai. The purpose of the resource grammar library is to define the "low-level" structure of a language: inflection, word order, agreement. 
This structure belongs to what linguists call morphology and syntax. It can be very complex and requires -a lot of knowledge. Yet, when translating from one language to another, knowing -morphology and syntax is but a part of what is needed. The translator (whether human +a lot of knowledge. Yet, when translating from one language to +another, knowing morphology and syntax is but a part of what is needed. +The translator (whether human or machine) must understand the meaning of what is translated, and must also know the idiomatic way to express the meaning in the target language. This knowledge can be very domain-dependent and requires in general an expert in the field to @@ -116,8 +103,9 @@ in the field of weather reports, etc. The problem is to find a person who is an expert in both the domain of translation and in the low-level linguistic details. It is the rareness of this combination that has made it difficult to build interlingua-based translation systems. -The GF resource grammar library has the mission of helping in this task. It encapsulates -the low-level linguistics in program modules accessed through easy-to-use interfaces. +The GF resource grammar library has the mission of helping in this task. +It encapsulates the low-level linguistics in program modules +accessed through easy-to-use interfaces. Experts on different domains can build translation systems by using the library, without knowing low-level linguistics. 
The idea is much the same as when a programmer builds a graphical user interface (GUI) from high-level elements such as @@ -138,14 +126,17 @@ interlingua-based translation or localization of systems to new languages: [``http://webalt.math.helsinki.fi/content/index_eng.html`` http://webalt.math.helsinki.fi/content/index_eng.html], for translating mathematical exercises to 7 languages - in TALK [``http://www.talk-project.org`` http://www.talk-project.org], - where the library was used for localizing spoken dialogue systems to six languages + where the library was used for localizing spoken dialogue systems + to six languages The library is also a generic linguistic resource, which can be used for tasks such as language teaching and information retrieval. The liberal license (LGPL) makes it usable for anyone and for any task. GF also has tools supporting the -use of grammars in programs written in other programming languages: C, C++, Haskell, -Java, JavaScript, and Prolog. In connection with the TALK project, support has also been +use of grammars in programs written in other +programming languages: C, C++, Haskell, +Java, JavaScript, and Prolog. In connection with the TALK project, +support has also been developed for translating GF grammars to language models used in speech recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF). @@ -155,7 +146,7 @@ recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF). The library has the following main parts: - **Inflection paradigms**, covering the inflection of each language. -- **Common Syntax API**, covering a large set of syntax rule that +- **Core Syntax**, covering a large set of syntax rules that can be implemented for all languages involved. - **Common Test Lexicon**, giving ca. 500 common words that can be used for testing the library. @@ -173,7 +164,8 @@ the first three components. The latter three are more open-ended in character.
The goal of the summer school is to extend the GF resource grammar library to covering all 23 EU languages, which means we need 15 new languages. -We also welcome other languages, if there are interested participants. +We also welcome languages other than these 23, +if there are interested participants. The amount of work and skill is between a Master's thesis and a PhD thesis. The Russian implementation was made by Janna Khegai as a part of her @@ -187,45 +179,43 @@ Latvian and Lithuanian are the first languages of the Baltic family and will probably require more work. In any case, the proposed allocation of work power is 2 participants per -language. They will have 6 months to work at home, followed -by 2 weeks of summer school. Who are these participants? +language. They will do 2 months' worth of homework, followed +by 2 weeks of summer school, followed by 4 months of work at home. +Who are these participants? ===Selecting participants=== -After the call has been published, persons interested to participate in -the project are expected to learn GF by self-study from the +Persons interested in participating in the Summer School should sign up in +the **Google Group** of the course, + +[``groups.google.com/group/gf-resource-school-2009/`` http://groups.google.com/group/gf-resource-school-2009/] + +The participants are expected to learn GF by self-study from the [tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html]. -This should take a couple of weeks. Also an on-line course will be -arranged to help in getting started with GF. +This should take a couple of weeks. An **on-line course** will be +arranged in April to help in getting started with GF. -Participants should continue to -implement selected parts of the resource grammar, following the advice from -the -[Resource-HOWTO document http://digitalgrammars.com/gf/doc/Resource-HOWTO.html]. -What parts exactly are selected will be announced later. -This work will take another couple of weeks.
+After the on-line course, a **programming assignment** will be published. +This assignment will test skills required in resource grammar programming. +Work on the assignment will take a couple of weeks. Those who are interested in getting a travel grant will submit their sample resource grammar fragment -to the Summer School Committee in the beginning of May. +to the Summer School Committee by 12 May. The Committee then decides who is invited to represent which language in the summer school. -After the Committee decision, the participants have around three months -to work on their languages. The work is completed in the summer school -itself. It is also thoroughly tested by using it to add new languages -to applications - in particular, to the WebALT mathematical +The summer school itself is devoted to working on resource grammars. +In addition to grammar writing itself, testing and evaluation are +performed. One way to do this is by adding new languages +to resource grammar applications - in particular, to the WebALT mathematical exercise translator. -Depending on the quality of submitted work, and on the demands of different -languages, the Committee may decide to select another number than 2 participants -for a language. We will also consider accepting participants who want to -pay their own expenses. +The resource grammars are expected to be completed by December 2009. They will +be published on the GF website and licensed under LGPL. -To keep track on who is working on which language, we will establish a Wiki page -soon after the call is published. The participants are encouraged -to contact each other and even work in groups. +The participants are encouraged to contact each other and even work in groups. @@ -254,7 +244,14 @@ who are selected on the basis of their assignments. And not only that: we plan to cover their travel and accommodation costs, up to 1000 EUR per person. -We try to get the funding question settled by mid-February 2009.
+The number of grants will be decided during Spring 2009, so that grant +holders can be notified before the beginning of June. + +Special terms will apply to students in +[GSLT http://www.gslt.hum.gu.se/] and +[NGSLT http://ngslt.org/]. + + @@ -281,8 +278,8 @@ the results for each language. Please contact us! ===The Summer School Committee=== -This committee consists of a number of teachers and consultants, -who will select the participants. It will be selected by February 2009. +This committee consists of a number of teachers and informants, +who will select the participants. It will be selected by April 2009. ===Time and Place=== @@ -292,13 +289,13 @@ be organized at the campus of Chalmers University of Technology in Gothenburg, Sweden, on 17-28 August 2009. Time schedule: -- February: announcement of summer school and the grammar - writing contest to get participants -- March-April: on-line course, work on the contest assignment (ca 1 month) -- May: submission deadline and notification of acceptance -- June-July: more work on the grammars -- August: summer school -- September-December: more homework if necessary +- February: announcement of summer school +- April: on-line course, work on the contest assignment +- 12 May: submission deadline for assignment work +- 31 May: review of assignments, notifications of acceptance +- 17-28 August: Summer School +- September-December: homework on resource grammars +- December: release of the extended Resource Grammar Library ===Dissemination and intellectual property=== @@ -309,22 +306,115 @@ with the copyright held by respective authors. The grammars will be distributed via the GF web site. -The WebALT-specific grammars will have special licenses agreed between the -authors and WebALT Inc.
==Why I should participate== Seven reasons: -+ participation in a pioneering language technology work in an enthusiastic atmosphere ++ participation in pioneering language technology work in an + enthusiastic atmosphere + work and fun with people from all over Europe and the world + job opportunities and business ideas + credits: the school project will be established as a course at Chalmers worth - 15 ETCS points per person, but extensions to Master's thesis will - also be considered -+ merits: the resulting grammar can easily lead to a published paper + 7.5 or 15 ECTS points per person, depending on the work accomplished; also + extensions to Master's thesis will be considered (special credit arrangements + for [GSLT http://www.gslt.hum.gu.se/] and [NGSLT http://ngslt.org/]) ++ merits: the resulting grammar can easily lead to a published paper (see below) + contribution to the multilingual and multicultural development of Europe and the world + free trip and stay in Gothenburg (for travel grant students) +==More information== + +[Course Google Group http://groups.google.com/group/gf-resource-school-2009/] + +[GF web page http://digitalgrammars.com/gf/] + +[GF tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html] + +[Resource-HOWTO document http://digitalgrammars.com/gf/doc/Resource-HOWTO.html] + +Forthcoming: survey article "The GF Resource Grammar Library" + +Forthcoming: book about GF + +===Contact=== + +Håkan Burden: burden at chalmers se + +Aarne Ranta: aarne at chalmers se + + + +===Selected publications from earlier resource grammar projects=== + +K. Angelov. +Type-Theoretical Bulgarian Grammar. +In B. Nordström and A. Ranta (eds), +//Advances in Natural Language Processing (GoTAL 2008)//, +LNCS/LNAI 5221, Springer, +2008. + +A. El Dada and A. Ranta. +Implementing an Open Source Arabic Resource Grammar in GF. +In M. Mughazy (ed), +//Perspectives on Arabic Linguistics XX.
Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26//, +John Benjamins Publishing Company, +2007. + +A. El Dada. +Implementation of the Arabic Numerals and their Syntax in GF. +Computational Approaches to Semitic Languages: Common Issues and Resources, + ACL-2007 Workshop, +June 28, 2007, Prague, +2007. + +H. Hammarström and A. Ranta. +Cardinal Numerals Revisited in GF. +//Workshop on Numerals in the World's Languages//. +Dept. of Linguistics, Max Planck Institute for Evolutionary Anthropology, Leipzig, +2004. + +M. Humayoun, H. Hammarström, and A. Ranta. +Urdu Morphology, Orthography and Lexicon Extraction. +//CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages//, +July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University, +2007. + +J. Khegai. +GF parallel resource grammars and Russian. +In Proceedings of ACL 2006 + (The joint conference of the International Committee on Computational + Linguistics and the Association for Computational Linguistics) (pp. 475-482), + Sydney, Australia, July 2006. + +J. Khegai. +Language engineering in Grammatical Framework (GF). +PhD thesis, Computer Science, Chalmers University of Technology, +2006. + +W. Ng'ang'a. +Multilingual content development for eLearning in Africa. +eLearning Africa: 1st Pan-African Conference on ICT for Development, + Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia, +2006. + +N. Perera and A. Ranta. +Dialogue System Localization with the GF Resource Grammar Library. +//SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing//, +June 29, 2007, Prague, +2007. + +A. Ranta. +Modular Grammar Engineering in GF. +//Research on Language and Computation//, +5:133-158, 2007. + +A. Ranta. +How predictable is Finnish morphology? An experiment on lexicon construction. +In J. Nivre, M. Dahllöf and B.
Megyesi (eds), +//Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein//, +University of Uppsala, +2008. + diff --git a/doc/school-langs.dot b/doc/school-langs.dot new file mode 100644 index 000000000..f35284951 --- /dev/null +++ b/doc/school-langs.dot @@ -0,0 +1,100 @@ +graph{ + +size = "8,8" ; + +overlap = scale ; + +"Abs" [label = "Abstract Syntax", style = "solid", shape = "rectangle"] ; + +"1" [label = "Bulgarian", style = "solid", shape = "ellipse", color = "green"] ; +"1" -- "Abs" [style = "solid"]; + +"2" [label = "Czech", style = "solid", shape = "ellipse", color = "red"] ; +"2" -- "Abs" [style = "solid"]; + +"3" [label = "Danish", style = "solid", shape = "ellipse", color = "green"] ; +"3" -- "Abs" [style = "solid"]; + +"4" [label = "German", style = "solid", shape = "ellipse", color = "green"] ; +"4" -- "Abs" [style = "solid"]; + +"5" [label = "Estonian", style = "solid", shape = "ellipse", color = "red"] ; +"5" -- "Abs" [style = "solid"]; + +"6" [label = "Greek", style = "solid", shape = "ellipse", color = "red"] ; +"6" -- "Abs" [style = "solid"]; + +"7" [label = "English", style = "solid", shape = "ellipse", color = "green"] ; +"7" -- "Abs" [style = "solid"]; + +"8" [label = "Spanish", style = "solid", shape = "ellipse", color = "green"] ; +"8" -- "Abs" [style = "solid"]; + +"9" [label = "French", style = "solid", shape = "ellipse", color = "green"] ; +"9" -- "Abs" [style = "solid"]; + +"10" [label = "Italian", style = "solid", shape = "ellipse", color = "green"] ; +"10" -- "Abs" [style = "solid"]; + +"11" [label = "Latvian", style = "solid", shape = "ellipse", color = "red"] ; +"11" -- "Abs" [style = "solid"]; + +"12" [label = "Lithuanian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "12" [style = "solid"]; + +"13" [label = "Irish", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "13" [style = "solid"]; + +"14" [label = "Hungarian", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" --
"14" [style = "solid"]; + +"15" [label = "Maltese", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "15" [style = "solid"]; + +"16" [label = "Dutch", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "16" [style = "solid"]; + +"17" [label = "Polish", style = "solid", shape = "ellipse", color = "yellow"] ; +"Abs" -- "17" [style = "solid"]; + +"18" [label = "Portuguese", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "18" [style = "solid"]; + +"19" [label = "Slovak", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "19" [style = "solid"]; + +"20" [label = "Slovene", style = "solid", shape = "ellipse", color = "red"] ; +"Abs" -- "20" [style = "solid"]; + +"21" [label = "Romanian", style = "solid", shape = "ellipse", color = "yellow"] ; +"Abs" -- "21" [style = "solid"]; + +"22" [label = "Finnish", style = "solid", shape = "ellipse", color = "green"] ; +"Abs" -- "22" [style = "solid"]; + +"23" [label = "Swedish", style = "solid", shape = "ellipse", color = "green"] ; +"Abs" -- "23" [style = "solid"]; + +"24" [label = "Catalan", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "24" [style = "solid"]; + +"25" [label = "Norwegian", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "25" [style = "solid"]; + +"26" [label = "Russian", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "26" [style = "solid"]; + +"27" [label = "Interlingua", style = "dotted", shape = "ellipse", color = "green"] ; +"Abs" -- "27" [style = "solid"]; + +"28" [label = "Latin", style = "dotted", shape = "ellipse", color = "yellow"] ; +"Abs" -- "28" [style = "solid"]; +"29" [label = "Turkish", style = "dotted", shape = "ellipse", color = "yellow"] ; +"Abs" -- "29" [style = "solid"]; +"30" [label = "Hindi", style = "dotted", shape = "ellipse", color = "yellow"] ; +"Abs" -- "30" [style = "solid"]; +"31" [label = "Thai", style = "dotted", shape = "ellipse", color = "yellow"] ; +"Abs" -- "31" [style = 
"solid"]; + + +} diff --git a/doc/school-langs.png b/doc/school-langs.png Binary files differnew file mode 100644 index 000000000..03373d7b5 --- /dev/null +++ b/doc/school-langs.png diff --git a/next-lib/src/arabic/ParadigmsAra.gf b/next-lib/src/arabic/ParadigmsAra.gf index bc9d498a5..752039bd5 100644 --- a/next-lib/src/arabic/ParadigmsAra.gf +++ b/next-lib/src/arabic/ParadigmsAra.gf @@ -187,17 +187,29 @@ resource ParadigmsAra = open -- The definitions should not bother the user of the API. So they are -- hidden from the document. -{- --- AED's original definition of regV - regV = \word -> - case word of { +----AR AED's original definition of regV + regV_orig : Str -> V = \wo -> + case wo of { "يَ" + f@_ + c@_ + "ُ" + l@_ => v1 (f+c+l) a u ; "يَ" + f@_ + c@_ + "ِ" + l@_ => v1 (f+c+l) a i ; "يَ" + f@_ + c@_ + "َ" + l@_ => v1 (f+c+l) a a ; - f@_ + "َ" + c@_ + "ِ" + l@_ => v1 (f+c+l) i a + f@_ + "َ" + c@_ + "ِ" + l@_ => v1 (f+c+l) i a ; + _ => Predef.error "regV not applicable" }; --} + + + regV_o : Str -> Str = \word -> + case word of { + "يَ" + f@_ + c@_ + "ُ" + l@_ => "a" ; + "يَ" + f@_ + c@_ + "ِ" + l@_ => "b" ; + "يَ" + f@_ + c@_ + "َ" + l@_ => "c" ; + f@_ + "َ" + c@_ + "ِ" + l@_ => "d" ; + _ => "q" + }; + aa = a ; uu = u ; ii = i ; + ----AR for debug end + ---- begin workaround for a problem with pattern matching, AR 27/6/2008 |

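The page describes a multilingual GF grammar as one abstract syntax (a system of trees) plus several concrete syntaxes that map trees to strings, where the mappings are reversible and translation parses in one language and generates in another. The toy model below illustrates that idea. It is plain Python, not GF and not any GF or PGF API: the tree type `Is`, the constants `Wine`, `Pizza`, `Good`, `Expensive` and the functions `linearize`, `parse` and `translate` are hypothetical names chosen for illustration. Parsing is done by enumerating every tree and keeping those whose linearization matches the input. This works only because the toy abstract syntax is finite; a real GF system uses a proper parsing algorithm.

```python
from dataclasses import dataclass
from itertools import product


# Abstract syntax: the language-independent trees (the "interlingua").
# A tree here says that an item has a quality, e.g. Is("Wine", "Good").
@dataclass(frozen=True)
class Is:
    item: str
    quality: str


ITEMS = ["Wine", "Pizza"]
QUALITIES = ["Good", "Expensive"]


# Concrete syntaxes: one linearization function per language.
def lin_eng(t: Is) -> str:
    nouns = {"Wine": "wine", "Pizza": "pizza"}
    adjs = {"Good": "good", "Expensive": "expensive"}
    return f"this {nouns[t.item]} is {adjs[t.quality]}"


def lin_ita(t: Is) -> str:
    # Italian needs gender agreement between determiner, noun and adjective,
    # the kind of low-level detail a resource grammar is meant to hide.
    nouns = {"Wine": ("vino", "m"), "Pizza": ("pizza", "f")}
    adjs = {"Good": {"m": "buono", "f": "buona"},
            "Expensive": {"m": "caro", "f": "cara"}}
    noun, gender = nouns[t.item]
    det = "questo" if gender == "m" else "questa"
    return f"{det} {noun} è {adjs[t.quality][gender]}"


CONCRETE = {"Eng": lin_eng, "Ita": lin_ita}


def all_trees() -> list[Is]:
    # Enumerating every tree is only feasible because this toy grammar is finite.
    return [Is(i, q) for i, q in product(ITEMS, QUALITIES)]


def linearize(tree: Is, lang: str) -> str:
    """Generation: tree -> string in the given language."""
    return CONCRETE[lang](tree)


def parse(s: str, lang: str) -> list[Is]:
    """Parsing: string -> all trees that linearize to it (generate and test)."""
    return [t for t in all_trees() if linearize(t, lang) == s]


def translate(s: str, src: str, tgt: str) -> list[str]:
    """Translation: parse in the source language, linearize in the target."""
    return [linearize(t, tgt) for t in parse(s, src)]


if __name__ == "__main__":
    print(translate("this pizza is good", "Eng", "Ita"))
```

Here `translate("this pizza is good", "Eng", "Ita")` returns `["questa pizza è buona"]`. The abstract tree `Is("Pizza", "Good")` is the same for both languages; only the Italian concrete syntax has to know about gender agreement (`questo vino è buono` vs. `questa pizza è buona`), which is the sort of detail the resource grammar library is meant to encapsulate for application writers.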