| author | aarne <aarne@chalmers.se> | 2009-12-09 09:37:47 +0000 |
|---|---|---|
| committer | aarne <aarne@chalmers.se> | 2009-12-09 09:37:47 +0000 |
| commit | 101df06f6c8380328d4266adadac3ab6d1bac0b3 (patch) | |
| tree | c3303f42914904293a35ea57523e33e71a39ac18 /doc | |
| parent | b0f3796360820d228f4763b77f2f033bc7a9b726 (diff) | |
manual web page edits from cs.chalmers
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/gf-ideas.html | 4 |
| -rw-r--r-- | doc/gf-summerschool.html | 634 |
2 files changed, 2 insertions, 636 deletions
diff --git a/doc/gf-ideas.html b/doc/gf-ideas.html index 908ac4179..90894f599 100644 --- a/doc/gf-ideas.html +++ b/doc/gf-ideas.html @@ -274,9 +274,9 @@ This project is rather open: find some cool applications of the technology that are useful or entertaining on the web. Examples include </P> <UL> -<LI>translators: see <A HREF="http://129.16.250.57:41296/translate">demo</A> +<LI>translators: see <A HREF="http://tournesol.cs.chalmers.se:41296/translate">demo</A> <LI>multilingual wikis: see <A HREF="http://csmisc14.cs.chalmers.se/~meza/restWiki/wiki.cgi">demo</A> -<LI>fridge magnets: see <A HREF="http://129.16.250.57:41296/fridge">demo</A> +<LI>fridge magnets: see <A HREF="http://tournesol.cs.chalmers.se:41296/fridge">demo</A> </UL> <P> diff --git a/doc/gf-summerschool.html b/doc/gf-summerschool.html deleted file mode 100644 index 977a16735..000000000 --- a/doc/gf-summerschool.html +++ /dev/null @@ -1,634 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> -<HTML> -<HEAD> -<META NAME="generator" CONTENT="http://txt2tags.sf.net"> -<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> -<TITLE>GF Resource Grammar Summer School</TITLE> -</HEAD><BODY BGCOLOR="white" TEXT="black"> -<P ALIGN="center"><CENTER><H1>GF Resource Grammar Summer School</H1> -<FONT SIZE="4"> -<I>Gothenburg, 17-28 August 2009</I><BR> -Aarne Ranta (aarne at chalmers.se) -</FONT></CENTER> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> - <UL> - <LI><A HREF="#toc1">News</A> - <LI><A HREF="#toc2">Executive summary</A> - <LI><A HREF="#toc3">Introduction</A> - <LI><A HREF="#toc4">The GF resource grammar library</A> - <UL> - <LI><A HREF="#toc5">Missing EU languages, by the family</A> - <LI><A HREF="#toc6">Applications of the library</A> - <LI><A HREF="#toc7">The structure of the library</A> - </UL> - <LI><A HREF="#toc8">The summer school</A> - <UL> - <LI><A HREF="#toc9">Selecting participants</A> - <LI><A HREF="#toc10">Who is qualified</A> - <LI><A HREF="#toc11">Costs</A> - 
<LI><A HREF="#toc12">Teachers</A> - <LI><A HREF="#toc13">The Summer School Committee</A> - <LI><A HREF="#toc14">Time and Place</A> - <LI><A HREF="#toc15">Dissemination and intellectual property</A> - </UL> - <LI><A HREF="#toc16">Why I should participate</A> - <LI><A HREF="#toc17">More information</A> - <UL> - <LI><A HREF="#toc18">Contact</A> - <LI><A HREF="#toc19">Selected publications from earlier resource grammar projects</A> - </UL> - </UL> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> -<P> -<center> -<IMG ALIGN="middle" SRC="school-langs.png" BORDER="0" ALT=""> -</center> -</P> -<P> -<I>red=wanted, green=exists, orange=in-progress, solid=official-eu, dotted=non-eu</I> -</P> -<A NAME="toc1"></A> -<H2>News</H2> -<P> -An on-line course <I>GF for Resource Grammar Writers</I> will start on -Monday 20 April at 15.30 CEST. The slides and recordings of the five -45-minute lectures will be made available via this web page. If requested, -the course may be repeated in the beginning of the summer school. -</P> -<A NAME="toc2"></A> -<H2>Executive summary</H2> -<P> -GF Resource Grammar Library is an open-source computational grammar resource -that currently covers 12 languages. -The Summer School is a part of a collaborative effort to extend the library -to all of the 23 official EU languages. Also other languages -chosen by the participants are welcome. -</P> -<P> -The missing EU languages are: -Czech, Dutch, Estonian, Greek, Hungarian, Irish, Latvian, Lithuanian, -Maltese, Portuguese, Slovak, and Slovenian. There is also more work to -be done on Polish and Romanian. -</P> -<P> -The linguistic coverage of the library includes the inflectional morphology -and basic syntax of each language. It can be used in GF applications -and also ported to other formats. It can also be used for building other -linguistic resources, such as morphological lexica and parsers. -The library is licensed under LGPL. 
-</P> -<P> -In the summer school, each language will be implemented by one or two students -working together. A morphology implementation will be credited -as a Chalmers course worth 7.5 ETCS points; adding a syntax implementation -will be worth more. The estimated total work load is 1-2 months for the -morphology, and 3-6 months for the whole grammar. -</P> -<P> -Participation in the course is free. Registration is done via the courses's -Google group, <A HREF="http://groups.google.com/group/gf-resource-school-2009/"><CODE>groups.google.com/group/gf-resource-school-2009/</CODE></A>. The registration deadline is 15 June 2009. -</P> -<P> -Some travel grants will be available. They are distributed on the basis of a -GF programming contest in April and May. -</P> -<P> -The summer school will be held on 17-28 August 2009, at the campus of -Chalmers University of Technology in Gothenburg, Sweden. -</P> -<P> -<IMG ALIGN="middle" SRC="align6.png" BORDER="0" ALT=""> -</P> -<P> -<I>Word alignment produced by GF from the resource grammar in Bulgarian, English, Italian, German, Finnish, French, and Swedish.</I> -</P> -<A NAME="toc3"></A> -<H2>Introduction</H2> -<P> -Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this -document. There is a growing need of linguistic resources for these -languages, to help in tasks such as translation and information retrieval. -These resources should be <B>portable</B> and <B>freely accessible</B>. -Languages marked in red in the diagram are of particular interest for -the summer school, since they are those on which the effort will be concentrated. -</P> -<P> -GF (Grammatical Framework, -<A HREF="http://digitalgrammars.com/gf"><CODE>digitalgrammars.com/gf</CODE></A>) -is a <B>functional programming language</B> designed for writing natural -language grammars. 
It provides an efficient platform for this task, due to -its modern characteristics: -</P> -<UL> -<LI>It is a functional programming language, similar to Haskell and ML. -<LI>It has a static type system and type checker. -<LI>It has a powerful module system supporting separate compilation - and data abstraction. -<LI>It has an optimizing compiler to <B>Portable Grammar Format</B> (PGF). -<LI>PGF can be further compiled to other formats, such as JavaScript and - speech recognition language models. -<LI>GF has a <B>resource grammar library</B> giving access to the morphology and - basic syntax of 12 languages. -</UL> - -<P> -In addition to "ordinary" grammars for single languages, GF -supports <B>multilingual grammars</B>. A multilingual GF grammar consists of an -<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>. -An abstract syntax is system of <B>trees</B>, serving as a semantic -model or an ontology. A concrete syntax is a mapping from abstract syntax -trees to strings of a particular language. -</P> -<P> -These mappings defined in concrete syntax are <B>reversible</B>: they -can be used both for <B>generating</B> strings from trees, and for -<B>parsing</B> strings into trees. Combinations of generation and -parsing can be used for <B>translation</B>, where the abstract -syntax works as an <B>interlingua</B>. Thus GF has been used as a -framework for building translation systems in several areas -of application and large sets of languages. -</P> -<A NAME="toc4"></A> -<H2>The GF resource grammar library</H2> -<P> -The GF resource grammar library is a set of grammars usable as libraries when -building translation systems and other applications. -The library currently covers -the 9 languages coloured in green in the diagram above; in addition, -Catalan, Norwegian, and Russian are covered, and there is ongoing work on -Arabic, Hindi/Urdu, Polish, Romanian, and Thai. 
-</P> -<P> -The purpose of the resource grammar library is to define the "low-level" structure -of a language: inflection, word order, agreement. This structure belongs to what -linguists call morphology and syntax. It can be very complex and requires -a lot of knowledge. Yet, when translating from one language to -another, knowing morphology and syntax is but a part of what is needed. -The translator (whether human -or machine) must understand the meaning of what is translated, and must also know -the idiomatic way to express the meaning in the target language. This knowledge -can be very domain-dependent and requires in general an expert in the field to -reach high quality: a mathematician in the field of mathematics, a meteorologist -in the field of weather reports, etc. -</P> -<P> -The problem is to find a person who is an expert in both the domain of translation -and in the low-level linguistic details. It is the rareness of this combination -that has made it difficult to build interlingua-based translation systems. -The GF resource grammar library has the mission of helping in this task. -It encapsulates the low-level linguistics in program modules -accessed through easy-to-use interfaces. -Experts on different domains can build translation systems by using the library, -without knowing low-level linguistics. The idea is much the same as when a -programmer builds a graphical user interface (GUI) from high-level elements such as -buttons and menus, without having to care about pixels or geometrical forms. -</P> -<A NAME="toc5"></A> -<H3>Missing EU languages, by the family</H3> -<P> -Writing a grammar for a language is usually easier if other languages -from the same family already have grammars. The colours have the same -meaning as in the diagram above. 
-</P> -<P> -Baltic: -<font color="red"> Latvian </font> -<font color="red"> Lithuanian </font> -</P> -<P> -Celtic: -<font color="red"> Irish </font> -</P> -<P> -Fenno-Ugric: -<font color="red"> Estonian </font> -<font color="green" size="-1"> Finnish </font> -<font color="red"> Hungarian </font> -</P> -<P> -Germanic: -<font color="green" size="-1"> Danish </font> -<font color="red"> Dutch </font> -<font color="green" size="-1"> English </font> -<font color="green" size="-1"> German </font> -<font color="green" size="-1"> Swedish </font> -</P> -<P> -Hellenic: -<font color="red"> Greek </font> -</P> -<P> -Romance: -<font color="green" size="-1"> French </font> -<font color="green" size="-1"> Italian </font> -<font color="red"> Portuguese </font> -<font color="orange"> Romanian </font> -<font color="green" size="-1"> Spanish </font> -</P> -<P> -Semitic: -<font color="red"> Maltese </font> -</P> -<P> -Slavonic: -<font color="green" size="-1"> Bulgarian </font> -<font color="red"> Czech </font> -<font color="orange"> Polish </font> -<font color="red"> Slovak </font> -<font color="red"> Slovenian </font> -</P> -<A NAME="toc6"></A> -<H3>Applications of the library</H3> -<P> -In addition to translation, the library is also useful in <B>localization</B>, -that is, porting a piece of software to new languages. 
-The GF resource grammar library has been used in three major projects that need -interlingua-based translation or localization of systems to new languages: -</P> -<UL> -<LI>in KeY, - <A HREF="http://www.key-project.org/"><CODE>http://www.key-project.org/</CODE></A>, - for writing formal and informal software specifications (3 languages) -<LI>in WebALT, - <A HREF="http://webalt.math.helsinki.fi/content/index_eng.html"><CODE>http://webalt.math.helsinki.fi/content/index_eng.html</CODE></A>, - for translating mathematical exercises to 7 languages -<LI>in TALK <A HREF="http://www.talk-project.org"><CODE>http://www.talk-project.org</CODE></A>, - where the library was used for localizing spoken dialogue systems - to six languages -</UL> - -<P> -The library is also a generic <B>linguistic resource</B>, -which can be used for tasks -such as language teaching and information retrieval. The liberal license (LGPL) -makes it usable for anyone and for any task. GF also has tools supporting the -use of grammars in programs written in other -programming languages: C, C++, Haskell, -Java, JavaScript, and Prolog. In connection with the TALK project, -support has also been -developed for translating GF grammars to language models used in speech -recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF). -</P> -<A NAME="toc7"></A> -<H3>The structure of the library</H3> -<P> -The library has the following main parts: -</P> -<UL> -<LI><B>Inflection paradigms</B>, covering the inflection of each language. -<LI><B>Core Syntax</B>, covering a large set of syntax rule that - can be implemented for all languages involved. -<LI><B>Common Test Lexicon</B>, giving ca. 500 common words that can be used for - testing the library. -<LI><B>Language-Specific Syntax Extensions</B>, covering syntax rules that are - not implementable for all languages. -<LI><B>Language-Specific Lexica</B>, word lists for each language, with - accurate morphological and syntactic information. 
-</UL> - -<P> -The goal of the summer school is to implement, for each language, at least -the first three components. The latter three are more open-ended in character. -</P> -<A NAME="toc8"></A> -<H2>The summer school</H2> -<P> -The goal of the summer school is to extend the GF resource grammar library -to covering all 23 EU languages, which means we need 15 new languages. -We also welcome other languages than these 23, -if there are interested participants. -</P> -<P> -The amount of work and skill is between a Master's thesis and a PhD thesis. -The Russian implementation was made by Janna Khegai as a part of her -PhD thesis; the thesis contains other material, too. -The Arabic implementation was started by Ali El Dada in his Master's thesis, -but the thesis does not cover the whole API. The realistic amount of work is -somewhere between 3 and 8 person months, -but this is very much language-dependent. -Dutch, for instance, can profit from previous implementations of German and -Scandinavian languages, and will probably require less work. -Latvian and Lithuanian are the first languages of the Baltic family and -will probably require more work. -</P> -<P> -In any case, the proposed allocation of work power is 2 participants per -language. They will do 1 months' worth of home work, followed -by 2 weeks of summer school, followed by 4 months work at home. -Who are these participants? -</P> -<A NAME="toc9"></A> -<H3>Selecting participants</H3> -<P> -Persons interested to participate in the Summer School should sign up in -the <B>Google Group</B> of the course, -</P> -<P> -<A HREF="http://groups.google.com/group/gf-resource-school-2009/"><CODE>groups.google.com/group/gf-resource-school-2009/</CODE></A> -</P> -<P> -The registration deadline is 15 June 2009. -</P> -<P> -Notice: you can sign up in the Google -group even if you are not planning to attend the summer school, but are -just interested in the topic. 
There will be a separate registration to the -school itself later. -</P> -<P> -The participants are recommended to learn GF in advance, by self-study from the -<A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">tutorial</A>. -This should take a couple of weeks. An <B>on-line course</B> will be -arranged on 20-29 April to help in getting started with GF. -</P> -<P> -At the end of the on-line course, a <B>programming assignment</B> will be published. -This assignment will test skills required in resource grammar programming. -Work on the assignment will take a couple of weeks. -Those who are interested in getting a travel grant will submit -their sample resource grammar fragment -to the Summer School Committee by 12 May. -The Committee then decides who is given a travel grant of up to 1000 EUR. -</P> -<P> -Notice: you can participate in the summer school without following the on-line -course or participating in the contest. These things are required only if you -want a travel grant. If requested by enough many participants, the lectures of -the on-line course will be repeated in the beginning of the summer school. -</P> -<P> -The summer school itself is devoted for working on resource grammars. -In addition to grammar writing itself, testing and evaluation is -performed. One way to do this is via adding new languages -to resource grammar applications - in particular, to the WebALT mathematical -exercise translator. -</P> -<P> -The resource grammars are expected to be completed by December 2009. They will -be published at GF website and licensed under LGPL. -</P> -<P> -The participants are encouraged to contact each other and even work in groups. -</P> -<A NAME="toc10"></A> -<H3>Who is qualified</H3> -<P> -Writing a resource grammar implementation requires good general programming -skills, and a good explicit knowledge of the grammar of the target language. 
-A typical participant could be -</P> -<UL> -<LI>native or fluent speaker of the target language -<LI>interested in languages on the theoretical level, and preferably familiar - with many languages (to be able to think about them on an abstract level) -<LI>familiar with functional programming languages such as ML or Haskell - (GF itself is a language similar to these) -<LI>on Master's or PhD level in linguistics, computer science, or mathematics -</UL> - -<P> -But it is the quality of the assignment that is assessed, not any formal -requirements. The "typical participant" was described to give an idea of -who is likely to succeed in this. -</P> -<A NAME="toc11"></A> -<H3>Costs</H3> -<P> -The summer school is free of charge. -</P> -<P> -Some travel grants are given, on the basis of a programming contest, -to cover travel and accommodation costs up to 1000 EUR -per person. -</P> -<P> -The number of grants will be decided during Spring 2009, and the grand -holders will be notified before the beginning of June. -</P> -<P> -Special terms will apply to students in -<A HREF="http://www.gslt.hum.gu.se/">GSLT</A> and -<A HREF="http://ngslt.org/">NGSLT</A>. -</P> -<A NAME="toc12"></A> -<H3>Teachers</H3> -<P> -A list of teachers will be published here later. Some of the local teachers -probably involved are the following: -</P> -<UL> -<LI>Krasimir Angelov -<LI>Robin Cooper -<LI>Håkan Burden -<LI>Markus Forsberg -<LI>Harald Hammarström -<LI>Peter Ljunglöf -<LI>Aarne Ranta -</UL> - -<P> -More teachers are welcome! If you are interested, please contact us so that -we can discuss your involvement and travel arrangements. -</P> -<P> -In addition to teachers, we will look for consultants who can help to assess -the results for each language. Please contact us! -</P> -<A NAME="toc13"></A> -<H3>The Summer School Committee</H3> -<P> -This committee consists of a number of teachers and informants, -who will select the participants. It will be selected by April 2009. 
-</P> -<A NAME="toc14"></A> -<H3>Time and Place</H3> -<P> -The summer school will -be organized at the campus of Chalmers University of Technology in Gothenburg, -Sweden, on 17-28 August 2009. -</P> -<P> -Time schedule: -</P> -<UL> -<LI>February: announcement of summer school -<LI>20-29 April: on-line course -<LI>12 May: submission deadline for assignment work -<LI>31 May: review of assignments, notifications of acceptance -<LI>15 June: <B>registration deadline</B> -<LI>17-28 August: Summer School -<LI>September-December: homework on resource grammars -<LI>December: release of the extended Resource Grammar Library -</UL> - -<A NAME="toc15"></A> -<H3>Dissemination and intellectual property</H3> -<P> -The new resource grammars will be released under the LGPL just like -the current resource grammars, -with the copyright held by respective authors. -</P> -<P> -The grammars will be distributed via the GF web site. -</P> -<A NAME="toc16"></A> -<H2>Why I should participate</H2> -<P> -Seven reasons: -</P> -<OL> -<LI>participation in a pioneering language technology work in an - enthusiastic atmosphere -<LI>work and fun with people from all over Europe and the world -<LI>job opportunities and business ideas -<LI>credits: the school project will be established as a course at Chalmers worth - 7.5 or 15 ETCS points per person, depending on the work accompliched; also - extensions to Master's thesis will be considered (special credit arrangements - for <A HREF="http://www.gslt.hum.gu.se/">GSLT</A> and <A HREF="http://ngslt.org/">NGSLT</A>) -<LI>merits: the resulting grammar can easily lead to a published paper (see below) -<LI>contribution to the multilingual and multicultural development of Europe and the - world -<LI>free trip and stay in Gothenburg (for travel grant students) -</OL> - -<A NAME="toc17"></A> -<H2>More information</H2> -<P> -<A HREF="http://groups.google.com/group/gf-resource-school-2009/">Course Google Group</A> -</P> -<P> -<A 
HREF="http://digitalgrammars.com/gf/">GF web page</A> -</P> -<P> -<A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">GF tutorial</A> -</P> -<P> -<A HREF="http://digitalgrammars.com/gf/lib/resource/doc/synopsis.html">GF resource synopsis</A> -</P> -<P> -<A HREF="http://digitalgrammars.com/gf/doc/Resource-HOWTO.html">Resource-HOWTO document</A> -</P> -<A NAME="toc18"></A> -<H3>Contact</H3> -<P> -Håkan Burden: burden at chalmers se -</P> -<P> -Aarne Ranta: aarne at chalmers se -</P> -<A NAME="toc19"></A> -<H3>Selected publications from earlier resource grammar projects</H3> -<P> -K. Angelov. -Type-Theoretical Bulgarian Grammar. -In B. Nordström and A. Ranta (eds), -<I>Advances in Natural Language Processing (GoTAL 2008)</I>, -LNCS/LNAI 5221, Springer, -2008. -</P> -<P> -B. Bringert. -<I>Programming Language Techniques for Natural Language Applications</I>. -Phd thesis, Computer Science, University of Gothenburg, -2008. -</P> -<P> -A. El Dada and A. Ranta. -Implementing an Open Source Arabic Resource Grammar in GF. -In M. Mughazy (ed), -<I>Perspectives on Arabic Linguistics XX. Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26</I> -John Benjamins Publishing Company. -2007. -</P> -<P> -A. El Dada. -Implementation of the Arabic Numerals and their Syntax in GF. -Computational Approaches to Semitic Languages: Common Issues and Resources, - ACL-2007 Workshop, -June 28, 2007, Prague. -2007. -</P> -<P> -H. Hammarström and A. Ranta. -Cardinal Numerals Revisited in GF. -<I>Workshop on Numerals in the World's Languages</I>. -Dept. of Linguistics Max Planck Institute for Evolutionary Anthropology, Leipzig, -2004. -</P> -<P> -M. Humayoun, H. Hammarström, and A. Ranta. -Urdu Morphology, Orthography and Lexicon Extraction. -<I>CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages</I>, -July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University. -2007. -</P> -<P> -K. Johannisson. 
-<I>Formal and Informal Software Specifications.</I> -Phd thesis, Computer Science, University of Gothenburg, -2005. -</P> -<P> -J. Khegai. -GF parallel resource grammars and Russian. -In proceedings of ACL2006 - (The joint conference of the International Committee on Computational - Linguistics and the Association for Computational Linguistics) (pp. 475-482), - Sydney, Australia, July 2006. -</P> -<P> -J. Khegai. -<I>Language engineering in Grammatical Framework (GF)</I>. -Phd thesis, Computer Science, Chalmers University of Technology, -2006. -</P> -<P> -W. Ng'ang'a. -Multilingual content development for eLearning in Africa. -eLearning Africa: 1st Pan-African Conference on ICT for Development, - Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia. -2006. -</P> -<P> -N. Perera and A. Ranta. -Dialogue System Localization with the GF Resource Grammar Library. -<I>SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing</I>, -June 29, 2007, Prague. -2007. -</P> -<P> -A. Ranta. -Modular Grammar Engineering in GF. -<I>Research on Language and Computation</I>, -5:133-158, 2007. -</P> -<P> -A. Ranta. -How predictable is Finnish morphology? An experiment on lexicon construction. -In J. Nivre, M. Dahllöf and B. Megyesi (eds), -<I>Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein</I>, -University of Uppsala, -2008. -</P> -<P> -A. Ranta. Grammars as Software Libraries. -To appear in -Y. Bertot, G. Huet, J-J. Lévy, and G. Plotkin (eds.), -<I>From Semantics to Computer Science</I>, -Cambridge University Press, Cambridge, 2009. -</P> -<P> -A. Ranta and K. Angelov. -Implementing Controlled Languages in GF. -To appear in the proceedings of <I>CNL 2009</I>. -</P> - -<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) --> -<!-- cmdline: txt2tags -\-toc gf-summerschool.txt --> -</BODY></HTML> |
