path: root/src/compiler/GF/Text
Age  Commit message  Author
2022-03-05  prepare for GHC 9, base 4.15, by using Buffer constructor interface  Meng Weng Wong
2021-07-07  Replace tabs for whitespace in source code  John J. Camilleri
2019-11-18  PGFService: revert unlexing change in PGFService to restore &+ behaviour  Thomas Hallgren
2019-01-17  Adding -output-format canonical_gf  Thomas Hallgren
    This output format converts a GF grammar to a "canonical" GF grammar. A canonical GF grammar consists of
    - one self-contained module for the abstract syntax
    - one self-contained module per concrete syntax

    The concrete syntax modules contain param, lincat and lin definitions; everything else has been eliminated by the partial evaluator, including references to resource library modules and functors. Record types and tables are retained.

    The -output-format canonical_gf option writes canonical GF grammars to a subdirectory "canonical/". The canonical GF grammars are written as normal GF ".gf" source files, which can be compiled with GF in the normal way.

    The translation to canonical form goes via an AST for canonical GF grammars, defined in GF.Grammar.Canonical. This is a simple, self-contained format that doesn't cover everything in GF (e.g. omitting dependent types and HOAS), but it is complete enough to translate the Foods and Phrasebook grammars found in gf-contrib. The AST is based on the GF grammar "GFCanonical" presented here: https://github.com/GrammaticalFramework/gf-core/issues/30#issuecomment-453556553

    The translation of concrete syntax to canonical form is based on the previously existing translation of concrete syntax to Haskell, implemented in module GF.Compile.ConcreteToHaskell. This module could now be reimplemented and simplified significantly by going via the canonical format. Perhaps exports to other output formats could benefit from going via the canonical format too. There is also the possibility of completing the GFCanonical grammar mentioned above and using GF itself to convert canonical GF grammars to other formats...
2018-06-12  added transliteration arabic_unvocalized, which omits the vowels  Aarne Ranta
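
The idea of an unvocalized transliteration can be sketched as follows. This is a minimal illustration, not the code in GF.Text.Transliterations: the names `isHaraka` and `unvocalize` are hypothetical, and the sketch assumes that "vowels" means the Arabic short-vowel marks in the Unicode range U+064B to U+0652.

```haskell
-- Hypothetical sketch: drop Arabic vowel marks (assumed range U+064B..U+0652).
isHaraka :: Char -> Bool
isHaraka c = c >= '\x064B' && c <= '\x0652'

-- Keep every character that is not a vowel mark.
unvocalize :: String -> String
unvocalize = filter (not . isHaraka)
```

Applied to a vocalized word, `unvocalize` leaves only the consonant letters; anything outside the assumed range passes through unchanged.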
2017-09-04  eliminate modules PGF.Lexing, PGF.LexingAGreek. Make PGF.Utilities an internal module in the runtime. These are not really part of the core runtime.  Krasimir Angelov
2017-06-14  added Arabic question mark to arabic and persian transliterations, as well as the zero-width non-joiner U+200C to persian  aarne
2016-02-23  add lexer and unlexer for Ancient Greek accent normalization  leiss
2015-08-28  Comment out some dead code found with -fwarn-unused-binds  hallgren
    Also fixed some warnings and tightened some imports
2014-08-11  revert an accidental change that I pushed together with the last patch  kr.angelov
2014-08-11  a partial support for def rules in the C runtime  kr.angelov
    The def rules are now compiled to byte code by the compiler and then to native code by the JIT compiler in the runtime. Not all constructions are implemented yet. The partial implementation is now in the repository but it is not activated by default since this requires changes in the PGF format. I will enable it only after it is complete.
2014-07-27  Adding GF.Infra.Location and GF.Text.Pretty (forgot to 'darcs add' them before)  hallgren
2014-06-24  minibar: include the grammar's last modification in the grammar info shown by the "i" button  hallgren
    Also bumped version number in gf.cabal to 3.6-darcs. Also removed some unnecessary use of CPP.
2014-04-09  Change the type of PGF.Lexing.bindTok to [String] -> [String]  hallgren
    The old type was [String] -> String. This function was only used in GF.Text.Lexing.stringOp, which now uses (unwords . bindTok) instead, with no change in behaviour.
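
The type change can be illustrated with a small stand-in. This is a sketch under assumptions, not the actual PGF.Lexing code: it assumes the bind token is "&+" and that binding simply glues the neighbouring tokens together. `oldBindTok` is a hypothetical name for the previous behaviour.

```haskell
-- Hypothetical stand-in for bindTok with the new type [String] -> [String].
-- Tokens on either side of the binding marker "&+" are glued together.
bindTok :: [String] -> [String]
bindTok (w : "&+" : rest) = case bindTok rest of
  (v : vs) -> (w ++ v) : vs   -- glue w onto the next (already bound) token
  []       -> [w]             -- trailing "&+": nothing to bind to
bindTok (w : ws) = w : bindTok ws
bindTok []       = []

-- The old type [String] -> String is recovered by composing with unwords,
-- which is what the commit message says stringOp now does.
oldBindTok :: [String] -> String
oldBindTok = unwords . bindTok
```

Returning a token list keeps the result usable by later token-level passes, while callers that want a string can still apply `unwords` themselves.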
2014-04-09  Unlexers: move capitalization of first word from GF.Text.Lexing to PGF.Lexing  hallgren
    The capitalization of the first word was done in GF.Text.Lexing.stringOp, but is now done in the functions unlexText and unlexMixed in PGF.Lexing. These functions are only used in stringOp and in PGFService (where the change is needed), so the subtle change in behaviour should not cause any bugs.
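
A rough sketch of what doing the capitalization inside the unlexer might look like. The names `capitalizeFirst` and `unlexTextSketch` are hypothetical, and the real unlexText does more than join tokens with spaces; this only shows where the capitalization step sits.

```haskell
import Data.Char (toUpper)

-- Capitalize the first character of the first token, if there is one.
capitalizeFirst :: [String] -> [String]
capitalizeFirst ((c : cs) : ws) = (toUpper c : cs) : ws
capitalizeFirst ws              = ws

-- Hypothetical simplified unlexer: capitalization happens here,
-- so every caller (not just stringOp) gets it.
unlexTextSketch :: [String] -> String
unlexTextSketch = unwords . capitalizeFirst
```

With the step inside the unlexer, a second caller such as a web service gets the same capitalization without duplicating it.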
2014-04-08  Move basic lexing functions from GF.Text.Lexing to the new module PGF.Lexing  hallgren
    They are thus part of the PGF Run-Time Library, making it possible to add lexing functionality in PGF service in a natural way.
2013-12-03  removed the unlines-lines wrapper from Lexing.unlexer to prevent empty lines when an unlexer (such as -bind or -unchars) is used as an option in linearization.  aarne
    Don't know really why the input had been broken into lines in the first place. You can see the effect by importing LangEng and running "gr -cat=Cl | l -table -bind" before and after recompiling GF.
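
One plausible source of the empty lines is that Prelude's `unlines` appends a newline after every line, including the last. The sketch below shows that effect with a hypothetical unlexer `unlexSketch` that just removes spaces; it is an assumption about the mechanism, not a reading of the actual Lexing.unlexer code.

```haskell
-- Hypothetical unlexer: concatenate tokens without spaces.
unlexSketch :: String -> String
unlexSketch = concat . words

-- The wrapped form: split into lines, unlex each, join again.
-- unlines adds a trailing '\n' even when the input had none.
wrapped :: String -> String
wrapped = unlines . map unlexSketch . lines

-- The unwrapped form applies the unlexer directly.
direct :: String -> String
direct = unlexSketch
```

If each table cell is passed through `wrapped` and then printed on its own line, the extra trailing newline shows up as an empty line after every cell; `direct` does not have that effect.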
2013-11-25  Change how GF deals with character encodings in grammar files  hallgren
    1. The default encoding is changed from Latin-1 to UTF-8.
    2. Alternate encodings should be specified as "--# -coding=enc", the old "flags coding=enc" declarations have no effect but are still checked for consistency.
    3. A transitional warning is generated for files that contain non-ASCII characters without specifying a character encoding: "Warning: default encoding has changed from Latin-1 to UTF-8"
    4. Conversion to Unicode is now done *before* lexing. This makes it possible to allow arbitrary Unicode characters in identifiers. But identifiers are still stored as ByteStrings, so they are limited to Latin-1 characters for now.
    5. Lexer.hs is no longer part of the repository. We now generate the lexer from Lexer.x with alex>=3. Some workarounds for bugs in alex-3.0 were needed. These bugs might already be fixed in newer versions of alex, but we should be compatible with what is shipped in the Haskell Platform.
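
Points 1 and 2 above can be sketched as a lookup with a default. The function names are hypothetical and the real compiler's pragma handling is certainly more involved (for example, it may only inspect the top of the file); the sketch only assumes that the pragma has the literal prefix `--# -coding=`.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Hypothetical helper: extract the encoding name from a pragma line.
codingOf :: String -> Maybe String
codingOf = stripPrefix "--# -coding="

-- Use the first coding pragma found, else fall back to the UTF-8 default.
encodingFor :: [String] -> String
encodingFor ls = case mapMaybe codingOf ls of
  (enc : _) -> enc
  []        -> "UTF-8"
```

A file without any pragma is therefore read as UTF-8, which is the case the transitional warning in point 3 is about.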
2013-06-15  Improvements In Sindhi RG  virk.shafqat
2013-06-02  GF.Text.Transliterations: avoid error prone function Data.Map.fromAscList  hallgren
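
Why `Data.Map.fromAscList` is error prone: it requires its input to be sorted by key and does not check this, so an unsorted table silently produces an invalid map. A small sketch with a made-up table (the `pairs` data is illustrative, not taken from GF.Text.Transliterations):

```haskell
import qualified Data.Map as Map

-- Illustrative table whose keys are NOT in ascending order.
pairs :: [(Char, String)]
pairs = [('b', "B"), ('a', "A"), ('c', "C")]

-- Precondition violated: the result is not a well-formed search tree,
-- so lookups on it are unreliable.
unsafeTable :: Map.Map Char String
unsafeTable = Map.fromAscList pairs

-- fromList makes no ordering assumption and always builds a valid map.
safeTable :: Map.Map Char String
safeTable = Map.fromList pairs
```

`Map.valid` can be used to check the invariant: it holds for `safeTable` and fails for `unsafeTable`. Hand-written transliteration tables are easy to get out of order, which makes `fromList` the safer choice.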
2013-05-31  Prasad's sanskrit transliteration ; MiniresourceSan now compiles but is mostly incorrect due to missing paradigms  aarne
2012-11-05  unicode4k-changed  virk.shafqat
2012-03-26  compiler/GF/Text/Coding.hs: fix build failure against ghc-7.2  Sergei Trofimovich
2012-02-23  hindi-resource-grammar  virk.shafqat
2012-02-21  sindhipatch  virk.shafqat
2011-09-15  made ps -from_TRANSLIT symmetric to -to_TRANSLIT in the sense that unknown characters are returned as themselves and not as question marks  aarne
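
The symmetric behaviour can be sketched with a lookup that falls back to the input character. This is a simplified illustration: the table is made up, it maps single characters to single characters, and `convert` is a hypothetical name, whereas the real transliterations also handle multi-character codes.

```haskell
import qualified Data.Map as Map

-- Illustrative table: Greek alpha and beta to ASCII.
pairs :: [(Char, Char)]
pairs = [('\x03B1', 'a'), ('\x03B2', 'b')]

toMap, fromMap :: Map.Map Char Char
toMap   = Map.fromList pairs
fromMap = Map.fromList [(t, u) | (u, t) <- pairs]

-- Unknown characters are returned as themselves in both directions,
-- rather than being replaced by '?'.
convert :: Map.Map Char Char -> String -> String
convert m = map (\c -> Map.findWithDefault c c m)
```

Using the same `convert` for both directions is what makes the two options symmetric: only the table differs.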
2011-06-20  refinementNepali-11-06-20  virk.shafqat
2011-06-14  allow empty lines in transliteration files  aarne
2011-05-19  refinementsTextUrd-11-05-19  virk.shafqat
2011-05-06  fixed problems in persian transliteration pointed out by Elnaz  aarne
2011-05-02  transliteration via configuration file: ps -to=file or ps -from=file  aarne
2011-02-06  a simple clitic analysis command 'ca'  aarne
2011-01-31  corrections to ancientgreek encoding by Hans Leiss  aarne
2010-11-25  DiffUrd and Hin; updated Transliteration.hs  aarne
2010-05-07  Amharic transliteration by Markos  aarne
2010-04-19  use the native unicode support from GHC 6.12  krasimir
2010-04-01  Urdu transliteration fixed (by Shafqat)  aarne
2010-03-23  added codepage for Turkish  krasimir
2010-03-23  added comment to every GF.Text.CPxxxx module about the purpose of the codepage  krasimir
2010-03-22  transliteration for Urdu  krasimir
2009-12-17  correct capitalization in unlexmixed; unlextext and unlexmixed now remove string literal quotes  aarne
2009-12-13  reorganize the directories under src, and rescue the JavaScript interpreter from deprecated  krasimir