| Age | Commit message (Collapse) | Author |
|
1. The default encoding is changed from Latin-1 to UTF-8.
2. Alternate encodings should be specified as "--# -coding=enc", the old
"flags coding=enc" declarations have no effect but are still checked for
consistency.
3. A transitional warning is generated for files that contain non-ASCII
characters without specifying a character encoding:
"Warning: default encoding has changed from Latin-1 to UTF-8"
4. Conversion to Unicode is now done *before* lexing. This makes it possible
to allow arbitrary Unicode characters in identifiers. But identifiers are
still stored as ByteStrings, so they are limited to Latin-1 characters
for now.
5. Lexer.hs is no longer part of the repository. We now generate the lexer
from Lexer.x with alex>=3. Some workarounds for bugs in alex-3.0 were
needed. These bugs might already be fixed in newer versions of alex, but
we should be compatible with what is shipped in the Haskell Platform.
|
|
write for instance 'ab.c' and then everything between the quites is identifier. This includes Unicode characters and non-ASCII symbols. This is useful for automatically generated GF grammars.
|
|
|
|
The fact that identifiers are represented as ByteStrings is now an internal
implentation detail in module GF.Infra.Ident. Conversion between ByteString
and identifiers is only needed in the lexer and the Binary instances.
|
|
As a temporary workaround, alex is no longer invoked automatically when
building with cabal. Developers who want to modify the lexer need to run
alex on Lexer.x manually and record the modified Lexer.hs.
src/compiler/GF/Grammar/lexer/Lexer.x -- hidden from cabal
src/compiler/GF/Grammar/Lexer.hs -- update it manually
|