summaryrefslogtreecommitdiff
path: root/examples/gfcc/complin.tex
diff options
context:
space:
mode:
authoraarne <aarne@chalmers.se>2010-03-02 19:10:56 +0000
committeraarne <aarne@chalmers.se>2010-03-02 19:10:56 +0000
commite4748e998453b979af46983a079f2ec3d307ada4 (patch)
treeef1c2b278850a7f6a95959b6f18469645cf1094c /examples/gfcc/complin.tex
parent21b10f91cbbd0a2d369dc55306bad53df1502cd7 (diff)
restored gfcc example (GF C compiler)
Diffstat (limited to 'examples/gfcc/complin.tex')
-rw-r--r--examples/gfcc/complin.tex1477
1 files changed, 1477 insertions, 0 deletions
diff --git a/examples/gfcc/complin.tex b/examples/gfcc/complin.tex
new file mode 100644
index 000000000..9e624a846
--- /dev/null
+++ b/examples/gfcc/complin.tex
@@ -0,0 +1,1477 @@
+\documentclass[12pt]{article}
+
+\usepackage[latin1]{inputenc}
+%\input{psfig.sty}
+
+\setlength{\oddsidemargin}{0mm}
+%\setlength{\evensidemargin}{0mm}
+\setlength{\evensidemargin}{-2mm}
+\setlength{\topmargin}{-16mm}
+\setlength{\textheight}{240mm}
+\setlength{\textwidth}{158mm}
+
+%\setlength{\parskip}{2mm}
+%\setlength{\parindent}{0mm}
+
+\input{macros}
+
+\newcommand{\begit}{\begin{itemize}}
+\newcommand{\enit}{\end{itemize}}
+\newcommand{\HOAS}{higher-order abstract syntax}
+\newcommand{\newone}{} %%{\newpage}
+\newcommand{\heading}[1]{\subsection{#1}}
+\newcommand{\explanation}[1]{{\small #1}}
+\newcommand{\empha}[1]{{\em #1}}
+\newcommand{\commentOut}[1]{}
+\newcommand{\rarrow}{\; \rightarrow\;}
+
+\newcommand{\nocolor}{} %% {\color[rgb]{0,0,0}}
+
+
+\title{{\bf Declarative Language Definitions and Code Generation as Linearization}}
+
+\author{Aarne Ranta \\
+ Department of Computing Science \\
+ Chalmers University of Technology and the University of Gothenburg\\
+ {\tt aarne@cs.chalmers.se}}
+
+\date{30 November 2004}
+
+\begin{document}
+
+\maketitle
+
+
+\subsection*{Abstract}
+
+{\em
+This paper presents a compiler for a fragment of the C programming
+language, with JVM (Java Virtual Machine) as target language.
+The compiler is implemented in a purely declarative way:
+its definition consists of an abstract syntax of program
+structures and two concrete syntaxes matching the abstract
+syntax: one for C and one for JVM. From these grammar components,
+the compiler is derived by using the GF (Grammatical Framework)
+grammat tool: the front end consists of parsing and semantic
+checking in accordance to the C grammar, and the back end consists
+of linearization in accordance to the JVM grammar. The tool provides
+other functionalities as well, such as decompilation and interactive
+editing.
+}
+
+\section{Introduction}
+
+The experiment reported in this paper was prompted by a challenge
+posted by Lennart Augustsson to the participants of the workshop
+on Dependent Types in Programming held at Dagstuhl in September 2004.
+The challenge was to use dependent types to write a compiler from
+C to bytecode. This paper does not meet the challenge quite literally,
+since our compiler is for a different subset of C than Augustsson's
+specification, and since the bytecode that we generate is JVM instead
+of his format. But it definitely makes use of dependent types.
+
+Augustsson's challenge did not specify \textit{how} dependent
+types are to be used, and the first of the two points we make in this
+paper (and its title) reflects our interpretation:
+we use dependent types, in combination with higher-order abstract syntax (HOAS),
+to define the grammar of the source language (here, the fragment of C).
+The grammar constitutes the single, declarative source from which
+the compiler front end is derived, comprising both parser and type
+checker.
+
+The second point, code generation by linearization, means that the
+back end is likewise implemented by a grammar of the target
+language (in this case, a fragment of JVM). This grammar is the
+declarative source from which the compiler back end is derived.
+In addition, some postprocessing is needed to
+make the code conform to Jasmin assembler requirements.
+
+The complete code of the compiler is 300 lines: 250 lines for the grammars,
+50 lines for the postprocessor. The code
+is presented in the appendices of this paper.
+
+
+
+\section{The Grammatical Framework}
+
+The tool we have used for implementing the compiler is
+GF, the \empha{Grammatical Framework} \cite{gf-jfp}. GF
+is similar to a Logical Framework (LF)
+\cite{harper-honsell}
+extended with
+a notation for defining concrete syntax. GF was originally
+designed to help building multilingual
+translation systems for natural languages and also
+between formal and natural languages. The translation model
+implemented by GF is very simple:
+\begin{verbatim}
+ parsing linearization
+ ------------> ------------>
+ Language_1 Abstract Syntax Language_2
+ <------------ <------------
+ linearization parsing
+\end{verbatim}
+An abstract syntax is similar to a \empha{theory}, or a
+\empha{signature} in a logical framework. A
+concrete syntax defines, in a declarative way,
+a translation of abstract syntax trees (well-formed terms)
+into concrete language structures, and from this definition, one can
+derive both linearization and parsing.
+
+To give an example,
+a (somewhat simplified) translator for addition expressions
+consists of the abstract syntax rule
+\begin{verbatim}
+ fun EAdd : (A : Typ) -> Exp A -> Exp A -> Exp A ;
+\end{verbatim}
+the C concrete syntax rule
+\begin{verbatim}
+ lin EAdd _ x y = {s = x.s ++ "+" ++ y.s ; prec = 2} ;
+\end{verbatim}
+and the JVM concrete syntax rule
+\begin{verbatim}
+ lin EAdd t x y = {s = x.s ++ y.s ++ t.s ++ "_add"} ;
+\end{verbatim}
+The abstract syntax rule uses a type argument to capture
+the fact that addition is polymorphic (which is a simplification,
+because we will restrict the rule to numeric types only)
+and that both operands have the same type as the value.
+The C rule shows that the type information is suppressed,
+and that the expression has precedence level 2 (which is a simplification,
+since we will also treat associativity).
+The JVM rule shows how addition is translated to stack machine
+instructions, where the type of the postfixed addition instruction has to
+be made explicit. Our compiler, like any GF translation system, will
+consist of rules like these.
+
+The number of languages related to one abstract syntax in
+a translation system is of course not limited to two.
+Sometimes just one language is involved;
+GF then works much the same way as any grammar
+formalism or parser generator.
+The largest number of languages in an application known to us is 88;
+its domain are numeral expressions from 1 to 999,999 \cite{gf-homepage}.
+
+In addition to linearization and parsing, GF supports grammar-based
+\empha{multilingual authoring} \cite{khegai}: interactive editing
+of abstract syntax trees with immediate feedback as linearized texts,
+and the possibility to textual through the parsers.
+
+From the GF point of view, the goal of the compiler experiment
+is to investigate if GF is capable of implementing
+compilers using the ideas of single-source language definition
+and code generation as linearization. The working hypothesis
+was that it \textit{is} capable but inconvenient, and that,
+working out a complete example, we would find out what
+should be done to extend GF into a compiler construction tool.
+
+
+\subsection{Advantages and disadvantages}
+
+Due to the way in which it is built, our compiler has
+a number of unusual, yet attractive features:
+\bequ
+The front end is defined by a grammar of C as its single source.
+
+The grammar defines both abstract and concrete syntax, and also
+semantic well-formedness (types, variable scopes).
+
+The back end is implemented by means of a grammar of JVM providing
+another concrete syntax to the abstract syntax of C.
+
+As a result of the way JVM is defined, only semantically well formed
+JVM programs are generated.
+
+The JVM grammar can also be used as a decompiler, which translates
+JVM code back into C code.
+
+The language has an interactive editor that also supports incremental
+compilation.
+\enqu
+The problems that we encountered and their causes will be explained in
+the relevant sections of this report. To summarize,
+\bequ
+The scoping conditions resulting from \HOAS\ are slightly different
+from the standard ones of C.
+
+Our JVM syntax is slightly different from the specification, and
+hence needs some postprocessing.
+
+Using \HOAS\ to encode all bindings is sometimes cumbersome.
+\enqu
+The first shortcoming seems to be inevitable with the technique
+we use: just like lambda calculus, our C semantics allows
+overshadowing of earlier bindings by later ones.
+The second problem is systematically solved by using
+an intermediate JVM format, where symbolic variable addresses
+are used instead of numeric stack addresses.
+The last shortcoming is partly inherent in the problem of binding:
+to spell out, in any formal notation,
+what happens in complex binding structures \textit{is}
+complicated. But it also suggests ways in which GF could be
+tuned to give better support
+to compiler construction, which, after all, is not an intended
+use of GF as it is now.
+
+
+
+
+
+
+
+
+\section{The abstract syntax}
+
+An \empha{abstract syntax} in GF consists of \texttt{cat} judgements
+\[
+\mbox{\texttt{cat}} \; C \; \Gamma
+\]
+declaring basic types (depending on a context $\Gamma$),
+and \texttt{fun} judgements
+\[
+\mbox{\texttt{fun}} \; f \; \mbox{\texttt{:}} \; A
+\]
+declaring functions $f$ of any type $A$, which can be a basic type or
+a function type.
+\empha{Syntax trees} are well-formed terms of basic
+types, in $\eta$-long normal form.
+
+As for notation, each judgement form is recognized by
+its keyword (\texttt{cat}, \texttt{fun}, etc),
+and the same keyword governs all judgements
+until the next keyword is encountered.
+
+The abstract syntax that we will present is no doubt closer
+to C than to JVM. One reason is that what we are building is a
+\textit{C compiler}, and match with the target language is a
+secondary consideration. Another, more general reason is that
+C is a higher-level language and JVM which means, among
+other things, that C makes more semantic distinctions.
+In general, the abstract syntax of a translation system
+must reflect all semantic distinctions that can be made
+in the languages involved, and then it is a good idea to
+start with looking at what the most distinctive language
+needs.
+
+
+
+\subsection{Statements}
+
+Statements in C may involve variables, expressions, and
+other statements.
+The following \texttt{cat} judgements of GF define the syntactic categories
+that are needed to construct statements
+\begin{verbatim}
+ cat
+ Stm ;
+ Typ ;
+ Exp Typ ;
+ Var Typ ;
+\end{verbatim}
+The type \texttt{Typ} is the type of C's datatypes.
+The type of expressions is a dependent type,
+since it has a nonempty context, indicating that \texttt{Exp} takes
+a \texttt{Typ} as argument. The rules for \texttt{Exp}
+will thus be rules to construct well-typed expressions of
+a given type. \texttt{Var}\ is the type of variables,
+of a given type, that get bound in C's variable
+declarations.
+
+Let us start with the simplest kind of statements:
+declarations and assignments. The following \texttt{fun}
+rules define their abstract syntax:
+\begin{verbatim}
+ fun
+ Decl : (A : Typ) -> (Var A -> Stm) -> Stm ;
+ Assign : (A : Typ) -> Var A -> Exp A -> Stm -> Stm ;
+\end{verbatim}
+The \texttt{Decl}\ function captures the rule that
+a variable must be declared before it can be used or assigned to:
+its second argument is a \empha{continuation}, which is
+the sequence of statements that depend on (= may refer to)
+the declared variable.
+The \texttt{Assign} function uses dependent types to
+control that a variable is always assigned a value of proper
+type.
+
+We will treat all statements, except
+\texttt{return}s, in terms of continuations. A sequence of
+statements (which always has the type \texttt{Stm}) thus
+always ends in a \texttt{return}, or, abruptly, in
+an empty statement, \texttt{End}. Here are rules for some other
+statement forms:
+\begin{verbatim}
+ While : Exp TInt -> Stm -> Stm -> Stm ;
+ IfElse : Exp TInt -> Stm -> Stm -> Stm -> Stm ;
+ Block : Stm -> Stm -> Stm ;
+ Return : (A : Typ) -> Exp A -> Stm ;
+ End : Stm ;
+\end{verbatim}
+Here is an example of a piece of code and its abstract syntax.
+\begin{verbatim}
+ int x ; Decl (TNum TInt) (\x ->
+ x = 5 ; Assign (TNum TInt) x (EInt 5) (
+ return x ; Return (TNum TInt) (EVar (TNum TInt) x)))
+\end{verbatim}
+The details of expression and type
+syntax will be explained in the next section.
+
+Our binding syntax is more liberal than C's in two ways.
+First,
+lambda calculus permits overshadowing previous bindings
+by new ones, e.g. to write \verb6\x -> (\x -> f x)6.
+The corresponding overshadowing of declarations is not
+legal in C, within one and the same block.
+Secondly, we allow declarations anywhere in a block,
+not just in the beginning.
+The second deviation would be easy to mend, whereas
+the first one is inherent to the method of \HOAS.
+
+
+
+\subsection{Types and expressions}
+
+Our fragment of C has two types: integers and floats.
+Many operators of C are overloaded so that they can
+be used for both of these types, as well as for
+some other numeric types---but not for e.g.\ arrays
+and structures. We capture this distinction by a notion
+reminiscent of \empha{type classes}: we introduce a special
+category of numeric types, and a coercion of numeric types
+into types in general.
+\begin{verbatim}
+ cat
+ NumTyp ;
+ fun
+ TInt, TFloat : NumTyp ;
+ TNum : NumTyp -> Typ ;
+\end{verbatim}
+Well-typed expressions are built from constants,
+from variables, and by means of binary operations.
+\begin{verbatim}
+ EVar : (A : Typ) -> Var A -> Exp A ;
+ EInt : Int -> Exp (TNum TInt) ;
+ EFloat : Int -> Int -> Exp (TNum TFloat) ;
+ ELt : (n : NumTyp) -> let Ex = Exp (TNum n) in
+ Ex -> Ex -> Exp (TNum TInt) ;
+ EAdd, EMul, ESub : (n : NumTyp) -> let Ex = Exp (TNum n) in
+ Ex -> Ex -> Ex ;
+\end{verbatim}
+Notice that the category \texttt{Var} has no constructors,
+but its expressions are only created by
+variable bindings in \HOAS.
+Notice also that GF has a built-in type \texttt{Int} of
+integer literals, but floats are constructed by hand.
+
+Yet another expression form are function calls. To this
+end, we need a notion of (user-defined) functions and
+argument lists. The type of functions depends on an
+argument type list and a value type. Expression lists
+are formed to match type lists.
+\begin{verbatim}
+ cat
+ ListTyp ;
+ Fun ListTyp Typ ;
+ ListExp ListTyp ;
+
+ fun
+ EAppNil : (V : Typ) -> Fun NilTyp V -> Exp V ;
+ EApp : (AS : ListTyp) -> (V : Typ) ->
+ Fun AS V -> ListExp AS -> Exp V ;
+
+ NilTyp : ListTyp ;
+ ConsTyp : Typ -> ListTyp -> ListTyp ;
+
+ OneExp : (A : Typ) -> Exp A -> ListExp (ConsTyp A NilTyp) ;
+ ConsExp : (A : Typ) -> (AS : ListTyp) ->
+ Exp A -> ListExp AS -> ListExp (ConsExp A AS) ;
+\end{verbatim}
+The separation between zero-element applications and other
+applications is a concession to the concrete syntax of C:
+it in this way that we can control the use of commas so that
+they appear between arguments (\texttt{(x,y,z)}) but not
+after the last argument (\texttt{(x,y,z,)}).
+The compositionality of linearization (Section~\ref{compositionality} below)
+forbids case analysis on the length of the lists.
+
+
+\subsection{Functions}
+
+On the top level, a program is a sequence of functions.
+Each function may refer to functions defined earlier
+in the program. The idea to express the binding of
+function symbols with \HOAS\ is analogous to the binding
+of variables in statements, using a continuation.
+As with variables, the principal way to build function symbols is as
+bound variables (in addition, there can be some
+built-in functions, unlike in the case of variables).
+The continuation of can be recursive, which we express by
+making the function body into a part of the continuation;
+the category \texttt{Rec} is the combination of a function
+body and the subsequent function definitions.
+\begin{verbatim}
+ cat
+ Program ;
+ Rec ListTyp ;
+ fun
+ Empty : Program ;
+ Funct : (AS : ListTyp) -> (V : Typ) ->
+ (Fun AS V -> Rec AS) -> Program ;
+ FunctNil : (V : Typ) ->
+ Stm -> (Fun NilTyp V -> Program) -> Program ;
+\end{verbatim}
+For syntactic reasons similar to function application
+expressions in the previous section, we have distinguished between
+empty and non-empty argument lists.
+
+The tricky problem with function definitions
+is that they involve two nested binding constructions:
+the outer binding of the function symbol and the inner
+binding of the function parameter lists.
+For the latter, we could use
+vectors of variables, in the same way as vectors of
+expressions are used to give arguments to functions.
+However, this would lead to the need of cumbersome
+projection functions when using the parameters
+in the function body. A more elegant solution is
+to use \HOAS\ to build function bodies:
+\begin{verbatim}
+ RecOne : (A : Typ) ->
+ (Var A -> Stm) -> Program -> Rec (ConsTyp A NilTyp) ;
+ RecCons : (A : Typ) -> (AS : ListTyp) ->
+ (Var A -> Rec AS) -> Program -> Rec (ConsTyp A AS) ;
+\end{verbatim}
+The end result is an abstract syntax whose relation
+to concrete syntax is somewhat remote. Here is an example of
+the code of a function and its abstract syntax:
+\begin{verbatim}
+ let int = TNum TInt in
+ int fact Funct (ConsTyp int NilTyp) int (\fact ->
+ (int n) { RecOne int (\n ->
+ int f ; Decl int (\f ->
+ f = 1 ; Assign int f (EInt 1) (
+ while (1 < n) { While (ELt int (EInt 1) (EVar int n)) (Block (
+ f = n * f ; Assign int f (EMul int (EVar int n) (EVar int f)) (
+ n = n - 1 ; Assign int n (ESub int (EVar int n) (EInt 1))
+ } End))
+ return f ; (Return int (EVar int f))) End)))
+ } ; Empty)
+\end{verbatim}
+
+
+\subsection{The \texttt{printf} function}
+
+To give a valid type to the C function \texttt{printf}
+is one of the things that can be done with
+dependent types \cite{cayenne}. We have not defined \texttt{printf}
+in its full strength, partly because of the difficulties to compile
+it into JVM. But we use special cases of \texttt{printf} as
+statements, to be able to print values of different types.
+\begin{verbatim}
+ Printf : (A : Typ) -> Exp A -> Stm -> Stm ;
+\end{verbatim}
+
+
+
+
+
+\section{The concrete syntax of C}
+
+A concrete syntax, for a given abstract syntax,
+consists of \texttt{lincat} judgements
+\[
+\mbox{\texttt{lincat}} \; C \; \mbox{\texttt{=}} \; T
+\]
+defining the \empha{linearization types} $T$ of each category $C$,
+and \texttt{lin} judgements
+\[
+\mbox{\texttt{lin}} \; f \; \mbox{\texttt{=}} \; t
+\]
+defining the \empha{linearization functions} $t$ of each function $f$
+in the abstract syntax. The linearization functions are
+checked to be well-typed with respect the \texttt{lincat}
+definitions, and the syntax of GF forces them to be \empha{compositional}
+in the sense that the linearization of a complex tree is always
+a function of the linearizations of the subtrees. Schematically, if
+\[
+ f \colon A_{1} \rarrow \cdots \rarrow A_{n} \rarrow A
+\]
+then
+\[
+ \sugmap{f} \colon
+ \sugmap{A_{1}} \rarrow \cdots
+ \rarrow \sugmap{A_{n}} \rarrow \sugmap{A}
+\]
+and the linearization algorithm is simply
+\[
+ \sugmap{(f \; a_{1} \; \ldots \; a_{n})} \; = \;
+ \sugmap{f} \; \sugmap{a_{1}} \; \ldots \; \sugmap{a_{n}}
+\]
+using the \sugmap{} notation for both linearization types,
+linearization functions, and linearizations of trees.
+
+\label{compositionality}
+Because of compositionality, no case analysis on expressions
+is possible in linearization rules. The values of linearization
+therefore have to carry information on how they are used in
+different situations. Therefore linearization
+types are generally record types instead of just the string type.
+The simplest record type that is used in GF is
+\bece
+\verb6{s : Str}6
+\ence
+If the linearization type of a category is not explicitly
+given by a \texttt{lincat} judgement, this type is
+used by default.
+
+With \HOAS, a syntax tree can have variable-bindings in its
+constituents. The linearization of such a constituent
+is compositionally defined to be the record linearizing the body
+extended with fields for each of the variable symbols:
+\[
+\sugmap{(\lambda x_{0} \rarrow \cdots \rarrow \lambda x_{n} \rarrow b)}
+\;=\;
+\sugmap{b} *\!* \{\$_{0} = \sugmap{x_{0}} ; \ldots ; \$_{n} = \sugmap{x_{n}}\}
+\]
+Notice that the variable symbols can
+always be found because linearizable trees are in $\eta$-long normal form.
+Also notice that we are here using the
+\sugmap{} notation in yet another way, to denote the magic operation
+that converts variable symbols into strings.
+
+
+\subsection{Resource modules}
+
+Resource modules define auxiliary notions that can be
+used in concrete syntax. These notions include
+\empha{parameter types} defined by \texttt{param}
+judgements
+\[
+\mbox{\texttt{param}} \; P \; \mbox{\texttt{=}}
+ \; C_{1} \; \Gamma_{1} \; \mid \; \cdots \; \mid \;
+ \; C_{n} \; \Gamma_{n}
+\]
+and \empha{operations} defined by
+\texttt{oper} judgements
+\[
+\mbox{\texttt{oper}} \; f \; \mbox{\texttt{:}} \; T \; \mbox{\texttt{=}} \; t
+\]
+These judgements are
+similar to datatype and function definitions
+in functional programming, with the restriction
+that parameter types must be finite and operations
+may not be recursive. It is due to these restrictions that
+we can always derive a parsing algorithm from a set of
+linearization rules.
+
+In addition to types defined in \texttt{param} judgements,
+initial segments of natural numbers, \texttt{Ints n},
+can be used as parameter types. This is the most important parameter
+type we use in the syntax of C, to represent precedence.
+
+The following string operations are useful in almost
+all grammars. They are actually included in a GF \texttt{Prelude},
+but are here defined from scratch to make the code shown in
+the Appendices complete.
+\begin{verbatim}
+ oper
+ SS : Type = {s : Str} ;
+ ss : Str -> SS = \s -> {s = s} ;
+ cc2 : (_,_ : SS) -> SS = \x,y -> ss (x.s ++ y.s) ;
+ paren : Str -> Str = \str -> "(" ++ str ++ ")" ;
+\end{verbatim}
+
+
+
+\subsection{Precedence}
+
+We want to be able to recognize and generate one and the same expression with
+or without parentheses, depending on whether its precedence level
+is lower or higher than expected. For instance, a sum used as
+an operand of multiplication must be in parentheses. We
+capture this by defining a parameter type of
+precedence levels. Five levels are enough for the present
+fragment of C, so we use the enumeration type of
+integers from 0 to 4 to define the \empha{inherent precedence level}
+of an expression
+\begin{verbatim}
+ oper
+ Prec : PType = Predef.Ints 4 ;
+ PrecExp : Type = {s : Str ; p : Prec} ;
+\end{verbatim}
+in a resource module (see Appendix D), and
+\begin{verbatim}
+ lincat Exp = PrecExp ;
+\end{verbatim}
+in the concrete syntax of C itself.
+
+To build an expression that has a certain inherent precedence level,
+we use the operation
+\begin{verbatim}
+ mkPrec : Prec -> Str -> PrecExp = \p,s -> {s = s ; p = p} ;
+\end{verbatim}
+To use an expression of a given inherent level at some expected level,
+we define a function that says that, if the inherent level is lower
+than the expected level, parentheses are required.
+\begin{verbatim}
+ usePrec : PrecExp -> Prec -> Str = \x,p ->
+ ifThenStr
+ (less x.p p)
+ (paren x.s)
+ x.s ;
+\end{verbatim}
+(The code shown in Appendix D is at the moment more cumbersome,
+due to holes in the support for integer arithmetic in GF.)
+
+With the help of \texttt{mkPrec} and \texttt{usePrec},
+we can now define the main high-level operations that are
+used in the concrete syntax itself---constants (highest level),
+non-associative infixes, and left associative infixes:
+\begin{verbatim}
+ constant : Str -> PrecExp = mkPrec 4 ;
+
+ infixN : Prec -> Str -> (_,_ : PrecExp) -> PrecExp = \p,f,x,y ->
+ mkPrec p (usePrec x (nextPrec p) ++ f ++ usePrec y (nextPrec p)) ;
+ infixL : Prec -> Str -> (_,_ : PrecExp) -> PrecExp = \p,f,x,y ->
+ mkPrec p (usePrec x p ++ f ++ usePrec y (nextPrec p)) ;
+\end{verbatim}
+(The code in Appendix D adds to this an associativity parameter,
+which is redundant in GF, but which we use to instruct the Happy
+parser generator.)
+
+
+\subsection{Expressions}
+
+With the machinery introduced, the linearization rules of expressions
+are simple and concise:
+\begin{verbatim}
+ EVar _ x = constant x.s ;
+ EInt n = constant n.s ;
+ EFloat a b = constant (a.s ++ "." ++ b.s) ;
+ EMul _ = infixL 3 "*" ;
+ EAdd _ = infixL 2 "+" ;
+ ESub _ = infixL 2 "-" ;
+ ELt _ = infixN 1 "<" ;
+
+ EAppNil val f = constant (f.s ++ paren []) ;
+ EApp args val f exps = constant (f.s ++ paren exps.s) ;
+\end{verbatim}
+
+
+\subsection{Types}
+
+Types are expressed in two different ways:
+in declarations, we have \texttt{int} and \texttt{float}, but
+as formatting arguments to \texttt{printf}, we have
+\verb6"%d"6 and \verb6"%f"6, with the quotes belonging to the
+names. The simplest solution in GF is to linearize types
+to records with two string fields.
+\begin{verbatim}
+ lincat
+ Typ, NumTyp = {s,s2 : Str} ;
+ lin
+ TInt = {s = "int" ; s2 = "\"%d\""} ;
+ TFloat = {s = "float" ; s2 = "\"%f\""} ;
+\end{verbatim}
+
+
+\subsection{Statements}
+
+Statements in C have
+the simplest linearization type, \verb6{s : Str}6.
+We use a handful of auxiliary operations to regulate
+the use of semicolons on a high level.
+\begin{verbatim}
+ oper
+ continues : Str -> SS -> SS = \s,t -> ss (s ++ ";" ++ t.s) ;
+ continue : Str -> SS -> SS = \s,t -> ss (s ++ t.s) ;
+ statement : Str -> SS = \s -> ss (s ++ ";");
+\end{verbatim}
+As for declarations, which bind variables, we notice the
+projection \verb6.$06 to refer to the bound variable.
+Also notice the use of the \texttt{s2} field of the type
+in \texttt{printf}.
+\begin{verbatim}
+ lin
+ Decl typ cont = continues (typ.s ++ cont.$0) cont ;
+ Assign _ x exp = continues (x.s ++ "=" ++ exp.s) ;
+ While exp loop = continue ("while" ++ paren exp.s ++ loop.s) ;
+ IfElse exp t f = continue ("if" ++ paren exp.s ++ t.s ++ "else" ++ f.s) ;
+ Block stm = continue ("{" ++ stm.s ++ "}") ;
+ Printf t e = continues ("printf" ++ paren (t.s2 ++ "," ++ e.s)) ;
+ Return _ exp = statement ("return" ++ exp.s) ;
+ Returnv = statement "return" ;
+ End = ss [] ;
+\end{verbatim}
+
+
+\subsection{Functions and programs}
+
+The category \texttt{Rec} of recursive function bodies with continuations
+has three components: the function body itself, the parameter list, and
+the program that follows. We express this by a linearization type that
+contains three strings:
+\begin{verbatim}
+ lincat Rec = {s,s2,s3 : Str} ;
+\end{verbatim}
+The body construction rules accumulate the parameter list
+independently of the two other components:
+\begin{verbatim}
+ lin
+ RecOne typ stm prg = stm ** {
+ s2 = typ.s ++ stm.$0 ;
+ s3 = prg.s
+ } ;
+ RecCons typ _ body prg = {
+ s = body.s ;
+ s2 = typ.s ++ body.$0 ++ "," ++ body.s2 ;
+ s3 = prg.s
+ } ;
+\end{verbatim}
+The top-level program construction rules rearrange the three
+components into a linear structure:
+\begin{verbatim}
+ FunctNil val stm cont = ss (
+ val.s ++ cont.$0 ++ paren [] ++ "{" ++
+ stm.s ++ "}" ++ ";" ++ cont.s) ;
+ Funct args val rec = ss (
+ val.s ++ rec.$0 ++ paren rec.s2 ++ "{" ++
+ rec.s ++ "}" ++ ";" ++ rec.s3) ;
+\end{verbatim}
+
+
+%%\subsection{Lexing and unlexing}
+
+
+
+\section{The concrete syntax of JVM}
+
+JVM syntax is, linguistically, more straightforward than
+the syntax of C, and could even be defined by a regular
+expression. However, the JVM syntax that our compiler
+generates does not comprise full JVM, but only the fragment
+that corresponds to well-formed C programs.
+
+The JVM syntax we use is a symbolic variant of the Jasmin assembler
+\cite{jasmin}.
+The main deviation from Jasmin are
+variable addresses, as described in Section~\ref{postproc}.
+The other deviations have to do with spacing: the normal
+unlexer of GF puts spaces between constituents, whereas
+in JVM, type names are integral parts of instruction names.
+We indicate gluing uniformly by generating an underscore
+on the side from which the adjacent element is glued. Thus
+e.g.\ \verb6i _load6 becomes \verb6iload6.
+
+
+\subsection{Symbolic JVM}
+\label{postproc}
+
+What makes the translation from our abstract syntax to JVM
+tricky is that variables must be replaced by
+numeric addresses (relative to the frame pointer).
+Code generation must therefore maintain a symbol table that permits
+the lookup of variable addresses. As shown in the code
+in Appendix C, we do not treat symbol tables
+in linearization, but instead generated code in
+\empha{Symbolic JVM}---that is, JVM with symbolic addresses.
+Therefore we need a postprocessor that resolves the symbolic addresses,
+shown in Appendix E.
+
+To make the postprocessor straightforward,
+Symbolic JVM has special \texttt{alloc} instructions,
+which are not present in real JVM.
+Our compiler generates \texttt{alloc} instructions from
+variable declarations.
+The postprocessor comments out the \texttt{alloc} instructions, but we
+found it a good idea not to erase them completely, since they make the
+code more readable.
+
+The following example shows how the three representations (C, Symbolic JVM, JVM)
+look like for a piece of code.
+\begin{verbatim}
+ int x ; alloc i x ; x gets address 0
+ int y ; alloc i y ; y gets address 1
+ x = 5 ; ldc 5 ldc 5
+ i _store x istore 0
+ y = x ; i _load x iload 0
+ i _store y istore 1
+\end{verbatim}
+
+
+\subsection{Labels and jumps}
+
+A problem related to variable addresses
+is the generation of fresh labels for
+jumps. We solve this in linearization
+by maintaining a growing label suffix
+as a field of the linearization of statements into
+instructions. The problem remains that statements on the
+same nesting level, e.g.\ the two branches
+of an \texttt{if-else} statement can use the same
+labels. Making them unique must be
+added to the post-processing pass. This is
+always possible, because labels are nested in a
+disciplined way, and jumps can never go to remote labels.
+
+As it turned out laborious to thread the label counter
+to expressions, we decided to compile comparison
+expressions (\verb6x < y6) into function calls, and provide the functions in
+a run-time library. This will no more work for the
+conjunction (\verb6x && y6)
+and disjunction (\verb6x || y6), if we want to keep their semantics
+lazy, since function calls are strict in their arguments.
+
+
+
+
+
+\subsection{How to restore code generation by linearization}
+
+Since postprocessing is needed, we have not quite achieved
+the goal of code generation as linearization---if
+linearization is understood in the
+sense of GF. In GF, linearization can only depend
+on parameters from finite parameter sets. Since the size of
+a symbol table can grow indefinitely, it is not
+possible to encode linearization with updates to and
+lookups from a symbol table, as is usual in code generation.
+
+One attempt we made to achieve JVM linearization with
+numeric addresses was to alpha-convert abstract syntax syntax trees
+so that variables get indexed with integers that indicate their
+depths in the tree. This hack works in the present fragment of C
+because all variables need the same amount of memory (one word),
+but would break down if we added double-precision floats. Therefore
+we have used the less pure (from the point of view of
+code generation as linearization) method of
+symbolic addresses.
+
+It would certainly be possible to generate variable addresses
+directly in the syntax trees by using dependent types; but this
+would clutter the abstract
+syntax in a way that is hard to motivate when we are in
+the business of describing the syntax of C. The abstract syntax would
+have to, so to say, anticipate all demands of the compiler's
+target languages.
+
+
+\subsection{Problems with the JVM bytecode verifier}
+
+An inherent restriction for linearization in GF is compositionality.
+This prevents optimizations during linearization
+by clever instruction selection, elimination of superfluous
+labels and jumps, etc. One such optimization, the removal
+of unreachable code (i.e.\ code after a \texttt{return} instruction)
+is actually required by the JVM byte code verifier.
+The solution is, again, to perform this optimization in postprocessing.
+What we currently do, however, is to be careful and write
+C programs so that they always end with a return statement in the
+outermost block.
+
+Another problem related to \texttt{return} instructions is that
+both C and JVM programs have a designated \texttt{main} function.
+This function must have a certain type, which is different in C and
+JVM. In C, \texttt{main} returns an integer encoding what
+errors may have happend during execution. The JVM
+\texttt{main}, on the other hand, returns a \texttt{void}, i.e.\
+no value at all. A \texttt{main} program returning an
+integer therefore provokes a JVM bytecode verifier error.
+The postprocessor could take care of this; but currently
+we just write programs with void \texttt{return}s in the
+\texttt{main} functions.
+
+The parameter list of \texttt{main} is also different in C (empty list)
+and JVM (a string array \texttt{args}). We handle this problem
+with an \empha{ad hoc} postprocessor rule.
+
+Every function prelude in JVM must indicate the maximum space for
+local variables, and the maximum evaluation stack space (within
+the function's own stack frame). The locals limit is computed in
+linearization by maintaining a counter field. The stack limit
+is blindly set to 1000; it would be possible to set an
+accurate limit in the postprocessing phase.
+
+
+\section{Translation as linearization vs.\ transfer}
+
+Many of the problems we have encountered in code generation by
+linearization are familiar from
+translation systems for natural languages. For instance, to translate
+the English pronoun \eex{you} to German, you have to choose
+between \eex{du, ihr, Sie}; for Italian, there are four
+variants, and so on. To deal with this by linearization,
+all semantic distinctions made in any of the involved languages
+have to be present in the common abstract syntax. The usual solution to
+this problem is not a universal abstract syntax, but
+\empha{transfer}: translation does not just linearize
+the same syntax trees to another language, but uses
+a noncompositional function that translates
+trees of one language into trees of another.
+
+Using transfer in the
+back end is precisely what traditional compilers do.
+The transfer function in our case would be a noncompositional
+function from the abstract syntax of C to a different abstract
+syntax of JVM. The abstract syntax notation of GF permits
+definitions of functions, and the GF interpreter can be used
+for evaluating terms into normal form. Thus one could write
+the code generator just like in any functional language:
+by sending in an environment and a syntax tree, and
+returning a new environment with an instruction list:
+\begin{verbatim}
+ fun
+ transStm : Env -> Stm -> EnvInstr ;
+ def
+ transStm env (Decl typ cont) = ...
+ transStm env (While (ELt a b) stm cont) = ...
+ transStm env (While exp stm cont) = ...
+\end{verbatim}
+This would be cumbersome in practice, because
+GF does not have programming-language facilities
+like built-in lists and tuples, or monads. Moreover,
+the compiler could no longer be inverted into a decompiler,
+in the way true linearization can be inverted into a parser.
+
+
+
+\section{Parser generation}
+\label{bnfc}
+
+The whole GF part of the compiler (parser, type checker, Symbolic JVM
+generator) can be run in the GF interpreter.
+The weakest point of the resulting compiler, by current
+standards, is the parser. GF is a powerful grammar formalism, which
+needs a very general parser, taking care of ambiguities and other
+problems that are typical of natural languages but should be
+overcome in programming languages by design. The parser is moreover run
+in an interpreter that takes the grammar (in a suitably compiled form)
+as an argument.
+
+Fortunately, it is easy to replace the generic, interpreting GF parser
+by a compiled LR(1) parser. GF supports the translation of a concrete
+syntax into the \empha{Labelled BNF} (LBNF) format, %% \cite{lbnf},
+which in turn can be translated to parser generator code
+(Happy, Bison, or JavaCUP), by the BNF Converter \cite{bnfc}.
+The parser we are therefore using in the compiler is a Haskell
+program generated by Happy \cite{happy}.
+
+We regard parser generation
+as a first step towards developing GF into a
+production-quality compiler compiler. The efficiency of the parser
+is not the only relevant thing. Another advantage of an LR(1)
+parser generator is that it performs an analysis on the grammar
+finding conflicts, and provides a debugger. It may be
+difficult for a human to predict how a context-free grammar
+performs at parsing; it is much more difficult to do this for
+a grammar written in the abstract way that GF permits (cf.\ the
+example in Appendix B).
+
+The current version of the C grammar is ambiguous. GF's own
+parser returns all alternatives, whereas the parser generated by
+Happy rules out some of them by its normal conflict handling
+policy. This means, in practice, that extra brackets are
+sometimes needed to group staments together.
+
+
+\subsection{Another notation for \HOAS}
+
+Describing variable bindings with \HOAS\ is sometimes considered
+unintuitive. Let us consider the declaration rule of C (without
+type dependencies for simplicity):
+\begin{verbatim}
+ fun Decl : Typ -> (Var -> Stm) -> Stm ;
+ lin Decl typ stm = {s = typ.s ++ stm.$0 ++ ";" ++stm.s} ;
+\end{verbatim}
+Compare this with a corresponding LBNF rule (also using a continuation):
+\begin{verbatim}
+ Decl. Stm ::= Typ Ident ";" Stm ;
+\end{verbatim}
+To explain bindings attached to this rule, one can say, in natural language,
+that the identifier gets bound in the statement that follows.
+This means that syntax trees formed by this rule do not have
+the form \verb6(Decl typ x stm)6, but the form \verb6(Decl typ (\x -> stm))6.
+
+One way to formalize the informal binding rules stated beside
+BNF rules is to use \empha{profiles}: data structures describing
+the way in which the logical arguments of the syntax tree are
+represented by the linearized form. The declaration rule can be
+written using a profile notation as follows:
+\begin{verbatim}
+ Decl [1,(2)3]. Stm ::= Typ Ident ";" Stm ;
+\end{verbatim}
+When compiling GF grammars into LBNF, we were forced to enrich
+LBNF by a (more general) profile notation
+(cf.\ \cite{gf-jfp}, Section 3.3). This suggested at the same
+time that profiles could provide a user-fiendly notation for
+\HOAS\ avoiding the explicit use of lambda calculus.
+
+
+
+\section{Using the compiler}
+
+Our compiler is invoked, of course, by the command \texttt{gfcc}.
+It produces a JVM \texttt{.class} file, by running the
+Jasmin bytecode assembler \cite{jasmin} on a Jasmin (\texttt{.j})
+file:
+\begin{verbatim}
+ % gfcc factorial.c
+ > > wrote file factorial.j
+ Generated: factorial.class
+\end{verbatim}
+The Jasmin code is produced by a postprocessor, written in Haskell
+(Appendix E), from the Symbolic JVM format that is produced by
+linearization. The reasons why actual Jasmin is not generated
+by linearization are explained in Section~\ref{postproc} above.
+
+In addition to the batch compiler, GF provides an interactive
+syntax editor, in which C programs can be constructed by
+stepwise refinements, local changes, etc.\ \cite{khegai}. The user of the
+editor can work simultaneously on all languages involved.
+In our case, this means that changes can be done both to
+the C code and to the JVM code, and they are automatically
+carried over from one language to the other.
+\commentOut{
+A screen dump of the editor is shown in Fig~\ref{demo}.
+
+\begin{figure}
+\centerline{\psfig{figure=demo2.epsi}} \caption{
+GF editor session where an integer
+expression is expected to be given. The left window shows the
+abstract syntax tree, and the right window the evolving C and
+JVM code. The editor focus is shadowed, and the refinement alternatives
+are shown in a pop-up window.
+}
+\label{demo}
+\end{figure}
+}
+
+
+\section{Related work}
+
+The theoretical ideas behind our compiler experiment
+are familiar from various sources.
+Single-source language and compiler definitions
+can be built using attribute grammars \cite{knuth-attr}.
+The use of
+dependent types in combination with higher-order abstract syntax
+has been studied in various logical frameworks
+\cite{harper-honsell,magnusson-nordstr,twelf}.
+The addition of linearization rules to type-theoretical
+abstract syntax is studied in \cite{semBNF}, which also
+compares the method with attribute grammars.
+
+The idea of using a common abstract syntax for different
+languages was clearly exposed by Landin \cite{landin}. The view of
+code generation as linearization is a central aspect of
+the classic compiler textbook by Aho, Sethi, and Ullman
+\cite{aho-ullman}.
+The use of one and the same grammar both for parsing and linearization
+is a guiding principle of unification-based linguistic grammar
+formalisms \cite{pereira-shieber}. Interactive editors derived from
+grammars have been developed in various programming and proof
+assistants \cite{teitelbaum,metal,magnusson-nordstr}.
+
+Even though the different ideas are well-known,
+we have not seen them used together to construct a complete
+compiler. In our view, putting these ideas together is
+an attractive approach to compiling, since a compiler written
+in this way is completely declarative, hence concise,
+and therefore easy to modify and extend. What is more, if
+a new language construct is added, the GF type checker
+verifies that the addition is propagated to all components
+of the compiler. As the implementation is declarative,
+it is also self-documenting, since a human-readable
+grammar defines the syntax and static
+semantics that is actually used in the implementation.
+
+
+\section{Conclusion}
+
+The \texttt{gfcc} compiler translates a representative
+fragment of C to JVM, and growing the fragment
+does not necessarily pose any new kinds of problems.
+Using \HOAS\ and dependent types to describe the abstract
+syntax of C works fine, and defining the concrete syntax
+of C on top of this using GF linearization machinery is
+possible. To build a parser that is more efficient than
+GF's generic one, GF offers code generation for standard
+parser tools.
+
+One result of the experiment is the beginning of a
+library for dealing with typical programming language structures
+such as precedence. This library is exploited in the parser
+generator, which maps certain parameters used into GF grammars
+into precedence directives in labelled BNF grammars.
+
+The most serious difficulty with JVM code generation by linearization
+is to maintain a symbol table mapping variables to addresses.
+The solution we have chosen is to generate Symbolic JVM, that is,
+JVM with symbolic addresses, and translate the symbolic addresses to
+(relative) memory locations by a postprocessor.
+
+Since the postprocessor works uniformly for the whole Symbolic JVM,
+building a new compiler to generate JVM should now be
+possible by just writing GF grammars. The most immediate
+idea for developing GF as a compiler tool is to define
+a similar symbolic format for an intermediate language,
+which uses three-operand code and virtual registers.
+
+
+
+\bibliographystyle{plain}
+
+\bibliography{gf-bib}
+
+
+\newpage
+\subsection*{Appendix A: The abstract syntax}
+
+\small
+\begin{verbatim}
+abstract Imper = PredefAbs ** {
+ cat
+ Program ;
+ Rec ListTyp ;
+ Typ ;
+ NumTyp ;
+ ListTyp ;
+ Fun ListTyp Typ ;
+ Body ListTyp ;
+ Stm ;
+ Exp Typ ;
+ Var Typ ;
+ ListExp ListTyp ;
+
+ fun
+ Empty : Program ;
+ Funct : (AS : ListTyp) -> (V : Typ) -> (Fun AS V -> Rec AS) -> Program ;
+ FunctNil : (V : Typ) -> Stm -> (Fun NilTyp V -> Program) -> Program ;
+ RecOne : (A : Typ) -> (Var A -> Stm) -> Program -> Rec (ConsTyp A NilTyp) ;
+ RecCons : (A : Typ) -> (AS : ListTyp) ->
+ (Var A -> Rec AS) -> Program -> Rec (ConsTyp A AS) ;
+
+ Decl : (A : Typ) -> (Var A -> Stm) -> Stm ;
+ Assign : (A : Typ) -> Var A -> Exp A -> Stm -> Stm ;
+ While : Exp (TNum TInt) -> Stm -> Stm -> Stm ;
+ IfElse : Exp (TNum TInt) -> Stm -> Stm -> Stm -> Stm ;
+ Block : Stm -> Stm -> Stm ;
+ Printf : (A : Typ) -> Exp A -> Stm -> Stm ;
+ Return : (A : Typ) -> Exp A -> Stm ;
+ Returnv : Stm ;
+ End : Stm ;
+
+ EVar : (A : Typ) -> Var A -> Exp A ;
+ EInt : Int -> Exp (TNum TInt) ;
+ EFloat : Int -> Int -> Exp (TNum TFloat) ;
+ ELt : (n : NumTyp) -> let Ex = Exp (TNum n) in Ex -> Ex -> Exp (TNum TInt) ;
+ EAdd, EMul, ESub : (n : NumTyp) -> let Ex = Exp (TNum n) in Ex -> Ex -> Ex ;
+ EAppNil : (V : Typ) -> Fun NilTyp V -> Exp V ;
+ EApp : (AS : ListTyp) -> (V : Typ) -> Fun AS V -> ListExp AS -> Exp V ;
+
+ TNum : NumTyp -> Typ ;
+ TInt, TFloat : NumTyp ;
+ NilTyp : ListTyp ;
+ ConsTyp : Typ -> ListTyp -> ListTyp ;
+ OneExp : (A : Typ) -> Exp A -> ListExp (ConsTyp A NilTyp) ;
+ ConsExp : (A : Typ) -> (AS : ListTyp) ->
+ Exp A -> ListExp AS -> ListExp (ConsTyp A AS) ;
+}
+\end{verbatim}
+\normalsize
+\newpage
+
+
+\subsection*{Appendix B: The concrete syntax of C}
+
+\small
+\begin{verbatim}
+concrete ImperC of Imper = open ResImper in {
+ flags lexer=codevars ; unlexer=code ; startcat=Program ;
+
+ lincat
+ Exp = PrecExp ;
+ Typ, NumTyp = {s,s2 : Str} ;
+ Rec = {s,s2,s3 : Str} ;
+ lin
+ Empty = ss [] ;
+ FunctNil val stm cont = ss (
+ val.s ++ cont.$0 ++ paren [] ++ "{" ++ stm.s ++ "}" ++ ";" ++ cont.s) ;
+ Funct args val rec = ss (
+ val.s ++ rec.$0 ++ paren rec.s2 ++ "{" ++ rec.s ++ "}" ++ ";" ++ rec.s3) ;
+ RecOne typ stm prg = stm ** {
+ s2 = typ.s ++ stm.$0 ;
+ s3 = prg.s
+ } ;
+ RecCons typ _ body prg = {
+ s = body.s ;
+ s2 = typ.s ++ body.$0 ++ "," ++ body.s2 ;
+ s3 = prg.s
+ } ;
+
+ Decl typ cont = continues (typ.s ++ cont.$0) cont ;
+ Assign _ x exp = continues (x.s ++ "=" ++ exp.s) ;
+ While exp loop = continue ("while" ++ paren exp.s ++ loop.s) ;
+ IfElse exp t f = continue ("if" ++ paren exp.s ++ t.s ++ "else" ++ f.s) ;
+ Block stm = continue ("{" ++ stm.s ++ "}") ;
+ Printf t e = continues ("printf" ++ paren (t.s2 ++ "," ++ e.s)) ;
+ Return _ exp = statement ("return" ++ exp.s) ;
+ Returnv = statement "return" ;
+ End = ss [] ;
+
+ EVar _ x = constant x.s ;
+ EInt n = constant n.s ;
+ EFloat a b = constant (a.s ++ "." ++ b.s) ;
+ EMul _ = infixL 3 "*" ;
+ EAdd _ = infixL 2 "+" ;
+ ESub _ = infixL 2 "-" ;
+ ELt _ = infixN 1 "<" ;
+ EAppNil val f = constant (f.s ++ paren []) ;
+ EApp args val f exps = constant (f.s ++ paren exps.s) ;
+
+ TNum t = t ;
+ TInt = {s = "int" ; s2 = "\"%d\""} ; TFloat = {s = "float" ; s2 = "\"%f\""} ;
+ NilTyp = ss [] ; ConsTyp = cc2 ;
+ OneExp _ e = e ; ConsExp _ _ e es = ss (e.s ++ "," ++ es.s) ;
+}
+\end{verbatim}
+\normalsize
+\newpage
+
+
+\subsection*{Appendix C: The concrete syntax of JVM}
+
+\small
+\begin{verbatim}
+concrete ImperJVM of Imper = open ResImper in {
+ flags lexer=codevars ; unlexer=code ; startcat=Program ;
+
+ lincat
+ Rec = {s,s2,s3 : Str} ; -- code, storage for locals, continuation
+ Stm = Instr ;
+
+ lin
+ Empty = ss [] ;
+ FunctNil val stm cont = ss (
+ ".method" ++ "public" ++ "static" ++ cont.$0 ++ paren [] ++ val.s ++ ";" ++
+ ".limit" ++ "locals" ++ stm.s2 ++ ";" ++
+ ".limit" ++ "stack" ++ "1000" ++ ";" ++
+ stm.s ++
+ ".end" ++ "method" ++ ";" ++ ";" ++
+ cont.s
+ ) ;
+ Funct args val rec = ss (
+ ".method" ++ "public" ++ "static" ++ rec.$0 ++ paren args.s ++ val.s ++ ";" ++
+ ".limit" ++ "locals" ++ rec.s2 ++ ";" ++
+ ".limit" ++ "stack" ++ "1000" ++ ";" ++
+ rec.s ++
+ ".end" ++ "method" ++ ";" ++ ";" ++
+ rec.s3
+ ) ;
+
+ RecOne typ stm prg = instrb typ.s (
+ ["alloc"] ++ typ.s ++ stm.$0 ++ stm.s2) {s = stm.s ; s2 = stm.s2 ; s3 = prg.s};
+
+ RecCons typ _ body prg = instrb typ.s (
+ ["alloc"] ++ typ.s ++ body.$0 ++ body.s2)
+ {s = body.s ; s2 = body.s2 ; s3 = prg.s};
+
+ Decl typ cont = instrb typ.s (
+ ["alloc"] ++ typ.s ++ cont.$0
+ ) cont ;
+ Assign t x exp = instrc (exp.s ++ t.s ++ "_store" ++ x.s) ;
+ While exp loop =
+ let
+ test = "TEST_" ++ loop.s2 ;
+ end = "END_" ++ loop.s2
+ in instrl (
+ "label" ++ test ++ ";" ++
+ exp.s ++
+ "ifeq" ++ end ++ ";" ++
+ loop.s ++
+ "goto" ++ test ++ ";" ++
+ "label" ++ end
+ ) ;
+ IfElse exp t f =
+ let
+ false = "FALSE_" ++ t.s2 ++ f.s2 ;
+ true = "TRUE_" ++ t.s2 ++ f.s2
+ in instrl (
+ exp.s ++
+ "ifeq" ++ false ++ ";" ++
+ t.s ++
+ "goto" ++ true ++ ";" ++
+ "label" ++ false ++ ";" ++
+ f.s ++
+ "label" ++ true
+ ) ;
+ Block stm = instrc stm.s ;
+ Printf t e = instrc (e.s ++ "invokestatic" ++ t.s ++ "runtime/printf" ++ paren (t.s) ++ "v") ;
+ Return t exp = instr (exp.s ++ t.s ++ "_return") ;
+ Returnv = instr "return" ;
+ End = ss [] ** {s2,s3 = []} ;
+
+ EVar t x = instr (t.s ++ "_load" ++ x.s) ;
+ EInt n = instr ("ldc" ++ n.s) ;
+ EFloat a b = instr ("ldc" ++ a.s ++ "." ++ b.s) ;
+ EAdd = binopt "_add" ;
+ ESub = binopt "_sub" ;
+ EMul = binopt "_mul" ;
+ ELt t = binop ("invokestatic" ++ t.s ++ "runtime/lt" ++ paren (t.s ++ t.s) ++ "i") ;
+ EAppNil val f = instr (
+ "invokestatic" ++ f.s ++ paren [] ++ val.s
+ ) ;
+ EApp args val f exps = instr (
+ exps.s ++
+ "invokestatic" ++ f.s ++ paren args.s ++ val.s
+ ) ;
+
+ TNum t = t ;
+ TInt = ss "i" ;
+ TFloat = ss "f" ;
+ NilTyp = ss [] ;
+ ConsTyp = cc2 ;
+ OneExp _ e = e ;
+ ConsExp _ _ = cc2 ;
+}
+\end{verbatim}
+\normalsize
+\newpage
+
+\subsection*{Appendix D: Auxiliary operations for concrete syntax}
+
+\small
+\begin{verbatim}
+resource ResImper = open Predef in {
+
+ -- precedence
+
+ param PAssoc = PN | PL | PR ;
+
+ oper
+ Prec : PType = Predef.Ints 4 ;
+ PrecExp : Type = {s : Str ; p : Prec ; a : PAssoc} ;
+
+ mkPrec : Prec -> PAssoc -> Str -> PrecExp = \p,a,f ->
+ {s = f ; p = p ; a = a} ;
+
+ usePrec : PrecExp -> Prec -> Str = \x,p ->
+ case <<x.p,p> : Prec * Prec> of {
+ <3,4> | <2,3> | <2,4> => paren x.s ;
+ <1,1> | <1,0> | <0,0> => x.s ;
+ <1,_> | <0,_> => paren x.s ;
+ _ => x.s
+ } ;
+
+ constant : Str -> PrecExp = mkPrec 4 PN ;
+
+ infixN : Prec -> Str -> (_,_ : PrecExp) -> PrecExp = \p,f,x,y ->
+ mkPrec p PN (usePrec x (nextPrec p) ++ f ++ usePrec y (nextPrec p)) ;
+ infixL : Prec -> Str -> (_,_ : PrecExp) -> PrecExp = \p,f,x,y ->
+ mkPrec p PL (usePrec x p ++ f ++ usePrec y (nextPrec p)) ;
+ infixR : Prec -> Str -> (_,_ : PrecExp) -> PrecExp = \p,f,x,y ->
+ mkPrec p PR (usePrec x (nextPrec p) ++ f ++ usePrec y p) ;
+
+ nextPrec : Prec -> Prec = \p -> case <p : Prec> of {
+ 4 => 4 ;
+ n => Predef.plus n 1
+ } ;
+
+ -- string operations
+
+ SS : Type = {s : Str} ;
+ ss : Str -> SS = \s -> {s = s} ;
+ cc2 : (_,_ : SS) -> SS = \x,y -> ss (x.s ++ y.s) ;
+ paren : Str -> Str = \str -> "(" ++ str ++ ")" ;
+
+ continues : Str -> SS -> SS = \s,t -> ss (s ++ ";" ++ t.s) ;
+ continue : Str -> SS -> SS = \s,t -> ss (s ++ t.s) ;
+ statement : Str -> SS = \s -> ss (s ++ ";");
+
+ -- operations for JVM
+
+ Instr : Type = {s,s2,s3 : Str} ; -- code, variables, labels
+ instr : Str -> Instr = \s ->
+ statement s ** {s2,s3 = []} ;
+ instrc : Str -> Instr -> Instr = \s,i ->
+ ss (s ++ ";" ++ i.s) ** {s2 = i.s2 ; s3 = i.s3} ;
+ instrl : Str -> Instr -> Instr = \s,i ->
+ ss (s ++ ";" ++ i.s) ** {s2 = i.s2 ; s3 = "L" ++ i.s3} ;
+ instrb : Str -> Str -> Instr -> Instr = \v,s,i ->
+ ss (s ++ ";" ++ i.s) ** {s2 = v ++ i.s2 ; s3 = i.s3} ;
+ binop : Str -> SS -> SS -> SS = \op, x, y ->
+ ss (x.s ++ y.s ++ op ++ ";") ;
+ binopt : Str -> SS -> SS -> SS -> SS = \op, t ->
+ binop (t.s ++ op) ;
+}
+\end{verbatim}
+\normalsize
+\newpage
+
+
+\subsection*{Appendix E: Translation of Symbolic JVM to Jasmin}
+
+\small
+\begin{verbatim}
+module Main where
+import Char
+import System
+
+main :: IO ()
+main = do
+ jvm:src:_ <- getArgs
+ s <- readFile jvm
+ let cls = takeWhile (/='.') src
+ let obj = cls ++ ".j"
+ writeFile obj $ boilerplate cls
+ appendFile obj $ mkJVM cls s
+ putStrLn $ "wrote file " ++ obj
+
+mkJVM :: String -> String -> String
+mkJVM cls = unlines . reverse . fst . foldl trans ([],([],0)) . lines where
+ trans (code,(env,v)) s = case words s of
+ ".method":p:s:f:ns
+ | f == "main" -> (".method public static main([Ljava/lang/String;)V":code,([],1))
+ | otherwise -> (unwords [".method",p,s, f ++ typesig ns] : code,([],0))
+ "alloc":t:x:_ -> (("; " ++ s):code, ((x,v):env, v + size t))
+ ".limit":"locals":ns -> chCode (".limit locals " ++ show (length ns))
+ "invokestatic":t:f:ns | take 8 f == "runtime/" ->
+ chCode $ "invokestatic " ++ "runtime/" ++ t ++ drop 8 f ++ typesig ns
+ "invokestatic":f:ns -> chCode $ "invokestatic " ++ cls ++ "/" ++ f ++ typesig ns
+ "alloc":ns -> chCode $ "; " ++ s
+ t:('_':instr):[";"] -> chCode $ t ++ instr
+ t:('_':instr):x:_ -> chCode $ t ++ instr ++ " " ++ look x
+ "goto":ns -> chCode $ "goto " ++ label ns
+ "ifeq":ns -> chCode $ "ifeq " ++ label ns
+ "label":ns -> chCode $ label ns ++ ":"
+ ";":[] -> chCode ""
+ _ -> chCode s
+ where
+ chCode c = (c:code,(env,v))
+ look x = maybe (error $ x ++ show env) show $ lookup x env
+ typesig = init . map toUpper . concat
+ label = init . concat
+ size t = case t of
+ "d" -> 2
+ _ -> 1
+
+boilerplate :: String -> String
+boilerplate cls = unlines [
+ ".class public " ++ cls, ".super java/lang/Object",
+ ".method public <init>()V","aload_0",
+ "invokenonvirtual java/lang/Object/<init>()V","return",
+ ".end method"]
+\end{verbatim}
+\normalsize
+\newpage
+
+
+
+
+
+
+\end{document}
+
+\begin{verbatim}
+\end{verbatim}
+