Author: Li, Junrui
Project repo and intro: https://github.com/msrtea7/project_hyggec_full
Given a typechecked AST node e
, the partial evaluation process attempts to:
e
into e′
using the interpreter’s reduction functione′
e
cannot be reduced:
e
is a simple value, return e
e′
To ensure correct semantics, the optimization will use a specialized runtime environment where:
I implement the partial evaluation optimization in a separate file PartialEval.fs
, leveraging the existing reduce()
function from interpreter.fs
.
To meet the requirement that the user could choose whether to use this optimization or not, I set an optimization level 2 which could be enabled with argument -O 2
.
It is notable here that, in my current mechanism, the optimization level 2 would not trigger the level 1 which is the predefined peephole optimization.
The core implementation is in the tryConstantFold()
function:
and tryConstantFold (node: AST.Node<'E,'T>) : Option<AST.Node<'E,'T>> =
if Interpreter.isValue node then None
else
match Interpreter.reduce env node with
| Some(_, reduced) ->
match tryConstantFold reduced with
| Some furtherReduced -> Some furtherReduced
| None -> Some reduced
| None -> None
This function recursively applies the interpreter’s reduce
method, which already handles arithmetic operations like addition and multiplication.
The optimization also handles specific expression patterns through direct pattern matching on the Expr
field:
match (lhs'.Expr, rhs'.Expr) with
| (IntVal(v1), IntVal(v2)) -> {node with Expr = IntVal(v1 + v2)}
| (FloatVal(v1), FloatVal(v2)) -> {node with Expr = FloatVal(v1 + v2)}
| _ -> {node with Expr = Add(lhs', rhs')}
For constant propagation, the optimizer maintains a mapping of variable names to their constant values using the ConstEnv
type:
type ConstEnv<'E,'T> = Map<string, Node<'E,'T>>
This map is passed through recursive calls to optimizeNode()
and updated when processing variable bindings. When evaluating variable references, the optimizer looks up the name in the environment:
| Var(name) when constEnv.ContainsKey(name) ->
constEnv.[name]
For let
bindings, the optimizer updates the environment when a variable is bound to a constant:
| Let(name, init, body) ->
let init' = optimizeNode constEnv init
let bodyEnv =
if isConstant init' then
constEnv.Add(name, init')
else
constEnv
let body' = optimizeNode bodyEnv body
The isConstant
helper function identifies AST nodes that represent constant values:
let isConstant (node: Node<'E,'T>) : bool =
match node.Expr with
| UnitVal | BoolVal(_) | IntVal(_) | FloatVal(_) | StringVal(_) -> true
| _ -> false
The optimizer examines the condition after optimization and, if it’s a constant, eliminates the unreachable branch:
| If(cond, ifTrue, ifFalse) ->
let cond' = optimizeNode constEnv cond
match cond'.Expr with
| BoolVal(true) -> optimizeNode constEnv ifTrue
| BoolVal(false) -> optimizeNode constEnv ifFalse
| _ ->
let ifTrue' = optimizeNode constEnv ifTrue
let ifFalse' = optimizeNode constEnv ifFalse
{node with Expr = If(cond', ifTrue', ifFalse')}
The optimizer also simplifies sequence expressions (Seq
) by removing non-terminal effects that have no impact on the final result:
| Seq(nodes) ->
let nodes' = List.map (optimizeNode constEnv) nodes
match nodes' with
| [] -> {node with Expr = UnitVal}
| [last] when isConstant last -> last
| _ -> {node with Expr = Seq(nodes')}
Function inlining is implemented using pattern matching to detect these cases:
| Application(expr, args) ->
let expr' = optimizeNode constEnv expr
let args' = List.map (optimizeNode constEnv) args
match expr'.Expr with
| Lambda(lamArgs, body) when List.forall isConstant args' &&
args'.Length = lamArgs.Length ->
let (lamArgNames, _) = List.unzip lamArgs
let lamArgNamesValues = List.zip lamArgNames args'
let folder acc (var, sub) = (ASTUtil.subst acc var sub)
let inlinedBody = List.fold folder body lamArgNamesValues
optimizeNode constEnv inlinedBody
| _ ->
{node with Expr = Application(expr', args')}
The inlining process extracts parameter names from lamArgs
, pairs them with argument values in lamArgNamesValues
, and uses a folder()
function with ASTUtil.subst
to perform the substitution. The List.fold
operation sequentially applies these substitutions, building the inlined function body.
To prevent reducing expressions with side effects (such as I/O operations), the implementation creates a special runtime environment where I/O operations cannot be performed:
let env = {
...
Interpreter.Reader = None
Interpreter.Printer = None
...
}
By setting Reader
and Printer
to None
, expressions like print("Hello")
will not be reduced and will be preserved in the optimized code.
The optimizer must handle AST nodes containing Pointer
expressions, which are used by the interpreter but cannot be directly compiled. The containsPointer
function recursively checks for pointer references:
let rec containsPointer (node: Node<'E,'T>) : bool =
match node.Expr with
| Pointer(_) -> true
| Add(lhs, rhs) -> containsPointer lhs || containsPointer rhs
// Other expression types...
| _ -> false
For expressions that might create pointers, the optimizer tracks all reduction steps in a history list:
let history = reduceSteps node [node]
The optimizer then selects the most recent reduction that doesn’t contain pointers:
match history |> List.tryFind (fun n -> not (containsPointer n)) with
| Some node -> Some node // Use this safe reduction
| None -> None // No pointer-free reduction found
Example command: ./hyggec rars -v -O 2 examples/helloworld.hyg
Here the Table 2 compares the number of RISC-V assembly instructions before and after applying the partial evaluation optimization. The test cases mentioned are at folder /project_dir/examples/
Filename | before | after |
---|---|---|
constant_propagation.hyg | 43 | 31 |
constant_folding_1.hyg | 31 | 23 |
constant_folding_2.hyg | 43 | 25 |
deadcode.hyg | 32 | 21 |
deadcode_2.hyg | 53 | 22 |
inlining.hyg | 100 | 21 |
pointer.hyg | 483 | 475 |
Table: Comparison of instruction counts before and after optimization
For the first 6 files, the optimization works well and eliminates the number of instructions to different extents since the complexity of the source code varies.
pointer.hyg
is an example from test files of “structures”, involving a lot of pointer expressions. The optimizer works well, it keeps most of the instructions the same.