MimIR 0.1
MimIR is my Intermediate Representation
Loading...
Searching...
No Matches
Phases

At a high level, a phase is an isolated compiler transformation or analysis step. Phases are intended to do one thing at a time and to compose in a straightforward sequence. See also the Rewriting Guide, since several phase families are built directly on top of Rewriter.

Overview

A Phase is a Stage with a single entry point, run(), which wraps the actual implementation in start().

Phase

Phase is the minimal base class.

A phase has:

  • a name or annex,
  • access to the current World,
  • a run() wrapper for logging and verification,
  • a todo() accessor backed by the internal todo_ flag for fixed-point iteration.
Note
A phase sets this flag via invalidate() when it discovers that another round is required.
  • PhaseMan uses this to drive fixed-point pipelines.
  • RWPhase uses this to drive its optional pre-analysis to a fixed point.

Typical Shape

A custom phase usually derives from Phase and implements start():

class MyPhase : public mim::Phase {
public:
MyPhase(World& world)
: Phase(world, "my_phase") {}
private:
void start() override {
// do work here
}
};
Unlike a Pass, a Phase performs one self-contained task and does not interleave with other phases.
Definition phase.h:27
virtual void start()=0
Actual entry.

Run it with:

virtual void run()
Entry point and generates some debug output; invokes Phase::start.
Definition phase.cpp:13

Analysis

Analysis is the base class for phases that inspect the current world using the Rewriter traversal machinery. It inherits from both Phase and Rewriter, but unlike RWPhase, it rewrites into the same world. In practice, this means Rewriter is used as a structured, graph-aware traversal over ordinary MimIR Defs.

Note
An Analysis based on Rewriter has an abstract domain of ordinary Defs.

A central feature of Analysis is its internal lattice(), which stores abstract information for old-world Defs as a Def2Def mapping.

This is often convenient because analysis information can itself be represented as ordinary MimIR Defs. As a result, existing IR machinery applies automatically, including:

  • hash-consing / canonical sharing,
  • built-in normalizations,
  • and other simplifications already provided by the World.

So if your abstract domain fits naturally into MimIR, you can often encode it directly as Defs and store it in the analysis lattice.

An Analysis visits:

  1. all registered annex roots, then
  2. all external mutables.

During the first part, mim::Analysis::is_bootstrapping() is true. During the second part, it is false.

Typical usage:

  • override rewrite(), rewrite_imm(), rewrite_mut(), or node-specific rewrite hooks,
  • compute abstract information while traversing reachable IR,
  • store that information in lattice() and/or in side tables,
  • call invalidate() if new information was discovered and another iteration is required.

Handling of Mutables

Unlike RWPhase, an Analysis must traverse the entire reachable program without rebuilding it. For this reason, Analysis overrides rewrite_mut() to preserve mutables and their binders instead of rewriting them.

When a mutable is visited, the analysis immediately maps it to itself (mut -> mut), and likewise maps its associated variable(s) to themselves (var -> var). This keeps the mutable structure intact as a stable "skeleton program" and prevents the analysis from accidentally introducing new mutables or fresh variables.

After establishing these identity mappings, the analysis recursively traverses the mutable’s dependencies via rewrite(). This uses the Rewriter machinery purely as a structured, graph-aware traversal, while all computed information is stored separately in Analysis::lattice().

A common convention is to encode top as def -> def in the lattice: mapping a definition to itself means "no useful information, keep as-is", while mapping it to a different Def represents a discovered abstract value.

Reset Between Iterations

If an analysis participates in a fixed-point loop, it should be ready to run multiple times. The base reset() clears the rewriter state and resets the internal fixed-point state for the next round.

RWPhase

RWPhase is the base class for phases that rebuild the current world into a new one, thereby eliminating garbage. This is the standard base class for optimization phases that structurally transform IR.

It inherits from both Phase and Rewriter, but here the two worlds differ:

Note
To avoid confusion, direct world() access is deleted. Use:

Cleanup

Cleanup is simply an RWPhase with no custom rewrites. Because an RWPhase reconstructs only what is reachable from the world roots, rebuilding automatically eliminates dead and unreachable code.

Execution Model

An RWPhase runs in three conceptual steps:

  1. optionally perform a fixed-point analysis on the old world,
  2. rewrite reachable old Defs into the new world:
    1. rewrite annex roots,
    2. rewrite external mutables;
  3. swap the old and new worlds.

After the swap, the rewritten world becomes the current one.

Optional Pre-Analysis

An RWPhase may be given an associated Analysis. If so, analyze() runs that analysis to a fixed point before rewriting begins.

This is a common pattern:

  • the analysis computes facts on the old world,
  • those facts are stored in Analysis::lattice() and/or auxiliary side tables,
  • the rewrite queries them through RWPhase::lattice() and produces the new world.

If no analysis is needed, analyze() can simply return false.

Analysis Results

An RWPhase may be associated with an Analysis. If so, the rewrite can query the analysis result through RWPhase::lattice().

This provides read access to the analysis lattice for old-world Defs: given an old definition, RWPhase::lattice() returns the abstract value computed by the associated Analysis, or nullptr if no value is available.

This is the standard way to communicate fixed-point analysis results into the subsequent rewrite.

Bootstrapping

Like Analysis, RWPhase processes annex roots before externals.

While annexes are being rewritten, mim::RWPhase::is_bootstrapping() is true.

This matters because annexes may depend on one another. During bootstrapping, rewrites that refer to other annexes may need to be deferred or skipped, since those annexes might not yet exist in the new world.

Typical Shape

class MyRWPhase : public mim::RWPhase {
public:
MyRWPhase(World& world)
: RWPhase(world, "my_rw_phase") {}
private:
const Def* rewrite_imm_App(const App* app) override {
// customize rebuilding here
return Rewriter::rewrite_imm_App(app);
}
};
Rebuilds old_world() into new_world() and then swaps them.
Definition phase.h:142

Run it with:

PhaseMan

PhaseMan organizes several phases into a pipeline.

It can run them:

  • once, in sequence, or
  • repeatedly to a fixed point.

A fixed-point PhaseMan reruns the whole pipeline as long as at least one phase invalidates.

Between iterations, each phase is recreated from its original configuration. This keeps phase-local state from leaking across rounds unless the phase explicitly recomputes it.

Note
PhaseMan is the orchestration layer for classical phase pipelines.

Typical Shape

auto phases = mim::Phases();
phases.emplace_back(std::make_unique<PhaseA>(world));
phases.emplace_back(std::make_unique<PhaseB>(world));
man.apply(/*fixed_point=*/true, std::move(phases));
man.run();
Organizes several Phases into a pipeline.
Definition phase.h:309
std::deque< std::unique_ptr< Phase > > Phases
Definition phase.h:22

Use a fixed-point pipeline when phases expose new optimization opportunities for one another.

ClosedMutPhase

ClosedMutPhase is a traversal helper for phases that visit all reachable, closed mutables in the world.

A mutable is relevant here if it is:

  • reachable,
  • closed, i.e. it has no free variables,
  • optionally non-empty, depending on elide_empty.

This is useful for local analyses or transformations naturally phrased as:

Note
For every reachable closed mutable, inspect or process it.

You override visit(M*), where M defaults to Def but may be restricted to a particular mutable subtype.

Typical Shape

class MyClosedPhase : public mim::ClosedMutPhase<Lam> {
public:
MyClosedPhase(World& world)
: ClosedMutPhase(world, "my_closed_phase", /*elide_empty=*/true) {}
private:
void visit(Lam* lam) override {
// process each reachable closed Lam
}
};
Transitively visits all reachable, closed mutables in the World.
Definition phase.h:340
virtual void visit(M *)=0

NestPhase

NestPhase builds on ClosedMutPhase and computes a Nest for each visited mutable.

Use it when your phase needs a structured view of nested control or binding structure rather than just the raw mutable.

Instead of overriding visit(M*), override:

visit(const Nest&)

This is convenient for analyses that reason about nesting, dominance-like structure, or hierarchical regions.

Example: SCCP

#pragma once
#include <mim/phase.h>
namespace mim {
/// SCCP - Sparse Conditional Constant Propagation - but propagates **arbitrary expressions**.
/// @see [Constant propagation with conditional branches](https://dl.acm.org/doi/pdf/10.1145/103135.103136)
///
/// Lattice per Lam::var:
/// ```
/// ⊤ ← Keep as is
/// |
/// Expr ← Whole expression is propagated (vertically) through var
/// |
/// ⊥
/// ```
class SCCP : public RWPhase {
private:
class Analysis : public mim::Analysis {
public:
Analysis(World& world)
: mim::Analysis(world, "SCCP::Analyzer") {}
private:
const Def* propagate(const Def* var, const Def* def);
const Def* rewrite_imm_App(const App* app) final;
};
public:
SCCP(World& world)
: RWPhase(world, "SCCP", &analysis_)
, analysis_(world) {}
private:
const Def* rewrite_imm_App(const App* old_app) final;
Analysis analysis_;
Lam2Lam lam2lam_;
};
} // namespace mim
Traverses the current World using Rewriter infrastructure while staying in the same world.
Definition phase.h:82
Definition ast.h:14
LamMap< Lam * > Lam2Lam
Definition lam.h:221

The provided Sparse Conditional Constant Propagation (SCCP) implementation is a good example of the intended phase structure. Its architecture is:

  • an inner Analysis computes propagation facts on the old world,
  • an outer RWPhase uses those facts to rebuild a simplified new world.
Note
The implementation propagates not only constants but also arbitrary expressions.

Analysis

#include "sccp.h"
namespace mim {
const Def* SCCP::Analysis::propagate(const Def* var, const Def* def) {
auto [i, ins] = lattice_.emplace(var, def);
if (ins) {
invalidate();
DLOG("propagate: {} → {}", var, def);
return def;
}
auto cur = i->second;
if (!cur || def->isa<Bot>() || cur == def || cur == var) return cur;
invalidate();
if (cur->isa<Bot>()) return i->second = def;
return i->second = var; // top
}
const Def* SCCP::Analysis::rewrite_imm_App(const App* app) {
if (auto lam = app->callee()->isa_mut<Lam>(); isa_optimizable(lam)) {
auto n = app->num_targs();
auto abstr_args = absl::FixedArray<const Def*>(n);
auto abstr_vars = absl::FixedArray<const Def*>(n);
// propagate
for (size_t i = 0; i != n; ++i) {
auto abstr = rewrite(app->targ(i));
abstr_vars[i] = propagate(lam->tvar(i), abstr);
abstr_args[i] = abstr;
}
// set new abstract var
auto abstr_var = world().tuple(abstr_vars);
map(lam->var(), abstr_var);
lattice_[lam->var()] = abstr_var;
if (!lookup(lam))
for (auto d : lam->deps())
rewrite(d);
return world().app(lam, abstr_args);
}
return mim::Analysis::rewrite_imm_App(app);
}
} // namespace mim
Base class for all Defs.
Definition def.h:252
A function.
Definition lam.h:110
#define DLOG(...)
Vaporizes to nothingness in Debug build.
Definition log.h:95
auto lookup(const C &container, const K &key)
Yields pointer to element (or the element itself if it is already a pointer), if found and nullptr ot...
Definition util.h:100
Lam * isa_optimizable(Lam *lam)
These are Lams that are.
Definition lam.h:352
TExt< false > Bot
Definition lattice.h:171

The SCCP analysis associates each lambda variable with a lattice value:

  • bottom: no useful information yet,
  • a concrete expression: this value can be propagated,
  • top: keep the variable as-is (a Def maps to itself).

In the implementation, this lattice is stored in Analysis::lattice() as a Def2Def map. A nice aspect here is that the propagated value is itself a regular Def. This illustrates the benefit of building analysis on top of Rewriter: the abstract domain can live directly inside MimIR, so canonicalization and normalization come for free.

The analysis traverses the old world and updates the lattice when it sees applications of optimizable lambdas. If new information is discovered, it invalidates, causing the analysis to rerun until stable. This is a textbook use of Analysis:

Transformation

#include "sccp.h"
namespace mim {
const Def* SCCP::rewrite_imm_App(const App* old_app) {
if (auto old_lam = old_app->callee()->isa_mut<Lam>()) {
if (auto l = lattice(old_lam->var()); l && l != old_lam->var()) {
invalidate();
size_t num_old = old_lam->num_tvars();
Lam* new_lam;
if (auto i = lam2lam_.find(old_lam); i != lam2lam_.end())
new_lam = i->second;
else {
// build new dom
auto new_doms = DefVec();
for (size_t i = 0; i != num_old; ++i) {
auto old_var = old_lam->var(num_old, i);
auto abstr = lattice(old_var);
if (old_var == abstr) new_doms.emplace_back(rewrite(old_lam->dom(num_old, i)));
}
// build new lam
size_t num_new = new_doms.size();
auto new_vars = absl::FixedArray<const Def*>(num_old);
new_lam = new_world().mut_lam(new_doms, rewrite(old_lam->codom()))->set(old_lam->dbg());
lam2lam_[old_lam] = new_lam;
// build new var
for (size_t i = 0, j = 0; i != num_old; ++i) {
auto old_var = old_lam->var(num_old, i);
auto abstr = lattice(old_var);
if (old_var == abstr) {
auto v = new_lam->var(num_new, j++);
new_vars[i] = v;
} else {
new_vars[i] = rewrite(abstr); // SCCP propagate
}
}
map(old_lam->var(), new_vars);
new_lam->set(rewrite(old_lam->filter()), rewrite(old_lam->body()));
}
// build new app
size_t num_new = new_lam->num_vars();
auto new_args = absl::FixedArray<const Def*>(num_new);
for (size_t i = 0, j = 0; i != num_old; ++i) {
auto old_var = old_lam->var(num_old, i);
auto abstr = lattice(old_var);
if (old_var == abstr) new_args[j++] = rewrite(old_app->targ(i));
}
return map(old_app, new_world().app(new_lam, new_args));
}
}
return Rewriter::rewrite_imm_App(old_app);
}
} // namespace mim
Vector< const Def * > DefVec
Definition def.h:78
@ Lam
Definition def.h:115
@ App
Definition def.h:115

Once the lattice is stable, the outer SCCP phase starts rewriting. During rewriting, it can query abstract values for old-world definitions through RWPhase::lattice().

When it sees an application of a lambda whose parameters have propagated values, it rebuilds a specialized lambda:

  • parameters with known propagated expressions are removed,
  • remaining parameters are kept,
  • the lambda body is rewritten with the propagated values substituted,
  • the call site is rebuilt with only the remaining arguments.

So SCCP follows the standard RWPhase pattern:

  1. analyze the old world,
  2. rewrite into a new world using the computed facts,
  3. swap worlds.

Discussion

Separating SCCP into analysis and rewrite keeps both parts simple:

  • the analysis never mutates or partially rewrites the program,
  • the rewrite does not need to discover facts on the fly,
  • fixed-point logic stays in the analysis stage where it belongs,
  • the handoff from analysis to rewrite is explicit through Analysis::lattice() and RWPhase::lattice().

This separation is the main design pattern to follow for nontrivial optimizations.

Note
The complete SCCP example fits into roughly 150 lines of C++ source code. Most of the usual compiler boilerplate is absorbed by the existing Analysis, RWPhase, and Rewriter infrastructure, so the implementation can focus on the optimization itself.

Choosing the Right Base Class

A useful rule of thumb is:

  • derive from Phase if you just need a custom one-off action,
  • derive from Analysis if you want a graph-aware traversal that computes facts on the current world,
  • derive from RWPhase if you want to rebuild the world into a transformed new one, optionally consuming facts from an associated Analysis,
  • derive from ClosedMutPhase if you want to visit all reachable closed mutables,
  • derive from NestPhase if that visit should come with a computed Nest.

Recommended Design Pattern

For most optimization phases, the preferred structure is:

  1. write an Analysis that computes facts to a fixed point,
  2. store those facts in Analysis::lattice() and/or auxiliary tables,
  3. write an RWPhase that consumes those facts while rebuilding the world.

This keeps analyses and transformations cleanly separated and fits naturally with MimIR’s rewriting-based infrastructure.

Minimal Examples

A simple whole-world rewrite phase:

class Simplify : public mim::RWPhase {
public:
Simplify(mim::World& world)
: RWPhase(world, "simplify") {}
private:
const mim::Def* rewrite_imm_App(const mim::App* app) override {
// rewrite or simplify selected applications
// fallback:
return Rewriter::rewrite_imm_App(app);
}
};
The World represents the whole program and manages creation of MimIR nodes (Defs).
Definition world.h:34

A simple analysis phase:

class CountMutLams : public mim::Analysis {
public:
CountMutLams(mim::World& world)
: Analysis(world, "count_lams") {}
size_t num_lams = 0;
private:
const mim::Def* rewrite_mut_Lam(mim::Lam* lam) override {
++num_lams;
return mim::Analysis::rewrite_mut_Lam(lam);
}
};

Using both:

CountMutLams analysis(world);
analysis.run();

Compilation Pipelines in M⁠im

You can also expose your custom phases as axioms in Mim via the compile plugin and build your own compilation pipeline. Mim's default compilation pipeline is defined in the opt plugin.

Summary

Phases are MimIR’s main unit of compiler work.

  • Phase is the minimal base abstraction.
  • Analysis is for graph-aware fact collection on the current world and provides a reusable lattice() for abstract values.
  • RWPhase is for rewriting the current world into a transformed new one and can read analysis results through RWPhase::lattice().
  • PhaseMan sequences phases, optionally to a fixed point.
  • ClosedMutPhase and NestPhase are traversal helpers for common whole-world inspections.

The key design idea is that MimIR phases are built around structured traversal and rewriting. For substantial optimizations, the usual pattern is: