Pipeline Overview
Lattice source code passes through a six-stage pipeline before execution. The compiler produces bytecode for one of two virtual machines: a stack-based VM (the default) or a register-based VM, which uses a parallel compilation path. Pass --reg-vm to select the register VM, or --tree-walk for the legacy AST interpreter.
Lexer & Parser
The lexer (src/lexer.c) converts source text into a flat array of tokens. Each token carries a type tag, source position, and string slice. The parser (src/parser.c) consumes tokens via recursive descent, producing an AST of Expr nodes defined in include/expr.h.
Token Categories
| Category | Examples |
|---|---|
| Literals | TOKEN_INT, TOKEN_FLOAT, TOKEN_STRING, TOKEN_TRUE, TOKEN_FALSE, TOKEN_NIL |
| Keywords | TOKEN_FN, TOKEN_FLUX, TOKEN_FIX, TOKEN_LET, TOKEN_IF, TOKEN_WHILE, TOKEN_FOR, TOKEN_MATCH, TOKEN_STRUCT, TOKEN_ENUM |
| Operators | TOKEN_PLUS, TOKEN_MINUS, TOKEN_STAR, TOKEN_SLASH, TOKEN_PIPE, TOKEN_ARROW, TOKEN_FAT_ARROW |
| Phase | TOKEN_FREEZE, TOKEN_THAW, TOKEN_CLONE, TOKEN_SCOPE, TOKEN_SPAWN, TOKEN_SELECT |
| Delimiters | TOKEN_LPAREN, TOKEN_LBRACE, TOKEN_LBRACKET, TOKEN_DOT, TOKEN_COMMA, TOKEN_SEMICOLON |
AST Node Types
Every AST node is an Expr tagged with one of the ExprType variants. Statements and expressions share the same node type—a statement is simply an expression evaluated for side effects.
// Key expression types (from expr.h)
EXPR_INT, EXPR_FLOAT, EXPR_STRING, EXPR_BOOL,
EXPR_IDENT, EXPR_BINARY, EXPR_UNARY, EXPR_CALL,
EXPR_FN, EXPR_IF, EXPR_WHILE, EXPR_FOR,
EXPR_MATCH, EXPR_STRUCT_DEF, EXPR_ENUM_DEF,
EXPR_BLOCK, EXPR_ASSIGN, EXPR_INDEX,
EXPR_SCOPE, EXPR_SPAWN, EXPR_SELECT,
EXPR_TRY_CATCH, EXPR_THROW, EXPR_DEFER
Stack VM Bytecode
The stack VM uses variable-length instructions. Each opcode is a single byte, followed by zero or more operand bytes. Operands reference the constants pool, local variable slots, or jump offsets. The compiler emits bytecode into a Chunk structure.
Chunk Structure
typedef struct {
uint8_t *code; /* bytecode stream */
size_t code_len, code_cap;
LatValue *constants; /* constants pool */
size_t *const_hashes; /* pre-computed FNV-1a hashes */
size_t const_len, const_cap;
int *lines; /* source line per byte */
char **local_names; /* slot index → variable name */
char *name; /* function name (NULL for top-level) */
LatValue *default_values; /* default parameter values */
uint8_t *param_phases; /* per-param phase constraint */
PICTable pic; /* polymorphic inline cache */
} Chunk;
Opcode Categories
| Range | Category | Key Opcodes |
|---|---|---|
| 0–7 | Stack | OP_CONSTANT, OP_NIL, OP_TRUE, OP_FALSE, OP_UNIT, OP_POP, OP_DUP, OP_SWAP |
| 8–22 | Arithmetic | OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_MOD, OP_NEG, OP_NOT |
| 23–30 | Bitwise | OP_BIT_AND, OP_BIT_OR, OP_BIT_XOR, OP_LSHIFT, OP_RSHIFT |
| 31–38 | Comparison | OP_EQ, OP_NEQ, OP_LT, OP_GT, OP_LTEQ, OP_GTEQ |
| 39–51 | Variables | OP_GET_LOCAL, OP_SET_LOCAL, OP_GET_GLOBAL, OP_SET_GLOBAL, OP_GET_UPVALUE, OP_SET_UPVALUE, OP_CLOSE_UPVALUE |
| 52–58 | Control Flow | OP_JUMP, OP_JUMP_IF_FALSE, OP_JUMP_IF_TRUE, OP_LOOP |
| 59–63 | Functions | OP_CALL, OP_CLOSURE, OP_RETURN |
| 64–67 | Iterators | OP_ITER_INIT, OP_ITER_NEXT |
| 68–84 | Data Structures | OP_BUILD_ARRAY, OP_BUILD_MAP, OP_BUILD_STRUCT, OP_INDEX, OP_SET_INDEX, OP_GET_FIELD, OP_SET_FIELD, OP_INVOKE |
| 85–94 | Exceptions/Defer | OP_PUSH_EXCEPTION_HANDLER, OP_POP_EXCEPTION_HANDLER, OP_THROW, OP_DEFER_PUSH, OP_DEFER_RUN |
| 95–112 | Phase System | OP_FREEZE, OP_THAW, OP_CLONE, OP_MARK_FLUID, OP_REACT, OP_BOND, OP_SEED, OP_SUBLIMATE |
| 113–122 | Builtins/Concurrency | OP_PRINT, OP_IMPORT, OP_SCOPE, OP_SELECT |
| 123–132 | Integer Specials | OP_INC_LOCAL, OP_DEC_LOCAL, OP_ADD_INT, OP_LT_INT, OP_LOAD_INT8 |
| 133–141 | Wide Constants | OP_CONSTANT_16, OP_GET_GLOBAL_16, OP_SET_GLOBAL_16, OP_CLOSURE_16 |
| 142–143 | Ephemeral | OP_RESET_EPHEMERAL |
| 144–157 | Combined/Type | OP_SET_LOCAL_POP, OP_CHECK_TYPE, OP_IS_CRYSTAL, OP_IS_FLUID, OP_FREEZE_FIELD |
| 158–160 | Halt | OP_HALT |
Wide Constant Opcodes
When a chunk contains more than 256 constants, the compiler emits wide opcodes that use 2-byte big-endian indices. This supports up to 65,536 constants per chunk.
// Standard: 1-byte index (0-255)
OP_CONSTANT [idx8]
// Wide: 2-byte big-endian index (0-65535)
OP_CONSTANT_16 [hi8] [lo8]
OP_GET_GLOBAL_16 [hi8] [lo8]
OP_SET_GLOBAL_16 [hi8] [lo8]
OP_CLOSURE_16 [hi8] [lo8]
Variable-Length Instructions
Concurrency opcodes use variable-length encodings to embed sub-chunk references:
// OP_SCOPE layout
OP_SCOPE [spawn_count] [sync_chunk_idx] [spawn_chunk_idx]...
// OP_SELECT layout
OP_SELECT [arm_count] [flags, chan_idx, body_idx, binding_idx] × arm_count
// Sub-chunks are stored as VAL_CLOSURE constants
// (body==NULL, native_fn points to sub-Chunk*)
Register VM Bytecode
The register VM uses fixed-width 32-bit instructions. Each instruction packs an opcode and up to three register operands into a single 32-bit word, enabling efficient decoding and cache-friendly dispatch.
Instruction Formats
// ABC format: 3 register operands
[opcode:8] [A:8] [B:8] [C:8]
// ABx format: register + 16-bit unsigned operand
[opcode:8] [A:8] [Bx:16]
// AsBx format: register + 16-bit signed operand (for jumps)
[opcode:8] [A:8] [sBx:16] // sBx = Bx - 32767
RegChunk Structure
typedef struct RegChunk {
uint32_t magic; /* REGCHUNK_MAGIC = 0x524C4154 ("RLAT") */
RegInstr *code; /* 32-bit instruction array */
size_t code_len, code_cap;
LatValue *constants; /* constant pool (up to 65536) */
size_t const_len, const_cap;
int *lines; /* source line per instruction */
char **local_names; /* register → variable name */
char *name; /* function name */
uint8_t max_reg; /* high-water register count */
PICTable pic; /* polymorphic inline cache */
} RegChunk;
Opcode Summary
| Range | Category | Key Opcodes |
|---|---|---|
| 0–6 | Data Movement | ROP_MOVE, ROP_LOADK, ROP_LOADI, ROP_LOADNIL, ROP_LOADTRUE, ROP_LOADFALSE |
| 7–14 | Arithmetic | ROP_ADD, ROP_SUB, ROP_MUL, ROP_DIV, ROP_MOD, ROP_NEG, ROP_ADDI, ROP_CONCAT |
| 15–20 | Comparison | ROP_EQ, ROP_NEQ, ROP_LT, ROP_LTEQ, ROP_NOT |
| 21–23 | Control Flow | ROP_JMP, ROP_JMPFALSE, ROP_JMPTRUE |
| 24–34 | Variables | ROP_GETGLOBAL, ROP_SETGLOBAL, ROP_GETFIELD, ROP_SETFIELD, ROP_GETUPVALUE, ROP_SETUPVALUE |
| 35–37 | Functions | ROP_CALL, ROP_RETURN, ROP_CLOSURE |
| 38–56 | Data Structures | ROP_NEWARRAY, ROP_NEWSTRUCT, ROP_NEWTUPLE, ROP_NEWENUM, ROP_BUILDRANGE |
| 58–63 | Exceptions/Defer | ROP_PUSH_HANDLER, ROP_POP_HANDLER, ROP_THROW, ROP_DEFER_PUSH |
| 65–76 | Phase/Concurrency | ROP_FREEZE, ROP_THAW, ROP_SCOPE, ROP_SELECT, ROP_IMPORT |
| 78–93 | Optimization | ROP_ADD_INT, ROP_INC_REG, ROP_DEC_REG, ROP_INVOKE_GLOBAL, ROP_CHECK_TYPE |
| 94 | Halt | ROP_HALT |
Stack VM Execution
The stack VM (src/vm.c) maintains a 4096-value operand stack and up to 256 call frames. Dispatch uses a switch statement on opcode values, with an optional computed-goto backend selected at compile time via USE_COMPUTED_GOTO.
Core Parameters
| Parameter | Value | Description |
|---|---|---|
| STACK_MAX | 4096 | Maximum operand stack depth |
| FRAMES_MAX | 256 | Maximum nested call frames |
| HANDLER_MAX | 64 | Exception handler stack depth |
| DEFER_MAX | 256 | Deferred operation stack depth |
Call Frames & Upvalues
Each call frame stores the return address (instruction pointer), a base pointer into the value stack, and the executing chunk. Closures capture variables via ObjUpvalue objects that form a linked list. Open upvalues point directly into the stack; when a variable goes out of scope, the upvalue is “closed” by copying the value into the upvalue itself.
// Closure representation in the VM
// For compiled closures:
// body == NULL
// native_fn == Chunk* (points to the function's chunk)
// captured_env == ObjUpvalue** (upvalue array)
// region_id == upvalue count
// For C native functions:
// body == VM_NATIVE_MARKER ((Expr**)0x1)
// native_fn == VMNativeFn function pointer
Dispatch Modes
Two dispatch strategies are available:
Switch dispatch: a standard C switch statement, portable across all compilers. Each iteration fetches the next opcode and branches through the switch.
Computed goto: a GCC/Clang extension using &&label addresses and goto *dispatch_table[opcode]. It eliminates the central branch, improving branch predictor performance. Enabled with -DUSE_COMPUTED_GOTO.
Polymorphic Inline Cache (PIC)
Method dispatch uses a per-chunk PIC to avoid repeated hash-table lookups. Each call site has a 4-entry cache mapping (type_tag, method_hash) to a handler ID. Cache hits skip the full method resolution path.
typedef struct {
uint8_t type_tag; /* ValueType of the receiver */
uint32_t method_hash; /* djb2 hash of method name */
uint16_t handler_id; /* cached handler (0 = miss) */
} PICEntry;
typedef struct {
PICEntry entries[4]; /* PIC_SIZE = 4 max entries */
uint8_t count;
} PICSlot;
// 64 direct-mapped slots per chunk (PIC_DIRECT_SLOTS)
// Slot = bytecode_offset & 0x3F
A cache miss (PIC_NOT_BUILTIN) triggers the full struct/impl lookup.
Register VM Execution
The register VM (src/regvm.c) uses 256-register windows per call frame. Instead of pushing/popping an operand stack, instructions read and write directly to numbered registers, reducing memory traffic.
Core Parameters
| Parameter | Value | Description |
|---|---|---|
| REGVM_REG_MAX | 256 | Registers per frame |
| REGVM_FRAMES_MAX | 64 | Maximum call frame depth |
| REGVM_CONST_MAX | 65536 | Constants per chunk |
| REGVM_HANDLER_MAX | 64 | Exception handler depth |
| REGVM_DEFER_MAX | 256 | Deferred operations |
Register Window & Call Convention
The register file is a flat array of REGVM_REG_MAX × REGVM_FRAMES_MAX values. Each call frame has a base pointer (regs) into this array. On function call, the caller places arguments starting at the callee’s register 1 (register 0 is reserved for internal use). The callee writes its return value to caller_result_reg in the caller’s frame.
typedef struct {
RegChunk *chunk;
RegInstr *ip; /* instruction pointer */
LatValue *regs; /* base of register window */
size_t reg_count;
struct ObjUpvalue **upvalues;
size_t upvalue_count;
uint8_t caller_result_reg; /* where to write return value */
} RegCallFrame;
Instruction Decoding
Each 32-bit instruction is decoded by extracting fields with bitwise operations:
RegInstr instr = *ip++;
uint8_t op = instr & 0xFF; // opcode
uint8_t A = (instr >> 8) & 0xFF; // register A
uint8_t B = (instr >> 16) & 0xFF; // register B
uint8_t C = (instr >> 24) & 0xFF; // register C
uint16_t Bx = (instr >> 16) & 0xFFFF; // 16-bit operand
Memory Management
Lattice uses a region-based memory system with four distinct allocation strategies, identified by region_id tags on every value.
Region Tags
| Constant | Value | Description |
|---|---|---|
| REGION_NONE | (size_t)-1 | Standard malloc/free allocation |
| REGION_EPHEMERAL | (size_t)-2 | Ephemeral bump arena (fast, bulk-reset) |
| REGION_INTERNED | (size_t)-3 | Interned string (never freed) |
| REGION_CONST | (size_t)-4 | Constant pool string (freed with chunk) |
Ephemeral Bump Arena
The ephemeral arena is a fast bump allocator used for temporary string allocations (concatenation, interpolation). It allocates linearly from 4096-byte pages and is bulk-reset at statement boundaries via OP_RESET_EPHEMERAL.
typedef struct ArenaPage {
uint8_t *data;
size_t used;
size_t cap; /* ARENA_PAGE_SIZE = 4096 */
struct ArenaPage *next;
} ArenaPage;
typedef struct BumpArena {
ArenaPage *pages; /* current page */
ArenaPage *first_page; /* head of chain (kept across resets) */
size_t total_bytes;
} BumpArena;
Escape Points
Values allocated in the ephemeral arena must be “promoted” to standard heap allocation when they escape the current statement. This happens automatically at these escape points:
| Escape Point | Trigger |
|---|---|
| OP_DEFINE_GLOBAL | Value bound to a global variable |
| OP_CALL | Value passed as argument to compiled closure |
| OP_SET_INDEX_LOCAL | Value stored into an array element |
| Array .push() | Value appended to an array |
Promotion is performed by vm_promote_value(), which copies the string data to a fresh malloc allocation and resets region_id to REGION_NONE. The value_free() function already skips values with non-REGION_NONE tags, preventing double-frees.
Phase Tags on Values
Every LatValue carries a PhaseTag that governs mutability:
| Tag | Meaning |
|---|---|
| VTAG_FLUID | Mutable — can be freely modified |
| VTAG_CRYSTAL | Immutable — frozen, can cross channel boundaries |
| VTAG_UNPHASED | No phase assigned (literals, temporaries) |
| VTAG_SUBLIMATED | Write-once — assigned once, then becomes crystal |
Concurrency Runtime
Lattice’s concurrency model is built on pthreads with structured scoping guarantees. The runtime provides channels for inter-thread communication and scope/spawn/select primitives.
Channels
typedef struct LatChannel {
LatVec buffer; /* ring buffer of LatValue */
bool closed;
size_t refcount; /* atomic reference counting */
pthread_mutex_t mutex;
pthread_cond_t cond_notempty;
LatSelectWaiter *waiters; /* linked list for select */
} LatChannel;
Channels use mutex-protected ring buffers. channel_send() locks the mutex, pushes the value, signals cond_notempty, and wakes all select waiters. channel_recv() blocks on the condition variable until a value is available or the channel closes. Reference counting (__atomic_add_fetch/__atomic_sub_fetch) manages channel lifetime.
Scope & Spawn
scope creates a structured concurrency boundary. Each spawn inside the scope launches a new pthread. The scope blocks until all spawned threads have joined. Variable capture uses deep-copy semantics—each spawned task gets an independent copy of captured values, ensuring isolation.
Select Multiplexing
select waits on multiple channels simultaneously. Each arm specifies a channel and a binding for the received value. The runtime uses Fisher-Yates shuffling to randomize the check order, preventing starvation of any single channel.
// Select waiter structure
typedef struct LatSelectWaiter {
pthread_mutex_t *mutex;
pthread_cond_t *cond;
struct LatSelectWaiter *next;
} LatSelectWaiter;
// Select registers a waiter on each channel,
// then waits until any channel has data.
// Fisher-Yates shuffle ensures fair ordering.
Threading code is guarded by #ifndef __EMSCRIPTEN__. In WebAssembly builds, concurrency primitives are no-ops—channels still work as synchronous queues, but scope/spawn execute sequentially.
Bytecode Serialization
Lattice supports serializing compiled bytecode to .latc files, enabling ahead-of-time compilation and distribution of precompiled programs.
.latc File Format
A .latc file contains a serialized Chunk with all constants, bytecode, and metadata. The format is produced by the self-hosted compiler (compiler/latc.lat) and consumed directly by the VM.
// Compile and run a .latc file
$ clat compiler/latc.lat input.lat output.latc
$ clat output.latc
Self-Hosted Compiler
The self-hosted compiler is written in Lattice itself (compiler/latc.lat). It reads .lat source, performs lexing, parsing, and code generation, then serializes the resulting bytecode. This means the bytecode path has no remaining AST dependencies at runtime.
Supported syntax includes optional chaining (?.) and try-propagate (?).
Design Constraints
The self-hosted compiler works within Lattice’s own semantics. Key patterns:
| Constraint | Solution |
|---|---|
| Maps/structs are pass-by-value | Parallel global arrays for compiler state |
| Buffers are pass-by-value | Global ser_buf for serialization output |
| No map literals | Map::new() with index assignment |
| Params need type annotations | fn foo(a: any, b: any) |
Performance Tuning
Lattice includes several optimization mechanisms across both VMs.
Computed Goto vs Switch
The stack VM supports computed goto dispatch via -DUSE_COMPUTED_GOTO. This replaces the central switch with a jump table, allowing each opcode handler to dispatch directly to the next without returning to a central branch point. On modern CPUs, this typically provides a 10–20% improvement in interpretation throughput by improving branch prediction.
Ephemeral Arena
String operations (concatenation, interpolation) allocate into the ephemeral bump arena instead of calling malloc. The arena is bulk-reset at each statement boundary via OP_RESET_EPHEMERAL, amortizing allocation cost to near zero for temporary strings. Pages are recycled rather than freed, eliminating repeated mmap/munmap system calls.
Stack VM vs Register VM
| Aspect | Stack VM | Register VM |
|---|---|---|
| Instruction size | Variable (1+ bytes) | Fixed (4 bytes) |
| Code density | Higher (fewer bytes) | Lower (fixed width) |
| Decode cost | Variable operand fetch | Single-word decode |
| Memory traffic | Frequent stack push/pop | Direct register access |
| Call frames | 256 max | 64 max |
| Constants per chunk | 65,536 (wide opcodes) | 65,536 (16-bit Bx) |
PIC Optimization
The polymorphic inline cache avoids repeated hash-table lookups for method dispatch. With 64 direct-mapped slots and 4 entries per slot, common monomorphic and dimorphic call sites achieve near-zero-cost dispatch after warmup. The cache is indexed by bytecode_offset & 0x3F, giving O(1) lookup with no hashing at the call site.
Specialized Integer Opcodes
Hot integer operations bypass the generic value dispatch path:
| Opcode | Effect |
|---|---|
| OP_ADD_INT / ROP_ADD_INT | Integer addition without type check |
| OP_SUB_INT / ROP_SUB_INT | Integer subtraction without type check |
| OP_LT_INT / ROP_LT_INT | Integer less-than without type check |
| OP_INC_LOCAL / ROP_INC_REG | In-place increment of local/register |
| OP_LOAD_INT8 | Load small integer constant inline (no pool lookup) |
Constant Hash Pre-computation
The Chunk structure includes a const_hashes array that stores pre-computed FNV-1a hashes for string constants. Global variable lookups use these cached hashes instead of recomputing them on every access, reducing the cost of OP_GET_GLOBAL and OP_SET_GLOBAL to a single hash-table probe.