
Architecture


Pipeline Overview

Lattice source code passes through a six-stage pipeline before execution. The compiler produces bytecode for one of two virtual machines: a stack-based VM (default) or a register-based VM.

```
Source (.lat file)
  → Lexer    (lexer.c)
  → Parser   (parser.c)
  → AST      (expr.h)
  → Compiler (compiler.c)
  → Chunk    (chunk.h)
  → VM       (vm.c)
```

The register VM uses a parallel path:

```
AST (expr.h)
  → RegCompiler (regcompiler.c)
  → RegChunk    (regvm.h)
  → RegVM       (regvm.c)
```

Default: The stack-based VM is the default execution engine. Use --reg-vm to select the register VM, or --tree-walk for the legacy AST interpreter.

Lexer & Parser

The lexer (src/lexer.c) converts source text into a flat array of tokens. Each token carries a type tag, source position, and string slice. The parser (src/parser.c) consumes tokens via recursive descent, producing an AST of Expr nodes defined in include/expr.h.

Token Categories

| Category | Examples |
|---|---|
| Literals | TOKEN_INT, TOKEN_FLOAT, TOKEN_STRING, TOKEN_TRUE, TOKEN_FALSE, TOKEN_NIL |
| Keywords | TOKEN_FN, TOKEN_FLUX, TOKEN_FIX, TOKEN_LET, TOKEN_IF, TOKEN_WHILE, TOKEN_FOR, TOKEN_MATCH, TOKEN_STRUCT, TOKEN_ENUM |
| Operators | TOKEN_PLUS, TOKEN_MINUS, TOKEN_STAR, TOKEN_SLASH, TOKEN_PIPE, TOKEN_ARROW, TOKEN_FAT_ARROW |
| Phase | TOKEN_FREEZE, TOKEN_THAW, TOKEN_CLONE, TOKEN_SCOPE, TOKEN_SPAWN, TOKEN_SELECT |
| Delimiters | TOKEN_LPAREN, TOKEN_LBRACE, TOKEN_LBRACKET, TOKEN_DOT, TOKEN_COMMA, TOKEN_SEMICOLON |

AST Node Types

Every AST node is an Expr tagged with one of the ExprType variants. Statements and expressions share the same node type—a statement is simply an expression evaluated for side effects.

```c
// Key expression types (from expr.h)
EXPR_INT, EXPR_FLOAT, EXPR_STRING, EXPR_BOOL, EXPR_IDENT,
EXPR_BINARY, EXPR_UNARY, EXPR_CALL, EXPR_FN,
EXPR_IF, EXPR_WHILE, EXPR_FOR, EXPR_MATCH,
EXPR_STRUCT_DEF, EXPR_ENUM_DEF, EXPR_BLOCK, EXPR_ASSIGN,
EXPR_INDEX, EXPR_SCOPE, EXPR_SPAWN, EXPR_SELECT,
EXPR_TRY_CATCH, EXPR_THROW, EXPR_DEFER
```

Stack VM Bytecode

The stack VM uses variable-length instructions. Each opcode is a single byte, followed by zero or more operand bytes. Operands reference the constants pool, local variable slots, or jump offsets. The compiler emits bytecode into a Chunk structure.

Chunk Structure

```c
typedef struct {
    uint8_t *code;            /* bytecode stream */
    size_t code_len, code_cap;
    LatValue *constants;      /* constants pool */
    size_t *const_hashes;     /* pre-computed FNV-1a hashes */
    size_t const_len, const_cap;
    int *lines;               /* source line per byte */
    char **local_names;       /* slot index → variable name */
    char *name;               /* function name (NULL for top-level) */
    LatValue *default_values; /* default parameter values */
    uint8_t *param_phases;    /* per-param phase constraint */
    PICTable pic;             /* polymorphic inline cache */
} Chunk;
```

Opcode Categories

| Range | Category | Key Opcodes |
|---|---|---|
| 0–7 | Stack | OP_CONSTANT, OP_NIL, OP_TRUE, OP_FALSE, OP_UNIT, OP_POP, OP_DUP, OP_SWAP |
| 8–22 | Arithmetic | OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_MOD, OP_NEG, OP_NOT |
| 23–30 | Bitwise | OP_BIT_AND, OP_BIT_OR, OP_BIT_XOR, OP_LSHIFT, OP_RSHIFT |
| 31–38 | Comparison | OP_EQ, OP_NEQ, OP_LT, OP_GT, OP_LTEQ, OP_GTEQ |
| 39–51 | Variables | OP_GET_LOCAL, OP_SET_LOCAL, OP_GET_GLOBAL, OP_SET_GLOBAL, OP_GET_UPVALUE, OP_SET_UPVALUE, OP_CLOSE_UPVALUE |
| 52–58 | Control Flow | OP_JUMP, OP_JUMP_IF_FALSE, OP_JUMP_IF_TRUE, OP_LOOP |
| 59–63 | Functions | OP_CALL, OP_CLOSURE, OP_RETURN |
| 64–67 | Iterators | OP_ITER_INIT, OP_ITER_NEXT |
| 68–84 | Data Structures | OP_BUILD_ARRAY, OP_BUILD_MAP, OP_BUILD_STRUCT, OP_INDEX, OP_SET_INDEX, OP_GET_FIELD, OP_SET_FIELD, OP_INVOKE |
| 85–94 | Exceptions/Defer | OP_PUSH_EXCEPTION_HANDLER, OP_POP_EXCEPTION_HANDLER, OP_THROW, OP_DEFER_PUSH, OP_DEFER_RUN |
| 95–112 | Phase System | OP_FREEZE, OP_THAW, OP_CLONE, OP_MARK_FLUID, OP_REACT, OP_BOND, OP_SEED, OP_SUBLIMATE |
| 113–122 | Builtins/Concurrency | OP_PRINT, OP_IMPORT, OP_SCOPE, OP_SELECT |
| 123–132 | Integer Specials | OP_INC_LOCAL, OP_DEC_LOCAL, OP_ADD_INT, OP_LT_INT, OP_LOAD_INT8 |
| 133–141 | Wide Constants | OP_CONSTANT_16, OP_GET_GLOBAL_16, OP_SET_GLOBAL_16, OP_CLOSURE_16 |
| 142–143 | Ephemeral | OP_RESET_EPHEMERAL |
| 144–157 | Combined/Type | OP_SET_LOCAL_POP, OP_CHECK_TYPE, OP_IS_CRYSTAL, OP_IS_FLUID, OP_FREEZE_FIELD |
| 158–160 | Halt | OP_HALT |

Wide Constant Opcodes

When a chunk contains more than 256 constants, the compiler emits wide opcodes that use 2-byte big-endian indices. This supports up to 65,536 constants per chunk.

```
// Standard: 1-byte index (0-255)
OP_CONSTANT [idx8]

// Wide: 2-byte big-endian index (0-65535)
OP_CONSTANT_16   [hi8] [lo8]
OP_GET_GLOBAL_16 [hi8] [lo8]
OP_SET_GLOBAL_16 [hi8] [lo8]
OP_CLOSURE_16    [hi8] [lo8]
```

Variable-Length Instructions

Concurrency opcodes use variable-length encodings to embed sub-chunk references:

```
// OP_SCOPE layout
OP_SCOPE [spawn_count] [sync_chunk_idx] [spawn_chunk_idx]...

// OP_SELECT layout
OP_SELECT [arm_count] [flags, chan_idx, body_idx, binding_idx] × arm_count

// Sub-chunks are stored as VAL_CLOSURE constants
// (body==NULL, native_fn points to sub-Chunk*)
```

Register VM Bytecode

The register VM uses fixed-width 32-bit instructions. Each instruction packs an opcode and up to three register operands into a single 32-bit word, enabling efficient decoding and cache-friendly dispatch.

Instruction Formats

```
// ABC format: 3 register operands
[opcode:8] [A:8] [B:8] [C:8]

// ABx format: register + 16-bit unsigned operand
[opcode:8] [A:8] [Bx:16]

// AsBx format: register + 16-bit signed operand (for jumps)
[opcode:8] [A:8] [sBx:16]    // sBx = Bx - 32767
```

RegChunk Structure

```c
typedef struct RegChunk {
    uint32_t magic;       /* REGCHUNK_MAGIC = 0x524C4154 ("RLAT") */
    RegInstr *code;       /* 32-bit instruction array */
    size_t code_len, code_cap;
    LatValue *constants;  /* constant pool (up to 65536) */
    size_t const_len, const_cap;
    int *lines;           /* source line per instruction */
    char **local_names;   /* register → variable name */
    char *name;           /* function name */
    uint8_t max_reg;      /* high-water register count */
    PICTable pic;         /* polymorphic inline cache */
} RegChunk;
```

Opcode Summary

| Range | Category | Key Opcodes |
|---|---|---|
| 0–6 | Data Movement | ROP_MOVE, ROP_LOADK, ROP_LOADI, ROP_LOADNIL, ROP_LOADTRUE, ROP_LOADFALSE |
| 7–14 | Arithmetic | ROP_ADD, ROP_SUB, ROP_MUL, ROP_DIV, ROP_MOD, ROP_NEG, ROP_ADDI, ROP_CONCAT |
| 15–20 | Comparison | ROP_EQ, ROP_NEQ, ROP_LT, ROP_LTEQ, ROP_NOT |
| 21–23 | Control Flow | ROP_JMP, ROP_JMPFALSE, ROP_JMPTRUE |
| 24–34 | Variables | ROP_GETGLOBAL, ROP_SETGLOBAL, ROP_GETFIELD, ROP_SETFIELD, ROP_GETUPVALUE, ROP_SETUPVALUE |
| 35–37 | Functions | ROP_CALL, ROP_RETURN, ROP_CLOSURE |
| 38–56 | Data Structures | ROP_NEWARRAY, ROP_NEWSTRUCT, ROP_NEWTUPLE, ROP_NEWENUM, ROP_BUILDRANGE |
| 58–63 | Exceptions/Defer | ROP_PUSH_HANDLER, ROP_POP_HANDLER, ROP_THROW, ROP_DEFER_PUSH |
| 65–76 | Phase/Concurrency | ROP_FREEZE, ROP_THAW, ROP_SCOPE, ROP_SELECT, ROP_IMPORT |
| 78–93 | Optimization | ROP_ADD_INT, ROP_INC_REG, ROP_DEC_REG, ROP_INVOKE_GLOBAL, ROP_CHECK_TYPE |
| 94 | Halt | ROP_HALT |

Stack VM Execution

The stack VM (src/vm.c) maintains a 4096-value operand stack and up to 256 call frames. Dispatch uses a switch statement on opcode values, with an optional computed-goto backend selected at compile time via USE_COMPUTED_GOTO.

Core Parameters

| Parameter | Value | Description |
|---|---|---|
| STACK_MAX | 4096 | Maximum operand stack depth |
| FRAMES_MAX | 256 | Maximum nested call frames |
| HANDLER_MAX | 64 | Exception handler stack depth |
| DEFER_MAX | 256 | Deferred operation stack depth |

Call Frames & Upvalues

Each call frame stores the return address (instruction pointer), a base pointer into the value stack, and the executing chunk. Closures capture variables via ObjUpvalue objects that form a linked list. Open upvalues point directly into the stack; when a variable goes out of scope, the upvalue is “closed” by copying the value into the upvalue itself.

```
// Closure representation in the VM
// For compiled closures:
//   body         == NULL
//   native_fn    == Chunk* (points to the function's chunk)
//   captured_env == ObjUpvalue** (upvalue array)
//   region_id    == upvalue count
// For C native functions:
//   body         == VM_NATIVE_MARKER ((Expr**)0x1)
//   native_fn    == VMNativeFn function pointer
```

Dispatch Modes

Two dispatch strategies are available:

Switch Dispatch

Standard C switch statement. Portable across all compilers. Each iteration fetches the next opcode and branches through the switch.

Computed Goto

GCC/Clang extension using &&label addresses and goto *dispatch_table[opcode]. Eliminates the central branch, improving branch predictor performance. Enabled with -DUSE_COMPUTED_GOTO.

Polymorphic Inline Cache (PIC)

Method dispatch uses a per-chunk PIC to avoid repeated hash-table lookups. Each call site has a 4-entry cache mapping (type_tag, method_hash) to a handler ID. Cache hits skip the full method resolution path.

```c
typedef struct {
    uint8_t  type_tag;     /* ValueType of the receiver */
    uint32_t method_hash;  /* djb2 hash of method name */
    uint16_t handler_id;   /* cached handler (0 = miss) */
} PICEntry;

typedef struct {
    PICEntry entries[4];   /* PIC_SIZE = 4 max entries */
    uint8_t count;
} PICSlot;

// 64 direct-mapped slots per chunk (PIC_DIRECT_SLOTS)
// Slot = bytecode_offset & 0x3F
```
Handler IDs: IDs 1–127 map to builtin methods (array: 1–24, string: 30–56, map: 60–70, buffer: 75–98, set: 100–111, enum: 115–119, channel: 120–122). IDs 128+ are closure methods (map, filter, reduce, etc.). ID 255 (PIC_NOT_BUILTIN) triggers struct/impl lookup.

Register VM Execution

The register VM (src/regvm.c) uses 256-register windows per call frame. Instead of pushing/popping an operand stack, instructions read and write directly to numbered registers, reducing memory traffic.

Core Parameters

| Parameter | Value | Description |
|---|---|---|
| REGVM_REG_MAX | 256 | Registers per frame |
| REGVM_FRAMES_MAX | 64 | Maximum call frame depth |
| REGVM_CONST_MAX | 65536 | Constants per chunk |
| REGVM_HANDLER_MAX | 64 | Exception handler depth |
| REGVM_DEFER_MAX | 256 | Deferred operations |

Register Window & Call Convention

The register file is a flat array of REGVM_REG_MAX × REGVM_FRAMES_MAX values. Each call frame has a base pointer (regs) into this array. On function call, the caller places arguments starting at the callee’s register 1 (register 0 is reserved for internal use). The callee writes its return value to caller_result_reg in the caller’s frame.

```c
typedef struct {
    RegChunk *chunk;
    RegInstr *ip;              /* instruction pointer */
    LatValue *regs;            /* base of register window */
    size_t reg_count;
    struct ObjUpvalue **upvalues;
    size_t upvalue_count;
    uint8_t caller_result_reg; /* where to write return value */
} RegCallFrame;
```

Instruction Decoding

Each 32-bit instruction is decoded by extracting fields with bitwise operations:

```c
RegInstr instr = *ip++;
uint8_t  op = instr & 0xFF;            // opcode
uint8_t  A  = (instr >> 8)  & 0xFF;    // register A
uint8_t  B  = (instr >> 16) & 0xFF;    // register B
uint8_t  C  = (instr >> 24) & 0xFF;    // register C
uint16_t Bx = (instr >> 16) & 0xFFFF;  // 16-bit operand
```

Memory Management

Lattice uses a region-based memory system with four distinct allocation strategies, identified by region_id tags on every value.

Region Tags

| Constant | Value | Description |
|---|---|---|
| REGION_NONE | (size_t)-1 | Standard malloc/free allocation |
| REGION_EPHEMERAL | (size_t)-2 | Ephemeral bump arena (fast, bulk-reset) |
| REGION_INTERNED | (size_t)-3 | Interned string (never freed) |
| REGION_CONST | (size_t)-4 | Constant pool string (freed with chunk) |

Ephemeral Bump Arena

The ephemeral arena is a fast bump allocator used for temporary string allocations (concatenation, interpolation). It allocates linearly from 4096-byte pages and is bulk-reset at statement boundaries via OP_RESET_EPHEMERAL.

```c
typedef struct ArenaPage {
    uint8_t *data;
    size_t used;
    size_t cap;              /* ARENA_PAGE_SIZE = 4096 */
    struct ArenaPage *next;
} ArenaPage;

typedef struct BumpArena {
    ArenaPage *pages;        /* current page */
    ArenaPage *first_page;   /* head of chain (kept across resets) */
    size_t total_bytes;
} BumpArena;
```

Escape Points

Values allocated in the ephemeral arena must be “promoted” to standard heap allocation when they escape the current statement. This happens automatically at these escape points:

| Escape Point | Trigger |
|---|---|
| OP_DEFINE_GLOBAL | Value bound to a global variable |
| OP_CALL | Value passed as argument to compiled closure |
| OP_SET_INDEX_LOCAL | Value stored into an array element |
| Array .push() | Value appended to an array |

Promotion is performed by vm_promote_value(), which copies the string data to a fresh malloc allocation and resets region_id to REGION_NONE. The value_free() function already skips values with non-REGION_NONE tags, preventing double-frees.

Phase Tags on Values

Every LatValue carries a PhaseTag that governs mutability:

| Tag | Meaning |
|---|---|
| VTAG_FLUID | Mutable: can be freely modified |
| VTAG_CRYSTAL | Immutable: frozen, can cross channel boundaries |
| VTAG_UNPHASED | No phase assigned (literals, temporaries) |
| VTAG_SUBLIMATED | Write-once: assigned once, then becomes crystal |

Concurrency Runtime

Lattice’s concurrency model is built on pthreads with structured scoping guarantees. The runtime provides channels for inter-thread communication and scope/spawn/select primitives.

Channels

```c
typedef struct LatChannel {
    LatVec buffer;                /* ring buffer of LatValue */
    bool closed;
    size_t refcount;              /* atomic reference counting */
    pthread_mutex_t mutex;
    pthread_cond_t cond_notempty;
    LatSelectWaiter *waiters;     /* linked list for select */
} LatChannel;
```

Channels use mutex-protected ring buffers. channel_send() locks the mutex, pushes the value, signals cond_notempty, and wakes all select waiters. channel_recv() blocks on the condition variable until a value is available or the channel closes. Reference counting (__atomic_add_fetch/__atomic_sub_fetch) manages channel lifetime.

Scope & Spawn

scope creates a structured concurrency boundary. Each spawn inside the scope launches a new pthread. The scope blocks until all spawned threads have joined. Variable capture uses deep-copy semantics—each spawned task gets an independent copy of captured values, ensuring isolation.

Select Multiplexing

select waits on multiple channels simultaneously. Each arm specifies a channel and a binding for the received value. The runtime uses Fisher-Yates shuffling to randomize the check order, preventing starvation of any single channel.

```c
// Select waiter structure
typedef struct LatSelectWaiter {
    pthread_mutex_t *mutex;
    pthread_cond_t *cond;
    struct LatSelectWaiter *next;
} LatSelectWaiter;

// Select registers a waiter on each channel,
// then waits until any channel has data.
// Fisher-Yates shuffle ensures fair ordering.
```
WASM: All pthread code is guarded with #ifndef __EMSCRIPTEN__. In WebAssembly builds, concurrency primitives are no-ops—channels still work as synchronous queues, but scope/spawn execute sequentially.

Bytecode Serialization

Lattice supports serializing compiled bytecode to .latc files, enabling ahead-of-time compilation and distribution of precompiled programs.

.latc File Format

A .latc file contains a serialized Chunk with all constants, bytecode, and metadata. The format is produced by the self-hosted compiler (compiler/latc.lat) and consumed directly by the VM.

```shell
# Compile and run a .latc file
$ clat compiler/latc.lat input.lat output.latc
$ clat output.latc
```

Self-Hosted Compiler

The self-hosted compiler is written in Lattice itself (compiler/latc.lat). It reads .lat source, performs lexing, parsing, and code generation, then serializes the resulting bytecode. This means the bytecode path has no remaining AST dependencies at runtime.

Current status: The self-hosted compiler handles expressions, variables, functions, closures, loops, control flow, structs, enums, compound index assignment, require/ensure contracts, optional chaining (?.), and try-propagate (?).

Design Constraints

The self-hosted compiler works within Lattice’s own semantics. Key patterns:

| Constraint | Solution |
|---|---|
| Maps/structs are pass-by-value | Parallel global arrays for compiler state |
| Buffers are pass-by-value | Global ser_buf for serialization output |
| No map literals | Map::new() with index assignment |
| Params need type annotations | fn foo(a: any, b: any) |

Performance Tuning

Lattice includes several optimization mechanisms across both VMs.

Computed Goto vs Switch

The stack VM supports computed goto dispatch via -DUSE_COMPUTED_GOTO. This replaces the central switch with a jump table, allowing each opcode handler to dispatch directly to the next without returning to a central branch point. On modern CPUs, this typically provides a 10–20% improvement in interpretation throughput by improving branch prediction.

Ephemeral Arena

String operations (concatenation, interpolation) allocate into the ephemeral bump arena instead of calling malloc. The arena is bulk-reset at each statement boundary via OP_RESET_EPHEMERAL, amortizing allocation cost to near zero for temporary strings. Pages are recycled rather than freed, eliminating repeated mmap/munmap system calls.

Stack VM vs Register VM

| Aspect | Stack VM | Register VM |
|---|---|---|
| Instruction size | Variable (1+ bytes) | Fixed (4 bytes) |
| Code density | Higher (fewer bytes) | Lower (fixed width) |
| Decode cost | Variable operand fetch | Single-word decode |
| Memory traffic | Frequent stack push/pop | Direct register access |
| Call frames | 256 max | 64 max |
| Constants per chunk | 65,536 (wide opcodes) | 65,536 (16-bit Bx) |

PIC Optimization

The polymorphic inline cache avoids repeated hash-table lookups for method dispatch. With 64 direct-mapped slots and 4 entries per slot, common monomorphic and dimorphic call sites achieve near-zero-cost dispatch after warmup. The cache is indexed by bytecode_offset & 0x3F, giving O(1) lookup with no hashing at the call site.

Specialized Integer Opcodes

Hot integer operations bypass the generic value dispatch path:

| Opcode | Effect |
|---|---|
| OP_ADD_INT / ROP_ADD_INT | Integer addition without type check |
| OP_SUB_INT / ROP_SUB_INT | Integer subtraction without type check |
| OP_LT_INT / ROP_LT_INT | Integer less-than without type check |
| OP_INC_LOCAL / ROP_INC_REG | In-place increment of local/register |
| OP_LOAD_INT8 | Load small integer constant inline (no pool lookup) |

Constant Hash Pre-computation

The Chunk structure includes a const_hashes array that stores pre-computed FNV-1a hashes for string constants. Global variable lookups use these cached hashes instead of recomputing them on every access, reducing the cost of OP_GET_GLOBAL and OP_SET_GLOBAL to a single hash-table probe.