RVE · Source Code Walkthrough
Instruction decoding, CSRs, traps & IRQs, ELF loading, Sv32 MMU, Linux boot
Roadmap
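Extension set (MISA): 0b01000000000101000001000100000001 → RV32 (MXL = 1) with I (Integer, bit 8) · M (Multiply, bit 12) · A (Atomic, bit 0) · S (Supervisor, bit 18) · U (User, bit 20); Machine mode is implied and has no MISA bit.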
rv32.h · types.h
pc = 0x80000000 · all xreg = 0
x11 = 0x1020 (DTB ptr for Linux)
privilege = PRIV_MACHINE (3)
Single flat 128 MiB uint8_t heap.
Address bit 31 distinguishes RAM from MMIO.
addr & 0x7FFFFFFF = physical offset.
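A minimal sketch of the address rule stated above. The helper names isRam and ramOffset are hypothetical and do not come from the emulator source; only the bit-31 test and the 0x7FFFFFFF mask are taken from the slide.

```cpp
#include <cstdint>

// Hypothetical helper names; the source only states the bit-31 rule and the mask.
constexpr uint32_t RAM_BIT  = 0x80000000u;
constexpr uint32_t RAM_MASK = 0x7FFFFFFFu;

// True when the address targets RAM rather than MMIO.
constexpr bool isRam(uint32_t addr) { return (addr & RAM_BIT) != 0; }

// Offset into the flat RAM buffer for a RAM address.
constexpr uint32_t ramOffset(uint32_t addr) { return addr & RAM_MASK; }
```

With this rule the reset PC 0x80000000 maps to offset 0 of the buffer, while an address such as 0x10000000 is treated as MMIO.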
write_reg / write_val — rd writeback
pc_val — next PC
csr_write / csr_val — CSR update
trap — exception/IRQ signal
ISA · 6 formats — slides 05–10 decode each one
R-type add sub mul div and or xor sll sra …
I-type addi lw jalr ecall csrrw …
S-type sw sh sb
B-type beq bne blt bge bltu bgeu
U-type lui auipc
J-type jal
rs1 is always [19:15] · rs2 always [24:20] · opcode always [6:0] — consistent field positions let the decoder read registers before knowing the instruction type. Next: dedicated animated slides for each format.
R-type add sub and or xor sll srl sra mul div rem …
opcode → funct3 → funct7 → rd → rs1 → rs2
Instruction bit 30 (funct7[5]): 0=ADD, 1=SUB (also SRL vs SRA)
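The R-type field layout can be sketched as plain shifts and masks. The struct RType and the functions decodeR and altOp are hypothetical illustrations, not the emulator's parse_FormatR; the bit positions follow the RISC-V base ISA.

```cpp
#include <cstdint>

// Hypothetical container for the six R-type fields.
struct RType { uint32_t opcode, rd, funct3, rs1, rs2, funct7; };

constexpr RType decodeR(uint32_t ins) {
    return RType{
        ins & 0x7Fu,          // opcode [6:0]
        (ins >> 7) & 0x1Fu,   // rd     [11:7]
        (ins >> 12) & 0x7u,   // funct3 [14:12]
        (ins >> 15) & 0x1Fu,  // rs1    [19:15]
        (ins >> 20) & 0x1Fu,  // rs2    [24:20]
        (ins >> 25) & 0x7Fu   // funct7 [31:25]
    };
}

// Instruction bit 30 selects SUB over ADD and SRA over SRL.
constexpr bool altOp(uint32_t ins) { return ((ins >> 30) & 1u) != 0; }
```

For example, 0x402081B3 encodes sub x3, x1, x2 and 0x002081B3 encodes add x3, x1, x2; the two words differ only in bit 30.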
I-type addi lw jalr ecall slli csrrw …
0010011=OP-IMM 0000011=LOAD 1100111=JALR 1110011=SYSTEM
S-type sw sh sb
Keeping rs1/rs2 at fixed positions [19:15] and [24:20] means the register file can be read before the opcode is fully decoded. The immediate is split to accommodate this — no rd field exists.
The two immediate fields, imm[11:5] in bits [31:25] and imm[4:0] in bits [11:7], form one 12-bit immediate.
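A short sketch of reassembling the split S-type immediate. The function name immS is hypothetical; the sign extension uses the XOR-and-subtract idiom rather than whatever the emulator's parser does internally.

```cpp
#include <cstdint>

// Rebuild the 12-bit S-type immediate and sign-extend it to 32 bits.
constexpr int32_t immS(uint32_t ins) {
    return static_cast<int32_t>(
               ((((ins >> 25) & 0x7Fu) << 5)   // imm[11:5] from bits [31:25]
                | ((ins >> 7) & 0x1Fu))        // imm[4:0]  from bits [11:7]
               ^ 0x800u)                       // flip the sign bit (bit 11)
           - 0x800;                            // then subtract it back out
}
```

As a check, 0x0020A423 encodes sw x2, 8(x1) and 0xFE20AE23 encodes sw x2, -4(x1).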
B-type beq bne blt bge bltu bgeu
{imm[12], imm[11], imm[10:5], imm[4:1], 0}
offset = 8 = 0b00001000 · range: ±4 KiB
The four immediate fragments sit in bits [31], [7], [30:25] and [11:8], in the order listed above.
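The B-type reassembly can be sketched the same way. immB is a hypothetical name; the fragment positions follow the RISC-V base ISA, and bit 0 of the offset is always zero.

```cpp
#include <cstdint>

// Rebuild the 13-bit B-type branch offset and sign-extend it to 32 bits.
constexpr int32_t immB(uint32_t ins) {
    return static_cast<int32_t>(
               ((((ins >> 31) & 0x1u)  << 12)    // imm[12]   from bit 31
                | (((ins >> 7)  & 0x1u)  << 11)  // imm[11]   from bit 7
                | (((ins >> 25) & 0x3Fu) << 5)   // imm[10:5] from bits [30:25]
                | (((ins >> 8)  & 0xFu)  << 1))  // imm[4:1]  from bits [11:8]
               ^ 0x1000u)                        // flip the sign bit (bit 12)
           - 0x1000;
}
```

For instance, 0x00208463 encodes beq x1, x2 with offset +8, and 0xFE208CE3 encodes the same branch with offset -8.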
U-type lui auipc
LUI x1, hi20 + ADDI x1, x1, lo12
If lo12 bit[11] = 1 (negative), hi20 must be incremented by 1 to compensate sign-extension.
AUIPC adds the upper immediate to PC instead — useful for PIC code.
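The hi20 compensation can be worked through in a few lines. HiLo, splitHiLo and luiAddi are hypothetical names used only for this illustration; adding 0x800 before the shift is one common way to apply the increment exactly when lo12 bit 11 is set.

```cpp
#include <cstdint>

// Hypothetical pair holding the LUI and ADDI operands for one 32-bit constant.
struct HiLo { uint32_t hi20; int32_t lo12; };

constexpr HiLo splitHiLo(uint32_t value) {
    return HiLo{
        ((value + 0x800u) >> 12) & 0xFFFFFu,  // rounds up when lo12 is negative
        static_cast<int32_t>((value & 0xFFFu) ^ 0x800u) - 0x800  // sign-extended low 12 bits
    };
}

// What LUI followed by ADDI leaves in the register (32-bit wraparound).
constexpr uint32_t luiAddi(HiLo p) {
    return (p.hi20 << 12) + static_cast<uint32_t>(p.lo12);
}
```

For 0x12345FFF the split is hi20 = 0x12346 and lo12 = -1: the LUI overshoots by one page and the ADDI pulls the value back.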
J-type jal
{imm[20], imm[19:12], imm[11], imm[10:1], 0}
offset=8: imm[10:1]=0000000100 (bit 3 set) → PC+8.
Range: ±1 MiB · x0 as rd = unconditional jump (no link).
The four immediate fragments sit in bits [31], [19:12], [20] and [30:21], in the order listed above; rd is [11:7] and the opcode is [6:0].
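A matching sketch for the J-type offset. immJ is a hypothetical name; the fragment positions follow the RISC-V base ISA.

```cpp
#include <cstdint>

// Rebuild the 21-bit J-type jump offset and sign-extend it to 32 bits.
constexpr int32_t immJ(uint32_t ins) {
    return static_cast<int32_t>(
               ((((ins >> 31) & 0x1u)   << 20)    // imm[20]    from bit 31
                | (((ins >> 12) & 0xFFu)  << 12)  // imm[19:12] from bits [19:12]
                | (((ins >> 20) & 0x1u)   << 11)  // imm[11]    from bit 20
                | (((ins >> 21) & 0x3FFu) << 1))  // imm[10:1]  from bits [30:21]
               ^ 0x100000u)                       // flip the sign bit (bit 20)
           - 0x100000;
}
```

The word 0x0080006F encodes jal x0, +8 (the offset=8 case above), and 0xFFDFF06F encodes jal x0, -4.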
emu.cpp
parse_FormatR — register–register ops
parse_FormatB — branches (scattered imm)
All 6 format parsers are pre-called once per instruction at the top of insSelect() before any dispatch — the compiler eliminates dead computations. B/J immediates reassemble scattered bits with sign-extension via the sign bit mask 0x80000000.
emu.cpp · insSelect()
Different instructions use different bit fields as discriminators. The 7-bit opcode alone isn't enough: funct3 and funct7 disambiguate. Each switch uses a mask that exposes only the relevant discriminator bits.
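One way to picture the masked switches is a cascade from the most specific mask to the least specific. The function classify below is a hypothetical stand-in that returns a mnemonic instead of executing anything; the real insSelect() may order or group its switches differently.

```cpp
#include <cstdint>
#include <string>

// Hypothetical dispatcher sketch: each switch keeps only the discriminator bits.
inline std::string classify(uint32_t ins) {
    switch (ins & 0xFE00707Fu) {      // funct7 + funct3 + opcode
        case 0x00000033u: return "add";
        case 0x40000033u: return "sub";
        case 0x02000033u: return "mul";
    }
    switch (ins & 0x0000707Fu) {      // funct3 + opcode
        case 0x00000013u: return "addi";
        case 0x00002003u: return "lw";
        case 0x00000063u: return "beq";
    }
    switch (ins & 0x0000007Fu) {      // opcode only
        case 0x00000037u: return "lui";
        case 0x0000006Fu: return "jal";
    }
    return "unknown";
}
```

Only a handful of instructions are listed; the point is that add and sub share opcode and funct3 and are separated solely by the funct7 bits in the widest mask.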
emu.cpp · Macro DSL
The three core macros
Implementations look like inline specs
AS_SIGNED / AS_UNSIGNED reinterpret bits without conversion — pointer cast trick to avoid UB via *(int32_t*)&val.
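For comparison, the same bit reinterpretation can be written with std::memcpy, which is a swapped-in alternative to the pointer cast and not what the macros are described as doing. The function names below are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical function equivalents of AS_SIGNED / AS_UNSIGNED.
inline int32_t asSigned(uint32_t v) {
    int32_t out;
    std::memcpy(&out, &v, sizeof out);  // copies the bit pattern unchanged
    return out;
}

inline uint32_t asUnsigned(int32_t v) {
    uint32_t out;
    std::memcpy(&out, &v, sizeof out);
    return out;
}

// SLT and SLTU differ only in how the same register bits are interpreted.
inline bool slt(uint32_t a, uint32_t b)  { return asSigned(a) < asSigned(b); }
inline bool sltu(uint32_t a, uint32_t b) { return a < b; }
```

With a = 0xFFFFFFFF and b = 1, slt sees -1 < 1 while sltu sees the largest unsigned value, so the two comparisons disagree.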
types.h · emu.cpp
insReturnNoop() zeroes the struct and sets pc_val = pc + 4. Instruction handlers only populate fields they affect — most only call WR_RD.
After insSelect() returns, emulate() commits all side-effects: apply write_reg, advance pc, write CSR, then call handleIrqAndTrap() to check for exceptions.
write_reg == 0 → write is silently discarded at commit time. No special casing inside instruction implementations needed.
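The result-struct pattern can be sketched with trimmed-down, hypothetical versions of the types. The field names write_reg, write_val and pc_val come from the slides; the Cpu struct, the commit function and the signature of insReturnNoop here are assumptions, and CSR and trap fields are left out.

```cpp
#include <cstdint>

// Hypothetical, reduced versions of the result struct and CPU state.
struct InsRet { uint32_t write_reg; uint32_t write_val; uint32_t pc_val; };
struct Cpu    { uint32_t pc; uint32_t xreg[32]; };

// Default result: no register write, fall through to the next instruction.
inline InsRet insReturnNoop(const Cpu& cpu) {
    return InsRet{0u, 0u, cpu.pc + 4u};
}

// Commit step: apply the register write, then advance the PC.
inline void commit(Cpu& cpu, const InsRet& ret) {
    if (ret.write_reg != 0u) {          // writes to x0 are dropped here
        cpu.xreg[ret.write_reg] = ret.write_val;
    }
    cpu.pc = ret.pc_val;
}
```

Because the x0 check lives in the commit step, a handler can name rd = 0 without any special case of its own.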
rv32.h · rv32.cpp
rv32.cpp · readCsrRaw / writeCsrRaw
Supervisor sees a masked view of machine state
Exposes only the SSTATUS-legal bits of MSTATUS: SD, MXR, SUM, XS, FS, SPP, SPIE, SIE. M-mode bits (MPP, MPIE, MIE) are hidden from S-mode.
Bit pattern 0b001000100010 exposes only SEIP (bit 9), STIP (bit 5), SSIP (bit 1) — the supervisor-visible interrupt bits within MIE/MIP.
SSTATUS, SIE, SIP have no backing storage. They're computed on read and written back as masked updates to the M-mode registers. One source of truth.
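The masked-view idea can be sketched as two pure functions over MSTATUS. The mask is built from the bit positions the RV32 privileged spec assigns to the fields listed above; the constant and function names are hypothetical and the emulator's readCsrRaw / writeCsrRaw may structure this differently.

```cpp
#include <cstdint>

// SD(31) MXR(19) SUM(18) XS(16:15) FS(14:13) SPP(8) SPIE(5) SIE(1)
constexpr uint32_t SSTATUS_MASK = 0x80000000u | 0x00080000u | 0x00040000u |
                                  0x00018000u | 0x00006000u | 0x00000100u |
                                  0x00000020u | 0x00000002u;
// SEIP(9) STIP(5) SSIP(1), i.e. 0b001000100010
constexpr uint32_t SIP_MASK = 0x222u;

// Reading SSTATUS: a filtered view of MSTATUS, no storage of its own.
constexpr uint32_t readSstatus(uint32_t mstatus) {
    return mstatus & SSTATUS_MASK;
}

// Writing SSTATUS: update only the visible bits, keep M-mode bits intact.
constexpr uint32_t writeSstatus(uint32_t mstatus, uint32_t value) {
    return (mstatus & ~SSTATUS_MASK) | (value & SSTATUS_MASK);
}
```

An S-mode write of all ones therefore cannot set MPP or change MIE, because those bits fall outside the mask.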
emu.cpp · CSRRW / CSRRS / CSRRC
Before any instruction runs, ins_FormatCSR.value is populated by calling getCsr(). Instructions receive the old value in ins.value — enabling atomic semantics.
Read happens before dispatch. Write is committed via WR_CSR in the result struct after the instruction returns. The old value goes to rd simultaneously — true read-modify-write with no window.
CSRRWI / CSRRSI / CSRRCI use a 5-bit zero-extended immediate from ins.rs instead of a register value. Same atomic semantics, different source.
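The read-then-write semantics of the three CSR operations fit in a few lines. CsrResult and the function names are hypothetical; each function takes the old CSR value (what the slides call ins.value) and the source operand, and returns both the value for rd and the value to commit to the CSR.

```cpp
#include <cstdint>

// Hypothetical result pair: old value for rd, new value for the CSR.
struct CsrResult { uint32_t rd_val; uint32_t csr_val; };

constexpr CsrResult csrrw(uint32_t old, uint32_t src) { return CsrResult{old, src}; }
constexpr CsrResult csrrs(uint32_t old, uint32_t src) { return CsrResult{old, old | src}; }
constexpr CsrResult csrrc(uint32_t old, uint32_t src) { return CsrResult{old, old & ~src}; }

// Immediate variants use the 5-bit rs1 field, zero-extended, as the source.
constexpr uint32_t zimm(uint32_t rsField) { return rsField & 0x1Fu; }
```

In every case rd receives the value the CSR held before the instruction, which is what makes a swap or a set-and-test a single instruction.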
rv32.h · rv32.cpp
mret / sret restore privilege from MSTATUS.MPP / SSTATUS.SPP and re-enable interrupts via MPIE→MIE (SPIE→SIE for sret). Delegation registers (MIDELEG / MEDELEG) control which privilege level handles each trap; they are checked in handleTrap().
rv32.cpp · handleIrqAndTrap()
After commit, handleIrqAndTrap(&ret) checks: did the instruction raise a trap? Are any interrupts pending and enabled?
If ret.trap.en is set, skip IRQ scan — handle the synchronous exception. IRQs only checked when no synchronous trap fired.
MEIP → MSIP → MTIP → SEIP → SSIP → STIP — first match wins. MIP bit is cleared after handling.
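The first-match-wins scan can be sketched as a loop over the priority order. pickIrq and the constant names are hypothetical; the sketch only ANDs pending against enabled bits and deliberately leaves out the global enable and privilege checks a full implementation needs.

```cpp
#include <cstdint>

// MIP/MIE bit positions from the privileged spec.
constexpr uint32_t IRQ_SSIP = 1u << 1, IRQ_MSIP = 1u << 3;
constexpr uint32_t IRQ_STIP = 1u << 5, IRQ_MTIP = 1u << 7;
constexpr uint32_t IRQ_SEIP = 1u << 9, IRQ_MEIP = 1u << 11;

// Returns the highest-priority pending-and-enabled bit, or 0 if none.
inline uint32_t pickIrq(uint32_t mip, uint32_t mie) {
    const uint32_t order[] = {IRQ_MEIP, IRQ_MSIP, IRQ_MTIP,
                              IRQ_SEIP, IRQ_SSIP, IRQ_STIP};
    const uint32_t active = mip & mie;
    for (uint32_t bit : order) {
        if (active & bit) return bit;   // first match wins
    }
    return 0u;
}
```

If both a machine timer and a supervisor external interrupt are pending and enabled, the machine timer is taken first.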
rv32.cpp · handleTrap()
TVEC[1:0] == 0 → Direct: all traps jump to base.
TVEC[1:0] != 0 → Vectored: jump to base + 4 × cause.
Vectored mode lets the hardware jump directly to per-interrupt handlers without a dispatch table in software.
MIE → saved to MPIE, then MIE = 0 (disable further IRQs).
Current privilege → saved to MPP.
Privilege → new_privilege.
mret reverses this: MPIE→MIE, MPP→privilege.
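The M-mode trap entry and return steps can be sketched as below. Hart, enterMachineTrap, mret and trapTarget are hypothetical names, not the emulator's handleTrap(). Two details are assumptions taken from the privileged spec rather than from the slides: vectoring is applied only to interrupts, and mret leaves MPIE set and MPP cleared.

```cpp
#include <cstdint>

constexpr uint32_t MSTATUS_MIE  = 1u << 3;
constexpr uint32_t MSTATUS_MPIE = 1u << 7;
constexpr uint32_t MSTATUS_MPP  = 3u << 11;

// Hypothetical minimal hart state.
struct Hart { uint32_t mstatus; uint32_t privilege; };

inline void enterMachineTrap(Hart& h) {
    const uint32_t mie = (h.mstatus >> 3) & 1u;
    h.mstatus &= ~(MSTATUS_MIE | MSTATUS_MPIE | MSTATUS_MPP);
    h.mstatus |= (mie << 7) | (h.privilege << 11);  // MIE -> MPIE, privilege -> MPP
    h.privilege = 3u;                                // now in M-mode, MIE = 0
}

inline void mret(Hart& h) {
    const uint32_t mpie = (h.mstatus >> 7) & 1u;
    const uint32_t mpp  = (h.mstatus >> 11) & 3u;
    h.mstatus &= ~(MSTATUS_MIE | MSTATUS_MPP);       // spec: MPP reset to U (0)
    h.mstatus |= (mpie << 3) | MSTATUS_MPIE;         // MPIE -> MIE, spec: MPIE = 1
    h.privilege = mpp;
}

// Trap target from TVEC; the spec vectors interrupts only (assumption here).
constexpr uint32_t trapTarget(uint32_t tvec, uint32_t cause, bool isInterrupt) {
    return ((tvec & 3u) != 0u && isInterrupt) ? (tvec & ~3u) + 4u * cause
                                              : (tvec & ~3u);
}
```

A trap taken from S-mode with MIE set leaves MPIE = 1, MIE = 0 and MPP = 1; the following mret restores S-mode with interrupts enabled again.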
rv32.h · rv32.cpp · CLINT
memGetByte / memSetByte have case-per-byte entries for all 8 CLINT registers. Little-endian: byte 0 at base, byte 3 at base+3. 64-bit time split across two 32-bit registers.
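Byte-wise access to a packed 32-bit register reduces to a shift by 8 × index. The helpers byteOf and withByte are hypothetical and stand in for the per-byte switch cases; no CLINT addresses are assumed.

```cpp
#include <cstdint>

// Read byte `index` (0 = least significant) of a 32-bit register.
constexpr uint8_t byteOf(uint32_t reg, uint32_t index) {
    return static_cast<uint8_t>((reg >> (8u * index)) & 0xFFu);
}

// Return `reg` with byte `index` replaced by `value`.
constexpr uint32_t withByte(uint32_t reg, uint32_t index, uint8_t value) {
    return (reg & ~(0xFFu << (8u * index)))
         | (static_cast<uint32_t>(value) << (8u * index));
}
```

A four-byte store to a register then becomes four withByte calls with index 0 to 3, matching the little-endian layout described above.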
rv32.cpp · uartTick()
Two u32s hold 8 UART registers, each accessed via UART_GET1/2 + shift macros.
rbr_thr_ier_iir: RBR·THR·IER·IIR
lcr_mcr_lsr_scr: LCR·MCR·LSR·SCR
RBR != 0 && IER.RXINT → IIR_RD_AVAILABLE (4)
THR == 0 && IER.THRE → IIR_THR_EMPTY (2)
Otherwise → IIR_NO_INTERRUPT (7)
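The three IIR rules above can be sketched as one function. computeIir and the IER bit constants are hypothetical names; the IER bit positions (bit 0 for receive, bit 1 for THR empty) follow the 16550 register layout, and the IIR values are the ones given on the slide.

```cpp
#include <cstdint>

constexpr uint32_t IIR_THR_EMPTY    = 2u;
constexpr uint32_t IIR_RD_AVAILABLE = 4u;
constexpr uint32_t IIR_NO_INTERRUPT = 7u;

// 16550 IER bits (names are assumptions for this sketch).
constexpr uint32_t IER_RXINT_BIT = 1u << 0;
constexpr uint32_t IER_THRE_BIT  = 1u << 1;

// Receive data takes precedence over an empty transmit register.
constexpr uint32_t computeIir(uint32_t rbr, uint32_t thr, uint32_t ier) {
    return (rbr != 0u && (ier & IER_RXINT_BIT)) ? IIR_RD_AVAILABLE
         : (thr == 0u && (ier & IER_THRE_BIT))  ? IIR_THR_EMPTY
                                                : IIR_NO_INTERRUPT;
}
```

Anything other than IIR_NO_INTERRUPT would correspond to the uart.interrupting = true case.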
uart.interrupting = true → emu.cpp sets MIP.SEIP → handleIrqAndTrap fires trap_SupervisorExternalInterrupt.
loader.cpp · loadElf()
ELF virtual addresses start at 0x80000000. Masking with 0x7FFFFFFF gives the physical offset into the 128 MiB flat buffer — byte 0 of RAM.
rv32.cpp · memGetByte
memGetByte first asserts addr & 0x80000000. After the MMIO switch table falls through, the final check is return mem[addr & 0x7FFFFFFF] — no branches for the hot path.
Device Tree Blob is read from a separate pointer (cpu.dtb), mapped at 0x1020. x11 is pre-set to this address on reset following the Linux boot protocol.
memSetByte mirrors the structure: same MMIO addresses in a switch. UART writes to THR trigger uartUpdateIir() immediately. CLINT writes update the packed 32-bit fields byte-by-byte.
No DMA, no PCI, no GPU. Just CLINT (timer + software IRQ) and UART 16550-compatible serial port — enough to boot Linux with a console.
Makefile · main.cpp · app.cpp
Compiles src/*.cpp + ImGui + ImPlot + disassembler. Flags: g++ -std=c++17 -O2 -Wall. Links SDL2 + OpenGL + Cocoa/IOKit (macOS) or libGL + SDL2 (Linux). Output: build/rve.
Builds then launches ./build/rve with no extra flags — GUI mode (SDL2 + ImGui). Emulator paused on start. Load a binary from the UI or pass -e <elf> / -b <bin>.
Runs all 60 ISA compliance tests: ./build/rve -re <test> per binary. -r = auto-start (emu.running=true), -e <file> = load ELF. All 60 tests must exit cleanly.
Downloads linux-6.1.14-rv32nommu image, then: ./build/rve -n -b Image. Flag -n = headless — no SDL window, pure terminal I/O. stdin set to raw mode so keystrokes go directly to guest UART.
Same image download, then: ./build/rve -r -b Image. Opens GUI window and auto-starts emulator (-r sets emu.running=true). Linux UART output visible in the ImGui console panel.
main.cpp: if -n → runHeadless() tight loop (while (emu.running) emu.emulate()). Otherwise → App SDL2 window + ImGui render loop calling emu.emulate() each frame.
assets/isa-test/ · Makefile · RISC-V ISA compliance
Each is a bare-metal ELF32 loaded at 0x80000000. No OS, no MMU (-p- = physical-only mode). Test runs in M-mode with no trap delegation. Exercises raw instruction semantics.
Pass: test writes 0x55 to SYSCON at 0x11100000 → syscon_cmd = 0x5555 → emulator prints SYSCON POWEROFF and halts. Fail: any illegal instruction trap or loop timeout before SYSCON write.
Each test ships with a *.dump disassembly, excluded from the run by filter-out %.dump. Cross-reference the dump with a failing PC to locate the broken instruction.
loader.cpp · rv32.cpp · RISC-V Linux boot protocol
Not an ELF — a flat binary: Linux 6.1.14 rv32nommu kernel + BusyBox initramfs. Load address is 0x80000000; loading at buffer offset 0 is identical since 0x80000000 & 0x7FFFFFFF = 0.
A compiled .dtb mapped at mem[0x1020] via a separate cpu.dtb pointer. Describes the machine: 1 hart at 100 MHz, RAM at 0x80000000 (128 MiB), UART at 0x10000000, CLINT at 0x02000000.
Standard convention: a0 = hart ID, a1 = physical DTB address. Kernel entry reads DTB immediately to discover memory map, configure UART and CLINT drivers, then starts the scheduler. No firmware (BBL/OpenSBI) layer needed — kernel boots directly in M-mode.
rv32.cpp · mmuUpdate() · Sv32 ISA spec §4.3
SATP CSR (0x180) — supervisor address translation & protection
PTE format — 32-bit entry at each level of the page table
rv32.cpp · mmuTranslate() · called on every fetch, load & store
32-bit virtual address decomposition
4 KiB page (leaf at level 0): PPN[1]<<22 | PPN[0]<<12 | offset[11:0]
4 MiB megapage (leaf at level 1; PPN[0] must be 0, else fault): PPN[1]<<22 | VPN[0]<<12 | offset[11:0]
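A compact sketch of the two-level walk, following the Sv32 layout in the privileged spec. Translation and sv32Walk are hypothetical names, readWord stands in for memGetWord(), and root is the page-table base (SATP.PPN << 12). Permission, A and D checks are omitted, and physical addresses are truncated to 32 bits.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result type; the emulator reports faults through ret.trap.
struct Translation { bool fault; uint32_t paddr; };

inline Translation sv32Walk(uint32_t vaddr, uint32_t root,
                            const std::function<uint32_t(uint32_t)>& readWord) {
    const uint32_t vpn[2] = {(vaddr >> 12) & 0x3FFu, (vaddr >> 22) & 0x3FFu};
    const uint32_t offset = vaddr & 0xFFFu;
    uint32_t table = root;
    for (int level = 1; level >= 0; --level) {
        const uint32_t pte = readWord(table + vpn[level] * 4u);
        const bool v = (pte & 1u) != 0, r = (pte & 2u) != 0;
        const bool w = (pte & 4u) != 0, x = (pte & 8u) != 0;
        if (!v || (!r && w)) return Translation{true, 0u};   // invalid PTE
        const uint32_t ppn0 = (pte >> 10) & 0x3FFu;
        const uint32_t ppn1 = (pte >> 20) & 0xFFFu;
        if (r || x) {                                        // leaf PTE
            if (level == 1) {
                if (ppn0 != 0u) return Translation{true, 0u};  // misaligned megapage
                return Translation{false, (ppn1 << 22) | (vpn[0] << 12) | offset};
            }
            return Translation{false, (ppn1 << 22) | (ppn0 << 12) | offset};
        }
        table = (pte >> 10) << 12;                           // next-level table
    }
    return Translation{true, 0u};                            // no leaf found
}
```

Passing the memory reader as a callback keeps the sketch testable against a handful of hand-built page-table entries.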
rv32.cpp · mmuTranslate() · Linux rv32nommu
All set ret.trap.en and propagate through handleIrqAndTrap(). Delegated to S-mode if MEDELEG bit is set — kernel's page fault handler runs.
Hardware does not auto-set A or D bits — the OS must set them before a page is accessed. Any access to A=0, or write to D=0, faults identically to a permission failure and must be handled by the kernel.
rv32nommu kernel never executes csrw satp. mmu.mode stays MMU_MODE_OFF = 0 for the entire run. Every memory access hits the first branch and returns the physical address immediately — the full Sv32 walk code is compiled in but never reached at runtime.
emu.cpp · rv32.h · RV32F + RV32D
Bits [7:5]: frm — rounding mode (RNE · RTZ · RDN · RUP · RMM)
Bits [4:0]: fflags — exception flags (NV · DZ · OF · UF · NX)
Accessible as CSR_FFLAGS (0x001), CSR_FRM (0x002), CSR_FCSR (0x003)
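The FCSR layout above maps to simple field accessors. The function names are hypothetical; the bit positions are the ones listed on the slide.

```cpp
#include <cstdint>

// frm lives in bits [7:5], fflags in bits [4:0].
constexpr uint32_t fcsrFrm(uint32_t fcsr)    { return (fcsr >> 5) & 0x7u; }
constexpr uint32_t fcsrFflags(uint32_t fcsr) { return fcsr & 0x1Fu; }

// Compose FCSR from its two fields, as a write to CSR_FCSR would.
constexpr uint32_t fcsrCompose(uint32_t frm, uint32_t fflags) {
    return ((frm & 0x7u) << 5) | (fflags & 0x1Fu);
}

// Writes through CSR_FRM or CSR_FFLAGS touch only their own field.
constexpr uint32_t fcsrWithFrm(uint32_t fcsr, uint32_t frm) {
    return fcsrCompose(frm, fcsrFflags(fcsr));
}
constexpr uint32_t fcsrWithFflags(uint32_t fcsr, uint32_t fflags) {
    return fcsrCompose(fcsrFrm(fcsr), fflags);
}
```

The three CSR addresses are thus three views of one 8-bit value, in the same spirit as the SSTATUS view of MSTATUS.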
All rv32{f,d} ISA tests pass. hello_linux printf of floats runs natively — musl libc soft-float helpers (__adddf3, __floatsidf) execute correctly via the F/D extensions.
hello_linux/framebuff.c · /dev/fb0
RVE's Linux kernel exposes /dev/fb0 via MMIO. The program uses standard POSIX calls (open / ioctl / write) — no special emulator hooks needed. The entire graphics stack runs as normal Linux userspace.
hsv2rgb() uses fmodf / fabsf. The 3-D cube uses cosf / sinf for rotation and float perspective divide — all running as native RV32F instructions through the emulator's F-extension.
That's a wrap
Full RISC-V pipeline — from instruction bits to Linux userspace in ~1000 lines of C++
Format parsers and CSR pre-reads happen unconditionally. Dispatch is dead-simple masked switches. 60 bare-metal ISA tests validate every instruction independently before running Linux.
Sv32 is a two-level table walk: 2× memGetWord() calls plus permission gates. The entire translation is ~20 lines. nommu Linux bypasses it in the very first branch — zero overhead.
Linux boot needs two things: flat Image at mem[0] and a1 = DTB pointer. The DTB encodes the entire machine description — no firmware layer (OpenSBI/BBL) needed. Kernel runs directly in M-mode.