RVE · Source Code Walkthrough

RISC-V
Emulator
Internals

Instruction decoding, CSRs, traps & IRQs, ELF loading, Sv32 MMU, Linux boot

RV32I RV32M RV32A RV32F RV32D Supervisor Machine Sv32 MMU Linux Framebuffer
01 / 33

Roadmap

What we're covering

  • RV32 CPU State — registers, memory, peripherals
  • Instruction Formats — R / I / S / B / U / J
  • Format Parsers — bit-field extraction
  • insSelect() — masked-switch dispatch
  • imp / run Macros — instruction DSL
  • CSR Architecture — 4096-entry flat array
  • Shadow Registers — SSTATUS ⊂ MSTATUS
  • Privilege Levels — U / S / M modes
  • Traps & IRQs — delegation, TVEC, EPC
  • ELF Loading — section mapping to RAM
  • Build & ISA Tests — make all / run / isas / linux
  • Sv32 MMU — SATP, PTE, two-level walk
  • Linux Boot — flat image, DTB, nommu mode

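Extension set: 0b01000000000101000001000100000001 (MISA) → MXL = 1 (RV32) · A (Atomic) · I (Integer) · M (Multiply/Divide) · S (Supervisor) · U (User). The M bit encodes the multiply/divide extension; Machine mode has no MISA bit of its own.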
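To make the bit string concrete, the sketch below lists the extension letters set in MISA[25:0] and reads the MXL field. The helper names misaExtensions and misaMxl are illustrative and are not taken from RVE.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns the extension letters whose bits are set in MISA[25:0]
// (bit 0 = 'A', bit 1 = 'B', ... bit 25 = 'Z').
std::string misaExtensions(uint32_t misa) {
    std::string out;
    for (int bit = 0; bit < 26; ++bit) {
        if (misa & (1u << bit)) out += static_cast<char>('A' + bit);
    }
    return out;
}

// MXL lives in MISA[31:30]; the value 1 means XLEN = 32.
uint32_t misaMxl(uint32_t misa) { return misa >> 30; }
```

For the value on this slide (0x40141101 in hex), misaExtensions yields "AIMSU" and misaMxl yields 1.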

02 / 33

rv32.h · types.h

RV32 CPU State

class RV32 {
 u32 clock;            // cycle counter
 u32 xreg[32];        // x0 – x31
 u32 pc;               // program counter
 u8 *mem;             // 128 MiB flat RAM
 csr_state csr;        // 4096 × u32
 clint_state clint;    // timer
 uart_state uart;     // serial
 bool reservation_en; // LR/SC
};

Reset state

pc = 0x80000000 · all xreg = 0 except x11
x11 = 0x1020 (DTB ptr for Linux)
privilege = PRIV_MACHINE (3)

Memory model

Single flat 128 MiB uint8_t heap.
Address bit 31 distinguishes RAM from MMIO.
addr & 0x7FFFFFFF = physical offset.

ins_ret — result bus

write_reg / write_val — rd writeback
pc_val — next PC
csr_write / csr_val — CSR update
trap — exception/IRQ signal

03 / 33

ISA · 6 formats — slides 05–10 decode each one

The 6 RISC-V Instruction Formats

R-type  add sub mul div and or xor sll sra …

[31:25] funct7 · [24:20] rs2 · [19:15] rs1 · [14:12] fn3 · [11:7] rd · [6:0] opcode

I-type  addi lw jalr ecall csrrw …

[31:20] imm[11:0] · [19:15] rs1 · [14:12] fn3 · [11:7] rd · [6:0] opcode

S-type  sw sh sb

[31:25] imm[11:5] · [24:20] rs2 · [19:15] rs1 · [14:12] fn3 · [11:7] imm[4:0] · [6:0] opcode

B-type  beq bne blt bge bltu bgeu

[31] imm[12] · [30:25] imm[10:5] · [24:20] rs2 · [19:15] rs1 · [14:12] fn3 · [11:8] imm[4:1] · [7] imm[11] · [6:0] opcode

U-type  lui auipc

[31:12] imm[31:12] · [11:7] rd · [6:0] opcode

J-type  jal

[31] imm[20] · [30:21] imm[10:1] · [20] imm[11] · [19:12] imm[19:12] · [11:7] rd · [6:0] opcode

rs1 is always [19:15] · rs2 always [24:20] · opcode always [6:0] — consistent field positions let the decoder read registers before knowing the instruction type. Next: dedicated animated slides for each format.

04 / 33

R-type  add sub and or xor sll srl sra mul div rem …

Register–Register Operations

ADD x2, x1, x3  →  0x00308133
field   funct7   rs2      rs1      fn3      rd      opcode
bits    [31:25]  [24:20]  [19:15]  [14:12]  [11:7]  [6:0]
value   0000000  00011    00001    000      00010   0110011

Decode order

opcode → funct3 → funct7 → rd → rs1 → rs2
funct7 bit[30]: 0=ADD, 1=SUB (also SRL vs SRA)
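The worked encoding above can be checked mechanically. This is a minimal sketch that packs the six R-type fields into a word; encodeR is a hypothetical helper written for this slide, not part of emu.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Pack the six R-type fields into a 32-bit instruction word.
constexpr uint32_t encodeR(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                           uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7fu) << 25) | ((rs2 & 0x1fu) << 20) |
           ((rs1 & 0x1fu) << 15) | ((funct3 & 0x7u) << 12) |
           ((rd & 0x1fu) << 7) | (opcode & 0x7fu);
}

// ADD x2, x1, x3 from the slide.
static_assert(encodeR(0x00, 3, 1, 0, 2, 0x33) == 0x00308133u, "ADD");
// SUB differs only in instruction bit 30 (funct7 = 0100000).
static_assert(encodeR(0x20, 3, 1, 0, 2, 0x33) == 0x40308133u, "SUB");
```

The two static_assert lines show that ADD and SUB share every field except instruction bit 30.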


05 / 33

I-type  addi lw jalr ecall slli csrrw …

Immediate Operations

ADDI x1, x2, 42  →  0x02A10093
field   imm[11:0]     rs1      fn3      rd      opcode
bits    [31:20]       [19:15]  [14:12]  [11:7]  [6:0]
value   000000101010  00010    000      00001   0010011

4 opcode families use I-type

0010011=OP-IMM   0000011=LOAD   1100111=JALR   1110011=SYSTEM


06 / 33

S-type  sw sh sb

Store Operations — Split Immediate

SW x3, 8(x1)  →  0x0030A423
field   imm[11:5]  rs2      rs1      fn3      imm[4:0]  opcode
bits    [31:25]    [24:20]  [19:15]  [14:12]  [11:7]    [6:0]
value   0000000    00011    00001    010      01000     0100011

Why split the immediate?

Keeping rs1/rs2 at fixed positions [19:15] and [24:20] means the register file can be read before the opcode is fully decoded. The immediate is split to accommodate this — no rd field exists.
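Reassembling the split immediate takes two shifts and a sign extension. The sketch below is an illustration in the style of the parsers shown later in the deck; immS is an assumed name, not RVE's actual parse_FormatS.

```cpp
#include <cassert>
#include <cstdint>

// Rebuild the 12-bit S-type immediate and sign-extend it to 32 bits.
constexpr uint32_t immS(uint32_t word) {
    uint32_t imm = (((word >> 25) & 0x7fu) << 5)   // imm[11:5] <- inst[31:25]
                 | ((word >> 7) & 0x1fu);          // imm[4:0]  <- inst[11:7]
    if (word & 0x80000000u) imm |= 0xfffff000u;    // replicate the sign bit
    return imm;
}

static_assert(immS(0x0030A423u) == 8u, "SW x3, 8(x1)");
static_assert(immS(0xFE30AE23u) == 0xFFFFFFFCu, "SW x3, -4(x1)");
```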


07 / 33

B-type  beq bne blt bge bltu bgeu

Branch — Scrambled Immediate

BEQ x1, x2, 8  →  0x00208463
field   imm[12]  imm[10:5]  rs2      rs1      fn3      imm[4:1]  imm[11]  opcode
bits    [31]     [30:25]    [24:20]  [19:15]  [14:12]  [11:8]    [7]      [6:0]
value   0        000000     00010    00001    000      0100      0        1100011

Reassembly

{imm[12], imm[11], imm[10:5], imm[4:1], 0}
offset = 8 = 0b00001000 · range: ±4 KiB


08 / 33

U-type  lui auipc

Upper Immediate — Simplest Format

LUI x5, 0xABCDE  →  0xABCDE2B7
field   imm[31:12]            rd      opcode
bits    [31:12]               [11:7]  [6:0]
value   10101011110011011110  00101   0110111

32-bit constant synthesis

LUI x1, hi20  +  ADDI x1, x1, lo12
If lo12 bit[11] = 1 (negative), hi20 must be incremented by 1 to compensate sign-extension.
AUIPC adds the upper immediate to PC instead, which is useful for position-independent code (PIC).
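The hi20 compensation rule can be written as a single rounding step. This is a sketch under the assumption that the constant is split the way an assembler's %hi/%lo would split it; HiLo, splitConstant and rebuild are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
    uint32_t hi20;  // operand for LUI
    int32_t  lo12;  // sign-extended operand for ADDI, in [-2048, 2047]
};

constexpr HiLo splitConstant(uint32_t value) {
    // Adding 0x800 bumps hi20 by one exactly when lo12 bit 11 is set.
    uint32_t hi20 = ((value + 0x800u) >> 12) & 0xfffffu;
    int32_t lo12 = static_cast<int32_t>(value & 0xfffu);
    if (lo12 >= 0x800) lo12 -= 0x1000;
    return {hi20, lo12};
}

// What LUI followed by ADDI computes (32-bit wrap-around arithmetic).
constexpr uint32_t rebuild(HiLo p) {
    return (p.hi20 << 12) + static_cast<uint32_t>(p.lo12);
}

static_assert(splitConstant(0x12345FFFu).hi20 == 0x12346u, "hi20 rounded up");
static_assert(splitConstant(0x12345FFFu).lo12 == -1, "lo12 is negative");
static_assert(rebuild(splitConstant(0xDEADBEEFu)) == 0xDEADBEEFu, "round trip");
```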


09 / 33

J-type  jal

Jump and Link — Most Scrambled

JAL x1, 8  →  0x008000EF
field   imm[20]  imm[10:1]   imm[11]  imm[19:12]  rd      opcode
bits    [31]     [30:21]     [20]     [19:12]     [11:7]  [6:0]
value   0        0000000100  0        00000000    00001   1101111

Reassembly & range

{imm[20], imm[19:12], imm[11], imm[10:1], 0}
offset=8: imm[10:1]=0000000100 (bit 3 set) → PC+8.
Range: ±1 MiB · x0 as rd = unconditional jump (no link).
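A decoder for the scrambled J immediate might look like the sketch below. The field pattern 0 0000000100 0 00000000 00001 1101111 packs to the word 0x008000EF, which decodes back to an offset of 8. immJ is an assumed name; RVE's own parse_FormatJ is not shown in this deck.

```cpp
#include <cassert>
#include <cstdint>

// Rebuild the 21-bit J-type offset (bit 0 is always 0) and sign-extend it.
constexpr uint32_t immJ(uint32_t word) {
    uint32_t imm = ((word >> 20) & 0x7feu)    // imm[10:1]  <- inst[30:21]
                 | ((word >> 9) & 0x800u)     // imm[11]    <- inst[20]
                 | (word & 0xff000u);         // imm[19:12] <- inst[19:12]
    if (word & 0x80000000u) imm |= 0xfff00000u;  // imm[20] and sign bits
    return imm;
}

static_assert(immJ(0x008000EFu) == 8u, "JAL x1, 8");
static_assert(immJ(0x004000EFu) == 4u, "JAL x1, 4");
static_assert(immJ(0xFFDFF06Fu) == 0xFFFFFFFCu, "JAL x0, -4");
```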


10 / 33

emu.cpp

Format Parsers — Bit Extraction

parse_FormatR — register–register ops

FormatR parse_FormatR(u32 word) {
 FormatR ret;
 ret.rd = (word >> 7) & 0x1f;
 ret.rs1 = (word >> 15) & 0x1f;
 ret.rs2 = (word >> 20) & 0x1f;
 ret.rs3 = (word >> 27) & 0x1f;
 return ret;
}

parse_FormatB — branches (scattered imm)

FormatB parse_FormatB(u32 word) {
 FormatB ret;
 ret.rs1 = (word >> 15) & 0x1f;
 ret.rs2 = (word >> 20) & 0x1f;
 ret.imm =
  (word & 0x80000000 ? 0xfffff000:0)
  | ((word << 4) & 0x00000800) // bit 11
  | ((word >> 20) & 0x000007e0) // bits 10:5
  | ((word >> 7) & 0x0000001e); // bits 4:1
 return ret;
}

All 6 format parsers are pre-called once per instruction at the top of insSelect() before any dispatch — the compiler eliminates dead computations. B/J immediates reassemble scattered bits with sign-extension via the sign bit mask 0x80000000.

11 / 33

emu.cpp · insSelect()

Multi-Stage Masked Dispatch

Why masks?

Different instructions occupy different bit fields. A single opcode byte isn't enough — funct3 and funct7 disambiguate. Each switch uses a mask that exposes only the relevant discriminator bits.

7 mask stages

  • 0x0000007f — opcode only (lui, jal)
  • 0x0000707f — opcode + funct3 (addi, lw…)
  • 0xf800707f — AMO ops
  • 0xfc00707f — shift immediates
  • 0xfe00707f — R-type arithmetic
  • 0xfe007fff — sfence.vma
  • 0xffffffff — exact match (ecall, mret…)
ins_ret Emulator::insSelect(u32 ins_word) {
 u32 ins_masked;
 ins_ret ret = cpu.insReturnNoop();

 // Pre-parse ALL formats upfront
 FormatR ins_FormatR = parse_FormatR(ins_word);
 FormatI ins_FormatI = parse_FormatI(ins_word);
 FormatCSR ins_FormatCSR = parse_FormatCSR(ins_word);
 // ...S, U, J, B, Empty...

 // CSR pre-read if this looks like a CSR op
 if ((ins_word & 0x73) == 0x73)
  ins_FormatCSR.value = cpu.getCsr(ins_FormatCSR.csr, &ret);

 ins_masked = ins_word & 0x0000007f;
 switch (ins_masked) {
  run(auipc, 0x00000017, ins_FormatU)
  run(jal, 0x0000006f, ins_FormatJ)
  run(lui, 0x00000037, ins_FormatU)
 }
 ins_masked = ins_word & 0x0000707f;
 switch (ins_masked) { /* addi, lw, beq, csrrw … */ }
 // …5 more masked switch stages…
}
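The staged-mask idea can be reduced to a few lines. The sketch below is a simplified stand-in for insSelect(), assuming only three of the seven stages and four instructions; the Op enum and classify function are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Lui, Addi, Add, Sub, Unknown };

// Try progressively wider masks; the first stage with a matching case wins.
Op classify(uint32_t w) {
    switch (w & 0x0000007fu) {          // stage 1: opcode only
        case 0x00000037u: return Op::Lui;
    }
    switch (w & 0x0000707fu) {          // stage 2: opcode + funct3
        case 0x00000013u: return Op::Addi;
    }
    switch (w & 0xfe00707fu) {          // stage 3: opcode + funct3 + funct7
        case 0x00000033u: return Op::Add;
        case 0x40000033u: return Op::Sub;
    }
    return Op::Unknown;
}
```

Each stage ignores bits that are operands for its instruction class, so 0x00308133 (ADD x2, x1, x3) falls through the first two switches and matches only once funct7 is included in the mask.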
12 / 33

emu.cpp · Macro DSL

Instruction Implementation Pattern

The three core macros

// Define an instruction handler
#define imp(name, fmt_t, code) \
 void Emulator::emu_##name(u32 w, \
  ins_ret *ret, fmt_t ins) { code }

// Dispatch: match → call → return
#define run(name, opcode, insf) \
 case opcode: \
  if (debugMode) ins_p(name) \
  emu_##name(ins_word, &ret, insf); \
  return ret;

// Result write helpers
#define WR_RD(code) { \
 ret->write_reg = ins.rd; \
 ret->write_val = AS_UNSIGNED(code); }
#define WR_PC(code) { ret->pc_val = code; }
#define WR_CSR(code) { \
 ret->csr_write = ins.csr; ret->csr_val = code; }

Implementations look like inline specs

imp(add, FormatR, { // rv32i
 WR_RD(AS_SIGNED(cpu.xreg[ins.rs1])
  + AS_SIGNED(cpu.xreg[ins.rs2]));
})

imp(beq, FormatB, { // rv32i
 if (cpu.xreg[ins.rs1]
  == cpu.xreg[ins.rs2])
   WR_PC(cpu.pc + ins.imm);
})

imp(amoswap_w, FormatR, { // rv32a
 u32 tmp = cpu.memGetWord(cpu.xreg[ins.rs1]);
 cpu.memSetWord(cpu.xreg[ins.rs1], cpu.xreg[ins.rs2]);
 WR_RD(tmp)
})

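AS_SIGNED / AS_UNSIGNED reinterpret the bits without a value conversion, using the pointer cast *(int32_t*)&val. Reading a u32 through its signed counterpart is permitted by the aliasing rules, so the cast itself is not UB.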
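An alternative to the pointer cast is std::memcpy, which compilers typically reduce to a register move. The sketch below is not RVE's macro definition; asSigned, asUnsigned, lessThanSigned and addWrap are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the same 32 bits as signed or unsigned.
inline int32_t asSigned(uint32_t v) {
    int32_t s;
    std::memcpy(&s, &v, sizeof s);
    return s;
}

inline uint32_t asUnsigned(int32_t v) {
    uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}

// Signed comparison, as SLT or BLT need it.
inline bool lessThanSigned(uint32_t a, uint32_t b) {
    return asSigned(a) < asSigned(b);
}

// ADD wraps modulo 2^32; doing the sum on unsigned values avoids
// signed-overflow undefined behaviour.
inline uint32_t addWrap(uint32_t a, uint32_t b) { return a + b; }
```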

13 / 33

types.h · emu.cpp

ins_ret — The Result Bus

typedef struct {
 u32 write_reg; // rd index
 u32 write_val; // rd value
 u32 pc_val; // next PC
 u32 csr_write; // CSR addr
 u32 csr_val; // CSR value
 Trap trap; // exception
} ins_ret;

typedef struct {
 bool en; // pending?
 bool irq; // interrupt?
 u32 type; // cause code
 u32 value; // tval
} Trap;

Noop default

insReturnNoop() zeroes the struct and sets pc_val = pc + 4. Instruction handlers only populate fields they affect — most only call WR_RD.

Commit phase

After insSelect() returns, emulate() commits all side-effects: apply write_reg, advance pc, write CSR, then call handleIrqAndTrap() to check for exceptions.

x0 hard-wired zero

write_reg == 0 → write is silently discarded at commit time. No special casing inside instruction implementations needed.
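The commit step for the register file and PC could be sketched as follows. Result and Cpu are trimmed-down stand-ins for ins_ret and RV32, and commit is a hypothetical name; CSR writes and trap handling are left out.

```cpp
#include <cassert>
#include <cstdint>

struct Result {
    uint32_t write_reg;  // rd index (0 means "no writeback")
    uint32_t write_val;  // value for rd
    uint32_t pc_val;     // next PC
};

struct Cpu {
    uint32_t xreg[32] = {};
    uint32_t pc = 0x80000000u;
};

// Apply one instruction's side effects. Writes to x0 are dropped here,
// so handlers never need to special-case rd == 0.
void commit(Cpu& cpu, const Result& r) {
    if (r.write_reg != 0) cpu.xreg[r.write_reg & 31u] = r.write_val;
    cpu.pc = r.pc_val;
}
```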

14 / 33

rv32.h · rv32.cpp

CSR Architecture

typedef struct {
 u32 data[4096]; // all CSRs by addr
 u32 privilege; // current mode
} csr_state;

// Address encodes privilege in [9:8]
// and read-only in [11:10] == 0b11
bool hasCsrAccessPrivilege(u32 addr) {
 u32 req = (addr >> 8) & 0x3;
 return req <= csr.privilege;
}

void setCsr(...) {
 bool ro = ((addr >> 10) & 0x3) == 0x3;
 if (ro) → trap_IllegalInstruction
}

Address space layout

  • 0x000–0x0FF — User CSRs (U-mode)
  • 0x100–0x1FF — Supervisor CSRs
  • 0x300–0x3FF — Machine CSRs
  • 0xC00–0xCFF — User read-only counters
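The two address-encoded checks combine into one predicate. This is a sketch of the rule rather than RVE's exact getCsr/setCsr logic; the function names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Lowest privilege level allowed to touch this CSR: address bits [9:8].
constexpr uint32_t csrMinPrivilege(uint32_t addr) { return (addr >> 8) & 0x3u; }

// Address bits [11:10] == 0b11 mark a read-only CSR.
constexpr bool csrIsReadOnly(uint32_t addr) { return ((addr >> 10) & 0x3u) == 0x3u; }

// True if the access is legal; otherwise the caller would raise
// an illegal-instruction trap.
constexpr bool csrAccessAllowed(uint32_t addr, uint32_t privilege, bool isWrite) {
    return csrMinPrivilege(addr) <= privilege &&
           !(isWrite && csrIsReadOnly(addr));
}

static_assert(csrMinPrivilege(0x300) == 3, "MSTATUS needs M-mode");
static_assert(csrIsReadOnly(0xC00), "CYCLE is read-only");
static_assert(!csrAccessAllowed(0x300, 1, false), "S-mode cannot read MSTATUS");
```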

Key registers

  • MSTATUS 0x300 — global interrupt enable, MPP/SPP
  • MTVEC 0x305 — trap handler address
  • MEPC 0x341 — return address after trap
  • MCAUSE 0x342 — trap cause code
  • MIE/MIP 0x304/0x344 — enable/pending
  • MIDELEG 0x303 — delegate to S-mode
15 / 33

rv32.cpp · readCsrRaw / writeCsrRaw

Shadow Registers — SSTATUS ⊂ MSTATUS

Supervisor sees a masked view of machine state

u32 readCsrRaw(u32 addr) {
 switch (addr) {
 case CSR_SSTATUS:
  // SSTATUS is MSTATUS & legal S-bits
  return csr.data[CSR_MSTATUS]
    & 0x000de162;
 case CSR_SIE:
  return csr.data[CSR_MIE] & 0x222;
 case CSR_SIP:
  return csr.data[CSR_MIP] & 0x222;
 case CSR_CYCLE:
  return clock;
 case CSR_TIME:
  return clint.mtime_lo;
 default:
  return csr.data[addr & 0xfff]; // 12-bit CSR address, 4096 entries
 }
}

The 0x000de162 mask

Exposes only the SSTATUS-legal bits of MSTATUS: MXR, SUM, XS, FS, SPP, SPIE, SIE, plus bit 6 (UBE in recent privileged specs). SD (bit 31) lies outside this mask, and the M-mode bits (MPP, MPIE, MIE) are hidden from S-mode.

0x222 delegation mask (SIE/SIP)

Bit pattern 0b001000100010 exposes only SEIP (bit 9), STIP (bit 5), SSIP (bit 1) — the supervisor-visible interrupt bits within MIE/MIP.

No separate storage

SSTATUS, SIE, SIP have no backing storage. They're computed on read and written back as masked updates to the M-mode registers. One source of truth.
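The write side of a shadow register is a masked merge into the backing M-mode register. The sketch below assumes the same mask is used for reads and writes, which the slide does not confirm; readShadow and writeShadow are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSstatusMask = 0x000de162u;  // mask from the slide

// S-mode view: only the masked bits of the M-mode register are visible.
constexpr uint32_t readShadow(uint32_t mval, uint32_t mask) {
    return mval & mask;
}

// S-mode write: replace the masked bits, keep every other M-mode bit.
constexpr uint32_t writeShadow(uint32_t mval, uint32_t sval, uint32_t mask) {
    return (mval & ~mask) | (sval & mask);
}

// MSTATUS with MIE, MPIE and MPP = 3 set (0x1888): an all-ones SSTATUS
// write cannot disturb those bits.
static_assert(writeShadow(0x00001888u, 0xFFFFFFFFu, kSstatusMask) == 0x000DF9EAu, "merge");
static_assert(readShadow(0x000DF9EAu, kSstatusMask) == 0x000DE162u, "view");
```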

16 / 33

emu.cpp · CSRRW / CSRRS / CSRRC

CSR Instructions — Atomic Read-Modify-Write

// CSRRW rd, csr, rs1
// rd = CSR; CSR = rs1
imp(csrrw, FormatCSR, {
 WR_CSR(cpu.xreg[ins.rs]);
 WR_RD(ins.value) // pre-read
})

// CSRRS rd, csr, rs1
// rd = CSR; if rs1 != 0: CSR |= rs1
imp(csrrs, FormatCSR, {
 u32 rs = cpu.xreg[ins.rs];
 if (rs != 0) WR_CSR(ins.value | rs);
 WR_RD(ins.value)
})

// CSRRC rd, csr, rs1
// rd = CSR; if rs1 != 0: CSR &= ~rs1
imp(csrrc, FormatCSR, {
 u32 rs = cpu.xreg[ins.rs];
 if (rs != 0) WR_CSR(ins.value & ~rs);
 WR_RD(ins.value)
})

Pre-read in insSelect

Before any instruction runs, ins_FormatCSR.value is populated by calling getCsr(). Instructions receive the old value in ins.value — enabling atomic semantics.

Atomicity model

Read happens before dispatch. Write is committed via WR_CSR in the result struct after the instruction returns. The old value goes to rd simultaneously — true read-modify-write with no window.

Immediate variants

CSRRWI / CSRRSI / CSRRCI use a 5-bit zero-extended immediate from ins.rs instead of a register value. Same atomic semantics, different source.

17 / 33

rv32.h · rv32.cpp

Privilege Levels

Machine (3)

  • Boots here
  • Full hardware access
  • Owns M-mode CSRs
  • Final trap handler
  • Controls delegation

Supervisor (1)

  • OS kernel runs here
  • Sees shadow CSRs
  • Handles delegated traps
  • Controls U-mode
  • SATP / Sv32 MMU

User (0)

  • User processes
  • No privileged ops
  • ecall → S/M trap
  • No direct CSR access
  • Sandboxed by OS
#define PRIV_USER 0
#define PRIV_SUPERVISOR 1
#define PRIV_MACHINE 3

// Stored in csr.privilege
// Changed by traps and xRET

mret / sret restore privilege from MSTATUS.MPP / SSTATUS.SPP and re-enable interrupts via MPIE→MIE (SPIE→SIE for sret). Delegation registers (MIDELEG / MEDELEG) control which privilege level handles each trap; they are checked in handleTrap().

18 / 33

rv32.cpp · handleIrqAndTrap()

Trap & IRQ Entry Point

Called every instruction

After commit, handleIrqAndTrap(&ret) checks: did the instruction raise a trap? Are any interrupts pending and enabled?

Priority: traps first

If ret.trap.en is set, skip IRQ scan — handle the synchronous exception. IRQs only checked when no synchronous trap fired.

MIP scan order

MEIP → MSIP → MTIP → SEIP → SSIP → STIP — first match wins. MIP bit is cleared after handling.

void handleIrqAndTrap(ins_ret *ret) {
 bool trap = ret->trap.en;
 u32 cur_mip = readCsrRaw(CSR_MIP);
 u32 mip_reset = MIP_ALL;

 if (!trap) {
  // Check: which IRQs are enabled?
  u32 mirq = cur_mip
   & readCsrRaw(CSR_MIE);

  switch (mirq & MIP_ALL) {
   HANDLE(MIP_MEIP, trap_MachineExternalInterrupt)
   HANDLE(MIP_MSIP, trap_MachineSoftwareInterrupt)
   HANDLE(MIP_MTIP, trap_MachineTimerInterrupt)
   HANDLE(MIP_SEIP, trap_SupervisorExternalInterrupt)
  }
 }
 bool irq = (mip_reset != MIP_ALL);
 if (trap || irq) {
  bool ok = handleTrap(ret, irq);
  if (ok && irq) // clear MIP bit
   writeCsrRaw(CSR_MIP,
    cur_mip & ~mip_reset);
 }
}
19 / 33

rv32.cpp · handleTrap()

Trap Handler — Delegation & Vector

// 1. Determine target privilege via delegation
u32 mdeleg = readCsrRaw(isIRQ
  ? CSR_MIDELEG : CSR_MEDELEG);
u32 pos = t.type & 0xFFFF; // cause bit
u32 new_priv =
 ((mdeleg >> pos) & 1) == 0
  ? PRIV_MACHINE // M handles it
  : PRIV_SUPERVISOR; // delegated to S

// 2. Write trap registers
writeCsrRaw(EPC, pc); // save return addr
writeCsrRaw(CAUSE, t.type); // why we trapped
writeCsrRaw(TVAL, t.value); // bad addr / insn

// 3. Jump to handler vector
ret->pc_val = readCsrRaw(TVEC);
if ((ret->pc_val & 0x3) != 0) {
 // Vectored mode: base + 4 * cause
 ret->pc_val = (ret->pc_val & ~0x3)
             + 4 * pos;
}

// 4. Update MSTATUS / SSTATUS
// MIE→MPIE, privilege→MPP, MIE=0

TVEC modes

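TVEC[1:0] == 0 → Direct: all traps jump to base.
TVEC[1:0] != 0 → Vectored: jump to base + 4 × cause. Vectored mode lets the hardware jump directly to per-interrupt handlers without a dispatch table in software. In the privileged spec only interrupts are vectored; synchronous exceptions still enter at base.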
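The target address computation fits in one function. Note the assumption: this sketch follows the privileged spec (MODE = 1 is vectored, and only interrupts are offset), which is slightly stricter than the snippet on this slide. trapVector is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Compute the handler address from xTVEC, the cause number, and whether
// the trap is an interrupt.
constexpr uint32_t trapVector(uint32_t tvec, uint32_t cause, bool isInterrupt) {
    const uint32_t base = tvec & ~0x3u;
    const bool vectored = (tvec & 0x3u) == 1u;
    return (vectored && isInterrupt) ? base + 4u * cause : base;
}

static_assert(trapVector(0x80000100u, 7, true) == 0x80000100u, "direct mode");
static_assert(trapVector(0x80000101u, 7, true) == 0x8000011Cu, "vectored IRQ");
static_assert(trapVector(0x80000101u, 2, false) == 0x80000100u, "exception");
```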

MSTATUS update

MIE → saved to MPIE, then MIE = 0 (disable further IRQs).
Current privilege → saved to MPP.
Privilege → new_privilege.
mret reverses this: MPIE→MIE, MPP→privilege.
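The MSTATUS bookkeeping for M-mode trap entry and mret can be sketched as a pair of functions. Hart is a stand-in for the CPU state, and the final mret steps (MPIE = 1, MPP = U) follow the privileged spec; whether RVE performs them is an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMie      = 1u << 3;   // MSTATUS.MIE
constexpr uint32_t kMpie     = 1u << 7;   // MSTATUS.MPIE
constexpr uint32_t kMppShift = 11;        // MSTATUS.MPP is bits [12:11]
constexpr uint32_t kMppMask  = 3u << kMppShift;

struct Hart {
    uint32_t mstatus;
    uint32_t privilege;  // 0 = U, 1 = S, 3 = M
};

// Trap taken into M-mode: MIE -> MPIE, privilege -> MPP, MIE = 0.
void trapEnterMachine(Hart& h) {
    const uint32_t mie = (h.mstatus & kMie) ? 1u : 0u;
    h.mstatus &= ~(kMie | kMpie | kMppMask);
    h.mstatus |= mie << 7;
    h.mstatus |= (h.privilege & 3u) << kMppShift;
    h.privilege = 3;
}

// mret: MPIE -> MIE, MPP -> privilege, then MPIE = 1 and MPP = U.
void mret(Hart& h) {
    const uint32_t mpie = (h.mstatus & kMpie) ? 1u : 0u;
    const uint32_t mpp = (h.mstatus & kMppMask) >> kMppShift;
    h.mstatus &= ~(kMie | kMppMask);
    h.mstatus |= mpie << 3;
    h.mstatus |= kMpie;
    h.privilege = mpp;
}
```

Starting from S-mode with MIE set, a trap followed by mret returns to S-mode with interrupts enabled again.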

20 / 33

rv32.h · rv32.cpp · CLINT

CLINT — Core-Local Interruptor

typedef struct {
 bool msip; // SW interrupt
 u32 mtimecmp_lo; // timer compare
 u32 mtimecmp_hi;
 u32 mtime_lo; // current time
 u32 mtime_hi;
} clint_state;

Memory-mapped registers

  • 0x02000000 — MSIP (software IRQ)
  • 0x02004000 — MTIMECMP lo
  • 0x02004004 — MTIMECMP hi
  • 0x0200BFF8 — MTIME lo
  • 0x0200BFFC — MTIME hi

Timer interrupt flow

  1. OS writes MTIMECMP = MTIME + period
  2. Hardware increments MTIME each cycle (via clint_state)
  3. When MTIME >= MTIMECMP: MIP.MTIP = 1
  4. handleIrqAndTrap() detects MIP_MTIP & MIE.MTIE
  5. Trap to MTVEC with cause = trap_MachineTimerInterrupt

Byte-level access

memGetByte / memSetByte have case-per-byte entries for all 8 CLINT registers. Little-endian: byte 0 at base, byte 3 at base+3. 64-bit time split across two 32-bit registers.

21 / 33

rv32.cpp · uartTick()

UART — Serial Interrupt Path

Packed registers

Two u32s hold 8 UART registers, each accessed via UART_GET1/2 + shift macros.
rbr_thr_ier_iir: RBR·THR·IER·IIR
lcr_mcr_lsr_scr: LCR·MCR·LSR·SCR

IIR update rule

RBR != 0 && IER.RXINT → IIR_RD_AVAILABLE (4)
THR == 0 && IER.THRE → IIR_THR_EMPTY (2)
Otherwise → IIR_NO_INTERRUPT (7)

SEIP path

uart.interrupting = true → emu.cpp sets MIP.SEIP → handleIrqAndTrap fires trap_SupervisorExternalInterrupt.

void uartTick() {
 // TX: flush THR to stdout periodically
 if ((clock & 0x16) == 0
  && UART_GET1(THR) != 0) {
  printf("%c", (char)UART_GET1(THR));
  UART_SET1(THR, 0);
  UART_SET2(LSR, LSR | LSR_THR_EMPTY);
  uartUpdateIir();
  if (UART_GET1(IER) & IER_THREINT_BIT)
   uart.thre_ip = true;
 }
 // Set interrupting flag if any IP
 uart.interrupting =
  uart.thre_ip || rx_ip;
}
22 / 33

loader.cpp · loadElf()

ELF Loading — Sections to RAM

Steps

  1. Read Elf32_Ehdr (52 bytes)
  2. Verify ELFMAG magic
  3. Reject ELF64 (not supported)
  4. Read all e_shnum section headers
  5. Filter SHT_PROGBITS sections
  6. Copy each section to data[sh_addr & 0x7FFFFFFF]

Address stripping

ELF virtual addresses start at 0x80000000. Masking with 0x7FFFFFFF gives the physical offset into the 128 MiB flat buffer — byte 0 of RAM.
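A loader would normally pair the mask with a bounds check before copying. The sketch below assumes the 128 MiB size stated earlier; ramOffset and kRamSize are illustrative names, and the check itself is not something the slide shows RVE doing.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint32_t kRamSize = 128u * 1024u * 1024u;  // 0x08000000 bytes

// Map a guest address range to an offset into the flat RAM buffer.
// Returns nullopt if bit 31 is clear (not RAM) or the range overruns RAM.
std::optional<uint32_t> ramOffset(uint32_t addr, uint32_t len) {
    if ((addr & 0x80000000u) == 0) return std::nullopt;
    const uint32_t off = addr & 0x7FFFFFFFu;
    if (len > kRamSize || off > kRamSize - len) return std::nullopt;
    return off;
}
```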

// Collect loadable sections
for (const auto &sh : sh_tbl) {
 if (sh.sh_type == SHT_PROGBITS) {
  ElfSection section{
   sh.sh_addr & 0x7FFFFFFF,
   sh.sh_offset,
   sh.sh_size
  };
  sections.push_back(section);
 }
}

// Copy each section from the file into emulated RAM
for (auto &s : sections) {
 s.sData.resize(s.size);
 lseek(fd, s.offset, SEEK_SET);
 read(fd, s.sData.data(), s.size);
 std::copy(s.sData.begin(),
      s.sData.end(),
      data + s.addr_real);
}
23 / 33

rv32.cpp · memGetByte

Physical Memory Map

0x00001020 — 0x00001FFF  DTB blob
0x02000000 — 0x0200BFFF  CLINT
0x10000000 — 0x10000007  UART 16550
0x80000000 — 0x87FFFFFF  128 MiB RAM

Bit-31 fast path

memGetByte first asserts addr & 0x80000000. After the MMIO switch table falls through, the final check is return mem[addr & 0x7FFFFFFF] — no branches for the hot path.

DTB access

Device Tree Blob is read from a separate pointer (cpu.dtb), mapped at 0x1020. x11 is pre-set to this address on reset following the Linux boot protocol.

Write path

memSetByte mirrors the structure: same MMIO addresses in a switch. UART writes to THR trigger uartUpdateIir() immediately. CLINT writes update the packed 32-bit fields byte-by-byte.

Emulated peripherals

No DMA, no PCI, no GPU. Just CLINT (timer + software IRQ) and UART 16550-compatible serial port — enough to boot Linux with a console.

24 / 33

Makefile · main.cpp · app.cpp

Building & Running RVE

make all

Compiles src/*.cpp + ImGui + ImPlot + disassembler. Flags: g++ -std=c++17 -O2 -Wall. Links SDL2 + OpenGL + Cocoa/IOKit (macOS) or libGL + SDL2 (Linux). Output: build/rve.

make run

Builds then launches ./build/rve with no extra flags — GUI mode (SDL2 + ImGui). Emulator paused on start. Load a binary from the UI or pass -e <elf> / -b <bin>.

make isas

Runs all 60 ISA compliance tests: ./build/rve -re <test> per binary. -r = auto-start (emu.running=true), -e <file> = load ELF. All 60 tests must exit cleanly.

make linuxn

Downloads linux-6.1.14-rv32nommu image, then: ./build/rve -n -b Image. Flag -n = headless — no SDL window, pure terminal I/O. stdin set to raw mode so keystrokes go directly to guest UART.

make linux

Same image download, then: ./build/rve -r -b Image. Opens GUI window and auto-starts emulator (-r sets emu.running=true). Linux UART output visible in the ImGui console panel.

Entry point split

main.cpp: if -n → runHeadless() tight loop (while (emu.running) emu.emulate()). Otherwise → App SDL2 window + ImGui render loop calling emu.emulate() each frame.

25 / 33

assets/isa-test/ · Makefile · RISC-V ISA compliance

ISA Test Suite — make isas

# Filter dump files, iterate binaries
ISA_TEST_FILES = $(filter-out %.dump, \
  $(notdir $(wildcard $(ISA_TEST_DIR)/*)) \
)

isas: all
	$(foreach test, $(ISA_TEST_FILES), \
	  ./build/rve -re isa-test/$(test);)

# -r = emu.running=true (auto-start)
# -e <file> = load as ELF via loadElf()
# ; = continue on failure, don't stop

Test suites — 60 total

  • rv32ui-p-* — RV32I integer (43): add/sub/load/store/branch/jump/lui/auipc…
  • rv32um-p-* — RV32M multiply/divide (8): mul/mulh/div/rem…
  • rv32ua-p-* — RV32A atomics (9): amoswap/amoadd/amoand/lrsc…
  • rv32mi/si-p-* — Machine/Supervisor CSR tests

Binary format

Each is a bare-metal ELF32 loaded at 0x80000000. No OS, no MMU (-p- = physical-only mode). Test runs in M-mode with no trap delegation. Exercises raw instruction semantics.

Pass / fail mechanism

Pass: test writes 0x55 to SYSCON at 0x11100000 → syscon_cmd = 0x5555 → emulator prints SYSCON POWEROFF and halts. Fail: any illegal instruction trap or loop timeout before SYSCON write.

Dump files

Each test ships with a *.dump disassembly, excluded from the run by filter-out %.dump. Cross-reference the dump with a failing PC to locate the broken instruction.

26 / 33

loader.cpp · rv32.cpp · RISC-V Linux boot protocol

Linux Image Loading & Boot Setup

// loadLinuxImage() — flat binary, not ELF
int loadLinuxImage(path, data, len) {
 ifstream file(path, binary | ate);
 // ≈7–8 MiB: kernel + initramfs
 file.read((char*)data, file_size);
 // data[0..N] = Image bytes
}

// CPU reset state for Linux boot:
pc         = 0x80000000; // → mem[0]
xreg[0xa] = 0x0000;   // a0 = hart ID (0)
xreg[0xb] = 0x1020;   // a1 = DTB ptr

Image format

Not an ELF — a flat binary: Linux 6.1.14 rv32nommu kernel + BusyBox initramfs. Load address is 0x80000000; loading at buffer offset 0 is identical since 0x80000000 & 0x7FFFFFFF = 0.

Device Tree Blob (DTB)

A compiled .dtb mapped at mem[0x1020] via a separate cpu.dtb pointer. Describes the machine: 1 hart at 100 MHz, RAM at 0x80000000 (128 MiB), UART at 0x10000000, CLINT at 0x02000000.

RISC-V Linux boot ABI

Standard convention: a0 = hart ID, a1 = physical DTB address. Kernel entry reads DTB immediately to discover memory map, configure UART and CLINT drivers, then starts the scheduler. No firmware (BBL/OpenSBI) layer needed — kernel boots directly in M-mode.

27 / 33

rv32.cpp · mmuUpdate() · Sv32 ISA spec §4.3

Sv32 MMU — SATP & Page Table Entry Format

SATP CSR (0x180) — supervisor address translation & protection

[31] MODE · [30:22] ASID (ignored) · [21:0] PPN (root page directory frame)

// Called on every CSRW satp, rs1
void mmuUpdate(u32 satp) {
 mmu.mode = (satp >> 31) & 1;
 // 0 = MMU_MODE_OFF (bare/physical)
 // 1 = MMU_MODE_SV32 (paged Sv32)
 mmu.ppn = satp & 0x3fffff;
 // root page dir PA = mmu.ppn × 4096
}

PTE format — 32-bit entry at each level of the page table

[31:20] PPN[1] · [19:10] PPN[0] · [9:8] RSW · [7] D · [6] A · [5] G · [4] U · [3] X · [2] W · [1] R · [0] V

  • V valid · R readable · W writable · X executable · U user page
  • A accessed · D dirty · G global mapping · RSW reserved for OS software
  • Leaf PTE: R|X == 1 → points to a page frame
  • Non-leaf: R==0 && X==0 → PPN points to next-level table
  • Invalid: V==0 or (!R && W) → immediate page fault
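The three PTE cases in the list above can be expressed as a small classifier. PteKind and classifyPte are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

enum class PteKind { Invalid, Pointer, Leaf };

// Classify a Sv32 PTE from its V/R/W/X bits (PTE[3:0]).
constexpr PteKind classifyPte(uint32_t pte) {
    const bool v = (pte & 0x1u) != 0;
    const bool r = (pte & 0x2u) != 0;
    const bool w = (pte & 0x4u) != 0;
    const bool x = (pte & 0x8u) != 0;
    if (!v || (!r && w)) return PteKind::Invalid;   // page fault
    return (r || x) ? PteKind::Leaf                 // maps a page frame
                    : PteKind::Pointer;             // next-level table
}

static_assert(classifyPte(0x0u) == PteKind::Invalid, "V = 0");
static_assert(classifyPte(0x1u) == PteKind::Pointer, "V only");
static_assert(classifyPte(0xFu) == PteKind::Leaf, "V R W X");
static_assert(classifyPte(0x5u) == PteKind::Invalid, "W without R");
```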
28 / 33

rv32.cpp · mmuTranslate() · called on every fetch, load & store

Sv32 Two-Level Page Table Walk

32-bit virtual address decomposition

[31:22] VPN[1] · [21:12] VPN[0] · [11:0] page offset

4 KiB page PA

PPN[1]<<22 | PPN[0]<<12 | offset[11:0]

4 MiB superpage

Leaf at level 0; PPN[0] must be 0 (else fault).
PPN[1]<<22 | VPN[0]<<12 | offset[11:0]

for (int level = 0; level < 2; level++) {
 u32 page_addr;
 if (level == 0) {
  // L0: root PD, indexed by VPN[1]
  page_addr = mmu.ppn * 4096u
            + ((addr >> 22) & 0x3ff) * 4u;
 } else {
  // L1: L0 PTE's PPN, indexed by VPN[0]
  page_addr = (ppn0 | (ppn1 << 10)) * 4096u
            + ((addr >> 12) & 0x3ff) * 4u;
 }
 u32 pte = memGetWord(page_addr);
 ppn0 = (pte >> 10) & 0x3ff; // PTE[19:10]
 ppn1 = (pte >> 20) & 0xfff; // PTE[31:20]

 if (!V || (!R && W)) MMU_FAULT; // invalid
 if (R || X) break;           // leaf found
 else if (level == 1) MMU_FAULT; // L1 non-leaf
}
// Assemble physical address
u32 pa = addr & 0xfff; // page offset
pa |= super ? ((addr>>12)&0x3ff)<<12 : ppn0<<12;
pa |= ppn1 << 22;
return pa;
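For experimenting outside the emulator, the walk can be written as a self-contained function over an injected memory reader. This is a sketch under simplifying assumptions: permission and A/D checks are omitted, the 34-bit physical address is truncated to 32 bits, and sv32Walk and ReadWord are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

using ReadWord = std::function<uint32_t(uint32_t)>;  // physical word read

// Two-level Sv32 walk. Returns the physical address, or nullopt on a fault.
std::optional<uint32_t> sv32Walk(uint32_t rootPpn, uint32_t va,
                                 const ReadWord& read) {
    uint32_t table = rootPpn * 4096u;
    for (int level = 0; level < 2; ++level) {
        const uint32_t vpn = (level == 0) ? (va >> 22) & 0x3ffu   // VPN[1]
                                          : (va >> 12) & 0x3ffu;  // VPN[0]
        const uint32_t pte = read(table + vpn * 4u);
        const bool v = (pte & 1u) != 0, r = (pte & 2u) != 0;
        const bool w = (pte & 4u) != 0, x = (pte & 8u) != 0;
        const uint32_t ppn0 = (pte >> 10) & 0x3ffu;
        const uint32_t ppn1 = (pte >> 20) & 0xfffu;
        if (!v || (!r && w)) return std::nullopt;       // invalid PTE
        if (r || x) {                                   // leaf
            if (level == 0) {                           // 4 MiB superpage
                if (ppn0 != 0) return std::nullopt;     // misaligned
                return (ppn1 << 22) | (va & 0x3fffffu);
            }
            return (ppn1 << 22) | (ppn0 << 12) | (va & 0xfffu);
        }
        table = ((ppn1 << 10) | ppn0) * 4096u;          // descend one level
    }
    return std::nullopt;                                // non-leaf at last level
}
```

A reader backed by a lambda or a map of hand-built PTEs is enough to exercise both the 4 KiB and the superpage paths.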
29 / 33

rv32.cpp · mmuTranslate() · Linux rv32nommu

Permissions, Page Faults & nommu Linux

// Fast exit for bare mode
if (mmu.mode == MMU_MODE_OFF)
  return addr; // zero overhead — always taken by nommu

// MPRV: M-mode load/store use MPP privilege
priv = ((mstatus>>17)&1)
  ? (mstatus>>11)&3 : csr.privilege;

// M-mode always physical, skip walk
if (priv == PRIV_MACHINE) return addr;

// mstatus flags for permission check
sum = (mstatus>>18)&1; // S-mode → U pages
mxr = (mstatus>>19)&1; // exec → readable

// After walk: gate on privilege + access mode
perm = (SUPERVISOR && (!U || sum))
     || (USER && U);
access = (FETCH&&X) || (READ&&(R||(X&&mxr)))
       || (WRITE&&W);
if (!(perm&&access)) MMU_FAULT;
if (super&&ppn0!=0) MMU_FAULT; // bad superpage
if (!A||(WRITE&&!D)) MMU_FAULT; // A/D not set
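The permission gate can be isolated into a pure function for testing. The sketch mirrors the expressions on this slide; PteFlags, Access and sv32Permits are hypothetical names, and the superpage alignment check is left to the walk.

```cpp
#include <cassert>
#include <cstdint>

enum class Access { Fetch, Read, Write };

struct PteFlags { bool r, w, x, u, a, d; };

constexpr uint32_t kUser = 0, kSupervisor = 1;

// True if a leaf PTE with these flags allows the access; false means
// the caller raises the matching page fault.
bool sv32Permits(const PteFlags& p, Access acc, uint32_t priv,
                 bool sum, bool mxr) {
    const bool perm = (priv == kSupervisor && (!p.u || sum)) ||
                      (priv == kUser && p.u);
    const bool access = (acc == Access::Fetch && p.x) ||
                        (acc == Access::Read && (p.r || (p.x && mxr))) ||
                        (acc == Access::Write && p.w);
    if (!(perm && access)) return false;
    if (!p.a || (acc == Access::Write && !p.d)) return false;  // A/D not set
    return true;
}
```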

Page fault types

  • InstructionPageFault (12) — fetch failed
  • LoadPageFault (13) — data read failed
  • StorePageFault (15) — data write failed

All set ret.trap.en and propagate through handleIrqAndTrap(). Delegated to S-mode if MEDELEG bit is set — kernel's page fault handler runs.

A & D bit requirement

Hardware does not auto-set A or D bits — the OS must set them before a page is accessed. Any access to A=0, or write to D=0, faults identically to a permission failure and must be handled by the kernel.

nommu Linux — zero MMU cost

rv32nommu kernel never executes csrw satp. mmu.mode stays MMU_MODE_OFF = 0 for the entire run. Every memory access hits the first branch and returns the physical address immediately — the full Sv32 walk code is compiled in but never reached at runtime.

30 / 33

emu.cpp · rv32.h · RV32F + RV32D

Floating-Point Extensions

// rv32.h — unified 64-bit FP register file
u64 freg[32];  // reset: canonical qNaN-boxed

// Write single (NaN-box upper 32 bits = 0xFFFFFFFF)
u32 bits; memcpy(&bits, &f, 4);
cpu.freg[rd] = 0xFFFFFFFF00000000ULL | bits;

// Read single — validate NaN-box (RISC-V §11.3)
u64 v = cpu.freg[rs];
if ((v >> 32) != 0xFFFFFFFFu)
  return canonical_qNaN;

// Write double — full 64-bit, no boxing needed
cpu.freg[rd] = bits;
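The boxing and unboxing fragments above combine into a round trip. In this sketch boxSingle, unboxSingleBits and unboxSingle are illustrative names; 0x7fc00000 is the canonical single-precision quiet NaN.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint32_t kCanonicalNaN32 = 0x7fc00000u;

// Store a float in a 64-bit FP register with the upper half all ones.
uint64_t boxSingle(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return 0xFFFFFFFF00000000ULL | bits;
}

// Raw single-precision bits, or the canonical NaN if not properly boxed.
uint32_t unboxSingleBits(uint64_t v) {
    return ((v >> 32) == 0xFFFFFFFFu) ? static_cast<uint32_t>(v)
                                      : kCanonicalNaN32;
}

float unboxSingle(uint64_t v) {
    const uint32_t bits = unboxSingleBits(v);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

A register holding a double whose upper 32 bits are not all ones therefore reads back as NaN when consumed by a single-precision instruction.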

FCSR — Control & Status Register

Bits [7:5]: frm — rounding mode (RNE · RTZ · RDN · RUP · RMM)
Bits [4:0]: fflags — exception flags (NV · DZ · OF · UF · NX)
Accessible as CSR_FFLAGS (0x001), CSR_FRM (0x002), CSR_FCSR (0x003)

RV32F — Single Precision

  • FADD/FSUB/FMUL/FDIV/FSQRT — arithmetic
  • FMADD/FMSUB/FNMADD/FNMSUB — fused multiply-add
  • FSGNJ/FSGNJN/FSGNJX — sign injection
  • FMIN/FMAX — IEEE 754-2008 minNum/maxNum
  • FEQ/FLT/FLE — compare → integer result
  • FCVT.W.S / FCVT.S.W — int ↔ float convert

RV32D — Double Precision

  • Same arithmetic & FMA ops as F on 64-bit doubles
  • FCVT.D.S / FCVT.S.D — widen / narrow
  • FCVT.W.D / FCVT.D.W — int ↔ double convert
  • Shares freg[32] — no extra register file

ISA Tests

All rv32{f,d} ISA tests pass. hello_linux printf of floats runs natively — musl libc soft-float helpers (__adddf3, __floatsidf) execute correctly via the F/D extensions.

31 / 33

hello_linux/framebuff.c · /dev/fb0

Linux Framebuffer Demo

// Open Linux framebuffer device
int fd = open("/dev/fb0", O_RDWR);
struct fb_var_screeninfo vinfo;
ioctl(fd, FBIOGET_VSCREENINFO, &vinfo);
g_w = vinfo.xres; g_h = vinfo.yres;
g_buf = malloc(g_w * g_h * 4);

// Dispatch to one of 10 render patterns
switch (pat) {
 case 1: pat_smpte();     break; // SMPTE colour bars
 case 3: pat_mandelbrot(); break; // fixed-pt 4.12
 case 5: pat_cube();      break; // 3-D wireframe
 case 9: pat_julia();     break; // Julia set
 /* ... 6 more patterns ... */
}

// Flush pixel buffer to framebuffer
lseek(fd, 0, SEEK_SET);
write(fd, g_buf, g_w * g_h * 4);

10 Render Patterns

  • SMPTE bars — broadcast colour reference
  • HSV gradient / colour wheel — float hsv2rgb()
  • Mandelbrot / Julia set — fixed-point 4.12 (int64 mul)
  • Plasma — sin lookup table + interference
  • 3-D wireframe cube — float rotation & perspective
  • Sierpinski / Rings / Lissajous

Why it works

RVE's Linux kernel exposes /dev/fb0 via MMIO. The program uses standard POSIX calls (open / ioctl / write) — no special emulator hooks needed. The entire graphics stack runs as normal Linux userspace.

Float in action

hsv2rgb() uses fmodf / fabsf. The 3-D cube uses cosf / sinf for rotation and float perspective divide — all running as native RV32F instructions through the emulator's F-extension.

32 / 33

That's a wrap

Decode · execute · trap · page · boot

Full RISC-V pipeline — from instruction bits to Linux userspace in ~1000 lines of C++

Key insight #1

Format parsers and CSR pre-reads happen unconditionally. Dispatch is dead-simple masked switches. 60 bare-metal ISA tests validate every instruction independently before running Linux.

Key insight #2

Sv32 is a two-level table walk: 2× memGetWord() calls plus permission gates. The entire translation is ~20 lines. nommu Linux bypasses it in the very first branch — zero overhead.

Key insight #3

Linux boot needs two things: flat Image at mem[0] and a1 = DTB pointer. The DTB encodes the entire machine description — no firmware layer (OpenSBI/BBL) needed. Kernel runs directly in M-mode.

33 / 33