10 "Zc*" Extension for Code Size Reduction, Version 1.0.0

10.1 Zc* Overview

Zc* is a group of extensions that define subsets of the existing C extension (Zca, Zcd, Zcf) and new extensions which only contain 16-bit encodings.

The Zcm* extensions all reuse the encodings for c.fld, c.fsd, c.fldsp, c.fsdsp.

Instruction	Zca	Zcf	Zcd	Zcb	Zcmp	Zcmt
The Zca extension is added as a way to refer to instructions in the C extension that do not include the floating-point loads and stores
C excl. c.f*	yes
The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores
c.flw		rv32
c.flwsp		rv32
c.fsw		rv32
c.fswsp		rv32
The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores
c.fld			yes
c.fldsp			yes
c.fsd			yes
c.fsdsp			yes
Simple operations for use on all architectures
c.lbu				yes
c.lh				yes
c.lhu				yes
c.sb				yes
c.sh				yes
c.zext.b				yes
c.sext.b				yes
c.zext.h				yes
c.sext.h				yes
c.zext.w				yes
c.mul				yes
c.not				yes
PUSH/POP and double move which overlap with c.fsdsp. Complex operations intended for embedded CPUs
cm.push					yes
cm.pop					yes
cm.popret					yes
cm.popretz					yes
cm.mva01s					yes
cm.mvsa01					yes
Table jump which overlaps with c.fsdsp. Complex operations intended for embedded CPUs
cm.jt						yes
cm.jalt						yes

10.2 C

The C extension is the superset of the following extensions:

Zca
Zcf if F is specified (RV32 only)
Zcd if D is specified

As C defines the same instructions as Zca, Zcf, and Zcd, the rule is that:

C always implies Zca
C+F implies Zcf (RV32 only)
C+D implies Zcd
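
The three rules above can be restated as a small Python sketch. The function name c_implies and its parameters are illustrative assumptions and not part of this specification.

```python
def c_implies(xlen, has_f, has_d):
    """Return the set of Zc extensions implied by C (illustrative sketch).

    xlen  -- 32 or 64
    has_f -- True if the F extension is specified
    has_d -- True if the D extension is specified
    """
    if xlen not in (32, 64):
        raise ValueError("xlen must be 32 or 64")
    implied = {"Zca"}            # C always implies Zca
    if has_f and xlen == 32:     # C+F implies Zcf (RV32 only)
        implied.add("Zcf")
    if has_d:                    # C+D implies Zcd
        implied.add("Zcd")
    return implied
```

For instance, c_implies(64, True, True) yields Zca and Zcd only, because Zcf does not exist for RV64.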

10.3 Zce

The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions.

Specifying Zce on RV32 without F includes Zca, Zcb, Zcmp, Zcmt
Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt and Zcf
Specifying Zce on RV64 always includes Zca, Zcb, Zcmp, Zcmt
- Zcf doesn’t exist for RV64

Therefore common ISA strings can be updated as follows to include the relevant Zc extensions, for example:

RV32IMC_Zcb_Zcmp_Zcmt is equivalent to RV32IM_Zce
RV32IMFC_Zcb_Zcmp_Zcmt is equivalent to RV32IMF_Zce
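
As a minimal sketch of the expansion rules above, the following Python function lists the extensions that Zce includes for a given base. The name expand_zce and its arguments are hypothetical and chosen only for this illustration.

```python
def expand_zce(xlen, has_f):
    """List the Zc extensions included by Zce (illustrative sketch)."""
    if xlen not in (32, 64):
        raise ValueError("xlen must be 32 or 64")
    included = ["Zca", "Zcb", "Zcmp", "Zcmt"]
    # Zcf is only added on RV32 when F is present; it does not exist for RV64.
    if xlen == 32 and has_f:
        included.append("Zcf")
    return included
```

Under these assumptions, expand_zce(32, True) corresponds to the RV32IMF_Zce example above, where the C in RV32IMFC supplies Zca and Zcf.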

10.4 MISA.C

MISA.C is set if the following extensions are selected:

Zca and not F
Zca, Zcf and F (but not D) is specified (RV32 only)
Zca, Zcf and Zcd if D is specified (RV32 only)
- this configuration excludes Zcmp, Zcmt
Zca, Zcd if D is specified (RV64 only)
- this configuration excludes Zcmp, Zcmt

10.5 Zca

The Zca extension is added as a way to refer to instructions in the C extension that do not include the floating-point loads and stores.

Therefore it excludes all 16-bit floating-point loads and stores: c.flw, c.flwsp, c.fsw, c.fswsp, c.fld, c.fldsp, c.fsd, c.fsdsp.

note

The C extension only includes F/D instructions when D and F are also specified.

10.6 Zcf (RV32 only)

Zcf is the existing set of compressed single precision floating point loads and stores: c.flw, c.flwsp, c.fsw, c.fswsp.

Zcf is only relevant to RV32; it cannot be specified for RV64.

The Zcf extension depends on the Zca and F extensions.

10.7 Zcd

Zcd is the existing set of compressed double precision floating point loads and stores: c.fld, c.fldsp, c.fsd, c.fsdsp.

The Zcd extension depends on the Zca and D extensions.

10.8 Zcb

Zcb has simple code-size saving instructions which are easy to implement on all CPUs.

All encodings are currently reserved for all architectures, and have no conflicts with any existing extensions.

note

Zcb can be implemented on any CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile.

The Zcb extension depends on the Zca extension.

As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, c.mul is only implemented if M or Zmmul is implemented, and c.sext.b is only implemented if Zbb is implemented.

The c.mul encoding uses the CA register format along with other instructions such as c.sub, c.xor etc.

note

c.sext.w is a pseudoinstruction for c.addiw rd, 0 (RV64)

RV32	RV64	Mnemonic	Instruction
yes	yes	c.lbu rd', uimm(rs1')	insns-c_lbu
yes	yes	c.lhu rd', uimm(rs1')	insns-c_lhu
yes	yes	c.lh rd', uimm(rs1')	insns-c_lh
yes	yes	c.sb rs2', uimm(rs1')	insns-c_sb
yes	yes	c.sh rs2', uimm(rs1')	insns-c_sh
yes	yes	c.zext.b rsd'	insns-c_zext_b
yes	yes	c.sext.b rsd'	insns-c_sext_b
yes	yes	c.zext.h rsd'	insns-c_zext_h
yes	yes	c.sext.h rsd'	insns-c_sext_h
	yes	c.zext.w rsd'	insns-c_zext_w
yes	yes	c.not rsd'	insns-c_not
yes	yes	c.mul rsd', rs2'	insns-c_mul

10.9 Zcmp

The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions.

This extension reuses some encodings from c.fsdsp. Therefore it is incompatible with Zcd, which is included when C and D extensions are both present.

note

Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with application class profiles.

The Zcmp extension depends on the Zca extension.

The PUSH/POP assembly syntax uses several variables, whose meanings are:

reg_list is a list containing 1 to 13 registers (ra and 0 to 12 s registers)
- valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, …, {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11}
- note that {ra, s0-s10} is not valid, giving 12 lists not 13 for better encoding
stack_adj is the total size of the stack frame.
- valid values vary with register list length and the specific encoding, see the instruction pages for details.
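
The set of valid reg_list values can be enumerated with a short Python sketch. The function name valid_reg_lists is an assumption made for this illustration; the list contents follow the valid values given above.

```python
def valid_reg_lists():
    """Enumerate the 12 valid reg_list strings (illustrative sketch)."""
    lists = ["{ra}", "{ra, s0}"]
    # s0-s1 up to s0-s9, then s0-s11; {ra, s0-s10} is deliberately absent.
    for top in list(range(1, 10)) + [11]:
        lists.append("{ra, s0-s%d}" % top)
    return lists
```

The result has 12 entries, which matches the statement that 12 lists are available rather than 13.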

RV32	RV64	Mnemonic	Instruction
yes	yes	cm.push {reg_list}, -stack_adj	insns-cm_push
yes	yes	cm.pop {reg_list}, stack_adj	insns-cm_pop
yes	yes	cm.popret {reg_list}, stack_adj	insns-cm_popret
yes	yes	cm.popretz {reg_list}, stack_adj	insns-cm_popretz
yes	yes	cm.mva01s r1s', r2s'	insns-cm_mva01s
yes	yes	cm.mvsa01 r1s', r2s'	insns-cm_mvsa01

10.10 Zcmt

Zcmt adds the table jump instructions and also adds the jvt CSR. The jvt CSR requires a state enable if Smstateen is implemented. See csrs-jvt for details.

This extension reuses some encodings from c.fsdsp. Therefore it is incompatible with Zcd, which is included when C and D extensions are both present.

note

Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with RVA profiles.

The Zcmt extension depends on the Zca and Zicsr extensions.

RV32	RV64	Mnemonic	Instruction
yes	yes	cm.jt index	insns-cm_jt
yes	yes	cm.jalt index	insns-cm_jalt

10.11 Zc instruction formats

Several instructions in this specification use the following new instruction formats.

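Format	Instructions	Fields (most significant first, with bit positions)
CLB	c.lbu	funct6 [15:10], rs1' [9:7], uimm [6:5], rd' [4:2], op [1:0]
CSB	c.sb	funct6 [15:10], rs1' [9:7], uimm [6:5], rs2' [4:2], op [1:0]
CLH	c.lhu, c.lh	funct6 [15:10], rs1' [9:7], funct1 [6], uimm [5], rd' [4:2], op [1:0]
CSH	c.sh	funct6 [15:10], rs1' [9:7], funct1 [6], uimm [5], rs2' [4:2], op [1:0]
CU	c.[sz]ext.*, c.not	funct6 [15:10], rd'/rs1' [9:7], funct5 [6:2], op [1:0]
CMMV	cm.mvsa01, cm.mva01s	funct6 [15:10], r1s' [9:7], funct2 [6:5], r2s' [4:2], op [1:0]
CMJT	cm.jt, cm.jalt	funct6 [15:10], index [9:2], op [1:0]
CMPP	cm.push, cm.pop	funct6 [15:10], funct2 [9:8], urlist [7:4], spimm [3:2], op [1:0]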

note

c.mul uses the existing CA format.

10.12 Zcb instructions

10.12.1 c.lbu

Synopsis Load unsigned byte, 16-bit encoding

Mnemonic c.lbu rd', uimm(rs1')

Encoding (RV32, RV64):

[encoding diagram]

The immediate offset is formed as follows:

  uimm[31:2] = 0;
  uimm[1]    = encoding[5];
  uimm[0]    = encoding[6];

Description This instruction loads a byte from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting byte is zero extended to XLEN bits and is written to rd'.

note

rd' and rs1' are from the standard 8-register set x8-x15.

Prerequisites None

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

X(rdc) = EXTZ(mem[X(rs1c)+EXTZ(uimm)][7..0]);

10.12.2 c.lhu

Synopsis Load unsigned halfword, 16-bit encoding

Mnemonic c.lhu rd', uimm(rs1')

Encoding (RV32, RV64):

[encoding diagram]

The immediate offset is formed as follows:

  uimm[31:2] = 0;
  uimm[1]    = encoding[5];
  uimm[0]    = 0;

Description This instruction loads a halfword from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting halfword is zero extended to XLEN bits and is written to rd'.

note

rd' and rs1' are from the standard 8-register set x8-x15.

Prerequisites None

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

X(rdc) = EXTZ(load_mem[X(rs1c)+EXTZ(uimm)][15..0]);

10.12.3 c.lh

Synopsis Load signed halfword, 16-bit encoding

Mnemonic c.lh rd', uimm(rs1')

Encoding (RV32, RV64):

[encoding diagram]

The immediate offset is formed as follows:

  uimm[31:2] = 0;
  uimm[1]    = encoding[5];
  uimm[0]    = 0;

Description This instruction loads a halfword from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting halfword is sign extended to XLEN bits and is written to rd'.

note

rd' and rs1' are from the standard 8-register set x8-x15.

Prerequisites None

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

X(rdc) = EXTS(load_mem[X(rs1c)+EXTZ(uimm)][15..0]);

10.12.4 c.sb

Synopsis Store byte, 16-bit encoding

Mnemonic c.sb rs2', uimm(rs1')

Encoding (RV32, RV64):

[encoding diagram]

The immediate offset is formed as follows:

  uimm[31:2] = 0;
  uimm[1]    = encoding[5];
  uimm[0]    = encoding[6];

Description This instruction stores the least significant byte of rs2' to the memory address formed by adding rs1' to the zero extended immediate uimm.

note

rs1' and rs2' are from the standard 8-register set x8-x15.

Prerequisites None

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

mem[X(rs1c)+EXTZ(uimm)][7..0] = X(rs2c)

10.12.5 c.sh

Synopsis Store halfword, 16-bit encoding

Mnemonic c.sh rs2', uimm(rs1')

Encoding (RV32, RV64):

[encoding diagram]

The immediate offset is formed as follows:

  uimm[31:2] = 0;
  uimm[1]    = encoding[5];
  uimm[0]    = 0;

Description This instruction stores the least significant halfword of rs2' to the memory address formed by adding rs1' to the zero extended immediate uimm.

note

rs1' and rs2' are from the standard 8-register set x8-x15.

Prerequisites None

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

mem[X(rs1c)+EXTZ(uimm)][15..0] = X(rs2c)
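
The immediate formation rules for the five Zcb load/store instructions above can be sketched in Python as follows. The function names are hypothetical; the bit selections come directly from the "immediate offset is formed as follows" listings, and the argument is assumed to be the 16-bit instruction encoding as an integer.

```python
def decode_uimm_byte(encoding):
    """c.lbu / c.sb: uimm[1] = encoding[5], uimm[0] = encoding[6]."""
    bit5 = (encoding >> 5) & 1
    bit6 = (encoding >> 6) & 1
    return (bit5 << 1) | bit6      # byte offset in the range 0..3


def decode_uimm_half(encoding):
    """c.lhu / c.lh / c.sh: uimm[1] = encoding[5], uimm[0] = 0."""
    bit5 = (encoding >> 5) & 1
    return bit5 << 1               # halfword-aligned offset, 0 or 2
```

Note that the two immediate bits of the byte accesses are swapped relative to their positions in the encoding, so encoding bit 6 alone selects offset 1 and encoding bit 5 alone selects offset 2.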

10.12.6 c.zext.b

Synopsis Zero extend byte, 16-bit encoding

Mnemonic c.zext.b rd'/rs1'

Encoding (RV32, RV64):

[encoding diagram]

Description This instruction takes a single source/destination operand. It zero-extends the least-significant byte of the operand to XLEN bits by inserting zeros into all of the bits more significant than 7.

note

rd'/rs1' is from the standard 8-register set x8-x15.

Prerequisites None

32-bit equivalent:

andi rd'/rs1', rd'/rs1', 0xff

note

The SAIL module variable for rd'/rs1' is called rsdc.

Operation

X(rsdc) = EXTZ(X(rsdc)[7..0]);

10.12.7 c.sext.b

Synopsis Sign extend byte, 16-bit encoding

Mnemonic c.sext.b rd'/rs1'

Encoding (RV32, RV64):

[encoding diagram]

Description This instruction takes a single source/destination operand. It sign-extends the least-significant byte in the operand to XLEN bits by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits.

note

rd'/rs1' is from the standard 8-register set x8-x15.

Prerequisites Zbb is also required.

note

The SAIL module variable for rd'/rs1' is called rsdc.

Operation

X(rsdc) = EXTS(X(rsdc)[7..0]);

10.12.8 c.zext.h

Synopsis Zero extend halfword, 16-bit encoding

Mnemonic c.zext.h rd'/rs1'

Encoding (RV32, RV64):

[encoding diagram]

Description This instruction takes a single source/destination operand. It zero-extends the least-significant halfword of the operand to XLEN bits by inserting zeros into all of the bits more significant than 15.

note

rd'/rs1' is from the standard 8-register set x8-x15.

Prerequisites Zbb is also required.

note

The SAIL module variable for rd'/rs1' is called rsdc.

Operation

X(rsdc) = EXTZ(X(rsdc)[15..0]);

10.12.9 c.sext.h

Synopsis Sign extend halfword, 16-bit encoding

Mnemonic c.sext.h rd'/rs1'

Encoding (RV32, RV64):

[encoding diagram]

Description This instruction takes a single source/destination operand. It sign-extends the least-significant halfword in the operand to XLEN bits by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits.

note

rd'/rs1' is from the standard 8-register set x8-x15.

Prerequisites Zbb is also required.

note

The SAIL module variable for rd'/rs1' is called rsdc.

Operation

X(rsdc) = EXTS(X(rsdc)[15..0]);

10.12.10 c.zext.w

Synopsis Zero extend word, 16-bit encoding

Mnemonic c.zext.w rd'/rs1'

Encoding (RV64):

[encoding diagram]

Description This instruction takes a single source/destination operand. It zero-extends the least-significant word of the operand to XLEN bits by inserting zeros into all of the bits more significant than 31.

note

rd'/rs1' is from the standard 8-register set x8-x15.

Prerequisites Zba is also required.

32-bit equivalent:

add.uw rd'/rs1', rd'/rs1', zero

note

The SAIL module variable for rd'/rs1' is called rsdc.

Operation

X(rsdc) = EXTZ(X(rsdc)[31..0]);
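
The EXTZ and EXTS operations used by the c.zext.* and c.sext.* instructions above can be modelled with plain integers. This is a minimal Python sketch: the function names extz and exts, and the convention of representing register values as unsigned XLEN-bit integers, are assumptions made for the illustration.

```python
def extz(value, width):
    """Zero-extend: keep the low `width` bits, clear everything above."""
    return value & ((1 << width) - 1)


def exts(value, width, xlen):
    """Sign-extend the low `width` bits of value to an XLEN-bit result."""
    low = value & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    # Subtracting the sign bit after the XOR turns the field into a signed integer.
    signed = (low ^ sign_bit) - sign_bit
    # Wrap back to an unsigned XLEN-bit register image.
    return signed & ((1 << xlen) - 1)
```

With this model, c.sext.b corresponds to exts(x, 8, xlen), c.zext.h to extz(x, 16), and c.zext.w (RV64 only) to extz(x, 32).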

10.12.11 c.not

Synopsis Bitwise not, 16-bit encoding

Mnemonic c.not rd'/rs1'

Encoding (RV32, RV64):

[encoding diagram]

Description This instruction takes the one’s complement of rd'/rs1' and writes the result to the same register.

note

rd'/rs1' is from the standard 8-register set x8-x15.

Prerequisites None

32-bit equivalent:

xori rd'/rs1', rd'/rs1', -1

note

The SAIL module variable for rd'/rs1' is called rsdc.

Operation

X(rsdc) = X(rsdc) XOR -1;

10.12.12 c.mul

Synopsis Multiply, 16-bit encoding

Mnemonic c.mul rsd', rs2'

Encoding (RV32, RV64):

[encoding diagram]

Description This instruction multiplies XLEN bits of the source operands from rsd' and rs2' and writes the lowest XLEN bits of the result to rsd'.

note

rd'/rs1' and rs2' are from the standard 8-register set x8-x15.

Prerequisites M or Zmmul must be configured.

note

The SAIL module variable for rd'/rs1' is called rsdc, and for rs2' is called rs2c.

Operation

let result_wide = to_bits(2 * sizeof(xlen), signed(X(rsdc)) * signed(X(rs2c)));
X(rsdc) = result_wide[(sizeof(xlen) - 1) .. 0];
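
The operation above can be mirrored in Python as a rough model. The function name c_mul_model and the unsigned-integer register representation are assumptions for this sketch; the computation follows the pseudocode, multiplying the signed interpretations and keeping the lowest XLEN bits.

```python
def c_mul_model(rsd_val, rs2_val, xlen):
    """Return the low XLEN bits of rsd * rs2 (illustrative model of c.mul)."""
    mask = (1 << xlen) - 1

    def to_signed(value):
        value &= mask
        # Values with the top bit set represent negative numbers.
        return value - (1 << xlen) if value >> (xlen - 1) else value

    product = to_signed(rsd_val) * to_signed(rs2_val)
    return product & mask          # truncate to the lowest XLEN bits
```

Because only the lowest XLEN bits are kept, the result is the same whether the operands are treated as signed or unsigned.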

10.13 PUSH/POP register instructions

These instructions are collectively referred to as PUSH/POP:

cm.push
cm.pop
cm.popret
cm.popretz

The term PUSH refers to cm.push.

The term POP refers to cm.pop.

The term POPRET refers to cm.popret and cm.popretz.

Common details for these instructions are in this section.

10.13.1 PUSH/POP functional overview

PUSH, POP, POPRET are used to reduce the size of function prologues and epilogues.

The PUSH instruction

adjusts the stack pointer to create the stack frame
pushes (stores) the registers specified in the register list to the stack frame

The POP instruction

pops (loads) the registers in the register list from the stack frame
adjusts the stack pointer to destroy the stack frame

The POPRET instructions

pop (load) the registers in the register list from the stack frame
cm.popretz also moves zero into a0 as the return value
adjust the stack pointer to destroy the stack frame
execute a ret instruction to return from the function

10.13.2 Example usage

This example illustrates the use of PUSH and POPRET.

It uses the function processMarkers from the EMBench benchmark picojpeg, found in the file libpicojpeg.c on GitHub.

The prologue and epilogue compile with GCC10 to:

   0001098a <processMarkers>:
   1098a:       711d                    addi    sp,sp,-96 ;#cm.push(1)
   1098c:       c8ca                    sw      s2,80(sp) ;#cm.push(2)
   1098e:       c6ce                    sw      s3,76(sp) ;#cm.push(3)
   10990:       c4d2                    sw      s4,72(sp) ;#cm.push(4)
   10992:       ce86                    sw      ra,92(sp) ;#cm.push(5)
   10994:       cca2                    sw      s0,88(sp) ;#cm.push(6)
   10996:       caa6                    sw      s1,84(sp) ;#cm.push(7)
   10998:       c2d6                    sw      s5,68(sp) ;#cm.push(8)
   1099a:       c0da                    sw      s6,64(sp) ;#cm.push(9)
   1099c:       de5e                    sw      s7,60(sp) ;#cm.push(10)
   1099e:       dc62                    sw      s8,56(sp) ;#cm.push(11)
   109a0:       da66                    sw      s9,52(sp) ;#cm.push(12)
   109a2:       d86a                    sw      s10,48(sp);#cm.push(13)
   109a4:       d66e                    sw      s11,44(sp);#cm.push(14)
...
   109f4:       4501                    li      a0,0      ;#cm.popretz(1)
   109f6:       40f6                    lw      ra,92(sp) ;#cm.popretz(2)
   109f8:       4466                    lw      s0,88(sp) ;#cm.popretz(3)
   109fa:       44d6                    lw      s1,84(sp) ;#cm.popretz(4)
   109fc:       4946                    lw      s2,80(sp) ;#cm.popretz(5)
   109fe:       49b6                    lw      s3,76(sp) ;#cm.popretz(6)
   10a00:       4a26                    lw      s4,72(sp) ;#cm.popretz(7)
   10a02:       4a96                    lw      s5,68(sp) ;#cm.popretz(8)
   10a04:       4b06                    lw      s6,64(sp) ;#cm.popretz(9)
   10a06:       5bf2                    lw      s7,60(sp) ;#cm.popretz(10)
   10a08:       5c62                    lw      s8,56(sp) ;#cm.popretz(11)
   10a0a:       5cd2                    lw      s9,52(sp) ;#cm.popretz(12)
   10a0c:       5d42                    lw      s10,48(sp);#cm.popretz(13)
   10a0e:       5db2                    lw      s11,44(sp);#cm.popretz(14)
   10a10:       6125                    addi    sp,sp,96  ;#cm.popretz(15)
   10a12:       8082                    ret               ;#cm.popretz(16)

With the GCC option -msave-restore, the output is the following:

0001080e <processMarkers>:
   1080e:       73a012ef                jal     t0,11f48 <__riscv_save_12>
   10812:       1101                    addi    sp,sp,-32
...
   10862:       4501                    li      a0,0
   10864:       6105                    addi    sp,sp,32
   10866:       71e0106f                j       11f84 <__riscv_restore_12>

With PUSH/POPRET this reduces to:

0001080e <processMarkers>:
   1080e:       b8fa                    cm.push    {ra,s0-s11},-96
...
   10866:       bcfa                    cm.popretz {ra,s0-s11}, 96

The prologue and epilogue reduce from 60 bytes in the original code, to 14 bytes with -msave-restore, and to 4 bytes with PUSH and POPRET. As well as reducing the code size, PUSH and POPRET eliminate the branches from calling the millicode save/restore routines and so may also perform better.
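
The byte counts quoted above can be checked by adding up the instruction sizes in the three listings. The Python sketch below does this; the constant names are illustrative, and the last list assumes that each out-of-range millicode call grows from 4 to 8 bytes.

```python
# Instruction sizes in bytes, read from the listings above.
ORIGINAL = [2] * 14 + [2] * 16      # 14 prologue and 16 epilogue 16-bit instructions
SAVE_RESTORE = [4, 2, 2, 2, 4]      # jal, addi, li, addi, j
PUSH_POPRET = [2, 2]                # cm.push, cm.popretz

# Assumption: the jal and j are each replaced by an 8-byte (64-bit) sequence
# when the millicode routines are out of direct range.
SAVE_RESTORE_FAR = [8, 2, 2, 2, 8]


def total_bytes(sizes):
    """Sum a list of instruction sizes."""
    return sum(sizes)
```

Under these assumptions the totals are 60, 14 and 4 bytes for the three listings, and 22 bytes for the far-call variant.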

note

The calls to the millicode routines (<__riscv_save_12> and <__riscv_restore_12> in this example) become 64-bit when the target functions are out of the ±1 MB range, increasing the prologue/epilogue size to 22 bytes.

note

POP is typically used in tail-calling sequences where ret is not used to return to ra after destroying the stack frame.

10.13.2.1 Stack pointer adjustment handling

The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. Additionally, the spimm field in the encoding allows the stack pointer to be adjusted in additional increments of 16 bytes. Only a small, restricted range is available in the encoding; if the range is insufficient then a separate c.addi16sp can be used to increase the range.

10.13.2.2 Register list handling

There is no support for the {ra, s0-s10} register list without also adding s11. Therefore the {ra, s0-s11} register list must be used in this case.

10.13.3 PUSH/POP Fault handling

Correct execution requires that sp refers to idempotent memory (see also Section 10.13.5, Non-idempotent memory handling), because the core must be able to handle traps detected during the sequence. The entire PUSH/POP sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence.

If a trap occurs during the sequence then xEPC is updated with the PC of the instruction, xTVAL (if not read-only-zero) is updated with the bad address if it was an access fault, and xCAUSE is updated with the type of trap.

note

It is implementation defined whether interrupts can also be taken during the sequence execution.

10.13.4 Software view of execution

10.13.4.1 Software view of the PUSH sequence

From a software perspective the PUSH sequence appears as:

A sequence of stores writing the bytes required by the pseudocode
- The bytes may be written in any order.
- The bytes may be grouped into larger accesses.
- Any of the bytes may be written multiple times.
A stack pointer adjustment

note

If an implementation allows interrupts during the sequence, and the interrupt handler uses sp to allocate stack memory, then any stores which were executed before the interrupt may be overwritten by the handler. This is safe because the memory is idempotent and the stores will be re-executed when execution resumes.

The stack pointer adjustment must only be committed when it is certain that the entire PUSH instruction will commit.

Stores may also return imprecise faults from the bus. It is platform defined whether the core implementation waits for the bus responses before continuing to the final stage of the sequence, or handles error responses after completing the PUSH instruction.

For example:

cm.push  {ra, s0-s5}, -64

Appears to software as:

# any bytes from sp-1 to sp-28 may be written multiple times before
# the instruction completes therefore these updates may be visible in
# the interrupt/exception handler below the stack pointer
sw  s5, -4(sp)
sw  s4, -8(sp)
sw  s3,-12(sp)
sw  s2,-16(sp)
sw  s1,-20(sp)
sw  s0,-24(sp)
sw  ra,-28(sp)

# this must only execute once, and will only execute after all stores
# completed without any precise faults, therefore this update is only
# visible in the interrupt/exception handler if cm.push has completed
addi sp, sp, -64
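
The re-execution behaviour can be illustrated with a small Python model of an RV32 cm.push. All names here (Trap, cm_push_attempt, run_with_retry, the fault_at parameter) are hypothetical and exist only for this sketch; memory is modelled as a dictionary keyed by word address, and a trap is injected before a chosen store.

```python
class Trap(Exception):
    """Stands in for a precise fault taken during the store sequence."""


def cm_push_attempt(regs, mem, names, stack_adj, fault_at=None):
    """One attempt at cm.push; traps before store number fault_at, if given."""
    addr = regs["sp"] - 4
    for n, name in enumerate(reversed(names)):   # highest register first
        if n == fault_at:
            raise Trap(name)                     # sp has not been modified
        mem[addr] = regs[name]                   # stores may repeat on retry
        addr -= 4
    regs["sp"] -= stack_adj                      # committed once, at the end


def run_with_retry(regs, mem, names, stack_adj, fault_at):
    """Take one trap, then re-execute the whole sequence from the start."""
    try:
        cm_push_attempt(regs, mem, names, stack_adj, fault_at)
    except Trap:
        # The handler returns to the same PC, so every store is issued again.
        cm_push_attempt(regs, mem, names, stack_adj)
```

In this model a run that traps part-way and is then re-executed ends with the same memory contents and stack pointer as a run that never traps, which is the property that relies on the stack memory being idempotent.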

10.13.4.2 Software view of the POP/POPRET sequence

From a software perspective the POP/POPRET sequence appears as:

A sequence of loads reading the bytes required by the pseudocode.
- The bytes may be loaded in any order.
- The bytes may be grouped into larger accesses.
- Any of the bytes may be loaded multiple times.
A stack pointer adjustment
An optional li a0, 0
An optional ret

If a trap occurs during the sequence, then any loads which were executed before the trap may update architectural state. The loads will be re-executed once the trap handler completes, so the values will be overwritten. Therefore it is permitted for an implementation to update some of the destination registers before taking a fault.

The optional li a0, 0, stack pointer adjustment and optional ret must only be committed when it is certain that the entire POP/POPRET instruction will commit.

For POPRET once the stack pointer adjustment has been committed the ret must execute.

For example:

cm.popretz {ra, s0-s3}, 32;

Appears to software as:

# any or all of these load instructions may execute multiple times
# therefore these updates may be visible in the interrupt/exception handler
lw   s3, 28(sp)
lw   s2, 24(sp)
lw   s1, 20(sp)
lw   s0, 16(sp)
lw   ra, 12(sp)

# these must only execute once, and will only execute after all loads
# complete successfully; all instructions must execute atomically,
# therefore these updates are not visible in the interrupt/exception handler
li a0, 0
addi sp, sp, 32
ret

10.13.5 Non-idempotent memory handling

An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory.

If the core implementation does not support PUSH/POP to non-idempotent memories, the core may use an idempotency PMA to detect it and take a load (POP/POPRET) or store (PUSH) access-fault exception in order to avoid unpredictable results.

Software should only use these instructions on non-idempotent memory regions when software can tolerate the required memory accesses being issued repeatedly in the case that they cause exceptions.

10.13.6 Example RV32I PUSH/POP sequences

The examples included show the load/store series expansion and the stack adjustment. Examples of cm.popret and cm.popretz are not included, as the difference in the expanded sequence from cm.pop is trivial in all cases.

10.13.6.1 cm.push {ra, s0-s2}, -64

Encoding: rlist=7, spimm=3

expands to:

sw  s2,  -4(sp);
sw  s1,  -8(sp);
sw  s0, -12(sp);
sw  ra, -16(sp);
addi sp, sp, -64;

10.13.6.2 cm.push {ra, s0-s11}, -112

Encoding: rlist=15, spimm=3

expands to:

sw  s11,  -4(sp);
sw  s10,  -8(sp);
sw  s9,  -12(sp);
sw  s8,  -16(sp);
sw  s7,  -20(sp);
sw  s6,  -24(sp);
sw  s5,  -28(sp);
sw  s4,  -32(sp);
sw  s3,  -36(sp);
sw  s2,  -40(sp);
sw  s1,  -44(sp);
sw  s0,  -48(sp);
sw  ra,  -52(sp);
addi sp, sp, -112;

10.13.6.3 cm.pop {ra}, 16

Encoding: rlist=4, spimm=0

expands to:

lw   ra, 12(sp);
addi sp, sp, 16;

10.13.6.4 cm.pop {ra, s0-s3}, 48

Encoding: rlist=8, spimm=1

expands to:

lw   s3, 44(sp);
lw   s2, 40(sp);
lw   s1, 36(sp);
lw   s0, 32(sp);
lw   ra, 28(sp);
addi sp, sp, 48;

10.13.6.5 cm.pop {ra, s0-s4}, 64

Encoding: rlist=9, spimm=2

expands to:

lw   s4, 60(sp);
lw   s3, 56(sp);
lw   s2, 52(sp);
lw   s1, 48(sp);
lw   s0, 44(sp);
lw   ra, 40(sp);
addi sp, sp, 64;
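
The expansions listed in this section can be reproduced with a short Python sketch. The function names reg_names, expand_push and expand_pop are assumptions made for this illustration; the sketch takes the rlist value and the total stack adjustment and emits the same load/store series as the examples above.

```python
def reg_names(rlist):
    """ABI register names selected by rlist (4..15), lowest first."""
    if not 4 <= rlist <= 15:
        raise ValueError("rlist must be in the range 4..15")
    count = 13 if rlist == 15 else rlist - 3     # rlist 15 adds both s10 and s11
    return ["ra"] + ["s%d" % i for i in range(count - 1)]


def expand_push(rlist, stack_adj, xlen=32):
    """Return the store series and stack adjustment for cm.push."""
    size = xlen // 8
    store = "sw" if xlen == 32 else "sd"
    lines, offset = [], -size
    for reg in reversed(reg_names(rlist)):       # highest register at sp - size
        lines.append("%s %s, %d(sp)" % (store, reg, offset))
        offset -= size
    lines.append("addi sp, sp, %d" % -stack_adj)
    return lines


def expand_pop(rlist, stack_adj, xlen=32):
    """Return the load series and stack adjustment for cm.pop."""
    size = xlen // 8
    load = "lw" if xlen == 32 else "ld"
    lines, offset = [], stack_adj - size
    for reg in reversed(reg_names(rlist)):       # highest register at top of frame
        lines.append("%s %s, %d(sp)" % (load, reg, offset))
        offset -= size
    lines.append("addi sp, sp, %d" % stack_adj)
    return lines
```

For example, expand_push(7, 64) gives the four stores and the addi shown for cm.push {ra, s0-s2}, -64, and expand_pop(8, 48) gives the sequence shown for cm.pop {ra, s0-s3}, 48.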

10.13.7 cm.push

Synopsis Create stack frame: store ra and 0 to 12 saved registers to the stack frame, optionally allocate additional stack space.

Mnemonic cm.push {reg_list}, -stack_adj

Encoding (RV32, RV64):

[encoding diagram]

note

rlist values 0 to 3 are reserved for a future EABI variant called cm.push.e

Assembly Syntax:

cm.push {reg_list},  -stack_adj
cm.push {xreg_list}, -stack_adj

The variables used in the assembly syntax are defined below.

RV32E:

switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32I, RV64:
switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  case  7: {reg_list="ra, s0-s2";  xreg_list="x1, x8-x9, x18";}
  case  8: {reg_list="ra, s0-s3";  xreg_list="x1, x8-x9, x18-x19";}
  case  9: {reg_list="ra, s0-s4";  xreg_list="x1, x8-x9, x18-x20";}
  case 10: {reg_list="ra, s0-s5";  xreg_list="x1, x8-x9, x18-x21";}
  case 11: {reg_list="ra, s0-s6";  xreg_list="x1, x8-x9, x18-x22";}
  case 12: {reg_list="ra, s0-s7";  xreg_list="x1, x8-x9, x18-x23";}
  case 13: {reg_list="ra, s0-s8";  xreg_list="x1, x8-x9, x18-x24";}
  case 14: {reg_list="ra, s0-s9";  xreg_list="x1, x8-x9, x18-x25";}
  //note - to include s10, s11 must also be included
  case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32E:

stack_adj_base = 16;
Valid values:
stack_adj      = [16|32|48|64];

RV32I:

switch (rlist) {
  case  4.. 7: stack_adj_base = 16;
  case  8..11: stack_adj_base = 32;
  case 12..14: stack_adj_base = 48;
  case     15: stack_adj_base = 64;
}

Valid values:
switch (rlist) {
  case  4.. 7: stack_adj = [16|32|48| 64];
  case  8..11: stack_adj = [32|48|64| 80];
  case 12..14: stack_adj = [48|64|80| 96];
  case     15: stack_adj = [64|80|96|112];
}

RV64:

switch (rlist) {
  case  4.. 5: stack_adj_base =  16;
  case  6.. 7: stack_adj_base =  32;
  case  8.. 9: stack_adj_base =  48;
  case 10..11: stack_adj_base =  64;
  case 12..13: stack_adj_base =  80;
  case     14: stack_adj_base =  96;
  case     15: stack_adj_base = 112;
}

Valid values:
switch (rlist) {
  case  4.. 5: stack_adj = [ 16| 32| 48| 64];
  case  6.. 7: stack_adj = [ 32| 48| 64| 80];
  case  8.. 9: stack_adj = [ 48| 64| 80| 96];
  case 10..11: stack_adj = [ 64| 80| 96|112];
  case 12..13: stack_adj = [ 80| 96|112|128];
  case     14: stack_adj = [ 96|112|128|144];
  case     15: stack_adj = [112|128|144|160];
}

Description This instruction pushes (stores) the registers in reg_list to the memory below the stack pointer, and then creates the stack frame by decrementing the stack pointer by stack_adj, including any additional stack space requested by the value of spimm.

note

All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen.

For further information see Section 10.13, PUSH/POP register instructions.

Stack Adjustment Calculation:

stack_adj_base is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list.

spimm is the number of additional 16-byte address increments allocated for the stack frame.

The total stack adjustment represents the total size of the stack frame, which is stack_adj_base added to spimm scaled by 16, as defined above.
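
The stack adjustment calculation can be sketched in Python as follows. The function names and the rv32e flag are illustrative assumptions. Rather than copying the lookup tables, the sketch rounds the register save area up to a multiple of 16 bytes, which reproduces the stack_adj_base tables listed above for RV32E, RV32I and RV64.

```python
def reg_count(rlist):
    """Number of registers selected by rlist (4..15)."""
    if not 4 <= rlist <= 15:
        raise ValueError("rlist values 0 to 3 are reserved")
    return 13 if rlist == 15 else rlist - 3      # no list ends at s10


def stack_adj_base(rlist, xlen, rv32e=False):
    """Minimum 16-byte-aligned size covering the registers in the list."""
    if rv32e and rlist > 6:
        raise ValueError("rlist values above 6 are reserved on RV32E")
    reg_bytes = reg_count(rlist) * (xlen // 8)
    return (reg_bytes + 15) // 16 * 16           # round up to a multiple of 16


def stack_adj(rlist, spimm, xlen, rv32e=False):
    """Total stack frame size: stack_adj_base plus spimm scaled by 16."""
    if not 0 <= spimm <= 3:
        raise ValueError("spimm must be in the range 0..3")
    return stack_adj_base(rlist, xlen, rv32e) + spimm * 16
```

As a usage example, stack_adj(15, 2, 32) evaluates to 96, the adjustment used by cm.push {ra, s0-s11}, -96 on RV32I.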

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists

Operation The first section of pseudocode may be executed multiple times before the instruction successfully completes.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

if (XLEN==32) bytes=4; else bytes=8;

addr=sp-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1)  {
  //if register i is in xreg_list
  if (xreg_list[i]) {
    switch(bytes) {
      4:  asm("sw x[i], 0(addr)");
      8:  asm("sd x[i], 0(addr)");
    }
    addr-=bytes;
  }
}

The final section of pseudocode executes atomically, and only executes if the section above completes without any exceptions or interrupts.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

sp-=stack_adj;

10.13.8 cm.pop

Synopsis Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame.

Mnemonic cm.pop {reg_list}, stack_adj

Encoding (RV32, RV64):

[encoding diagram]

note

rlist values 0 to 3 are reserved for a future EABI variant called cm.pop.e

Assembly Syntax:

cm.pop {reg_list},  stack_adj
cm.pop {xreg_list}, stack_adj

The variables used in the assembly syntax are defined below.

RV32E:
switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32I, RV64:
switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  case  7: {reg_list="ra, s0-s2";  xreg_list="x1, x8-x9, x18";}
  case  8: {reg_list="ra, s0-s3";  xreg_list="x1, x8-x9, x18-x19";}
  case  9: {reg_list="ra, s0-s4";  xreg_list="x1, x8-x9, x18-x20";}
  case 10: {reg_list="ra, s0-s5";  xreg_list="x1, x8-x9, x18-x21";}
  case 11: {reg_list="ra, s0-s6";  xreg_list="x1, x8-x9, x18-x22";}
  case 12: {reg_list="ra, s0-s7";  xreg_list="x1, x8-x9, x18-x23";}
  case 13: {reg_list="ra, s0-s8";  xreg_list="x1, x8-x9, x18-x24";}
  case 14: {reg_list="ra, s0-s9";  xreg_list="x1, x8-x9, x18-x25";}
  //note - to include s10, s11 must also be included
  case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32E:

stack_adj_base = 16;
Valid values:
stack_adj      = [16|32|48|64];

RV32I:

switch (rlist) {
  case  4.. 7: stack_adj_base = 16;
  case  8..11: stack_adj_base = 32;
  case 12..14: stack_adj_base = 48;
  case     15: stack_adj_base = 64;
}

Valid values:
switch (rlist) {
  case  4.. 7: stack_adj = [16|32|48| 64];
  case  8..11: stack_adj = [32|48|64| 80];
  case 12..14: stack_adj = [48|64|80| 96];
  case     15: stack_adj = [64|80|96|112];
}

RV64:

switch (rlist) {
  case  4.. 5: stack_adj_base =  16;
  case  6.. 7: stack_adj_base =  32;
  case  8.. 9: stack_adj_base =  48;
  case 10..11: stack_adj_base =  64;
  case 12..13: stack_adj_base =  80;
  case     14: stack_adj_base =  96;
  case     15: stack_adj_base = 112;
}

Valid values:
switch (rlist) {
  case  4.. 5: stack_adj = [ 16| 32| 48| 64];
  case  6.. 7: stack_adj = [ 32| 48| 64| 80];
  case  8.. 9: stack_adj = [ 48| 64| 80| 96];
  case 10..11: stack_adj = [ 64| 80| 96|112];
  case 12..13: stack_adj = [ 80| 96|112|128];
  case     14: stack_adj = [ 96|112|128|144];
  case     15: stack_adj = [112|128|144|160];
}

Description This instruction pops (loads) the registers in reg_list from stack memory, and then adjusts the stack pointer by stack_adj.

note

All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen.

For further information see insns-pushpop.

Stack Adjustment Calculation:

stack_adj_base is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list.

spimm is the number of additional 16-byte address increments allocated for the stack frame.

The total stack adjustment represents the total size of the stack frame, which is stack_adj_base added to spimm scaled by 16, as defined above.
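
As a cross-check of the tables above, the calculation can be sketched in Python. The function names are hypothetical, and the sketch assumes that spimm takes the values 0 to 3, which is what the four valid stack_adj values per row imply.

```python
def stack_adj_base(rlist, xlen=32, rv32e=False):
    """Smallest multiple of 16 bytes that covers the registers selected by rlist."""
    if rlist < 4 or rlist > (6 if rv32e else 15):
        raise ValueError(f"rlist={rlist} is reserved")
    # ra plus the s registers; rlist 15 adds both s10 and s11.
    n_regs = 13 if rlist == 15 else rlist - 3
    reg_bytes = n_regs * (xlen // 8)
    return (reg_bytes + 15) // 16 * 16


def stack_adj(rlist, spimm, xlen=32, rv32e=False):
    """Total stack adjustment in bytes (assumes spimm is in 0..3)."""
    if not 0 <= spimm <= 3:
        raise ValueError("spimm must be in 0..3")
    return stack_adj_base(rlist, xlen, rv32e) + spimm * 16


print([stack_adj_base(r, xlen=32) for r in range(4, 16)])
print([stack_adj_base(r, xlen=64) for r in range(4, 16)])
```

The two printed lists should reproduce the RV32I and RV64 stack_adj_base tables row by row.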

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists

Operation The first section of pseudocode may be executed multiple times before the instruction successfully completes.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

if (XLEN==32) bytes=4; else bytes=8;

addr=sp+stack_adj-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1)  {
  //if register i is in xreg_list
  if (xreg_list[i]) {
    switch(bytes) {
      4:  asm("lw x[i], 0(addr)");
      8:  asm("ld x[i], 0(addr)");
    }
    addr-=bytes;
  }
}

The final section of pseudocode executes atomically, and only executes if the section above completes without any exceptions or interrupts.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

sp+=stack_adj;
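
The two pseudocode sections can be modelled in Python as below. This is a simplified sketch, not a reference model: registers and memory are plain dictionaries, exceptions and interrupts are not modelled, and the function name `cm_pop` is hypothetical.

```python
def cm_pop(regs, mem, xreg_list, stack_adj, xlen=32):
    """Model cm.pop.

    regs maps x-register number -> value (sp is x2).
    mem maps byte address -> register-sized value (a simplification).
    """
    nbytes = 4 if xlen == 32 else 8
    sp = regs[2]
    addr = sp + stack_adj - nbytes
    # The highest-numbered register is loaded from the highest address.
    for i in (27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1):
        if i in xreg_list:
            regs[i] = mem[addr]
            addr -= nbytes
    # sp is updated only after every load has completed.
    regs[2] = sp + stack_adj
    return regs


# cm.pop {ra, s0-s3}, 48 on RV32 with sp = 0x1000
regs = {2: 0x1000}
mem = {0x1000 + off: val for off, val in
       [(44, 0x33), (40, 0x22), (36, 0x99), (32, 0x88), (28, 0x11)]}
cm_pop(regs, mem, [1, 8, 9, 18, 19], 48)
print(hex(regs[2]), hex(regs[1]), hex(regs[19]))
```

In this example x19 (s3) is read from sp+44 and x1 (ra) from sp+28, and sp ends at 0x1030.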

10.13.9 cm.popretz

Synopsis Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, move zero into a0, return to ra.

Mnemonic cm.popretz {reg_list}, stack_adj

Encoding (RV32, RV64):

[encoding diagram]

note

rlist values 0 to 3 are reserved for a future EABI variant called cm.popretz.e

Assembly Syntax:

cm.popretz {reg_list},  stack_adj
cm.popretz {xreg_list}, stack_adj

RV32E:
switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32I, RV64:

switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  case  7: {reg_list="ra, s0-s2";  xreg_list="x1, x8-x9, x18";}
  case  8: {reg_list="ra, s0-s3";  xreg_list="x1, x8-x9, x18-x19";}
  case  9: {reg_list="ra, s0-s4";  xreg_list="x1, x8-x9, x18-x20";}
  case 10: {reg_list="ra, s0-s5";  xreg_list="x1, x8-x9, x18-x21";}
  case 11: {reg_list="ra, s0-s6";  xreg_list="x1, x8-x9, x18-x22";}
  case 12: {reg_list="ra, s0-s7";  xreg_list="x1, x8-x9, x18-x23";}
  case 13: {reg_list="ra, s0-s8";  xreg_list="x1, x8-x9, x18-x24";}
  case 14: {reg_list="ra, s0-s9";  xreg_list="x1, x8-x9, x18-x25";}
  //note - to include s10, s11 must also be included
  case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32E:

stack_adj_base = 16;
Valid values:
stack_adj      = [16|32|48|64];

RV32I:

switch (rlist) {
  case  4.. 7: stack_adj_base = 16;
  case  8..11: stack_adj_base = 32;
  case 12..14: stack_adj_base = 48;
  case     15: stack_adj_base = 64;
}

Valid values:
switch (rlist) {
  case  4.. 7: stack_adj = [16|32|48| 64];
  case  8..11: stack_adj = [32|48|64| 80];
  case 12..14: stack_adj = [48|64|80| 96];
  case     15: stack_adj = [64|80|96|112];
}

RV64:

switch (rlist) {
  case  4.. 5: stack_adj_base =  16;
  case  6.. 7: stack_adj_base =  32;
  case  8.. 9: stack_adj_base =  48;
  case 10..11: stack_adj_base =  64;
  case 12..13: stack_adj_base =  80;
  case     14: stack_adj_base =  96;
  case     15: stack_adj_base = 112;
}

Valid values:
switch (rlist) {
  case  4.. 5: stack_adj = [ 16| 32| 48| 64];
  case  6.. 7: stack_adj = [ 32| 48| 64| 80];
  case  8.. 9: stack_adj = [ 48| 64| 80| 96];
  case 10..11: stack_adj = [ 64| 80| 96|112];
  case 12..13: stack_adj = [ 80| 96|112|128];
  case     14: stack_adj = [ 96|112|128|144];
  case     15: stack_adj = [112|128|144|160];
}

Description This instruction pops (loads) the registers in reg_list from stack memory, adjusts the stack pointer by stack_adj, moves zero into a0 and then returns to ra.

note

All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen.

For further information see insns-pushpop.

Stack Adjustment Calculation:

stack_adj_base is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list.

spimm is the number of additional 16-byte address increments allocated for the stack frame.

The total stack adjustment represents the total size of the stack frame, which is stack_adj_base added to spimm scaled by 16, as defined above.

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists

Operation The first section of pseudocode may be executed multiple times before the instruction successfully completes.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

if (XLEN==32) bytes=4; else bytes=8;

addr=sp+stack_adj-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1)  {
  //if register i is in xreg_list
  if (xreg_list[i]) {
    switch(bytes) {
      4:  asm("lw x[i], 0(addr)");
      8:  asm("ld x[i], 0(addr)");
    }
    addr-=bytes;
  }
}

The final section of pseudocode executes atomically, and only executes if the section above completes without any exceptions or interrupts.

note

The li a0, 0 could be executed more than once, but is included in the atomic section for convenience.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

asm("li a0, 0");
sp+=stack_adj;
asm("ret");
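
One way to read the pseudocode is as an expansion into base instructions. The Python sketch below generates such a sequence for a given register list. The helper name `expand_popretz` is hypothetical, and writing `sp+=stack_adj` as `addi sp, sp, stack_adj` is an assumption of this sketch.

```python
def expand_popretz(xreg_list, stack_adj, xlen=32):
    """Return an illustrative base-instruction sequence for cm.popretz."""
    nbytes = 4 if xlen == 32 else 8
    load = "lw" if xlen == 32 else "ld"
    offset = stack_adj - nbytes
    seq = []
    # Loads walk down from the top of the frame, highest register first.
    for i in (27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1):
        if i in xreg_list:
            seq.append(f"{load} x{i}, {offset}(sp)")
            offset -= nbytes
    # The atomic tail: zero a0, release the frame, return.
    seq += ["li a0, 0", f"addi sp, sp, {stack_adj}", "ret"]
    return seq


# cm.popretz {ra, s0-s3}, 32 on RV32
for line in expand_popretz([1, 8, 9, 18, 19], 32):
    print(line)
```

The generated list only illustrates ordering and offsets; an implementation is free to perform the loads differently as long as the architectural result matches the pseudocode.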

10.13.10 cm.popret

Synopsis Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, return to ra.

Mnemonic cm.popret {reg_list}, stack_adj

Encoding (RV32, RV64):

[encoding diagram]

note

rlist values 0 to 3 are reserved for a future EABI variant called cm.popret.e

Assembly Syntax:

cm.popret {reg_list},  stack_adj
cm.popret {xreg_list}, stack_adj

The variables used in the assembly syntax are defined below.

RV32E:

switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32I, RV64:

switch (rlist){
  case  4: {reg_list="ra";         xreg_list="x1";}
  case  5: {reg_list="ra, s0";     xreg_list="x1, x8";}
  case  6: {reg_list="ra, s0-s1";  xreg_list="x1, x8-x9";}
  case  7: {reg_list="ra, s0-s2";  xreg_list="x1, x8-x9, x18";}
  case  8: {reg_list="ra, s0-s3";  xreg_list="x1, x8-x9, x18-x19";}
  case  9: {reg_list="ra, s0-s4";  xreg_list="x1, x8-x9, x18-x20";}
  case 10: {reg_list="ra, s0-s5";  xreg_list="x1, x8-x9, x18-x21";}
  case 11: {reg_list="ra, s0-s6";  xreg_list="x1, x8-x9, x18-x22";}
  case 12: {reg_list="ra, s0-s7";  xreg_list="x1, x8-x9, x18-x23";}
  case 13: {reg_list="ra, s0-s8";  xreg_list="x1, x8-x9, x18-x24";}
  case 14: {reg_list="ra, s0-s9";  xreg_list="x1, x8-x9, x18-x25";}
  //note - to include s10, s11 must also be included
  case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";}
  default: reserved();
}
stack_adj      = stack_adj_base + spimm * 16;

RV32E:

stack_adj_base = 16;
Valid values:
stack_adj      = [16|32|48|64];

RV32I:

switch (rlist) {
  case  4.. 7: stack_adj_base = 16;
  case  8..11: stack_adj_base = 32;
  case 12..14: stack_adj_base = 48;
  case     15: stack_adj_base = 64;
}

Valid values:
switch (rlist) {
  case  4.. 7: stack_adj = [16|32|48| 64];
  case  8..11: stack_adj = [32|48|64| 80];
  case 12..14: stack_adj = [48|64|80| 96];
  case     15: stack_adj = [64|80|96|112];
}

RV64:

switch (rlist) {
  case  4.. 5: stack_adj_base =  16;
  case  6.. 7: stack_adj_base =  32;
  case  8.. 9: stack_adj_base =  48;
  case 10..11: stack_adj_base =  64;
  case 12..13: stack_adj_base =  80;
  case     14: stack_adj_base =  96;
  case     15: stack_adj_base = 112;
}

Valid values:
switch (rlist) {
  case  4.. 5: stack_adj = [ 16| 32| 48| 64];
  case  6.. 7: stack_adj = [ 32| 48| 64| 80];
  case  8.. 9: stack_adj = [ 48| 64| 80| 96];
  case 10..11: stack_adj = [ 64| 80| 96|112];
  case 12..13: stack_adj = [ 80| 96|112|128];
  case     14: stack_adj = [ 96|112|128|144];
  case     15: stack_adj = [112|128|144|160];
}

Description This instruction pops (loads) the registers in reg_list from stack memory, adjusts the stack pointer by stack_adj and then returns to ra.

note

All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen.

For further information see insns-pushpop.

Stack Adjustment Calculation:

stack_adj_base is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list.

spimm is the number of additional 16-byte address increments allocated for the stack frame.

The total stack adjustment represents the total size of the stack frame, which is stack_adj_base added to spimm scaled by 16, as defined above.

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists

Operation The first section of pseudocode may be executed multiple times before the instruction successfully completes.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

if (XLEN==32) bytes=4; else bytes=8;

addr=sp+stack_adj-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1)  {
  //if register i is in xreg_list
  if (xreg_list[i]) {
    switch(bytes) {
      4:  asm("lw x[i], 0(addr)");
      8:  asm("ld x[i], 0(addr)");
    }
    addr-=bytes;
  }
}

The final section of pseudocode executes atomically, and only executes if the section above completes without any exceptions or interrupts.

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

sp+=stack_adj;
asm("ret");

10.13.11 cm.mvsa01

Synopsis Move a0-a1 into two registers of s0-s7

Mnemonic cm.mvsa01 r1s', r2s'

Encoding (RV32, RV64):

[encoding diagram]

note

For the encoding to be legal r1s' != r2s'.

Assembly Syntax:

cm.mvsa01 r1s', r2s'

Description This instruction moves a0 into r1s' and a1 into r2s'. r1s' and r2s' must be different. The execution is atomic, so it is not possible to observe state where only one of r1s' or r2s' has been updated.

The encoding uses sreg number specifiers instead of xreg number specifiers to save encoding space. The mapping between them is specified in the pseudocode below.

note

The s register mapping is taken from the UABI, and may not match the currently unratified EABI. cm.mvsa01.e may be included in the future.

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists.

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.
if (RV32E && (r1sc>1 || r2sc>1)) {
  reserved();
}
xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]};
xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]};
X[xreg1] = X[10];
X[xreg2] = X[11];
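
The bit concatenation that maps an sreg specifier to an xreg number can be checked with a short Python sketch. The function name `sreg_to_xreg` is hypothetical; the bit layout follows the pseudocode above.

```python
def sreg_to_xreg(sc):
    """Map a 3-bit s-register specifier (0..7) to its x-register number."""
    if not 0 <= sc <= 7:
        raise ValueError("specifier must be in 0..7")
    upper = sc >> 1                  # sc[2:1]
    bit4 = 1 if upper > 0 else 0     # set for s2..s7 (x18..x23)
    bit3 = 1 if upper == 0 else 0    # set for s0..s1 (x8..x9)
    return (bit4 << 4) | (bit3 << 3) | sc


print([sreg_to_xreg(sc) for sc in range(8)])  # [8, 9, 18, 19, 20, 21, 22, 23]
```

Specifiers 0 and 1 therefore select x8 and x9, and specifiers 2 to 7 select x18 to x23, which matches s0-s7 in the UABI.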

10.13.12 cm.mva01s

Synopsis Move two s0-s7 registers into a0-a1

Mnemonic cm.mva01s r1s', r2s'

Encoding (RV32, RV64):

[encoding diagram]

Assembly Syntax:

cm.mva01s r1s', r2s'

Description This instruction moves r1s' into a0 and r2s' into a1. The execution is atomic, so it is not possible to observe state where only one of a0 or a1 has been updated.

The encoding uses sreg number specifiers instead of xreg number specifiers to save encoding space. The mapping between them is specified in the pseudocode below.

note

The s register mapping is taken from the UABI, and may not match the currently unratified EABI. cm.mva01s.e may be included in the future.

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists.

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.
if (RV32E && (r1sc>1 || r2sc>1)) {
  reserved();
}
xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]};
xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]};
X[10] = X[xreg1];
X[11] = X[xreg2];

10.14 Table Jump Overview

cm.jt (insns-cm_jt) and cm.jalt (insns-cm_jalt) are referred to as table jump.

Table jump uses a 256-entry XLEN wide table in instruction memory to contain function addresses. The table must be at least 64-byte aligned.

Table entries follow the current data endianness. This is different from normal instruction fetch which is always little-endian.

cm.jt and cm.jalt encodings index the table, giving access to functions within the full XLEN wide address space.

This is used as a form of dictionary compression to reduce the code size of jal / auipc+jalr / jr / auipc+jr instructions.

Table jump allows the linker to replace the following instruction sequences with a cm.jt or cm.jalt encoding, and an entry in the table:

32-bit j calls
32-bit jal ra calls
64-bit auipc+jr calls to fixed locations
64-bit auipc+jalr ra calls to fixed locations
- The auipc+jr/jalr sequence is used because the offset from the PC is out of the ±1 MB range.

Because cm.jalt is equivalent to jal ra, it pushes to the return address stack, if one is implemented.

10.14.1 jvt

The base of the table is in the jvt CSR (see csrs-jvt). Each table entry is XLEN bits.

If the same function is called with and without linking then it must have two entries in the table. This is typically caused by the same function being called with and without tail calling.

10.14.2 Table Jump Fault handling

For a table jump instruction, the table entry that the instruction selects is considered an extension of the instruction itself. Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (cm.jt/cm.jalt) and the second to read from the jump vector table (JVT). Both instruction fetches are implicit reads, and both require execute permission; read permission is irrelevant. It is recommended that the second fetch be ignored for hardware triggers and breakpoints.

Memory writes to the jump vector table require an instruction barrier (fence.i) to guarantee that they are visible to the instruction fetch.

Multiple contexts may have different jump vector tables. JVT may be switched between them without an instruction barrier if the tables have not been updated in memory since the last fence.i.

If an exception occurs on either instruction fetch, xEPC is set to the PC of the table jump instruction, xCAUSE is set as expected for the type of fault and xTVAL (if not set to zero) contains the fetch address which caused the fault.

10.14.3 jvt CSR

Synopsis Table jump base vector and control register

Address:

0x017

Permissions:

URW

Format (RV32):

[register format diagram]

Format (RV64):

[register format diagram]

Description The jvt register is an XLEN-bit WARL read/write register that holds the jump table configuration, consisting of the jump table base address (BASE) and the jump table mode (MODE).

If Zcmt is implemented then jvt must also be implemented, but can contain a read-only value. If jvt is writable, the set of values the register may hold can vary by implementation. The value in the BASE field must always be aligned on a 64-byte boundary. Note that the CSR contains only bits XLEN-1 through 6 of the address base. When computing jump-table accesses, the lower six bits of base are filled with zeroes to obtain an XLEN-bit jump-table base address jvt.base that is always aligned on a 64-byte boundary.

jvt.base is a virtual address, whenever virtual memory is enabled.

The memory pointed to by jvt.base is treated as instruction memory for the purpose of executing table jump instructions, implying execute access permission.

jvt.mode	Comment
000000	Jump table mode
others	reserved for future standard use

jvt.mode is a WARL field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to attempt to program different modes and read back the values to see which are available. Jump table mode must be implemented.

note

In the future, the RISC-V Unified Discovery method will report the available modes.
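
The split of a jvt value into its two fields can be sketched in Python. The sketch assumes that MODE occupies bits 5:0; this is inferred from BASE holding bits XLEN-1 through 6 and from the six-bit mode values in the table above. The function name `split_jvt` is hypothetical.

```python
def split_jvt(jvt, xlen=32):
    """Return (jvt.base, jvt.mode) for a raw jvt CSR value.

    Assumption of this sketch: MODE is held in bits 5:0.
    """
    jvt &= (1 << xlen) - 1
    base = jvt & ~0x3F   # low six bits filled with zeroes, so 64-byte aligned
    mode = jvt & 0x3F
    return base, mode


base, mode = split_jvt(0x80001243)
print(hex(base), mode)  # 0x80001240 3
```

A mode other than 0 is reserved, so software following this sketch would treat any non-zero `mode` as unsupported.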

Architectural State:

The jvt CSR adds architectural state to the system software context (such as an OS process), and therefore must be saved/restored on context switches.

10.14.4 cm.jt

Synopsis jump via table

Mnemonic cm.jt index

Encoding (RV32, RV64):

[encoding diagram]

note

For this encoding to decode as cm.jt, index<32, otherwise it decodes as cm.jalt, see insns-cm_jalt.

note

If jvt.mode = 0 (Jump Table Mode) then cm.jt behaves as specified here. If jvt.mode is a reserved value, then cm.jt is also reserved. In the future other defined values of jvt.mode may change the behaviour of cm.jt.

Assembly Syntax:

cm.jt index

Description cm.jt reads an entry from the jump vector table in memory and jumps to the address that was read.

For further information see insns-tablejump.

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists.

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

// table_address is temporary internal state, it doesn't represent a real register
// InstMemory is byte indexed

switch(XLEN) {
  32:  table_address[XLEN-1:0] = jvt.base + (index<<2);
  64:  table_address[XLEN-1:0] = jvt.base + (index<<3);
}

//fetch from the jump table
pc = InstMemory[table_address][XLEN-1:0]&~0x1;  // Clear bit 0.
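
The address calculation and table fetch can be illustrated in Python. This sketch models instruction memory as a byte array and assumes little-endian data endianness when reading the entry; the function name `cm_jt_target` is hypothetical.

```python
def cm_jt_target(inst_memory, jvt_base, index, xlen=32):
    """Return the new pc for cm.jt.

    inst_memory is a bytes-like object indexed by byte address.
    Assumption of this sketch: table entries are little-endian.
    """
    entry_bytes = xlen // 8                      # 4 on RV32, 8 on RV64
    table_address = (jvt_base + index * entry_bytes) & ((1 << xlen) - 1)
    raw = inst_memory[table_address:table_address + entry_bytes]
    entry = int.from_bytes(raw, "little")
    return entry & ~0x1                          # clear bit 0


# Table at 0x100; entry 3 holds 0x2001.
memory = bytearray(0x200)
memory[0x10C:0x110] = (0x2001).to_bytes(4, "little")
print(hex(cm_jt_target(memory, 0x100, 3)))  # 0x2000
```

Multiplying the index by the entry size is the same operation as the shifts by 2 and 3 in the pseudocode.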

10.14.5 cm.jalt

Synopsis jump via table with optional link

Mnemonic cm.jalt index

Encoding (RV32, RV64):

[encoding diagram]

note

For this encoding to decode as cm.jalt, index>=32, otherwise it decodes as cm.jt, see insns-cm_jt.

note

If jvt.mode = 0 (Jump Table Mode) then cm.jalt behaves as specified here. If jvt.mode is a reserved value, then cm.jalt is also reserved. In the future other defined values of jvt.mode may change the behaviour of cm.jalt.

Assembly Syntax:

cm.jalt index

Description cm.jalt reads an entry from the jump vector table in memory and jumps to the address that was read, linking to ra.

For further information see insns-tablejump.

Prerequisites None

32-bit equivalent:

No direct equivalent encoding exists.

Operation

//This is not SAIL, it's pseudocode. The SAIL hasn't been written yet.

// table_address is temporary internal state, it doesn't represent a real register
// InstMemory is byte indexed

switch(XLEN) {
  32:  table_address[XLEN-1:0] = jvt.base + (index<<2);
  64:  table_address[XLEN-1:0] = jvt.base + (index<<3);
}

//fetch from the jump table

ra = pc+2;
pc = InstMemory[table_address][XLEN-1:0]&~0x1;  // Clear bit 0.
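
Since cm.jt and cm.jalt share one encoding and differ only in the index range, the pair can be modelled with a single function. The Python sketch below is illustrative: `table_jump` and the `read_entry` callback are hypothetical names, and `read_entry` stands in for the fetch from the jump vector table.

```python
def table_jump(pc, index, read_entry):
    """Return (new_pc, new_ra) for a table jump.

    read_entry(index) supplies the XLEN-bit table entry (hypothetical callback).
    new_ra is None when ra is left unchanged.
    """
    if not 0 <= index <= 255:
        raise ValueError("index must be in 0..255")
    target = read_entry(index) & ~0x1   # clear bit 0
    if index < 32:
        return target, None             # decodes as cm.jt: no link
    return target, pc + 2               # decodes as cm.jalt: link to ra


print(table_jump(0x1000, 5, lambda i: 0x2001))   # (8192, None)
print(table_jump(0x1000, 40, lambda i: 0x2001))  # (8192, 4098)
```

The link value is pc+2 because the table jump instruction is a 16-bit encoding.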
