11 Bit Manipulation Extensions
The bit-manipulation (bitmanip) extension collection comprises several component extensions to the base RISC-V architecture that are intended to provide some combination of code-size reduction, performance improvement, and energy reduction. While the instructions are intended for general use, some instructions are more useful in certain domains than in others. Hence, several smaller bitmanip extensions are provided. Each of these smaller extensions is grouped by common function and use case, and each has its own Zb*-extension name.
Each bitmanip extension includes a group of several bitmanip instructions that have similar purposes and can often share the same logic. Some instructions are available in only one extension, while others are available in several. The instructions have mnemonics and encodings that are independent of the extensions in which they appear. Thus, when implementing extensions with overlapping instructions, there is no redundancy in logic or encoding.
The bitmanip extensions are defined for RV32 and RV64.
The bitmanip extension follows the convention in RV64 that w-suffixed instructions (without a dot before the w) ignore the upper 32 bits of their inputs, operate on the least-significant 32 bits as signed values, and produce a 32-bit signed result that is sign-extended to XLEN.
Bitmanip instructions with the suffix .uw have one operand that is an unsigned 32-bit value that is extracted from the least-significant 32 bits of the specified register. Other than that, these perform full-XLEN operations.
Bitmanip instructions with the suffixes .b, .h, and .w only look at the least-significant 8 bits, 16 bits, and 32 bits of the input (respectively) and produce an XLEN-wide result that is sign-extended or zero-extended, based on the specific instruction.
The bit-manipulation instructions comprise the following extensions:
Below is a list of all of the instructions that are included in these extensions, along with their specific mapping:
| RV32 | RV64 | Mnemonic | Instruction | Zbb | Zbkb | Zbc | Zbkc |
|---|---|---|---|---|---|---|---|
| ✓ | ✓ | andn rd, rs1, rs2 | insns-andn | ✓ | ✓ | | |
| ✓ | ✓ | brev8 rd, rs | insns-brev8 | | ✓ | | |
| ✓ | ✓ | clmul rd, rs1, rs2 | insns-clmul | | | ✓ | ✓ |
| ✓ | ✓ | clmulh rd, rs1, rs2 | insns-clmulh | | | ✓ | ✓ |
| ✓ | ✓ | clmulr rd, rs1, rs2 | insns-clmulr | | | ✓ | |
| ✓ | ✓ | clz rd, rs | insns-clz | ✓ | | | |
| | ✓ | clzw rd, rs | insns-clzw | ✓ | | | |
| ✓ | ✓ | cpop rd, rs | insns-cpop | ✓ | | | |
| | ✓ | cpopw rd, rs | insns-cpopw | ✓ | | | |
| ✓ | ✓ | ctz rd, rs | insns-ctz | ✓ | | | |
| | ✓ | ctzw rd, rs | insns-ctzw | ✓ | | | |
| ✓ | ✓ | max rd, rs1, rs2 | insns-max | ✓ | | | |
| ✓ | ✓ | maxu rd, rs1, rs2 | insns-maxu | ✓ | | | |
| ✓ | ✓ | min rd, rs1, rs2 | insns-min | ✓ | | | |
| ✓ | ✓ | minu rd, rs1, rs2 | insns-minu | ✓ | | | |
| ✓ | ✓ | orc.b rd, rs | insns-orc_b | ✓ | | | |
| ✓ | ✓ | orn rd, rs1, rs2 | insns-orn | ✓ | ✓ | | |
| ✓ | ✓ | pack rd, rs1, rs2 | insns-pack | | ✓ | | |
| ✓ | ✓ | packh rd, rs1, rs2 | insns-packh | | ✓ | | |
| | ✓ | packw rd, rs1, rs2 | insns-packw | | ✓ | | |
| ✓ | ✓ | rev8 rd, rs | insns-rev8 | ✓ | ✓ | | |
| ✓ | ✓ | rol rd, rs1, rs2 | insns-rol | ✓ | ✓ | | |
| | ✓ | rolw rd, rs1, rs2 | insns-rolw | ✓ | ✓ | | |
| ✓ | ✓ | ror rd, rs1, rs2 | insns-ror | ✓ | ✓ | | |
| ✓ | ✓ | rori rd, rs1, shamt | insns-rori | ✓ | ✓ | | |
| | ✓ | roriw rd, rs1, shamt | insns-roriw | ✓ | ✓ | | |
| | ✓ | rorw rd, rs1, rs2 | insns-rorw | ✓ | ✓ | | |
| ✓ | ✓ | sext.b rd, rs | insns-sext_b | ✓ | | | |
| ✓ | ✓ | sext.h rd, rs | insns-sext_h | ✓ | | | |
| ✓ | | unzip rd, rs | insns-unzip | | ✓ | | |
| ✓ | ✓ | xnor rd, rs1, rs2 | insns-xnor | ✓ | ✓ | | |
| ✓ | ✓ | zext.h rd, rs | insns-zext_h | ✓ | | | |
| ✓ | | zip rd, rs | insns-zip | | ✓ | | |

| RV32 | RV64 | Mnemonic | Instruction | Zba | Zbs |
|---|---|---|---|---|---|
| | ✓ | add.uw rd, rs1, rs2 | insns-add_uw | ✓ | |
| ✓ | ✓ | bclr rd, rs1, rs2 | insns-bclr | | ✓ |
| ✓ | ✓ | bclri rd, rs1, imm | insns-bclri | | ✓ |
| ✓ | ✓ | bext rd, rs1, rs2 | insns-bext | | ✓ |
| ✓ | ✓ | bexti rd, rs1, imm | insns-bexti | | ✓ |
| ✓ | ✓ | binv rd, rs1, rs2 | insns-binv | | ✓ |
| ✓ | ✓ | binvi rd, rs1, imm | insns-binvi | | ✓ |
| ✓ | ✓ | bset rd, rs1, rs2 | insns-bset | | ✓ |
| ✓ | ✓ | bseti rd, rs1, imm | insns-bseti | | ✓ |
| ✓ | ✓ | sh1add rd, rs1, rs2 | insns-sh1add | ✓ | |
| | ✓ | sh1add.uw rd, rs1, rs2 | insns-sh1add_uw | ✓ | |
| ✓ | ✓ | sh2add rd, rs1, rs2 | insns-sh2add | ✓ | |
| | ✓ | sh2add.uw rd, rs1, rs2 | insns-sh2add_uw | ✓ | |
| ✓ | ✓ | sh3add rd, rs1, rs2 | insns-sh3add | ✓ | |
| | ✓ | sh3add.uw rd, rs1, rs2 | insns-sh3add_uw | ✓ | |
| | ✓ | slli.uw rd, rs1, imm | insns-slli_uw | ✓ | |
11.1 "B" Extension for Bit Manipulation, Version 1.0.0
The B standard extension comprises instructions provided by the Zba, Zbb, and Zbs extensions.
11.2 Zba: Extension for Address generation, Version 1.0.0
The Zba instructions can be used to accelerate the generation of addresses that index into arrays of basic types (halfword, word, doubleword) using both unsigned word-sized and XLEN-sized indices: a shifted index is added to a base address.
The shift and add instructions do a left shift of 1, 2, or 3 because these are commonly found in real-world code and because they can be implemented with a minimal amount of additional hardware beyond that of the simple adder. This avoids lengthening the critical path in implementations.
While the shift and add instructions are limited to a maximum left shift of 3, the slli instruction (from the base ISA) can be used to perform similar shifts for indexing into arrays of wider elements. The slli.uw — added in this extension — can be used when the index is to be interpreted as an unsigned word.
The following instructions comprise the Zba extension:
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| | ✓ | add.uw rd, rs1, rs2 | insns-add_uw |
| ✓ | ✓ | sh1add rd, rs1, rs2 | insns-sh1add |
| | ✓ | sh1add.uw rd, rs1, rs2 | insns-sh1add_uw |
| ✓ | ✓ | sh2add rd, rs1, rs2 | insns-sh2add |
| | ✓ | sh2add.uw rd, rs1, rs2 | insns-sh2add_uw |
| ✓ | ✓ | sh3add rd, rs1, rs2 | insns-sh3add |
| | ✓ | sh3add.uw rd, rs1, rs2 | insns-sh3add_uw |
| | ✓ | slli.uw rd, rs1, imm | insns-slli_uw |
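As a rough illustration of the address arithmetic that Zba targets, the following Python sketch models the shift-and-add instructions on unsigned integers. The helper names `sh_add` and `sh_add_uw` are hypothetical and not part of the specification, and register values are assumed to be non-negative integers below 2^XLEN.

```python
def sh_add(shift, rs1, rs2, xlen=64):
    """Model of sh1add/sh2add/sh3add: rs2 + (rs1 << shift), wrapped to XLEN bits."""
    mask = (1 << xlen) - 1
    return (rs2 + (rs1 << shift)) & mask


def sh_add_uw(shift, rs1, rs2):
    """Model of the RV64-only .uw forms: the index is the zero-extended low word of rs1.

    A shift of 0 corresponds to add.uw.
    """
    index = rs1 & 0xFFFF_FFFF
    return (rs2 + (index << shift)) & ((1 << 64) - 1)
```

Under these assumptions, `sh_add(2, i, base)` gives the address of element `i` in an array of 4-byte words starting at `base`, and `sh_add_uw` does the same when only the low 32 bits of the index register are meaningful.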
11.3 Zbb: Extension for Basic bit-manipulation, Version 1.0.0
11.3.1 Logical with negate
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | andn rd, rs1, rs2 | insns-andn |
| ✓ | ✓ | orn rd, rs1, rs2 | insns-orn |
| ✓ | ✓ | xnor rd, rs1, rs2 | insns-xnor |
The Logical with Negate instructions can be implemented by inverting the rs2 inputs to the base-required AND, OR, and XOR logic instructions. In some implementations, the inverter on rs2 used for subtraction can be reused for this purpose.
11.3.2 Count leading/trailing zero bits
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | clz rd, rs | insns-clz |
| | ✓ | clzw rd, rs | insns-clzw |
| ✓ | ✓ | ctz rd, rs | insns-ctz |
| | ✓ | ctzw rd, rs | insns-ctzw |
11.3.3 Count population
These instructions count the number of set bits (1-bits). This is also commonly referred to as population count.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | cpop rd, rs | insns-cpop |
| | ✓ | cpopw rd, rs | insns-cpopw |
11.3.4 Integer minimum/maximum
The integer minimum/maximum instructions are arithmetic R-type instructions that return the smaller/larger of two operands.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | max rd, rs1, rs2 | insns-max |
| ✓ | ✓ | maxu rd, rs1, rs2 | insns-maxu |
| ✓ | ✓ | min rd, rs1, rs2 | insns-min |
| ✓ | ✓ | minu rd, rs1, rs2 | insns-minu |
11.3.5 Sign extension and zero extension
These instructions perform the sign extension or zero extension of the least-significant 8 bits or 16 bits of the source register.
These instructions replace the generalized idioms slli rd,rs,(XLEN-<size>) + srai (for sign extension of 8-bit and 16-bit quantities) and slli + srli (for zero extension of 16-bit quantities).
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | sext.b rd, rs | insns-sext_b |
| ✓ | ✓ | sext.h rd, rs | insns-sext_h |
| ✓ | ✓ | zext.h rd, rs | insns-zext_h |
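The equivalence between the dedicated extension instructions and the shift-pair idioms can be checked with a small model. The following Python sketch is illustrative only; the helpers `slli`, `srli`, `srai`, `sext_via_shifts`, and `zext_via_shifts` are assumed names that treat a register as an unsigned integer below 2^XLEN.

```python
def slli(x, shamt, xlen):
    """Shift left logical, keeping only the low XLEN bits."""
    return (x << shamt) & ((1 << xlen) - 1)


def srli(x, shamt):
    """Shift right logical: zeros enter from the top."""
    return x >> shamt


def srai(x, shamt, xlen):
    """Shift right arithmetic: vacated upper bits are copies of bit XLEN-1."""
    result = x >> shamt
    if (x >> (xlen - 1)) & 1:
        result |= ((1 << shamt) - 1) << (xlen - shamt)
    return result


def sext_via_shifts(x, size, xlen):
    """slli by (XLEN - size) followed by srai by the same amount."""
    return srai(slli(x, xlen - size, xlen), xlen - size, xlen)


def zext_via_shifts(x, size, xlen):
    """slli by (XLEN - size) followed by srli by the same amount."""
    return srli(slli(x, xlen - size, xlen), xlen - size)
```

With `size` set to 8 or 16, `sext_via_shifts` matches what sext.b and sext.h compute in one instruction, and `zext_via_shifts` with `size` 16 matches zext.h.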
11.3.6 Bitwise rotation
Bitwise rotation instructions are similar to the shift-logical operations from the base spec. However, where the shift-logical instructions shift in zeros, the rotate instructions shift in the bits that were shifted out of the other side of the value. Such operations are also referred to as ‘circular shifts’.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | rol rd, rs1, rs2 | insns-rol |
| | ✓ | rolw rd, rs1, rs2 | insns-rolw |
| ✓ | ✓ | ror rd, rs1, rs2 | insns-ror |
| ✓ | ✓ | rori rd, rs1, shamt | insns-rori |
| | ✓ | roriw rd, rs1, shamt | insns-roriw |
| | ✓ | rorw rd, rs1, rs2 | insns-rorw |
The rotate instructions were included to replace a common four-instruction sequence that achieves the same effect (neg; sll/srl; srl/sll; or).
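The four-instruction sequence can be sketched as follows for a left rotation. This is a Python model under the assumption that registers are unsigned integers below 2^XLEN and that sll/srl use only the low log2(XLEN) bits of their shift operand, as in the base ISA; the function names are hypothetical.

```python
def rol_via_base_isa(x, shamt, xlen=32):
    """Left rotation built from neg, sll, srl, and or."""
    mask = (1 << xlen) - 1
    neg = (-shamt) & mask                          # neg t0, shamt
    left = (x << (shamt & (xlen - 1))) & mask      # sll t1, x, shamt
    right = x >> (neg & (xlen - 1))                # srl t0, x, t0
    return left | right                            # or  rd, t1, t0


def rol(x, shamt, xlen=32):
    """Direct model of a left rotation by the low log2(XLEN) bits of shamt."""
    mask = (1 << xlen) - 1
    s = shamt & (xlen - 1)
    return ((x << s) | (x >> (xlen - s))) & mask
```

The negated shift amount works because only its low log2(XLEN) bits are used, which equal XLEN minus the shift amount modulo XLEN.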
11.3.7 OR Combine
orc.b sets the bits of each byte in the result rd to all zeros if no bit within the respective byte of rs is set, or to all ones if any bit within the respective byte of rs is set.
One use-case is string-processing functions, such as strlen and strcpy, which can use orc.b to test for the terminating zero byte by counting the set bits in leading non-zero bytes in a word.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | orc.b rd, rs | insns-orc_b |
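To make the string-processing use case concrete, the sketch below models orc.b in Python and uses it to locate the first zero byte in a word. It assumes a little-endian byte order and unsigned register values below 2^XLEN; `orc_b` and `first_zero_byte_index` are illustrative names, not part of the specification.

```python
def orc_b(x, xlen=64):
    """Set each result byte to 0xFF if the matching input byte is non-zero, else 0x00."""
    result = 0
    for i in range(0, xlen, 8):
        if (x >> i) & 0xFF:
            result |= 0xFF << i
    return result


def first_zero_byte_index(word, xlen=64):
    """Return the index of the lowest-addressed zero byte, or None if there is none."""
    inverted = orc_b(word, xlen) ^ ((1 << xlen) - 1)   # 0xFF where the input byte was zero
    if inverted == 0:
        return None
    trailing_zeros = (inverted & -inverted).bit_length() - 1   # what ctz would return
    return trailing_zeros >> 3                                  # bits to bytes
```

For example, a word loaded from the bytes `abc\0defg` yields index 3 under these assumptions.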
11.3.8 Byte-reverse
rev8 reverses the byte-ordering of rs.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | rev8 rd, rs | insns-rev8 |
11.4 Zbc: Extension for Carry-less multiplication, Version 1.0.0
Carry-less multiplication is the multiplication in the polynomial ring over GF(2).
clmul produces the lower half of the carry-less product and clmulh produces the upper half of the 2×XLEN carry-less product.
clmulr produces bits 2×XLEN−2:XLEN-1 of the 2×XLEN carry-less product.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | clmul rd, rs1, rs2 | insns-clmul |
| ✓ | ✓ | clmulh rd, rs1, rs2 | insns-clmulh |
| ✓ | ✓ | clmulr rd, rs1, rs2 | insns-clmulr |
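The relationship between the three results can be seen by computing the full carry-less product once and slicing it. The following Python sketch does this under the assumption that both operands are unsigned integers below 2^XLEN; the helper `clmul_full` is hypothetical and exists only for illustration.

```python
def clmul_full(a, b, xlen):
    """Full carry-less product: XOR of a shifted left by each set bit position of b."""
    product = 0
    for i in range(xlen):
        if (b >> i) & 1:
            product ^= a << i
    return product


def clmul(a, b, xlen=32):
    """Bits XLEN-1:0 of the product."""
    return clmul_full(a, b, xlen) & ((1 << xlen) - 1)


def clmulh(a, b, xlen=32):
    """Bits 2*XLEN-1:XLEN of the product (the top bit is always zero)."""
    return clmul_full(a, b, xlen) >> xlen


def clmulr(a, b, xlen=32):
    """Bits 2*XLEN-2:XLEN-1 of the product."""
    return (clmul_full(a, b, xlen) >> (xlen - 1)) & ((1 << xlen) - 1)
```

As a quick check, `clmul(0b11, 0b11)` is `0b101`, because (x + 1)² = x² + 1 over GF(2).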
11.5 Zbs: Extension for Single-bit instructions, Version 1.0.0
The single-bit instructions provide a mechanism to set, clear, invert, or extract a single bit in a register. The bit is specified by its index.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | bclr rd, rs1, rs2 | insns-bclr |
| ✓ | ✓ | bclri rd, rs1, imm | insns-bclri |
| ✓ | ✓ | bext rd, rs1, rs2 | insns-bext |
| ✓ | ✓ | bexti rd, rs1, imm | insns-bexti |
| ✓ | ✓ | binv rd, rs1, rs2 | insns-binv |
| ✓ | ✓ | binvi rd, rs1, imm | insns-binvi |
| ✓ | ✓ | bset rd, rs1, rs2 | insns-bset |
| ✓ | ✓ | bseti rd, rs1, imm | insns-bseti |
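A minimal Python model of the four operations is given below as a sketch. It assumes unsigned register values below 2^XLEN and, as in the register forms, takes the bit index from the low log2(XLEN) bits of the index operand; the function signatures are illustrative.

```python
def _single_bit(index, xlen):
    """Mask with one bit set at index modulo XLEN."""
    return 1 << (index & (xlen - 1))


def bclr(rs1, index, xlen=64):
    return rs1 & ~_single_bit(index, xlen)


def bext(rs1, index, xlen=64):
    return (rs1 >> (index & (xlen - 1))) & 1


def binv(rs1, index, xlen=64):
    return rs1 ^ _single_bit(index, xlen)


def bset(rs1, index, xlen=64):
    return rs1 | _single_bit(index, xlen)
```

Note that an out-of-range index wraps rather than faulting in this model, so `bset(0, 65)` sets bit 1 when XLEN is 64.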
11.6 Zbkb: Extension for Bit-manipulation for Cryptography, Version 1.0.0
This extension contains instructions essential for implementing common operations in cryptographic workloads.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | rol | insns-rol |
| | ✓ | rolw | insns-rolw |
| ✓ | ✓ | ror | insns-ror |
| ✓ | ✓ | rori | insns-rori |
| | ✓ | roriw | insns-roriw |
| | ✓ | rorw | insns-rorw |
| ✓ | ✓ | andn | insns-andn |
| ✓ | ✓ | orn | insns-orn |
| ✓ | ✓ | xnor | insns-xnor |
| ✓ | ✓ | pack | insns-pack |
| ✓ | ✓ | packh | insns-packh |
| | ✓ | packw | insns-packw |
| ✓ | ✓ | brev8 | insns-brev8 |
| ✓ | ✓ | rev8 | insns-rev8 |
| ✓ | | zip | insns-zip |
| ✓ | | unzip | insns-unzip |
11.7 Zbkc: Extension for Carry-less multiplication for Cryptography, Version 1.0.0
Carry-less multiplication is the multiplication in the polynomial ring over GF(2). This extension is a subset of the Zbc extension and provides only clmul and clmulh. These are the crucial instructions needed to efficiently implement the GHASH operation, a critical operation in some cryptographic workloads such as the AES-GCM authenticated encryption scheme.
See Zbc (Section 11.4) for further details on these two instructions.
11.8 Zbkx: Extension for Crossbar permutations, Version 1.0.0
These instructions implement a "lookup table" for 4-bit and 8-bit elements held in the general-purpose registers. rs1 is used as a vector of N-bit words, and rs2 as a vector of N-bit indices into rs1. Each element of rs2 is replaced by the element of rs1 that it indexes, or by zero if the index is out of bounds for rs1.
These instructions are useful for expressing N-bit to N-bit boolean operations, and for implementing cryptographic code with secret-dependent memory accesses (particularly SBoxes) such that the execution latency does not depend on the (secret) data being operated on.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | xperm4 rd, rs1, rs2 | insns-xperm4 |
| ✓ | ✓ | xperm8 rd, rs1, rs2 | insns-xperm8 |
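The lookup behavior can be sketched with a single Python helper parameterized by element width. This is an illustrative model only: `xperm` is a hypothetical name, `width` is 4 for xperm4 and 8 for xperm8, and registers are assumed to be unsigned integers below 2^XLEN.

```python
def xperm(rs1, rs2, width, xlen=64):
    """Replace each element of rs2 with the element of rs1 that it indexes.

    Indices that fall outside rs1 produce a zero element.
    """
    lanes = xlen // width
    lane_mask = (1 << width) - 1
    result = 0
    for i in range(lanes):
        index = (rs2 >> (i * width)) & lane_mask
        if index < lanes:
            element = (rs1 >> (index * width)) & lane_mask
            result |= element << (i * width)
    return result
```

Because every lane is computed the same way regardless of the index value, a hardware implementation can provide the data-independent latency that table lookups in memory cannot guarantee.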
11.9 Instructions (in alphabetical order)
The semantics of each instruction are expressed in a SAIL-like syntax.
11.9.1 add.uw
Synopsis Add unsigned word
Mnemonic add.uw rd, rs1, rs2
Pseudoinstructions zext.w rd, rs1 → add.uw rd, rs1, zero
Encoding
Description This instruction performs an XLEN-wide addition between rs2 and the zero-extended least-significant word of rs1.
Operation
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + index;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
11.9.2 andn
Synopsis AND with inverted operand
Mnemonic andn rd, rs1, rs2
Encoding
Description This instruction performs the bitwise logical AND operation between rs1 and the bitwise inversion of rs2.
Operation
X(rd) = X(rs1) & ~X(rs2);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.3 bclr
Synopsis Single-Bit Clear (Register)
Mnemonic bclr rd, rs1, rs2
Encoding
Description This instruction returns rs1 with a single bit cleared at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
Operation
let index = X(rs2) & (XLEN - 1);
X(rd) = X(rs1) & ~(1 << index);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.4 bclri
Synopsis Single-Bit Clear (Immediate)
Mnemonic bclri rd, rs1, shamt
Encoding (RV32)
Encoding (RV64)
Description This instruction returns rs1 with a single bit cleared at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
Operation
let index = shamt & (XLEN - 1);
X(rd) = X(rs1) & ~(1 << index);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.5 bext
Synopsis Single-Bit Extract (Register)
Mnemonic bext rd, rs1, rs2
Encoding
Description This instruction returns a single bit extracted from rs1 at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
Operation
let index = X(rs2) & (XLEN - 1);
X(rd) = (X(rs1) >> index) & 1;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.6 bexti
Synopsis Single-Bit Extract (Immediate)
Mnemonic bexti rd, rs1, shamt
Encoding (RV32)
Encoding (RV64)
Description This instruction returns a single bit extracted from rs1 at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
Operation
let index = shamt & (XLEN - 1);
X(rd) = (X(rs1) >> index) & 1;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.7 binv
Synopsis Single-Bit Invert (Register)
Mnemonic binv rd, rs1, rs2
Encoding
Description This instruction returns rs1 with a single bit inverted at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
Operation
let index = X(rs2) & (XLEN - 1);
X(rd) = X(rs1) ^ (1 << index);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.8 binvi
Synopsis Single-Bit Invert (Immediate)
Mnemonic binvi rd, rs1, shamt
Encoding (RV32)
Encoding (RV64)
Description This instruction returns rs1 with a single bit inverted at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
Operation
let index = shamt & (XLEN - 1);
X(rd) = X(rs1) ^ (1 << index);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.9 bset
Synopsis Single-Bit Set (Register)
Mnemonic bset rd, rs1, rs2
Encoding
Description This instruction returns rs1 with a single bit set at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
Operation
let index = X(rs2) & (XLEN - 1);
X(rd) = X(rs1) | (1 << index);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.10 bseti
Synopsis Single-Bit Set (Immediate)
Mnemonic bseti rd, rs1, shamt
Encoding (RV32)
Encoding (RV64)
Description This instruction returns rs1 with a single bit set at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
Operation
let index = shamt & (XLEN - 1);
X(rd) = X(rs1) | (1 << index);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbs (zbs) | v1.0 | Ratified |
11.9.11 clmul
Synopsis Carry-less multiply (low-part)
Mnemonic clmul rd, rs1, rs2
Encoding
Description clmul produces the lower half of the 2·XLEN carry-less product.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
output = if ((rs2_val >> i) & 1)
then output ^ (rs1_val << i)
else output;
}
X(rd) = output;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbc (zbc) | v1.0 | Ratified |
| Zbkc (zbkc) | v1.0 | Ratified |
11.9.12 clmulh
Synopsis Carry-less multiply (high-part)
Mnemonic clmulh rd, rs1, rs2
Encoding
Description clmulh produces the upper half of the 2·XLEN carry-less product.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 1 to (xlen - 1) by 1) {
output = if ((rs2_val >> i) & 1)
then output ^ (rs1_val >> (xlen - i))
else output;
}
X(rd) = output;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbc (zbc) | v1.0 | Ratified |
| Zbkc (zbkc) | v1.0 | Ratified |
11.9.13 clmulr
Synopsis Carry-less multiply (reversed)
Mnemonic clmulr rd, rs1, rs2
Encoding
Description clmulr produces bits 2·XLEN−2:XLEN-1 of the 2·XLEN carry-less product.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
output = if ((rs2_val >> i) & 1)
then output ^ (rs1_val >> (xlen - i - 1))
else output;
}
X(rd) = output;
The clmulr instruction is used to accelerate CRC calculations. The r in the instruction’s mnemonic stands for reversed, as the instruction is equivalent to bit-reversing the inputs, performing a clmul, then bit-reversing the output.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbc (zbc) | v1.0 | Ratified |
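The stated equivalence between clmulr and a bit-reversed clmul can be checked numerically. The Python sketch below follows the loop structure of the Operation sections; `bit_reverse` and `clmulr_via_reversal` are hypothetical helpers, and operands are assumed to be unsigned integers below 2^XLEN.

```python
def bit_reverse(x, xlen):
    """Reverse the order of the XLEN bits of x."""
    result = 0
    for i in range(xlen):
        if (x >> i) & 1:
            result |= 1 << (xlen - 1 - i)
    return result


def clmul(a, b, xlen):
    """Lower half of the carry-less product."""
    output = 0
    for i in range(xlen):
        if (b >> i) & 1:
            output ^= a << i
    return output & ((1 << xlen) - 1)


def clmulr(a, b, xlen):
    """Bits 2*XLEN-2:XLEN-1 of the carry-less product."""
    output = 0
    for i in range(xlen):
        if (b >> i) & 1:
            output ^= a >> (xlen - i - 1)
    return output


def clmulr_via_reversal(a, b, xlen):
    """Bit-reverse both inputs, apply clmul, then bit-reverse the result."""
    return bit_reverse(clmul(bit_reverse(a, xlen), bit_reverse(b, xlen), xlen), xlen)
```

For any pair of in-range operands, `clmulr` and `clmulr_via_reversal` should agree in this model.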
11.9.14 clz
Synopsis Count leading zero bits
Mnemonic clz rd, rs
Encoding
Description This instruction counts the number of 0’s before the first 1, starting at the most-significant bit (i.e., XLEN-1) and progressing to bit 0. Accordingly, if the input is 0, the output is XLEN, and if the most-significant bit of the input is a 1, the output is 0.
Operation
val HighestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int
function HighestSetBit x = {
foreach (i from (xlen - 1) to 0 by 1 in dec)
if [x[i]] == 0b1 then return(i) else ();
return -1;
}
let rs = X(rs);
X[rd] = (xlen - 1) - HighestSetBit(rs);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.15 clzw
Synopsis Count leading zero bits in word
Mnemonic clzw rd, rs
Encoding
Description This instruction counts the number of 0’s before the first 1 starting at bit 31 and progressing to bit 0. Accordingly, if the least-significant word is 0, the output is 32, and if the most-significant bit of the word (i.e., bit 31) is a 1, the output is 0.
Operation
val HighestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int
function HighestSetBit32 x = {
foreach (i from 31 to 0 by 1 in dec)
if [x[i]] == 0b1 then return(i) else ();
return -1;
}
let rs = X(rs);
X[rd] = 31 - HighestSetBit32(rs);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.16 cpop
Synopsis Count set bits
Mnemonic cpop rd, rs
Encoding
Description This instruction counts the number of 1’s (i.e., set bits) in the source register.
Operation
let bitcount = 0;
let rs = X(rs);
foreach (i from 0 to (xlen - 1) in inc)
if rs[i] == 0b1 then bitcount = bitcount + 1 else ();
X[rd] = bitcount
This operation is known as population count, popcount, sideways sum, bit summation, or Hamming weight.
The GCC builtin function __builtin_popcount (unsigned int x) is implemented by cpop on RV32 and by cpopw on RV64.
The GCC builtin function __builtin_popcountl (unsigned long x) for LP64 is implemented by cpop on RV64.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
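The difference between cpop and cpopw on RV64 matters when a 32-bit value is held sign-extended in a 64-bit register. The following Python sketch illustrates this; the functions model the instructions on unsigned register images and are not a definition of the GCC builtins.

```python
def cpop(x, xlen=64):
    """Count the set bits in the low XLEN bits of x."""
    return bin(x & ((1 << xlen) - 1)).count("1")


def cpopw(x):
    """Count the set bits in the least-significant word of x only."""
    return bin(x & 0xFFFF_FFFF).count("1")
```

For the register image `0xFFFF_FFFF_8000_0000` (the 32-bit value `0x8000_0000` sign-extended to 64 bits), `cpopw` returns 1 while `cpop` returns 33, which is why the word form is the appropriate match for a 32-bit argument on RV64.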
11.9.17 cpopw
Synopsis Count set bits in word
Mnemonic cpopw rd, rs
Encoding
Description This instruction counts the number of 1’s (i.e., set bits) in the least-significant word of the source register.
Operation
let bitcount = 0;
let val = X(rs);
foreach (i from 0 to 31 in inc)
if val[i] == 0b1 then bitcount = bitcount + 1 else ();
X[rd] = bitcount
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.18 ctz
Synopsis Count trailing zeros
Mnemonic ctz rd, rs
Encoding
Description This instruction counts the number of 0’s before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit (i.e., XLEN-1). Accordingly, if the input is 0, the output is XLEN, and if the least-significant bit of the input is a 1, the output is 0.
Operation
val LowestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int
function LowestSetBit x = {
foreach (i from 0 to (xlen - 1) by 1 in inc)
if [x[i]] == 0b1 then return(i) else ();
return xlen;
}
let rs = X(rs);
X[rd] = LowestSetBit(rs);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.19 ctzw
Synopsis Count trailing zero bits in word
Mnemonic ctzw rd, rs
Encoding
Description This instruction counts the number of 0’s before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit of the least-significant word (i.e., 31). Accordingly, if the least-significant word is 0, the output is 32, and if the least-significant bit of the input is a 1, the output is 0.
Operation
val LowestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int
function LowestSetBit32 x = {
foreach (i from 0 to 31 by 1 in inc)
if [x[i]] == 0b1 then return(i) else ();
return 32;
}
let rs = X(rs);
X[rd] = LowestSetBit32(rs);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.20 max
Synopsis Maximum
Mnemonic max rd, rs1, rs2
Encoding
Description This instruction returns the larger of two signed integers.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_s rs2_val
then rs2_val
else rs1_val;
X(rd) = result;
Calculating the absolute value of a signed integer can be performed using the following sequence: neg rD,rS followed by max rD,rS,rD. When using this common sequence, it is suggested that they are scheduled with no intervening instructions so that implementations that are so optimized can fuse them together.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
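The neg/max absolute-value sequence can be modeled as follows. This is a Python sketch that assumes two's-complement register images stored as unsigned integers below 2^XLEN; all function names are illustrative.

```python
def to_signed(x, xlen):
    """Interpret an XLEN-bit register image as a two's-complement integer."""
    return x - (1 << xlen) if (x >> (xlen - 1)) & 1 else x


def neg(x, xlen):
    """Two's-complement negation, wrapped to XLEN bits."""
    return (-x) & ((1 << xlen) - 1)


def max_signed(a, b, xlen):
    """Model of max: the larger of two signed register values."""
    return b if to_signed(a, xlen) < to_signed(b, xlen) else a


def abs_via_max(x, xlen=64):
    negated = neg(x, xlen)                 # neg rD, rS
    return max_signed(x, negated, xlen)    # max rD, rS, rD
```

As with any two's-complement absolute value, the most negative input maps to itself in this model, because its negation is not representable.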
11.9.21 maxu
Synopsis Unsigned maximum
Mnemonic maxu rd, rs1, rs2
Encoding
Description This instruction returns the larger of two unsigned integers.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_u rs2_val
then rs2_val
else rs1_val;
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.22 min
Synopsis Minimum
Mnemonic min rd, rs1, rs2
Encoding
Description This instruction returns the smaller of two signed integers.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_s rs2_val
then rs1_val
else rs2_val;
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.23 minu
Synopsis Unsigned minimum
Mnemonic minu rd, rs1, rs2
Encoding
Description This instruction returns the smaller of two unsigned integers.
Operation
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_u rs2_val
then rs1_val
else rs2_val;
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.24 orc.b
Synopsis Bitwise OR-Combine, byte granule
Mnemonic orc.b rd, rs
Encoding
Description Combines the bits within each byte using bitwise logical OR. This sets the bits of each byte in the result rd to all zeros if no bit within the respective byte of rs is set, or to all ones if any bit within the respective byte of rs is set.
Operation
let input = X(rs);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 8) by 8) {
output[(i + 7)..i] = if input[(i + 7)..i] == 0
then 0b00000000
else 0b11111111;
}
X[rd] = output;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
11.9.25 orn
Synopsis OR with inverted operand
Mnemonic orn rd, rs1, rs2
Encoding
Description This instruction performs the bitwise logical OR operation between rs1 and the bitwise inversion of rs2.
Operation
X(rd) = X(rs1) | ~X(rs2);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.26 pack
Synopsis Pack the low halves of rs1 and rs2 into rd.
Mnemonic pack rd, rs1, rs2
Encoding
Description The pack instruction packs the XLEN/2-bit lower halves of rs1 and rs2 into rd, with rs1 in the lower half and rs2 in the upper half.
Operation
let lo_half : bits(xlen/2) = X(rs1)[xlen/2-1..0];
let hi_half : bits(xlen/2) = X(rs2)[xlen/2-1..0];
X(rd) = EXTZ(hi_half @ lo_half);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkb (zbkb) | v1.0 | Ratified |
For RV32, the pack instruction with rs2=x0 is the zext.h instruction.
Hence, for RV32, any extension that contains the pack instruction also contains the zext.h instruction (but not necessarily the c.zext.h instruction, which is only guaranteed to exist if both the Zcb and Zbb extensions are implemented).
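The relationship between pack and zext.h on RV32 can be illustrated with a short Python model. The functions below are a sketch that assumes unsigned register values below 2^XLEN; they mirror the Operation sections but are not normative.

```python
def pack(rs1, rs2, xlen=32):
    """Low half of rs1 in the low half of the result, low half of rs2 in the high half."""
    half = xlen // 2
    half_mask = (1 << half) - 1
    return ((rs2 & half_mask) << half) | (rs1 & half_mask)


def zext_h(rs):
    """Zero-extend the least-significant halfword."""
    return rs & 0xFFFF
```

With XLEN equal to 32 and rs2 equal to zero, the upper half of the result is zero and the lower half is the low 16 bits of rs1, which is exactly the zext.h result.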
11.9.27 packh
Synopsis Pack the low bytes of rs1 and rs2 into rd.
Mnemonic packh rd, rs1, rs2
Encoding
Description The packh instruction packs the least-significant bytes of rs1 and rs2 into the 16 least-significant bits of rd, zero extending the rest of rd.
Operation
let lo_half : bits(8) = X(rs1)[7..0];
let hi_half : bits(8) = X(rs2)[7..0];
X(rd) = EXTZ(hi_half @ lo_half);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.28 packw
Synopsis Pack the low 16 bits of rs1 and rs2 into rd on RV64.
Mnemonic packw rd, rs1, rs2
Encoding
Description This instruction packs the low 16 bits of rs1 and rs2 into the 32 least-significant bits of rd, sign extending the 32-bit result to the rest of rd. This instruction only exists on RV64 based systems.
Operation
let lo_half : bits(16) = X(rs1)[15..0];
let hi_half : bits(16) = X(rs2)[15..0];
X(rd) = EXTS(hi_half @ lo_half);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkb (zbkb) | v1.0 | Ratified |
For RV64, the packw instruction with rs2=x0 is the zext.h instruction.
Hence, for RV64, any extension that contains the packw instruction also contains the zext.h instruction (but not necessarily the c.zext.h instruction, which is only guaranteed to exist if both the Zcb and Zbb extensions are implemented).
11.9.29 rev8
Synopsis Byte-reverse register
Mnemonic rev8 rd, rs
Encoding (RV32)
Encoding (RV64)
Description This instruction reverses the order of the bytes in rs.
Operation
let input = X(rs);
let output : xlenbits = 0;
let j = xlen - 1;
foreach (i from 0 to (xlen - 8) by 8) {
output[(i + 7)..i] = input[j..(j - 7)];
j = j - 8;
}
X(rd) = output;
The rev8 mnemonic corresponds to different instruction encodings in RV32 and RV64.
The byte-reverse operation is only available for the full register width. To emulate word-sized and halfword-sized byte-reversal, perform a rev8 rd,rs followed by a srai rd,rd,K, where K is XLEN-32 and XLEN-16, respectively.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | v1.0 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
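The word-sized and halfword-sized emulation can be sketched in Python as shown below. The model assumes unsigned register images below 2^XLEN; `rev8`, `srai`, and `byte_reverse_narrow` are illustrative names, and the result is sign-extended because the sequence uses srai.

```python
def rev8(x, xlen=64):
    """Reverse the byte order of an XLEN-bit value."""
    return int.from_bytes(x.to_bytes(xlen // 8, "little"), "big")


def srai(x, shamt, xlen):
    """Shift right arithmetic: vacated upper bits are copies of bit XLEN-1."""
    result = x >> shamt
    if (x >> (xlen - 1)) & 1:
        result |= ((1 << shamt) - 1) << (xlen - shamt)
    return result


def byte_reverse_narrow(x, size, xlen=64):
    """rev8 followed by srai by (XLEN - size), for size 32 or 16."""
    return srai(rev8(x, xlen), xlen - size, xlen)
```

After rev8, the bytes of interest sit at the top of the register, so the right shift by XLEN minus the element size moves them back to the bottom.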
11.9.30 brev8
Synopsis Reverse the bits in each byte of a source register.
Mnemonic brev8 rd, rs
Encoding
Description This instruction reverses the order of the bits in every byte of a register.
Operation
result : xlenbits = EXTZ(0b0);
foreach (i from 0 to (sizeof(xlen) - 8) by 8) {
result[i+7..i] = reverse_bits_in_byte(X(rs)[i+7..i]);
};
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.31 rol
Synopsis Rotate Left (Register)
Mnemonic rol rd, rs1, rs2
Encoding
Description This instruction performs a rotate left of rs1 by the amount in least-significant log2(XLEN) bits of rs2.
Operation
let shamt = if xlen == 32
then X(rs2)[4..0]
else X(rs2)[5..0];
let result = (X(rs1) << shamt) | (X(rs1) >> (xlen - shamt));
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.32 rolw
Synopsis Rotate Left Word (Register)
Mnemonic rolw rd, rs1, rs2
Encoding
Description This instruction performs a rotate left on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
Operation
let rs1 = EXTZ(X(rs1)[31..0]);
let shamt = X(rs2)[4..0];
let result = (rs1 << shamt) | (rs1 >> (32 - shamt));
X(rd) = EXTS(result[31..0]);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.33 ror
Synopsis Rotate Right (Register)
Mnemonic ror rd, rs1, rs2
Encoding
Description This instruction performs a rotate right of rs1 by the amount in least-significant log2(XLEN) bits of rs2.
Operation
let shamt = if xlen == 32
then X(rs2)[4..0]
else X(rs2)[5..0];
let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt));
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.34 rori
Synopsis Rotate Right (Immediate)
Mnemonic rori rd, rs1, shamt
Encoding (RV32)
Encoding (RV64)
Description This instruction performs a rotate right of rs1 by the amount in the least-significant log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
Operation
let shamt = if xlen == 32
then shamt[4..0]
else shamt[5..0];
let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt));
X(rd) = result;
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.35 roriw
Synopsis Rotate Right Word by Immediate
Mnemonic roriw rd, rs1, shamt
Encoding
Description This instruction performs a rotate right on the least-significant word of rs1 by the amount in the least-significant 5 bits of shamt. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
Operation
let rs1_data = EXTZ(X(rs1)[31..0]);
let result = (rs1_data >> shamt) | (rs1_data << (32 - shamt));
X(rd) = EXTS(result[31..0]);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
11.9.36 rorw
Synopsis Rotate Right Word (Register)
Mnemonic rorw rd, rs1, rs2
Encoding
Description This instruction performs a rotate right on the least-significant word of rs1 by the amount in the least-significant 5 bits of rs2. The resulting word is sign-extended by copying bit 31 to all of the more-significant bits.
Operation
let rs1 = EXTZ(X(rs1)[31..0]);
let shamt = X(rs2)[4..0];
let result = (rs1 >> shamt) | (rs1 << (32 - shamt));
X(rd) = EXTS(result[31..0]);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
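The word-sized rotate right can be sketched in Python as follows. The names `MASK32` and `rorw` and the unsigned-integer register convention are assumptions of this sketch; roriw would differ only in taking the rotate amount from an immediate instead of rs2.

```python
MASK32 = 0xFFFFFFFF


def rorw(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Sketch of rorw: rotate the low word of rs1 right by rs2[4..0]."""
    word = rs1 & MASK32   # upper 32 bits of rs1 are ignored
    shamt = rs2 & 0x1F    # only the low 5 bits of rs2 are used
    rotated = ((word >> shamt) | (word << (32 - shamt))) & MASK32
    if rotated & 0x80000000:
        # Sign-extend by copying bit 31 into bits xlen-1..32.
        rotated |= ((1 << xlen) - 1) & ~MASK32
    return rotated
```

Rotating the value 1 right by one position lands the set bit in bit 31, so `rorw(1, 1)` returns `0xFFFFFFFF80000000` after sign extension.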
11.9.37 sext.b
Synopsis Sign-extend byte
Mnemonic sext.b rd, rs
Encoding
Description This instruction sign-extends the least-significant byte in the source to XLEN by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits.
Operation
X(rd) = EXTS(X(rs)[7..0]);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
11.9.38 sext.h
Synopsis Sign-extend halfword
Mnemonic sext.h rd, rs
Encoding
Description This instruction sign-extends the least-significant halfword in rs to XLEN by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits.
Operation
X(rd) = EXTS(X(rs)[15..0]);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
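The sign-extension performed by sext.h, and by sext.b with a width of 8 instead of 16, can be sketched in Python. The helper `sign_extend` and the wrappers `sext_b` and `sext_h` are hypothetical names for this illustration; results are returned as unsigned XLEN-bit integers.

```python
def sign_extend(value: int, width: int, xlen: int = 64) -> int:
    """Sign-extend the low `width` bits of value to xlen bits."""
    low_mask = (1 << width) - 1
    low = value & low_mask
    if low >> (width - 1):
        # The top bit of the field is set: fill all higher bits with ones.
        low |= ((1 << xlen) - 1) ^ low_mask
    return low


def sext_b(rs: int, xlen: int = 64) -> int:
    """Sketch of sext.b: EXTS(X(rs)[7..0])."""
    return sign_extend(rs, 8, xlen)


def sext_h(rs: int, xlen: int = 64) -> int:
    """Sketch of sext.h: EXTS(X(rs)[15..0])."""
    return sign_extend(rs, 16, xlen)
```

Bits above the extracted field are discarded before extension, so `sext_b(0x17F)` returns `0x7F` while `sext_h(0x8000)` returns `0xFFFFFFFFFFFF8000`.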
11.9.39 sh1add
Synopsis Shift left by 1 and add
Mnemonic sh1add rd, rs1, rs2
Encoding
Description This instruction shifts rs1 to the left by 1 bit and adds it to rs2.
Operation
X(rd) = X(rs2) + (X(rs1) << 1);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
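A typical reading of sh1add is address generation for an array of 2-byte elements, with rs2 as the base and rs1 as the index. The sketch below assumes that interpretation; the function name `sh1add` and the `xlen` parameter are illustrative, and the addition wraps modulo 2^XLEN as register arithmetic does.

```python
def sh1add(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Sketch of sh1add: X(rs2) + (X(rs1) << 1), truncated to xlen bits."""
    mask = (1 << xlen) - 1
    return ((rs2 & mask) + ((rs1 & mask) << 1)) & mask
```

With a base of `0x1000` and an index of 3, `sh1add(3, 0x1000)` produces `0x1006`, the address of the fourth halfword.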
11.9.40 sh1add.uw
Synopsis Shift unsigned word left by 1 and add
Mnemonic sh1add.uw rd, rs1, rs2
Encoding
Description This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 1 place.
Operation
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + (index << 1);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
11.9.41 sh2add
Synopsis Shift left by 2 and add
Mnemonic sh2add rd, rs1, rs2
Encoding
Description This instruction shifts rs1 to the left by 2 places and adds it to rs2.
Operation
X(rd) = X(rs2) + (X(rs1) << 2);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
11.9.42 sh2add.uw
Synopsis Shift unsigned word left by 2 and add
Mnemonic sh2add.uw rd, rs1, rs2
Encoding
Description This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 2 places.
Operation
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + (index << 2);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
11.9.43 sh3add
Synopsis Shift left by 3 and add
Mnemonic sh3add rd, rs1, rs2
Encoding
Description This instruction shifts rs1 to the left by 3 places and adds it to rs2.
Operation
X(rd) = X(rs2) + (X(rs1) << 3);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
11.9.44 sh3add.uw
Synopsis Shift unsigned word left by 3 and add
Mnemonic sh3add.uw rd, rs1, rs2
Encoding
Description This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 3 places.
Operation
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + (index << 3);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
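The three .uw shift-and-add forms differ only in the shift distance, so they can be sketched with one parameterized Python function. The name `shadd_uw` and its `n` parameter are hypothetical; the `base` and `index` names follow the operation text above.

```python
def shadd_uw(n: int, rs1: int, rs2: int, xlen: int = 64) -> int:
    """Sketch of sh1add.uw / sh2add.uw / sh3add.uw for n in {1, 2, 3}."""
    if n not in (1, 2, 3):
        raise ValueError("only shift distances 1, 2 and 3 are defined")
    mask = (1 << xlen) - 1
    base = rs2 & mask
    index = rs1 & 0xFFFFFFFF  # EXTZ(X(rs1)[31..0]): upper bits are ignored
    return (base + (index << n)) & mask
```

Because the index is zero-extended before shifting, garbage in the upper half of rs1 has no effect: `shadd_uw(3, 0xFFFFFFFF00000002, 0x1000)` returns `0x1010`.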
11.9.45 slli.uw
Synopsis Shift-left unsigned word (Immediate)
Mnemonic slli.uw rd, rs1, shamt
Encoding
Description This instruction takes the least-significant word of rs1, zero-extends it, and shifts it left by the immediate.
Operation
X(rd) = (EXTZ(X(rs1)[31..0]) << shamt);
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zba (zba) | 0.93 | Ratified |
This instruction is the same as slli with zext.w performed on rs1 before shifting.
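The equivalence with a zero-extension followed by slli can be checked with a short Python sketch. The function name `slli_uw` is an assumption, and rejecting shift amounts outside 0..XLEN-1 is a modelling choice of this sketch.

```python
def slli_uw(rs1: int, shamt: int, xlen: int = 64) -> int:
    """Sketch of slli.uw: zero-extend the low word of rs1, then shift left."""
    if not 0 <= shamt < xlen:
        raise ValueError("shamt out of range for this sketch")
    mask = (1 << xlen) - 1
    word = rs1 & 0xFFFFFFFF  # zext.w step: drop bits 63..32
    return (word << shamt) & mask
```

For instance, `slli_uw(0xFFFFFFFF80000001, 4)` ignores the upper word of the source and returns `0x800000010`.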
11.9.46 unzip
Synopsis Place odd and even bits of the source register into upper and lower halves of the destination register, respectively.
Mnemonic unzip rd, rs
Encoding
Description This instruction scatters all of the odd and even bits of a source word into the high and low halves of a destination word. It is the inverse of the zip instruction. This instruction is available only on RV32.
Operation
foreach (i from 0 to xlen/2-1) {
X(rd)[i] = X(rs)[2*i];
X(rd)[i+xlen/2] = X(rs)[2*i+1]
}
This instruction is useful for implementing the SHA3 cryptographic hash function on a 32-bit architecture, as it implements the bit-interleaving operation used to speed up the 64-bit rotations directly.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkb (zbkb) (RV32) | v1.0 | Ratified |
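The bit movement performed by unzip can be sketched bit by bit in Python. The function name `unzip_bits` is hypothetical, and `xlen` defaults to 32 because the instruction is available only on RV32.

```python
def unzip_bits(rs: int, xlen: int = 32) -> int:
    """Sketch of unzip: even bits go to the low half, odd bits to the high half."""
    half = xlen // 2
    result = 0
    for i in range(half):
        result |= ((rs >> (2 * i)) & 1) << i                # rd[i] = rs[2*i]
        result |= ((rs >> (2 * i + 1)) & 1) << (i + half)   # rd[i+half] = rs[2*i+1]
    return result
```

A source with only even bits set, `0x55555555`, collapses into the low half as `0x0000FFFF`, and `0xAAAAAAAA` collapses into the high half as `0xFFFF0000`.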
11.9.47 xnor
Synopsis Exclusive NOR
Mnemonic xnor rd, rs1, rs2
Encoding
Description This instruction performs the bit-wise exclusive-NOR operation on rs1 and rs2.
Operation
X(rd) = ~(X(rs1) ^ X(rs2));
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
| Zbkb (zbkb) | v1.0 | Ratified |
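In Python, where integers are unbounded, the complement in xnor needs an explicit mask to stay within XLEN bits. The sketch below assumes the name `xnor` and an `xlen` parameter for illustration.

```python
def xnor(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Sketch of xnor: ~(X(rs1) ^ X(rs2)), truncated to xlen bits."""
    mask = (1 << xlen) - 1
    return ~(rs1 ^ rs2) & mask
```

Each result bit is 1 exactly where the two inputs agree, so `xnor(0xF0F0, 0xFF00, 32)` returns `0xFFFFF00F`.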
11.9.48 xperm8
Synopsis Byte-wise lookup of indices into a vector in registers.
Mnemonic xperm8 rd, rs1, rs2
Encoding
Description The xperm8 instruction operates on bytes. The rs1 register contains a vector of XLEN/8 8-bit elements. The rs2 register contains a vector of XLEN/8 8-bit indexes. The result is each element in rs2 replaced by the indexed element in rs1, or zero if the index is out of bounds for rs1.
Operation
val xperm8_lookup : (bits(8), xlenbits) -> bits(8)
function xperm8_lookup (idx, lut) = {
(lut >> (idx @ 0b000))[7..0]
}
function clause execute ( XPERM8 (rs2,rs1,rd)) = {
result : xlenbits = EXTZ(0b0);
foreach(i from 0 to (xlen - 8) by 8) {
result[i+7..i] = xperm8_lookup(X(rs2)[i+7..i], X(rs1));
};
X(rd) = result;
RETIRE_SUCCESS
}
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkx (zbkx) | v1.0 | Ratified |
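The byte-wise lookup can be sketched in Python as a loop over the index bytes in rs2. The function name `xperm8` and the `xlen` parameter are assumptions; the out-of-bounds case falls out of the right shift, as in the lookup helper above, because shifting the XLEN-bit table by XLEN or more bits leaves zero.

```python
def xperm8(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Sketch of xperm8: rs1 is the byte table, rs2 holds byte indexes."""
    table = rs1 & ((1 << xlen) - 1)
    result = 0
    for i in range(0, xlen, 8):
        idx = (rs2 >> i) & 0xFF
        # idx * 8 corresponds to (idx @ 0b000); large shifts yield 0.
        element = (table >> (idx * 8)) & 0xFF
        result |= element << i
    return result
```

With descending indexes the instruction reverses the bytes of the table: `xperm8(0x0807060504030201, 0x0001020304050607)` returns `0x0102030405060708`. An index of 8 or more on RV64 selects nothing and produces a zero byte.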
11.9.49 xperm4
Synopsis Nibble-wise lookup of indices into a vector.
Mnemonic xperm4 rd, rs1, rs2
Encoding
Description The xperm4 instruction operates on nibbles. The rs1 register contains a vector of XLEN/4 4-bit elements. The rs2 register contains a vector of XLEN/4 4-bit indexes. The result is each element in rs2 replaced by the indexed element in rs1, or zero if the index is out of bounds for rs1.
Operation
val xperm4_lookup : (bits(4), xlenbits) -> bits(4)
function xperm4_lookup (idx, lut) = {
(lut >> (idx @ 0b00))[3..0]
}
function clause execute ( XPERM4 (rs2,rs1,rd)) = {
result : xlenbits = EXTZ(0b0);
foreach(i from 0 to (xlen - 4) by 4) {
result[i+3..i] = xperm4_lookup(X(rs2)[i+3..i], X(rs1));
};
X(rd) = result;
RETIRE_SUCCESS
}
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkx (zbkx) | v1.0 | Ratified |
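The nibble-wise variant follows the same pattern with 4-bit elements. In this Python sketch the name `xperm4` and the `xlen` parameter are assumptions. Note that on RV64 every 4-bit index addresses one of the 16 nibbles, so the zero result for out-of-bounds indexes can only occur on RV32.

```python
def xperm4(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Sketch of xperm4: rs1 is the nibble table, rs2 holds nibble indexes."""
    table = rs1 & ((1 << xlen) - 1)
    result = 0
    for i in range(0, xlen, 4):
        idx = (rs2 >> i) & 0xF
        # idx * 4 corresponds to (idx @ 0b00); shifts past xlen yield 0.
        element = (table >> (idx * 4)) & 0xF
        result |= element << i
    return result
```

On RV32, `xperm4(0xFEDCBA98, 0x01234567, 32)` reverses the table nibbles and returns `0x89ABCDEF`, while indexes of 8 or above, as in `xperm4(0xFEDCBA98, 0x88888888, 32)`, give `0`.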
11.9.50 zext.h
Synopsis Zero-extend halfword
Mnemonic zext.h rd, rs
Encoding (RV32)
Encoding (RV64)
Description This instruction zero-extends the least-significant halfword of the source to XLEN by writing zeros to all of the bits more significant than bit 15.
Operation
X(rd) = EXTZ(X(rs)[15..0]);
The zext.h mnemonic corresponds to different instruction encodings in RV32 and RV64.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbb (zbb) | 0.93 | Ratified |
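As a sketch, zext.h amounts to a 16-bit mask. The Python below also models the two-shift sequence (shift left by XLEN-16, then logical shift right by XLEN-16) that is commonly used for the same effect when only base shift instructions are available. Both function names are hypothetical.

```python
def zext_h(rs: int, xlen: int = 64) -> int:
    """Sketch of zext.h: EXTZ(X(rs)[15..0])."""
    return rs & 0xFFFF


def zext_h_shift_pair(rs: int, xlen: int = 64) -> int:
    """Same result via a left shift followed by a logical right shift."""
    mask = (1 << xlen) - 1
    shifted = (rs << (xlen - 16)) & mask  # models slli by xlen-16
    return shifted >> (xlen - 16)         # models srli by xlen-16
```

Unlike sext.h, the result never has bits above bit 15 set: `zext_h(0xFFFFFFFFFFFF8000)` returns `0x8000`.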
11.9.51 zip
Synopsis Interleave upper and lower halves of the source register into odd and even bits of the destination register, respectively.
Mnemonic zip rd, rs
Encoding
Description This instruction gathers bits from the high and low halves of the source word into odd/even bit positions in the destination word. It is the inverse of the unzip instruction. This instruction is available only on RV32.
Operation
foreach (i from 0 to xlen/2-1) {
X(rd)[2*i] = X(rs)[i];
X(rd)[2*i+1] = X(rs)[i+xlen/2]
}
This instruction is useful for implementing the SHA3 cryptographic hash function on a 32-bit architecture, as it implements the bit-interleaving operation used to speed up the 64-bit rotations directly.
Included in
| Extension | Minimum version | Lifecycle state |
|---|---|---|
| Zbkb (zbkb) (RV32) | v1.0 | Ratified |
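The interleaving done by zip, and its inverse relationship with unzip, can be sketched in Python. The names `zip_bits` and `unzip_bits` are hypothetical (chosen to avoid shadowing Python's built-in `zip`), and `xlen` defaults to 32 because both instructions are available only on RV32.

```python
def zip_bits(rs: int, xlen: int = 32) -> int:
    """Sketch of zip: low half feeds even bits, high half feeds odd bits."""
    half = xlen // 2
    result = 0
    for i in range(half):
        result |= ((rs >> i) & 1) << (2 * i)                # rd[2*i] = rs[i]
        result |= ((rs >> (i + half)) & 1) << (2 * i + 1)   # rd[2*i+1] = rs[i+half]
    return result


def unzip_bits(rs: int, xlen: int = 32) -> int:
    """Sketch of unzip, included only to demonstrate the round trip."""
    half = xlen // 2
    result = 0
    for i in range(half):
        result |= ((rs >> (2 * i)) & 1) << i
        result |= ((rs >> (2 * i + 1)) & 1) << (i + half)
    return result
```

For example, `zip_bits(0x0000FFFF)` spreads the low half across the even positions and returns `0x55555555`, and applying `unzip_bits` to that result recovers `0x0000FFFF`.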