8 Scalar Floating-Point Extensions
This chapter is currently being restructured. Its contents are normative, but the presentation might appear disjoint.
This chapter describes the scalar floating-point extensions.
The F extension adds floating-point registers and instructions for
computation on single-precision floating-point values.
The D and Q extensions widen those registers to hold double- and
quad-precision floating-point values, respectively, and add instructions
for computation on those formats.
Several further extensions with the Zf and Zd prefixes provide
additional computational instructions.
The Zfinx and Zdinx extensions add computational instructions analogous to
those in the F and D extensions, but they instead operate on
floating-point numbers in the x registers.
These extensions, intended for lower-cost systems, are incompatible with the
F and D extensions.
8.1 "F" Extension for Single-Precision Floating-Point, Version 2.2
This chapter describes the F standard extension for single-precision floating-point, which adds computational instructions compliant with the IEEE 754-2008 arithmetic standard's binary32 format and operations. The F extension depends on the "Zicsr" extension for control and status register access.
8.1.1 F Register State
The F extension adds 32 floating-point registers, f0-f31, each 32
bits wide, and a floating-point control and status register fcsr,
which contains the operating mode and exception status of the
floating-point unit. This additional state is shown in the
register-state table below. We use the term FLEN to describe the width of
the floating-point registers in the RISC-V ISA, and FLEN=32 for the F
single-precision floating-point extension. Most floating-point
instructions operate on values in the floating-point register file.
Floating-point load and store instructions transfer floating-point
values between registers and memory. Instructions to transfer values to and from the integer register file are also provided.
We considered a unified register file for both integer and floating-point values as this simplifies software register allocation and calling conventions, and reduces total user state. However, a split organization increases the total number of registers accessible with a given instruction width, simplifies provision of enough register file ports for wide superscalar issue, supports decoupled floating-point-unit architectures, and simplifies use of internal floating-point encoding techniques. Compiler support and calling conventions for split register file architectures are well understood, and using dirty bits on floating-point register file state can reduce context-switch overhead.
| Register | Bit range | Width (bits) |
|---|---|---|
| f0 to f31 (32 registers) | FLEN-1 to 0 | FLEN |
| fcsr | 31 to 0 | 32 |
8.1.2 Floating-Point Control and Status Register
The floating-point control and status register, fcsr, is a RISC-V
control and status register (CSR). It is a 32-bit read/write register
that selects the dynamic rounding mode for floating-point arithmetic
operations and holds the accrued exception flags, in the fields described below.
The fcsr register can be read and written with the FRCSR and FSCSR
instructions, which are assembler pseudoinstructions built on the
underlying CSR access instructions. FRCSR reads fcsr by copying it
into integer register rd. FSCSR swaps the value in fcsr by copying
the original value into integer register rd, and then writing a new
value obtained from integer register rs1 into fcsr.
The fields within the fcsr can also be accessed individually through
different CSR addresses, and separate assembler pseudoinstructions are defined
for these accesses. The FRRM instruction reads the Rounding Mode field frm
(fcsr bits 7-5) and copies it into the least-significant three bits of
integer register rd, with zero in all other bits. FSRM swaps the value in
frm by copying the original value into integer register rd, and then
writing a new value obtained from the three least-significant bits of integer
register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the
Accrued Exception Flags field fflags (fcsr bits 4-0).
Bits 31-8 of the fcsr are reserved for other standard extensions. If
these extensions are not present, implementations shall ignore writes to
these bits and supply a zero value when read. Standard software should
preserve the contents of these bits.
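The field layout described above can be modelled with a few shifts and masks. The following is a minimal sketch, not part of the specification: the helper names `split_fcsr` and `swap_frm` are hypothetical, and the model treats fcsr as a plain 32-bit integer rather than a real CSR.

```python
# Hypothetical helpers; field positions follow the text above:
# frm = fcsr[7:5], fflags = fcsr[4:0], bits 31-8 reserved.
FRM_SHIFT = 5
FRM_MASK = 0b111
FFLAGS_MASK = 0b11111


def split_fcsr(fcsr):
    """Return (frm, fflags) extracted from a 32-bit fcsr value."""
    frm = (fcsr >> FRM_SHIFT) & FRM_MASK
    fflags = fcsr & FFLAGS_MASK
    return frm, fflags


def swap_frm(fcsr, new_frm):
    """Model the FSRM swap: return (old frm, updated fcsr).

    Only the three least-significant bits of new_frm are used.
    """
    old_frm = (fcsr >> FRM_SHIFT) & FRM_MASK
    cleared = fcsr & ~(FRM_MASK << FRM_SHIFT) & 0xFFFFFFFF
    return old_frm, cleared | ((new_frm & FRM_MASK) << FRM_SHIFT)
```

For example, an fcsr value of 0x71 splits into frm = 3 and fflags = 0b10001, and swapping in a new mode leaves the flag bits untouched.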
Floating-point operations use either a static rounding mode encoded in
the instruction, or a dynamic rounding mode held in frm. Rounding
modes are encoded as shown in the rounding-mode encoding table below. A value of 111 in the
instruction’s rm field selects the dynamic rounding mode held in
frm. The behavior of floating-point instructions that depend on
rounding mode when executed with a reserved rounding mode is reserved, including both static reserved rounding modes (101-110) and dynamic reserved rounding modes (101-111). Some instructions, including widening conversions, have the rm field but are nevertheless mathematically unaffected by the rounding mode; software should set their rm field to
RNE (000) but implementations must treat the rm field as usual (in
particular, with regard to decoding legal vs. reserved encodings).
| Rounding Mode | Mnemonic | Meaning |
|---|---|---|
| 000 | RNE | Round to Nearest, ties to Even |
| 001 | RTZ | Round towards Zero |
| 010 | RDN | Round Down (towards −∞) |
| 011 | RUP | Round Up (towards +∞) |
| 100 | RMM | Round to Nearest, ties to Max Magnitude |
| 101 | | Reserved for future use. |
| 110 | | Reserved for future use. |
| 111 | DYN | In an instruction's rm field, selects the dynamic rounding mode; in the Rounding Mode register, reserved. |
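As an illustration of the encoding table, the sketch below resolves an instruction's rm field against frm and then applies the selected mode. It is a simplified model under stated assumptions: the names `resolve_rm` and `round_to_int` are hypothetical, rounding is to the nearest integer rather than to the nearest representable floating-point value, and raising an error for a reserved encoding is only one behavior a model might choose, since the text leaves it reserved.

```python
import math
from fractions import Fraction

# Encodings from the table above; 101 and 110 are reserved, 111 is DYN.
ENCODING = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM"}
DYN = 0b111


def resolve_rm(rm, frm):
    """Return the mnemonic selected by an instruction's rm field and the frm CSR field."""
    selected = frm if rm == DYN else rm
    if selected not in ENCODING:
        # Reserved: static 101-110, or dynamic 101-111 (modelling choice: raise).
        raise ValueError("reserved rounding mode")
    return ENCODING[selected]


def round_to_int(x, mode):
    """Round an exact Fraction to an integer under the named rounding mode."""
    if mode not in ENCODING.values():
        raise ValueError("unknown rounding mode")
    lo, hi = math.floor(x), math.ceil(x)
    if mode == "RTZ":
        return math.trunc(x)
    if mode == "RDN":
        return lo
    if mode == "RUP":
        return hi
    frac = x - lo
    if frac != Fraction(1, 2):
        # Not a tie: both round-to-nearest modes agree.
        return lo if frac < Fraction(1, 2) else hi
    if mode == "RNE":
        return lo if lo % 2 == 0 else hi  # tie goes to the even neighbour
    return hi if x > 0 else lo            # RMM: tie goes away from zero
```

The two round-to-nearest modes differ only on exact ties: 5/2 rounds to 2 under RNE and to 3 under RMM.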
The C99 language standard effectively mandates the provision of a dynamic rounding mode register. In typical implementations, writes to the dynamic rounding mode CSR state will serialize the pipeline. Static rounding modes are used to implement specialized arithmetic operations that often have to switch frequently between different rounding modes.
The ratified version of the F spec mandated that an illegal-instruction exception was raised when an instruction was executed with a reserved dynamic rounding mode. This has been weakened to reserved, which matches the behavior of static rounding-mode instructions. Raising an illegal-instruction exception is still valid behavior when encountering a reserved encoding, so implementations compatible with the ratified spec are compatible with the weakened spec.
The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software, as listed in the table below. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
| Flag Mnemonic | Flag Meaning |
|---|---|
| NV | Invalid Operation |
| DZ | Divide by Zero |
| OF | Overflow |
| UF | Underflow |
| NX | Inexact |
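Because traps are not generated, software inspects fflags explicitly. The sketch below decodes an fflags value into mnemonics. The table above does not give bit numbers; the code assumes the usual fcsr layout, with NV in bit 4 down to NX in bit 0, and the name `decode_fflags` is hypothetical.

```python
# Assumed bit positions (not stated in the table above):
# NV = bit 4, DZ = bit 3, OF = bit 2, UF = bit 1, NX = bit 0.
FLAG_BITS = {"NV": 4, "DZ": 3, "OF": 2, "UF": 1, "NX": 0}


def decode_fflags(fflags):
    """Return the mnemonics of the accrued exception flags set in fflags."""
    return [name for name, bit in FLAG_BITS.items() if fflags & (1 << bit)]
```

Under that assumption, an overflow that also rounds would typically read back as `decode_fflags(0b00101)`, which lists OF and NX.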
As allowed by IEEE 754-2008, we do not support traps on floating-point exceptions in the F extension, but instead require explicit checks of the flags in software. We considered adding branches controlled directly by the contents of the floating-point accrued exception flags, but ultimately chose to omit these instructions to keep the ISA simple.
8.1.3 NaN Generation and Propagation
Except when otherwise stated, if the result of a floating-point
operation is NaN, it is the canonical NaN. The canonical NaN has a
positive sign and all significand bits clear except the MSB, a.k.a. the
quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000.
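The canonical NaN pattern can be checked field by field. The sketch below is illustrative only; the helper names are hypothetical, and `struct` is used merely to reinterpret the 32-bit pattern as a binary32 value.

```python
import math
import struct

CANONICAL_NAN_S = 0x7FC00000  # pattern given in the text above


def bits_to_float32(bits):
    """Reinterpret a 32-bit pattern as a binary32 value (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fields_s(bits):
    """Split a binary32 pattern into (sign, exponent, fraction)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction
```

Here `fields_s(CANONICAL_NAN_S)` gives a zero sign, an all-ones exponent, and a fraction in which only the most-significant bit (the quiet bit) is set.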
We considered propagating NaN payloads, as is recommended by IEEE 754-2008, but this decision would have increased hardware cost. Moreover, since this feature is optional in IEEE 754-2008, it cannot be used in portable code.
Implementers are free to provide a NaN payload propagation scheme as a nonstandard extension enabled by a nonstandard operating mode. However, the canonical NaN scheme described above must always be supported and should be the default mode.
We require implementations to return the standard-mandated default values in the case of exceptional conditions, without any further intervention on the part of user-level software (unlike the Alpha ISA floating-point trap barriers). We believe full hardware handling of exceptional cases will become more common, and so wish to avoid complicating the user-level ISA to optimize other approaches. Implementations can always trap to machine-mode software handlers to provide exceptional default values.
8.1.4 Subnormal Arithmetic
Operations on subnormal numbers are handled in accordance with IEEE 754-2008.
In the parlance of IEEE 754-2008, tininess is detected after rounding.
Detecting tininess after rounding results in fewer spurious underflow signals.
8.1.5 Single-Precision Load and Store Instructions
Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory.
FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned.
FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
As described in the load and store section of the base integer ISA, the execution environment defines whether misaligned floating-point loads and stores are handled invisibly or raise a contained or fatal trap.
8.1.6 Single-Precision Floating-Point Computational Instructions
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.
The 2-bit floating-point format field fmt is encoded as shown in the table below. It is set to S (00) for all instructions in the F extension.
| fmt field | Mnemonic | Meaning |
|---|---|---|
| 00 | S | 32-bit single-precision |
| 01 | D | 64-bit double-precision |
| 10 | H | 16-bit half-precision |
| 11 | Q | 128-bit quad-precision |
All floating-point operations that perform rounding can select the rounding mode using the rm field with the encoding shown in the rounding-mode encoding table in Section 8.1.2.
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value −0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.
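The FMIN.S rules above can be expressed over raw binary32 bit patterns. This is a minimal sketch with hypothetical helper names; it returns the result pattern together with a Boolean standing in for the invalid operation flag. FMAX.S would follow the same structure with the comparisons reversed.

```python
import struct

CANONICAL_NAN = 0x7FC00000


def is_nan(bits):
    """True if the binary32 pattern is any NaN (exponent all ones, fraction nonzero)."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def is_snan(bits):
    """True if the pattern is a signaling NaN (a NaN with the quiet bit clear)."""
    return is_nan(bits) and not (bits & 0x00400000)


def to_float(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fmin_s(a, b):
    """Model FMIN.S on bit patterns; returns (result bits, NV flag)."""
    nv = is_snan(a) or is_snan(b)   # signaling NaN inputs always raise NV
    if is_nan(a) and is_nan(b):
        return CANONICAL_NAN, nv
    if is_nan(a):
        return b, nv
    if is_nan(b):
        return a, nv
    fa, fb = to_float(a), to_float(b)
    if fa == fb:
        # Only +0.0 and -0.0 compare equal with different bits;
        # -0.0 is treated as the smaller value.
        return (a if a >> 31 else b), nv
    return (a if fa < fb else b), nv
```

Note that a signaling NaN paired with 1.0 yields 1.0 with the flag set, which is the case the last sentence of the paragraph calls out.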
Floating-point fused multiply-add instructions require a new standard instruction format. R4-type instructions specify three source registers (rs1, rs2, and rs3) and a destination register (rd). This format is only used by the floating-point fused multiply-add instructions.
FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3.
FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)−rs3.
FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes −(rs1×rs2)+rs3.
FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes −(rs1×rs2)−rs3.
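The four formulas differ only in the signs applied to the product and the addend, and each rounds once. The sketch below models this with exact rational arithmetic followed by a single rounding. It rests on several assumptions: it works on Python floats (binary64 rather than binary32), always rounds to nearest-even, and does not handle NaNs, infinities, or the sign of zero results; the function names are hypothetical.

```python
from fractions import Fraction


def _fused(product_sign, a, b, addend_sign, c):
    """Compute (+/-)(a*b) (+/-) c exactly, then round once to a Python float."""
    exact = product_sign * Fraction(a) * Fraction(b) + addend_sign * Fraction(c)
    return float(exact)  # single rounding, round-to-nearest-even


def fmadd(a, b, c):
    return _fused(+1, a, b, +1, c)   # (a*b) + c


def fmsub(a, b, c):
    return _fused(+1, a, b, -1, c)   # (a*b) - c


def fnmsub(a, b, c):
    return _fused(-1, a, b, +1, c)   # -(a*b) + c


def fnmadd(a, b, c):
    return _fused(-1, a, b, -1, c)   # -(a*b) - c
```

The absence of intermediate rounding is observable. With a = 1 + 2^-30 and c = 1 + 2^-29, the unfused expression `a * a - c` evaluates to 0.0 because the product is rounded first, whereas `fmsub(a, a, c)` returns 2^-60.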
The FNMSUB and FNMADD instructions are counterintuitively named, owing to the naming of the corresponding instructions in MIPS-IV. The MIPS instructions were defined to negate the sum, rather than negating the product as the RISC-V instructions do, so the naming scheme was more rational at the time. The two definitions differ with respect to signed-zero results. The RISC-V definition matches the behavior of the x86 and ARM fused multiply-add instructions, but unfortunately the RISC-V FNMSUB and FNMADD instruction names are swapped as compared to x86, whereas the RISC-V FMSUB and FNMSUB instruction names are swapped as compared to ARM.
The fused multiply-add (FMA) instructions consume a large part of the 32-bit instruction encoding space. Some alternatives considered were to restrict FMA to only use dynamic rounding modes, but static rounding modes are useful in code that exploits the lack of product rounding. Another alternative would have been to use rd to provide rs3, but this would require additional move instructions in some common sequences. The current design still leaves a large portion of the 32-bit encoding space open while avoiding having FMA be non-orthogonal.
The fused multiply-add instructions must set the invalid operation exception flag when the multiplicands are ∞ and zero, even when the addend is a quiet NaN.
IEEE 754-2008 permits, but does not require, raising the invalid exception for the operation ∞×0 + qNaN.
8.1.7 Single-Precision Floating-Point Conversion and Move Instructions
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN>32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. The table below gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
All floating-point to integer and integer to floating-point conversion
instructions round according to the rm field. A floating-point
register can be initialized to floating-point positive zero using
FCVT.S.W rd, x0, which will never set any exception flags.
| | FCVT.W.S | FCVT.WU.S | FCVT.L.S | FCVT.LU.S |
|---|---|---|---|---|
| Minimum valid input (after rounding) | −2^31 | 0 | −2^63 | 0 |
| Maximum valid input (after rounding) | 2^31−1 | 2^32−1 | 2^63−1 | 2^64−1 |
| Output for out-of-range negative input | −2^31 | 0 | −2^63 | 0 |
| Output for −∞ | −2^31 | 0 | −2^63 | 0 |
| Output for out-of-range positive input | 2^31−1 | 2^32−1 | 2^63−1 | 2^64−1 |
| Output for +∞ or NaN | 2^31−1 | 2^32−1 | 2^63−1 | 2^64−1 |
All floating-point conversion instructions set the Inexact exception flag if the rounded result differs from the operand value and the Invalid exception flag is not set.
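The clipping and flag rules for FCVT.W.S can be sketched as follows. This is an illustrative model, not a reference implementation: the function name is hypothetical, the input is a Python float standing in for the register value, and only the RTZ rounding mode is modelled.

```python
import math

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fcvt_w_s_rtz(x):
    """Model FCVT.W.S with rm = RTZ; returns (result, set of raised flags)."""
    if math.isnan(x) or x == math.inf:
        return INT32_MAX, {"NV"}      # +infinity and NaN clip to the maximum
    if x == -math.inf:
        return INT32_MIN, {"NV"}
    rounded = math.trunc(x)           # round towards zero
    if rounded > INT32_MAX:
        return INT32_MAX, {"NV"}
    if rounded < INT32_MIN:
        return INT32_MIN, {"NV"}
    # Inexact is raised only when Invalid is not.
    flags = {"NX"} if rounded != x else set()
    return rounded, flags
```

Validity is judged after rounding, so an input of −2^31 − 0.5 is in range under RTZ and produces −2^31 with only the Inexact flag.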
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result’s sign bit is rs2's sign bit; for FSGNJN, the result’s sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).
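Since the sign-injection instructions are pure bit operations, they can be modelled directly on binary32 patterns. A minimal sketch, with hypothetical function names:

```python
SIGN = 0x80000000       # sign bit of a binary32 pattern
MAGNITUDE = 0x7FFFFFFF  # all remaining bits


def fsgnj_s(rs1, rs2):
    """Result takes rs2's sign and all other bits from rs1."""
    return (rs1 & MAGNITUDE) | (rs2 & SIGN)


def fsgnjn_s(rs1, rs2):
    """Result takes the opposite of rs2's sign."""
    return (rs1 & MAGNITUDE) | (~rs2 & SIGN)


def fsgnjx_s(rs1, rs2):
    """Result sign is the XOR of the two sign bits."""
    return rs1 ^ (rs2 & SIGN)
```

Passing the same register twice reproduces the three pseudoinstructions: for the pattern of −1.0 (0xBF800000), `fsgnj_s` returns it unchanged, while `fsgnjn_s` and `fsgnjx_s` both return 0x3F800000. NaN payloads pass through untouched.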
The sign-injection instructions provide floating-point MV, ABS, and NEG,
as well as supporting a few other operations, including the IEEE
754-2008 copySign operation and sign manipulation in transcendental
math function libraries. Although MV, ABS, and NEG only need a single
register operand, whereas FSGNJ instructions need two, it is unlikely
most microarchitectures would add optimizations to benefit from the
reduced number of register reads for these relatively infrequent
instructions. Even in this case, a microarchitecture can simply detect
when both source registers are the same for FSGNJ instructions and only
read a single copy.
Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.W moves the single-precision value in floating-point register rs1 represented in the IEEE 754-2008 encoding to the lower 32 bits of integer register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved. For RV64, the higher 32 bits of the destination register are filled with copies of the floating-point number’s sign bit.
FMV.W.X moves the single-precision value encoded in the IEEE 754-2008 encoding from the lower 32 bits of integer register rs1 to the floating-point register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved.
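The RV64 behavior of these moves amounts to a sign-extending copy in one direction and a truncating copy in the other. The sketch below models both on integers; the function names are hypothetical, and FLEN=32 is assumed so that no NaN-boxing is involved.

```python
def fmv_x_w_rv64(f_bits):
    """Model FMV.X.W on RV64: copy 32 bits and fill the upper 32 with the sign bit."""
    low = f_bits & 0xFFFFFFFF
    upper = 0xFFFFFFFF00000000 if low & 0x80000000 else 0
    return upper | low


def fmv_w_x(x_bits):
    """Model FMV.W.X: copy the lower 32 bits of the integer register unchanged."""
    return x_bits & 0xFFFFFFFF
```

A round trip through the integer register preserves every bit, including the payload of a non-canonical NaN such as 0xFFC12345.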
The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.
The base floating-point ISA was defined so as to allow implementations to employ an internal recoding of the floating-point format in registers to simplify handling of subnormal values and possibly to reduce functional unit latency. To this end, the F extension avoids representing integer values in the floating-point registers by defining conversion and comparison operations that read and write the integer register file directly. This also removes many of the common cases where explicit moves between integer and floating-point registers are required, reducing instruction count and critical paths for common mixed-format code sequences.
8.1.8 Single-Precision Floating-Point Compare Instructions
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise.
FLT.S and FLE.S perform what IEEE 754-2008 refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.
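The difference between the quiet and signaling comparisons shows up only in the flag behavior. The sketch below models the three instructions on binary32 bit patterns, returning the value written to rd and a Boolean standing in for the invalid operation flag; all names are hypothetical.

```python
import struct


def _is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def _is_snan(bits):
    return _is_nan(bits) and not (bits & 0x00400000)


def _value(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def feq_s(a, b):
    """Quiet comparison: NV is raised only for signaling NaN inputs."""
    if _is_nan(a) or _is_nan(b):
        return 0, _is_snan(a) or _is_snan(b)
    return int(_value(a) == _value(b)), False


def flt_s(a, b):
    """Signaling comparison: NV is raised for any NaN input."""
    if _is_nan(a) or _is_nan(b):
        return 0, True
    return int(_value(a) < _value(b)), False


def fle_s(a, b):
    """Signaling comparison: NV is raised for any NaN input."""
    if _is_nan(a) or _is_nan(b):
        return 0, True
    return int(_value(a) <= _value(b)), False
```

With a quiet NaN operand, `feq_s` returns 0 without raising the flag, whereas `flt_s` and `fle_s` return 0 and raise it.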
The F extension provides a ≤ comparison, whereas the base ISAs provide a ≥ branch comparison. Because ≤ can be synthesized from ≥ and vice-versa, there is no performance implication to this inconsistency, but it is nevertheless an unfortunate incongruity in the ISA.
8.1.9 Single-Precision Floating-Point Classify Instruction
The FCLASS.S instruction examines the value in floating-point register rs1 and writes to integer register rd a 10-bit mask that indicates the class of the floating-point number. The format of the mask is described in the table below. The corresponding bit in rd will be set if the property is true and clear otherwise. All other bits in rd are cleared. Note that exactly one bit in rd will be set. FCLASS.S does not set the floating-point exception flags.
| rd bit | Meaning |
|---|---|
| 0 | rs1 is −∞. |
| 1 | rs1 is a negative normal number. |
| 2 | rs1 is a negative subnormal number. |
| 3 | rs1 is −0. |
| 4 | rs1 is +0. |
| 5 | rs1 is a positive subnormal number. |
| 6 | rs1 is a positive normal number. |
| 7 | rs1 is +∞. |
| 8 | rs1 is a signaling NaN. |
| 9 | rs1 is a quiet NaN. |
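The classification in the table can be derived from the sign, exponent, and fraction fields of the binary32 pattern. A minimal sketch, assuming a 32-bit input pattern and using the hypothetical name `fclass_s`:

```python
def fclass_s(bits):
    """Return the 10-bit FCLASS.S mask for a binary32 bit pattern."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            bit = 0 if sign else 7                   # infinities
        else:
            bit = 9 if fraction & 0x400000 else 8    # quiet vs signaling NaN
    elif exponent == 0:
        if fraction == 0:
            bit = 3 if sign else 4                   # zeros
        else:
            bit = 2 if sign else 5                   # subnormals
    else:
        bit = 1 if sign else 6                       # normal numbers
    return 1 << bit
```

Every input falls into exactly one branch, which matches the statement that exactly one bit of rd is set.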
8.2 "D" Extension for Double-Precision Floating-Point, Version 2.2
This chapter describes the D standard extension for double-precision floating-point, which adds computational instructions compliant with the IEEE 754-2008 arithmetic standard’s binary64 format and operations. The D extension depends on the F extension.
8.2.1 D Register State
The D extension widens the 32 floating-point registers,
f0-f31, to 64 bits (FLEN=64 in the register-state table of Section 8.1.1).
The f registers can now hold either 32-bit or 64-bit floating-point values
as described in Section 8.2.2 below.
FLEN can be 32, 64, or 128 depending on which of the F, D, and Q extensions are supported. There can be up to four different floating-point precisions supported, including H, F, D, and Q.
8.2.2 NaN Boxing of Narrower Values
When multiple floating-point precisions are supported, valid values of narrower n-bit types, n<FLEN, are represented in the lower n bits of an FLEN-bit NaN value, in a process termed NaN-boxing. The upper bits of a valid NaN-boxed value must be all 1s. Valid NaN-boxed n-bit values therefore appear as negative quiet NaNs (qNaNs) when viewed as any wider m-bit value, n < m ≤ FLEN. Any operation that writes a narrower result to an f register must write all 1s to the uppermost FLEN-n bits to yield a legal NaN-boxed value.
Software might not know the current type of data stored in a floating-point register but has to be able to save and restore the register values, hence the result of using wider operations to transfer narrower values has to be defined. A common case is for callee-saved registers, but a standard convention is also desirable for features including variadic functions, user-level threading libraries, virtual machine migration, and debugging.
Floating-point n-bit transfer operations move external values held in the IEEE 754-2008 formats into and out of the f registers, and comprise floating-point loads and stores (FLn/FSn) and floating-point move instructions (FMV.n.X/FMV.X.n). A narrower n-bit transfer, n<FLEN, into the f registers will create a valid NaN-boxed value. A narrower n-bit transfer out of the floating-point registers will transfer the lower n bits of the register, ignoring the upper FLEN-n bits.
Apart from the transfer operations described in the previous paragraph, all other floating-point operations on narrower n-bit values, n<FLEN, check whether the input operands are correctly NaN-boxed, i.e., all upper FLEN-n bits are 1. If so, the n least-significant bits of the input are used as the input value; otherwise, the input value is treated as an n-bit canonical NaN.
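The boxing and checking rules can be sketched on integers that stand in for register contents. The helper names `nan_box` and `unbox` are hypothetical, and the caller supplies the n-bit canonical NaN pattern for the narrower format.

```python
CANONICAL_NAN_S = 0x7FC00000


def nan_box(value, n, flen):
    """Place an n-bit value in the low bits of an FLEN-bit register, upper bits all 1s."""
    upper_ones = ((1 << (flen - n)) - 1) << n
    return upper_ones | (value & ((1 << n) - 1))


def unbox(reg, n, flen, canonical_nan):
    """Return the n-bit operand a non-transfer operation would see."""
    upper = reg >> n
    if upper == (1 << (flen - n)) - 1:
        return reg & ((1 << n) - 1)   # correctly boxed: use the low n bits
    return canonical_nan              # otherwise treated as the canonical NaN
```

With FLEN=64, boxing the pattern for 1.0 (0x3F800000) gives 0xFFFFFFFF3F800000, which reads as a negative quiet NaN when viewed as a double; a register holding 0x000000003F800000 is not validly boxed and is seen as the canonical NaN by single-precision operations.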
Earlier versions of this extension did not define the behavior of feeding the results of narrower or wider operands into an operation, except to require that wider saves and restores would preserve the value of a narrower operand. The new definition removes this implementation-specific behavior, while still accommodating both non-recoded and recoded implementations of the floating-point unit. The new definition also helps catch software errors by propagating NaNs if values are used incorrectly.
Non-recoded implementations unpack and pack the operands to the IEEE 754-2008 format on the input and output of every floating-point operation. The NaN-boxing cost to a non-recoded implementation is primarily in checking if the upper bits of a narrower operation represent a legal NaN-boxed value, and in writing all 1s to the upper bits of a result.
Recoded implementations use a more convenient internal format to represent floating-point values, with an added exponent bit to allow all values to be held normalized. The cost to the recoded implementation is primarily the extra tagging needed to track the internal types and sign bits, but this can be done without adding new state bits by recoding NaNs internally in the exponent field. Small modifications are needed to the pipelines used to transfer values in and out of the recoded format, but the datapath and latency costs are minimal. The recoding process has to handle shifting of input subnormal values for wide operands in any case, and extracting the NaN-boxed value is a similar process to normalization except for skipping over leading-1 bits instead of skipping over leading-0 bits, allowing the datapath multiplexing to be shared.
8.2.3 Double-Precision Load and Store Instructions
The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory.
The double-precision value may be a NaN-boxed single-precision value.
FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN≥64.
FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
8.2.4 Double-Precision Floating-Point Computational Instructions
The double-precision floating-point computational instructions are defined analogously to their single-precision counterparts, but operate on double-precision operands and produce double-precision results.
8.2.5 Double-Precision Floating-Point Conversion and Move Instructions
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. Note FCVT.D.W[U] always produces an exact result and is unaffected by rounding mode.
The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the rm field; FCVT.D.S will never round.
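The asymmetry between the two conversions follows from binary64 being a superset of binary32. The sketch below illustrates it with Python's `struct` module, under the assumption that packing with the `f` format rounds to nearest-even (which corresponds to rm = RNE only); the function names are hypothetical, and values too large for binary32 raise `OverflowError` here rather than producing infinity.

```python
import struct


def fcvt_s_d(x):
    """Round a double to binary32 and return the resulting bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fcvt_d_s(bits):
    """Widen a binary32 bit pattern to a double; this direction is always exact."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Narrowing 0.1 gives 0x3DCCCCCD, and widening that pattern back does not recover the original double, while widening and then narrowing any binary32 pattern returns it unchanged.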
Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction.
For XLEN≥64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in the IEEE 754-2008 encoding in integer register rd. FMV.D.X moves the double-precision value encoded in the IEEE 754-2008 encoding from the integer register rs1 to the floating-point register rd.
FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
Early versions of the RISC-V ISA had additional instructions to allow RV32 systems to transfer between the upper and lower portions of a 64-bit floating-point register and an integer register. However, these would be the only instructions with partial register writes and would add complexity in implementations with recoded floating-point or register renaming, requiring a pipeline read-modify-write sequence. Scaling up to handling quad-precision for RV32 and RV64 would also require additional instructions if they were to follow this pattern. The ISA was defined to reduce the number of explicit int-float register moves, by having conversions and comparisons write results to the appropriate register file, so we expect the benefit of these instructions to be lower than for other ISAs.
We note that for systems that implement a 64-bit floating-point unit including fused multiply-add support and 64-bit floating-point loads and stores, the marginal hardware cost of moving from a 32-bit to a 64-bit integer datapath is low, and a software ABI supporting 32-bit wide address-space and pointers can be used to avoid growth of static data and dynamic memory traffic.
8.2.6 Double-Precision Floating-Point Compare Instructions
The double-precision floating-point compare instructions are defined analogously to their single-precision counterparts, but operate on double-precision operands.
8.2.7 Double-Precision Floating-Point Classify Instruction
The double-precision floating-point classify instruction, FCLASS.D, is defined analogously to its single-precision counterpart, but operates on double-precision operands.
8.3 "Q" Extension for Quad-Precision Floating-Point, Version 2.2
This chapter describes the Q standard extension for quad-precision floating-point, which adds computational instructions compliant with the IEEE 754-2008 arithmetic standard’s binary128 format and operations. The Q extension depends on the D extension.
The floating-point registers are now extended to hold either a single, double, or quad-precision floating-point value (FLEN=128). The NaN-boxing scheme described in Section 8.2.2 is now extended recursively to allow a single-precision value to be NaN-boxed inside a double-precision value which is itself NaN-boxed inside a quad-precision value.
8.3.1 Quad-Precision Load and Store Instructions
New 128-bit variants of LOAD-FP and STORE-FP instructions are added, encoded with a new value for the funct3 width field.
FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128.
FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
8.3.2 Quad-Precision Computational Instructions
A new supported format is added to the format field of most instructions, as shown in the table below.
| fmt field | Mnemonic | Meaning |
|---|---|---|
| 00 | S | 32-bit single-precision |
| 01 | D | 64-bit double-precision |
| 10 | H | 16-bit half-precision |
| 11 | Q | 128-bit quad-precision |
The quad-precision floating-point computational instructions are defined analogously to their double-precision counterparts, but operate on quad-precision operands and produce quad-precision results.
8.3.3 Quad-Precision Convert and Move Instructions
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. Note FCVT.Q.L[U] always produces an exact result and is unaffected by rounding mode.
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.
Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q, are defined analogously to the double-precision sign-injection instructions.
FMV.X.Q and FMV.Q.X instructions are not provided in RV32 or RV64, so quad-precision bit patterns must be moved to the integer registers via memory.
8.3.4 Quad-Precision Floating-Point Compare Instructions
The quad-precision floating-point compare instructions are defined analogously to their double-precision counterparts, but operate on quad-precision operands.
8.3.5 Quad-Precision Floating-Point Classify Instruction
The quad-precision floating-point classify instruction, FCLASS.Q, is defined analogously to its double-precision counterpart, but operates on quad-precision operands.
8.4 Zfh Extension for Half-Precision Floating-Point
This chapter describes the Zfh standard extension for half-precision floating-point, which adds computational instructions compliant with the IEEE 754-2008 arithmetic standard’s binary16 format and operations. The Zfh extension depends on the F extension. The NaN-boxing scheme described in nanboxing is extended to allow a half-precision value to be NaN-boxed inside a single-precision value (which may be recursively NaN-boxed inside a double- or quad-precision value when the D or Q extension is present).
This extension primarily provides instructions that consume half-precision operands and produce half-precision results. However, it is also common to compute on half-precision data using higher intermediate precision. Although this extension provides explicit conversion instructions that suffice to implement that pattern, future extensions might further accelerate such computation with additional instructions that implicitly widen their operands—e.g., half×half+single→single—or implicitly narrow their results—e.g., half+single→half.
8.4.1 Half-Precision Load and Store Instructions
New 16-bit variants of LOAD-FP and STORE-FP instructions are added, encoded with a new value for the funct3 width field.
FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned.
FLH and FSH do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. FLH NaN-boxes the result written to rd, whereas FSH ignores all but the lower 16 bits in rs2.
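As a non-normative illustration, the register-level effect of FLH and FSH can be sketched in Python by modeling an f register as an unsigned integer of FLEN bits. The helper names `nan_box`, `flh_result`, and `fsh_value` are hypothetical and exist only for this sketch; memory access and alignment are not modeled.

```python
def nan_box(value_bits: int, width: int, flen: int) -> int:
    """Place a width-bit value in a FLEN-bit register with all upper bits set to 1."""
    assert 0 < width <= flen
    mask = (1 << width) - 1
    upper_ones = ((1 << flen) - 1) & ~mask
    return upper_ones | (value_bits & mask)


def flh_result(mem_halfword: int, flen: int = 64) -> int:
    """Model of the value FLH writes to rd: the loaded 16 bits, NaN-boxed."""
    return nan_box(mem_halfword, 16, flen)


def fsh_value(rs2_bits: int) -> int:
    """Model of the value FSH stores: only the lower 16 bits of rs2 are used."""
    return rs2_bits & 0xFFFF
```

Because the payload bits pass through unchanged, a non-canonical NaN pattern loaded by `flh_result` is returned bit-for-bit by `fsh_value`.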
8.4.2 Half-Precision Computational Instructions
A new supported format is added to the format field of most instructions, as shown in tab:fpextfmth.
| fmt field | Mnemonic | Meaning |
|---|---|---|
| 00 | S | 32-bit single-precision |
| 01 | D | 64-bit double-precision |
| 10 | H | 16-bit half-precision |
| 11 | Q | 128-bit quad-precision |
The half-precision floating-point computational instructions are defined analogously to their single-precision counterparts, but operate on half-precision operands and produce half-precision results.
8.4.3 Half-Precision Conversion and Move Instructions
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H, are defined analogously to the single-precision sign-injection instructions.
Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.H moves the half-precision value in floating-point register rs1 to a representation in the IEEE 754-2008 encoding in integer register rd, filling the upper XLEN-16 bits with copies of the floating-point number’s sign bit.
FMV.H.X moves the half-precision value encoded in the IEEE 754-2008 encoding from the lower 16 bits of integer register rs1 to the floating-point register rd, NaN-boxing the result.
FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
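The following Python sketch is a non-normative model of the two move instructions, treating registers as unsigned integers. The function names and the default XLEN and FLEN values are assumptions made for illustration.

```python
def fmv_x_h(f_reg_bits: int, xlen: int = 64) -> int:
    """Model of FMV.X.H: copy the low 16 bits and fill the upper XLEN-16 bits with the sign bit."""
    half = f_reg_bits & 0xFFFF
    sign = (half >> 15) & 1
    upper = (((1 << xlen) - 1) & ~0xFFFF) if sign else 0
    return upper | half


def fmv_h_x(x_reg_bits: int, flen: int = 32) -> int:
    """Model of FMV.H.X: take the low 16 bits of rs1 and NaN-box them into a FLEN-bit register."""
    half = x_reg_bits & 0xFFFF
    return (((1 << flen) - 1) & ~0xFFFF) | half
```

For example, the half-precision pattern 0xBC00 (-1.0) has its sign bit set, so `fmv_x_h` yields an integer register whose upper bits are all 1s, while 0x3C00 (1.0) yields upper bits of 0.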
8.4.4 Half-Precision Floating-Point Compare Instructions
The half-precision floating-point compare instructions are defined analogously to their single-precision counterparts, but operate on half-precision operands.
8.4.5 Half-Precision Floating-Point Classify Instruction
The half-precision floating-point classify instruction, FCLASS.H, is defined analogously to its single-precision counterpart, but operates on half-precision operands.
8.5 Zfhmin Standard Extension for Minimal Half-Precision Floating-Point
This section describes the Zfhmin standard extension, which provides minimal support for 16-bit half-precision binary floating-point instructions. The Zfhmin extension is a subset of the Zfh extension, consisting only of data transfer and conversion instructions. Like Zfh, the Zfhmin extension depends on the single-precision floating-point extension, F. The expectation is that Zfhmin software primarily uses the half-precision format for storage, performing most computation in higher precision.
The Zfhmin extension includes the following instructions from the Zfh extension: FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S. If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are also included. If the Q extension is present, the FCVT.Q.H and FCVT.H.Q instructions are additionally included.
Zfhmin does not include the FSGNJ.H instruction, because it suffices to instead use the FSGNJ.S instruction to move half-precision values between floating-point registers.
Half-precision addition, subtraction, multiplication, division, and square-root operations can be faithfully emulated by converting the half-precision operands to single-precision, performing the operation using single-precision arithmetic, then converting back to half-precision. [22] Performing half-precision fused multiply-addition using this method incurs a 1-ulp error on some inputs for the RNE and RMM rounding modes.
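The emulation pattern for addition can be sketched in Python as follows. This is a non-normative model under stated assumptions: it uses the `struct` module's `e` (binary16) and `f` (binary32) formats, which round to nearest-even, so only the RNE mode is covered; exception flags are not modeled; and `struct` raises `OverflowError` rather than returning infinity when a result is too large for half-precision. All function names are hypothetical.

```python
import struct


def half_to_float(bits: int) -> float:
    """Decode a binary16 bit pattern (exact widening)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_half(x: float) -> int:
    """Round a value to binary16 (round-to-nearest-even) and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def round_to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def emulated_half_add(a_bits: int, b_bits: int) -> int:
    """Widen both operands, add in single precision, then narrow back to half."""
    a = half_to_float(a_bits)
    b = half_to_float(b_bits)
    # The binary64 sum of two binary16 values is exact, so rounding it once
    # to binary32 gives the same result as a single-precision addition.
    single_sum = round_to_single(a + b)
    return float_to_half(single_sum)
```

For instance, adding 1.0 (0x3C00) and 2^-11 (0x1000) produces a tie between two half-precision neighbors, and the sketch returns 0x3C00, the neighbor with an even significand.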
Conversion from 8- or 16-bit integers to half-precision can be emulated by first converting to single-precision, then converting to half-precision. Conversion from 32-bit integer can be emulated by first converting to double-precision. If the D extension is not present and a 1-ulp error under RNE or RMM is tolerable, 32-bit integers can be first converted to single-precision instead. The same remark applies to conversions from 64-bit integers without the Q extension.
8.6 "Zfa" Extension for Additional Floating-Point Instructions, Version 1.0
This chapter describes the Zfa standard extension, which adds
instructions for immediate loads, IEEE 754-2019 minimum and maximum
operations, round-to-integer operations, and quiet floating-point
comparisons. For RV32D, the Zfa extension also adds instructions to
transfer double-precision floating-point values to and from integer
registers, and for RV64Q, it adds analogous instructions for
quad-precision floating-point values. The Zfa extension depends on the
F extension.
8.6.1 Load-Immediate Instructions
The FLI.S instruction loads one of 32 single-precision floating-point constants, encoded in the rs1 field, into floating-point register rd. The correspondence of rs1 field values and single-precision floating-point values is shown in tab:flis. FLI.S is encoded like FMV.W.X, but with rs2=1.
| rs1 | Value | Sign | Exponent | Significand |
|---|---|---|---|---|
| 0 | −1.0 | 1 | 01111111 | 000…​000 |
| 1 | Minimum positive normal | 0 | 00000001 | 000…​000 |
| 2 | 1.0 × 2⁻¹⁶ | 0 | 01101111 | 000…000 |
| 3 | 1.0 × 2⁻¹⁵ | 0 | 01110000 | 000…000 |
| 4 | 1.0 × 2⁻⁸ | 0 | 01110111 | 000…000 |
| 5 | 1.0 × 2⁻⁷ | 0 | 01111000 | 000…000 |
| 6 | 0.0625 (2⁻⁴) | 0 | 01111011 | 000…000 |
| 7 | 0.125 (2⁻³) | 0 | 01111100 | 000…000 |
| 8 | 0.25 | 0 | 01111101 | 000…​000 |
| 9 | 0.3125 | 0 | 01111101 | 010…​000 |
| 10 | 0.375 | 0 | 01111101 | 100…​000 |
| 11 | 0.4375 | 0 | 01111101 | 110…​000 |
| 12 | 0.5 | 0 | 01111110 | 000…​000 |
| 13 | 0.625 | 0 | 01111110 | 010…​000 |
| 14 | 0.75 | 0 | 01111110 | 100…​000 |
| 15 | 0.875 | 0 | 01111110 | 110…​000 |
| 16 | 1.0 | 0 | 01111111 | 000…​000 |
| 17 | 1.25 | 0 | 01111111 | 010…​000 |
| 18 | 1.5 | 0 | 01111111 | 100…​000 |
| 19 | 1.75 | 0 | 01111111 | 110…​000 |
| 20 | 2.0 | 0 | 10000000 | 000…​000 |
| 21 | 2.5 | 0 | 10000000 | 010…​000 |
| 22 | 3 | 0 | 10000000 | 100…​000 |
| 23 | 4 | 0 | 10000001 | 000…​000 |
| 24 | 8 | 0 | 10000010 | 000…​000 |
| 25 | 16 | 0 | 10000011 | 000…​000 |
| 26 | 128 (2⁷) | 0 | 10000110 | 000…000 |
| 27 | 256 (2⁸) | 0 | 10000111 | 000…000 |
| 28 | 2¹⁵ | 0 | 10001110 | 000…000 |
| 29 | 2¹⁶ | 0 | 10001111 | 000…000 |
| 30 | +∞ | 0 | 11111111 | 000…​000 |
| 31 | Canonical NaN | 0 | 11111111 | 100…​000 |
The preferred assembly syntax for entries 1, 30, and 31 is min, inf,
and nan, respectively. For entries 0 through 29 (including entry 1),
the assembler will accept decimal constants in C-like syntax.
The set of 32 constants was chosen by examining floating-point libraries, including the C standard math library, and to optimize fixed-point to floating-point conversion.
Entries 8-22 follow a regular encoding pattern. No entry sets mantissa bits other than the two most significant ones.
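As a non-normative aid, the FLI.S table can be reproduced in Python by assembling each constant from its sign, exponent, and the two most significant significand bits. The names `FLI_S_FIELDS`, `fli_s_bits`, and `fli_s_value` are hypothetical; the field values are transcribed from the table above.

```python
import struct

# (sign, 8-bit exponent, two most significant significand bits), indexed by rs1.
FLI_S_FIELDS = [
    (1, 0b01111111, 0b00), (0, 0b00000001, 0b00),
    (0, 0b01101111, 0b00), (0, 0b01110000, 0b00),
    (0, 0b01110111, 0b00), (0, 0b01111000, 0b00),
    (0, 0b01111011, 0b00), (0, 0b01111100, 0b00),
    (0, 0b01111101, 0b00), (0, 0b01111101, 0b01),
    (0, 0b01111101, 0b10), (0, 0b01111101, 0b11),
    (0, 0b01111110, 0b00), (0, 0b01111110, 0b01),
    (0, 0b01111110, 0b10), (0, 0b01111110, 0b11),
    (0, 0b01111111, 0b00), (0, 0b01111111, 0b01),
    (0, 0b01111111, 0b10), (0, 0b01111111, 0b11),
    (0, 0b10000000, 0b00), (0, 0b10000000, 0b01),
    (0, 0b10000000, 0b10), (0, 0b10000001, 0b00),
    (0, 0b10000010, 0b00), (0, 0b10000011, 0b00),
    (0, 0b10000110, 0b00), (0, 0b10000111, 0b00),
    (0, 0b10001110, 0b00), (0, 0b10001111, 0b00),
    (0, 0b11111111, 0b00), (0, 0b11111111, 0b10),
]


def fli_s_bits(rs1: int) -> int:
    """Return the binary32 bit pattern selected by the 5-bit rs1 field."""
    sign, exponent, top2 = FLI_S_FIELDS[rs1]
    return (sign << 31) | (exponent << 23) | (top2 << 21)


def fli_s_value(rs1: int) -> float:
    """Return the selected constant as a Python float."""
    return struct.unpack("<f", struct.pack("<I", fli_s_bits(rs1)))[0]
```

Storing only two significand bits per entry mirrors the observation that no entry sets significand bits below the two most significant ones.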
If the D extension is implemented, FLI.D performs the analogous operation, but loads a double-precision value into floating-point register rd. Note that entry 1 (corresponding to the minimum positive normal value) has a numerically different value for double-precision than for single-precision. FLI.D is encoded like FLI.S, but with fmt=D.
If the Q extension is implemented, FLI.Q performs the analogous operation, but loads a quad-precision value into floating-point register rd. Note that entry 1 (corresponding to the minimum positive normal value) has a numerically different value for quad-precision. FLI.Q is encoded like FLI.S, but with fmt=Q.
If the Zfh or Zvfh extension is implemented, FLI.H performs the analogous operation, but loads a half-precision floating-point value into register rd. Note that entry 1 (corresponding to the minimum positive normal value) has a numerically different value for half-precision. Furthermore, since 2¹⁶ is not representable in half-precision floating-point, entry 29 in the table instead loads positive infinity; i.e., it is redundant with entry 30. FLI.H is encoded like FLI.S, but with fmt=H.
Additionally, since 2⁻¹⁶ and 2⁻¹⁵ are subnormal in half-precision, entry 1 is numerically greater than entries 2 and 3 for FLI.H.
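The half-precision special cases can be checked numerically. The sketch below is non-normative and uses Python's `struct` module `e` format to decode binary16 bit patterns; the constant names are hypothetical.

```python
import struct


def half_value(bits: int) -> float:
    """Decode a binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


HALF_MIN_NORMAL = half_value(0x0400)   # exponent field 1, fraction 0
HALF_2_POW_M15 = half_value(0x0200)    # subnormal: exponent field 0
HALF_2_POW_M16 = half_value(0x0100)    # subnormal: exponent field 0
HALF_MAX_FINITE = half_value(0x7BFF)   # largest finite binary16 value
```

The minimum positive normal half-precision value is 2⁻¹⁴, which exceeds both 2⁻¹⁵ and 2⁻¹⁶, and the largest finite value, 65504, is below 2¹⁶.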
The FLI.fmt instructions never set any floating-point exception flags.
8.6.2 Minimum and Maximum Instructions
The FMINM.S and FMAXM.S instructions are defined like the FMIN.S and FMAX.S instructions, except that if either input is NaN, the result is the canonical NaN.
If the D extension is implemented, FMINM.D and FMAXM.D instructions are analogously defined to operate on double-precision numbers.
If the Zfh extension is implemented, FMINM.H and FMAXM.H instructions are analogously defined to operate on half-precision numbers.
If the Q extension is implemented, FMINM.Q and FMAXM.Q instructions are analogously defined to operate on quad-precision numbers.
These instructions are encoded like their FMIN and FMAX counterparts, but with instruction bit 13 set to 1.
These instructions implement the IEEE 754-2019 minimum and maximum
operations.
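A non-normative Python sketch of the value semantics follows. It operates on Python floats (binary64), uses `math.nan` as a stand-in for the canonical NaN, and does not model exception flags; the function names are hypothetical. As with FMIN and FMAX, -0.0 is treated as less than +0.0.

```python
import math


def fminm(a: float, b: float) -> float:
    """Model of FMINM: NaN if either input is NaN, else the lesser value."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Covers the signed-zero case: prefer the operand with the negative sign.
        return a if math.copysign(1.0, a) < 0 else b
    return min(a, b)


def fmaxm(a: float, b: float) -> float:
    """Model of FMAXM: NaN if either input is NaN, else the greater value."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Covers the signed-zero case: prefer the operand with the positive sign.
        return a if math.copysign(1.0, a) > 0 else b
    return max(a, b)
```

The only difference from a model of FMIN and FMAX is the first branch, which returns NaN when exactly one input is NaN instead of returning the other operand.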
8.6.3 Round-to-Integer Instructions
The FROUND.S instruction rounds the single-precision floating-point number in floating-point register rs1 to an integer, according to the rounding mode specified in the instruction’s rm field. It then writes that integer, represented as a single-precision floating-point number, to floating-point register rd. Zero and infinite inputs are copied to rd unmodified. Signaling NaN inputs cause the invalid operation exception flag to be set; no other exception flags are set. FROUND.S is encoded like FCVT.S.D, but with rs2=4.
The FROUNDNX.S instruction is defined similarly, but it also sets the inexact exception flag if the input differs from the rounded result and is not NaN. FROUNDNX.S is encoded like FCVT.S.D, but with rs2=5.
If the D extension is implemented, FROUND.D and FROUNDNX.D instructions are analogously defined to operate on double-precision numbers. They are encoded like FCVT.D.S, but with rs2=4 and 5, respectively.
If the Zfh extension is implemented, FROUND.H and FROUNDNX.H instructions are analogously defined to operate on half-precision numbers. They are encoded like FCVT.H.S, but with rs2=4 and 5, respectively.
If the Q extension is implemented, FROUND.Q and FROUNDNX.Q instructions are analogously defined to operate on quad-precision numbers. They are encoded like FCVT.Q.S, but with rs2=4 and 5, respectively.
The FROUNDNX.fmt instructions implement the IEEE 754-2019
roundToIntegralExact operation, and the FROUND.fmt instructions
implement the other operations in the roundToIntegral family.
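The rounding behavior can be sketched in Python as below. This is a non-normative model: it works on Python floats (binary64, so it most closely resembles FROUND.D), represents the rm field by the hypothetical strings "rne", "rtz", "rdn", "rup", and "rmm", and reports only the inexact flag; the invalid flag for signaling NaN inputs is not modeled.

```python
import math


def round_to_integral(x: float, rm: str) -> float:
    """Round x to an integral floating-point value under rounding mode rm."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x  # zeros and infinities are copied unmodified
    if rm == "rne":
        r = float(round(x))  # Python's round() rounds ties to even
    elif rm == "rtz":
        r = float(math.trunc(x))
    elif rm == "rdn":
        r = float(math.floor(x))
    elif rm == "rup":
        r = float(math.ceil(x))
    elif rm == "rmm":
        t = float(math.trunc(x))
        # x - t is exact here, so the tie test is reliable.
        r = t + math.copysign(1.0, x) if abs(x - t) >= 0.5 else t
    else:
        raise ValueError("unknown rounding mode: " + rm)
    return math.copysign(r, x)  # keep the input's sign on a zero result


def fround(x: float, rm: str) -> float:
    """Model of FROUND: the rounded value, with no inexact flag."""
    return round_to_integral(x, rm)


def froundnx(x: float, rm: str):
    """Model of FROUNDNX: (rounded value, inexact flag)."""
    r = round_to_integral(x, rm)
    inexact = (not math.isnan(x)) and r != x
    return r, inexact
```

For example, `froundnx(1.5, "rtz")` returns `(1.0, True)`, whereas an already integral input such as 4.0 leaves the inexact flag clear.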
8.6.4 Modular Convert-to-Integer Instruction
The FCVTMOD.W.D instruction is defined similarly to the FCVT.W.D instruction, with the following differences. FCVTMOD.W.D always rounds towards zero. Bits 31:0 are taken from the rounded, unbounded two’s complement result, then sign-extended to XLEN bits and written to integer register rd. ±∞ and NaN are converted to zero.
Floating-point exception flags are raised the same as they would be for FCVT.W.D with the same input operand.
This instruction is only provided if the D extension is implemented. It is encoded like FCVT.W.D, but with the rs2 field set to 8 and the rm field set to 1 (RTZ). Other rm values are reserved.
The assembly syntax requires the RTZ rounding mode to be explicitly
specified, i.e., fcvtmod.w.d rd, rs1, rtz.
The FCVTMOD.W.D instruction was added principally to accelerate the processing of JavaScript Numbers. Numbers are double-precision values, but some operators implicitly truncate them to signed integers mod 2³².
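A non-normative Python sketch of the result value follows. It returns the destination register contents as a signed Python integer (equivalent to the sign-extended XLEN-bit value) and does not model exception flags; the function name is hypothetical.

```python
import math


def fcvtmod_w_d(x: float) -> int:
    """Model of the FCVTMOD.W.D result for a double-precision input x."""
    if math.isnan(x) or math.isinf(x):
        return 0  # infinities and NaN convert to zero
    truncated = math.trunc(x)       # round toward zero, unbounded integer
    low32 = truncated & 0xFFFFFFFF  # bits 31:0 of the two's complement result
    if low32 & 0x80000000:
        low32 -= 1 << 32            # sign-extend from bit 31
    return low32
```

Unlike FCVT.W.D, which saturates out-of-range inputs, this model wraps them: an input of 2³² + 5 yields 5, and 2³¹ yields -2³¹.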
8.6.5 Move Instructions
For RV32 only, if the D extension is implemented, the FMVH.X.D instruction moves bits 63:32 of floating-point register rs1 into integer register rd. It is encoded in the OP-FP major opcode with funct3=0, rs2=1, and funct7=1110001.
FMVH.X.D is used in conjunction with the existing FMV.X.W instruction to move a double-precision floating-point number to a pair of x-registers.
For RV32 only, if the D extension is implemented, the FMVP.D.X instruction moves a double-precision number from a pair of integer registers into a floating-point register. Integer registers rs1 and rs2 supply bits 31:0 and 63:32, respectively; the result is written to floating-point register rd. FMVP.D.X is encoded in the OP-FP major opcode with funct3=0 and funct7=1011001.
For RV64 only, if the Q extension is implemented, the FMVH.X.Q instruction moves bits 127:64 of floating-point register rs1 into integer register rd. It is encoded in the OP-FP major opcode with funct3=0, rs2=1, and funct7=1110011.
FMVH.X.Q is used in conjunction with the existing FMV.X.D instruction to move a quad-precision floating-point number to a pair of x-registers.
For RV64 only, if the Q extension is implemented, the FMVP.Q.X instruction moves a quad-precision number from a pair of integer registers into a floating-point register. Integer registers rs1 and rs2 supply bits 63:0 and 127:64, respectively; the result is written to floating-point register rd. FMVP.Q.X is encoded in the OP-FP major opcode with funct3=0 and funct7=1011011.
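The RV32 double-precision case can be sketched in Python as follows. This non-normative model treats a 64-bit f register and 32-bit x registers as unsigned integers; the function names are hypothetical. The RV64 quad-precision instructions follow the same pattern with 64-bit halves.

```python
import struct


def double_bits(x: float) -> int:
    """Return the binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fmv_x_w(f_reg_bits: int) -> int:
    """Model of FMV.X.W on RV32: bits 31:0 of the f register."""
    return f_reg_bits & 0xFFFFFFFF


def fmvh_x_d(f_reg_bits: int) -> int:
    """Model of FMVH.X.D on RV32: bits 63:32 of the f register."""
    return (f_reg_bits >> 32) & 0xFFFFFFFF


def fmvp_d_x(rs1_bits: int, rs2_bits: int) -> int:
    """Model of FMVP.D.X on RV32: rs1 supplies bits 31:0, rs2 supplies bits 63:32."""
    return ((rs2_bits & 0xFFFFFFFF) << 32) | (rs1_bits & 0xFFFFFFFF)
```

Splitting a value with `fmv_x_w` and `fmvh_x_d` and recombining the halves with `fmvp_d_x` reproduces the original bit pattern.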
8.6.6 Comparison Instructions
The FLEQ.S and FLTQ.S instructions are defined like the FLE.S and FLT.S instructions, except that quiet NaN inputs do not cause the invalid operation exception flag to be set.
If the D extension is implemented, FLEQ.D and FLTQ.D instructions are analogously defined to operate on double-precision numbers.
If the Zfh extension is implemented, FLEQ.H and FLTQ.H instructions are analogously defined to operate on half-precision numbers.
If the Q extension is implemented, FLEQ.Q and FLTQ.Q instructions are analogously defined to operate on quad-precision numbers.
These instructions are encoded like their FLE and FLT counterparts, but with instruction bit 14 set to 1.
We do not expect analogous comparison instructions will be added to the vector ISA, since they can be reasonably efficiently emulated using masking.
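The difference between the signaling and quiet comparisons can be sketched in Python at the bit level. This non-normative model takes binary32 bit patterns and returns a pair (result, invalid flag); all function names are hypothetical.

```python
import struct


def _is_nan(bits: int) -> bool:
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def _is_snan(bits: int) -> bool:
    # A signaling NaN has the most significant fraction bit clear.
    return _is_nan(bits) and (bits & 0x00400000) == 0


def _value(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fle_s(a: int, b: int):
    """Model of FLE.S: any NaN input sets the invalid flag."""
    invalid = _is_nan(a) or _is_nan(b)
    result = 0 if invalid else int(_value(a) <= _value(b))
    return result, invalid


def fleq_s(a: int, b: int):
    """Model of FLEQ.S: only signaling NaN inputs set the invalid flag."""
    has_nan = _is_nan(a) or _is_nan(b)
    invalid = _is_snan(a) or _is_snan(b)
    result = 0 if has_nan else int(_value(a) <= _value(b))
    return result, invalid


def fltq_s(a: int, b: int):
    """Model of FLTQ.S: only signaling NaN inputs set the invalid flag."""
    has_nan = _is_nan(a) or _is_nan(b)
    invalid = _is_snan(a) or _is_snan(b)
    result = 0 if has_nan else int(_value(a) < _value(b))
    return result, invalid
```

With a quiet NaN operand such as 0x7FC00000, `fle_s` reports the invalid flag while `fleq_s` does not; both return 0 as the comparison result.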
8.7 Zfbfmin Extension for Scalar BFloat16 Conversions
This extension provides the minimal set of instructions needed to enable scalar support of the BF16 format. It enables BF16 as an interchange format as it provides conversion between BF16 values and FP32 values.
This extension depends upon the single-precision floating-point extension
F.
This extension includes six instructions: the FCVT.BF16.S and FCVT.S.BF16
instructions, defined below, and the FLH, FSH, FMV.X.H, and FMV.H.X
instructions, defined in chap:zfh.
While conversion instructions tend to include all supported formats, in these extensions we only support conversion between BF16 and FP32 as we are targeting a special use case. These extensions are intended to support the case where BF16 values are used as reduced precision versions of FP32 values, where use of BF16 provides a two-fold advantage for storage, bandwidth, and computation. In this use case, the BF16 values are typically multiplied by each other and accumulated into FP32 sums. These sums are typically converted to BF16 and then used as subsequent inputs. The operations on the BF16 values can be performed on the CPU or a loosely coupled coprocessor.
Subsequent extensions might provide support for native BF16 arithmetic. Such extensions could add additional conversion instructions to allow all supported formats to be converted to and from BF16.
BF16 addition, subtraction, multiplication, division, and square-root operations can be faithfully emulated by converting the BF16 operands to single-precision, performing the operation using single-precision arithmetic, and then converting back to BF16. Performing BF16 fused multiply-addition using this method can produce results that differ by 1-ulp on some inputs for the RNE and RMM rounding modes.
Conversions between BF16 and formats larger than FP32 can be emulated. Exact widening conversions from BF16 can be synthesized by first converting to FP32 and then converting from FP32 to the target precision. Conversions narrowing to BF16 can be synthesized by first converting to FP32 through a series of halving steps and then converting from FP32 to BF16. As with the fused multiply-addition instruction described above, this method of converting values to BF16 can be off by 1-ulp on some inputs for the RNE and RMM rounding modes.
8.7.1 BF16 Number Format
BF16 bits
While BF16 (also known as BFloat16) is not an IEEE 754 standard format, it is a valid floating-point format as defined by IEEE 754-2008, with radix 2, number of significand digits 8, and maximum exponent 127.
BF16 computational instructions defined in this chapter support all IEEE 754-2008 features, including all rounding modes, subnormal inputs and outputs, overflow and underflow, and default exception handling. Tininess is detected after rounding.
The BF16 canonical NaN is 0x7fc0.
BF16 values are NaN-boxed when held in f registers, as described in nanboxing.
8.7.2 fcvt.bf16.s
Synopsis Convert FP32 value to a BF16 value
Mnemonic fcvt.bf16.s rd, rs1
Encoding
While the mnemonic of this instruction is consistent with that of the other RISC-V floating-point convert instructions, a new encoding is used in bits 24:20.
BF16.S and H are used to signify that the source is FP32 and the destination is BF16.
Description
Narrowing convert FP32 value to a BF16 value. Round according to the RM field.
This instruction is similar to other narrowing floating-point-to-floating-point conversion instructions.
Exceptions: Overflow, Underflow, Inexact, Invalid
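As a non-normative illustration, the narrowing conversion can be sketched in Python for the round-to-nearest-even mode only. The sketch assumes that NaN inputs produce the BF16 canonical NaN 0x7fc0, does not model the other rounding modes or any exception flags, and uses a hypothetical function name.

```python
def fcvt_bf16_s_rne(fp32_bits: int) -> int:
    """Narrow a binary32 bit pattern to BF16 using round-to-nearest-even."""
    fp32_bits &= 0xFFFFFFFF
    exponent_all_ones = (fp32_bits & 0x7F800000) == 0x7F800000
    if exponent_all_ones and (fp32_bits & 0x007FFFFF):
        return 0x7FC0  # assumption: NaN inputs yield the canonical NaN
    # Add just under half of the discarded range, plus the current low kept
    # bit, so that exact ties round toward an even BF16 significand.
    lsb = (fp32_bits >> 16) & 1
    rounded = fp32_bits + 0x7FFF + lsb
    return (rounded >> 16) & 0xFFFF
```

Because BF16 keeps the FP32 exponent width, a carry out of the significand propagates naturally into the exponent; rounding the largest finite FP32 value (0x7F7FFFFF) therefore overflows to infinity (0x7F80).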
8.7.3 fcvt.s.bf16
Synopsis Convert BF16 value to an FP32 value
Mnemonic fcvt.s.bf16 rd, rs1
Encoding
While the mnemonic of this instruction is consistent with that of the other RISC-V floating-point convert instructions, a new encoding is used in bits 24:20 to indicate that the source is BF16.
Description
Converts a BF16 value to an FP32 value. The conversion is exact.
This instruction is similar to other widening floating-point-to-floating-point conversion instructions.
If the input is normal or infinity, the BF16 encoded value is shifted to the left by 16 places and the least significant 16 bits are written with 0s.
The result is NaN-boxed by writing the most significant FLEN-32 bits with 1s.
Exceptions: Invalid
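A non-normative Python sketch of the widening conversion follows. Since BF16 shares the FP32 exponent range, the same left shift is also exact for zeros and subnormal inputs. Returning the FP32 canonical NaN 0x7fc00000 for NaN inputs is an assumption of this sketch, the invalid flag is not modeled, and the function name is hypothetical.

```python
def fcvt_s_bf16(bf16_bits: int, flen: int = 32) -> int:
    """Widen a BF16 bit pattern to binary32 and NaN-box it into a FLEN-bit register."""
    bf16_bits &= 0xFFFF
    is_nan = (bf16_bits & 0x7F80) == 0x7F80 and (bf16_bits & 0x007F) != 0
    if is_nan:
        fp32 = 0x7FC00000  # assumption: NaN inputs yield the canonical NaN
    else:
        fp32 = bf16_bits << 16  # low 16 bits of the result are zero
    upper_ones = ((1 << flen) - 1) & ~0xFFFFFFFF  # FLEN-32 upper bits set to 1
    return upper_ones | fp32
```

With FLEN=32 there are no upper bits to fill, so the result is just the shifted pattern; with FLEN=64 the upper 32 bits are all 1s.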
8.8 "Zfinx", "Zdinx", "Zhinx", "Zhinxmin" Extensions for Floating-Point in Integer Registers, Version 1.0
This chapter defines the "Zfinx" extension (pronounced "z-f-in-x")
that provides instructions similar to those in the standard
floating-point F extension for single-precision floating-point
instructions but which operate on the x registers instead of the f
registers. This chapter also defines the "Zdinx", "Zhinx", and
"Zhinxmin" extensions that provide similar instructions for other
floating-point precisions.
The F extension uses separate f registers for floating-point
computation, to reduce register pressure and simplify the provision of
register-file ports for wide superscalars. However, the additional 128 B of
architectural state increases the minimal implementation cost. By
eliminating the f registers, the Zfinx extension substantially reduces
the cost of simple RISC-V implementations with floating-point
instruction-set support. Zfinx also reduces context-switch cost.
In general, software that assumes the presence of the F extension is incompatible with software that assumes the presence of the Zfinx extension, and vice versa.
The Zfinx extension adds all of the instructions that the F extension adds, except for the transfer instructions FLW, FSW, FMV.W.X, FMV.X.W, C.FLW[SP], and C.FSW[SP].
Zfinx software uses integer loads and stores to transfer floating-point values from and to memory. Transfers between registers use either integer arithmetic or floating-point sign-injection instructions.
The Zfinx variants of these F-extension instructions have the same
semantics, except that whenever such an instruction would have accessed
an f register, it instead accesses the x register with the same
number.
The Zfinx extension depends on the "Zicsr" extension for control and status register access.
8.8.1 Processing of Narrower Values
Floating-point operands of width w < XLEN bits occupy
bits w-1:0 of an x register. Floating-point operations on w-bit
operands ignore operand bits XLEN-1:w.
Floating-point operations that produce w < XLEN-bit results fill bits XLEN-1:w with copies of bit w-1 (the sign bit).
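These two rules can be sketched in Python, modeling an x register as an unsigned integer of XLEN bits. The sketch is non-normative and the function names are hypothetical.

```python
def zfinx_read_operand(x_reg_bits: int, w: int) -> int:
    """Return the w-bit operand held in an x register, ignoring bits XLEN-1:w."""
    return x_reg_bits & ((1 << w) - 1)


def zfinx_write_result(result_bits: int, w: int, xlen: int) -> int:
    """Return the XLEN-bit register value for a w-bit result, with bits XLEN-1:w copied from bit w-1."""
    mask = (1 << w) - 1
    value = result_bits & mask
    if (value >> (w - 1)) & 1:
        value |= ((1 << xlen) - 1) & ~mask
    return value
```

For example, writing the single-precision pattern 0xBF800000 (-1.0) on RV64 produces 0xFFFFFFFFBF800000, whereas 0x3F800000 (1.0) leaves the upper 32 bits zero.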
The NaN-boxing scheme employed in the f registers was designed to
efficiently support recoded floating-point formats. Recoding is less
practical for Zfinx, though, since the same registers hold both
floating-point and integer operands. Hence, the need for NaN boxing is
diminished.
Sign-extending 32-bit floating-point numbers when held in RV64 x
registers is compatible with the existing RV64 calling conventions, which leave bits 63-32 undefined when passing a 32-bit floating point value in x registers. To keep the architecture more regular, we extend this pattern to 16-bit floating-point numbers in both RV32 and RV64.
8.8.2 Zdinx
The Zdinx extension provides analogous double-precision floating-point instructions. The Zdinx extension depends upon the Zfinx extension.
The Zdinx extension adds all of the instructions that the D extension adds, except for the transfer instructions FLD, FSD, FMV.D.X, FMV.X.D, C.FLD[SP], and C.FSD[SP].
The Zdinx variants of these D-extension instructions have the same
semantics, except that whenever such an instruction would have accessed
an f register, it instead accesses the x register with the same
number.
8.8.3 Processing of Wider Values
Double-precision operands in RV32Zdinx are held in aligned x-register
pairs, i.e., register numbers must be even. Use of misaligned
(odd-numbered) registers for double-width floating-point operands is
reserved.
Regardless of endianness, the lower-numbered register holds the
low-order bits, and the higher-numbered register holds the high-order
bits: e.g., bits 31:0 of a double-precision operand in RV32Zdinx might
be held in register x14, with bits 63:32 of that operand held in
x15.
When a double-width floating-point result is written to x0, the entire
write takes no effect: e.g., for RV32Zdinx, writing a double-precision
result to x0 does not cause x1 to be written.
When x0 is used as a double-width floating-point operand, the entire
operand is zero—i.e., x1 is not accessed.
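The register-pair rules can be sketched in Python as below. This is a non-normative model of an RV32Zdinx integer register file; the class and method names are hypothetical, and raising `ValueError` for odd register numbers is merely one way to represent the reserved case.

```python
class Rv32ZdinxRegs:
    """Minimal model of 32 x registers, each 32 bits wide."""

    def __init__(self):
        self.x = [0] * 32

    def read_double(self, n: int) -> int:
        """Read a 64-bit operand from the pair (x[n], x[n+1])."""
        if n % 2:
            raise ValueError("odd register numbers are reserved")
        if n == 0:
            return 0  # the entire operand is zero; x1 is not accessed
        return (self.x[n + 1] << 32) | self.x[n]

    def write_double(self, n: int, bits: int) -> None:
        """Write a 64-bit result to the pair (x[n], x[n+1])."""
        if n % 2:
            raise ValueError("odd register numbers are reserved")
        if n == 0:
            return  # the entire write takes no effect; x1 is unchanged
        self.x[n] = bits & 0xFFFFFFFF              # low-order bits
        self.x[n + 1] = (bits >> 32) & 0xFFFFFFFF  # high-order bits
```

Writing the pattern 0x3FF0000000000001 to register number 14 places 0x00000001 in x14 and 0x3FF00000 in x15, independent of memory endianness.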
Load-pair and store-pair instructions are contained in a separate extension (see the section "Extensions for Load/Store pair for RV32"). If that extension is not available, transferring double-precision operands in RV32Zdinx from or to memory requires two loads or stores. Register moves need only a single FSGNJ.D instruction, however.
8.8.4 Zhinx
The Zhinx extension provides analogous half-precision floating-point instructions. The Zhinx extension depends upon the Zfinx extension.
The Zhinx extension adds all of the instructions that the Zfh extension adds, except for the transfer instructions FLH, FSH, FMV.H.X, and FMV.X.H.
The Zhinx variants of these Zfh-extension instructions have the same
semantics, except that whenever such an instruction would have accessed
an f register, it instead accesses the x register with the same
number.
8.8.5 Zhinxmin
The Zhinxmin extension provides minimal support for 16-bit
half-precision floating-point instructions that operate on the x
registers. The Zhinxmin extension depends upon the Zfinx extension.
The Zhinxmin extension includes the following instructions from the Zhinx extension: FCVT.S.H and FCVT.H.S. If the Zdinx extension is present, the FCVT.D.H and FCVT.H.D instructions are also included.
In the future, an RV64Zqinx quad-precision extension could be defined analogously to RV32Zdinx. An RV32Zqinx extension could also be defined but would require quad-register groups.
8.8.6 Privileged Architecture Implications
As described in sec:mstatus, the
mstatus field FS is hardwired to 0 if the Zfinx extension is
implemented, and FS no longer affects the trapping behavior of
floating-point instructions or fcsr accesses.
The misa bits F, D, and Q are hardwired to 0 when the Zfinx extension
is implemented.
A future discoverability mechanism might be used to probe the existence of the Zfinx, Zhinx, and Zdinx extensions.