3 RISC-V Memory Models
This chapter is currently being restructured. Its contents are normative, but the presentation might appear disjoint.
This chapter describes the two RISC-V memory consistency models:
RVWMO, the base weakly ordered model, and RVTSO,
a more strongly ordered model enabled via the Ztso standard extension.
Additional explanatory material for both models can be found in Section B.1.
3.1 RVWMO Memory Consistency Model
This section defines the base RISC-V memory consistency model. A memory
consistency model is a set of rules specifying the values that can be
returned by loads of memory. RISC-V uses a memory model called RISC-V
Weak Memory Ordering (RVWMO) which is designed to provide flexibility
for architects to build high-performance scalable designs while
simultaneously supporting a tractable programming model.
Under RVWMO, code running on a single hart appears to execute in order
from the perspective of other memory instructions in the same hart, but
memory instructions from another hart may observe the memory
instructions from the first hart being executed in a different order.
Therefore, multithreaded code may require explicit synchronization to
guarantee ordering between memory instructions from different harts. The
base RISC-V ISA provides a FENCE instruction for this purpose, described
in Section 2.1.7, while the Zalrsc and Zaamo extensions additionally
define load-reserved/store-conditional and atomic read-modify-write instructions.
The standard ISA extension for total store ordering, Ztso, augments
RVWMO with additional rules specific to that extension.
The appendices to this specification provide both axiomatic and operational formalizations of the memory consistency model as well as additional explanatory material.
This chapter defines the memory model for regular main memory
operations. The interaction of the memory model with I/O memory,
instruction fetches (Ziccif), FENCE.I, page-table walks, and SFENCE.VMA is not
(yet) formalized. Some or all of the above may be formalized in a future
revision of this specification.
Memory consistency models supporting overlapping memory accesses of different widths simultaneously remain an active area of academic research and are not yet fully understood. The specifics of how memory accesses of different sizes interact under RVWMO are specified to the best of our current abilities, but they are subject to revision should new issues be uncovered.
3.1.1 Definition of the RVWMO Memory Model
The RVWMO memory model is defined in terms of the global memory order, a total ordering of the memory operations produced by all harts. In general, a multithreaded program has many different possible executions, with each execution having its own corresponding global memory order.
The global memory order is defined over the primitive load and store operations generated by memory instructions. It is then subject to the constraints defined in the rest of this chapter. Any execution satisfying all of the memory model constraints is a legal execution (as far as the memory model is concerned).
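As a rough illustration of "many possible executions, each with its own global memory order", the sketch below enumerates every total order of a handful of operations and keeps those that respect a hand-written set of ordering constraints. The function name, the operation labels, and the constraint set are all hypothetical; only ordering constraints are modeled here, not the axioms defined later in this chapter.

```python
from itertools import permutations


def candidate_global_orders(ops, must_precede):
    """Yield every total order of `ops` that respects `must_precede`.

    `must_precede` is a set of (a, b) pairs meaning a has to appear before b.
    In the real model such pairs come from the constraints defined in the
    rest of the chapter; here they are supplied by hand (an assumption).
    """
    for order in permutations(ops):
        position = {op: n for n, op in enumerate(order)}
        if all(position[a] < position[b] for a, b in must_precede):
            yield order


# Three operations, with the single constraint that "a" precedes "b".
orders = list(candidate_global_orders(["a", "b", "c"], {("a", "b")}))
```

Each surviving tuple is one candidate global memory order; a full checker would additionally filter these candidates by the memory model axioms.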
3.1.1.1 Memory Model Primitives
The program order over memory operations reflects the order in which the instructions that generate each load and store are logically laid out in a hart’s dynamic instruction stream; i.e., the order in which a simple in-order processor would execute the instructions of that hart.
Memory-accessing instructions give rise to memory operations. A memory operation can be either a load operation, a store operation, or both simultaneously. All memory operations are single-copy atomic: they can never be observed in a partially complete state. Each aligned memory instruction that accesses XLEN or fewer bits gives rise to exactly one memory operation, unless specified otherwise. An aligned AMO gives rise to a single memory operation that is both a load operation and a store operation simultaneously.
Among instructions in RV32GC and RV64GC, the following are exceptions to the rule that an aligned memory instruction gives rise to exactly one memory operation:
- An unsuccessful SC instruction does not give rise to any memory operations.
- Floating-point load and store instructions that access more than XLEN bits (e.g., FLD and FSD in RV32) may each give rise to multiple memory operations.
Instructions in ISA extensions such as the V extension may each
give rise to multiple memory operations.
However, the memory model for these extensions has not yet been formalized.
A misaligned load or store instruction may be decomposed
into a set of component memory operations of any granularity.
A floating-point load or store of more than XLEN bits may
also be decomposed into a set of component memory operations of any granularity.
The memory operations generated by such instructions
are not ordered with respect to each other in program order, but they are ordered normally with
respect to the memory operations generated by preceding and subsequent
instructions in program order.
The Zaamo extension does not require execution environments to support
misaligned atomic instructions at all.
However, if misaligned atomics are supported via the
misaligned atomicity granule PMA, then AMOs within an atomicity granule are not decomposed, nor are
loads and stores defined in the base ISAs, nor are loads and stores of no more
than XLEN bits defined in the F, D, and Q extensions.
The decomposition of misaligned memory operations down to byte granularity facilitates emulation on implementations that do not natively support misaligned accesses. Such implementations might, for example, simply iterate over the bytes of a misaligned access one by one.
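The byte-by-byte emulation described above might look like the following minimal sketch. It assumes a little-endian, byte-addressed memory modeled as a Python dict; the function names are hypothetical and are not part of any RISC-V software interface.

```python
def decompose_store(addr, value, width):
    """Split a store of `width` bytes into single-byte component stores.

    Returns (byte_address, byte_value) pairs, assuming little-endian layout.
    The components are not ordered with respect to each other in program
    order, so the list order here is just one possible choice.
    """
    return [(addr + i, (value >> (8 * i)) & 0xFF) for i in range(width)]


def emulate_misaligned_store(memory, addr, value, width):
    """Perform the store one byte at a time on a dict-based memory."""
    for byte_addr, byte_value in decompose_store(addr, value, width):
        memory[byte_addr] = byte_value


memory = {}
emulate_misaligned_store(memory, 0x1003, 0xAABBCCDD, 4)
```

Because each component is a separate memory operation, another hart may observe some of the bytes updated and others not.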
An LR instruction and an SC instruction are said to be paired if the
LR precedes the SC in program order and if there are no other LR or
SC instructions in between; the corresponding memory operations are said to
be paired as well (except in case of a failed SC, where no store
operation is generated). The complete list of conditions determining
whether an SC must succeed, may succeed, or must fail is defined in
the Zalrsc extension.
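The pairing rule above can be sketched as a scan over one hart's program order. This is an illustrative sketch only: instructions are represented as mnemonic strings (an assumption), and SC success or failure is not modeled.

```python
def paired_lr_sc(mnemonics):
    """Return (lr_index, sc_index) pairs for one hart's program order.

    An SC is paired with an LR if the LR precedes it in program order and
    no other LR or SC instruction lies in between.
    """
    pairs = []
    last_kind, last_index = None, None  # most recent LR or SC seen so far
    for index, mnemonic in enumerate(mnemonics):
        kind = mnemonic.split(".")[0].upper()
        if kind not in ("LR", "SC"):
            continue
        if kind == "SC" and last_kind == "LR":
            pairs.append((last_index, index))
        last_kind, last_index = kind, index
    return pairs
```

For example, in the sequence `lr.w, lr.w, sc.w, sc.w` only the second LR and the first SC are paired; the first LR is separated from any SC by another LR, and the second SC is preceded by another SC.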
Load and store operations may also carry one or more ordering annotations from the following set: acquire-RCpc, acquire-RCsc, release-RCpc, and release-RCsc. An AMO or LR instruction with aq set has an acquire-RCsc annotation. An AMO or SC instruction with rl set has a release-RCsc annotation. An AMO, LR, or SC instruction with both aq and rl set has both acquire-RCsc and release-RCsc annotations.
For convenience, we use the term acquire annotation to refer to an acquire-RCpc annotation or an acquire-RCsc annotation. Likewise, a release annotation refers to a release-RCpc annotation or a release-RCsc annotation. An RCpc annotation refers to an acquire-RCpc annotation or a release-RCpc annotation. An RCsc annotation refers to an acquire-RCsc annotation or a release-RCsc annotation.
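The mapping from the aq and rl bits to annotations, together with the shorthand terms just introduced, can be sketched as follows. The function and constant names are hypothetical, and bit combinations that the text above does not mention (for example, an SC with only aq set) are simply given no annotation in this sketch.

```python
ACQUIRE = {"acquire-RCpc", "acquire-RCsc"}
RELEASE = {"release-RCpc", "release-RCsc"}
RCSC = {"acquire-RCsc", "release-RCsc"}


def annotations(kind, aq=False, rl=False):
    """Annotations of an instruction; `kind` is "AMO", "LR", or "SC"."""
    result = set()
    if aq and rl:
        result |= RCSC  # both bits set: both RCsc annotations
    elif aq and kind in ("AMO", "LR"):
        result.add("acquire-RCsc")
    elif rl and kind in ("AMO", "SC"):
        result.add("release-RCsc")
    return result


def has_acquire(ann):
    return bool(ann & ACQUIRE)


def has_release(ann):
    return bool(ann & RELEASE)


def has_rcsc(ann):
    return bool(ann & RCSC)
```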
In the memory model literature, the term RCpc stands for release consistency with processor-consistent synchronization operations, and the term RCsc stands for release consistency with sequentially consistent synchronization operations.
While there are many different definitions for acquire and release annotations in the literature, in the context of RVWMO these terms are concisely and completely defined by Preserved Program Order rules 5-7.
RCpc annotations are currently only used when implicitly assigned to every
memory access per the standard extension Ztso. Furthermore, although
the ISA does not currently contain native RCpc load-acquire or store-release
instructions, the RVWMO model itself is designed to be forwards-compatible with
the potential addition of them into the ISA in a future extension.
3.1.1.2 Syntactic Dependencies
The definition of the RVWMO memory model depends in part on the notion of a syntactic dependency, defined as follows.
In the context of defining dependencies, a register refers either to an entire general-purpose register, some portion of a CSR, or an entire CSR. The granularity at which dependencies are tracked through CSRs is specific to each CSR and is defined in Section 3.1.2.
Syntactic dependencies are defined in terms of instructions' source registers, instructions' destination registers, and the way instructions carry a dependency from their source registers to their destination registers. This section provides a general definition of all of these terms; however, Section 3.1.3 provides a complete listing of the specifics for each instruction.
In general, a register r other than x0 is a source
register for an instruction i if any of the following
hold:
- In the opcode of i, rs1, rs2, or rs3 is set to r
- i is a CSR instruction, and in the opcode of
i, csr is set to r, unless i
is CSRRW or CSRRWI and rd is set to
x0
- r is a CSR and an implicit source register for i, as defined in Section 3.1.3
- r is a CSR that aliases with another source register for i
Memory instructions also further specify which source registers are address source registers and which are data source registers.
In general, a register r other than x0 is a destination
register for an instruction i if any of the following
hold:
- In the opcode of i, rd is set to r
- i is a CSR instruction, and in the opcode of
i, csr is set to r, unless i
is CSRRS or CSRRC and rs1 is set to
x0, or i is CSRRSI or CSRRCI and uimm[4:0] is set to zero
- r is a CSR and an implicit destination register for i, as defined in Section 3.1.3
- r is a CSR that aliases with another destination register for i
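The CSR-specific exceptions in the two lists above can be sketched as a pair of predicates. This is an illustration only: the function names and the string-based encoding of mnemonics and register names are assumptions, and implicit and aliasing CSR registers are not modeled.

```python
def csr_is_source(mnemonic, rd):
    """Is the CSR named in the csr field a source register?

    It is, unless the instruction is CSRRW or CSRRWI with rd = x0.
    """
    return not (mnemonic in ("CSRRW", "CSRRWI") and rd == "x0")


def csr_is_destination(mnemonic, rs1=None, uimm=0):
    """Is the CSR named in the csr field a destination register?

    It is, unless the instruction is CSRRS or CSRRC with rs1 = x0, or
    CSRRSI or CSRRCI with uimm[4:0] equal to zero.
    """
    if mnemonic in ("CSRRS", "CSRRC"):
        return rs1 != "x0"
    if mnemonic in ("CSRRSI", "CSRRCI"):
        return (uimm & 0x1F) != 0
    return True
```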
Most non-memory instructions carry a dependency from each of their source registers to each of their destination registers. However, there are exceptions to this rule; see Section 3.1.3.
Instruction j has a syntactic dependency on instruction i via destination register s of i and source register r of j if either of the following hold:
- s is the same as r, and no instruction program-ordered between i and j has r as a destination register
- There is an instruction m program-ordered between i and j such that all of the following hold:
  - j has a syntactic dependency on m via destination register q and source register r
  - m has a syntactic dependency on i via destination register s and source register p
  - m carries a dependency from p to q
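The recursive definition above can be transcribed almost literally. The sketch below is a minimal model, not a reference implementation: the instruction representation (`instr`), the register names, and the three-instruction example program are all hypothetical, and x0 is assumed never to appear in a source or destination set.

```python
from itertools import product


def instr(src=(), dst=(), carries=None):
    """Build an instruction record (a hypothetical representation).

    By default every source register carries a dependency to every
    destination register; pass `carries` to override this.
    """
    src, dst = frozenset(src), frozenset(dst)
    if carries is None:
        carries = product(src, dst)
    return {"src": src, "dst": dst, "carries": set(carries)}


def depends(prog, i, j, s, r):
    """Does prog[j] have a syntactic dependency on prog[i] via
    destination register s of i and source register r of j?"""
    if not (i < j and s in prog[i]["dst"] and r in prog[j]["src"]):
        return False
    # First case: same register, not overwritten in between.
    if s == r and all(r not in prog[k]["dst"] for k in range(i + 1, j)):
        return True
    # Second case: chain through an intermediate instruction m.
    for m in range(i + 1, j):
        for p, q in prog[m]["carries"]:
            if depends(prog, i, m, s, p) and depends(prog, m, j, q, r):
                return True
    return False


# lw a0, 0(s0) ; addi a1, a0, 4 ; lw a2, 0(a1)
# Loads carry no dependency from their address register to rd.
prog = [
    instr(src=["s0"], dst=["a0"], carries=[]),
    instr(src=["a0"], dst=["a1"]),
    instr(src=["a1"], dst=["a2"], carries=[]),
]
```

Here `depends(prog, 0, 2, "a0", "a1")` holds through the intermediate `addi`, which carries a dependency from a0 to a1. The recursion terminates because each recursive call covers a strictly shorter span of the program.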
Finally, in the definitions that follow, let a and b be two memory operations, and let i and j be the instructions that generate a and b, respectively.
b has a syntactic address dependency on a if r is an address source register for j and j has a syntactic dependency on i via source register r
b has a syntactic data dependency on a if b is a store operation, r is a data source register for j, and j has a syntactic dependency on i via source register r
b has a syntactic control dependency on a if there is an instruction m program-ordered between i and j such that m is a branch or indirect jump and m has a syntactic dependency on i.
Generally speaking, non-AMO load instructions do not have data source registers, and unconditional non-AMO store instructions do not have destination registers. However, a successful SC instruction is considered to have the register specified in rd as a destination register, and hence it is possible for an instruction to have a syntactic dependency on a successful SC instruction that precedes it in program order.
3.1.1.3 Preserved Program Order
The global memory order for any given execution of a program respects some but not all of each hart’s program order. The subset of program order that must be respected by the global memory order is known as preserved program order.
The complete definition of preserved program order is as follows (and note that AMOs are simultaneously both loads and stores): memory operation a precedes memory operation b in preserved program order (and hence also in the global memory order) if a precedes b in program order, a and b both access regular main memory (rather than I/O regions), and any of the following hold:
- Overlapping-Address Orderings:
  1. b is a store, and a and b access overlapping memory addresses
  2. a and b are loads, x is a byte read by both a and b, there is no store to x between a and b in program order, and a and b return values for x written by different memory operations
  3. a is generated by an AMO or SC instruction, b is a load, and b returns a value written by a
- Explicit Synchronization:
  4. There is a FENCE instruction that orders a before b
  5. a has an acquire annotation
  6. b has a release annotation
  7. a and b both have RCsc annotations
  8. a is paired with b
- Syntactic Dependencies:
  9. b has a syntactic address dependency on a
  10. b has a syntactic data dependency on a
  11. b is a store, and b has a syntactic control dependency on a
- Pipeline Dependencies:
  12. b is a load, and there exists some store m between a and b in program order such that m has an address or data dependency on a, and b returns a value written by m
  13. b is a store, and there exists some instruction m between a and b in program order such that m has an address dependency on a
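A few of the conditions above lend themselves to a direct check. The sketch below models only four of them (the overlapping-address store condition and the acquire, release, and RCsc annotation conditions); fences, pairing, dependencies, and the value-based conditions are deliberately left out. The `Op` type and its fields are hypothetical.

```python
from dataclasses import dataclass

ACQUIRE = {"acquire-RCpc", "acquire-RCsc"}
RELEASE = {"release-RCpc", "release-RCsc"}
RCSC = {"acquire-RCsc", "release-RCsc"}


@dataclass(frozen=True)
class Op:
    is_load: bool
    is_store: bool
    addrs: frozenset             # byte addresses accessed
    ann: frozenset = frozenset() # ordering annotations


def ppo_by_modeled_rules(a, b):
    """True if a precedes b in preserved program order by a modeled rule.

    Assumes a precedes b in program order and both access main memory.
    False means "not ordered by the modeled subset", nothing stronger.
    """
    if b.is_store and a.addrs & b.addrs:
        return True                     # b is a store to overlapping bytes
    if a.ann & ACQUIRE:
        return True                     # a has an acquire annotation
    if b.ann & RELEASE:
        return True                     # b has a release annotation
    return bool(a.ann & RCSC and b.ann & RCSC)


store_x = Op(False, True, frozenset({0x10}))
load_y = Op(True, False, frozenset({0x20}))
release_y = Op(False, True, frozenset({0x20}), frozenset({"release-RCsc"}))
```

With these values, a plain store to x followed by a plain load of y is not ordered by any modeled condition, whereas the same store followed by a store-release to y is.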
3.1.1.4 Memory Model Axioms
An execution of a RISC-V program obeys the RVWMO memory consistency model only if there exists a global memory order conforming to preserved program order and satisfying the load value axiom, the atomicity axiom, and the progress axiom.
3.1.1.4.1 Load Value Axiom
Each byte of each load i returns the value written to that byte by the store that is the latest in global memory order among the following stores:
- Stores that write that byte and that precede i in the global memory order
- Stores that write that byte and that precede i in program order
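The axiom can be read as a small selection procedure, sketched below. The data layout is an assumption made for illustration: operations are string labels, `gmo` lists all of them in global memory order, `po` is the program order of the load's own hart, and `writes` maps each store to the bytes it writes. The `initial` fallback for a byte with no candidate store is also an assumption of this sketch.

```python
def load_value(load, byte, gmo, po, writes, initial=0):
    """Value returned for `byte` by `load` under the load value axiom."""
    position = {op: n for n, op in enumerate(gmo)}
    earlier_in_po = set(po[: po.index(load)])
    candidates = [
        op
        for op in gmo
        if byte in writes.get(op, {})
        and (position[op] < position[load] or op in earlier_in_po)
    ]
    if not candidates:
        return initial
    # Pick the candidate that is latest in the global memory order.
    latest = max(candidates, key=position.__getitem__)
    return writes[latest][byte]


# Hart 0 stores 1 to x (S0) and then loads x (L0). The load appears
# before the store in the global memory order, yet still returns 1.
forwarded = load_value("L0", "x", ["L0", "S0"], ["S0", "L0"], {"S0": {"x": 1}})
```

The second candidate set is what lets a load return the value of a program-order-earlier store from its own hart even when that store appears later in the global memory order.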
3.1.1.4.2 Atomicity Axiom
If r and w are paired load and store operations generated by aligned LR and SC instructions in a hart h, s is a store to byte x, and r returns a value written by s, then s must precede w in the global memory order, and there can be no store from a hart other than h to byte x following s and preceding w in the global memory order.
The Atomicity Axiom theoretically supports LR/SC pairs of different widths and to mismatched addresses, since implementations are permitted to allow SC operations to succeed in such cases. However, in practice, we expect such patterns to be rare, and their use is discouraged.
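For a single byte, the axiom reduces to a scan of the global memory order between s and w. The sketch below assumes that r and w are already known to be paired and that r returned the value written by s; the argument names and the dict-based bookkeeping are hypothetical.

```python
def atomicity_holds(gmo, s, w, byte, hart, hart_of, bytes_written):
    """Check the atomicity axiom for one byte.

    gmo           -- operation labels in global memory order
    s, w          -- the store read by the LR, and the paired SC's store
    hart          -- the hart that executed the LR/SC pair
    hart_of       -- maps each store to the hart that issued it
    bytes_written -- maps each store to the set of bytes it writes
    """
    position = {op: n for n, op in enumerate(gmo)}
    if position[s] >= position[w]:
        return False  # s must precede w
    between = gmo[position[s] + 1 : position[w]]
    return not any(
        byte in bytes_written.get(op, ()) and hart_of[op] != hart
        for op in between
    )


hart_of = {"s": 1, "w": 0, "x": 1}
bytes_written = {"s": {"b"}, "w": {"b"}, "x": {"b"}}
```

With these tables, the order `s, r, x, w` violates the axiom because the store x from hart 1 intervenes, while `s, r, w, x` satisfies it.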
3.1.1.4.3 Progress Axiom
No memory operation may be preceded in the global memory order by an infinite sequence of other memory operations.
3.1.2 CSR Dependency Tracking Granularity
Table 6. Granularities at which syntactic dependencies are tracked through CSRs
| Name | Portions Tracked as Independent Units | Aliases |
|---|---|---|
| fflags | Bits 4, 3, 2, 1, 0 | fcsr |
| frm | entire CSR | fcsr |
| fcsr | Bits 7-5, 4, 3, 2, 1, 0 | fflags, frm |
Note: read-only CSRs are not listed, as they do not participate in the definition of syntactic dependencies.
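One way to picture the table is to map every tracked unit onto the fcsr bit positions it occupies; two units in different CSRs then alias exactly when their bit sets overlap. The sketch below assumes the standard fcsr layout from the F extension (frm in bits 7-5, fflags in bits 4-0); the `UNITS` table and the `aliases` helper are hypothetical names.

```python
# Tracked unit -> set of fcsr bit positions it covers.
UNITS = {
    ("frm", "entire CSR"): {5, 6, 7},
    ("fcsr", "bits 7-5"): {5, 6, 7},
}
for bit in range(5):
    UNITS[("fflags", f"bit {bit}")] = {bit}
    UNITS[("fcsr", f"bit {bit}")] = {bit}


def aliases(u, v):
    """Do two tracked units in different CSRs refer to the same bits?"""
    return u[0] != v[0] and bool(UNITS[u] & UNITS[v])
```

Under this model, a write to fflags bit 0 is also a write to fcsr bit 0, but it is independent of frm and of every other flag bit.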
3.1.3 Source and Destination Register Listings
This section provides a concrete listing of the source and destination registers for each instruction. These listings are used in the definition of syntactic dependencies in Section 3.1.1.2.
The term accumulating CSR is used to describe a CSR that is both a source and a destination register, but which carries a dependency only from itself to itself.
Instructions carry a dependency from each source register in the Source Registers column to each destination register in the Destination Registers column, from each source register in the Source Registers column to each CSR in the Accumulating CSRs column, and from each CSR in the Accumulating CSRs column to itself, except where annotated otherwise.
Key:
- A: Address source register
- D: Data source register
- †: The instruction does not carry a dependency from any source register to any destination register
- ‡: The instruction carries dependencies from source register(s) to destination register(s) as specified
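The default carry rule stated before the key can be sketched as a function that expands one table row into explicit dependency edges. The function name is hypothetical, and rows marked † or ‡ override this default and are not handled here.

```python
def carried_dependencies(sources, destinations, accumulating):
    """Expand a table row into (from_register, to_register) edges."""
    edges = {(s, d) for s in sources for d in destinations}
    edges |= {(s, c) for s in sources for c in accumulating}
    edges |= {(c, c) for c in accumulating}  # accumulating CSR to itself
    return edges


# FMIN.S: sources rs1 and rs2, destination rd, accumulating CSR NV.
fmin_edges = carried_dependencies(["rs1", "rs2"], ["rd"], ["NV"])
```

Note that no edge runs from an accumulating CSR to a destination register, which is what distinguishes an accumulating CSR from an ordinary source register.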
Table 7. RV32I Base Integer Instruction Set
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| LUI | | rd | | |
| AUIPC | | rd | | |
| JAL | | rd | | |
| JALR† | rs1 | rd | ||
| BEQ | rs1, rs2 | |||
| BNE | rs1, rs2 | |||
| BLT | rs1, rs2 | |||
| BGE | rs1, rs2 | |||
| BLTU | rs1, rs2 | |||
| BGEU | rs1, rs2 | |||
| LB† | rs1 A | rd | ||
| LH† | rs1 A | rd | ||
| LW† | rs1 A | rd | ||
| LBU† | rs1 A | rd | ||
| LHU† | rs1 A | rd | ||
| SB | rs1 A, rs2 D | |||
| SH | rs1 A, rs2 D | |||
| SW | rs1 A, rs2 D | |||
| ADDI | rs1 | rd | ||
| SLTI | rs1 | rd | ||
| SLTIU | rs1 | rd | ||
| XORI | rs1 | rd | ||
| ORI | rs1 | rd | ||
| ANDI | rs1 | rd | ||
| SLLI | rs1 | rd | ||
| SRLI | rs1 | rd | ||
| SRAI | rs1 | rd | ||
| ADD | rs1, rs2 | rd | ||
| SUB | rs1, rs2 | rd | ||
| SLL | rs1, rs2 | rd | ||
| SLT | rs1, rs2 | rd | ||
| SLTU | rs1, rs2 | rd | ||
| XOR | rs1, rs2 | rd | ||
| SRL | rs1, rs2 | rd | ||
| SRA | rs1, rs2 | rd | ||
| OR | rs1, rs2 | rd | ||
| AND | rs1, rs2 | rd | ||
| FENCE | ||||
| FENCE.I | ||||
| ECALL | ||||
| EBREAK | ||||
| CSRRW‡ | rs1, csr* | rd, csr | | *unless rd=x0 |
| ‡ carries a dependency from rs1 to csr and from csr to rd | ||||
| CSRRS‡ | rs1, csr | rd, csr* | | *unless rs1=x0 |
| CSRRC‡ | rs1, csr | rd, csr* | | *unless rs1=x0 |
| ‡ carries a dependency from csr and rs1 to csr and from csr to rd | ||||
| CSRRWI‡ | csr* | rd, csr | | *unless rd=x0 |
| ‡ carries a dependency from csr to rd | ||||
| CSRRSI‡ | csr | rd, csr* | | *unless uimm[4:0]=0 |
| CSRRCI‡ | csr | rd, csr* | | *unless uimm[4:0]=0 |
| ‡ carries a dependency from csr to rd and csr | ||||
Table 8. RV64I Base Integer Instruction Set
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| LWU† | rs1 A | rd | ||
| LD† | rs1 A | rd | ||
| SD | rs1 A, rs2 D | |||
| SLLI | rs1 | rd | ||
| SRLI | rs1 | rd | ||
| SRAI | rs1 | rd | ||
| ADDIW | rs1 | rd | ||
| SLLIW | rs1 | rd | ||
| SRLIW | rs1 | rd | ||
| SRAIW | rs1 | rd | ||
| ADDW | rs1, rs2 | rd | ||
| SUBW | rs1, rs2 | rd | ||
| SLLW | rs1, rs2 | rd | ||
| SRLW | rs1, rs2 | rd | ||
| SRAW | rs1, rs2 | rd |
Table 9. RV32M Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| MUL | rs1, rs2 | rd | ||
| MULH | rs1, rs2 | rd | ||
| MULHSU | rs1, rs2 | rd | ||
| MULHU | rs1, rs2 | rd | ||
| DIV | rs1, rs2 | rd | ||
| DIVU | rs1, rs2 | rd | ||
| REM | rs1, rs2 | rd | ||
| REMU | rs1, rs2 | rd |
Table 10. RV64M Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| MULW | rs1, rs2 | rd | ||
| DIVW | rs1, rs2 | rd | ||
| DIVUW | rs1, rs2 | rd | ||
| REMW | rs1, rs2 | rd | ||
| REMUW | rs1, rs2 | rd |
Table 11. RV32A Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| LR.W† | rs1 A | rd | ||
| SC.W† | rs1 A, rs2 D | rd* | | *if successful |
| AMOSWAP.W† | rs1 A, rs2 D | rd | ||
| AMOADD.W† | rs1 A, rs2 D | rd | ||
| AMOXOR.W† | rs1 A, rs2 D | rd | ||
| AMOAND.W† | rs1 A, rs2 D | rd | ||
| AMOOR.W† | rs1 A, rs2 D | rd | ||
| AMOMIN.W† | rs1 A, rs2 D | rd | ||
| AMOMAX.W† | rs1 A, rs2 D | rd | ||
| AMOMINU.W† | rs1 A, rs2 D | rd | ||
| AMOMAXU.W† | rs1 A, rs2 D | rd |
Table 12. RV64A Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| LR.D† | rs1 A | rd | ||
| SC.D† | rs1 A, rs2 D | rd* | | *if successful |
| AMOSWAP.D† | rs1 A, rs2 D | rd | ||
| AMOADD.D† | rs1 A, rs2 D | rd | ||
| AMOXOR.D† | rs1 A, rs2 D | rd | ||
| AMOAND.D† | rs1 A, rs2 D | rd | ||
| AMOOR.D† | rs1 A, rs2 D | rd | ||
| AMOMIN.D† | rs1 A, rs2 D | rd | ||
| AMOMAX.D† | rs1 A, rs2 D | rd | ||
| AMOMINU.D† | rs1 A, rs2 D | rd | ||
| AMOMAXU.D† | rs1 A, rs2 D | rd | ||
Table 13. RV32F Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| FLW† | rs1 A | rd | ||
| FSW | rs1 A, rs2 D | |||
| FMADD.S | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FMSUB.S | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FNMSUB.S | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FNMADD.S | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FADD.S | rs1, rs2, frm* | rd | NV, OF, NX | *if rm=111 |
| FSUB.S | rs1, rs2, frm* | rd | NV, OF, NX | *if rm=111 |
| FMUL.S | rs1, rs2, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FDIV.S | rs1, rs2, frm* | rd | NV, DZ, OF, UF, NX | *if rm=111 |
| FSQRT.S | rs1, frm* | rd | NV, NX | *if rm=111 |
| FSGNJ.S | rs1, rs2 | rd | ||
| FSGNJN.S | rs1, rs2 | rd | ||
| FSGNJX.S | rs1, rs2 | rd | ||
| FMIN.S | rs1, rs2 | rd | NV | |
| FMAX.S | rs1, rs2 | rd | NV | |
| FCVT.W.S | rs1, frm* | rd | NV, NX | *if rm=111 |
| FCVT.WU.S | rs1, frm* | rd | NV, NX | *if rm=111 |
| FMV.X.W | rs1 | rd | ||
| FEQ.S | rs1, rs2 | rd | NV | |
| FLT.S | rs1, rs2 | rd | NV | |
| FLE.S | rs1, rs2 | rd | NV | |
| FCLASS.S | rs1 | rd | ||
| FCVT.S.W | rs1, frm* | rd | NX | *if rm=111 |
| FCVT.S.WU | rs1, frm* | rd | NX | *if rm=111 |
| FMV.W.X | rs1 | rd |
Table 14. RV64F Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| FCVT.L.S | rs1, frm* | rd | NV, NX | *if rm=111 |
| FCVT.LU.S | rs1, frm* | rd | NV, NX | *if rm=111 |
| FCVT.S.L | rs1, frm* | rd | NX | *if rm=111 |
| FCVT.S.LU | rs1, frm* | rd | NX | *if rm=111 |
Table 15. RV32D Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| FLD† | rs1 A | rd | ||
| FSD | rs1 A, rs2 D | |||
| FMADD.D | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FMSUB.D | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FNMSUB.D | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FNMADD.D | rs1, rs2, rs3, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FADD.D | rs1, rs2, frm* | rd | NV, OF, NX | *if rm=111 |
| FSUB.D | rs1, rs2, frm* | rd | NV, OF, NX | *if rm=111 |
| FMUL.D | rs1, rs2, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FDIV.D | rs1, rs2, frm* | rd | NV, DZ, OF, UF, NX | *if rm=111 |
| FSQRT.D | rs1, frm* | rd | NV, NX | *if rm=111 |
| FSGNJ.D | rs1, rs2 | rd | ||
| FSGNJN.D | rs1, rs2 | rd | ||
| FSGNJX.D | rs1, rs2 | rd | ||
| FMIN.D | rs1, rs2 | rd | NV | |
| FMAX.D | rs1, rs2 | rd | NV | |
| FCVT.S.D | rs1, frm* | rd | NV, OF, UF, NX | *if rm=111 |
| FCVT.D.S | rs1 | rd | NV | |
| FEQ.D | rs1, rs2 | rd | NV | |
| FLT.D | rs1, rs2 | rd | NV | |
| FLE.D | rs1, rs2 | rd | NV | |
| FCLASS.D | rs1 | rd | ||
| FCVT.W.D | rs1, frm* | rd | NV, NX | *if rm=111 |
| FCVT.WU.D | rs1, frm* | rd | NV, NX | *if rm=111 |
| FCVT.D.W | rs1 | rd | ||
| FCVT.D.WU | rs1 | rd |
Table 16. RV64D Standard Extension
| Instruction | Source Registers | Destination Registers | Accumulating CSRs | Notes |
|---|---|---|---|---|
| FCVT.L.D | rs1, frm* | rd | NV, NX | *if rm=111 |
| FCVT.LU.D | rs1, frm* | rd | NV, NX | *if rm=111 |
| FMV.X.D | rs1 | rd | ||
| FCVT.D.L | rs1, frm* | rd | NX | *if rm=111 |
| FCVT.D.LU | rs1, frm* | rd | NX | *if rm=111 |
| FMV.D.X | rs1 | rd |
3.2 Ztso Extension for Total Store Ordering
This section defines the Ztso extension for the RISC-V Total Store
Ordering (RVTSO) memory consistency model. RVTSO is defined as a delta
from RVWMO.
The Ztso extension is meant to facilitate the porting of code originally
written for the x86 or SPARC architectures, both of which use TSO by
default. It also supports implementations which inherently provide RVTSO
behavior and want to expose that fact to software.
RVTSO makes the following adjustments to RVWMO:
- All load operations behave as if they have an acquire-RCpc annotation.
- All store operations behave as if they have a release-RCpc annotation.
- All AMOs behave as if they have both acquire-RCsc and release-RCsc annotations.
These rules render all PPO rules except 4-7 redundant. They also make redundant any non-I/O fences that do not have both PW and SR set. Finally, they also imply that no memory operation will be reordered past an AMO in either direction.
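The effect of the three adjustments can be illustrated by applying them literally and then checking only the annotation-based ordering conditions (a has an acquire annotation, b has a release annotation, or both have RCsc annotations). This is a sketch under stated assumptions: the function names are hypothetical, and fences and all other preserved program order rules are ignored.

```python
ACQUIRE = {"acquire-RCpc", "acquire-RCsc"}
RELEASE = {"release-RCpc", "release-RCsc"}
RCSC = {"acquire-RCsc", "release-RCsc"}


def rvtso_annotations(is_load, is_store):
    """Annotations an operation behaves as if it had under RVTSO."""
    ann = set()
    if is_load:
        ann.add("acquire-RCpc")
    if is_store:
        ann.add("release-RCpc")
    if is_load and is_store:  # an AMO is both a load and a store
        ann |= RCSC
    return ann


def ordered_by_annotations(a, b):
    """Is a (earlier in program order) ordered before b by annotations?"""
    return bool(a & ACQUIRE) or bool(b & RELEASE) or bool(a & RCSC and b & RCSC)


LOAD = rvtso_annotations(True, False)
STORE = rvtso_annotations(False, True)
AMO = rvtso_annotations(True, True)
```

In this model every pair is ordered except a store followed by a load, which is the one relaxation TSO is known for; any pair involving an AMO is ordered in both directions.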
In the context of RVTSO, as is the case for RVWMO, the storage ordering annotations are concisely and completely defined by PPO rules 5-7. In both of these memory models, it is the Load Value Axiom (Section 3.1.1.4.1) that allows a hart to forward a value from its store buffer to a subsequent (in program order) load; that is, stores can be forwarded locally before they are visible to other harts.
Additionally, if the Ztso extension is implemented, then vector memory
instructions in the V extension and Zve* family of extensions follow RVTSO at
the instruction level.
The Ztso extension does not strengthen the ordering of intra-instruction
element accesses.
In spite of the fact that Ztso adds no new instructions to the ISA, code
written assuming RVTSO will not run correctly on implementations not
supporting Ztso. Binaries compiled to run only under Ztso should
indicate this via a flag in the binary, so that platforms which do
not implement Ztso can simply refuse to run them.