7 "Sv" Supervisor Virtual-Memory Extensions
This chapter is currently being restructured. Its contents are normative, but the presentation might appear disjoint.
7.1 "Svnapot" Extension for NAPOT Translation Contiguity, Version 1.0
In Sv39, Sv48, and Sv57, when a PTE has N=1, the PTE represents a translation that is part of a range of contiguous virtual-to-physical translations with the same values for PTE bits 5–0. Such ranges must be of a naturally aligned power-of-2 (NAPOT) granularity larger than the base page size.
The Svnapot extension depends on the Sv39 extension.
| i | pte.ppn[i] | Description | pte.napot_bits |
|---|---|---|---|
| ≥1 | x xxxx xxxx | Reserved | - |
NAPOT PTEs behave identically to non-NAPOT PTEs within the address-translation algorithm in sv32algorithm, except that:
- If the encoding in pte is valid according to ptenapot, then instead of returning the original value of pte, implicit reads of a NAPOT PTE return a copy of pte in which pte.ppn[i][pte.napot_bits-1:0] is replaced by vpn[i][pte.napot_bits-1:0]. If the encoding in pte is reserved according to ptenapot, then a page-fault exception must be raised.
- Implicit reads of NAPOT page table entries may create address-translation cache entries mapping a + j×PTESIZE to a copy of pte in which pte.ppn[i][pte.napot_bits-1:0] is replaced by vpn[i][pte.napot_bits-1:0], for any or all j such that j >> napot_bits = vpn[i] >> napot_bits, all for the address space identified in satp as loaded by step 1.
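The first rule above can be sketched in Python as follows. This is an illustrative model rather than normative text. It assumes an Sv39 leaf PTE at level i = 0, the standard PTE layout (N in bit 63, PPN field starting at bit 10, 9-bit ppn[0]), and the 64 KiB encoding in which pte.ppn[0][3:0] = 1000 and napot_bits = 4. The names napot_implicit_read and PageFault are hypothetical.

```python
PAGE_SHIFT = 12            # 4 KiB base pages
PPN_SHIFT = 10             # PPN field starts at PTE bit 10
PPN0_MASK = 0x1FF          # ppn[0] and vpn[0] are 9 bits wide in Sv39
PTE_N = 1 << 63            # N bit
NAPOT_BITS = 4             # assumed 64 KiB encoding: 2**4 base pages


class PageFault(Exception):
    """Stands in for the page-fault exception (hypothetical name)."""


def napot_implicit_read(pte, va):
    """Return the PTE value an implicit read would observe at level 0."""
    if not pte & PTE_N:
        return pte                       # non-NAPOT PTEs are returned unchanged
    ppn0 = (pte >> PPN_SHIFT) & PPN0_MASK
    if ppn0 & 0xF != 0b1000:
        raise PageFault("reserved NAPOT encoding")
    vpn0 = (va >> PAGE_SHIFT) & PPN0_MASK
    low = (1 << NAPOT_BITS) - 1
    # Replace ppn[0][napot_bits-1:0] with vpn[0][napot_bits-1:0].
    new_ppn0 = (ppn0 & ~low) | (vpn0 & low)
    return (pte & ~(PPN0_MASK << PPN_SHIFT)) | (new_ppn0 << PPN_SHIFT)
```

The low napot_bits of the PPN only encode the region size, so the model overwrites them with the matching bits of the virtual page number and leaves every other PTE field untouched.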
The motivation for a NAPOT PTE is that it can be cached in a TLB as one or more entries representing the contiguous region as if it were a single (large) page covered by a single translation. This compaction can help relieve TLB pressure in some scenarios. The encoding is designed to fit within the pre-existing Sv39, Sv48, and Sv57 PTE formats so as not to disrupt existing implementations or designs that choose not to implement the scheme. It is also designed so as not to complicate the definition of the address-translation algorithm.
The address translation cache abstraction captures the behavior that would result from the creation of a single TLB entry covering the entire NAPOT region. It is also designed to be consistent with implementations that support NAPOT PTEs by splitting the NAPOT region into TLB entries covering any smaller power-of-two region sizes. For example, a 64 KiB NAPOT PTE might trigger the creation of 16 standard 4 KiB TLB entries, all with contents generated from the NAPOT PTE (even if the PTEs for the other 4 KiB regions have different contents).
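As a rough illustration of the 64 KiB example, the following Python sketch enumerates the 16 base-page translations that one NAPOT PTE could populate. It assumes napot_bits = 4 and works on page numbers only; the function name napot_tlb_entries is hypothetical.

```python
NAPOT_BITS = 4  # assumed: 64 KiB region = 2**4 pages of 4 KiB


def napot_tlb_entries(vpn, pte_ppn):
    """List (vpn, ppn) pairs for every 4 KiB page in the NAPOT region.

    vpn is the virtual page number of the faulting access and pte_ppn is
    the PPN field of the NAPOT PTE that was read for it.
    """
    low = (1 << NAPOT_BITS) - 1
    vbase = vpn & ~low        # first virtual page of the region
    pbase = pte_ppn & ~low    # first physical page of the region
    return [(vbase | j, pbase | j) for j in range(1 << NAPOT_BITS)]
```

Every entry is derived from the single PTE that was read, which is why the contents of the other PTEs in the region are never consulted in this model.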
In typical usage scenarios, NAPOT PTEs in the same region will have the same attributes, same PPNs, and same values for bits 5-0. RSW remains reserved for supervisor software control. It is the responsibility of the OS and/or hypervisor to configure the page tables in such a way that there are no inconsistencies between NAPOT PTEs and other NAPOT or non-NAPOT PTEs that overlap the same address range. If an update needs to be made, the OS generally should first mark all of the PTEs invalid, then issue SFENCE.VMA instruction(s) covering all 4 KiB regions within the range (either via a single SFENCE.VMA with rs1=x0, or with multiple SFENCE.VMA instructions with rs1≠x0), then update the PTE(s), as described in sfence.vma, unless any inconsistencies are known to be benign. If any inconsistencies do exist, then the effect is the same as when SFENCE.VMA is used incorrectly: one of the translations will be chosen, but the choice is unpredictable.
If an implementation chooses to use a NAPOT PTE (or cached version thereof), it might not consult the PTE directly specified by the algorithm in sv32algorithm at all. Therefore, the D and A bits may not be identical across all mappings of the same address range even in typical use cases. The operating system must query all NAPOT aliases of a page to determine whether that page has been accessed and/or is dirty. If the OS manually sets the A and/or D bits for a page, it is recommended that the OS also set the A and/or D bits for other NAPOT aliases as appropriate in order to avoid unnecessary traps.
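A minimal sketch of how an OS might query all NAPOT aliases is shown below. It assumes the standard PTE bit positions (A in bit 6, D in bit 7) and that the caller passes the raw values of every PTE in the region; napot_accessed_dirty is a hypothetical helper name.

```python
PTE_A = 1 << 6  # Accessed bit
PTE_D = 1 << 7  # Dirty bit


def napot_accessed_dirty(ptes):
    """Combine the A and D bits across all PTEs of one NAPOT region.

    Returns (accessed, dirty). Hardware may have updated any one of the
    aliases, so the bits are ORed together rather than read from one PTE.
    """
    combined = 0
    for pte in ptes:
        combined |= pte
    return bool(combined & PTE_A), bool(combined & PTE_D)
```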
Just as with normal PTEs, TLBs are permitted to cache NAPOT PTEs whose V (Valid) bit is clear.
Depending on need, the NAPOT scheme may be extended to other intermediate page sizes and/or to other levels of the page table in the future. The encoding is designed to accommodate other NAPOT sizes should that need arise. For example:
| i | pte.ppn[i] | Description | pte.napot_bits |
|---|---|---|---|
| … | … | … | … |
In such a case, an implementation may or may not support all options. The discoverability mechanism for this extension would be extended to allow system software to determine which sizes are supported.
Other sizes may remain deliberately excluded, so that PPN bits not being used to indicate a valid NAPOT region size (e.g., the least-significant bit of pte.ppn[i]) may be repurposed for other uses in the future.
However, in case finer-grained intermediate page size support proves not to be useful, we have chosen to standardize only 64 KiB support as a first step.
If the hypervisor extension is also implemented, Svnapot is also supported in G-stage translation.
7.2 "Svpbmt" Extension for Page-Based Memory Types, Version 1.0
In Sv39, Sv48, and Sv57, bits 62-61 of a leaf page table entry indicate the use of page-based memory types that override the PMA(s) for the associated memory pages. The encoding for the PBMT bits is captured in pbmt.
The Svpbmt extension depends on the Sv39 extension.
| Mode | Value | Requested Memory Attributes |
|---|---|---|
| - | 3 | Reserved for future standard use |
Implementations may override additional PMAs not explicitly listed in pbmt. For example, to be consistent with the characteristics of a typical I/O region, a misaligned memory access to a page with PBMT=IO might raise an exception, even if the underlying region were main memory and the same access would have succeeded for PBMT=PMA.
Future extensions may provide more and/or finer-grained control over which PMAs can be overridden.
For non-leaf PTEs, bits 62-61 are reserved for future standard use. Until their use is defined by a standard extension, they must be cleared by software for forward compatibility, or else a page-fault exception is raised.
For leaf PTEs, setting bits 62-61 to the value 3 is reserved for future standard use. Until this value is defined by a standard extension, using this reserved value in a leaf PTE raises a page-fault exception.
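The reserved-encoding rules for bits 62-61 can be sketched as a small decoder. The sketch assumes the encodings 0 = PMA, 1 = NC, and 2 = IO, matching the PBMT names used in this section; decode_pbmt and PageFault are hypothetical names, and the model ignores every other PTE field.

```python
PBMT_SHIFT = 61
PBMT_NAMES = {0: "PMA", 1: "NC", 2: "IO"}  # assumed encodings; 3 is reserved


class PageFault(Exception):
    """Stands in for the page-fault exception (hypothetical name)."""


def decode_pbmt(pte, is_leaf):
    """Return the PBMT mode name of a leaf PTE, or None for a non-leaf PTE."""
    field = (pte >> PBMT_SHIFT) & 0b11
    if not is_leaf:
        # Bits 62-61 are reserved in non-leaf PTEs and must be zero.
        if field != 0:
            raise PageFault("bits 62-61 set in a non-leaf PTE")
        return None
    if field == 3:
        raise PageFault("reserved PBMT encoding in a leaf PTE")
    return PBMT_NAMES[field]
```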
When PBMT settings override a main memory page into I/O or vice versa, memory accesses to such pages obey the memory ordering rules of the final effective attribute, as follows.
If the underlying physical memory attribute for a page is I/O, and the page has PBMT=NC, then accesses to that page obey RVWMO. However, accesses to such pages are considered to be both I/O and main memory accesses for the purposes of FENCE, .aq, and .rl.
If the underlying physical memory attribute for a page is main memory, and the page has PBMT=IO, then accesses to that page obey strong channel 0 I/O ordering rules. However, accesses to such pages are considered to be both I/O and main memory accesses for the purposes of FENCE, .aq, and .rl.
A device driver written to rely on I/O strong ordering rules will not operate correctly if the address range is mapped with PBMT=NC. As such, this configuration is discouraged.
It will often still be useful to map physical I/O regions using PBMT=NC so that write combining and speculative accesses can be performed. Such optimizations will likely improve performance when applied with adequate care.
When Svpbmt is used with non-zero PBMT encodings, it is possible for multiple virtual aliases of the same physical page to exist simultaneously with different memory attributes. It is also possible for a U-mode or S-mode mapping through a PTE with Svpbmt enabled to observe different memory attributes for a given region of physical memory than a concurrent access to the same page performed by M-mode or when MODE=Bare. In such cases, the behaviors dictated by the attributes (including coherence, which is otherwise unaffected) may be violated.
Accessing the same location using different attributes that are both non-cacheable (e.g., NC and IO) does not cause loss of coherence, but might result in weaker memory ordering than the stricter attribute ordinarily guarantees. Executing a fence iorw, iorw instruction between such accesses suffices to prevent loss of memory ordering.
Accessing the same location using different cacheability attributes may cause loss of coherence. Executing the following sequence between such accesses prevents both loss of coherence and loss of memory ordering: fence iorw, iorw, followed by cbo.flush to an address of that location, followed by a fence iorw, iorw.
It follows that, if the same location might later be referenced using the original attributes, then this sequence must be repeated beforehand.
In certain cases, a weaker sequence might suffice to prevent loss of coherence. These situations will be detailed following the forthcoming formalization of the interaction of the RVWMO memory model with the instructions in the Zicbom extension.
When two-stage address translation is enabled within the H extension, the page-based memory types are also applied in two stages. First, if hgatp.MODE is not equal to zero, non-zero G-stage PTE PBMT bits override the attributes in the PMA to produce an intermediate set of attributes. Otherwise, the PMAs serve as the intermediate attributes. Second, if vsatp.MODE is not equal to zero, non-zero VS-stage PTE PBMT bits override the intermediate attributes to produce the final set of attributes used by accesses to the page in question. Otherwise, the intermediate attributes are used as the final set of attributes.
These final attributes apply to implicit and explicit accesses that are subject to both stages of address translation. For accesses that are not subject to the first stage of address translation, e.g. VS-stage page-table accesses, the intermediate attributes apply instead.
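The two-stage resolution described above can be sketched as follows. This is a simplified model: it treats the memory type as a single label, whereas PBMT overrides only particular PMA attributes, and it assumes the encodings 1 = NC and 2 = IO. The function name two_stage_attributes is hypothetical, and the reserved encoding 3 (which would fault) is not modeled.

```python
PBMT_OVERRIDE = {1: "NC", 2: "IO"}  # assumed non-zero encodings; 0 = no override


def two_stage_attributes(pma, g_pbmt, vs_pbmt, hgatp_mode, vsatp_mode):
    """Return (intermediate, final) memory-type labels for one page.

    pma is any label for the underlying PMA, e.g. "main" or "io".
    """
    # Stage 1 of the override: G-stage PBMT, only if G-stage translation is on.
    if hgatp_mode != 0 and g_pbmt != 0:
        intermediate = PBMT_OVERRIDE[g_pbmt]
    else:
        intermediate = pma
    # Stage 2 of the override: VS-stage PBMT, only if VS-stage translation is on.
    if vsatp_mode != 0 and vs_pbmt != 0:
        final = PBMT_OVERRIDE[vs_pbmt]
    else:
        final = intermediate
    return intermediate, final
```

In this model, a VS-stage page-table access would use the first element of the returned pair, while an ordinary access subject to both stages would use the second.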
7.3 "Svadu" Extension for Hardware Updating of A/D Bits, Version 1.0
The Svadu extension adds support and CSR controls for hardware updating of PTE A/D bits.
If the Svadu extension is implemented, the menvcfg.ADUE field is writable.
If the hypervisor extension is additionally implemented, the henvcfg.ADUE
field is also writable.
See sec:menvcfg and sec:henvcfg for the definitions of those fields.
translation defines the semantics of hardware updating of A/D bits. When hardware updating of A/D bits is disabled, the Svade extension, which mandates exceptions when A/D bits need be set, instead takes effect. The Svade extension is also defined in translation.
7.4 "Svinval" Extension for Fine-Grained Address-Translation Cache Invalidation, Version 1.0
The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation.
The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below.
The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures.
When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which:
- the values of rs1 and rs2 for the SFENCE.VMA are the same as those used in the SINVAL.VMA,
- reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and
- reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA.
If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.
SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions
and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and
HFENCE.GVMA, respectively.
In particular, an attempt to execute any of
these instructions in U-mode always raises an illegal-instruction
exception.
An attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when
mstatus.TVM=1 also raises an illegal-instruction exception.
An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to
execute SINVAL.VMA in VU-mode, raises a virtual-instruction exception.
When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises
a virtual-instruction exception.
Attempting to execute SFENCE.W.INVAL or SFENCE.INVAL.IR in U-mode
raises an illegal-instruction exception.
Doing so in VU-mode raises a virtual-instruction exception.
SFENCE.W.INVAL and SFENCE.INVAL.IR are unaffected by the mstatus.TVM and
hstatus.VTVM fields and hence are always permitted in S-mode and VS-mode.
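The permission rules above can be collected into one decision function. The sketch below models only the rules stated in this section, uses the mode labels "M", "HS" (S-mode), "U", "VS", and "VU" as an assumed convention, and assumes the hypervisor extension is present. The name svinval_trap is hypothetical.

```python
ILLEGAL = "illegal-instruction"
VIRTUAL = "virtual-instruction"

INVAL_OPS = {"SINVAL.VMA", "HINVAL.VVMA", "HINVAL.GVMA"}
ORDERING_OPS = {"SFENCE.W.INVAL", "SFENCE.INVAL.IR"}


def svinval_trap(insn, mode, tvm=False, vtvm=False):
    """Return the exception raised by insn in mode, or None if permitted.

    tvm models mstatus.TVM and vtvm models hstatus.VTVM.
    """
    if insn not in INVAL_OPS | ORDERING_OPS:
        raise ValueError("not an Svinval instruction: " + insn)
    if mode == "U":
        return ILLEGAL
    if insn in ORDERING_OPS:
        # Unaffected by TVM and VTVM; only VU-mode traps.
        return VIRTUAL if mode == "VU" else None
    if insn in ("HINVAL.VVMA", "HINVAL.GVMA") and mode in ("VS", "VU"):
        return VIRTUAL
    if insn == "SINVAL.VMA" and (mode == "VU" or (mode == "VS" and vtvm)):
        return VIRTUAL
    if mode == "HS" and tvm and insn in ("SINVAL.VMA", "HINVAL.GVMA"):
        return ILLEGAL
    return None
```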
SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be
trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have
ordering effects but no visible side effects. Trapping of the SINVAL.VMA
instruction is sufficient to enable emulation of the intended overall
TLB maintenance functionality.
In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction.
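The typical sequence can be sketched as a function that lists the operations in program order. Operations are modeled as tuples rather than assembly text, the 4 KiB stride is an assumption, and svinval_range_ops is a hypothetical name.

```python
def svinval_range_ops(start_va, num_pages, asid=None, page_size=4096):
    """List the Svinval operations that invalidate a range of pages.

    asid=None models an SINVAL.VMA that is not restricted to one ASID.
    """
    ops = [("SFENCE.W.INVAL",)]           # order prior stores before the invalidations
    for n in range(num_pages):
        ops.append(("SINVAL.VMA", start_va + n * page_size, asid))
    ops.append(("SFENCE.INVAL.IR",))      # order invalidations before later implicit reads
    return ops
```

Only the first and last operations carry ordering, so the invalidations in between are free to be pipelined by the implementation.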
High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed.
Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
7.5 "Svvptc" Extension for Obviating Memory-Management Instructions after Marking PTEs Valid, Version 1.0
When the Svvptc extension is implemented, explicit stores by a hart that update the Valid bit of leaf and/or non-leaf PTEs from 0 to 1 and are visible to a hart will eventually become visible within a bounded timeframe to subsequent implicit accesses by that hart to such PTEs.
Svvptc relieves an operating system from executing certain memory-management
instructions, such as SFENCE.VMA or SINVAL.VMA, which would normally be used
to synchronize the hart’s address-translation caches when a memory-resident PTE
is changed from Invalid to Valid. Synchronizing the hart’s address-translation
caches with other forms of updates to a memory-resident PTE, including when a
PTE is changed from Valid to Invalid, requires the use of suitable
memory-management instructions. Svvptc guarantees that a change to a PTE from
Invalid to Valid is made visible within a bounded time, thereby making the
execution of these memory-management instructions redundant. The performance
benefit of eliding these instructions outweighs the cost of an occasional
gratuitous additional page fault that may occur.
Depending on the microarchitecture, some possible ways to facilitate implementation of Svvptc include: not having any address-translation caches, not storing Invalid PTEs in the address-translation caches, automatically evicting Invalid PTEs using a bounded timer, or making address-translation caches coherent with store instructions that modify PTEs.
7.6 "Svrsw60t59b" Extension for PTE Reserved-for-Software Bits 60-59, Version 1.0
If the Svrsw60t59b extension is implemented, then bits 60-59 of the page table entries (PTEs) are reserved for use by supervisor software and are ignored by the implementation.
If the Hypervisor (H) extension is also implemented, then bits 60-59 of the G-stage PTEs are reserved for use by supervisor software and are ignored by the implementation.
The Svrsw60t59b extension depends on Sv39.
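As an illustration of how supervisor software might use these bits, the following sketch packs and unpacks a 2-bit value in PTE bits 60-59. The helper names are hypothetical, and the meaning of the stored value is entirely up to the OS.

```python
SW_SHIFT = 59
SW_MASK = 0b11 << SW_SHIFT  # PTE bits 60-59


def set_sw_bits(pte, value):
    """Return pte with bits 60-59 replaced by the 2-bit value."""
    if not 0 <= value <= 3:
        raise ValueError("value must fit in two bits")
    return (pte & ~SW_MASK) | (value << SW_SHIFT)


def get_sw_bits(pte):
    """Return the 2-bit value stored in bits 60-59 of pte."""
    return (pte >> SW_SHIFT) & 0b11
```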
Operating systems frequently use reserved bits within PTEs to store metadata for advanced memory management features. Embedding these metadata bits directly within the PTEs allows for fast access with minimal overhead, avoiding costly lookups in auxiliary data structures. By default, Sv39 and Sv39x4 require a page fault and a guest-page fault exception, respectively, to be raised if bits 60–59 are not zero.