6 Behavior

CTR records qualified control transfers. Control transfers are qualified if they meet the following criteria:

  • The current privilege mode is enabled
  • The transfer type is not inhibited
  • sctrstatus.FROZEN is not set
  • The transfer completes/retires
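
The four criteria above amount to a simple conjunction. The following minimal Python sketch illustrates it; the `Transfer` class and its field names are hypothetical and exist only for this example, not in the CTR specification.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical summary of one control transfer (illustration only)."""
    mode_enabled: bool    # current privilege mode is enabled for recording
    type_inhibited: bool  # transfer type is inhibited by a filter bit
    retired: bool         # the transfer completed/retired


def is_qualified(t: Transfer, frozen: bool) -> bool:
    """Return True only if all four qualification criteria hold."""
    return t.mode_enabled and not t.type_inhibited and not frozen and t.retired
```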

Such qualified transfers update the Entry Registers at logical entry 0. As a result, older entries are pushed down the stack; the record previously in logical entry 0 moves to logical entry 1, the record in logical entry 1 moves to logical entry 2, and so on. If the CTR buffer is full, the oldest recorded entry (previously at entry depth-1) is lost.

Recorded transfers will set the ctrsource.V bit to 1, and will update all implemented record fields.
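
The push-down behavior of the logical entries can be modeled in software with a bounded double-ended queue. This is only a sketch of the logical view: the `CtrBuffer` class and its dictionary fields are assumptions made for illustration, and it does not model the physical write pointer or the remaining record fields.

```python
from collections import deque


class CtrBuffer:
    """Toy model of the logical CTR entry stack (illustration only)."""

    def __init__(self, depth: int):
        # A deque with maxlen silently drops the item at the far end when full,
        # which mirrors losing the oldest record at logical entry depth-1.
        self.entries = deque(maxlen=depth)

    def record(self, source_pc: int, target_pc: int) -> None:
        # The new record becomes logical entry 0; older records shift down by one.
        self.entries.appendleft({"V": 1, "source": source_pc, "target": target_pc})

    def logical(self, index: int) -> dict:
        return self.entries[index]
```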

note

In order to collect accurate and representative performance profiles while using CTR, it is recommended that hardware recording of control transfers incurs no added performance overhead, e.g., in the form of retirement or instruction execution restrictions that are not present when CTR is not active.

6.1 Privilege Mode Transitions

Transfers that change the privilege mode are a special case. What is recorded, if anything, depends on whether the source privilege mode and/or target privilege mode are enabled for recording, and on the transfer type (trap or trap return).

Traps between enabled privilege modes are recorded as normal. Traps from a disabled privilege mode to an enabled privilege mode are partially recorded, such that the ctrsource.PC is 0. Traps from an enabled mode to a disabled mode, known as external traps, are not recorded by default. See Section 6.1.2 for how they can be recorded.

Trap returns have similar treatment. Trap returns between enabled privilege modes are recorded as normal. Trap returns from an enabled mode back to a disabled mode are partially recorded, such that ctrtarget.PC is 0. Trap returns from a disabled mode to an enabled mode are not recorded.

note

If privileged software is configuring CTR on behalf of less privileged software, it should ensure that its privilege mode enable bit (e.g., sctrctl.S for Supervisor software) is cleared before a trap return to the less privileged mode. Otherwise the trap return will be recorded, leaking the privileged source PC.

Recording in Debug Mode is always inhibited. Transfers into and out of Debug Mode are never recorded.
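
The trap and trap return rules described in this section can be summarized as a small decision function. The sketch below is illustrative only: the function name, the string labels it returns, and the `external_trap_enabled` parameter are assumptions. The external trap case (target PC of 0 when enabled) follows Section 6.1.2, and FROZEN, transfer type inhibits, and Debug Mode are not modeled.

```python
def trap_record_outcome(kind, source_enabled, target_enabled,
                        external_trap_enabled=False):
    """Classify how a privilege mode transition is recorded.

    kind is "trap" or "trap_return". The result is "full",
    "source_pc_zero", "target_pc_zero", or None (not recorded).
    """
    if kind == "trap":
        if source_enabled and target_enabled:
            return "full"
        if target_enabled:   # disabled -> enabled: partial record
            return "source_pc_zero"
        if source_enabled:   # enabled -> disabled: external trap
            return "target_pc_zero" if external_trap_enabled else None
        return None          # disabled -> disabled
    if kind == "trap_return":
        if not source_enabled:  # returns from a disabled mode are never recorded
            return None
        return "full" if target_enabled else "target_pc_zero"
    raise ValueError(f"unknown transfer kind: {kind!r}")
```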

The table below provides details on recording of privilege mode transitions. Standard dependencies on FROZEN and transfer type inhibits also apply, but are not covered by the table.

Table 7. Trap and Trap Return Recording

| Transfer Type | Source Mode | Target Mode Enabled | Target Mode Disabled |
| --- | --- | --- | --- |
| Trap | Enabled | Recorded. | External trap. Not recorded by default, but see Section 6.1.2. |
| Trap | Disabled | Recorded, ctrsource.PC is 0. | Not recorded. |
| Trap Return | Enabled | Recorded. | Recorded, ctrtarget.PC is 0. |
| Trap Return | Disabled | Not recorded. | Not recorded. |

6.1.1 Virtualization Mode Transitions

Transitions between VS/VU-mode and M/HS-mode are unique in that they effect a change in the active CTR control register, and hence the CTR configuration. What is recorded, if anything, on these virtualization mode transitions depends upon fields from both [ms]ctrctl and vsctrctl.

  • mctrctl.M, sctrctl.S, and vsctrctl.{S,U} are used to determine whether the source and target modes are enabled;
  • mctrctl.MTE, sctrctl.STE, and vsctrctl.STE are used to determine whether an external trap is recorded (see Section 6.1.2);
  • sctrctl.LCOFIFRZ and sctrctl.BPFRZ determine whether CTR becomes frozen (see Section 6.5);
  • For all other _x_ctrctl fields, the value in vsctrctl is used.

note

Consider an exception that traps from VU-mode to HS-mode, with vsctrctl.U=1 and sctrctl.S=1. Because both the source mode and target mode are enabled for recording, whether the trap is recorded then depends on the CTR configuration (e.g., the transfer type filter bits) in vsctrctl, not in sctrctl.
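
The VU-mode to HS-mode example above can be written out as a short sketch. The dictionaries standing in for sctrctl and vsctrctl, and the function name, are hypothetical. Only the case where both modes are enabled is covered, EXCINH is used as the example transfer type filter bit, and FROZEN is ignored.

```python
def vu_to_hs_exception_recorded(sctrctl: dict, vsctrctl: dict) -> bool:
    """Decide whether an exception trapping from VU-mode to HS-mode is recorded.

    Covers only the case where both the source and target modes are enabled.
    """
    source_enabled = bool(vsctrctl["U"])  # VU-mode enable comes from vsctrctl.U
    target_enabled = bool(sctrctl["S"])   # HS-mode enable comes from sctrctl.S
    if not (source_enabled and target_enabled):
        raise ValueError("sketch only covers source and target both enabled")
    # All other fields, including the exception inhibit bit, come from vsctrctl.
    return not vsctrctl["EXCINH"]
```

Note that the value of EXCINH in the hypothetical `sctrctl` dictionary has no effect on the result, which is the point of the example.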

6.1.2 External Traps

External traps are traps from a privilege mode enabled for CTR recording to a privilege mode that is not enabled for CTR recording. By default external traps are not recorded, but privileged software running in the target mode of the trap can opt in to allowing CTR to record external traps into that mode. The _x_ctrctl._x_TE bits allow M-mode, S-mode, and VS-mode to opt in separately.

External trap recording depends not only on the target mode, but on any intervening modes, which are modes that are more privileged than the source mode but less privileged than the target mode. Not only must the external trap enable bit for the target mode be set, but the external trap enable bit(s) for any intervening modes must also be set. See the table below for details.

note

Requiring intervening modes to be enabled for external traps simplifies software management of CTR. Consider a scenario where S-mode software is configuring CTR for U-mode contexts A and B, such that external traps (to any mode) are enabled for A but not for B. When switching between the two contexts, S-mode can simply toggle sctrctl.STE, rather than requiring a trap to M-mode to additionally toggle mctrctl.MTE.

This method does not provide the flexibility to record external traps to a more privileged mode but not to all intervening mode(s). Because it is expected that profiling tools generally wish to observe all external traps or none, this is not considered a meaningful limitation.

Table 8. External Trap Enable Requirements

| Source Mode | Target Mode | External Trap Enable(s) Required |
| --- | --- | --- |
| U-mode | S-mode | sctrctl.STE |
| U-mode | M-mode | mctrctl.MTE, sctrctl.STE |
| S-mode | M-mode | mctrctl.MTE |
| VU-mode | VS-mode | vsctrctl.STE |
| VU-mode | HS-mode | sctrctl.STE, vsctrctl.STE |
| VU-mode | M-mode | mctrctl.MTE, sctrctl.STE, vsctrctl.STE |
| VS-mode | HS-mode | sctrctl.STE |
| VS-mode | M-mode | mctrctl.MTE, sctrctl.STE |

In records for external traps, the ctrtarget.PC is 0.
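
The intervening-mode rule behind Table 8 can be expressed as "every mode above the source, up to and including the target, must have its external trap enable bit set". The sketch below derives the required bits that way. The privilege chains, the string mode names, and both function names are assumptions made for illustration.

```python
# Privilege ordering, lowest first, for non-virtualized and virtualized sources.
CHAINS = (["U", "S", "M"], ["VU", "VS", "HS", "M"])

# External trap enable bit that each target or intervening mode controls.
TE_BIT = {
    "S": "sctrctl.STE",
    "HS": "sctrctl.STE",
    "VS": "vsctrctl.STE",
    "M": "mctrctl.MTE",
}


def required_te_bits(source: str, target: str) -> set:
    """Return the set of enable bits needed to record an external trap."""
    for chain in CHAINS:
        if source in chain and target in chain:
            lo, hi = chain.index(source), chain.index(target)
            if lo < hi:
                # Intervening modes plus the target mode itself.
                return {TE_BIT[mode] for mode in chain[lo + 1:hi + 1]}
    raise ValueError(f"no trap path from {source} to {target}")


def external_trap_recorded(source: str, target: str, bits_set) -> bool:
    """True if every required enable bit is present in bits_set."""
    return required_te_bits(source, target) <= set(bits_set)
```

For instance, `required_te_bits("VU", "M")` yields all three enable bits, matching the VU-mode to M-mode row of the table.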

note

No mechanism exists for recording external trap returns, because the external trap record includes all relevant information, and gives the trap handler (e.g., an emulator) the opportunity to modify the record.

note

External trap recording does not depend on EXCINH/INTRINH. Thus, when external traps are enabled, both external interrupts and external exceptions are recorded.

STE allows recording of traps from U-mode to S-mode as well as from VS/VU-mode to HS-mode. The hypervisor can flip sctrctl.STE before entering a guest if it wants different behavior for U-to-S vs VS/VU-to-HS.

If external trap recording is implemented, mctrctl.MTE and sctrctl.STE must be implemented, while vsctrctl.STE must be implemented if the H extension is implemented.

6.2 Transfer Type Filtering

Default CTR behavior, when all transfer type filter bits (_x_ctrctl[47:32]) are unimplemented or 0, is to record all control transfers within enabled privilege modes. By setting transfer type filter bits, software can opt out of recording select transfer types, or opt into recording non-default operations. All transfer type filter bits are optional.

note

Because not-taken branches are not recorded by default, the polarity of the associated enable bit (NTBREN) is the opposite of other bits associated with transfer type filtering (TKBRINH, RETINH, etc). Non-default operations require opt-in rather than opt-out.

The transfer type filter bits leverage the type definitions specified in the RISC-V Efficient Trace Spec v2.0 (Table 4.4 and Section 4.1.1). For completeness, the definitions are reproduced below.

note

Here "indirect" is used interchangeably with "uninferrable", which is used in the trace spec. Both imply that the target of the jump is not encoded in the opcode.

Table 9. Control Transfer Type Definitions

| Encoding | Transfer Type Name |
| --- | --- |
| 0 | Not used by CTR |
| 1 | Exception |
| 2 | Interrupt |
| 3 | Trap return |
| 4 | Not-taken branch |
| 5 | Taken branch |
| 6 | reserved |
| 7 | reserved |
| 8 | Indirect call |
| 9 | Direct call |
| 10 | Indirect jump (without linkage) |
| 11 | Direct jump (without linkage) |
| 12 | Co-routine swap |
| 13 | Function return |
| 14 | Other indirect jump (with linkage) |
| 15 | Other direct jump (with linkage) |

Encodings 8 through 15 refer to various encodings of jump instructions. The types are distinguished as described below.

Table 10. Control Transfer Type Definitions

| Transfer Type Name | Associated Opcodes |
| --- | --- |
| Indirect call | JALR x1, rs where rs != x5 |
| Indirect call | JALR x5, rs where rs != x1 |
| Indirect call | C.JALR rs1 where rs1 != x5 |
| Direct call | JAL x1 |
| Direct call | JAL x5 |
| Direct call | C.JAL |
| Direct call | CM.JALT index |
| Indirect jump (without linkage) | JALR x0, rs where rs != (x1 or x5) |
| Indirect jump (without linkage) | C.JR rs1 where rs1 != (x1 or x5) |
| Direct jump (without linkage) | JAL x0 |
| Direct jump (without linkage) | C.J |
| Direct jump (without linkage) | CM.JT index |
| Co-routine swap | JALR x1, x5 |
| Co-routine swap | JALR x5, x1 |
| Co-routine swap | C.JALR x5 |
| Function return | JALR rd, rs where rs == (x1 or x5) and rd != (x1 or x5) |
| Function return | C.JR rs1 where rs1 == (x1 or x5) |
| Function return | CM.POPRET(Z) |
| Other indirect jump (with linkage) | JALR rd, rs where rs != (x1 or x5) and rd != (x0, x1, or x5) |
| Other direct jump (with linkage) | JAL rd where rd != (x0, x1, or x5) |

note

If implementation of any transfer type filter bit results in reduced software performance, perhaps due to additional retirement restrictions, it is strongly recommended that this reduced performance apply only when the bit is set. Alternatively, support for the bit may be omitted. Maintaining software performance for the default CTR configuration, when all transfer type bits are cleared, is recommended.
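
The register-based rules for the uncompressed JAL and JALR forms in Table 10 can be captured in a few lines. The sketch below is illustrative: the function names are hypothetical, registers are passed as plain integers (0 for x0, 1 for x1, 5 for x5), and the compressed (C.*) and Zcmt (CM.*) opcodes from the table are not handled.

```python
LINK_REGS = {1, 5}  # x1 and x5 act as link registers

# Transfer type encodings for jumps, as listed in Table 9.
ENCODING = {
    "Indirect call": 8,
    "Direct call": 9,
    "Indirect jump (without linkage)": 10,
    "Direct jump (without linkage)": 11,
    "Co-routine swap": 12,
    "Function return": 13,
    "Other indirect jump (with linkage)": 14,
    "Other direct jump (with linkage)": 15,
}


def classify_jalr(rd: int, rs: int) -> str:
    """Classify JALR rd, rs by its destination and source registers."""
    if rd in LINK_REGS:
        # Linking through one link register while jumping through the other.
        if rs in LINK_REGS and rs != rd:
            return "Co-routine swap"
        return "Indirect call"
    if rs in LINK_REGS:
        return "Function return"
    if rd == 0:
        return "Indirect jump (without linkage)"
    return "Other indirect jump (with linkage)"


def classify_jal(rd: int) -> str:
    """Classify JAL rd by its destination register."""
    if rd in LINK_REGS:
        return "Direct call"
    if rd == 0:
        return "Direct jump (without linkage)"
    return "Other direct jump (with linkage)"
```

For example, `ENCODING[classify_jalr(0, 1)]` gives 13, the function return encoding, for the instruction `JALR x0, x1`.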

6.3 Cycle Counting

The ctrdata register may optionally include a count of CPU cycles elapsed since the prior CTR record. The elapsed cycle count value is represented by the CC field, which has a 12-bit mantissa component (Cycle Count Mantissa, or CCM) and a 4-bit exponent component (Cycle Count Exponent, or CCE).

The elapsed cycle counter (CtrCycleCounter) increments at the same rate as the mcycle counter. Only cycles while CTR is active are counted, where active implies that the current privilege mode is enabled for recording and CTR is not frozen. The CC field is encoded such that CCE holds 0 if the CtrCycleCounter value is less than 4096, otherwise it holds the index of the most significant one bit in the CtrCycleCounter value, minus 11. When CCE is nonzero, CCM holds CtrCycleCounter bits CCE+10:CCE-1; when CCE is 0, CCM holds bits 11:0.

The elapsed cycle count can then be calculated by software using the following formula:

    if (CCE == 0):
        return CCM
    else:
        return (2^12 + CCM) << (CCE - 1)
    endif
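
As a runnable counterpart to the formula above, the following Python sketch decodes a CC value and also shows the matching encode step described earlier in this section. The function names are hypothetical, and the encoder assumes all 4 CCE bits are implemented and ignores saturation.

```python
def decode_cc(cce: int, ccm: int) -> int:
    """Compute the elapsed cycle count from the CCE and CCM fields."""
    if cce == 0:
        return ccm
    # Restore the implied leading one bit above the 12-bit mantissa, then scale.
    return ((1 << 12) + ccm) << (cce - 1)


def encode_cc(counter: int):
    """Return (CCE, CCM) for a CtrCycleCounter value (no saturation handling)."""
    if counter < 4096:
        return 0, counter
    cce = (counter.bit_length() - 1) - 11  # index of the top set bit, minus 11
    ccm = (counter >> (cce - 1)) & 0xFFF   # counter bits CCE+10 : CCE-1
    return cce, ccm
```

A counter value of 8193 encodes as CCE=2, CCM=0 and decodes back to 8192, which shows the loss of the low-order bit once CCE exceeds 1. Decoding an all-ones mantissa with CCE values of 1, 3, 7, and 15 gives 8191, 32764, 524224, and 134201344, the maxima listed in Table 11.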

The CtrCycleCounter is reset on writes to _x_ctrctl, and on execution of SCTRCLR, to ensure that any accumulated cycle counts do not persist across a context switch.

An implementation that supports cycle counting must implement CCV and all CCM bits, but may implement 0..4 exponent bits in CCE. Unimplemented CCE bits are read-only 0. For implementations that support transfer type filtering, it is recommended to implement at least 3 exponent bits. This allows capturing the full latency of most functions, when recording only calls and returns.

The size of the CtrCycleCounter required to support each CCE width is given in the table below.

Table 11. Cycle Counter Size Options

| CCE bits | CtrCycleCounter bits | Max elapsed cycle value |
| --- | --- | --- |
| 0 | 12 | 4095 |
| 1 | 13 | 8191 |
| 2 | 15 | 32764 |
| 3 | 19 | 524224 |
| 4 | 27 | 134201344 |

note

When CCE>1, the granularity of the reported cycle count is reduced. For example, when CCE=3, the bottom 2 bits of the cycle counter are not reported, and thus the reported value increments only every 4 cycles. As a result, the reported value represents an undercount of elapsed cycles for most cases (when the unreported bits are non-zero). On average, the undercount will be (2^(CCE-1) - 1)/2. Software can reduce the average undercount to 0 by adding (2^(CCE-1) - 1)/2 to each computed cycle count value when CCE>1.

Though this compressed method of representation results in some imprecision for larger cycle count values, it produces meaningful area savings, reducing storage per entry from 27 bits to 16.
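
The undercount correction described in the note above can be applied by software when decoding. The sketch below is a minimal illustration with a hypothetical function name. It returns a float because the correction can be a half-cycle value.

```python
def compensated_cycles(cce: int, ccm: int) -> float:
    """Decode CC and add the average undercount for the dropped low bits."""
    if cce == 0:
        return float(ccm)
    raw = ((1 << 12) + ccm) << (cce - 1)
    # (2^(CCE-1) - 1) / 2 is the mean of the unreported low bits; it is 0 for CCE=1.
    correction = ((1 << (cce - 1)) - 1) / 2
    return raw + correction
```

With CCE=3 and CCM=0 the raw value is 16384, and true counts from 16384 through 16387 all report that value, so adding 1.5 centers the estimate.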

The CC value saturates when all implemented bits in CCM and CCE are 1.

The CC value is only valid when the Cycle Count Valid (CCV) bit is set. If CCV=0, the CC value might not hold the correct count of elapsed active cycles since the last recorded transfer. The next record will have CCV=0 after a write to _x_ctrctl, or execution of SCTRCLR, since CtrCycleCounter is reset. CCV should additionally be cleared after any other implementation-specific scenarios where active cycles might not be counted in CtrCycleCounter.

6.4 RAS (Return Address Stack) Emulation Mode

When the optional _x_ctrctl.RASEMU bit is implemented and set to 1, transfer recording behavior is altered to emulate the behavior of a return-address stack (RAS).

  • Indirect and direct calls are recorded as normal.
  • Function returns pop the most recent call, by decrementing the WRPTR then invalidating the WRPTR entry (by setting ctrsource.V=0). As a result, logical entry 0 is invalidated and moves to logical entry depth-1, while logical entries 1..depth-1 move to 0..depth-2.
  • Co-routine swaps act as both a return and a call: logical entry 0 is overwritten, and WRPTR is not modified.
  • Other transfer types are inhibited.
  • Transfer type filtering bits (_x_ctrctl[47:32]) and external trap enable bits (_x_ctrctl._x_TE) are ignored.

note

Profiling tools often collect call stacks along with each sample. Stack walking, however, is a complex and often slow process that may require recompilation (e.g., -fno-omit-frame-pointer) to work reliably. With RAS emulation, tools can ask CTR hardware to save call stacks even for unmodified code.

CTR RAS emulation has limitations. The CTR buffer will contain only partial stacks in cases where the call stack depth was greater than the CTR depth, CTR recording was enabled at a lower point in the call stack than main(), or where the CTR buffer was cleared since main().

The CTR stack may be corrupted in cases where calls and returns are not symmetric, such as with stack unwinding (e.g., setjmp/longjmp, C++ exceptions), where stale call entries may be left on the CTR stack, or user stack switching, where calls from multiple stacks may be intermixed.
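
The RAS emulation rules in this section can be modeled with a short software sketch. The `RasEmuCtr` class and the string names for transfer kinds are assumptions for illustration. The model tracks only valid logical entries (index 0 is logical entry 0), so it does not represent the physical WRPTR or invalidated slots.

```python
class RasEmuCtr:
    """Toy model of the logical CTR stack when RASEMU=1 (illustration only)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.stack = []  # index 0 is logical entry 0; only valid entries are kept

    def on_transfer(self, kind: str, source_pc: int, target_pc: int) -> None:
        entry = (source_pc, target_pc)
        if kind in ("indirect_call", "direct_call"):
            self.stack.insert(0, entry)   # recorded as normal
            del self.stack[self.depth:]   # the oldest entry is lost when full
        elif kind == "function_return":
            if self.stack:
                self.stack.pop(0)         # pop the most recent call
        elif kind == "coroutine_swap":
            if self.stack:
                self.stack[0] = entry     # overwrite logical entry 0
            else:
                self.stack.append(entry)
        # Any other transfer type is inhibited and leaves the stack unchanged.
```

After two calls and one return, the model holds a single entry for the outer call, which is what a profiler reading the buffer at that point would see as the current call stack.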

note

As described in Section 6.3, when CCV=1, the CC field provides the elapsed cycles since the prior CTR entry was recorded. This introduces implementation challenges when RASEMU=1 because, for each recorded call, there may have been several recorded calls (and returns which “popped” them) since the prior remaining call entry was recorded (see Section 6.4). The implication is that a return that pops a call entry does not reset the cycle counter, but instead adds the CC field from the popped entry to the counter. For simplicity, an implementation may opt to record CCV=0 for all calls, or for those whose parent call was popped, when RASEMU=1.
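
The cycle accounting described in the note above can be illustrated with a small model. The `RasCycleModel` class is hypothetical. It stores uncompressed cycle counts rather than the CCE/CCM encoding, and it ignores saturation, CCV, and buffer depth.

```python
class RasCycleModel:
    """Toy model of CC bookkeeping for calls and returns when RASEMU=1."""

    def __init__(self):
        self.counter = 0  # stands in for CtrCycleCounter
        self.cc = []      # CC value of each live call entry, most recent last

    def tick(self, cycles: int) -> None:
        self.counter += cycles

    def call(self) -> None:
        # Recording a call stores the elapsed count and restarts the counter.
        self.cc.append(self.counter)
        self.counter = 0

    def ret(self) -> None:
        # Popping a call does not reset the counter; the popped CC is added back.
        if self.cc:
            self.counter += self.cc.pop()
```

In a sequence of call A, 10 cycles, call B, 5 cycles, return, 3 cycles, call C, the entry for C ends up with a count of 18, the number of cycles since A (the prior remaining call entry) was recorded.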

6.5 Freeze

When sctrstatus.FROZEN=1, transfer recording is inhibited. This bit can be set by hardware, as described below, or by software.

When sctrctl.LCOFIFRZ=1 and a local counter overflow interrupt (LCOFI) traps (as a result of an HPM counter overflow) to M-mode or to S-mode, sctrstatus.FROZEN is set by hardware. This inhibits CTR recording until software clears FROZEN. The LCOFI trap itself is not recorded.

note

Freeze on LCOFI ensures that the execution path leading to the sampled instruction (xepc) is preserved, and that the local counter overflow interrupt (LCOFI) and associated Interrupt Service Routine (ISR) do not displace any recorded transfer history state. It is the responsibility of the ISR to clear FROZEN before xRET, if continued control transfer recording is desired.

LCOFI refers only to architectural traps directly caused by a local counter overflow. If a local counter overflow interrupt is recognized without a trap, FROZEN is not automatically set. For instance, no freeze occurs if the LCOFI is pended while interrupts are masked, and software recognizes the LCOFI (perhaps by reading stopi or sip) and clears sip.LCOFIP before the trap is raised. As a result, some or all CTR history may be overwritten while handling the LCOFI. Such cases are expected to be very rare; for most usages (e.g., application profiling) privilege mode filtering is sufficient to ensure that CTR updates are inhibited while interrupts are handled in a more privileged mode.

Similarly, on a breakpoint exception that traps to M-mode or S-mode with sctrctl.BPFRZ=1, FROZEN is set by hardware. The breakpoint exception itself is not recorded.

note

Breakpoint exception refers to synchronous exceptions with a cause value of Breakpoint (3), regardless of source (ebreak, c.ebreak, Sdtrig); it does not include entry into Debug Mode, even in cores where this is implemented as an exception.

If the H extension is implemented, freeze behavior for LCOFIs and breakpoint exceptions that trap to VS-mode is determined by the LCOFIFRZ and BPFRZ values, respectively, in vsctrctl. This includes virtual LCOFIs pended by a hypervisor.
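
The freeze conditions above reduce to selecting the right control register for the trap's target mode and checking one bit. The sketch below is illustrative only: the function name, the string values for causes and modes, and the dictionaries standing in for sctrctl and vsctrctl are assumptions.

```python
FREEZE_BIT = {"lcofi": "LCOFIFRZ", "breakpoint": "BPFRZ"}


def sets_frozen(cause: str, target_mode: str, sctrctl: dict, vsctrctl: dict) -> bool:
    """Return True if hardware sets FROZEN for a trap with this cause and target."""
    bit = FREEZE_BIT.get(cause)
    if bit is None:
        return False                 # other trap causes never freeze CTR
    if target_mode == "VS":
        ctl = vsctrctl               # traps to VS-mode consult vsctrctl
    elif target_mode in ("M", "S"):
        ctl = sctrctl                # traps to M-mode or S-mode consult sctrctl
    else:
        raise ValueError(f"unexpected trap target mode: {target_mode!r}")
    return bool(ctl[bit])
```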

note

When a guest uses the SBI Supervisor Software Events (SSE) extension, the LCOFI will trap to HS-mode, which will then invoke a registered VS-mode LCOFI handler routine. If vsctrctl.LCOFIFRZ=1, the HS-mode handler will need to emulate the freeze by setting sctrstatus.FROZEN=1 before invoking the registered handler routine.