2025-03-13 Ordinary Meeting Notes

Mar 13, 2025 | RV-LFX Performance Event Sampling TG

Attendees: Beeman Strong

Notes

Attendees: Beeman, Bruce, Atish, Hasan, Chun, Qin, Daniel, Snehasish
Slides, video
Agenda
- Accelerated sampling after clear
- Sample collection/output
- New perf events
Begin with recap of TG’s extensions
- Event-based sampling
  - Targeted but late binding
  - Good for sampling rare events, but hard to get precise state
  - Ssplcofi, Sspesa serve this usage
- Instruction-based sampling
  - Early binding but noisy (impact of speculation)
  - Easy to get precise state, but not so good for sampling rare events
  - Instruction Sampling serves this usage
Accelerated sampling after clear
- When an instruction sample is dropped, should we wait for the next overflow or accelerate?
- Do we need randomness to avoid beat patterns?
  - Software can initialize the counter to overflow after a prime number of events, which makes beat patterns very unlikely
  - Randomness introduces more noise to the sample period
    - Does it matter?
    - SW wants to know the sample period, so the random number would have to be exposed
    - Could do so if random value is added after overflow, then counter holds the number of additional counts
    - PIPES adds it when initializing counter, so no way for SW to know. Does it matter?
    - Google compares counts across systems, so want precise count
    - If the LFSR init value is known, SW could compute it
      - LFSR means might add or subtract from the initial counter value
    - Only really need precision at the macro level; dropped samples are a problem because the count ends up off by 2x
- Could include in the record a count of dropped samples since the previous sample, so SW can compute the total number of events
- If accelerating, then would have to snapshot the counter value at time of accelerated selection to preserve event count
  - Not just a multiple of the sample period anymore
- Acceleration could be a mode, let SW choose
  - Yes, that’s an option. Want to be careful about SW burden with too much optionality, but could be reasonable.
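
The event-count bookkeeping discussed above can be sketched roughly as follows. This is only an illustration under assumptions: the record layout, the field names (`dropped_since_prev`, `snapshot`), and the idea that an accelerated sample carries the event count since the previous record are all hypothetical, not something the TG has defined.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SampleRecord:
    """Hypothetical sample record; fields are illustrative, not from any spec."""
    # Samples dropped since the previous record (assumed to be reported by HW).
    dropped_since_prev: int = 0
    # Assumed: for an accelerated sample, the number of events counted since
    # the previous record, snapshotted at selection time. None means the
    # sample was taken on a normal overflow.
    snapshot: Optional[int] = None


def total_events(records: Iterable[SampleRecord], period: int) -> int:
    """Estimate the number of events covered by a sequence of records."""
    total = 0
    for rec in records:
        if rec.snapshot is not None:
            # Accelerated selection: the interval is no longer a multiple of
            # the sample period, so the snapshot is the only accurate source.
            total += rec.snapshot
        else:
            # Normal overflow: each dropped sample spans one full period, and
            # so does the sample that was finally recorded.
            total += (rec.dropped_since_prev + 1) * period
    return total


def naive_total(records: Iterable[SampleRecord], period: int) -> int:
    """What SW would compute if dropped samples were not reported."""
    return sum(period for _ in records)
```

With a prime period of 1009 and one record that follows a single dropped sample, `naive_total` attributes 1009 events to that record while `total_events` attributes 2018, which is the 2x error mentioned in the discussion.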
Might want an option to count cycles, to see where time is being spent
- Use event sampling for that. Counting cycles with instr sampling would just tell you which instructions are decoded most often
What if config gets no samples?
- That’s as expected, only care about the insts that are sampled enough
- Do you care about coverage?
- Google samples briefly on rotating set of systems in the fleet, only interested in code that accumulates samples
- Bytedance does similar continuous sampling to Google
Sample collection
- Propose using scountovf[1] to indicate LCOFI for instr sampling
- Prefer not to count/sample instructions in LCOFI handler, could freeze counter when OF=1? Or leave it to SW to inhibit it?
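
As a rough model of how an LCOFI handler might tell an instruction-sampling overflow apart from ordinary counter overflows, the sketch below decodes an scountovf value that has already been read. It assumes the bit 1 assignment proposed in these notes, which is not a ratified definition, and the function names are hypothetical.

```python
# Assumption: bit position proposed in these notes for instruction sampling.
INSTR_SAMPLING_OVF_BIT = 1


def instr_sampling_lcofi_pending(scountovf: int) -> bool:
    """Return True if the (proposed) instruction-sampling overflow bit is set."""
    return bool((scountovf >> INSTR_SAMPLING_OVF_BIT) & 1)


def overflowed_bits(scountovf: int) -> list:
    """List every set bit position in a 32-bit scountovf value."""
    return [i for i in range(32) if (scountovf >> i) & 1]
```

A handler following this model would check `instr_sampling_lcofi_pending` first and then service any remaining bits as regular counter overflows; whether the counter freezes while the bit is set or SW inhibits it is the open question noted above.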
Out of time, continue here next time
- We’re close to the end of the high level arch discussion
- Do we want to include counters from say the memory controller in this?
  - Don’t have any standard for non-hart PMUs yet, that’s our next top priority gap
  - Should we define that first, before this, in case it will impact this?
  - Don’t see how it would impact this, since this is a hart capability. But send email since we’re out of time

Action items