2025-03-13 Ordinary Meeting Notes

Mar 13, 2025 | RV-LFX Performance Event Sampling TG

Attendees: Beeman Strong

 

Notes

  • Attendees: Beeman, Bruce, Atish, Hasan, Chun, Qin, Daniel, Snehasish

  • Slides, video

  • Agenda

    • Accelerated sampling after clear

    • Sample collection/output

    • New perf events

  • Begin with recap of TG’s extensions

    • Event-based sampling

      • Targeted but late binding

      • Good for sampling rare events, but hard to get precise state

      • Ssplcofi, Sspesa serve this usage

    • Instruction-based sampling

      • Early binding but noisy (impact of speculation)

      • Easy to get precise state, but not so good for sampling rare events

      • Instruction Sampling serves this usage

  • Accelerated sampling after clear

    • When an instruction sample is dropped, should we wait for the next overflow or accelerate?

    • Do we need randomness to avoid beat patterns?

      • Software can init the counter to overflow after a prime number of events, which makes beat patterns very unlikely

      • Randomness introduces more noise to sample period

        • Does it matter?

        • SW wants to know sample period, random number would have to be exposed

        • Could do so if random value is added after overflow, then counter holds the number of additional counts

        • PIPES adds it when initializing counter, so no way for SW to know.  Does it matter?

        • Google compares counts across systems, so want precise count

        • If SW knows the LFSR init value, it could compute the random value

          • LFSR means might add or subtract from the initial counter value

        • Only really need precision at the macro level; dropped samples are a problem because the count is then off by 2x

    • Could include in the record a count of samples dropped since the previous sample, so SW can compute the total number of events

    • If accelerating, then would have to snapshot the counter value at time of accelerated selection to preserve event count

      • Not just a multiple of the sample period anymore

    • Acceleration could be a mode, let SW choose

      • Yes, that’s an option.  Want to be careful about SW burden with too much optionality, but could be reasonable.
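
The two accounting questions above (beat patterns and preserving the event count across dropped or accelerated samples) can be illustrated with a small sketch. This is a toy model, not anything the TG has specified: it assumes a perfectly periodic loop with one event per loop slot, and the record fields `dropped` and `events` are hypothetical names standing in for a dropped-sample count and a snapshotted counter value.

```python
from math import gcd


def sampled_positions(loop_len, period, n_samples=10_000):
    """Distinct loop offsets hit when sampling every `period` events.

    Toy model: the workload is a loop of `loop_len` events, one per slot.
    """
    return {(k * period) % loop_len for k in range(1, n_samples + 1)}


def total_events_from_drops(period, records):
    """No acceleration: a dropped sample waits for the next overflow.

    Each record carries `dropped` (hypothetical field), the number of
    samples dropped since the previous record. Every sample, delivered or
    dropped, accounts for one full period.
    """
    return sum((r["dropped"] + 1) * period for r in records)


def total_events_from_snapshots(records):
    """With acceleration: intervals are no longer multiples of the period.

    Each record carries `events` (hypothetical field), a snapshot of the
    events counted since the previous record.
    """
    return sum(r["events"] for r in records)


if __name__ == "__main__":
    loop_len = 1000
    # A period sharing a factor with the loop length only ever lands on
    # loop_len / gcd(loop_len, period) distinct offsets (a beat pattern).
    print(len(sampled_positions(loop_len, 100)), loop_len // gcd(loop_len, 100))
    # A prime period that does not divide the loop length reaches every offset.
    print(len(sampled_positions(loop_len, 101)))

    records = [{"dropped": 0}, {"dropped": 1}, {"dropped": 0}]
    print(total_events_from_drops(101, records))   # accounts for the dropped sample
    print(len(records) * 101)                      # naive count misses one period
```

In this model a record that follows one dropped sample covers two periods, which is the 2x error noted above if the drop goes unreported. With acceleration, a per-record snapshot replaces the fixed period in the sum.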

  • Might want an option to count cycles, to see where time is being spent

    • Use event sampling for that.  Counting cycles with instr sampling would just tell you which instructions are decoded most often

  • What if config gets no samples?

    • That’s as expected, only care about the insts that are sampled enough

    • Do you care about coverage?

    • Google samples briefly on rotating set of systems in the fleet, only interested in code that accumulates samples

    • Bytedance does similar continuous sampling to Google

  • Sample collection

    • Propose using scountovf[1] to indicate LCOFI for instr sampling

    • Prefer not to count/sample instructions in LCOFI handler, could freeze counter when OF=1?  Or leave it to SW to inhibit it?

  • Out of time, continue here next time

    • We’re close to the end of the high level arch discussion

    • Do we want to include counters from, say, the memory controller in this?

      • Don’t have any standard for non-hart PMUs yet, that’s our next top priority gap

      • Should we define that first, before this, in case it will impact this?

      • Don’t see how it would impact this, since this is a hart capability.  But send email since we’re out of time

 

Action items

