2025-03-13 Ordinary Meeting Notes
Mar 13, 2025 | RV-LFX Performance Event Sampling TG
Attendees: Beeman Strong
Notes
Attendees: Beeman, Bruce, Atish, Hasan, Chun, Qin, Daniel, Snehasish
Agenda
Accelerated sampling after clear
Sample collection/output
New perf events
Began with a recap of the TG’s extensions
Event-based sampling
Targeted but late binding
Good for sampling rare events, but hard to get precise state
Ssplcofi, Sspesa serve this usage
Instruction-based sampling
Early binding but noisy (impact of speculation)
Easy to get precise state, but not so good for sampling rare events
Instruction Sampling serves this usage
Accelerated sampling after clear
When an instruction sample is dropped, should we wait for the next overflow or accelerate?
Do we need randomness to avoid beat patterns?
Software can initialize the counter to overflow after a prime number of events, which makes beat patterns very unlikely
Randomness introduces more noise to sample period
Does it matter?
SW wants to know the sample period, so the random value would have to be exposed
Could do so if the random value is added after overflow; the counter then holds the number of additional counts
PIPES adds it when initializing the counter, so there is no way for SW to know it. Does it matter?
Google compares counts across systems, so wants precise counts
If SW knows the LFSR initial value, it could compute it
With an LFSR, the adjustment might add to or subtract from the initial counter value
Only really need precision at the macro level; dropped samples are the problem, because the next sample then arrives a full period later, so the count for that interval is off by 2x
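A small Python model of the randomness discussion above: why a prime preload period avoids beat patterns, and how an LFSR-based period randomization could stay reconstructible if SW knows the seed. Everything here is a hypothetical sketch: a 64-bit counter, a 16-bit Galois LFSR with taps 0xB400, and a signed jitter of `jitter_bits` bits are all assumptions, and this is not how PIPES or any ratified extension works. `preload`, `sampled_slots`, `distinct_slots`, `lfsr16_step`, and `randomized_periods` are illustrative names.

```python
from math import gcd

MASK = (1 << 64) - 1  # assumption: 64-bit counter


def preload(period: int) -> int:
    """Initial counter value that overflows after `period` more events."""
    return (-period) & MASK


def sampled_slots(period: int, loop_len: int, n_samples: int) -> set:
    """Positions within a loop of `loop_len` events hit by successive
    overflows, assuming counting starts at loop position 0."""
    return {(k * period) % loop_len for k in range(1, n_samples + 1)}


def distinct_slots(period: int, loop_len: int) -> int:
    """How many loop positions a fixed period can ever reach."""
    return loop_len // gcd(period, loop_len)


def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR (taps 0xB400, maximal length)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def randomized_periods(base: int, seed: int, jitter_bits: int, count: int) -> list:
    """Hypothetical scheme: add a signed LFSR-derived offset to `base`, so the
    adjustment may be negative or positive. SW that knows `seed` can replay
    this sequence exactly and recover every period."""
    half = 1 << (jitter_bits - 1)
    state, periods = seed, []
    for _ in range(count):
        state = lfsr16_step(state)
        offset = (state & ((1 << jitter_bits) - 1)) - half  # in [-half, half - 1]
        periods.append(base + offset)
    return periods
```

With a 1000-event loop, a period of 1000 always lands on the same position and a period of 500 reaches only two, while a period of 997 eventually reaches all 1000. Strictly, what matters is that the period is coprime with the loop length; a prime period gives that for every loop length that is not a multiple of it.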
Could include in the sample record a count of samples dropped since the previous sample, so SW can compute the total number of events
If accelerating, would have to snapshot the counter value at the time of the accelerated selection to preserve the event count
Not just a multiple of the sample period anymore
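A minimal sketch of how SW could total events from sample records under the two options discussed above. It assumes a hypothetical record with a `dropped` count and, in accelerated mode, a `snapshot` of events counted past overflow until the replacement instruction was selected; neither field comes from any spec, and the model assumes the counter wraps to 0 at overflow and keeps counting.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SampleRecord:
    # Hypothetical record fields; not from any ratified spec.
    dropped: int = 0                # samples dropped since the previous sample
    snapshot: Optional[int] = None  # accelerated mode only: events counted past
                                    # overflow before the accelerated selection


def events_covered(rec: SampleRecord, period: int) -> int:
    """Events represented by one record."""
    if rec.snapshot is not None:
        # Accelerated: one overflow plus the events counted before reselection,
        # so the total is no longer a multiple of the sample period.
        return period + rec.snapshot
    # Not accelerated: each dropped sample cost one extra full period.
    return (rec.dropped + 1) * period


def total_events(records: Iterable[SampleRecord], period: int) -> int:
    return sum(events_covered(r, period) for r in records)


if __name__ == "__main__":
    period = 1000
    plain = [SampleRecord(), SampleRecord(dropped=1), SampleRecord()]
    print(total_events(plain, period), len(plain) * period)  # 4000 3000
    accel = [SampleRecord(), SampleRecord(dropped=1, snapshot=37)]
    print(total_events(accel, period))  # 2037
```

Treating every record as exactly one period undercounts the interval containing the drop by half, which is the 2x error mentioned in the discussion.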
Acceleration could be a mode, let SW choose
Yes, that’s an option. Want to be careful about SW burden with too much optionality, but could be reasonable.
Might want an option to count cycles, to see where time is being spent
Use event sampling for that. Counting cycles with instruction sampling would just tell you which instructions are decoded most often
What if a config gets no samples?
That’s expected; we only care about the instructions that are sampled enough
Do you care about coverage?
Google samples briefly on a rotating set of systems in the fleet and is only interested in code that accumulates samples
ByteDance does continuous sampling similar to Google’s
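A minimal sketch of the fleet-style aggregation described above, assuming each system contributes a list of sampled instruction PCs. `hot_pcs` and `min_samples` are hypothetical; the point is that PCs which never accumulate enough samples, including ones never sampled at all, simply drop out, so missing coverage is expected rather than an error.

```python
from collections import Counter
from typing import Dict, Iterable


def hot_pcs(per_system_samples: Iterable[Iterable[int]], min_samples: int) -> Dict[int, int]:
    """Merge sampled PCs from several systems and keep only those with at
    least `min_samples` samples in total."""
    totals: Counter = Counter()
    for samples in per_system_samples:
        totals.update(samples)  # count each sampled PC
    return {pc: n for pc, n in totals.items() if n >= min_samples}
```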
Sample collection
Propose using scountovf[1] to indicate an LCOFI for instruction sampling
Prefer not to count/sample instructions in the LCOFI handler. Could the counter freeze while OF=1, or should SW be left to inhibit it?
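A toy Python model of the two options raised above for keeping the LCOFI handler's own instructions out of the count: hardware freezing the counter while OF=1, versus SW inhibiting it at handler entry. It assumes the proposed scountovf[1] bit flags an instruction-sampling overflow. `ToyCounter`, `lcofi_handler`, the 5-instruction trap entry, and the 50-instruction handler body are all hypothetical, and this is not Sscofpmf or Instruction Sampling behavior.

```python
MASK = (1 << 64) - 1         # assumption: 64-bit counter
INSTR_SAMPLING_OVF_BIT = 1   # proposed: scountovf[1] flags instr-sampling LCOFI
HANDLER_BODY_INSTS = 50      # hypothetical handler length after entry


class ToyCounter:
    """Counts events, sets OF on wrap, and optionally freezes while OF=1."""

    def __init__(self, period: int, freeze_on_of: bool):
        self.value = (-period) & MASK  # overflows after `period` events
        self.of = False
        self.freeze_on_of = freeze_on_of
        self.inhibited = False         # stand-in for a SW-controlled inhibit bit

    def count(self, n: int = 1) -> None:
        if self.inhibited or (self.of and self.freeze_on_of):
            return                     # frozen or inhibited: events not counted
        new = self.value + n
        if new > MASK:
            self.of = True
        self.value = new & MASK

    def scountovf(self) -> int:
        # Only the proposed instruction-sampling bit is modeled.
        return int(self.of) << INSTR_SAMPLING_OVF_BIT


def lcofi_handler(ctr: ToyCounter, entry_insts: int, sw_inhibit: bool):
    """Return the counter value the handler observes, or None if the
    overflow is not an instruction-sampling one."""
    if not (ctr.scountovf() >> INSTR_SAMPLING_OVF_BIT) & 1:
        return None
    ctr.count(entry_insts)             # trap entry runs before SW can react
    if sw_inhibit:
        ctr.inhibited = True           # SW option: inhibit at handler entry
    ctr.count(HANDLER_BODY_INSTS)      # rest of the handler
    return ctr.value


def run(freeze_on_of: bool, sw_inhibit: bool):
    ctr = ToyCounter(period=100, freeze_on_of=freeze_on_of)
    ctr.count(100)                     # reach overflow: value wraps to 0, OF=1
    return lcofi_handler(ctr, entry_insts=5, sw_inhibit=sw_inhibit)
```

With freeze-on-OF the handler observes 0 events past overflow; with SW inhibit, the instructions executed before the inhibit takes effect still leak into the count (5 here); with neither, the whole handler is counted (55). The model does not cover whether handler instructions could be selected for sampling, which is the other half of the concern.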
Out of time, continue here next time
We’re close to the end of the high level arch discussion
Do we want to include counters from, say, the memory controller in this?
Don’t have any standard for non-hart PMUs yet, that’s our next top priority gap
Should we define that first, before this, in case it will impact this?
Don’t see how it would impact this, since this is a hart capability. But send email since we’re out of time
Action items
RISC-V International