-
Notifications
You must be signed in to change notification settings - Fork 572
stdlib: exclude null frequencies when computing avg freq #2757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev/cphlipot1/cpu-stdlib-fixes-2
Are you sure you want to change the base?
stdlib: exclude null frequencies when computing avg freq #2757
Conversation
This fixes unexpected behavior where the inequality min_freq <= avg_freq <= max_freq may not hold when there are null frequencies. null frequencies may occur for sched slice intervals where a cpu frequency event has not been emitted yet and thus CPU freq is unknown at that point in the trace.
🤖 Hi @cphlipot1, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📋 Review Summary
This pull request correctly fixes a bug in the avg_freq
calculation where NULL
frequencies could lead to incorrect results. The change is well-reasoned, consistently applied across all relevant SQL files, and includes corresponding updates to the test suite to validate the fix.
🔍 General Feedback
- The fix is implemented correctly by excluding durations associated with
NULL
frequencies from the denominator of the weighted average calculation. - The test updates provide confidence in the correctness of the fix.
@@ -126,7 +126,7 @@ SELECT | |||
sum(ii.dur) AS runtime, | |||
min(freq) AS min_freq, | |||
max(freq) AS max_freq, | |||
cast_int!(SUM(ii.dur * freq) / SUM(ii.dur)) AS avg_freq | |||
cast_int!(SUM(ii.dur * freq) / SUM(CASE WHEN freq IS NOT NULL THEN ii.dur END)) AS avg_freq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟢 The logic to correctly calculate the sum of durations for non-null frequencies is now duplicated in process.sql
, system.sql
, and thread.sql
. While it's fine for now, if this pattern is expected to be used more widely, consider creating a helper function or view to encapsulate this logic. This would improve maintainability and reduce code duplication.
This fixes unexpected behavior where the inequality min_freq <= avg_freq <= max_freq
may not hold when there are null frequencies.
null frequencies may occur for sched slice intervals where a cpu frequency
event has not been emitted yet and thus CPU freq is unknown at that
point in the trace.