-
Notifications
You must be signed in to change notification settings - Fork 53
add Gaudi microarchitecture to report #479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Harper, Jason M <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds Gaudi microarchitecture information to the system report by detecting the device type based on PCI IDs and including it in the Gaudi device table.
- Adds a new script to detect Gaudi microarchitecture (gaudi, gaudi2, gaudi3) based on PCI device IDs
- Updates the Gaudi struct and table to include microarchitecture information
- Modifies table field mapping to accommodate the new microarchitecture column
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
internal/script/script_defs.go | Adds new script definition for detecting Gaudi microarchitecture via PCI ID matching |
internal/report/table_helpers.go | Updates Gaudi struct with Microarchitecture field and populates it from script output |
internal/report/table_defs.go | Adds microarchitecture field to Gaudi table definition and updates field mappings |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
__DEFAULT_HL_DEVICE="gaudi" | ||
elif echo $__pcidev | grep -qE '1020|1030'; then | ||
__DEFAULT_HL_DEVICE="gaudi2" | ||
elif echo $__pcidev | grep -qE '106[0-9]'; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern '106[0-9]' is overly broad and could match unintended PCI IDs. Consider using a more specific pattern like '1060|1061|1062' if those are the exact IDs for gaudi3, or document the rationale for the broad range.
elif echo $__pcidev | grep -qE '106[0-9]'; then | |
elif echo $__pcidev | grep -qE '1060|1061|1062'; then |
Copilot uses AI. Check for mistakes.
sort.Slice(gaudis, func(i, j int) bool { | ||
return gaudis[i].ModuleID < gaudis[j].ModuleID | ||
}) | ||
// set microarchitecture (assumes same arch for all gaudi devices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation assigns the same microarchitecture to all Gaudi devices, but the PR description states this assumption may not be valid. Consider detecting microarchitecture per device or updating the comment to reflect this known limitation.
// set microarchitecture (assumes same arch for all gaudi devices) | |
// Set microarchitecture for all Gaudi devices. | |
// NOTE: This assumes all Gaudi devices have the same microarchitecture, which may not be valid. | |
// See PR description and CodeQL warning for details. If per-device microarchitecture is needed, | |
// update this code to parse and assign microarchitecture per device. |
Copilot uses AI. Check for mistakes.
* Metric names modernized to match perfmon. "Legacy" names optional and deprecated. All metric descriptions in HTML report. Signed-off-by: Harper, Jason M <[email protected]> * temporary threshold value Signed-off-by: Harper, Jason M <[email protected]> * tma metric thresholds Signed-off-by: Harper, Jason M <[email protected]> * style the tooltips Signed-off-by: Harper, Jason M <[email protected]> * indent on TMA child metrics Signed-off-by: Harper, Jason M <[email protected]> * add tuning tips Signed-off-by: Harper, Jason M <[email protected]> * skip threshold if no metric variables defined Signed-off-by: Harper, Jason M <[email protected]> * revert to legacy names Signed-off-by: Harper, Jason M <[email protected]> * set legacy name to name for legacy metric defs Signed-off-by: Harper, Jason M <[email protected]> * fix kernel_CPI metric formula and add threshold to kernel utilization Signed-off-by: Harper, Jason M <[email protected]> --------- Signed-off-by: Harper, Jason M <[email protected]>
Signed-off-by: Harper, Jason M <[email protected]>
currently assumes that all Gaudi devices installed in a system are of the same microarchitecture