
Commit f7b58cb

Ravi Bangoria authored and acmel committed
perf mem/c2c: Add load store event mappings for AMD
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record' with mem load/store events. IBS-tagged load/store samples provide most of the information needed by these tools. Wire in the "ibs_op//" event as the mem-ldst event for AMD.

There are some limitations, though:

 - Only load/store micro-ops provide mem/c2c information, but IBS has no way to choose a particular type of micro-op to tag. As a result, many non-LS micro-ops are tagged, and they appear as N/A in the perf report.

 - IBS, being an uncore PMU from the kernel's point of view [1], does not support per-process monitoring. Thus, perf mem/c2c are currently supported on AMD in per-CPU mode only.

Example:

  $ sudo perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                 Samples  Snoop
  N/A                            700620  N/A
  L1 hit                         126675  N/A
  L2 hit                            424  N/A
  L3 hit                            664  HitM
  L3 hit                             10  N/A
  Local RAM hit                       2  N/A
  Remote RAM (1 hop) hit           8558  N/A
  Remote Cache (1 hop) hit            3  N/A
  Remote Cache (1 hop) hit            2  HitM
  Remote Cache (2 hops) hit          10  HitM
  Remote Cache (2 hops) hit           6  N/A
  Uncached hit                        4  N/A
  $

[1]: https://lore.kernel.org/lkml/[email protected]

Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ali Saidi <[email protected]>
Cc: Ananth Narayan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Santosh Shukla <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
1 parent 4173cc0 commit f7b58cb


3 files changed: 41 additions, 7 deletions


tools/perf/Documentation/perf-c2c.txt

Lines changed: 10 additions & 4 deletions
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).
 
 These events provide:
 - memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS
 
 -l::
 --ldlat::
-Configure mem-loads latency. (x86 only)
+Configure mem-loads latency. Supported on Intel and Arm64 processors
+only. Ignored on other archs.
 
 -k::
 --all-kernel::
@@ -135,11 +137,15 @@ Following perf record options are configured by default:
 -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:
 
 cpu/mem-loads,ldlat=30/P
 cpu/mem-stores/P
 
+following on AMD:
+
+ibs_op//
+
 and following on PowerPC:
 
 cpu/mem-loads/

tools/perf/Documentation/perf-mem.txt

Lines changed: 2 additions & 1 deletion
@@ -85,7 +85,8 @@ RECORD OPTIONS
 Be more verbose (show counter open errors, etc)
 
 --ldlat <n>::
-Specify desired latency for loads event. (x86 only)
+Specify desired latency for loads event. Supported on Intel and Arm64
+processors only. Ignored on other archs.
 
 In addition, for report all perf report options are valid, and for record
 all perf record options.

tools/perf/arch/x86/util/mem-events.c

Lines changed: 29 additions & 2 deletions
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"
 
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads"),
 	E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores"),
 	E(NULL, NULL, NULL),
 };
 
+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL, NULL, NULL),
+	E(NULL, NULL, NULL),
+	E("mem-ldst", "ibs_op//", "ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;
 
-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }
 
 bool is_mem_loads_aux_event(struct evsel *leader)
