Skip to content

[DRAFT][memprof][darwin] Support memprof on Darwin platform and add binary access profile #142884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SharonXSharon
Copy link
Contributor

The patch has two main parts,

  1. add support for memprof on Darwin platform
  2. add a new binary access profile, including the format definition, the serialization from the memprof compiler-rt, and the code in llvm-profdata to deserialize.

For 1) @manman-ren tried to upstream earlier on at https://github.com/llvm/llvm-project/pull/69640/files, this patch has some changes compared with it.

Currently, in order to properly upstream this patch, we need to resolve 3 issues,
a) The Darwin support may have some issues in the interception part, especially the llvm-project/compiler-rt/lib/memprof/memprof_interceptors.cpp/.h files, given interceptions on Apple is a bit different. (For our internal testing, we are commenting out a lot of the interceptions). In theory we should be able to follow what Asan is doing for Apple, but I am not super familiar with asan interception either, so having a little learning curve here.

b) We need to discuss how to integrate the binary access profile format into the existing raw profile format, rather than creating a new format

c) we have been using the MEM_GRANULARITY = 8 bytes before the HISTOGRAM_GRANULARITY was added, now we should be able to just reuse the histogram code? Does the histogram just effectively make the profile granularity being 8 bytes and then show the access count of every 8 bytes memblock in a histogram?

Please let me know your thoughts/insights on the above 3 issues, thanks a lot!
@snehasish @teresajohnson @mingmingl-llvm @manman-ren

@SharonXSharon SharonXSharon changed the title [memprof][darwin] Support memprof on Darwin platform and add binary access profile [DRAFT][memprof][darwin] Support memprof on Darwin platform and add binary access profile Jun 5, 2025
Copy link

github-actions bot commented Jun 5, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions c,inc,h,cpp -- clang/test/Driver/fmemprof-darwin.c compiler-rt/include/profile/MemProfBinaryAccessData.inc compiler-rt/lib/memprof/memprof_mac.cpp compiler-rt/lib/memprof/memprof_malloc_mac.cpp llvm/include/llvm/ProfileData/MemProfBinaryAccessData.inc clang/lib/Driver/ToolChains/Darwin.cpp compiler-rt/include/profile/MemProfData.inc compiler-rt/lib/memprof/memprof_allocator.cpp compiler-rt/lib/memprof/memprof_allocator.h compiler-rt/lib/memprof/memprof_flags.inc compiler-rt/lib/memprof/memprof_interceptors.cpp compiler-rt/lib/memprof/memprof_interceptors.h compiler-rt/lib/memprof/memprof_linux.cpp compiler-rt/lib/memprof/memprof_malloc_linux.cpp compiler-rt/lib/memprof/memprof_mapping.h compiler-rt/lib/memprof/memprof_new_delete.cpp compiler-rt/lib/memprof/memprof_rawprofile.cpp compiler-rt/lib/memprof/memprof_rawprofile.h llvm/include/llvm/ProfileData/MemProfData.inc llvm/tools/llvm-profdata/llvm-profdata.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/ProfileData/MemProfBinaryAccessData.inc b/llvm/include/llvm/ProfileData/MemProfBinaryAccessData.inc
index 6e793c49b..286dcfb8d 100644
--- a/llvm/include/llvm/ProfileData/MemProfBinaryAccessData.inc
+++ b/llvm/include/llvm/ProfileData/MemProfBinaryAccessData.inc
@@ -1,4 +1,5 @@
-/*=-- MemProfBinaryAccessData.inc - MemProf profiling runtime macros -*- C++ -*-=*\
+/*=-- MemProfBinaryAccessData.inc - MemProf profiling runtime macros -*- C++
+-*-=*\
 |*
 |* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 |* See https://llvm.org/LICENSE.txt for license information.

@SharonXSharon SharonXSharon marked this pull request as ready for review June 5, 2025 17:02
@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' PGO Profile Guided Optimizations labels Jun 5, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 5, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-pgo

Author: None (SharonXSharon)

Changes

The patch has two main parts,

  1. add support for memprof on Darwin platform
  2. add a new binary access profile, including the format definition, the serialization from the memprof compiler-rt, and the code in llvm-profdata to deserialize.

For 1) @manman-ren tried to upstream earlier on at https://github.com/llvm/llvm-project/pull/69640/files, this patch has some changes compared with it.

Currently, in order to properly upstream this patch, we need to resolve 3 issues,
a) The Darwin support may have some issues in the interception part, especially the llvm-project/compiler-rt/lib/memprof/memprof_interceptors.cpp/.h files, given interceptions on Apple is a bit different. (For our internal testing, we are commenting out a lot of the interceptions). In theory we should be able to follow what Asan is doing for Apple, but I am not super familiar with asan interception either, so having a little learning curve here.

b) We need to discuss how to integrate the binary access profile format into the existing raw profile format, rather than creating a new format

c) we have been using the MEM_GRANULARITY = 8 bytes before the HISTOGRAM_GRANULARITY was added, now we should be able to just reuse the histogram code? Does the histogram just effectively make the profile granularity being 8 bytes and then show the access count of every 8 bytes memblock in a histogram?

Please let me know your thoughts/insights on the above 3 issues, thanks a lot!
@snehasish @teresajohnson @mingmingl-llvm @manman-ren


Patch is 63.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142884.diff

26 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Darwin.cpp (+2)
  • (added) clang/test/Driver/fmemprof-darwin.c (+11)
  • (modified) compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake (+1-1)
  • (modified) compiler-rt/cmake/config-ix.cmake (+1-1)
  • (added) compiler-rt/include/profile/MemProfBinaryAccessData.inc (+41)
  • (modified) compiler-rt/include/profile/MemProfData.inc (+2)
  • (modified) compiler-rt/lib/memprof/CMakeLists.txt (+130-107)
  • (modified) compiler-rt/lib/memprof/memprof_allocator.cpp (+32-6)
  • (modified) compiler-rt/lib/memprof/memprof_allocator.h (+22-7)
  • (modified) compiler-rt/lib/memprof/memprof_flags.inc (+2)
  • (modified) compiler-rt/lib/memprof/memprof_interceptors.cpp (+2)
  • (modified) compiler-rt/lib/memprof/memprof_interceptors.h (+5)
  • (modified) compiler-rt/lib/memprof/memprof_linux.cpp (+2-3)
  • (added) compiler-rt/lib/memprof/memprof_mac.cpp (+56)
  • (modified) compiler-rt/lib/memprof/memprof_malloc_linux.cpp (+2-3)
  • (added) compiler-rt/lib/memprof/memprof_malloc_mac.cpp (+72)
  • (modified) compiler-rt/lib/memprof/memprof_mapping.h (+12-2)
  • (modified) compiler-rt/lib/memprof/memprof_new_delete.cpp (+37-4)
  • (modified) compiler-rt/lib/memprof/memprof_rawprofile.cpp (+134)
  • (modified) compiler-rt/lib/memprof/memprof_rawprofile.h (+6)
  • (modified) compiler-rt/test/memprof/CMakeLists.txt (+74-15)
  • (modified) compiler-rt/test/memprof/lit.cfg.py (+1-1)
  • (added) llvm/include/llvm/ProfileData/MemProfBinaryAccessData.inc (+40)
  • (modified) llvm/include/llvm/ProfileData/MemProfData.inc (+2)
  • (modified) llvm/include/module.modulemap (+1)
  • (modified) llvm/tools/llvm-profdata/llvm-profdata.cpp (+266-20)
diff --git a/clang/lib/Driver/ToolChains/Darwin.cpp b/clang/lib/Driver/ToolChains/Darwin.cpp
index 26e24ad0ab17c..113127ace8356 100644
--- a/clang/lib/Driver/ToolChains/Darwin.cpp
+++ b/clang/lib/Driver/ToolChains/Darwin.cpp
@@ -1619,6 +1619,8 @@ void DarwinClang::AddLinkRuntimeLibArgs(const ArgList &Args,
       AddLinkRuntimeLib(Args, CmdArgs, "stats_client", RLO_AlwaysLink);
       AddLinkSanitizerLibArgs(Args, CmdArgs, "stats");
     }
+    if (Sanitize.needsMemProfRt())
+      AddLinkSanitizerLibArgs(Args, CmdArgs, "memprof");
   }
 
   if (Sanitize.needsMemProfRt())
diff --git a/clang/test/Driver/fmemprof-darwin.c b/clang/test/Driver/fmemprof-darwin.c
new file mode 100644
index 0000000000000..2b1ad67d176ff
--- /dev/null
+++ b/clang/test/Driver/fmemprof-darwin.c
@@ -0,0 +1,11 @@
+	
+// Test sanitizer link flags on Darwin.
+
+// RUN: %clang -### --target=x86_64-darwin \
+// RUN:   -stdlib=platform -fmemory-profile %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-MEMPROF %s
+
+// CHECK-MEMPROF: "{{.*}}ld{{(.exe)?}}"
+// CHECK-MEMPROF-SAME: libclang_rt.memprof_osx_dynamic.dylib"
+// CHECK-MEMPROF-SAME: "-rpath" "@executable_path"
+// CHECK-MEMPROF-SAME: "-rpath" "{{[^"]*}}lib{{[^"]*}}darwin"
diff --git a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
index ca45d7bd2af7f..d89c07f690740 100644
--- a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
+++ b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
@@ -78,7 +78,7 @@ else()
 endif()
 set(ALL_NSAN_SUPPORTED_ARCH ${X86_64})
 set(ALL_HWASAN_SUPPORTED_ARCH ${X86_64} ${ARM64} ${RISCV64})
-set(ALL_MEMPROF_SUPPORTED_ARCH ${X86_64})
+set(ALL_MEMPROF_SUPPORTED_ARCH ${X86_64} ${ARM64})
 set(ALL_PROFILE_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${PPC32} ${PPC64}
     ${MIPS32} ${MIPS64} ${S390X} ${SPARC} ${SPARCV9} ${HEXAGON}
     ${RISCV32} ${RISCV64} ${LOONGARCH64} ${WASM32})
diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index e3310b1ff0e2c..305dbadde71a9 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -829,7 +829,7 @@ else()
 endif()
 
 if (COMPILER_RT_HAS_SANITIZER_COMMON AND MEMPROF_SUPPORTED_ARCH AND
-    OS_NAME MATCHES "Linux")
+    OS_NAME MATCHES "Linux|Darwin")
   set(COMPILER_RT_HAS_MEMPROF TRUE)
 else()
   set(COMPILER_RT_HAS_MEMPROF FALSE)
diff --git a/compiler-rt/include/profile/MemProfBinaryAccessData.inc b/compiler-rt/include/profile/MemProfBinaryAccessData.inc
new file mode 100644
index 0000000000000..25333a8f7fcc5
--- /dev/null
+++ b/compiler-rt/include/profile/MemProfBinaryAccessData.inc
@@ -0,0 +1,41 @@
+/*===-- MemProfBinaryAccessData.inc - MemProf profiling runtime macros -*- C++
+-*-======== *\
+|*
+|* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+|* See https://llvm.org/LICENSE.txt for license information.
+|* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+|*
+\*===----------------------------------------------------------------------===*/
+/*
+ * This file defines the macros for memprof profiling data structures
+ * for binary access.
+ *
+ * This file has two identical copies. The primary copy lives in LLVM and
+ * the other one sits in compiler-rt/include/profile directory. To make changes
+ * in this file, first modify the primary copy and copy it over to compiler-rt.
+ * Testing of any change in this file can start only after the two copies are
+ * synced up.
+ *
+\*===----------------------------------------------------------------------===*/
+
+// A 64-bit magic number to uniquely identify the raw binary memprof binary
+// access profile file.
+#define MEMPROF_BINARY_ACCESS_RAW_MAGIC_64                                     \
+  ((uint64_t)255 << 56 | (uint64_t)'f' << 48 | (uint64_t)'b' << 40 |           \
+   (uint64_t)'m' << 32 | (uint64_t)'b' << 24 | (uint64_t)'i' << 16 |           \
+   (uint64_t)'n' << 8 | (uint64_t)129)
+
+// The version number of the raw binary format.
+#define MEMPROF_BINARY_ACCESS_RAW_VERSION 1ULL
+
+// A struct describing the header used for the raw binary memprof profile
+// format.
+PACKED(struct BinaryAccessHeader {
+  uint64_t Magic;
+  uint64_t Version;
+  uint64_t TotalSize;
+  uint64_t SegmentOffset;
+  uint64_t NumSegments;
+  uint64_t MemAddressOffset;
+  uint64_t NumMemBlockAddresses;
+});
diff --git a/compiler-rt/include/profile/MemProfData.inc b/compiler-rt/include/profile/MemProfData.inc
index 3f785bd23fce3..f30e95bc51111 100644
--- a/compiler-rt/include/profile/MemProfData.inc
+++ b/compiler-rt/include/profile/MemProfData.inc
@@ -45,6 +45,8 @@
 
 namespace llvm {
 namespace memprof {
+#include "MemProfBinaryAccessData.inc"
+
 // A struct describing the header used for the raw binary memprof profile format.
 PACKED(struct Header {
   uint64_t Magic;
diff --git a/compiler-rt/lib/memprof/CMakeLists.txt b/compiler-rt/lib/memprof/CMakeLists.txt
index e6d99daca6ee7..d1c4d607ef396 100644
--- a/compiler-rt/lib/memprof/CMakeLists.txt
+++ b/compiler-rt/lib/memprof/CMakeLists.txt
@@ -7,7 +7,9 @@ set(MEMPROF_SOURCES
   memprof_interceptors.cpp
   memprof_interceptors_memintrinsics.cpp
   memprof_linux.cpp
+  memprof_mac.cpp
   memprof_malloc_linux.cpp
+  memprof_malloc_mac.cpp
   memprof_mibmap.cpp
   memprof_posix.cpp
   memprof_rawprofile.cpp
@@ -86,124 +88,145 @@ add_compiler_rt_object_libraries(RTMemprof_dynamic
   DEFS ${MEMPROF_DYNAMIC_DEFINITIONS}
   DEPS ${MEMPROF_DEPS})
 
-add_compiler_rt_object_libraries(RTMemprof
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  SOURCES ${MEMPROF_SOURCES}
-  ADDITIONAL_HEADERS ${MEMPROF_HEADERS}
-  CFLAGS ${MEMPROF_CFLAGS}
-  DEFS ${MEMPROF_COMMON_DEFINITIONS}
-  DEPS ${MEMPROF_DEPS})
-add_compiler_rt_object_libraries(RTMemprof_cxx
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  SOURCES ${MEMPROF_CXX_SOURCES}
-  ADDITIONAL_HEADERS ${MEMPROF_HEADERS}
-  CFLAGS ${MEMPROF_CFLAGS}
-  DEFS ${MEMPROF_COMMON_DEFINITIONS}
-  DEPS ${MEMPROF_DEPS})
-add_compiler_rt_object_libraries(RTMemprof_preinit
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  SOURCES ${MEMPROF_PREINIT_SOURCES}
-  ADDITIONAL_HEADERS ${MEMPROF_HEADERS}
-  CFLAGS ${MEMPROF_CFLAGS}
-  DEFS ${MEMPROF_COMMON_DEFINITIONS}
-  DEPS ${MEMPROF_DEPS})
-
-file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp "")
-add_compiler_rt_object_libraries(RTMemprof_dynamic_version_script_dummy
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  SOURCES ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp
-  CFLAGS ${MEMPROF_DYNAMIC_CFLAGS}
-  DEFS ${MEMPROF_DYNAMIC_DEFINITIONS}
-  DEPS ${MEMPROF_DEPS})
+if(NOT APPLE)
+  add_compiler_rt_object_libraries(RTMemprof
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    SOURCES ${MEMPROF_SOURCES}
+    ADDITIONAL_HEADERS ${MEMPROF_HEADERS}
+    CFLAGS ${MEMPROF_CFLAGS}
+    DEFS ${MEMPROF_COMMON_DEFINITIONS}
+    DEPS ${MEMPROF_DEPS})
+  add_compiler_rt_object_libraries(RTMemprof_cxx
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    SOURCES ${MEMPROF_CXX_SOURCES}
+    ADDITIONAL_HEADERS ${MEMPROF_HEADERS}
+    CFLAGS ${MEMPROF_CFLAGS}
+    DEFS ${MEMPROF_COMMON_DEFINITIONS}
+    DEPS ${MEMPROF_DEPS})
+  add_compiler_rt_object_libraries(RTMemprof_preinit
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    SOURCES ${MEMPROF_PREINIT_SOURCES}
+    ADDITIONAL_HEADERS ${MEMPROF_HEADERS}
+    CFLAGS ${MEMPROF_CFLAGS}
+    DEFS ${MEMPROF_COMMON_DEFINITIONS}
+    DEPS ${MEMPROF_DEPS})
+
+  file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp "")
+  add_compiler_rt_object_libraries(RTMemprof_dynamic_version_script_dummy
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    SOURCES ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp
+    CFLAGS ${MEMPROF_DYNAMIC_CFLAGS}
+    DEFS ${MEMPROF_DYNAMIC_DEFINITIONS}
+    DEPS ${MEMPROF_DEPS})
+endif()
 
 # Build MemProf runtimes shipped with Clang.
 add_compiler_rt_component(memprof)
 
-# Build separate libraries for each target.
-
-set(MEMPROF_COMMON_RUNTIME_OBJECT_LIBS
-  RTInterception
-  RTSanitizerCommon
-  RTSanitizerCommonLibc
-  RTSanitizerCommonCoverage
-  RTSanitizerCommonSymbolizer
-  # FIXME: hangs.
-  # RTSanitizerCommonSymbolizerInternal
-)
-
-add_compiler_rt_runtime(clang_rt.memprof
-  STATIC
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  OBJECT_LIBS RTMemprof_preinit
-              RTMemprof
-              ${MEMPROF_COMMON_RUNTIME_OBJECT_LIBS}
-  CFLAGS ${MEMPROF_CFLAGS}
-  DEFS ${MEMPROF_COMMON_DEFINITIONS}
-  PARENT_TARGET memprof)
-
-add_compiler_rt_runtime(clang_rt.memprof_cxx
-  STATIC
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  OBJECT_LIBS RTMemprof_cxx
-  CFLAGS ${MEMPROF_CFLAGS}
-  DEFS ${MEMPROF_COMMON_DEFINITIONS}
-  PARENT_TARGET memprof)
-
-add_compiler_rt_runtime(clang_rt.memprof-preinit
-  STATIC
-  ARCHS ${MEMPROF_SUPPORTED_ARCH}
-  OBJECT_LIBS RTMemprof_preinit
-  CFLAGS ${MEMPROF_CFLAGS}
-  DEFS ${MEMPROF_COMMON_DEFINITIONS}
-  PARENT_TARGET memprof)
-
-foreach(arch ${MEMPROF_SUPPORTED_ARCH})
-  if (UNIX)
-    add_sanitizer_rt_version_list(clang_rt.memprof-dynamic-${arch}
-                                  LIBS clang_rt.memprof-${arch} clang_rt.memprof_cxx-${arch}
-                                  EXTRA memprof.syms.extra)
-    set(VERSION_SCRIPT_FLAG
-         -Wl,--version-script,${CMAKE_CURRENT_BINARY_DIR}/clang_rt.memprof-dynamic-${arch}.vers)
-    set_property(SOURCE
-      ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp
-      APPEND PROPERTY
-      OBJECT_DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/clang_rt.memprof-dynamic-${arch}.vers)
-  else()
-    set(VERSION_SCRIPT_FLAG)
-  endif()
-
-  set(MEMPROF_DYNAMIC_WEAK_INTERCEPTION)
-
+if(APPLE)
+  # following asan
+  add_weak_symbols("memprof" WEAK_SYMBOL_LINK_FLAGS)
+  add_weak_symbols("sanitizer_common" WEAK_SYMBOL_LINK_FLAGS)
   add_compiler_rt_runtime(clang_rt.memprof
     SHARED
-    ARCHS ${arch}
-    OBJECT_LIBS ${MEMPROF_COMMON_RUNTIME_OBJECT_LIBS}
-            RTMemprof_dynamic
-            # The only purpose of RTMemprof_dynamic_version_script_dummy is to
-            # carry a dependency of the shared runtime on the version script.
-            # Replacing it with a straightforward
-            # add_dependencies(clang_rt.memprof-dynamic-${arch} clang_rt.memprof-dynamic-${arch}-version-list)
-            # generates an order-only dependency in ninja.
-            RTMemprof_dynamic_version_script_dummy
-            ${MEMPROF_DYNAMIC_WEAK_INTERCEPTION}
+    OS ${SANITIZER_COMMON_SUPPORTED_OS}
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    OBJECT_LIBS RTMemprof_dynamic
+                RTInterception
+                RTSanitizerCommon
+                RTSanitizerCommonLibc
+                RTSanitizerCommonCoverage
+                RTSanitizerCommonSymbolizer
     CFLAGS ${MEMPROF_DYNAMIC_CFLAGS}
-    LINK_FLAGS ${MEMPROF_DYNAMIC_LINK_FLAGS}
-              ${VERSION_SCRIPT_FLAG}
-    LINK_LIBS ${MEMPROF_DYNAMIC_LIBS}
+    LINK_FLAGS ${WEAK_SYMBOL_LINK_FLAGS}
     DEFS ${MEMPROF_DYNAMIC_DEFINITIONS}
     PARENT_TARGET memprof)
+else()
+  # Build separate libraries for each target.
+
+  set(MEMPROF_COMMON_RUNTIME_OBJECT_LIBS
+    RTInterception
+    RTSanitizerCommon
+    RTSanitizerCommonLibc
+    RTSanitizerCommonCoverage
+    RTSanitizerCommonSymbolizer
+    # FIXME: hangs.
+    # RTSanitizerCommonSymbolizerInternal
+  )
 
-  if (SANITIZER_USE_SYMBOLS)
-    add_sanitizer_rt_symbols(clang_rt.memprof_cxx
-      ARCHS ${arch})
-    add_dependencies(memprof clang_rt.memprof_cxx-${arch}-symbols)
-    add_sanitizer_rt_symbols(clang_rt.memprof
-      ARCHS ${arch}
-      EXTRA memprof.syms.extra)
-    add_dependencies(memprof clang_rt.memprof-${arch}-symbols)
-  endif()
-endforeach()
+  add_compiler_rt_runtime(clang_rt.memprof
+    STATIC
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    OBJECT_LIBS RTMemprof_preinit
+                RTMemprof
+                ${MEMPROF_COMMON_RUNTIME_OBJECT_LIBS}
+    CFLAGS ${MEMPROF_CFLAGS}
+    DEFS ${MEMPROF_COMMON_DEFINITIONS}
+    PARENT_TARGET memprof)
+
+  add_compiler_rt_runtime(clang_rt.memprof_cxx
+    STATIC
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    OBJECT_LIBS RTMemprof_cxx
+    CFLAGS ${MEMPROF_CFLAGS}
+    DEFS ${MEMPROF_COMMON_DEFINITIONS}
+    PARENT_TARGET memprof)
 
+  add_compiler_rt_runtime(clang_rt.memprof-preinit
+    STATIC
+    ARCHS ${MEMPROF_SUPPORTED_ARCH}
+    OBJECT_LIBS RTMemprof_preinit
+    CFLAGS ${MEMPROF_CFLAGS}
+    DEFS ${MEMPROF_COMMON_DEFINITIONS}
+    PARENT_TARGET memprof)
+
+  foreach(arch ${MEMPROF_SUPPORTED_ARCH})
+    if (UNIX)
+      add_sanitizer_rt_version_list(clang_rt.memprof-dynamic-${arch}
+                                    LIBS clang_rt.memprof-${arch} clang_rt.memprof_cxx-${arch}
+                                    EXTRA memprof.syms.extra)
+      set(VERSION_SCRIPT_FLAG
+          -Wl,--version-script,${CMAKE_CURRENT_BINARY_DIR}/clang_rt.memprof-dynamic-${arch}.vers)
+      set_property(SOURCE
+        ${CMAKE_CURRENT_BINARY_DIR}/dummy.cpp
+        APPEND PROPERTY
+        OBJECT_DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/clang_rt.memprof-dynamic-${arch}.vers)
+    else()
+      set(VERSION_SCRIPT_FLAG)
+    endif()
+
+    set(MEMPROF_DYNAMIC_WEAK_INTERCEPTION)
+
+    add_compiler_rt_runtime(clang_rt.memprof
+      SHARED
+      ARCHS ${arch}
+      OBJECT_LIBS ${MEMPROF_COMMON_RUNTIME_OBJECT_LIBS}
+              RTMemprof_dynamic
+              # The only purpose of RTMemprof_dynamic_version_script_dummy is to
+              # carry a dependency of the shared runtime on the version script.
+              # Replacing it with a straightforward
+              # add_dependencies(clang_rt.memprof-dynamic-${arch} clang_rt.memprof-dynamic-${arch}-version-list)
+              # generates an order-only dependency in ninja.
+              RTMemprof_dynamic_version_script_dummy
+              ${MEMPROF_DYNAMIC_WEAK_INTERCEPTION}
+      CFLAGS ${MEMPROF_DYNAMIC_CFLAGS}
+      LINK_FLAGS ${MEMPROF_DYNAMIC_LINK_FLAGS}
+                ${VERSION_SCRIPT_FLAG}
+      LINK_LIBS ${MEMPROF_DYNAMIC_LIBS}
+      DEFS ${MEMPROF_DYNAMIC_DEFINITIONS}
+      PARENT_TARGET memprof)
+
+    if (SANITIZER_USE_SYMBOLS)
+      add_sanitizer_rt_symbols(clang_rt.memprof_cxx
+        ARCHS ${arch})
+      add_dependencies(memprof clang_rt.memprof_cxx-${arch}-symbols)
+      add_sanitizer_rt_symbols(clang_rt.memprof
+        ARCHS ${arch}
+        EXTRA memprof.syms.extra)
+      add_dependencies(memprof clang_rt.memprof-${arch}-symbols)
+    endif()
+  endforeach()
+endif()
 
 if(COMPILER_RT_INCLUDE_TESTS)
   add_subdirectory(tests)
diff --git a/compiler-rt/lib/memprof/memprof_allocator.cpp b/compiler-rt/lib/memprof/memprof_allocator.cpp
index 60f5c853f9d76..5b3e18e8a750c 100644
--- a/compiler-rt/lib/memprof/memprof_allocator.cpp
+++ b/compiler-rt/lib/memprof/memprof_allocator.cpp
@@ -90,7 +90,13 @@ static int GetCpuId(void) {
   // will seg fault as the address of __vdso_getcpu will be null.
   if (!memprof_inited)
     return -1;
+#if SANITIZER_APPLE
+  // On Apple platforms, sched_getcpu is not available.
+  // TODO: better way to handle this?
+  return -100;
+#else
   return sched_getcpu();
+#endif
 }
 
 // Compute the timestamp in ms.
@@ -356,11 +362,15 @@ struct Allocator {
 
     InsertLiveBlocks();
     if (flags()->print_text) {
-      if (!flags()->print_terse)
-        Printf("Recorded MIBs (incl. live on exit):\n");
-      MIBMap.ForEach(PrintCallback,
-                     reinterpret_cast<void *>(flags()->print_terse));
-      StackDepotPrintAll();
+      if (flags()->dump_binary_access_profile_only)
+        DumpBinaryAccesses();
+      else {
+        if (!flags()->print_terse)
+          Printf("Recorded MIBs (incl. live on exit):\n");
+        MIBMap.ForEach(PrintCallback,
+                       reinterpret_cast<void *>(flags()->print_terse));
+        StackDepotPrintAll();
+      }
     } else {
       // Serialize the contents to a raw profile. Format documented in
       // memprof_rawprofile.h.
@@ -369,7 +379,11 @@ struct Allocator {
       __sanitizer::ListOfModules List;
       List.init();
       ArrayRef<LoadedModule> Modules(List.begin(), List.end());
-      u64 BytesSerialized = SerializeToRawProfile(MIBMap, Modules, Buffer);
+      u64 BytesSerialized = 0;
+      if (flags()->dump_binary_access_profile_only)
+        BytesSerialized = SerializeBinaryAccesses(Modules, Buffer);
+      else
+        BytesSerialized = SerializeToRawProfile(MIBMap, Modules, Buffer);
       CHECK(Buffer && BytesSerialized && "could not serialize to buffer");
       report_file.Write(Buffer, BytesSerialized);
     }
@@ -766,6 +780,18 @@ uptr memprof_malloc_usable_size(const void *ptr) {
   return usable_size;
 }
 
+uptr memprof_mz_size(const void *ptr) {
+  return instance.AllocationSize(reinterpret_cast<uptr>(ptr));
+}
+
+void memprof_mz_force_lock() SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
+  instance.ForceLock();
+}
+
+void memprof_mz_force_unlock() SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
+  instance.ForceUnlock();
+}
+
 } // namespace __memprof
 
 // ---------------------- Interface ---------------- {{{1
diff --git a/compiler-rt/lib/memprof/memprof_allocator.h b/compiler-rt/lib/memprof/memprof_allocator.h
index 6d898f06f7e42..11cf93fce7c08 100644
--- a/compiler-rt/lib/memprof/memprof_allocator.h
+++ b/compiler-rt/lib/memprof/memprof_allocator.h
@@ -20,13 +20,6 @@
 #include "sanitizer_common/sanitizer_allocator.h"
 #include "sanitizer_common/sanitizer_list.h"
 
-#if !defined(__x86_64__)
-#error Unsupported platform
-#endif
-#if !SANITIZER_CAN_USE_ALLOCATOR64
-#error Only 64-bit allocator supported
-#endif
-
 namespace __memprof {
 
 enum AllocType {
@@ -46,6 +39,7 @@ struct MemprofMapUnmapCallback {
   void OnUnmap(uptr p, uptr size) const;
 };
 
+#if SANITIZER_CAN_USE_ALLOCATOR64
 constexpr uptr kAllocatorSpace = ~(uptr)0;
 constexpr uptr kAllocatorSize = 0x40000000000ULL; // 4T.
 typedef DefaultSizeClassMap SizeClassMap;
@@ -64,6 +58,23 @@ template <typename AddressSpaceView>
 using PrimaryAllocatorASVT = SizeClassAllocator64<AP64<AddressSpaceView>>;
 using PrimaryAllocator = PrimaryAllocatorASVT<LocalAddressSpaceView>;
 
+#else // Fallback to SizeClassAllocator32.
+typedef CompactSizeClassMap SizeClassMap;
+template <typename AddressSpaceViewTy> struct AP32 {
+  static const uptr kSpaceBeg = 0;
+  static const u64 kSpaceSize = SANITIZER_MMAP_RANGE_SIZE;
+  static const uptr kMetadataSize = 0;
+  typedef __memprof::SizeClassMap SizeClassMap;
+  static const uptr kRegionSizeLog = 20;
+  using AddressSpaceView = AddressSpaceViewTy;
+  typedef MemprofMapUnmapCallback MapUnmapCallback;
+  static const uptr kFlags = 0;
+};
+template <typename AddressSpaceView>
+using PrimaryAllocatorASVT = SizeClassAllocator32<AP32<AddressSpaceView>>;
+using PrimaryAllocator = PrimaryAllocatorASVT<LocalAddressSpaceView>;
+#endif
+
 static const uptr kNumberOfSizeClasses = SizeClassMap::kNumClasses;
 
 template <typename AddressSpaceView>
@@ -101,6 +112,10 @@ int memprof_posix_memalign(void **memptr, uptr alignment, uptr size,
                            BufferedStackTrace *stack);
 uptr memprof_malloc_usable_size(const void *ptr);
 
+uptr memprof_mz_size(const void *ptr);
+void memprof_mz_force_lock();
+void memprof_mz_force_unlock();
+
 void PrintInternalAllocatorStats();
 
 } // namespace __memprof
diff --git a/compiler-rt/lib/memprof/memprof_flags.inc b/compiler-rt/lib/memprof/memprof_flags.inc
index 1d8e77752caa5..d2c5ebe000d7a 100644
--- a/compiler-rt/lib/memprof/memprof_flags.inc
+++ b/compiler-rt/lib/memprof/memprof_flags.inc
@@ -42,3 +42,5 @@ MEMPROF_FLAG(bool, print_terse, false,
              "if print_text = true.")
 MEMPROF_FLAG(bool, dump_at_exit, true,
              "If set, dump profiles when the program terminates.")
+MEMPROF_FLAG(bool, dump_binary_access_profile_only, false,
+             "Dump the binary access MemProf format only.")
diff --git a/compiler-rt/lib/memprof/memprof_interceptors.cpp b/compiler-rt/lib/memprof/memprof_interceptors.cpp
index f4d7fd46e6198..c87144900eace 100644
--- a/compiler-rt/lib/memprof/memprof_interceptors.cpp
+++ b/compiler-rt/lib/memprof/memprof_interceptors.cpp
@@ -18,7 +18,9 @@
 #include "memprof_stack.h"
 #include "memprof_stats.h"
 #include "sanitizer_common/sanitizer_libc.h"
+#if SANITIZER_POSIX
 #include "sanitizer_common/sanitizer_posix.h"
+#endif
 
 namespace __memprof {
 
diff --git a/compiler-rt/lib/memprof/memprof_interceptors.h b/compiler-rt/lib/memprof/memprof_interceptors.h
index 53d685706b849..5793df2c0ed45 100644
--- a/compiler-rt/lib/memprof/memprof_interceptors.h
+++ b/compiler-rt/lib/memprof/memprof_interceptors.h
@@ -40,6 +40,7 @@ DECLARE_REAL(char *, strncpy, char *to, const char *from, SIZE_T siz...
[truncated]

@snehasish
Copy link
Contributor

Thanks for sharing the draft, here are my thoughts --

  1. The Darwin sanitizer changes are numerous but as you noted, a matter of following what asan does today. I don't think there are any fundamental challenges to making it work (though I am not a CMake expert). For the commented out interceptors, we should be able to achieve the same outcome through the use of SANITIZER_APPLE macros. I noted some in the draft that you shared.

  2. Raw profile format: I suggest we incorporate the binary address data as a separate section in the existing format without any additional changes to the contents. I think this can be achieved with the following changes ---

// Format
// ---------- Header
// Magic
// Version                   <-- Increment version
// Total Size
// Segment Offset
// MIB Info Offset
// Stack Offset
// MemAddressOffset          <-- New field added to header
// ---------- Segment Info

... Existing Content ...
// ----------- MemAddress
NumMemBlockAddresses         <-- How many u64 fields to read (itself a u64) 
u64 MemBlockAddress1;
u64 MemBlockAddress2;

The segment entries can be shared. On ELF, the main binary entry is the first one by convention (first match in the PT_LOAD segment) though we can add additional handling for the main ".app" module if necessary. 

  1. llvm-profdata changes -- The pre-requisites of this step are the raw profile and the profiled binary. Here our output requirements are different. We would like to generate an indexed profile which includes static data addresses as part of the PGHO section. How about the following approach --
    a. Use llvm-profdata to convert the raw binary access profile to an indexed profile
    b. Use llvm-profdata show to dump the indexed profile in YAML format (incl. binary data)
    c. Use a python script to convert the YAML output to order file (not part of LLVM tooling)

The rationale for this approach is that we keep the llvm-profdata usage model the same, i.e. merges raw profiles to produce indexed profiles and provides utilities to show and convert file formats. This also avoids teaching llvm-profdata about order file formats. For (a) above, Mingming has already implemented the ability to specify data access profiles using symbols or content hashes so it should be straight forward to implement.

  1. Shadow memory granularity -- Yes, we should reuse the histogram granularity which was added by @mattweingarten . The "memprof-histogram" flag controls the shadow mapping during instrumentation. We'll need to refactor it a bit to decouple the shadow mapping from the histogram implementation. Adding a new flag to imply the finer granularity without enabling histogram collection seems like a reasonable approach. Emitting full histogram information results in very large raw profiles so it's best to keep them decoupled.

Let me know your thoughts and how you would like to proceed. Thanks!

@SharonXSharon
Copy link
Contributor Author

@snehasish thanks for the detailed reply! I will take a closer week through the week and get back to you.

Btw a quick question, our sister team is actually also interested in supporting memprof for Android platform, i.e. getting the access profile of the Android native libraries (inside an android app apk) during app startup , and then order the data symbols (on ELF i believe it's rodata section) for startup performance win.

The ordering part I believe they should be able to reuse some of @mingmingl-llvm's work, and the patches I just landed for cstring ordering in lld;
but for supporting memprof for android platform, although it should be able to use most of the current linux configurations, android does seem to require some specialization by looking at Asan support for android. Given it's android, wondering is there anyone in Google actually also interested in or working on this?
Thanks
Sharon

@snehasish
Copy link
Contributor

@snehasish thanks for the detailed reply! I will take a closer week through the week and get back to you.

Btw a quick question, our sister team is actually also interested in supporting memprof for Android platform, i.e. getting the access profile of the Android native libraries (inside an android app apk) during app startup , and then order the data symbols (on ELF i believe it's rodata section) for startup performance win.

The ordering part I believe they should be able to reuse some of @mingmingl-llvm's work, and the patches I just landed for cstring ordering in lld; but for supporting memprof for android platform, although it should be able to use most of the current linux configurations, android does seem to require some specialization by looking at Asan support for android. Given it's android, wondering is there anyone in Google actually also interested in or working on this? Thanks Sharon

Can you share the improvement in startup time observed on MachO? I can reach out to the Android toolchain team to gauge interest.

@SharonXSharon
Copy link
Contributor Author

Can you share the improvement in startup time observed on MachO? I can reach out to the Android toolchain team to gauge interest.

@snehasish thanks!

The improvement in startup performance we observed for iOS apps (MachO) are,

  • socialApp1: for cold starts, we reduced 80ms (3%) of the entire app startup time (time to network content) on average; 270ms (6%) reduction p95. From another perspective, we reduced 5% of the total major page faults for the entire app startup phase.
  • socialApp2: for cold starts, we reduced 54ms from app startup on average, which is around 12% of the app's premain startup time.

@snehasish
Copy link
Contributor

Can you share the improvement in startup time observed on MachO? I can reach out to the Android toolchain team to gauge interest.

@snehasish thanks!

The improvement in startup performance we observed for iOS apps (MachO) are,

  • socialApp1: for cold starts, we reduced 80ms (3%) of the entire app startup time (time to network content) on average; 270ms (6%) reduction p95. From another perspective, we reduced 5% of the total major page faults for the entire app startup phase.
  • socialApp2: for cold starts, we reduced 54ms from app startup on average, which is around 12% of the app's premain startup time.

Great. For my understanding, the baseline app includes some form of PGO e.g. temporal profiling? These numbers are purely from static data layout on top of the baseline?

@SharonXSharon
Copy link
Contributor Author

Great. For my understanding, the baseline app includes some form of PGO e.g. temporal profiling? These numbers are purely from static data layout on top of the baseline?

yes the baseline already includes the temporal profiling thus function ordering; the win above is purely from ordering the static data in the app binaries.
In fact, the win theoretically should be even higher than this, because in MachO binaries, some data symbol names are not unique at this moment, like the objc metadata symbols, so we are not ordering those at this moment. We are working on uniquing them so the order_file can identify them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category compiler-rt PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants