[docs][IRPGO]Document two binary formats for instrumentation-based profiles, with a focus on IRPGO. #76105

Merged
merged 14 commits into from
Jan 11, 2024

@mingmingl-llvm (Contributor) commented Dec 20, 2023

Github review tool renders the rich diff (example)

@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review December 20, 2023 22:34
mingmingl-llvm (Contributor Author) commented Dec 20, 2023

Please add anyone who might be interested in reviewing this. Feedback is very welcome, thanks!

@mingmingl-llvm mingmingl-llvm added the PGO Profile Guided Optimizations label Dec 20, 2023
llvmbot (Member) commented Dec 20, 2023

@llvm/pr-subscribers-pgo

Author: Mingming Liu (minglotus-6)

Changes

A preview of the HTML is in this link.


Full diff: https://github.com/llvm/llvm-project/pull/76105.diff

2 Files Affected:

  • (added) llvm/docs/PGOProfileFormat.rst (+387)
  • (modified) llvm/docs/UserGuides.rst (+4)
diff --git a/llvm/docs/PGOProfileFormat.rst b/llvm/docs/PGOProfileFormat.rst
new file mode 100644
index 00000000000000..5602172e147f00
--- /dev/null
+++ b/llvm/docs/PGOProfileFormat.rst
@@ -0,0 +1,387 @@
+=====================
+IRPGO Profile Format
+=====================
+
+.. contents::
+   :local:
+
+
+Overview
+==========
+
+IR-based instrumentation (IRPGO) and its context-sensitive variant (CS-IRPGO)
+insert `llvm.instrprof.*` `code generator intrinsics <https://llvm.org/docs/LangRef.html#code-generator-intrinsics>`_
+in LLVM IR to generate profiles. This document describes two binary profile
+formats (raw and indexed) used by IR-based instrumentation.
+
+.. note::
+
+  Both the compiler-rt profiling infrastructure and profile format are general
+  and could support other use cases (e.g., coverage and temporal profiling).
+  This document will focus on IRPGO while briefly introducing other use cases
+  with pointers.
+
+Raw PGO Profile Format
+========================
+
+The raw PGO profile is generated by running the instrumented binary. It is a
+memory dump of the profile data.
+
+Two kinds of frequently used profile information are a function's basic block
+counters and its (various flavors of) value profiles. A function's profiled
+information spans several sections in the profile.
+
+General Storage Layout
+-----------------------
+
+A raw profile for an executable [1]_ consists of a profile header and several
+sections. The storage layout is illustrated below. Generally, when the raw
+profile is read into a memory buffer, the actual byte offset of a section is
+inferred from the section's order in the layout and the size information of all
+sections ahead of it.
+
+::
+
+  +----+-----------------------+
+  |    |        Magic          |
+  |    +-----------------------+
+  |    |        Version        |
+  |    +-----------------------+
+  H    |   Size Info for       |
+  E    |      Section 1        |
+  A    +-----------------------+
+  D    |   Size Info for       |
+  E    |      Section 2        |
+  R    +-----------------------+
+  |    |          ...          |
+  |    +-----------------------+
+  |    |   Size Info for       |
+  |    |      Section N        |
+  +----+-----------------------+
+  P    |       Section 1       |
+  A    +-----------------------+
+  Y    |       Section 2       |
+  L    +-----------------------+
+  O    |          ...          |
+  A    +-----------------------+
+  D    |       Section N       |
+  +----+-----------------------+
+
+
+.. note::
+   Sections might be padded to meet platform-specific alignment requirements.
+   For simplicity, header fields and data sections that exist solely for padding
+   are omitted from the layout diagram above and from the rest of this document.
+
+Header
+-------
+
+``Magic``
+  With the magic number, a data consumer can detect the profile format and the
+  endianness of the data, and quickly tell whether and how to continue reading.
+
+``Version``
+  The lower 32 bits specify the actual version and the most significant 32
+  bits specify the variant types of the profile. IRPGO and CS-IRPGO are two
+  variant types.
+
+``BinaryIdsSize``
+  The byte size of the binary id section.
+
+``NumData``
+  The number of per-function profile data control structures. The byte size of
+  the profile data section can be computed from this field.
+
+``NumCounter``
+  The number of entries in the profile counter section. The byte size of the
+  counter section can be computed from this field.
+
+``NumBitmapBytes``
+  The number of bytes in the profile bitmap section.
+
+``NamesSize``
+  The number of bytes in the name section.
+
+``CountersDelta``
+  Records the in-memory address difference between the data and counter sections,
+  i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`. It's used jointly
+  with the in-memory address difference between a profile data record and its
+  counters to find the counters of that record. See calculation-of-counter-offset_
+  for details.
+
+``BitmapDelta``
+  Records the in-memory address difference between the data and bitmap sections,
+  i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`. It's used jointly
+  with the in-memory address difference between a profile data record and its
+  bitmap to find the bitmap of that record, in a way similar to how counters are
+  referenced, as explained in calculation-of-counter-offset_.
+
+``NamesDelta``
+  Records the in-memory address of the name section. Not used except for
+  raw profile reader error checking.
+
+``ValueKindLast``
+  Records the number of value kinds. As of this writing, two kinds of value
+  profiles are supported. `IndirectCallTarget` profiles the frequent callees of
+  indirect call instructions, and `MemOPSize` profiles the sizes passed to
+  memory intrinsic functions.
+
+  The number of value kinds affects the byte size of the per-function profile
+  data control structure.
+
+Payload Sections
+------------------
+
+Binary Ids
+^^^^^^^^^^^
+Stores the binary ids of the instrumented binaries to associate binaries with
+profiles for source code coverage. See `Binary Id RFC`_ for an introduction.
+
+.. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
+
+Profile Data
+^^^^^^^^^^^^^
+
+This section stores the per-function profile data control structures. The
+in-memory representation of the control structure is `__llvm_profile_data`, and
+its fields are defined by the `INSTRPROFDATA` macro. Some fields are used to
+reference data from other sections in the profile. The fields are documented as follows:
+
+``NameRef``
+  The MD5 of the function's IRPGO name. The IRPGO name has the format
+  `[<filepath>;]<linkage-name>`, where `<filepath>;` is provided for local-linkage
+  functions to tell apart functions whose names would otherwise be identical.
+
+``FuncHash``
+  A fingerprint of the function's control flow graph.
+
+``CounterPtr``
+  The in-memory address difference between profile data and its corresponding counters.
+
+``BitmapPtr``
+  The in-memory address difference between profile data and its bitmap.
+
+``FunctionPointer``
+  Records the function address when the instrumented binary runs. This is used
+  to map the profiled callee address of indirect calls to the `NameRef` during
+  conversion from raw to indexed profiles.
+
+``Values``
+  Represents value profiles in a two-dimensional array. The number of elements
+  in the first dimension is the number of instrumented value sites across all
+  kinds. Each element in the first dimension is the head of a linked list, and
+  each element in the second dimension is a linked list element carrying
+  `<profiled-value, count>` as payload. This is used by the compiler runtime
+  when writing out value profiles.
+
+``NumCounters``
+  The number of counters for the instrumented function.
+
+``NumValueSites``
+  This is an array of counters, and each counter represents the number of
+  instrumented sites for a kind of value in the function.
+
+``NumBitmapBytes``
+  The number of bitmap bytes for the function.
+
+Profile Counters
+^^^^^^^^^^^^^^^^^
+
+For IRPGO [2]_, the counters within an instrumented function are stored contiguously
+and in an order that is consistent with basic block selection in the instrumentation
+pass.
+
+.. _calculation-of-counter-offset:
+
+So how are function counters associated with a function?
+
+Basically, the profile reader iterates over the per-function control structures
+(from the profile data section) and makes use of the recorded relative
+distances, as illustrated below.
+
+::
+
+        + --> start(__llvm_prf_data) --> +---------------------+ ------------+
+        |                                |       Data 1        |             |
+        |                                +---------------------+  =====||    |
+        |                                |       Data 2        |       ||    |
+        |                                +---------------------+       ||    |
+        |                                |        ...          |       ||    |
+ Counter|                                +---------------------+       ||    |
+  Delta |                                |       Data N        |       ||    |
+        |                                +---------------------+       ||    |   CounterPtr1
+        |                                                              ||    |
+        |                                              CounterPtr2     ||    |
+        |                                                              ||    |
+        |                                                              ||    |
+        + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
+                                         |        ...          |       ||    |
+                                         +---------------------+  -----||----+
+                                         |      Counter 1      |       ||
+                                         +---------------------+       ||
+                                         |        ...          |       ||
+                                         +---------------------+  =====||
+                                         |      Counter 2      |
+                                         +---------------------+
+                                         |        ...          |
+                                         +---------------------+
+                                         |      Counter N      |
+                                         +---------------------+
+
+
+In the graph,
+
+* The profile header records `CountersDelta` with the value `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
+  We will call it `CounterDeltaInitVal` below for convenience.
+* For each profile data record, `CounterPtrN` is recorded as `start(Counter) - start(ProfileData)`.
+
+Each time the reader advances to the next data record, it decreases `CountersDelta` by the size of one `ProfileData`.
+
+For the counter corresponding to the first data record, the byte offset
+relative to the start of the counter section is calculated as `CounterPtr1 - CounterDeltaInitVal`.
+When the profile reader advances to the second data record, `CountersDelta` is now `CounterDeltaInitVal - sizeof(ProfileData)`.
+Thus the byte offset relative to the start of the counter section is calculated as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
+
+Bitmap
+^^^^^^^
+This section is used for source-based modified condition/decision coverage
+(MC/DC). Check out `Bitmap RFC`_ if interested.
+
+.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244
+
+Names
+^^^^^^
+
+This section contains the concatenated string of function IRPGO names. If the
+names are compressed, the zlib compression algorithm is used.
+
+Function names serve as keys in the PGO data hash table when raw profiles are
+converted into indexed profiles. They are also crucial for `llvm-profdata` to
+show the profiles in a human-readable way.
+
+Value Profile Data
+^^^^^^^^^^^^^^^^^^^^
+
+This section contains the profile data for value profiling.
+
+The value profiles corresponding to a profile data record are serialized
+contiguously as one record, and value profile records are stored in the same
+order as the respective profile data, so that a raw profile reader advances the
+pointer to profile data and the pointer to value profile records simultaneously
+[3]_ to find the value profiles for each per-function, per-CFG-fingerprint record.
+
+Indexed PGO Profile Format
+===========================
+
+General Storage Layout
+-----------------------
+
+::
+
+                            +-----------------------+---+
+                            |        Magic          |   |
+                            +-----------------------+   |
+                            |        Version        |   |
+                            +-----------------------+   |
+                            |        HashType       |   H
+                            +-----------------------+   E
+                    +-------|       HashOffset      |   A
+                    |       +-----------------------+   D
+                +-----------|     MemProfOffset     |   E
+                |   |       +-----------------------+   R
+                |   |       |     BinaryIdOffset    |   |
+                |   |       +-----------------------+   |
+            +---------------|      TemporalProf-    |   |
+            |   |   |       |      TracesOffset     |   |
+            |   |   |       +-----------------------+---+
+            |   |   |       |   Profile Summary     |   |
+            |   |   |       +-----------------------+   P
+            |   |   +------>|  Function PGO data    |   A
+            |   |           +-----------------------+   Y
+            |   +---------- |  MemProf profile data |   L
+            |               +-----------------------+   O
+            |               |    Binary Ids         |   A
+            |               +-----------------------+   D
+            +-------------->|  Temporal profiles    |   |
+                            +-----------------------+---+
+
+Header
+--------
+
+``Magic``
+  The purpose of the magic number is to be able to quickly tell if the profile
+  is an indexed profile.
+
+``Version``
+  Similar to the raw profile version, the lower 32 bits specify the version of
+  the indexed profile and the most significant 32 bits are reserved to specify
+  the variant types of the profile.
+
+``HashType``
+  The hashing scheme for on-disk hash table keys. Only MD5 hashing is used as of
+  writing.
+
+``HashOffset``
+  An on-disk hash table stores the per-function profile records.
+  Precisely speaking, `HashOffset` records the offset of this hash table's
+  metadata (i.e., the number of buckets and entries), which follows right after
+  the payload of the entire hash table.
+
+``MemProfOffset``
+  Records the byte offset of MemProf profiling data.
+
+``BinaryIdOffset``
+  Records the byte offset of binary id sections.
+
+``TemporalProfTracesOffset``
+  Records the byte offset of temporal profiles.
+
+Payload Sections
+------------------
+
+(CS) Profile Summary
+^^^^^^^^^^^^^^^^^^^^^
+This section comes right after the profile header. It stores the serialized
+profile summary. For context-sensitive IRPGO, this section stores an additional
+profile summary corresponding to the context-sensitive profiles.
+
+Function PGO data
+^^^^^^^^^^^^^^^^^^
+This section stores functions and their PGO profiling data as an on-disk hash
+table. The key of a hash table entry is the function's PGO name, and the
+in-memory representation of the value is a map. The key of this map is the CFG
+hash, and the value is the C++ struct `llvm::InstrProfRecord`, which collects
+profiling information such as counters and value profiles.
+
+MemProf Profile data
+^^^^^^^^^^^^^^^^^^^^^^
+This section stores functions' memory profiling data. See
+`MemProf binary serialization format RFC`_ for the design.
+
+.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
+
+Binary Ids
+^^^^^^^^^^^^^^^^^^^^^^
+This section carries over binary-id information from raw profiles.
+
+Temporal Profile Traces
+^^^^^^^^^^^^^^^^^^^^^^^^
+This section carries over temporal profile information from raw profiles.
+See `Temporal profiling RFC`_ for an overview.
+
+.. _`Temporal profiling RFC`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
+
+Profile Data Usage
+=======================================
+
+`llvm-profdata` is the command line tool to display and process profile data.
+For supported usages, check out its `documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
+
+
+.. [1] A raw profile file could contain multiple raw profiles. The raw profile
+   reader can parse all raw profiles from the file correctly.
+.. [2] The counter section is used by a few variant types (like coverage and
+   temporal profiling) and might have different semantics there.
+.. [3] The step size of the data pointer is `sizeof(ProfileData)`, and the step
+   size of the value profile pointer is calculated based on the number of
+   collected values.
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 006df613bc5e7d..14a2e161ea54cb 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -58,6 +58,7 @@ intermediate LLVM representation.
    NVPTXUsage
    Phabricator
    Passes
+   PGOProfileFormat
    ReportingGuide
    ResponseGuide
    Remarks
@@ -177,6 +178,9 @@ Optimizations
    referencing, to determine variable locations for debug info in the final
    stages of compilation.
 
+:doc:`PGOProfileFormat`
+   This document explains two binary formats of IRPGO profiles.
+
 Code Generation
 ---------------
 

@@ -0,0 +1,387 @@
=====================
IRPGO Profile Format
Contributor

IRPGO --> Instrumentation PGO. Note that Frontend PGO uses the same format.

Contributor Author

done.

Overview
==========

IR-based instrumentation (IRPGO) and its context-sensitive variant (CS-IRPGO)
Contributor

Instrumentation PGO (both IR based and Frontend based).

Contributor Author

done.

Contributor Author

and removed IRPGO only terms like (LLVM IR, basic block counters) from the doc.


.. note::

Both the compiler-rt profiling infrastructure and profile format are general
Contributor

Coverage test uses (frontend) PGO instrumentation and coverage mapping. The format for coverageMap is not included in this document. Similarly the temporal profiling is not covered here.

Contributor Author

reworded based on my understanding that "frontend PGO instrumentation profiles have two use cases, PGO and source coverage" and the input that coverage mapping has its own format. PTAL.

The raw PGO profile is generated by running the instrumented binary. It is a
memory dump of the profile data.

Two kinds of frequently used profile information are function's basic block
Contributor

The instrumented binary currently collects two kinds of profile data: ...

Contributor Author

done.

memory dump of the profile data.

Two kinds of frequently used profile information are function's basic block
counters and its (various flavors of) value profiles. A function's profiled
Contributor

The profile data for a function can span ..

Contributor Author

changed to "The profile data for a function span across several sections in the profile", given the control structure and counters are in two sections.

General Storage Layout
-----------------------

A raw profile for an executable [1]_ consists of a profile header and several
Contributor

also shared library.

Contributor Author

done.

-----------------------

A raw profile for an executable [1]_ consists of a profile header and several
sections. The storage layout is illustrated below. Generally, when raw profile
Contributor

when the raw profile ..

Contributor Author

done.


``Magic``
With the magic number, data consumer could detect profile format and
endianness of the data, and quickly tells whether/how to continue reading.
Contributor

remove 'quickly'

Contributor Author

done.

referenced as explained by calculation-of-counter-offset_ .

``NamesDelta``
Records the in-memory address of compressed name section. Not used except for
Contributor

May be uncompressed too.

Contributor Author

Removed "compressed" as whether compressed or not is not very important for the documentation of this field.

Contributor Author

@mingmingl-llvm left a comment

The file name should probably be updated to InstrumentationPGOProfileFormat. I will wait and do this later as a one-off to minimize the diff.

@@ -0,0 +1,395 @@
===================================
Member

Perhaps name this ProfileFormat.rst, since so much is shared with pure code coverage applications. It's fine that the doc currently focuses on PGO.

Contributor Author

Perhaps name this ProfileFormat.rst, since so much is shared with pure code coverage applications.

Ack. I wonder if we want to use InstrumentationProfileFormat.rst since SamplePGO uses different format.

Contributor

InstrProfileFormat.rst sounds good to me.

Contributor Author

@mingmingl-llvm Jan 2, 2024

I updated filename to InstrProfileFormat.rst in a standalone local commit

The actual changes should be visible in the commit right before it


::

+-----------------------+---+
Contributor

Should we also add a comment in the code to update this documentation when the format changes?

Contributor Author

Added a comment close to the header definition for both raw and indexed profiles.

Uses link https://llvm.org/docs/InstrProfileFormat.html assuming file name will be InstrProfileFormat.rst.

We will call it `CounterDeltaInitVal` below for convenience.
* For each profile data record, `CounterPtrN` is recorded as `start(Counter) - start(ProfileData)`.

Each time the reader advances to the next data record, it updates `CounterDelta`
Contributor

Add a link to the code (at a certain commit)?

Contributor Author

done.


.. _`documentation`: https://llvm.org/docs/CoverageMappingFormat.html

Raw Profile Format
Contributor

Highlight compatibility guarantees of Raw Profile Format.
Also mention endianness of raw profile data?

Contributor Author

Mentioned version compatibility guarantees for raw and indexed format.

And mention the endianness where Magic field for raw profile header is documented, since the Magic field is used by raw profile reader to decide whether to swap bytes.

Relatedly, created #76312 to fix one issue related to endianness.

===================

The raw profile is generated by running the instrumented binary. It is a memory
dump of the profile data.
Contributor

nit: s/profile counters/profile data/

Contributor Author

I wonder if the comment means to say 's/profile data/profile counters'?

Nevertheless, I revised this to "The raw profile data from an executable or a shared library consists of a header and multiple sections, with each section as a memory dump. The profile raw data needs to be reasonably compact and fast to generate." PTAL.

identical functions.

``FuncHash``
A fingerprint of the function's control flow graph.
Contributor

This includes CFG plus some more stuff (memory ops I think). Can you put in a link to the code?

Contributor Author

slightly reworded and added a link to computeCFGHash


Bitmap
^^^^^^^
This section is used for source-based MC/DC code coverage. Check out `Bitmap RFC`_
Contributor

Expand MC/DC?

Contributor Author

done.

Overview
=========

Instrumentation PGO inserts `llvm.instrprof.*` `code generator intrinsics`_
Contributor

Contributor Author

done.

``FuncHash``
A fingerprint of the function's control flow graph.

``CounterPtr``
Contributor

Is it the relative distance (offset) in bytes between the function counter and the start of the counter section?

Contributor

The description is correct. (My comment was based on old implementation before recent changes).

Contributor Author

With the recent binary profile correlation effort from @ZequanWu, CounterPtr records the address of counters, if I'm reading it correctly.

I updated the documentation to point out fields that might have different ways of interpretation. PTAL.

Contributor

CounterPtr is still a relative address (__profc_foo - __profd_foo) in default mode. Under binary profile correlation mode, it will just be the absolute address of the counter __profc_foo.


Basically, the profile reader iterates per-function control structure (from the
profile data section) and makes use of the recorded relative distances, as
illustrated below.
Contributor

It is clearer to use an equation: CounterOffset(Func) = Data(Func).CounterPtr + Counter_Delta.

Contributor

I saw some equation below.


Function PGO data
^^^^^^^^^^^^^^^^^^
This section stores functions and their PGO profiling data as an on-disk hash
Contributor

Profile data for functions with the same name are grouped together and share one hash table entry (the functions may come from different shared libraries, for instance). The profile data for them are organized as a sequence of key-value pairs, where the key is the funcHash (CFG-based for IR PGO) and the value is the profile counters for the function.

Contributor Author

Added this.
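
The grouping described in this thread can be pictured as a two-level lookup: function name first, then funcHash. The sketch below is only an in-memory illustration of that idea, not the on-disk hash table layout; `build_index` and `lookup` are hypothetical helper names, and the sample names, hashes, and counter values are made up.

```python
from collections import defaultdict


def build_index(records):
    """Group (name, func_hash, counters) records by function name.

    Each name maps to a dict keyed by func_hash, so same-named
    functions with different CFG fingerprints stay distinct.
    """
    index = defaultdict(dict)
    for name, func_hash, counters in records:
        index[name][func_hash] = counters
    return index


def lookup(index, name, func_hash):
    """Return the counters for (name, func_hash), or None if absent."""
    return index.get(name, {}).get(func_hash)


# Two functions named "foo" (e.g. from different shared libraries)
# share one entry but are told apart by their hashes.
index = build_index([
    ("foo", 0x1, [10, 2]),
    ("foo", 0x2, [7]),
    ("bar", 0x1, [3]),
])
```

A lookup with a name that exists but a hash that does not match returns `None` here, which loosely mirrors the case where a function's CFG changed after the profile was collected.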

* Some rewording as suggested.
* Add link to code at a specific commit at a few places.
* Mention it explicitly when fields might have different semantics in
  non-IRPGO case.
* Mention version compatibility guarantees explicitly for both formats,
  and add more details on endianness handling for raw profiles.
* Add code comment to ask for doc update if appropriate.
Some fields are used to reference data from other sections in the profile.
The fields are documented as follows:

.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/lib/profile/InstrProfiling.h#L25
Contributor

/* INSTR_PROF_DATA start. */
/* Definition of member fields of the per-function control structure. */
#ifndef INSTR_PROF_DATA
#define INSTR_PROF_DATA(Type, LLVMType, Name, Initializer)
#else
#define INSTR_PROF_DATA_DEFINED
#endif
INSTR_PROF_DATA(const uint64_t, llvm::Type::getInt64Ty(Ctx), NameRef, \
ConstantInt::get(llvm::Type::getInt64Ty(Ctx), \
IndexedInstrProf::ComputeHash(getPGOFuncNameVarInitializer(Inc->getName()))))
INSTR_PROF_DATA(const uint64_t, llvm::Type::getInt64Ty(Ctx), FuncHash, \
ConstantInt::get(llvm::Type::getInt64Ty(Ctx), \
Inc->getHash()->getZExtValue()))
INSTR_PROF_DATA(const IntPtrT, IntPtrTy, CounterPtr, RelativeCounterPtr)
INSTR_PROF_DATA(const IntPtrT, IntPtrTy, BitmapPtr, RelativeBitmapPtr)
/* This is used to map function pointers for the indirect call targets to
* function name hashes during the conversion from raw to merged profile
* data.
*/
INSTR_PROF_DATA(const IntPtrT, llvm::PointerType::getUnqual(Ctx), FunctionPointer, \
FunctionAddr)
INSTR_PROF_DATA(IntPtrT, llvm::PointerType::getUnqual(Ctx), Values, \
ValuesPtrExpr)
INSTR_PROF_DATA(const uint32_t, llvm::Type::getInt32Ty(Ctx), NumCounters, \
ConstantInt::get(llvm::Type::getInt32Ty(Ctx), NumCounters))
INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
ConstantArray::get(Int16ArrayTy, Int16ArrayVals)) \
INSTR_PROF_DATA(const uint32_t, llvm::Type::getInt32Ty(Ctx), NumBitmapBytes, \
ConstantInt::get(llvm::Type::getInt32Ty(Ctx), NumBitmapBytes))
#undef INSTR_PROF_DATA
/* INSTR_PROF_DATA end. */
might be a better link as it shows the stuff inside profile data.

Contributor Author

updated the link, thanks!

+ --> start(__llvm_prf_cnts) --> +---------------------+ || |
| ... | || |
+---------------------+ -----||----+
| Counter 1 | ||
Contributor

For "Counter 1", I think you mean the array of counters for data 1. Maybe rename to "Counters for Data 1" for clarity.

Contributor Author

done.

.. _calculation-of-counter-offset:

As mentioned above, the recorded counter offset is relative to the profile metadata.
So how are function counters associated with the profiled function?
Contributor

Perhaps change the question to: "how are function counters located in the raw profile data"?

Contributor Author

done.


* The profile header records `CounterDelta` with the value as `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
We will call it `CounterDeltaInitVal` below for convenience.
* For each profile data record, `CounterPtrN` is recorded as `start(Counter) - start(ProfileData)`.
Contributor

Counter --> CounterN, ProfileData --> ProfileDataN.

Also describe that DataN is the Nth entry in __llvm_prf_data, and CounterN is the corresponding profile counters.

Contributor Author

done.

@mingmingl-llvm
Contributor Author

Thanks for all the reviews! I'll wait a little longer for review comments, and plan to submit early Wednesday if the pull request looks good.

P.S. I'm likely late to the party, but I just realized the GitHub UI displays a rich diff (like https://github.com/llvm/llvm-project/pull/76105/files?short_path=b1805ae#diff-b1805ae3bd5b5cf0c69249b7df329ef8dd7cbfe322ae004c61f4bd507d2a87e6) with a click for review, which means the workaround is not necessary.

Contributor

@snehasish snehasish left a comment

lgtm with some minor nits.


* The profile header records `CounterDelta` with the value as `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
We will call it `CounterDeltaInitVal` below for convenience.
* For each profile data record `ProileDataN`, `CounterPtr` is recorded as `start(CounterN) - start(ProfileDataN)`,
Contributor

typo: ProfileDataN

Contributor Author

thanks for the catch. Fixed it.
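
The two definitions quoted in this thread are enough to work out where each function's counters sit relative to `start(__llvm_prf_cnts)`. Since `start(ProfileDataN) = start(__llvm_prf_data) + N * sizeof(record)`, subtracting the definitions gives `start(CounterN) - start(__llvm_prf_cnts) = CounterPtr_N - (CounterDeltaInitVal - N * sizeof(record))`. Below is a minimal sketch of that arithmetic for the default (non-correlation) mode only; `counter_offsets`, `DataRecord`, and the 64-byte record size are assumptions made for illustration, and the addresses are invented.

```python
from dataclasses import dataclass


@dataclass
class DataRecord:
    # start(CounterN) - start(ProfileDataN), as recorded per function.
    counter_ptr: int


def counter_offsets(counter_delta_init, records, record_size):
    """Return each function's counter offset from start(__llvm_prf_cnts).

    counter_delta_init is start(__llvm_prf_cnts) - start(__llvm_prf_data).
    The running delta shrinks by one record size per entry, because
    each CounterPtr is relative to its own data record.
    """
    offsets = []
    delta = counter_delta_init
    for record in records:
        offsets.append(record.counter_ptr - delta)
        delta -= record_size
    return offsets


# Made-up layout: data section at 0x1000, counters at 0x5000,
# 64-byte data records (assumed), first function has two 8-byte counters.
RECORD_SIZE = 64
data_start, cnts_start = 0x1000, 0x5000
records = [
    DataRecord(counter_ptr=(cnts_start + 0) - (data_start + 0 * RECORD_SIZE)),
    DataRecord(counter_ptr=(cnts_start + 16) - (data_start + 1 * RECORD_SIZE)),
]
offsets = counter_offsets(cnts_start - data_start, records, RECORD_SIZE)
```

With this layout the second function's counters land 16 bytes into the counter section, directly after the first function's two counters.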

===========================

Indexed profiles are generated from `llvm-profdata`. In the indexed profiles,
function PGO data are organized as on-disk hash table such that compilers could
Contributor

nit: s/could/can

Contributor Author

done.

function PGO data are organized as on-disk hash table such that compilers could
look up PGO data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed PGO profiles.
Contributor

Spell it out? i.e. "older profiles must be readable by newer tools" or something like that...

Contributor Author

done.


Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
The section is used to carry on binary-id information from raw profiles.
Contributor

Contributor Author

done.

1. Use <mangled-name> in IRPGO name format given the recent fix in pull
   request 76994.
2. In UserGuides.html, use 'instrumentation-based profiles' (not IRPGO
   profiles) to keep consistent with filename.
@mingmingl-llvm mingmingl-llvm changed the title from "[docs][IRPGO]Document two binary formats for IRPGO profiles" to "[docs][IRPGO]Document two binary formats for instrumentation-based profiles, with a focus on IRPGO." on Jan 9, 2024
@@ -123,6 +123,8 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \

/* INSTR_PROF_RAW_HEADER start */
/* Definition of member fields of the raw profile header data structure. */
/* Please update https://llvm.org/docs/InstrProfileFormat.html as appropriate
Member

llvm/docs/InstrProfileFormat.rst

Contributor Author

done.


The raw profile is generated by running the instrumented binary. The raw profile
data from an executable or a shared library [3]_ consists of a header and
multiple sections, with each section as a memory dump. The profile raw data needs
Member

raw profile data

Contributor Author

done.

.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558

To feed profiles back into compilers for an optimized build (e.g., via
`-fprofile-use` for IR instrumentation), a raw profile must to be converted into
Member

@MaskRay MaskRay Jan 10, 2024

double backticks

Contributor Author

done.


``CountersDelta``
This field records the in-memory address difference between the `profile metadata`_
and counter section in the instrumented binary, i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
Member

double backticks

Consider a regex that searches for a single backtick that is not followed by a _. There is a high probability that double backticks should be used instead.

Contributor Author

done.

and counter section in the instrumented binary, i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.

It's used jointly with the `CounterPtr`_ field to compute the counter offset
relative to `start(__llvm_prf_cnts)`. Check out calculation-of-counter-offset_
Member

double backticks

Contributor Author

done.

for a visualized explanation.

.. note::
Instrumentations might not load the `__llvm_prf_data` object file section
Member

double backticks

Contributor Author

done, and slightly reworded the sentence.


``BitmapDelta``
This field records the in-memory address difference between the `profile metadata`_
and bitmap section in the instrumented binary, i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.
Member

double backticks

Contributor Author

done.


.. note::
Frontend-generated profiles are used together with coverage mapping for
`source based code coverage`_. The `coverage mapping format`_ is different from
Member

source-based

Contributor Author

done.

``Magic``
Magic number encodes profile format (raw, indexed or text). For the raw format,
the magic number also encodes the endianness (big or little) and C pointer
byte size (32 or 64) of the platform on which the profile is generated.
Member

"C pointer byte size (32 or 64)" can be misleading: 32/64 bytes?

Consider using C pointer size or just pointer size.

... encodes the endianness (big or little) and the pointer size (4 or 8 bytes).

Contributor Author

done.
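
To make the point about the magic number concrete, here is a rough sketch of how a reader could recover endianness and pointer size from the first 8 bytes of a raw profile. The two constants are an assumption: they are believed to match `INSTR_PROF_RAW_MAGIC_64` and `INSTR_PROF_RAW_MAGIC_32` in `InstrProfData.inc` but are not quoted anywhere in this review, and `classify_raw_magic` is a hypothetical helper, not an LLVM API.

```python
# Assumed magic values (not quoted in this thread): the two differ only
# in one byte ('r' for 8-byte pointers, 'R' for 4-byte pointers).
MAGIC_64 = int.from_bytes(b"\xfflprofr\x81", "big")
MAGIC_32 = int.from_bytes(b"\xfflprofR\x81", "big")


def classify_raw_magic(first8: bytes):
    """Return (byte_order, pointer_size_in_bytes), or None if unrecognized.

    The 8 bytes are interpreted in both byte orders; whichever order
    yields a known magic value is the producer's endianness.
    """
    if len(first8) != 8:
        raise ValueError("expected exactly 8 bytes")
    for order in ("little", "big"):
        value = int.from_bytes(first8, order)
        if value == MAGIC_64:
            return order, 8
        if value == MAGIC_32:
            return order, 4
    return None


# A profile written on a little-endian machine with 8-byte pointers.
header_prefix = MAGIC_64.to_bytes(8, "little")
detected = classify_raw_magic(header_prefix)
```

Returning the pointer size in bytes (4 or 8) follows the wording suggested in this thread.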

a platform with the opposite endianness and/or the other C pointer byte size.

``Version``
The lower 32 bits specifies the actual version and the most significant 32
Member

specify

Contributor Author

done (here and below for indexed profile version field)
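
For readers skimming this thread, the split of the 64-bit `Version` field into its lower and upper 32 bits can be sketched as below. The quoted sentence is cut off before it says what the upper bits carry, so the sketch does not interpret them; `split_version` is a hypothetical helper and the sample value is invented.

```python
def split_version(version_field: int):
    """Split a 64-bit Version field into (upper_32_bits, lower_32_bits).

    Per the reviewed text, the lower 32 bits hold the actual version
    number; the meaning of the upper bits is left uninterpreted here.
    """
    lower = version_field & 0xFFFFFFFF
    upper = (version_field >> 32) & 0xFFFFFFFF
    return upper, lower


# Invented example: upper half 0x5, version number 9.
upper, lower = split_version((0x5 << 32) | 9)
```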


.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457

Indexed PGO Profile Format
Member

Indexed Profile Format

Since this is shared with coverage, omitting "PGO" seems clearer.

Contributor Author

done (here and in a couple of places below)

| | | | +-----------------------+---+
| | | | | Profile Summary | |
| | | | +-----------------------+ P
| | +------>| Function PGO data | A
Member

@MaskRay MaskRay Jan 10, 2024

Function Data

(omit "PGO")

Contributor Author

done.

summary. For context-sensitive IR-based instrumentation PGO, this section stores
an additional profile summary corresponding to the context-sensitive profiles.

Function PGO data
Member

Function Data

This is shared with coverage, so we probably want to de-emphasize "PGO".

Contributor Author

done.

@@ -58,6 +58,7 @@ intermediate LLVM representation.
NVPTXUsage
Phabricator
Passes
InstrProfileFormat
Member

This is alphabetically ordered.

Contributor Author

fixed it.

@mingmingl-llvm mingmingl-llvm merged commit 66981f9 into llvm:main Jan 11, 2024
@mingmingl-llvm mingmingl-llvm deleted the pgoformat branch January 11, 2024 00:17
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Labels
compiler-rt PGO Profile Guided Optimizations

6 participants