ADR for exposing MCAD observability metrics #12

Closed
61 changes: 61 additions & 0 deletions PCF-ADR-0008-mcad-observ-metrics.md
# Project CodeFlare - MCAD Custom Metrics Emission


| | |
| -------------- | --------------------------------------------------------------------------------- |
| Date | 10/03/2023 |
| Scope | |
| Status | implementable |
| Authors | [Ronen Schaffer](@ronensc), [Rachel Brill](@rachelt44), [Eran Raichstein](@eranra)|
| Supersedes | N/A |
| Superseded by: | N/A |
| Issues | |
| Other docs: | none |

## What

Emit MCAD custom metrics such as total allocatable CPU, GPU and memory.

## Why

MCAD custom metrics are important for generating an overall, up-to-date observability view of the running app wrappers and for connecting that view to other layers of the stack.
Emitting them will also align MCAD with other existing components in OCP that expose metrics in Prometheus format, and will allow metrics to be collected across components to build cross-component context.

## Goals

* Emit MCAD custom metrics

## Non-Goals

The following are not included in this ADR:
* Emitting metrics of other components
* Connecting MCAD metrics to metrics of other components

## How

Register collected metrics with the runtime controller of the CodeFlare Operator. The metrics will be exposed in standard Prometheus format.


## Alternatives

Because the CodeFlare operator re-design enables off-the-shelf exposure of metrics, we have not considered any alternatives at this time.
Currently, the MCAD dashboard relies on external components to report this information, and there is no guarantee that the information those components report matches what MCAD itself is using.


## Stakeholder Impacts

| Group | Key Contacts | Date | Impacted? |
| ---------------------- | --------------------------------------| ---- | --------- |
| CodeFlare Operator | Anish Asthana, Antonin Stefanutti | | yes |
| CodeFlare SDK | Mustafa Eyceoz, Dimitri Saridakis | | no |
| Dashboard | Mohammed Abdi | | yes |
| MCAD | Abhishek Malvankar, Antonin Stefanutti| | yes |


## References



## Reviews

Reviews on the pull request will suffice for the approval process. At least 2 approvals are required prior to this ADR being merged. The ADR must also remain open for at least one week.