Skip to content

[SL-30] Updates to docs site #20

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

[SL-30] Updates to docs site #20

wants to merge 8 commits into from

Conversation

o11yguru
Copy link
Contributor

Updating docs site

@o11yguru o11yguru force-pushed the v0.8 branch 3 times, most recently from 04237d0 to a098274 Compare June 16, 2025 20:58
@o11yguru o11yguru marked this pull request as ready for review June 16, 2025 22:02
Robin-Law and others added 2 commits June 16, 2025 17:49
Minor tweaks for readability.
Copy link
Contributor

@textractor textractor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed all of the files except the AWS ones.

helm repo update
```

5. Install MDAI dependencies via Helm chart
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about this step. It doesn't give the command and is followed by a section on AWS and other stuff. Then we get step 6.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you got an answer for this yet, but it is a little confusing.

  • The red box on the screenshot is the command. What is being said is you can set up with or without self observability or AWS however we recommend you do.
  • For the sake of the quick start, it may be good for us to put a note about if you do not set up AWS, you will see an error on the mdai-s3-log-reader, like it does on the 0.8 branch of the mdai-labs, but the the core of mdai will still work even with the error.

Screenshot 2025-06-24 at 9 11 51 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual install from 0.8 on mdai-labs may help as well.


6. Verify that the cluster's pods are running.
```
kubectl get pods -n mdai
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I did this step, I got a couple of pods that couldn't be created:

mdai-s3-logs-reader-84b4659569-kvl7r 0/1 CreateContainerConfigError 0 28m
mdai-s3-logs-reader-9dc9c8764-dhzg2 0/1 CreateContainerConfigError 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually directly related to my comment above. Without setting up AWS credentials, you will receive this error, but everything should still work.


1. Apply the configuration to the hub resource.
```
kubectl apply -f ./mdai/hub/0.8/hub_guaranteed_working.yaml -n mdai
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a warning that I've not seen before: "Warning: ObserverResource observer-collector does not define a replica count" -- should we say something about ignoring that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ignore this for now. I will try getting you some info on this.


#### Install MDAI

***Note**: Make sure you can access `values.yaml` your working directory. You have have to clone the `mdai-helm-chart` repo.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove double word "have have"


>[!WARNING]
>
>Without this capability, you will not have access to our built-in, self-instrumentation that ensures visibility and accuracy of MDAI operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comma "...built-in self-instrumentation"

@@ -54,10 +53,10 @@ An OpenTelemetry collector is a component that receives and forwards telemetry d

We'll use Fluentd to capture the synthetic log streams you created, and forward them to the collector.

1. From the [MDAI Example Config repo](https://github.com/DecisiveAI/configs/blob/main/synthetics/loggen_fluent_config.yaml), copy the `loggen_fluent_config.yaml` into your working directory.
1. Your working directory should still be pointed at [MDAI Labs repo](https://github.com/DecisiveAI/mdai-labs/). Run the following command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again remove the "working directory" advice. The fewer words, the lower the cognitive stress.

README.md Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked the tweaks.


## About Self-monitoring

The MDAI Smart Telemetry Hub contains complex infrastructure. To maintain and monitor operational excellence, we have included an opt-in capability to create an understanding internal metrics, audit history of change events, and log streams affording traceability across our services and their events.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"create an understanding of internal metrics" ?


The `mdai-helm-chart` installed `mdai-operator` and `mdai-gateway` expect a destination to send their logs to, but this chart does not manage deploying the logs destination for those services.

The `mdai-operator` has the ability to manage an opinionated collector, via compatible configurations, called the `mdai-collector` (sometimes referred to as the `hub-monitor`). The `mdai-collector` receives from this fixed list of services and sends the logs to a compatible destination.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we mention what "this fixed list of services" is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about users being able to understand what this means for them in a quick start. I mentioned in another comment that this seems to be adding complexity a little early, without enough context.

@@ -33,14 +33,13 @@ But not all logs are created equal. Now that we're generating log data, let's fi

MDAI is monitoring services by a service identifier (log attribute), `mdai_service`, and a tolerance threshold over a rolling time window. When a given service surpasses that threshold, we'd like to drop non-critical data such as log lines with a level below WARN.

To do that, let's add a managed filter to `otel_config.yaml`, the collector's configuration file. Open the file for editing and look for the configuration block that looks like this:
To do that, let's add a managed filter to `otel/0.8/otel_guaranteed_working.yaml`, the collector's configuration file. Open the file for editing and look for the configuration block that looks like this:

```
# filter/service_list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the otel/0.8/otel_guaranteed_working.yaml I'm pulling from mdai_labs right now all of that is already uncommented. So is the next line below that's supposed to be commented out. So that file needs to be updated as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so when I apply this config, nothing seems to change?

> kubectl apply -f otel/0.8/otel_guaranteed_working.yaml --namespace mdai
opentelemetrycollector.opentelemetry.io/gateway unchanged

So I'm not sure the filter is actually doing anything. We used to have a filter that specified filtering out INFO level logs.

>
> 🛑 **Warning**
>
> We do not recommend using this setup for a development or greater environment -- this is truly for the quickstart. If you're in AW EKS, we recommend [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for a similar effect.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If you're in AW EKS ..." → "AWS"?


##### Create / Use `.env` file to store Access Keys

If you don't already, create a `.env` file. Then, update the `.env` file to have your credentials. It should look similar to the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"don't" → "haven't"


>[!NOTE]
>
>To stop logs from sending to s3, you will need to delete the MdaiCollector Custom Resource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the MdaiCollector Custom Resource" → "the MDAI Collector Custom Resource (CR)"

Looks like it's called "MDAI Collector" below; also below we start using "CR" so would be nice to show that.

@shalane-proctor
Copy link
Contributor

shalane-proctor commented Jun 25, 2025

@textractor Was trying to go through each of your comments on this PR, but maybe it would be better we work together on it some time this week because I'm a bit confused now and seemingly a lot has happened since your first comment. Sent you a DM, but adding here for visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants