diff --git a/ci/vale/dictionary.txt b/ci/vale/dictionary.txt index 1f4c79bf792..faa3c995d2c 100644 --- a/ci/vale/dictionary.txt +++ b/ci/vale/dictionary.txt @@ -1266,6 +1266,7 @@ KPI KPIs krita kroll +KServe ksmbd KStream KStreams @@ -1278,6 +1279,7 @@ kubeconfig Kubecost kubectl kubectx +Kubeflow kubeflow kubelet kubelets @@ -1359,6 +1361,8 @@ Livestatus livigno lke lksemel +LLM +LLMs lmctfy lo0 loadavg @@ -1498,6 +1502,7 @@ Microweber middlebox middleboxes middleware +Milvus mimikatz minecraft mineshafts diff --git a/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/index.md b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/index.md new file mode 100644 index 00000000000..99407a340a3 --- /dev/null +++ b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/index.md @@ -0,0 +1,459 @@ +--- +slug: deploying-rabbitmq-on-a-linode +title: "Deploying RabbitMQ on a Linode" +description: "Learn how to install and configure RabbitMQ on a Linode instance. This guide covers setting up the message broker, enabling management tools, and testing message queues." +authors: ["Akamai"] +contributors: ["Akamai"] +published: 2025-02-11 +keywords: ['rabbitmq','rabbitmq installation','install rabbitmq','rabbitmq setup','rabbitmq ubuntu 24.04','deploy rabbitmq'] +license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)' +external_resources: +- '[RabbitMQ Plugins](https://www.rabbitmq.com/docs/plugins)' +- '[RabbitMQ Management CLI](https://www.rabbitmq.com/docs/management-cli)' +- '[RabbitMQ Deployment Checklist](https://www.rabbitmq.com/docs/production-checklist)' +--- + +RabbitMQ is an open source message broker that facilitates communication between distributed applications. This guide covers steps for manually installing, configuring, and testing RabbitMQ on a Linode instance running Ubuntu 24.04 LTS. + +If you prefer an automated deployment, consider our [RabbitMQ Marketplace app](/docs/marketplace-docs/guides/rabbitmq/). + +## Before You Begin + +1. If you do not already have a virtual machine to use, create a Compute Instance with at least 2 GB of memory running Ubuntu 24.04 LTS. For resources and instructions on deploying an instance using Cloud Manager, see our [Get Started](https://techdocs.akamai.com/cloud-computing/docs/getting-started) and [Create a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/create-a-compute-instance) guides. + + {{< note title="Provisioning Compute Instances with the Linode CLI" type="secondary" isCollapsible="true" >}} + Use these steps if you prefer to use the Linode CLI to provision resources. + + The following command creates a Linode 2 GB compute instance (`g6-standard-1`) running Ubuntu 24.04 LTS (`linode/ubuntu24.04`) in the Miami datacenter (`us-mia`): + + ```command + linode-cli linodes create \ + --image linode/ubuntu24.04 \ + --region us-mia \ + --type g6-standard-1 \ + --root_pass '{{< placeholder "PASSWORD" >}}' \ + --authorized_keys "$(cat ~/.ssh/id_ed25519.pub)" \ + --label rabbitmq-linode + ``` + + Note the following key points: + + - Replace `region` as desired. + - Replace {{< placeholder "PASSWORD" >}} with a secure alternative for your root password. + - This command assumes that an SSH public/private key pair exists, with the public key stored as `id_ed25519.pub` in the user’s `$HOME/.ssh` folder. + - The `--label` argument specifies the name of the new server (`rabbitmq-linode`). + {{< /note >}} + +1. 
Follow our [Set Up and Secure a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/set-up-and-secure-a-compute-instance) guide to update your system and create a limited user account. You may also wish to set the timezone, configure your hostname, and harden SSH access. + +{{< note >}} +This guide is written for a non-root user. Commands that require elevated privileges are prefixed with `sudo`. If you’re not familiar with the `sudo` command, see the [Users and Groups](/docs/guides/linux-users-and-groups/) guide. +{{< /note >}} + +## Install RabbitMQ as a Service + +RabbitMQ offers an [installation script](https://www.rabbitmq.com/docs/install-debian#apt-quick-start-cloudsmith) for Ubuntu 24.04 LTS. This script uses the latest versions of Erlang supported by RabbitMQ along with the latest version of the server itself. + +1. SSH into your instance as a user with `sudo` privileges: + + ```command + ssh {{< placeholder "USERNAME" >}}@{{< placeholder "IP-ADDRESS" >}} + ``` + +1. Using a text editor such as `nano`, create a file called `install-rabbitmq.sh`: + + ``` + nano install-rabbitmq.sh + ``` + + Paste the code snippet for the Ubuntu 24.04 LTS installation script into the file: + + ```file {title="install-rabbitmq.sh"} + #!/bin/sh + + sudo apt-get install curl gnupg apt-transport-https -y + + ## Team RabbitMQ's main signing key + curl -1sLf "https://keys.openpgp.org/vks/v1/by-fingerprint/0A9AF2115F4687BD29803A206B73A36E6026DFCA" | sudo gpg --dearmor | sudo tee /usr/share/keyrings/com.rabbitmq.team.gpg > /dev/null + ## Community mirror of Cloudsmith: modern Erlang repository + curl -1sLf https://github.com/rabbitmq/signing-keys/releases/download/3.0/cloudsmith.rabbitmq-erlang.E495BB49CC4BBE5B.key | sudo gpg --dearmor | sudo tee /usr/share/keyrings/rabbitmq.E495BB49CC4BBE5B.gpg > /dev/null + ## Community mirror of Cloudsmith: RabbitMQ repository + curl -1sLf https://github.com/rabbitmq/signing-keys/releases/download/3.0/cloudsmith.rabbitmq-server.9F4587F226208342.key | sudo gpg --dearmor | sudo tee /usr/share/keyrings/rabbitmq.9F4587F226208342.gpg > /dev/null + + ## Add apt repositories maintained by Team RabbitMQ + sudo tee /etc/apt/sources.list.d/rabbitmq.list <CTRL+X, followed by Y then Enter to save the file and exit `nano`. + +1. Run the script: + + ```command + source ./install-rabbitmq.sh + ``` + +1. Your instance should now have the latest version of the RabbitMQ server running as a systemd service. Verify this with the following command: + + ```command + systemctl status rabbitmq-server + ``` + + Output containing `active (running)` indicates that the service is enabled and running: + + ```output + ● rabbitmq-server.service - RabbitMQ broker + Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; preset: enabled) + Active: active (running) since Mon 2025-02-10 13:32:01 EST; 17min ago + Main PID: 2120 (beam.smp) + Tasks: 25 (limit: 2276) + Memory: 74.5M (peak: 88.8M) + CPU: 3.317s + CGroup: /system.slice/rabbitmq-server.service + ``` + +1. RabbitMQ supplies a client that allows direct access to the server when connecting from `localhost`. To further verify that the installation was successful and configured as desired, run the following: + + ```command + sudo rabbitmq-diagnostics status + ``` + + This prints a list of diagnostic information about the server such as CPU and memory usage, as well as locations of the logs and configuration files on the system. + + ```output + Status of node rabbit@rabbitmq-ubuntu-2404-1 ... 
+ [] + Runtime + + OS PID: 2120 + OS: Linux + Uptime (seconds): 1217 + Is under maintenance?: false + RabbitMQ version: 4.0.5 + ... + Memory + + Total memory used: 0.0983 gb + Calculation strategy: rss + Memory high watermark setting: 0.6 of available memory, computed to: 1.2382 gb + ... + Totals + + Connection count: 0 + Queue count: 0 + Virtual host count: 1 + + Listeners + + Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication + Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 + ``` + +### Starting and Stopping RabbitMQ + +RabbitMQ requires no additional configuration upon installation. While not required, configuration files can be stored in `/etc/rabbitmq`. See [RabbitMQ's official documentation](https://www.rabbitmq.com/docs/configure) for more information on configuration options. + +The RabbitMQ server can be controlled via systemd-managed services. For example: + +- Use `systemctl` to stop the RabbitMQ server: + + ```command + systemctl stop rabbitmq-server + ``` + +- Use `systemctl` to start the RabbitMQ server: + + ```command + systemctl start rabbitmq-server + ``` + +- Use `journalctl` to view the server logs: + + ```command + journalctl -u rabbitmq-server + ``` + +## Testing RabbitMQ + +1. To test the RabbitMQ deployment, first enable the RabbitMQ management plugin: + + ```command + rabbitmq-plugins enable rabbitmq_management + ``` + + ```output + Enabling plugins on node rabbit@rabbitmq-ubuntu-2404-1: + rabbitmq_management + The following plugins have been configured: + rabbitmq_management + rabbitmq_management_agent + rabbitmq_web_dispatch + Applying plugin configuration to rabbit@rabbitmq-ubuntu-2404-1... + The following plugins have been enabled: + rabbitmq_management + rabbitmq_management_agent + rabbitmq_web_dispatch + + started 3 plugins. + ``` + +1. Next, download the management script, which is available directly from `localhost` after enabling the plugin: + + ```command + wget http://localhost:15672/cli/rabbitmqadmin + ``` + + ```output + Resolving localhost (localhost)... ::1, 127.0.0.1 + Connecting to localhost (localhost)|::1|:15672... failed: Connection refused. + Connecting to localhost (localhost)|127.0.0.1|:15672... connected. + HTTP request sent, awaiting response... 200 OK + Length: 42630 (42K) [application/octet-stream] + Saving to: ‘rabbitmqadmin’ + + rabbitmqadmin 100%[===================>] 41.63K --.-KB/s in 0.002s + + 2025-02-10 14:14:04 (24.9 MB/s) - ‘rabbitmqadmin’ saved [42630/42630] + ``` + +1. Make the script executable, and move it to a location included in the environment `PATH`: + + ```command + chmod +x rabbitmqadmin + sudo mv rabbitmqadmin /usr/local/bin/ + ``` + +### Create An Exchange and Queue + +This guide demonstrates creating a [fanout exchange](https://www.rabbitmq.com/tutorials/amqp-concepts#exchange-fanout), which "routes messages to all of the queues that are bound to it". A fanout closely resembles the pub/sub pattern and is typically used for broadcasting messages. + +See RabbitMQ's official documentation for more on exchanges and queues: [RabbitMQ Tutorials](https://www.rabbitmq.com/tutorials) + +1. Create a `fanout` style exchange on the RabbitMQ server with the following: + + ```command + sudo rabbitmqadmin declare exchange \ + name=test_fanout_exchange \ + type=fanout + ``` + + ```output + exchange declared + ``` + +1. 
Create a queue to attach to this exchange to hold messages: + + ```command + sudo rabbitmqadmin declare queue \ + name=fanout_queue \ + durable=true + ``` + + ```output + queue declared + ``` + +1. Bind the queue to the exchange: + + ```command + sudo rabbitmqadmin declare binding \ + source=test_fanout_exchange \ + destination=fanout_queue + ``` + + ```output + binding declared + ``` + +### Test Message Publishing and Retrieval + +1. Publish a message to the exchange (and bound queue): + + ```command + sudo rabbitmqadmin publish \ + exchange=test_fanout_exchange \ + routing_key=dummy_key \ + payload="Hello, world!" + ``` + + ```output + Message published + ``` + + {{< note >}} + The routing key is not necessary for a fanout exchange, as each message is routed to each queue regardless of the routing key. However, it is required for the `rabbitmqadmin` tool. + {{< /note >}} + +1. Retrieve the messages from the queue: + + ```command + sudo rabbitmqadmin get queue=fanout_queue + ``` + + ```output + +-------------+----------------------+---------------+---------------+---------------+------------------+------------+-------------+ + | routing_key | exchange | message_count | payload | payload_bytes | payload_encoding | properties | redelivered | + +-------------+----------------------+---------------+---------------+---------------+------------------+------------+-------------+ + | dummy_key | test_fanout_exchange | 0 | Hello, world! | 13 | string | | False | + +-------------+----------------------+---------------+---------------+---------------+------------------+------------+-------------+ + ``` + +## The RabbitMQ Web Interface + +The RabbitMQ management plugin enables a web interface and API accessible at port `15672`. Assuming this port is not blocked by any firewall rules, you can access the web interface in your browser by visiting the following URL, replacing {{< placeholder "IP_ADDRESS" >}} with the IP of your Linode instance: + +```command +http://{{< placeholder "IP_ADDRESS" >}}:15672 +``` + +![Web browser accessing RabbitMQ management interface via port 15672 on a Linode Compute Instance.](rabbitmq-web-interface.png) + +By default, RabbitMQ is initiated with a default [virtual host](https://www.rabbitmq.com/docs/vhosts) and a [default administrative user](https://www.rabbitmq.com/docs/access-control#default-state) with username `guest` (and password `guest`). However, this user can only connect to the management interface from `localhost`. To connect to RabbitMQ remotely, a new user must be created. + +### Create a New RabbitMQ Management User + +1. Use the `rabbitmqctl add_user` command and provide a username and password: + + ```command + sudo rabbitmqctl add_user "{{< placeholder "RABBITMQ_USERNAME" >}}" "{{< placeholder "RABBITMQ_PASSWORD" >}}" + ``` + + ```output + Adding user "{{< placeholder "RABBITMQ_USERNAME" >}}" ... + Done. Don't forget to grant the user permissions to some virtual hosts! See 'rabbitmqctl help set_permissions' to learn more. + ``` + +1. Add the `administrator` tag to the newly created user, giving them management privileges. + + ```command + sudo rabbitmqctl set_user_tags {{< placeholder "RABBITMQ_USERNAME" >}} administrator + ``` + + ```output + Setting tags for user "{{< placeholder "RABBITMQ_USERNAME" >}}" to [administrator\] ... + ``` + +### Set Permissions for the User on the Virtual Host + +1. 
Verify the name of the existing virtual host: + + ```command + sudo rabbitmqctl -q --formatter=pretty_table list_vhosts name description + ``` + + The default virtual host is named `/`: + + ```output + ┌──────┬──────────────────────┐ + │ name │ description │ + ├──────┼──────────────────────┤ + │ / │ Default virtual host │ + └──────┴──────────────────────┘ + ``` + +1. Grant permissions to the newly created user on this virtual host: + + ```command + sudo rabbitmqctl set_permissions -p "/" "{{< placeholder "RABBITMQ_USERNAME" >}}" ".*" ".*" ".*" + ``` + + ```output + Setting permissions for user "{{< placeholder "RABBITMQ_USERNAME" >}}" in vhost "/" ... + ``` + +### Access the RabbitMQ Management Interface Remotely + +Return to the management console UI in a web browser, and log in with the credentials of the newly created user: + +![RabbitMQ management interface login screen displaying username and password fields.](rabbitmq-login-screen.png) + +After logging in, the **Overview** page displays metrics about the currently running RabbitMQ instance: + +![RabbitMQ management dashboard overview displaying server metrics, queue status, and connection details.](rabbitmq-dashboard-overview.png) + +### Send Test Requests to the RabbitMQ API + +1. Test publishing a message to an exchange using `curl` to send an authenticated request to the RabbitMQ API: + + ```command + curl \ + -u {{< placeholder "RABBITMQ_USERNAME" >}}:{{< placeholder "RABBITMQ_PASSWORD" >}} \ + -H "Content-Type: application/json" \ + -X POST \ + -d '{"properties":{},"routing_key":"dummy_key","payload":"Hello, curl!","payload_encoding":"string"}' \ + http://{{< placeholder "IP_ADDRESS" >}}:15672/api/exchanges/%2f/test_fanout_exchange/publish + ``` + + ```output + {"routed":true} + ``` + + {{< note >}} + The `%2f` in the request URL is the URL-encoded value for the name of the exchange (`/`). + {{< /note >}} + +1. 
Now send an authenticated request to retrieve the last two messages from the queue: + + ```command + curl \ + -u {{< placeholder "RABBITMQ_USERNAME" >}}:{{< placeholder "RABBITMQ_PASSWORD" >}} \ + -H "Content-type:application/json" \ + -X POST \ + -d '{"count":2,"ackmode":"ack_requeue_true","encoding":"auto"}' \ + http://{{< placeholder "IP_ADDRESS" >}}:15672/api/queues/%2f/fanout_queue/get | json_pp + ``` + + ```output + [ + { + "exchange" : "test_fanout_exchange", + "message_count" : 1, + "payload" : "Hello, world!", + "payload_bytes" : 13, + "payload_encoding" : "string", + "properties" : [], + "redelivered" : true, + "routing_key" : "dummy_key" + }, + { + "exchange" : "test_fanout_exchange", + "message_count" : 0, + "payload" : "Hello, curl!", + "payload_bytes" : 12, + "payload_encoding" : "string", + "properties" : [], + "redelivered" : false, + "routing_key" : "dummy_key" + } + ] + ``` \ No newline at end of file diff --git a/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-dashboard-overview.png b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-dashboard-overview.png new file mode 100644 index 00000000000..1b19983aa29 Binary files /dev/null and b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-dashboard-overview.png differ diff --git a/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-login-screen.png b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-login-screen.png new file mode 100644 index 00000000000..a8092caa848 Binary files /dev/null and b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-login-screen.png differ diff --git a/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-queue-message-retrieval.png b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-queue-message-retrieval.png new file mode 100644 index 00000000000..0be6067fe3a Binary files /dev/null and b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-queue-message-retrieval.png differ diff --git a/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-web-interface.png b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-web-interface.png new file mode 100644 index 00000000000..51730278ddf Binary files /dev/null and b/docs/guides/applications/messaging/deploying-rabbitmq-on-a-linode/rabbitmq-web-interface.png differ diff --git a/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/ai_rag_chatbot_implementation.png b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/ai_rag_chatbot_implementation.png new file mode 100644 index 00000000000..17eb7b5c7f2 Binary files /dev/null and b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/ai_rag_chatbot_implementation.png differ diff --git a/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/index.md b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/index.md new file mode 100644 index 00000000000..180ba339e1b --- /dev/null +++ b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/index.md @@ -0,0 +1,588 @@ +--- +slug: ai-chatbot-and-rag-pipeline-for-inference-on-lke +title: "Deploy an AI Chatbot and RAG Pipeline for Inferencing on LKE" +description: "Utilize the Retrieval-Augmented Generation technique to supplement an LLM with your own custom data." 
+authors: ["Linode"] +contributors: ["Linode"] +published: 2025-02-11 +keywords: ['kubernetes','lke','ai','inferencing','rag','chatbot','architecture'] +tags: ["kubernetes","lke"] +license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)' +--- + +## Overview + +LLMs (Large Language Models) are increasingly used to power chatbots or other knowledge assistants. While these models are pre-trained on vast swaths of information, they are not trained on your own private data or knowledge base. To overcome this, you need to provide this data to the LLM (a process called context augmentation). This tutorial showcases a particular method of context augmentation called Retrieval-Augmented Generation (RAG), which indexes your data and attaches relevant data as context when users sends the LLM queries. + +Follow this tutorial to deploy a RAG pipeline on Akamai’s LKE service using our latest GPU instances. Once deployed, you will have a web chatbot that can respond to queries using data from your own custom data source. + +## Diagram + +![Diagram of an AI RAG chatbot solution on Akamai Cloud](ai_rag_chatbot_implementation.png) + +## Components + +### Infrastructure + +- **LKE (Linode Kubernetes Engine):** LKE is Akamai’s managed Kubernetes service, enabling you to deploy containerized applications without needing to build out and maintain your own Kubernetes cluster. This tutorial deploys all software components to the same LKE cluster and node pool, though you should consider your own needs if using this solution for a production workload. +- **Linode GPUs (NVIDIA RTX 4000):** Akamai has several GPU virtual machines available, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. NVIDIA’s Ada Lovelace architecture in the RTX 4000 VMs are adept at many AI tasks, including [inferencing](https://www.nvidia.com/en-us/solutions/ai/inference/) and [image generation](https://blogs.nvidia.com/blog/ai-decoded-flux-one/). + +### Software + +- **Kubeflow:** This open-source software platform includes a suite of applications that are used for machine learning tasks. It is designed to be run on Kubernetes. While each application can be installed individually, this tutorial installs all default applications and makes specific use of the following: + - **KServe:** Serves machine learning models. This tutorial installs the Llama 3 LLM to KServe, which then serves it to other applications, such as the chatbot UI. + - **Kubeflow Pipeline:** Used to deploy pipelines, reusable machine learning workflows built using the Kubeflow Pipelines SDK. In this tutorial, a pipeline is used to run LlamaIndex to train the LLM with additional data. +- **Meta’s Llama 3 LLM:** We use Llama 3 as the LLM, along with the LlamaIndex tool to capture data from an external source and send embeddings to the Milvus database. +- **Milvus:** Milvus is an open-source vector database and is used for generative AI workloads. This tutorial uses Milvus to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3 LLM. +- **Open WebUI:** This is an self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG solutions. Users interact with this interface to query the LLM. This can be configured to send queries straight to Llama 3 or to first load data from Milvus and send that context along with the query. + +## Prerequisites + +This tutorial requires you to have access to a few different services and local software tools. 
You should also have a custom dataset available to use for the pipeline. + +- A [Cloud Manager](https://cloud.linode.com/) account is required to use many of Akamai’s cloud computing services, including LKE. +- A [Hugging Face](https://huggingface.co/) account is used for deploying the Llama 3 LLM to KServe. +- You should have both [kubectl](https://kubernetes.io/docs/reference/kubectl/) and [Helm](https://helm.sh/) installed on your local machine. These apps are used for managing your LKE cluster and installing applications to your cluster. +- A **custom dataset** is needed, preferably in Markdown format, though you can use other types of data if you modify the LlamaIndex configuration provided in this tutorial. This dataset should contain all of the information you want used by the Llama 3 LLM. This tutorial uses a Markdown dataset containing all of the Linode Docs. + +{{< note type="warning" title="Production workloads" >}} +These instructions are intended as a proof of concept for testing and demonstration purposes. They are not designed as a complete production reference architecture. +{{< /note >}} + +{{< note type="warning" title="Security notice" >}} +The configuration instructions in this document are expected to not expose any services to the Internet. Instead, they run on the Kubernetes cluster's internal network, and to access the services it’s necessary to forward their ports locally first. This configuration is restricted by design to avoid accidentally exposing those services before they can be properly secured. Additionally, some services will run with no authentication or default credentials configured. +It’s not part of the scope of this document to cover the setup required to secure this configuration for a production deployment. +{{< /note >}} + +# Set up infrastructure + +The first step is to provision the infrastructure needed for this tutorial and configure it with kubectl, so that you can manage it locally and install software through helm. As part of this process, we’ll also need to install the NVIDIA GPU operator at this step so that the NVIDIA cards within the GPU worker nodes can be used on Kubernetes. + +1. **Provision an LKE cluster.** We recommend using at least two **RTX4000 Ada x2 Medium** GPU plans (plan ID: `g2-gpu-rtx4000a2-m`), though you can adjust this as needed. For reference, Kubeflow recommends 32 GB of RAM and 16 CPU cores. This tutorial has been tested using Kubernetes v1.31, though other versions should also work. To learn more about provisioning a cluster, see the [Create a cluster](https://techdocs.akamai.com/cloud-computing/docs/create-a-cluster) guide. + + {{< note noTitle=true >}} + GPU plans are available in a limited number of data centers. Review the [GPU product documentation](https://techdocs.akamai.com/cloud-computing/docs/gpu-compute-instances#availability) to learn more about availability. + {{< /note >}} + +1. **Configure kubectl with the newly deployed cluster.** To do this, you need to download the kubeconfig YAML file for your new cluster and then reference it when running kubectl. For full instructions, see the [Manage a cluster with kubectl](https://techdocs.akamai.com/cloud-computing/docs/manage-a-cluster-with-kubectl) guide. + +1. **Install the NVIDIA GPU operator for Kubernetes using Helm.** This enables the NVIDIA GPUs on the cluster’s worker nodes to run Kubernetes workloads. 
For additional instructions, see the [official NVIDIA docs](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html). + + ```command + helm repo add nvidia https://helm.ngc.nvidia.com/nvidia + helm repo update + helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.9.1 + ``` + + You can confirm that the operator has been installed on your cluster by running reviewing your pods. You should see a number of pods in the `gpu-operator` namespace. + + ```command + kubectl get pods -A + ``` + +### Deploy Kubeflow + +Next, let’s deploy Kubeflow on the LKE cluster. These instructions deploy all of the components included by default in the Kubeflow platform (the single-command installation method), though the tutorial only makes use of KServe and Kubeflow Pipelines. You can modify this step to deploy each required application separately, if needed. See the [official installation instructions](https://github.com/kubeflow/manifests/tree/v1.9-branch?tab=readme-ov-file#installation) for additional details. + +1. Download the [Kubeflow v1.9.1 manifests file](https://github.com/kubeflow/manifests/archive/refs/tags/v1.9.1.zip) and extract it to its own directory and open this directory in your terminal application. + +1. Before installing Kubeflow, change the default password. + + 1. Generate a random password. This password is needed later in the tutorial so be sure to save it. + + ```command + openssl rand -base64 18 + ``` + + 1. Create a hash of this password, replacing PASSWORD with the password generated in the previous step. This outputs a string starting with `$2y$12$`, which is password hash. + + ```command + htpasswd -bnBC 12 "" | tr -d ':\n' + ``` + + 1. Edit the `common/dex/base/dex-passwords.yaml` file, replacing the value for `DEX_USER_PASSWORD` with the password hash generated in the previous step. + +1. Run the following command to install Kubeflow. + + ```command + while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done + ``` + +1. This may take some time to finish. Once it’s complete, verify that all pods are in the ready state. + + ```command + kubectl get pods -A + ``` + +### Install Llama3 LLM on KServe + +After Kubeflow has been installed, we can now deploy the Llama 3 LLM to KServe. This tutorial uses HuggingFace (a platform that provides pre-trained AI models) to deploy Llama 3 to the LKE cluster. Specifically, these instructions use the [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model. + +1. Create a Hugging Face token with **READ** access to use for this project. See the Hugging Face user documentation on [User access tokens](https://huggingface.co/docs/hub/en/security-tokens) for instructions. + +1. Create the manifest file for the [Kubernetes secret](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file/). You can use the following as a template: + + ```file {title="hf-secret.yaml" lang="yaml"} + apiVersion: v1 + kind: Secret + metadata: + name: hf-secret + type: Opaque + stringData: + HF_TOKEN: + ``` + +1. Then, create the secret on your cluster by applying the manifest file: + + ```command + kubectl apply -f ./hf-secret.yaml + ``` + +1. Create a config file for deploying the Llama 3 model on your cluster. 
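    Before applying the manifest, you can optionally confirm that the GPU operator is advertising `nvidia.com/gpu` as an allocatable resource on the worker nodes. A quick check (the exact output depends on your node plan):

    ```command
    kubectl describe nodes | grep nvidia.com/gpu
    ```

    Each GPU worker node should report a non-zero GPU count. The manifest below defines a KServe `InferenceService` that pulls the model from Hugging Face, requests one GPU, and reads the `HF_TOKEN` value from the secret created in the previous step: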
+ + ```file {title="model.yaml" lang="yaml"} + apiVersion: serving.kserve.io/v1beta1 + kind: InferenceService + metadata: + name: huggingface-llama3 + spec: + predictor: + model: + modelFormat: + name: huggingface + args: + - --model_name=llama3 + - --model_id=NousResearch/Meta-Llama-3-8B-Instruct + - --max-model-len=4096 + env: + - name: HF_TOKEN + valueFrom: + secretKeyRef: + name: hf-secret + key: HF_TOKEN + optional: false + resources: + limits: + cpu: "6" + memory: 24Gi + nvidia.com/gpu: "1" + requests: + cpu: "6" + memory: 24Gi + nvidia.com/gpu: "1" + ``` + +1. Apply the configuration. + + ```command + kubectl apply -f model.yaml + ``` + +1. Verify that the new Llama 3 pod is ready before continuing. + + ```command + kubectl get pods -A + ``` + +### Install Milvus + +Milvus, the vector database designed for AI inference workloads, will be used as part of the RAG pipeline. Install Milvus before moving forward with the Kubeflow Pipeline configuration. + +1. Create a configuration file, called milvus-custom-values.yaml. Edit this file to add the following text: + + ```file {title="milvus-custom-values.yaml" lang="yaml"} + standalone: + resources: + requests: + nvidia.com/gpu: "1" + limits: + nvidia.com/gpu: "1" + ``` + +1. Add Milvus to Helm. + + ```command + helm repo add milvus https://zilliztech.github.io/milvus-helm/ + helm repo update + ``` + +1. Install Milvus using Helm. + + ```command + helm install my-release milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false -f milvus-custom-values.yaml + ``` + +## Set up Kubeflow Pipeline to ingest data + +Kubeflow Pipeline pulls together the entire workflow for ingesting data from our Markdown data source and outputting embeddings for the vector store in Milvus. The pipeline defined within this section will perform the following steps when it runs: + +1. Download a zip archive from the specified URL. +1. Uses LlamaIndex to read the Markdown files within the archive. +1. Generate embeddings from the content of those files using the [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) model. +1. Store the embeddings within the Milvus database collection. + +Keep this workflow in mind when going through the Kubeflow Pipeline set up steps in this section. If you require a different pipeline workflow, you will need to adjust the python file and Kubeflow Pipeline configuration discussed in this section. + +### Generate the pipeline YAML file + +This tutorial employs a Python script to create the YAML file used within Kubeflow Pipeline. This YAML file describes each step of the pipeline workflow. + +1. Create a virtual environment for Python on your local machine. + + ```command + python3 -m venv . + source bin/activate + ``` + +1. Install the Kubeflow Pipelines package in this virtual environment. + + ```command + pip install kfp + ``` + +1. Use the following python script to generate a YAML file to use for the Kubeflow Pipeline. This script configures the pipeline to download the Markdown data you wish to ingest, read the content using LlamaIndex, generate embeddings of the content using BAAI general embedding model, and store the embeddings in the Milvus database. Replace values as needed before proceeding. 
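    In particular, the script assumes the Milvus Helm release is named `my-release` (so its service resolves at `my-release-milvus.default.svc.cluster.local:19530`) and that the KServe predictor for the model is reachable at `huggingface-llama3-predictor-00001.default.svc.cluster.local`. You can confirm both service names before generating the pipeline:

    ```command
    kubectl get svc -n default | grep -E 'milvus|llama3'
    ```

    If your release or predictor names differ, update the corresponding URIs in the script: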
+ + ```file {title="doc-ingest-pipeline.py" lang="python"} + from kfp import dsl + + @dsl.component( + base_image='nvcr.io/nvidia/ai-workbench/python-cuda117:1.0.3', + packages_to_install=['pymilvus>=2.4.2', 'llama-index', 'llama-index-vector-stores-milvus', 'llama-index-embeddings-huggingface', 'llama-index-llms-openai-like'] + ) + def doc_ingest_component(url: str, collection: str) -> None: + print(">>> doc_ingest_component") + + from urllib.request import urlopen + from io import BytesIO + from zipfile import ZipFile + + http_response = urlopen(url) + zipfile = ZipFile(BytesIO(http_response.read())) + zipfile.extractall(path='./md_docs') + + from llama_index.core import SimpleDirectoryReader + + # load documents + documents = SimpleDirectoryReader("./md_docs/", recursive=True, required_exts=[".md"]).load_data() + + from llama_index.embeddings.huggingface import HuggingFaceEmbedding + from llama_index.core import Settings + + Settings.embed_model = HuggingFaceEmbedding( + model_name="BAAI/bge-large-en-v1.5" + ) + + from llama_index.llms.openai_like import OpenAILike + + llm = OpenAILike( + model="llama3", + api_base="http://huggingface-llama3-predictor-00001.default.svc.cluster.local/openai/v1", + api_key = "EMPTY", + max_tokens = 512) + + Settings.llm = llm + + from llama_index.core import VectorStoreIndex, StorageContext + from llama_index.vector_stores.milvus import MilvusVectorStore + + vector_store = MilvusVectorStore(uri="http://my-release-milvus.default.svc.cluster.local:19530", collection=collection, dim=1024, overwrite=True) + storage_context = StorageContext.from_defaults(vector_store=vector_store) + index = VectorStoreIndex.from_documents( + documents, storage_context=storage_context + ) + + @dsl.pipeline + def doc_ingest_pipeline(url: str, collection: str) -> None: + comp = doc_ingest_component(url=url, collection=collection) + + from kfp import compiler + + compiler.Compiler().compile(doc_ingest_pipeline, 'pipeline.yaml') + ``` + +1. Run the script to generate the YAML file. + + ```command + python3 doc-ingest-pipeline.py + ``` + + This creates a file called pipeline.yaml, which you will upload to Kubeflow in the following section. + +1. Run `deactivate` to exit the Python virtual environment. + +### Run the pipeline workflow + +1. Configure port forwarding on your cluster through kubectl so that you can access the Kubeflow interface from your local computer. + + ```command + kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80 + ``` + +1. Open a web browser and navigate to the Kubeflow interface at http://localhost:8080. A login screen should appear. + + {{< note type="warning" noTitle=true >}} + If the browser instead shows the error `Jwks doesn't have key to match kid or alg from Jwt`, there may be a previous JWT session that is interfering. Opening this URL in your browser's private or incognito mode should resolve this. + {{< /note >}} + +1. Log in with the username `user@example.com` and use the password that you created in a previous step. + +1. Navigate to the Pipelines > Experiments page and click the button to create a new experiment. Enter a name and description for the experiment and click **Next**. + + ![Screenshot of the "New Experiment" page within Kubeflow](kubeflow-new-experiment.jpg) + +1. Next, navigate to Pipelines > Pipelines and click the **Upload Pipeline** link. Select **Upload a file** and use the **Choose file** dialog box to select the pipeline YAML file that was created in a previous step. 
    ![Screenshot of the "New Pipeline" page within Kubeflow](kubeflow-new-pipeline.jpg)

1. Navigate to the Pipelines > Runs page and click **Create Run**. Within the Run details section, select the pipeline and experiment that you just created. Choose *One-off* as the **Run Type** and provide the collection name and URL of the dataset (the zip file with the documents you wish to process) in the **Run parameters** section. For this tutorial, we are using `linode_docs` as the collection name and `https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip` as the dataset URL.

    ![Screenshot of the "Start a new run" page within Kubeflow](kubeflow-new-run.jpg)

1. Click **Start** to run the pipeline. This process takes some time. For reference, it took ~10 minutes for the run to complete successfully on the linode.com/docs dataset.

## Deploy the chatbot

To finish up this tutorial, we will install the Open WebUI chatbot and configure it to connect the data generated by the Kubeflow pipeline with the LLM deployed in KServe. Once this is up and running, you can open a browser interface to the chatbot and ask it questions. Open WebUI will use the Milvus database to load context related to your query and send it, along with the query, to the Llama 3 instance within KServe. The LLM will send back a response to the chatbot, and your browser will display an answer that is informed by your own custom data.

### Create the RAG pipeline files

Despite the naming, these RAG pipeline files are not related to the Kubeflow pipeline created in the previous section. They instead instruct the chatbot on how to interact with all of the components we've created so far, including the Milvus data store and the Llama 3 LLM.

1. Create a new directory on your local machine and navigate to that directory.

1. Create a pipeline-requirements.txt file with the following contents:

    ```file {title="pipeline-requirements.txt"}
    requests
    pymilvus
    llama-index
    llama-index-vector-stores-milvus
    llama-index-embeddings-huggingface
    llama-index-llms-openai-like
    ```

1. 
Create a rag-pipeline.py file with the following contents: + + ```file {title="rag-pipeline.py"} + """ + title: RAG Pipeline + version: 1.0 + description: RAG Pipeline + """ + from typing import List, Optional, Union, Generator, Iterator + + class Pipeline: + + def __init__(self): + self.name = "RAG Pipeline" + self.index = None + pass + + + async def on_startup(self): + from llama_index.embeddings.huggingface import HuggingFaceEmbedding + from llama_index.core import Settings, VectorStoreIndex + from llama_index.llms.openai_like import OpenAILike + from llama_index.vector_stores.milvus import MilvusVectorStore + + print(f"on_startup:{__name__}") + + Settings.embed_model = HuggingFaceEmbedding( + model_name="BAAI/bge-large-en-v1.5" + ) + + llm = OpenAILike( + model="llama3", + api_base="http://huggingface-llama3-predictor-00001.default.svc.cluster.local/openai/v1", + api_key = "EMPTY", + max_tokens = 512) + + Settings.llm = llm + + vector_store = MilvusVectorStore(uri="http://my-release-milvus.default.svc.cluster.local:19530", collection="linode_docs", dim=1024, overwrite=False) + self.index = VectorStoreIndex.from_vector_store(vector_store=vector_store) + + async def on_shutdown(self): + print(f"on_shutdown:{__name__}") + pass + + + def pipe( + self, user_message: str, model_id: str, messages: List[dict], body: dict + ) -> Union[str, Generator, Iterator]: + print(f"pipe:{__name__}") + + query_engine = self.index.as_query_engine(streaming=True, similarity_top_k=5) + response = query_engine.query(user_message) + print(f"rag_response:{response}") + return f"{response}" + ``` + +Both of these files are used in the next section. + +### Deploy the pipeline and chatbot + +After the pipeline files have been created, we can deploy the chatbot and configure it to use that pipeline. + +1. Create the `open-webui` namespace on your Kubernetes cluster and a ConfigMap that contains both of the files created as part of the previous section. Replace `` with the path to the directory where the files are stored. + + ```command + kubectl create namespace open-webui + kubectl create configmap -n open-webui pipelines-files --from-file= + ``` + +1. Use the following YAML configuration file to deploy the pipelines and open-webui applications. 
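    Before applying it, you can optionally confirm that both files from the previous step are present in the ConfigMap:

    ```command
    kubectl describe configmap pipelines-files -n open-webui
    ```

    The `Data` section of the output should list `pipeline-requirements.txt` and `rag-pipeline.py`. Because the ConfigMap is mounted at `/opt`, the `PIPELINES_REQUIREMENTS_PATH` and `PIPELINES_URLS` values in the manifest below must match these filenames exactly: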
+ + ```file {title="webui-pipelines.yaml" lang="yaml"} + --- + apiVersion: apps/v1 + kind: Deployment + metadata: + name: pipelines-deployment + namespace: open-webui + spec: + replicas: 1 + selector: + matchLabels: + app: pipelines-webui + template: + metadata: + labels: + app: pipelines-webui + spec: + containers: + - name: pipelines-webui + image: ghcr.io/open-webui/pipelines:main + ports: + - containerPort: 9099 + resources: + requests: + cpu: "500m" + memory: "500Mi" + limits: + cpu: "1000m" + memory: "1Gi" + env: + - name: PIPELINES_REQUIREMENTS_PATH + value: "/opt/pipeline-requirements.txt" + - name: PIPELINES_URLS + value: "file:///opt/rag_pipeline.py" + tty: true + volumeMounts: + - name: config-volume + mountPath: /opt + volumes: + - name: config-volume + configMap: + name: pipelines-files + --- + apiVersion: v1 + kind: Service + metadata: + name: pipelines-service + namespace: open-webui + spec: + type: ClusterIP + selector: + app: pipelines-webui + ports: + - protocol: TCP + port: 9099 + targetPort: 9099 + --- + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + labels: + app: open-webui + name: open-webui-pvc + namespace: open-webui + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 2Gi + --- + apiVersion: apps/v1 + kind: Deployment + metadata: + name: open-webui-deployment + namespace: open-webui + spec: + replicas: 1 + selector: + matchLabels: + app: open-webui + template: + metadata: + labels: + app: open-webui + spec: + containers: + - name: open-webui + image: ghcr.io/open-webui/open-webui:main + ports: + - containerPort: 8080 + resources: + requests: + cpu: "500m" + memory: "500Mi" + limits: + cpu: "1000m" + memory: "1Gi" + env: + - name: ENABLE_OLLAMA_API + value: "False" + - name: OPENAI_API_BASE_URLS + value: "http://huggingface-llama3-predictor-00001.default.svc.cluster.local/openai/v1;http://pipelines-service.open-webui.svc.cluster.local:9099" + - name: OPENAI_API_KEYS + value: "EMPTY;0p3n-w3bu!" + tty: true + volumeMounts: + - name: webui-volume + mountPath: /app/backend/data + volumes: + - name: webui-volume + persistentVolumeClaim: + claimName: open-webui-pvc + --- + apiVersion: v1 + kind: Service + metadata: + name: open-webui-service + namespace: open-webui + spec: + type: ClusterIP + selector: + app: open-webui + ports: + - protocol: TCP + port: 8080 + targetPort: 8080 + ``` + +1. Apply the configuration. + + ```command + kubectl apply -f webui-pipelines.yaml + ``` + +### Access and test the chatbot application + +Now that the chatbot has been configured, the final step is to access the chatbot and test it. + +1. Configure port forwarding on your cluster through kubectl so that you can access the Open WebUI interface from your local computer through port 9090. + + ```command + kubectl port-forward svc/open-webui-service -n open-webui 9090:8080 + ``` + +1. Open a web browser and navigate to the Open WebUI interface at `http://localhost:9090`. + +1. The first time you access this interface you are prompted to create an admin account. Do this now and then continue once you are successfully logged in using that account. + +1. You are now presented with the chatbot interface. Within the dropdown menu, you should be able to select from several models. Select one and ask it a question. + + - The **llama3** model will just use information that was trained by other data sources (not your own custom data). If you ask this model a question, the data from your own dataset will not be used. 
+ + - The **RAG Pipeline** model that you defined in a previous section does indeed use data from your custom dataset. Ask it a question relevant to your data and the chatbot should respond with an answer that is informed by the custom dataset you configured. \ No newline at end of file diff --git a/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-experiment.jpg b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-experiment.jpg new file mode 100644 index 00000000000..f62c83c9e73 Binary files /dev/null and b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-experiment.jpg differ diff --git a/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-pipeline.jpg b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-pipeline.jpg new file mode 100644 index 00000000000..de9b6aa1a9a Binary files /dev/null and b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-pipeline.jpg differ diff --git a/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-run.jpg b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-run.jpg new file mode 100644 index 00000000000..d5019749de4 Binary files /dev/null and b/docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/kubeflow-new-run.jpg differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/cloudwatch-logs-example.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/cloudwatch-logs-example.png new file mode 100644 index 00000000000..25b5f8e3a4c Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/cloudwatch-logs-example.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/cloudwatch-metrics-latency-graph.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/cloudwatch-metrics-latency-graph.png new file mode 100644 index 00000000000..dfaf9ad05e2 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/cloudwatch-metrics-latency-graph.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png new file mode 100644 index 00000000000..545e5765ae2 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png new file mode 100644 index 00000000000..b09186d46cc Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png 
b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png new file mode 100644 index 00000000000..8214856d537 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png new file mode 100644 index 00000000000..4c3cbabd1b8 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png new file mode 100644 index 00000000000..1864fe66461 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png new file mode 100644 index 00000000000..d566b83ca6a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png new file mode 100644 index 00000000000..8f2ee53253b Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-login-page.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-login-page.png new file mode 100644 index 00000000000..3542f16b68a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-login-page.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png new file mode 100644 index 00000000000..888e50e1193 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png new file mode 100644 index 
00000000000..d5c58f45c40 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png new file mode 100644 index 00000000000..f3f0aa31009 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/index.md b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/index.md new file mode 100644 index 00000000000..09ecd57a978 --- /dev/null +++ b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/index.md @@ -0,0 +1,793 @@ +--- +slug: migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai +title: "Migrating From AWS CloudWatch to Prometheus and Grafana on Akamai" +description: "Migrating from AWS CloudWatch to Prometheus and Grafana? Learn how to configure metrics, build custom dashboards, and optimize monitoring with cost-effective, open source tools." +authors: ["Akamai"] +contributors: ["Akamai"] +published: 2025-02-10 +keywords: ['aws','cloudwatch','prometheus','grafana','aws cloudwatch migration','prometheus and grafana setup','migrate to prometheus','grafana dashboards for metrics','cloudwatch alternative','open source monitoring tools','prometheus metrics','grafana visualization','monitoring and observability','prometheus grafana guide','cloudwatch to Prometheus tutorial'] +license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)' +external_resources: +- '[AWS CloudWatch Documentation](https://docs.aws.amazon.com/cloudwatch/)' +- '[Prometheus Documentation](https://prometheus.io/docs/introduction/overview/)' +- '[Grafana Installation Documentation](https://grafana.com/docs/grafana/latest/setup-grafana/installation/)' +- '[Grafana Dashboard Documentation](https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/)' +--- + +AWS CloudWatch is a monitoring and observability service designed to collect and analyze metrics, logs, and events from AWS resources and applications. It provides insights into the performance and health of infrastructure, letting users generate real-time alerts and dashboards for proactive monitoring. + +While CloudWatch can be useful for AWS environments, organizations may seek alternative solutions to reduce costs or increase flexibility across multiple cloud platforms. Prometheus and Grafana offer an open source, platform-agnostic alternative. + +This guide walks through how to migrate standard AWS CloudWatch service logs, metrics, and monitoring to a Prometheus and Grafana software stack on a Linode instance. To illustrate the migration process, an example Flask-based Python application running on a separate instance is configured to send logs and metrics to CloudWatch, and then modified to integrate with Prometheus and Grafana. While this guide uses a Flask application as an example, the principles can be applied to any workload currently monitored via AWS CloudWatch. 
+ +## Introduction to Prometheus and Grafana + +[Prometheus](https://prometheus.io/docs/introduction/overview/) is a [time-series database](https://prometheus.io/docs/concepts/data_model/#data-model) that collects and stores metrics from applications and services. It provides a foundation for monitoring system performance using the PromQL query language to extract and analyze granular data. Prometheus autonomously scrapes (*pulls*) metrics from targets at specified intervals, efficiently storing data through compression while retaining the most critical details. It also supports alerting based on metric thresholds, making it suitable for dynamic, cloud-native environments. + +[Grafana](https://grafana.com/docs/) is a visualization and analytics platform that integrates with Prometheus. It enables users to create real-time, interactive dashboards, visualize metrics, and set up alerts to gain deeper insights into system performance. Grafana can unify data from a wide array of data sources, including Prometheus, to provide a centralized view of system metrics. + +Prometheus and Grafana are considered industry standard, and are commonly used together to monitor service health, detect anomalies, and issue alerts. Being both open source and platfrom-agnostic allows them to be deployed across a diverse range of cloud providers and on-premise infrastructures. Organizations often adopt these tools to reduce operational costs while gaining greater control over how data is collected, stored, and visualized. + +{{< note title="Prometheus and Grafana Marketplace App" >}} +If you prefer an automatic deployment rather than the manual installation steps in this guide, Prometheus and Grafana can be deployed through our [Prometheus and Grafana Marketplace app](https://www.linode.com/marketplace/apps/linode/prometheus-grafana/). +{{< /note >}} + +## Before You Begin + +1. If you do not already have a virtual machine to use, create a Compute Instance for the Prometheus and Grafana stack using the steps in our [Get Started](https://techdocs.akamai.com/cloud-computing/docs/getting-started) and [Create a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/create-a-compute-instance) guides: + + - **Prometheus and Grafana instance requirements**: Linode 8 GB Shared CPU plan, Ubuntu 24.04 LTS distribution + + {{< note type="primary" title="Provisioning Compute Instances with the Linode CLI" isCollapsible="true" >}} + Use these steps if you prefer to use the [Linode CLI](https://techdocs.akamai.com/cloud-computing/docs/getting-started-with-the-linode-cli) to provision resources. + + The following command creates a **Linode 8 GB** compute instance (`g6-standard-4`) running Ubuntu 24.04 LTS (`linode/ubuntu24.04`) in the Miami datacenter (`us-mia`): + + ```command + linode-cli linodes create \ + --image linode/ubuntu24.04 \ + --region us-mia \ + --type g6-standard-4 \ + --root_pass {{< placeholder "PASSWORD" >}} \ + --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" \ + --label monitoring-server + ``` + + Note the following key points: + + - Replace the `region` as desired. + - Replace {{< placeholder "PASSWORD" >}} with a secure alternative for your root password. + - This command assumes that an SSH public/private key pair exists, with the public key stored as `id\_rsa.pub` in the user’s `$HOME/.ssh/` folder. + - The `--label` argument specifies the name of the new server (`monitoring-server`). 
+ {{< /note >}} + + To emulate a real-world workload, the examples in this guide use an additional optional instance to run an example Flask Python application. This application produces sample metrics and is used to illustrate configuration changes when switching from AWS CloudWatch to an alternative monitoring solution. This instance can live on AWS or other infrastructure (such as a Linode) as long as it is configured to send metrics to AWS CloudWatch. + + - **Example Flask app instance requirements**: 1 GB Shared CPU, Ubuntu 24.04 LTS distribution + +1. Follow our [Set Up and Secure a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/set-up-and-secure-a-compute-instance) guide to update each system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access. + +{{< note >}} +This guide is written for a non-root user. Commands that require elevated privileges are prefixed with `sudo`. If you’re not familiar with the `sudo` command, see the [Users and Groups](/docs/guides/linux-users-and-groups/) guide. +{{< /note >}} + +## Install Prometheus as a Service + +1. To install Prometheus, login via SSH to your Linode instance as your limited sudo user: + + ```command + ssh {{< placeholder "SUDO_USER" >}}@{{< placeholder "LINODE_IP" >}} + ``` + +1. Create a dedicated user for Prometheus, disable its login, and create the necessary directories for Prometheus: + + ```command + sudo useradd --no-create-home --shell /bin/false prometheus + sudo mkdir /etc/prometheus + sudo mkdir /var/lib/prometheus + ``` + +1. Download the latest version of Prometheus from its GitHub repository: + + ```command + wget https://github.com/prometheus/prometheus/releases/download/v2.55.1/prometheus-2.55.1.linux-amd64.tar.gz + ``` + + This guide uses version `2.55.1`. Check the project’s [releases page](https://github.com/prometheus/prometheus/releases) for the latest version that aligns with your instance’s operating system. + +1. Extract the compressed file and navigate to the extracted folder: + + ```command + tar xzvf prometheus-2.55.1.linux-amd64.tar.gz + cd prometheus-2.55.1.linux-amd64 + ``` + +1. Move both the `prometheus` and `promtool` binaries to `/usr/local/bin`: + + ```command + sudo cp prometheus /usr/local/bin + sudo cp promtool /usr/local/bin + ``` + + The `prometheus` binary is the main monitoring application, while `promtool` is a utility application that queries and configures a running Prometheus service. + +1. Move the configuration files and directories to the `/etc/prometheus` folder you created previously: + + ```command + sudo cp -r consoles /etc/prometheus + sudo cp -r console_libraries /etc/prometheus + sudo cp prometheus.yml /etc/prometheus/prometheus.yml + ``` + +1. Set the correct ownership permissions for Prometheus files and directories: + + ```command + sudo chown -R prometheus:prometheus /etc/prometheus + sudo chown -R prometheus:prometheus /var/lib/prometheus + sudo chown prometheus:prometheus /usr/local/bin/prometheus + sudo chown prometheus:prometheus /usr/local/bin/promtool + ``` + +### Create a `systemd` Service File + +A `systemd` service configuration file must be created to run Prometheus as a service. + +1. Create the service file using the text editor of your choice. This guide uses `nano`. 
+
+    ```command
+    sudo nano /etc/systemd/system/prometheus.service
+    ```
+
+    Add the following content to the file, and save your changes:
+
+    ```file {title="/etc/systemd/system/prometheus.service"}
+    [Unit]
+    Description=Prometheus Service
+    Wants=network-online.target
+    After=network-online.target
+
+    [Service]
+    User=prometheus
+    Group=prometheus
+    Type=simple
+    ExecStart=/usr/local/bin/prometheus \
+        --config.file=/etc/prometheus/prometheus.yml \
+        --storage.tsdb.path=/var/lib/prometheus \
+        --web.console.templates=/etc/prometheus/consoles \
+        --web.console.libraries=/etc/prometheus/console_libraries
+
+    [Install]
+    WantedBy=multi-user.target
+    ```
+
+1. Reload the `systemd` configuration files to apply the new service file:
+
+    ```command
+    sudo systemctl daemon-reload
+    ```
+
+1. Using `systemctl`, start the `prometheus` service and enable it to automatically start after a system reboot:
+
+    ```command
+    sudo systemctl start prometheus
+    sudo systemctl enable prometheus
+    ```
+
+1. Verify that Prometheus is running:
+
+    ```command
+    systemctl status prometheus
+    ```
+
+    The output should display `active (running)`, confirming a successful setup:
+
+    ```output
+    ● prometheus.service - Prometheus Service
+         Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
+         Active: active (running) since Thu 2024-12-05 16:11:57 EST; 5s ago
+       Main PID: 1165 (prometheus)
+          Tasks: 9 (limit: 9444)
+         Memory: 16.2M (peak: 16.6M)
+            CPU: 77ms
+         CGroup: /system.slice/prometheus.service
+    ```
+
+    When done, press the Q key to exit the status output and return to the terminal prompt.
+
+1. Open a web browser and visit your instance's IP address on port `9090` (Prometheus's default port):
+
+    ```command
+    http://{{< placeholder "IP_ADDRESS" >}}:9090
+    ```
+
+    The Prometheus UI should appear:
+
+    ![Prometheus UI homepage at port :9090, displaying the query and status options.](prometheus-ui-overview.png)
+
+    {{< note >}}
+    Prometheus settings are configured in the `/etc/prometheus/prometheus.yml` file. This guide uses the default values. For production systems, consider enabling authentication and other security measures to protect your metrics.
+    {{< /note >}}
+
+## Install the Grafana Service
+
+Grafana provides an `apt` repository, reducing the number of steps needed to install and update it on Ubuntu.
+
+1. Install the necessary package to add new repositories:
+
+    ```command
+    sudo apt install software-properties-common -y
+    ```
+
+1. Import and add the public key for the Grafana repository:
+
+    ```command
+    wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
+    sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
+    ```
+
+1. Update the package index and install Grafana:
+
+    ```command
+    sudo apt update
+    sudo apt install grafana -y
+    ```
+
+1. The installation process already sets up the `systemd` configuration for Grafana. Start and enable the Grafana service:
+
+    ```command
+    sudo systemctl start grafana-server
+    sudo systemctl enable grafana-server
+    ```
+
+1.
Run the following command to verify that Grafana is `active (running)`: + + ```command + systemctl status grafana-server + ``` + + ```output + ● grafana-server.service - Grafana instance + Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: enabled) + Active: active (running) since Thu 2024-12-05 13:57:10 EST; 8s ago + Docs: http://docs.grafana.org + Main PID: 3434 (grafana) + Tasks: 14 (limit: 9444) + Memory: 71.4M (peak: 80.4M) + CPU: 2.971s + CGroup: /system.slice/grafana-server.service + ``` + +### Connect Grafana to Prometheus + +1. Open a web browser and visit your instance's IP address on port `3000` (Grafana's default port) to access the Grafana web UI: + + ```command + http://{{< placeholder "IP_ADDRESS" >}}:3000 + ``` + +1. Login using the default credentials of `admin` for both the username and password: + + ![Grafana login page showing fields for entering username and password.](grafana-login-page.png) + +1. After logging in, you are prompted to enter a secure replacement for the default password: + + ![Grafana user interface prompting for a new password after the first login.](grafana-new-password-prompt.png) + +1. Add Prometheus as a data source by expanding the **Home** menu, navigating to the **Connections** entry, and clicking **Add new connection**: + + ![Grafana home menu with the option to add a new connection under the Connections section.](grafana-add-new-connection.png) + +1. Search for and select **Prometheus**. + +1. Click **Add new data source**. + + ![Grafana interface with Add New Data Source options, displaying Prometheus configuration fields.](grafana-add-datasource.png) + +1. In the **URL** field, enter `http://localhost:9090`. + +1. Click **Save & Test** to confirm the connection. + + ![Grafana test result confirming successful connection to a Prometheus data source.](grafana-connection-test-success.png) + + If successful, your Grafana installation is now connected to the Prometheus installation running on the same Linode. + +## Configure Example Flask Server + +This guide demonstrates the migration process using an example Flask app running on a separate instance from which metrics and logs can be collected. + +1. Log in to the instance running the example Flask application as a user with `sudo` privileges. + +1. Create a directory for the project named `exmaple-flask-app` and navigate into it: + + ```command + mkdir example-flask-app + cd example-flask-app + ``` + +1. 
Using a text editor of your choice, create a file called `app.py`: + + ```command + nano app.py + ``` + + Give it the following contents: + + ```file {title="app.py", lang="python"} + import boto3 # Note: pip install boto3 + import json + import logging + import time + + from flask import Flask, request + + logging.basicConfig(filename='flask-app.log', level=logging.INFO) + logger = logging.getLogger(__name__) + + app = Flask(__name__) + + # AWS CloudWatch setup + cloudwatch = boto3.client('cloudwatch') + + @app.before_request + def start_timer(): + request.start_time = time.time() + + @app.after_request + def send_latency_metric(response): + latency = time.time() - request.start_time + + # Send latency metric to CloudWatch + cloudwatch.put_metric_data( + Namespace='FlaskApp', + MetricData=[ + { + 'MetricName': 'EndpointLatency', + 'Dimensions': [ + { + 'Name': 'Endpoint', + 'Value': request.path + }, + { + 'Name': 'Method', + 'Value': request.method + } + ], + 'Unit': 'Seconds', + 'Value': latency + } + ] + ) + + return response + + @app.route('/') + def hello_world(): + logger.info("A request was received at the root URL") + return {'message': 'Hello, World!'}, 200 + + if __name__ == '__main__': + app.run(host='0.0.0.0', port=8080) + ``` + + The example Flask application in this guide collects and sends endpoint latency metrics to CloudWatch using the [`put_metric_data`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch/client/put_metric_data.html) API from [Boto3](https://github.com/boto/boto3). Application logs are written to a local file and ingested into CloudWatch Logs for centralization. + + When done, save your changes, and close the text editor. + +1. Create a separate text file called `requirements.txt`: + + ```command + nano requirements.txt + ``` + + Provide it with the following basic dependencies for the Flask application to function, and save your changes: + + ```file {title="requirements.txt"} + Flask==3.0.3 + itsdangerous==2.2.0 + Jinja2==3.1.4 + MarkupSafe==2.1.5 + Werkzeug==3.0.4 + ``` + +1. A virtual environment is required to run `pip` commands in Ubuntu 24.04 LTS. Use the following command to install `python3.12-venv`: + + ```command + sudo apt install python3.12-venv + ``` + +1. Using the `venv` utility, create a virtual environment named `venv` within the `example-flask-app` directory: + + ```command + python3 -m venv venv + ``` + +1. Activate the `venv` virtual environment: + + ```command + source venv/bin/activate + ``` + +1. Use `pip` to install the example Flask application's dependencies from the `requirements.txt` file: + + ```command + pip install -r requirements.txt + ``` + +1. Also using `pip`, install the `boto3` library, a Python library required for interfacing with AWS resources: + + ```command + pip install boto3 + ``` + +1. Exit the virtual environment: + + ```command + deactivate + ``` + +### Create a `systemd` Service File + +1. 
Create a `systemd` service file for the example Flask app: + + ```command + sudo nano /etc/systemd/system/flask-app.service + ``` + + Provide the file with the following content, replacing {{< placeholder "USERNAME" >}} with your actual `sudo` user: + + ```file {title="/etc/systemd/system/flask-app.service"} + [Unit] + Description=Flask Application Service + After=network.target + + [Service] + User={{< placeholder "USERNAME" >}} + WorkingDirectory=/home/{{< placeholder "USERNAME" >}}/example-flask-app + ExecStart=/home/{{< placeholder "USERNAME" >}}/example-flask-app/venv/bin/python /home/{{< placeholder "USERNAME" >}}/example-flask-app/app.py + Restart=always + + [Install] + WantedBy=multi-user.target + ``` + + Save your changes when complete. + +1. Reload the `systemd` configuration files to apply the new service file, then start and enable the service: + + ```command + sudo systemctl daemon-reload + sudo systemctl start flask-app + sudo systemctl enable flask-app + ``` + +1. Verify that the `flask-app` service is `active (running)`: + + ```command + systemctl status flask-app + ``` + + ```output + ● flask-app.service - Flask Application Service + Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled) + Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago + Main PID: 4413 (python) + Tasks: 1 (limit: 9444) + Memory: 20.3M (peak: 20.3M) + CPU: 196ms + CGroup: /system.slice/flask-app.service + ``` + + Once the Flask application is running, CloudWatch can monitor its data. + +1. Generate data by issuing an HTTP request using the following cURL command. Replace {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running: + + ```command + curl http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080 + ``` + + You should receive the following response: + + ```output + {"message": "Hello, World!"} + ``` + +## Migrate from AWS CloudWatch to Prometheus and Grafana + +Migrating from AWS CloudWatch to Prometheus and Grafana requires careful planning. It is important to ensure the continuity of your monitoring capabilities while leveraging the added control over data handling and advanced features of Prometheus and Grafana. + +### Assess Current Monitoring Requirements + +Before migrating to Prometheus and Grafana, it's important to understand what metrics and logs are currently being collected by CloudWatch and how they are used. This may vary depending on your application. + +Metrics such as endpoint latency are collected for every HTTP request, along with HTTP method details. Application logs record incoming requests, exceptions, and warnings. For example, when the sample Flask application is configured with AWS CloudWatch, it emits logs like the following: + +![Example of CloudWatch logs with INFO level log entries for a Flask application.](cloudwatch-logs-example.png) + +CloudWatch also visualizes metrics in graphs. For instance, by querying the endpoint latency metrics sent by the Flask application, a graph may look like this: + +![CloudWatch metrics graph displaying endpoint latency data over time.](cloudwatch-metrics-latency-graph.png) + +### Export Existing CloudWatch Logs and Metrics + +AWS includes tools for exporting CloudWatch data for analysis or migration. For example, CloudWatch logs can be exported to an S3 bucket, making them accessible outside AWS and enabling them to be re-ingested into other tools. 
+ +To export CloudWatch Logs to S3, use the following [`create-export-task`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/logs/create-export-task.html) command from the system where your AWS CLI is configured: + +```command +aws logs create-export-task \ + --log-group-name {{< placeholder "LOG_GROUP" >}} \ + --from {{< placeholder "START_TIME" >}} \ + --to {{< placeholder "END_TIME" >}} \ + --destination {{< placeholder "S3_BUCKET_NAME" >}} \ + --destination-prefix cloudwatch-logs/ +``` + +Replace the following placeholders with your specific values: + +- {{< placeholder "LOG_GROUP" >}}: The name of the log group to export. +- {{< placeholder "START_TIME" >}} and {{< placeholder "END_TIME" >}}: The time range in milliseconds. +- {{< placeholder "S3_BUCKET_NAME" >}}: The name of your S3 bucket. + +### Expose Application Metrics to Prometheus + +Prometheus works differently from CloudWatch. Instead of *pushing* data like CloudWatch, Prometheus *pulls* metrics from the monitored application. After assessing or exporting metrics as needed, modify the application to enable Prometheus metric scraping so that it collects the same metrics previously sent to CloudWatch. This process varies from application to application. + +For the example Flask application in this guide, the [`prometheus_flask_exporter` library](https://github.com/rycus86/prometheus_flask_exporter) is a standard library that can be used for instrumenting Flask applications to expose Prometheus metrics. + +1. Reactivate the `venv` virtual environment: + + ```command + source venv/bin/activate + ``` + +1. Use `pip` to install the `prometheus_client` and `prometheus_flask_exporter` libraries: + + ```command + pip install prometheus_client prometheus_flask_exporter + ``` + +1. Exit the virtual environment: + + ```command + deactivate + ``` + +1. Using a text editor of your choice, open the `app.py` file for the Flask application: + + ```command + nano app.py + ``` + + Replace the file's current AWS-specific contents with the Prometheus-specific code below: + + ```file {title="app.py" lang="python"} + import logging + import random + import time + + from flask import Flask + from prometheus_flask_exporter import PrometheusMetrics + + logging.basicConfig(filename="flask-app.log", level=logging.INFO) + logger = logging.getLogger(__name__) + + app = Flask(__name__) + metrics = PrometheusMetrics(app) + + metrics.info("FlaskApp", "Application info", version="1.0.0") + + + @app.route("/") + def hello_world(): + logger.info("A request was received at the root URL") + return {"message": "Hello, World!"}, 200 + + + @app.route("/long-request") + def long_request(): + n = random.randint(1, 5) + logger.info( + f"A request was received at the long-request URL. Slept for {n} seconds" + ) + time.sleep(n) + return {"message": f"Long running request with {n=}"}, 200 + + + if __name__ == "__main__": + app.run(host="0.0.0.0", port=8080) + ``` + + This uses the `prometheus_flask_exporter` library to: + + - Instrument the Flask app for Prometheus metrics. + - Expose default and application-specific metrics at the `/metrics` endpoint. + - Provide metadata such as version information via `metrics.info`. + +1. Save and close the file, then restart the `flask-app` service: + + ```command + sudo systemctl restart flask-app + ``` + +1. 
Verify that the `flask-app` service is `active (running)`:
+
+    ```command
+    systemctl status flask-app
+    ```
+
+    ```output
+    ● flask-app.service - Flask Application Service
+         Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled)
+         Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago
+       Main PID: 4413 (python)
+          Tasks: 1 (limit: 9444)
+         Memory: 20.3M (peak: 20.3M)
+            CPU: 196ms
+         CGroup: /system.slice/flask-app.service
+    ```
+
+1. Test to see if the Flask app is accessible by issuing the following cURL command. Replace {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running:
+
+    ```command
+    curl http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080
+    ```
+
+    You should receive the following response:
+
+    ```output
+    {"message": "Hello, World!"}
+    ```
+
+1. To view the metrics, open a web browser and visit the following URL:
+
+    ```command
+    http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080/metrics
+    ```
+
+    The metrics shown include `http_request_duration_seconds` (request latency) and `http_requests_total` (total number of requests).
+
+### Configure Prometheus to Ingest Application Metrics
+
+1. Log back in to the Prometheus & Grafana instance.
+
+1. Using a text editor, open and modify the Prometheus configuration at `/etc/prometheus/prometheus.yml` to include the Flask application as a scrape target:
+
+    ```command
+    sudo nano /etc/prometheus/prometheus.yml
+    ```
+
+    Append the following content to the `scrape_configs` section of the file, replacing {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance running the Flask application:
+
+    ```file {title="/etc/prometheus/prometheus.yml"}
+      - job_name: 'flask_app'
+        static_configs:
+          - targets: ['{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080']
+    ```
+
+    This configuration tells Prometheus to scrape metrics from the Flask application running on port `8080`.
+
+1. Save the file, and restart Prometheus to apply the changes:
+
+    ```command
+    sudo systemctl restart prometheus
+    ```
+
+1. To verify that Prometheus is successfully scraping the Flask app, open a web browser and navigate to the Prometheus user interface on port 9090. This is the default port used for Prometheus. Replace {{< placeholder "INSTANCE_IP_ADDRESS" >}} with the IP of your instance:
+
+    ```command
+    http://{{< placeholder "INSTANCE_IP_ADDRESS" >}}:9090
+    ```
+
+1. In the Prometheus UI, click the **Status** tab and select **Targets**. You should see the Flask application service listed as a target with a status of `UP`, indicating that Prometheus is successfully scraping metrics from the application.
+
+    ![Prometheus UI showing the status and targets of monitored services.](prometheus-ui-targets.png)
+
+### Create a Grafana Dashboard with Application Metrics
+
+Grafana serves as the visualization layer, providing an interface for creating dashboards from Prometheus metrics.
+
+1. Open a web browser and visit the following URL to access the Grafana UI on port 3000 (the default port for Grafana). Replace {{< placeholder "INSTANCE_IP_ADDRESS" >}} with the IP of your instance:
+
+    ```command
+    http://{{< placeholder "INSTANCE_IP_ADDRESS" >}}:3000
+    ```
+
+1. Navigate to the **Dashboards** page:
+
+    ![Grafana home menu with the Dashboards section selected.](grafana-home-menu-dashboards.png)
+
+1.
Create a new dashboard in Grafana by clicking **Create dashboard**: + + ![Grafana Dashboards page with an option to create a new dashboard.](grafana-dashboards-overview.png) + +1. Click **Add visualization**: + + ![Grafana interface showing the Add Visualization dialog for creating a new graph.](grafana-add-visualization.png) + +1. In the resulting dialog, select the **prometheus** data source: + + ![Grafana data source selection dialog with Prometheus highlighted.](grafana-prometheus-datasource.png) + +1. To duplicate the CloudWatch metrics for the Flask application, first click on the **Code** tab in the right-hand side of the panel editor: + + ![Grafana panel editor with the Code tab selected for entering a PromQL query.](grafana-panel-editor-query-code.png) + +1. Input the following PromQL query to calculate the average latency for an endpoint: + + ```command + flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"} / + flask_http_request_duration_seconds_count{method="GET",path="/",status="200"} + ``` + +1. After entering the formula, click **Run queries** to execute the PromQL query. The chart should update with data pulled from Prometheus: + + ![Grafana dashboard displaying a latency graph for a Flask application, based on Prometheus data.](grafana-latency-dashboard.png) + + This visualization replicates CloudWatch's endpoint latency graph, detailing the average latency over time for a particular endpoint. Prometheus also provides default labels such as method, path, and status codes, for additional granularity in analysis. + +## Additional Considerations and Concerns + +### Cost Management + +CloudWatch incurs costs based on the number of API requests, log volume, and data retention. As monitoring scales, these costs can increase. Prometheus is an open source tool with no direct charges for usage and offers a potential for cost savings. + +However, infrastructure costs for running Prometheus and Grafana are still a consideration. Running Prometheus and Grafana requires provisioning compute and storage resources, with expenses for maintenance and handling network traffic. Additionally, since Prometheus is primarily designed for short-term data storage, setting up long-term storage solution may also increase costs. + +**Recommendation**: + +- Estimate infrastructure costs for Prometheus and Grafana by assessing current CloudWatch data volume and access usage. +- Utilize object storage or other efficient long-term storage mechanisms to minimize costs. + +### Data Consistency and Accuracy + +CloudWatch aggregates metrics over set intervals, whereas Prometheus collects high-resolution raw metrics. Therefore, migrating from CloudWatch to Prometheus can raise potential concerns about data consistency and accuracy during and after the transition. + +**Recommendation**: + +- Tune Prometheus scrape intervals to capture the necessary level of detail without overwhelming storage or compute capacities. +- Validate that CloudWatch metrics correctly map to Prometheus metrics, with the appropriate time resolutions. + +### CloudWatch Aggregated Data Versus Prometheus Raw Data + +Aggregated data from CloudWatch offers a high-level view of system health and application performance, and can be helpful for monitoring broader trends. Alternatively, the raw data from Prometheus enables detailed analyses and granular troubleshooting. Both approaches have their use cases, and it's important to understand which is most appropriate for you. 
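+
+For instance, a CloudWatch-style aggregate can be approximated from Prometheus's raw samples with a range-vector query. The following sketch reuses the metric names and labels exposed by `prometheus_flask_exporter` earlier in this guide and averages request latency for the root endpoint over 5-minute windows; the window size and label values are illustrative and should be adjusted to match your own data:
+
+```command
+rate(flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"}[5m]) /
+rate(flask_http_request_duration_seconds_count{method="GET",path="/",status="200"}[5m])
+```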
+ +While Prometheus has the ability to collect raw data, consider whether CloudWatch's aggregation is more useful, and how to replicate that with Grafana dashboards or Prometheus queries. + +**Recommendation**: + +- Create Grafana dashboards that aggregate Prometheus data for overall system-level insights. +- Leverage Prometheus's detailed, raw metrics for fine-grained data analysis. + +### Alert System Migration + +CloudWatch’s integrated alerting system is tightly coupled with AWS services and allows for alerts based on metric thresholds, log events, and more. Prometheus offers its own alerting system, [**Alertmanager**](https://prometheus.io/docs/alerting/latest/alertmanager/), which can handle alerts based on Prometheus query results. + +Migrating an alerting setup requires translating existing CloudWatch alarms into Prometheus alert rules. Consider how the thresholds and conditions set in CloudWatch translate to query-based alerts in Prometheus. + +**Recommendation**: + +- Audit all CloudWatch alerts and replicate them using Prometheus Alertmanager. +- Refine alert thresholds based on the type of data collected by Prometheus. +- Integrate Alertmanager with any existing notification systems (e.g. email, Slack, etc.) to maintain consistency in how teams are alerted to critical events. + +### Security and Access Controls + +CloudWatch integrates with AWS Identity and Access Management (IAM) for role-based access control (RBAC). This helps with management of who can view, edit, or delete logs and metrics. Prometheus and Grafana require manual configuration of security and access controls. + +Securing Prometheus and Grafana involves setting up user authentication (e.g. OAuth, LDAP, etc.) and ensuring metrics and dashboards are only accessible to authorized personnel. To maintain security, data in transit should be encrypted using TLS. + +**Recommendation**: + +- Implement secure access controls from the start. +- Configure Grafana with a well-defined RBAC policy and integrate it with an authentication system, such as OAuth or LDAP. +- Enable TLS for Prometheus to secure data in transit, and restrict access to sensitive metrics. + +### Separate Log and Metric Responsibilities + +Since Prometheus is primarily a metrics-based monitoring solution, it does not have built-in capabilities for handling logs in the same way CloudWatch does. Therefore, it's important to decouple log management needs from metric collection when migrating. + +**Recommendation**: + +- Introduce a specialized log aggregation solution alongside Prometheus and Grafana for collecting, aggregating, and querying logs: + - [**Grafana Loki**](https://grafana.com/oss/loki/) is designed to integrate with Grafana. It provides log querying capabilities within Grafana's existing interface, giving a unified view of metrics and logs in a single dashboard. + - [**Fluentd**](https://www.fluentd.org/) is a log aggregator that can forward logs to multiple destinations, including object storage for long-term retention. It works with both Loki and ELK. 
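+
+As a minimal sketch of the Loki approach, the example app's `flask-app.log` could be shipped with a Promtail configuration similar to the following. This assumes Promtail is installed on the Flask instance and that Loki is reachable on the monitoring server at its default port `3100`; the file path, ports, placeholders, and labels below are illustrative and should be adapted to your environment:
+
+```file {title="promtail-config.yaml"}
+server:
+  http_listen_port: 9080
+
+# Tracks how far Promtail has read into each log file
+positions:
+  filename: /tmp/positions.yaml
+
+# Where Promtail pushes log entries (Loki's default HTTP port is 3100)
+clients:
+  - url: http://{{< placeholder "MONITORING_SERVER_IP" >}}:3100/loki/api/v1/push
+
+# Tail the Flask application's log file and label the stream for querying in Grafana
+scrape_configs:
+  - job_name: flask-app
+    static_configs:
+      - targets:
+          - localhost
+        labels:
+          job: flask-app
+          __path__: /home/{{< placeholder "USERNAME" >}}/example-flask-app/flask-app.log
+```
+
+Once Loki is added to Grafana as a data source, these logs can be queried alongside the Prometheus metrics configured earlier, restoring the unified logs-and-metrics view that CloudWatch previously provided.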
\ No newline at end of file diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png new file mode 100644 index 00000000000..3752662dfc0 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png new file mode 100644 index 00000000000..0ad9028307d Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-aws-cloudwatch-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-custom-metrics-latency.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-custom-metrics-latency.png new file mode 100644 index 00000000000..20905c414b0 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-custom-metrics-latency.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-custom-metrics-request-count.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-custom-metrics-request-count.png new file mode 100644 index 00000000000..d7f2087e8b7 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-custom-metrics-request-count.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-flask-log-entry.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-flask-log-entry.png new file mode 100644 index 00000000000..d33a8907d11 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/azure-monitor-flask-log-entry.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png new file mode 100644 index 00000000000..545e5765ae2 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png new file mode 100644 index 00000000000..b09186d46cc Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png differ diff --git 
a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png new file mode 100644 index 00000000000..8214856d537 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png new file mode 100644 index 00000000000..4c3cbabd1b8 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png new file mode 100644 index 00000000000..1864fe66461 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png new file mode 100644 index 00000000000..d566b83ca6a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png new file mode 100644 index 00000000000..8f2ee53253b Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-login-page.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-login-page.png new file mode 100644 index 00000000000..3542f16b68a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-login-page.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png new file mode 100644 index 00000000000..888e50e1193 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png 
b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png new file mode 100644 index 00000000000..d5c58f45c40 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png new file mode 100644 index 00000000000..f3f0aa31009 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/index.md b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/index.md new file mode 100644 index 00000000000..d0edd724ba9 --- /dev/null +++ b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/index.md @@ -0,0 +1,808 @@ +--- +slug: migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai +title: "Migrating From Azure Monitor to Prometheus and Grafana on Akamai" +description: "Migrating from Azure Metrics to Prometheus and Grafana? Learn how to configure metrics, build custom dashboards, and optimize monitoring with cost-effective, open source tools." +authors: ["Akamai"] +contributors: ["Akamai"] +published: 2025-02-10 +keywords: ['azure','azure metrics','prometheus','grafana','azure metrics migration','prometheus and grafana setup','migrate to prometheus','grafana dashboards for metrics','azure metrics alternative','open source monitoring tools','prometheus metrics','grafana visualization','monitoring and observability','prometheus grafana guide','azure metrics to Prometheus tutorial'] +license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)' +external_resources: +- '[Azure Monitor Documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/)' +- '[Prometheus Documentation](https://prometheus.io/docs/introduction/overview/)' +- '[Grafana Installation Documentation](https://grafana.com/docs/grafana/latest/setup-grafana/installation/)' +- '[Grafana Dashboard Documentation](https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/)' +--- + +Azure Monitor is Microsoft's built-in observability platform. It is designed to monitor and analyze the performance and reliability of applications and infrastructure within the Azure ecosystem. It collects metrics, logs, and telemetry data from Azure resources, on-premises environments, and other cloud services. It also offer tools to optimize and maintain system health. + +This guide explains how to migrate Azure Monitor service logs and metrics to Prometheus and Grafana running on a Linode instance. + +This guide walks through how to migrate standard Azure Monitor service logs, metrics, and monitoring to a Prometheus and Grafana software stack on a Linode instance. To illustrate the migration process, an example Flask-based Python application running on a separate instance is configured to send logs and metrics to Azure Monitor, and then modified to integrate with Prometheus and Grafana. 
While this guide uses a Flask application as an example, the principles can be applied to any workload currently monitored via Azure Monitor. + +## Introduction to Prometheus and Grafana + +[Prometheus](https://prometheus.io/docs/introduction/overview/) is a [time-series database](https://prometheus.io/docs/concepts/data_model/#data-model) that collects and stores metrics from applications and services. It provides a foundation for monitoring system performance using the PromQL query language to extract and analyze granular data. Prometheus autonomously scrapes (*pulls*) metrics from targets at specified intervals, efficiently storing data through compression while retaining the most critical details. It also supports alerting based on metric thresholds, making it suitable for dynamic, cloud-native environments. + +[Grafana](https://grafana.com/docs/) is a visualization and analytics platform that integrates with Prometheus. It enables users to create real-time, interactive dashboards, visualize metrics, and set up alerts to gain deeper insights into system performance. Grafana can unify data from a wide array of data sources, including Prometheus, to provide a centralized view of system metrics. + +Prometheus and Grafana are considered industry standard, and are commonly used together to monitor service health, detect anomalies, and issue alerts. Being both open source and platfrom-agnostic allows them to be deployed across a diverse range of cloud providers and on-premise infrastructures. Organizations often adopt these tools to reduce operational costs while gaining greater control over how data is collected, stored, and visualized. + +{{< note title="Prometheus and Grafana Marketplace App" >}} +If you prefer an automatic deployment rather than the manual installation steps in this guide, Prometheus and Grafana can be deployed through our [Prometheus and Grafana Marketplace app](https://www.linode.com/marketplace/apps/linode/prometheus-grafana/). +{{< /note >}} + +## Before You Begin + +1. If you do not already have a virtual machine to use, create a Compute Instance for the Prometheus and Grafana stack using the steps in our [Get Started](https://techdocs.akamai.com/cloud-computing/docs/getting-started) and [Create a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/create-a-compute-instance) guides: + + - **Prometheus and Grafana instance requirements**: Linode 8 GB Shared CPU plan, Ubuntu 24.04 LTS distribution + + {{< note type="primary" title="Provisioning Compute Instances with the Linode CLI" isCollapsible="true" >}} + Use these steps if you prefer to use the [Linode CLI](https://techdocs.akamai.com/cloud-computing/docs/getting-started-with-the-linode-cli) to provision resources. + + The following command creates a **Linode 8 GB** compute instance (`g6-standard-4`) running Ubuntu 24.04 LTS (`linode/ubuntu24.04`) in the Miami datacenter (`us-mia`): + + ```command + linode-cli linodes create \ + --image linode/ubuntu24.04 \ + --region us-mia \ + --type g6-standard-4 \ + --root_pass {{< placeholder "PASSWORD" >}} \ + --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" \ + --label monitoring-server + ``` + + Note the following key points: + + - Replace the `region` as desired. + - Replace {{< placeholder "PASSWORD" >}} with a secure alternative for your root password. + - This command assumes that an SSH public/private key pair exists, with the public key stored as `id\_rsa.pub` in the user’s `$HOME/.ssh/` folder. 
+ - The `--label` argument specifies the name of the new server (`monitoring-server`). + {{< /note >}} + + To emulate a real-world workload, the examples in this guide use an additional optional instance to run an example Flask Python application. This application produces sample metrics and is used to illustrate configuration changes when switching from Azure Monitor to an alternative monitoring solution. This instance can live on Azure or other infrastructure (such as a Linode) as long as it is configured to send metrics to Azure Monitor. + + - **Example Flask app instance requirements**: 1 GB Shared CPU, Ubuntu 24.04 LTS distribution + +1. Follow our [Set Up and Secure a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/set-up-and-secure-a-compute-instance) guide to update each system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access. + +{{< note >}} +This guide is written for a non-root user. Commands that require elevated privileges are prefixed with `sudo`. If you’re not familiar with the `sudo` command, see the [Users and Groups](/docs/guides/linux-users-and-groups/) guide. +{{< /note >}} + +## Install Prometheus as a Service + +1. To install Prometheus, login via SSH to your Linode instance as your limited sudo user: + + ```command + ssh {{< placeholder "SUDO_USER" >}}@{{< placeholder "LINODE_IP" >}} + ``` + +1. Create a dedicated user for Prometheus, disable its login, and create the necessary directories for Prometheus: + + ```command + sudo useradd --no-create-home --shell /bin/false prometheus + sudo mkdir /etc/prometheus + sudo mkdir /var/lib/prometheus + ``` + +1. Download the latest version of Prometheus from its GitHub repository: + + ```command + wget https://github.com/prometheus/prometheus/releases/download/v2.55.1/prometheus-2.55.1.linux-amd64.tar.gz + ``` + + This guide uses version `2.55.1`. Check the project’s [releases page](https://github.com/prometheus/prometheus/releases) for the latest version that aligns with your instance’s operating system. + +1. Extract the compressed file and navigate to the extracted folder: + + ```command + tar xzvf prometheus-2.55.1.linux-amd64.tar.gz + cd prometheus-2.55.1.linux-amd64 + ``` + +1. Move both the `prometheus` and `promtool` binaries to `/usr/local/bin`: + + ```command + sudo cp prometheus /usr/local/bin + sudo cp promtool /usr/local/bin + ``` + + The `prometheus` binary is the main monitoring application, while `promtool` is a utility application that queries and configures a running Prometheus service. + +1. Move the configuration files and directories to the `/etc/prometheus` folder you created previously: + + ```command + sudo cp -r consoles /etc/prometheus + sudo cp -r console_libraries /etc/prometheus + sudo cp prometheus.yml /etc/prometheus/prometheus.yml + ``` + +1. Set the correct ownership permissions for Prometheus files and directories: + + ```command + sudo chown -R prometheus:prometheus /etc/prometheus + sudo chown -R prometheus:prometheus /var/lib/prometheus + sudo chown prometheus:prometheus /usr/local/bin/prometheus + sudo chown prometheus:prometheus /usr/local/bin/promtool + ``` + +### Create a `systemd` Service File + +A `systemd` service configuration file must be created to run Prometheus as a service. + +1. Create the service file using the text editor of your choice. This guide uses `nano`. 
+
+    ```command
+    sudo nano /etc/systemd/system/prometheus.service
+    ```
+
+    Add the following content to the file, and save your changes:
+
+    ```file {title="/etc/systemd/system/prometheus.service"}
+    [Unit]
+    Description=Prometheus Service
+    Wants=network-online.target
+    After=network-online.target
+
+    [Service]
+    User=prometheus
+    Group=prometheus
+    Type=simple
+    ExecStart=/usr/local/bin/prometheus \
+        --config.file=/etc/prometheus/prometheus.yml \
+        --storage.tsdb.path=/var/lib/prometheus \
+        --web.console.templates=/etc/prometheus/consoles \
+        --web.console.libraries=/etc/prometheus/console_libraries
+
+    [Install]
+    WantedBy=multi-user.target
+    ```
+
+1. Reload the `systemd` configuration files to apply the new service file:
+
+    ```command
+    sudo systemctl daemon-reload
+    ```
+
+1. Using `systemctl`, start the `prometheus` service and enable it to automatically start after a system reboot:
+
+    ```command
+    sudo systemctl start prometheus
+    sudo systemctl enable prometheus
+    ```
+
+1. Verify that Prometheus is running:
+
+    ```command
+    systemctl status prometheus
+    ```
+
+    The output should display `active (running)`, confirming a successful setup:
+
+    ```output
+    ● prometheus.service - Prometheus Service
+         Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
+         Active: active (running) since Thu 2024-12-05 16:11:57 EST; 5s ago
+       Main PID: 1165 (prometheus)
+          Tasks: 9 (limit: 9444)
+         Memory: 16.2M (peak: 16.6M)
+            CPU: 77ms
+         CGroup: /system.slice/prometheus.service
+    ```
+
+    When done, press the Q key to exit the status output and return to the terminal prompt.
+
+1. Open a web browser and visit your instance's IP address on port `9090` (Prometheus's default port):
+
+    ```command
+    http://{{< placeholder "IP_ADDRESS" >}}:9090
+    ```
+
+    The Prometheus UI should appear:
+
+    ![Prometheus UI homepage at port :9090, displaying the query and status options.](prometheus-ui-overview.png)
+
+    {{< note >}}
+    Prometheus settings are configured in the `/etc/prometheus/prometheus.yml` file. This guide uses the default values. For production systems, consider enabling authentication and other security measures to protect your metrics.
+    {{< /note >}}
+
+## Install the Grafana Service
+
+Grafana provides an `apt` repository, reducing the number of steps needed to install and update it on Ubuntu.
+
+1. Install the necessary package to add new repositories:
+
+    ```command
+    sudo apt install software-properties-common -y
+    ```
+
+1. Import and add the public key for the Grafana repository:
+
+    ```command
+    wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
+    sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
+    ```
+
+1. Update the package index and install Grafana:
+
+    ```command
+    sudo apt update
+    sudo apt install grafana -y
+    ```
+
+1. The installation process already sets up the `systemd` configuration for Grafana. Start and enable the Grafana service:
+
+    ```command
+    sudo systemctl start grafana-server
+    sudo systemctl enable grafana-server
+    ```
+
+1.
Run the following command to verify that Grafana is `active (running)`: + + ```command + systemctl status grafana-server + ``` + + ```output + ● grafana-server.service - Grafana instance + Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: enabled) + Active: active (running) since Thu 2024-12-05 13:57:10 EST; 8s ago + Docs: http://docs.grafana.org + Main PID: 3434 (grafana) + Tasks: 14 (limit: 9444) + Memory: 71.4M (peak: 80.4M) + CPU: 2.971s + CGroup: /system.slice/grafana-server.service + ``` + +### Connect Grafana to Prometheus + +1. Open a web browser and visit your instance's IP address on port `3000` (Grafana's default port) to access the Grafana web UI: + + ```command + http://{{< placeholder "IP_ADDRESS" >}}:3000 + ``` + +1. Login using the default credentials of `admin` for both the username and password: + + ![Grafana login page showing fields for entering username and password.](grafana-login-page.png) + +1. After logging in, you are prompted to enter a secure replacement for the default password: + + ![Grafana user interface prompting for a new password after the first login.](grafana-new-password-prompt.png) + +1. Add Prometheus as a data source by expanding the **Home** menu, navigating to the **Connections** entry, and clicking **Add new connection**: + + ![Grafana home menu with the option to add a new connection under the Connections section.](grafana-add-new-connection.png) + +1. Search for and select **Prometheus**. + +1. Click **Add new data source**. + + ![Grafana interface with Add New Data Source options, displaying Prometheus configuration fields.](grafana-add-datasource.png) + +1. In the **URL** field, enter `http://localhost:9090`. + +1. Click **Save & Test** to confirm the connection. + + ![Grafana test result confirming successful connection to a Prometheus data source.](grafana-connection-test-success.png) + + If successful, your Grafana installation is now connected to the Prometheus installation running on the same Linode. + +## Configure Example Flask Server + +This guide demonstrates the migration process using an example Flask app running on a separate instance from which metrics and logs can be collected. + +1. Log in to the instance running the example Flask application as a user with `sudo` privileges. + +1. Create a directory for the project named `exmaple-flask-app` and navigate into it: + + ```command + mkdir example-flask-app + cd example-flask-app + ``` + +1. 
Using a text editor of your choice, create a file called `app.py`: + + ```command + nano app.py + ``` + + Give it the following contents: + + ```file {title="app.py", lang="python"} + import json + import logging + import time + + from flask import Flask, request + from applicationinsights import TelemetryClient # Note: pip install applicationinsights + + logging.basicConfig(filename='flask-app.log', level=logging.INFO) + logger = logging.getLogger(__name__) + + app = Flask(__name__) + + # Azure Monitoring setup + tc = TelemetryClient('YOUR_INSTRUMENTATION_KEY') + + @app.before_request + def start_timer(): + request.start_time = time.time() + + @app.after_request + def send_latency_metric(response): + latency = time.time() - request.start_time + + # Send latency metric to Azure Monitoring + tc.track_metric("EndpointLatency", latency, properties={ + "Endpoint": request.path, + "Method": request.method + }) + tc.flush() + + return response + + @app.route('/') + def hello_world(): + logger.info("A request was received at the root URL") + return {'message': 'Hello, World!'}, 200 + + if __name__ == '__main__': + app.run(host='0.0.0.0', port=8080) + ``` + + When done, save your changes, and close the text editor. + +1. Create a separate text file called `requirements.txt`: + + ```command + nano requirements.txt + ``` + + Provide it with the following basic dependencies for the Flask application to function, and save your changes: + + ```file {title="requirements.txt"} + Flask==3.0.3 + itsdangerous==2.2.0 + Jinja2==3.1.4 + MarkupSafe==2.1.5 + Werkzeug==3.0.4 + ``` + +1. A virtual environment is required to run `pip` commands in Ubuntu 24.04 LTS. Use the following command to install `python3.12-venv`: + + ```command + sudo apt install python3.12-venv + ``` + +1. Using the `venv` utility, create a virtual environment named `venv` within the `example-flask-app` directory: + + ```command + python3 -m venv venv + ``` + +1. Activate the `venv` virtual environment: + + ```command + source venv/bin/activate + ``` + +1. Use `pip` to install the example Flask application's dependencies from the `requirements.txt` file: + + ```command + pip install -r requirements.txt + ``` + +1. Also using `pip`, install `applicationinsights`, which is required for interfacing with Azure Monitor: + + ```command + pip install applicationinsights + ``` + +1. Exit the virtual environment: + + ```command + deactivate + ``` + +### Create a `systemd` Service File + +1. Create a `systemd` service file for the example Flask app: + + ```command + sudo nano /etc/systemd/system/flask-app.service + ``` + + Provide the file with the following content, replacing {{< placeholder "USERNAME" >}} with your actual `sudo` user: + + ```file {title="/etc/systemd/system/flask-app.service"} + [Unit] + Description=Flask Application Service + After=network.target + + [Service] + User={{< placeholder "USERNAME" >}} + WorkingDirectory=/home/{{< placeholder "USERNAME" >}}/example-flask-app + ExecStart=/home/{{< placeholder "USERNAME" >}}/example-flask-app/venv/bin/python /home/{{< placeholder "USERNAME" >}}/example-flask-app/app.py + Restart=always + + [Install] + WantedBy=multi-user.target + ``` + + Save your changes when complete. + +1. Reload the `systemd` configuration files to apply the new service file, then start and enable the service: + + ```command + sudo systemctl daemon-reload + sudo systemctl start flask-app + sudo systemctl enable flask-app + ``` + +1. 
Verify that the `flask-app` service is `active (running)`: + + ```command + systemctl status flask-app + ``` + + ```output + ● flask-app.service - Flask Application Service + Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled) + Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago + Main PID: 4413 (python) + Tasks: 1 (limit: 9444) + Memory: 20.3M (peak: 20.3M) + CPU: 196ms + CGroup: /system.slice/flask-app.service + ``` + + Once the Flask application is running, Azure Monitor can monitor its data. + +1. Generate data by issuing an HTTP request using the following cURL command. Replace {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running: + + ```command + curl http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080 + ``` + + You should receive the following response: + + ```output + {"message": "Hello, World!"} + ``` + +## Migrate from Azure Monitor to Prometheus and Grafana + +Migrating from Azure Monitor to Prometheus and Grafana can offer several advantages, including increased control data storage and handling, potential reduction in cost, and enhanced monitoring capabilities across multi-cloud or hybrid environments. However, the transition requires careful planning and a clear understanding of the differences between them: + +| Feature | Azure Monitor | Prometheus | +| :---- | :---- | :---- | +| Integration and Configuration | Out-of-the-box integrations to simplify monitoring Azure resources. | Cloud-agnostic and highly configurable, enabling integration across diverse environments. | +| Data Collection | Passively collects data from Azure resources. | Actively scrapes data at defined intervals from configured targets. | +| Data Storage | Fully manages data storage, including long-term retention. | Defaults to local short-term storage but supports integration with external long-term storage solutions. | + +While Azure Monitor includes native tools for creating dashboards, Grafana enhances visualization capabilities by supporting multiple data sources. This allows users to combine real-time and historical data from Azure Monitor, Prometheus, and other platforms in a single unified view. + +Applications monitored by Azure Monitor may use the following tools: + +- [Azure OpenTelemetry Exporter](https://learn.microsoft.com/en-us/python/api/overview/azure/monitor-opentelemetry-exporter-readme?view=azure-python-preview): For intentional collection of application metrics using the OpenTelemetry Standard. +- [Application Insights](https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview): For automatic collection of metrics and telemetry data from applications. + +### Assess Current Monitoring Requirements + +Begin the migration process by auditing your current Azure Monitor configuration, including: + +- **Metrics**: Catalog the metrics being monitored, their collection intervals, and how they are utilized in day-to-day operations. +- **Logs**: Review logs collected from Azure resources and applications, noting any patterns or specific data points needed for troubleshooting or analysis. +- **Alerts**: Note also what specific alerts are configured. + +This assessment can help determine the equivalent monitoring setup needed for Prometheus and Grafana. + +The screenshots below are examples of Azure Monitor metrics captured for the example Python Flask application running on an Azure Virtual Machine. 
These metrics, such as server request counts and response latency, would need to be migrated to Prometheus for continued monitoring.
+
+![Azure Monitor interface showing a custom metric for server request counts in a Python Flask application running on an Azure Virtual Machine.](azure-monitor-custom-metrics-request-count.png)
+
+![Graph in Azure Monitor displaying response latency metrics for a Python Flask application, highlighting performance trends over time.](azure-monitor-custom-metrics-latency.png)
+
+Azure Monitor also collects logs from Azure resources. The following example shows a log entry from the Python Flask application:
+
+![Example log entry captured by Azure Monitor from a Python Flask application, detailing a server event for debugging or analysis.](azure-monitor-flask-log-entry.png)
+
+### Export Existing Azure Monitor Logs and Metrics
+
+There are two common approaches for exporting logs and metrics from Azure Monitor to a Prometheus and Grafana monitoring workflow:
+
+#### Option 1: Azure Monitor Metrics Exporter
+
+The [Azure Monitor Metrics Exporter](https://github.com/webdevops/azure-metrics-exporter) is an open source tool designed to scrape Azure metrics and expose them in a Prometheus-compatible format. It supports collecting metrics from various Azure resources such as Virtual Machines, App Services, and SQL Databases.
+
+Typically deployed as a container or agent, the exporter is configured with Azure service principal credentials to access resources. Metrics are exposed at an HTTP endpoint, which Prometheus can scrape at regular intervals.
+
+While this method enables real-time metric collection, it may require tuning to stay within Azure API limits in high-volume environments. Additionally, some Azure Monitor metrics or custom metrics may not be fully compatible with the exporter, necessitating alternative configurations.
+
+#### Option 2: Diagnostic Settings in Azure Monitor
+
+Azure Monitor’s [diagnostic settings](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings) provide another way to export logs and metrics. This approach uses [Azure Event Hub](https://azure.microsoft.com/en-us/products/event-hubs), a managed data streaming service, to transfer data from Azure Monitor.
+
+**Metrics Workflow:**
+
+1. Configure diagnostic settings for the Azure resources from which you want to export metrics.
+
+1. Specify Azure Event Hub as the destination.
+
+1. Use a streaming service, such as Kafka or a custom consumer, to route data from Event Hub to Prometheus-compatible storage (e.g. a time-series database). Grafana can ingest metrics directly if the format is supported.
+
+**Logs Workflow:**
+
+1. Set diagnostic settings to route logs from Azure Monitor to Azure Event Hub.
+
+1. Stream logs from Event Hub to a log platform such as [Loki](https://github.com/grafana/loki), which is commonly paired with Grafana for visualization.
+
+1. If needed, employ an ETL (Extract, Transform, Load) pipeline or serverless job to reformat logs for compatibility with Loki or another log storage system.
+
+### Expose Application Metrics to Prometheus
+
+Prometheus works differently from Azure Monitor: instead of *pushing* data like Azure Monitor, Prometheus *pulls* metrics from the monitored application. After assessing or exporting metrics as needed, modify the application to enable Prometheus metric scraping so that it collects the same metrics previously sent to Azure Monitor. This process varies from application to application.
+ +For the example Flask application in this guide, the [`prometheus_flask_exporter` library](https://github.com/rycus86/prometheus_flask_exporter) is a standard library that can be used for instrumenting Flask applications to expose Prometheus metrics. + +1. Reactivate the `venv` virtual environment: + + ```command + source venv/bin/activate + ``` + +1. Use `pip` to install the `prometheus_client` and `prometheus_flask_exporter` libraries: + + ```command + pip install prometheus_client prometheus_flask_exporter + ``` + +1. Exit the virtual environment: + + ```command + deactivate + ``` + +1. Using a text editor of your choice, open the `app.py` file for the Flask application: + + ```command + nano app.py + ``` + + Replace the file's current Azure Monitor-specific contents with the Prometheus-specific code below: + + ```file {title="app.py" lang="python"} + import logging + import random + import time + + from flask import Flask + from prometheus_flask_exporter import PrometheusMetrics + + logging.basicConfig(filename="flask-app.log", level=logging.INFO) + logger = logging.getLogger(__name__) + + app = Flask(__name__) + metrics = PrometheusMetrics(app) + + metrics.info("FlaskApp", "Application info", version="1.0.0") + + + @app.route("/") + def hello_world(): + logger.info("A request was received at the root URL") + return {"message": "Hello, World!"}, 200 + + + @app.route("/long-request") + def long_request(): + n = random.randint(1, 5) + logger.info( + f"A request was received at the long-request URL. Slept for {n} seconds" + ) + time.sleep(n) + return {"message": f"Long running request with {n=}"}, 200 + + + if __name__ == "__main__": + app.run(host="0.0.0.0", port=8080) + ``` + + These lines use the `prometheus_flask_exporter` library to: + + - Instrument the Flask app for Prometheus metrics. + - Expose default and application-specific metrics at the `/metrics` endpoint. + - Provide metadata such as version information via `metrics.info`. + +1. Save and close the file, then restart the `flask-app` service: + + ```command + sudo systemctl restart flask-app + ``` + +1. Verify that the `flask-app` service is `active (running)`: + + ```command + systemctl status flask-app + ``` + + ```output + ● flask-app.service - Flask Application Service + Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled) + Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago + Main PID: 4413 (python) + Tasks: 1 (limit: 9444) + Memory: 20.3M (peak: 20.3M) + CPU: 196ms + CGroup: /system.slice/flask-app.service + ``` + +1. Test to see if the Flask app is accessible by issuing the following cURL command. Replace {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running: + + ```command + curl http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080 + ``` + + You should receive the following response: + + ```output + {"message": "Hello, World!"} + ``` + +1. To view the metrics, open a web browser and visit the following URL: + + ```command + http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080/metrics + ``` + + The metrics shown include `http_request_duration_seconds` (request latency) and `http_requests_total` (total number of requests). + +### Configure Prometheus to Ingest Application Metrics + +1. Log back in to the Prometheus & Grafana instance. + +1. 
Using a text editor, open and modify the Prometheus configuration at `/etc/prometheus/prometheus.yml` to include the Flask application as a scrape target:
+
+    ```command
+    sudo nano /etc/prometheus/prometheus.yml
+    ```
+
+    Append the following content to the `scrape_configs` section of the file, replacing {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running:
+
+    ```file {title="/etc/prometheus/prometheus.yml"}
+      - job_name: 'flask_app'
+        static_configs:
+          - targets: ['{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080']
+    ```
+
+    This configuration tells Prometheus to scrape metrics from the Flask application running on port `8080`.
+
+1. Save the file, and restart Prometheus to apply the changes:
+
+    ```command
+    sudo systemctl restart prometheus
+    ```
+
+1. To verify that Prometheus is successfully scraping the Flask app, open a web browser and navigate to the Prometheus user interface on port 9090 (the default port for Prometheus). Replace {{< placeholder "INSTANCE_IP_ADDRESS" >}} with the IP of your instance:
+
+    ```command
+    http://{{< placeholder "INSTANCE_IP_ADDRESS" >}}:9090
+    ```
+
+1. In the Prometheus UI, click the **Status** tab and select **Targets**. You should see the Flask application service listed as a target with a status of `UP`, indicating that Prometheus is successfully scraping metrics from the application.
+
+    ![Prometheus UI showing the status and targets of monitored services.](prometheus-ui-targets.png)
+
+### Create a Grafana Dashboard with Application Metrics
+
+Grafana serves as the visualization layer, providing an interface for creating dashboards from Prometheus metrics.
+
+1. Open a web browser and visit the following URL to access the Grafana UI on port 3000 (the default port for Grafana). Replace {{< placeholder "INSTANCE_IP_ADDRESS" >}} with the IP of your instance:
+
+    ```command
+    http://{{< placeholder "INSTANCE_IP_ADDRESS" >}}:3000
+    ```
+
+1. Navigate to the **Dashboards** page:
+
+    ![Grafana home menu with the Dashboards section selected.](grafana-home-menu-dashboards.png)
+
+1. Create a new dashboard in Grafana by clicking **Create dashboard**:
+
+    ![Grafana Dashboards page with an option to create a new dashboard.](grafana-dashboards-overview.png)
+
+1. Click **Add visualization**:
+
+    ![Grafana interface showing the Add Visualization dialog for creating a new graph.](grafana-add-visualization.png)
+
+1. In the resulting dialog, select the **prometheus** data source:
+
+    ![Grafana data source selection dialog with Prometheus highlighted.](grafana-prometheus-datasource.png)
+
+1. To duplicate the Azure Monitor metrics for the Flask application, first click on the **Code** tab in the right-hand side of the panel editor:
+
+    ![Grafana panel editor with the Code tab selected for entering a PromQL query.](grafana-panel-editor-query-code.png)
+
+1. Input the following PromQL query to calculate the average latency for an endpoint:
+
+    ```command
+    flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"} /
+    flask_http_request_duration_seconds_count{method="GET",path="/",status="200"}
+    ```
+
+1. After entering the formula, click **Run queries** to execute the PromQL query.
The chart should update with data pulled from Prometheus:
+
+    ![Grafana dashboard displaying a latency graph for a Flask application, based on Prometheus data.](grafana-latency-dashboard.png)
+
+    This visualization replicates Azure Monitor's latency metrics, detailing the average latency over time for a specific endpoint. Prometheus also provides default labels, such as method, path, and status code, for additional granularity in analysis.
+
+## Additional Considerations and Concerns
+
+### Cost Management
+
+Migrating to Prometheus and Grafana eliminates the recurring usage-based charges associated with Azure Monitor. However, running Prometheus and Grafana introduces infrastructure costs of its own: compute and storage resources must be provisioned and maintained, and network traffic must be handled. Additionally, since Prometheus is primarily designed for short-term data storage, setting up a long-term storage solution may also increase costs.
+
+**Recommendation**:
+
+- Assess the current costs of Azure Monitor, including data ingestion and retention charges, and compare them to the infrastructure costs for hosting Prometheus and Grafana.
+- Use Prometheus’s default short-term storage for real-time monitoring, reserving long-term storage for critical metrics only.
+- Configure Grafana alerts and dashboards to minimize high-frequency scrapes and unnecessary data retention.
+- Regularly review and refine retention policies and scraping intervals to balance cost with visibility needs.
+
+### Data Consistency and Accuracy
+
+Azure Monitor automatically standardizes metrics and logs within its centralized system. In contrast, Prometheus and Grafana require custom configurations, which can raise concerns about consistency in data formatting, collection intervals, and overall accuracy during migration.
+
+**Recommendation**:
+
+- Document and map existing Azure Monitor metrics, logs, and alerts to align collection intervals and query formats.
+- Standardize scrape intervals and retention policies across all Prometheus exporters to ensure consistent and comparable data.
+- Early in the migration process, regularly audit and validate data accuracy between Azure Monitor and Prometheus to detect any discrepancies.
+- Utilize Grafana dashboards to compare data from both systems during the migration to ensure equivalence and reliability.
+
+### Azure Monitor Aggregated Data Versus Prometheus Raw Data
+
+Azure Monitor aggregates data to offer summary metrics that can reduce data volume while providing a high-level view of system health. In contrast, Prometheus collects raw, fine-grained metrics. While this enables detailed analyses and granular troubleshooting, it can also increase storage requirements and complexity when interpreting data.
+
+**Recommendation**:
+
+- Leverage [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) in Prometheus to create aggregated views of commonly queried metrics, thereby reducing storage while retaining essential insights.
+- For historical trends or aggregated overviews, use a data pipeline to export and store high-level summaries in Grafana.
+- Consider integrating an archival database for long-term storage of aggregated data, reducing reliance on raw metrics for historical monitoring.
+
+### Alert System Migration
+
+Azure Monitor supports custom alerts based on queries or thresholds.
These can be replicated in Prometheus using PromQL, with [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) handling alert routing and notifications. For a seamless migration, it is critical that both the conditions and *intent* of each alert are accurately translated.
+
+**Recommendation**:
+
+- Audit all Azure Monitor alerts to understand their conditions and thresholds.
+- Replicate these alerts in Prometheus using PromQL, ensuring they match the original intent and refining where necessary.
+- Integrate Alertmanager with any existing notification systems (e.g. email, Slack, etc.) to maintain consistency in how teams are alerted to critical events.
+
+### Security and Access Controls
+
+Azure Monitor integrates with Azure Active Directory to provide built-in, role-based access control (RBAC). Securing Prometheus and Grafana requires configuring user authentication (e.g. OAuth, LDAP, etc.) manually to ensure metrics and dashboards are only accessible to authorized personnel. To maintain security, data in transit should be encrypted using TLS.
+
+**Recommendation**:
+
+- Establish a strong security baseline by implementing secure access controls from the start.
+- Configure Grafana with a well-defined RBAC policy.
+- Integrate Grafana and Prometheus with an authentication system to centralize access management.
+- Enable TLS for Prometheus to encrypt data in transit.
+- Ensure that any sensitive metrics are restricted from unauthorized users.
+- Regularly audit access logs and permissions to identify and mitigate vulnerabilities.
+
+### Separate Log and Metric Responsibilities
+
+Azure Monitor provides a unified interface for managing both metrics and logs. Because Prometheus is primarily a metrics-based monitoring solution, it does not have built-in capabilities for handling logs in the way Azure Monitor does. Therefore, it’s important to decouple log management needs from metric collection when migrating.
+
+**Recommendation**:
+
+- Introduce a specialized log aggregation solution alongside Prometheus and Grafana for collecting, aggregating, and querying logs:
+    - [**Grafana Loki**](https://grafana.com/grafana/loki/) is designed to integrate with Grafana. It provides log querying capabilities within Grafana's existing interface, giving a unified view of metrics and logs in a single dashboard.
+    - [**Fluentd**](https://www.fluentd.org/) is a log aggregator that can forward logs to multiple destinations, including object storage for long-term retention. It works with both Loki and ELK.
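+
+For reference, the following is a minimal Promtail scrape configuration that could ship the example application's `flask-app.log` file to a Loki instance running alongside Prometheus and Grafana. The Loki address, port numbers, file name, and log path below are illustrative assumptions; adjust them to match your environment:
+
+```file {title="promtail-config.yml" lang="yaml"}
+server:
+  http_listen_port: 9080
+  grpc_listen_port: 0
+
+# Tracks how far Promtail has read into each log file.
+positions:
+  filename: /tmp/positions.yaml
+
+# Push scraped log lines to the Loki instance.
+clients:
+  - url: http://{{< placeholder "LOKI_IP_ADDRESS" >}}:3100/loki/api/v1/push
+
+scrape_configs:
+  - job_name: flask-app-logs
+    static_configs:
+      - targets:
+          - localhost
+        labels:
+          job: flask-app
+          __path__: /home/{{< placeholder "USERNAME" >}}/example-flask-app/flask-app.log
+```
+
+With a configuration along these lines in place, and Loki added to Grafana as a data source, the Flask application's log entries become queryable from Grafana's **Explore** view, approximating the combined log and metric view previously provided by Azure Monitor.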
\ No newline at end of file diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png new file mode 100644 index 00000000000..3752662dfc0 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png new file mode 100644 index 00000000000..0ad9028307d Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-azure-monitor-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-api-request-latency.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-api-request-latency.png new file mode 100644 index 00000000000..18c0f9ad4c3 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-api-request-latency.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-api-requests-over-time.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-api-requests-over-time.png new file mode 100644 index 00000000000..81fc95cbf2a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-api-requests-over-time.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-cpu-utilization.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-cpu-utilization.png new file mode 100644 index 00000000000..cb64123d884 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-cpu-utilization.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-create-log-sink.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-create-log-sink.png new file mode 100644 index 00000000000..eb6f8b29d2b Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/gcp-create-log-sink.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png new file mode 100644 index 00000000000..545e5765ae2 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-datasource.png differ diff --git 
a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png new file mode 100644 index 00000000000..b09186d46cc Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-new-connection.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png new file mode 100644 index 00000000000..8214856d537 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-add-visualization.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png new file mode 100644 index 00000000000..4c3cbabd1b8 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-connection-test-success.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png new file mode 100644 index 00000000000..1864fe66461 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-dashboards-overview.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png new file mode 100644 index 00000000000..d566b83ca6a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-home-menu-dashboards.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png new file mode 100644 index 00000000000..8f2ee53253b Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-latency-dashboard.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-login-page.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-login-page.png new file mode 100644 index 00000000000..3542f16b68a Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-login-page.png differ diff --git 
a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png new file mode 100644 index 00000000000..888e50e1193 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-new-password-prompt.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png new file mode 100644 index 00000000000..d5c58f45c40 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-panel-editor-query-code.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png new file mode 100644 index 00000000000..f3f0aa31009 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/grafana-prometheus-datasource.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/index.md b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/index.md new file mode 100644 index 00000000000..46928a2f3f7 --- /dev/null +++ b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/index.md @@ -0,0 +1,784 @@ +--- +slug: migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai +title: "Migrating From GCP Cloud Monitoring to Prometheus and Grafana on Akamai" +description: "Migrating from GCP Cloud Monitoring to Prometheus and Grafana? Learn how to configure metrics, build custom dashboards, and optimize monitoring with cost-effective, open source tools." +authors: ["Akamai"] +contributors: ["Akamai"] +published: 2025-02-10 +keywords: ['gcp','gcp cloud monitoring','prometheus','grafana','gcp cloud monitoring migration','prometheus and grafana setup','migrate to prometheus','grafana dashboards for metrics','gcp cloud monitoring alternative','open source monitoring tools','prometheus metrics','grafana visualization','monitoring and observability','prometheus grafana guide','gcp cloud monitoring to Prometheus tutorial'] +license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)' +external_resources: +- '[GCP Cloud Monitoring](https://cloud.google.com/monitoring?hl=en)' +- '[GCP Cloud Logging](https://cloud.google.com/logging/docs/overview)' +- '[Prometheus Documentation](https://prometheus.io/docs/introduction/overview/)' +- '[Grafana Installation Documentation](https://grafana.com/docs/grafana/latest/setup-grafana/installation/)' +- '[Grafana Dashboard Documentation](https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/)' +--- + +Cloud Monitoring is an observability solution from Google Cloud Platform (GCP). 
It allows users to monitor their applications, infrastructure, and services within the GCP ecosystem as well as in external and hybrid environments. Cloud Monitoring provides real-time insights into system health, performance, and availability by collecting metrics, logs, and traces.
+
+This guide walks through how to migrate standard GCP Cloud Monitoring service logs, metrics, and monitoring to a Prometheus and Grafana software stack on a Linode instance. To illustrate the migration process, an example Flask-based Python application running on a separate instance is configured to send logs and metrics to Cloud Monitoring, and then modified to integrate with Prometheus and Grafana. While this guide uses a Flask application as an example, the principles can be applied to any workload currently monitored via Cloud Monitoring.
+
+## Introduction to Prometheus and Grafana
+
+[Prometheus](https://prometheus.io/docs/introduction/overview/) is a [time-series database](https://prometheus.io/docs/concepts/data_model/#data-model) that collects and stores metrics from applications and services. It provides a foundation for monitoring system performance using the PromQL query language to extract and analyze granular data. Prometheus autonomously scrapes (*pulls*) metrics from targets at specified intervals, efficiently storing data through compression while retaining the most critical details. It also supports alerting based on metric thresholds, making it suitable for dynamic, cloud-native environments.
+
+[Grafana](https://grafana.com/docs/) is a visualization and analytics platform that integrates with Prometheus. It enables users to create real-time, interactive dashboards, visualize metrics, and set up alerts to gain deeper insights into system performance. Grafana can unify data from a wide array of data sources, including Prometheus, to provide a centralized view of system metrics.
+
+Prometheus and Grafana are considered industry standard tools, and are commonly used together to monitor service health, detect anomalies, and issue alerts. Being both open source and platform-agnostic allows them to be deployed across a diverse range of cloud providers and on-premises infrastructure. Organizations often adopt these tools to reduce operational costs while gaining greater control over how data is collected, stored, and visualized.
+
+{{< note title="Prometheus and Grafana Marketplace App" >}}
+If you prefer an automatic deployment rather than the manual installation steps in this guide, Prometheus and Grafana can be deployed through our [Prometheus and Grafana Marketplace app](https://www.linode.com/marketplace/apps/linode/prometheus-grafana/).
+{{< /note >}}
+
+## Before You Begin
+
+1. If you do not already have a virtual machine to use, create a Compute Instance for the Prometheus and Grafana stack using the steps in our [Get Started](https://techdocs.akamai.com/cloud-computing/docs/getting-started) and [Create a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/create-a-compute-instance) guides:
+
+    - **Prometheus and Grafana instance requirements**: Linode 8 GB Shared CPU plan, Ubuntu 24.04 LTS distribution
+
+    {{< note type="primary" title="Provisioning Compute Instances with the Linode CLI" isCollapsible="true" >}}
+    Use these steps if you prefer to use the [Linode CLI](https://techdocs.akamai.com/cloud-computing/docs/getting-started-with-the-linode-cli) to provision resources.
+
+    The following command creates a **Linode 8 GB** compute instance (`g6-standard-4`) running Ubuntu 24.04 LTS (`linode/ubuntu24.04`) in the Miami datacenter (`us-mia`):
+
+    ```command
+    linode-cli linodes create \
+    --image linode/ubuntu24.04 \
+    --region us-mia \
+    --type g6-standard-4 \
+    --root_pass {{< placeholder "PASSWORD" >}} \
+    --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" \
+    --label monitoring-server
+    ```
+
+    Note the following key points:
+
+    - Replace the `region` as desired.
+    - Replace {{< placeholder "PASSWORD" >}} with a secure alternative for your root password.
+    - This command assumes that an SSH public/private key pair exists, with the public key stored as `id_rsa.pub` in the user’s `$HOME/.ssh/` folder.
+    - The `--label` argument specifies the name of the new server (`monitoring-server`).
+    {{< /note >}}
+
+    To emulate a real-world workload, the examples in this guide use an additional optional instance to run an example Flask Python application. This application produces sample metrics and is used to illustrate configuration changes when switching from GCP Cloud Monitoring to an alternative monitoring solution. This instance can live on GCP or other infrastructure (such as a Linode) as long as it is configured to send metrics to GCP Cloud Monitoring.
+
+    - **Example Flask app instance requirements**: 1 GB Shared CPU, Ubuntu 24.04 LTS distribution
+
+1. Follow our [Set Up and Secure a Compute Instance](https://techdocs.akamai.com/cloud-computing/docs/set-up-and-secure-a-compute-instance) guide to update each system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.
+
+{{< note >}}
+This guide is written for a non-root user. Commands that require elevated privileges are prefixed with `sudo`. If you’re not familiar with the `sudo` command, see the [Users and Groups](/docs/guides/linux-users-and-groups/) guide.
+{{< /note >}}
+
+## Install Prometheus as a Service
+
+1. To install Prometheus, log in via SSH to your Linode instance as your limited sudo user:
+
+    ```command
+    ssh {{< placeholder "SUDO_USER" >}}@{{< placeholder "LINODE_IP" >}}
+    ```
+
+1. Create a dedicated user for Prometheus, disable its login, and create the necessary directories for Prometheus:
+
+    ```command
+    sudo useradd --no-create-home --shell /bin/false prometheus
+    sudo mkdir /etc/prometheus
+    sudo mkdir /var/lib/prometheus
+    ```
+
+1. Download the latest version of Prometheus from its GitHub repository:
+
+    ```command
+    wget https://github.com/prometheus/prometheus/releases/download/v2.55.1/prometheus-2.55.1.linux-amd64.tar.gz
+    ```
+
+    This guide uses version `2.55.1`. Check the project’s [releases page](https://github.com/prometheus/prometheus/releases) for the latest version that aligns with your instance’s operating system.
+
+1. Extract the compressed file and navigate to the extracted folder:
+
+    ```command
+    tar xzvf prometheus-2.55.1.linux-amd64.tar.gz
+    cd prometheus-2.55.1.linux-amd64
+    ```
+
+1. Copy both the `prometheus` and `promtool` binaries to `/usr/local/bin`:
+
+    ```command
+    sudo cp prometheus /usr/local/bin
+    sudo cp promtool /usr/local/bin
+    ```
+
+    The `prometheus` binary is the main monitoring application, while `promtool` is a utility application that queries and configures a running Prometheus service.
+
+1. 
Copy the configuration files and directories to the `/etc/prometheus` folder you created previously:
+
+    ```command
+    sudo cp -r consoles /etc/prometheus
+    sudo cp -r console_libraries /etc/prometheus
+    sudo cp prometheus.yml /etc/prometheus/prometheus.yml
+    ```
+
+1. Set the correct ownership permissions for Prometheus files and directories:
+
+    ```command
+    sudo chown -R prometheus:prometheus /etc/prometheus
+    sudo chown -R prometheus:prometheus /var/lib/prometheus
+    sudo chown prometheus:prometheus /usr/local/bin/prometheus
+    sudo chown prometheus:prometheus /usr/local/bin/promtool
+    ```
+
+### Create a `systemd` Service File
+
+A `systemd` service configuration file must be created to run Prometheus as a service.
+
+1. Create the service file using the text editor of your choice. This guide uses `nano`:
+
+    ```command
+    sudo nano /etc/systemd/system/prometheus.service
+    ```
+
+    Add the following content to the file, and save your changes:
+
+    ```file {title="/etc/systemd/system/prometheus.service"}
+    [Unit]
+    Description=Prometheus Service
+    Wants=network-online.target
+    After=network-online.target
+
+    [Service]
+    User=prometheus
+    Group=prometheus
+    Type=simple
+    ExecStart=/usr/local/bin/prometheus \
+        --config.file=/etc/prometheus/prometheus.yml \
+        --storage.tsdb.path=/var/lib/prometheus \
+        --web.console.templates=/etc/prometheus/consoles \
+        --web.console.libraries=/etc/prometheus/console_libraries
+
+    [Install]
+    WantedBy=multi-user.target
+    ```
+
+1. Reload the `systemd` configuration files to apply the new service file:
+
+    ```command
+    sudo systemctl daemon-reload
+    ```
+
+1. Using `systemctl`, start the `prometheus` service and enable it to automatically start after a system reboot:
+
+    ```command
+    sudo systemctl start prometheus
+    sudo systemctl enable prometheus
+    ```
+
+1. Verify that Prometheus is running:
+
+    ```command
+    systemctl status prometheus
+    ```
+
+    The output should display `active (running)`, confirming a successful setup:
+
+    ```output
+    ● prometheus.service - Prometheus Service
+         Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
+         Active: active (running) since Thu 2024-12-05 16:11:57 EST; 5s ago
+       Main PID: 1165 (prometheus)
+          Tasks: 9 (limit: 9444)
+         Memory: 16.2M (peak: 16.6M)
+            CPU: 77ms
+         CGroup: /system.slice/prometheus.service
+    ```
+
+    When done, press the Q key to exit the status output and return to the terminal prompt.
+
+1. Open a web browser and visit your instance's IP address on port `9090` (Prometheus's default port):
+
+    ```command
+    http://{{< placeholder "IP_ADDRESS" >}}:9090
+    ```
+
+    The Prometheus UI should appear:
+
+    ![Prometheus UI homepage at port :9090, displaying the query and status options.](prometheus-ui-overview.png)
+
+    {{< note >}}
+    Prometheus settings are configured in the `/etc/prometheus/prometheus.yml` file. This guide uses the default values. For production systems, consider enabling authentication and other security measures to protect your metrics.
+    {{< /note >}}
+
+## Install the Grafana Service
+
+Grafana provides an `apt` repository, reducing the number of steps needed to install and update it on Ubuntu.
+
+1. Install the necessary package to add new repositories:
+
+    ```command
+    sudo apt install software-properties-common -y
+    ```
+
+1. Import and add the public key for the Grafana repository:
+
+    ```command
+    wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
+    sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
+    ```
+
+1. 
Update the package index and install Grafana:
+
+    ```command
+    sudo apt update
+    sudo apt install grafana -y
+    ```
+
+1. The installation process already sets up the `systemd` configuration for Grafana. Start and enable the Grafana service:
+
+    ```command
+    sudo systemctl start grafana-server
+    sudo systemctl enable grafana-server
+    ```
+
+1. Run the following command to verify that Grafana is `active (running)`:
+
+    ```command
+    systemctl status grafana-server
+    ```
+
+    ```output
+    ● grafana-server.service - Grafana instance
+         Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: enabled)
+         Active: active (running) since Thu 2024-12-05 13:57:10 EST; 8s ago
+           Docs: http://docs.grafana.org
+       Main PID: 3434 (grafana)
+          Tasks: 14 (limit: 9444)
+         Memory: 71.4M (peak: 80.4M)
+            CPU: 2.971s
+         CGroup: /system.slice/grafana-server.service
+    ```
+
+### Connect Grafana to Prometheus
+
+1. Open a web browser and visit your instance's IP address on port `3000` (Grafana's default port) to access the Grafana web UI:
+
+    ```command
+    http://{{< placeholder "IP_ADDRESS" >}}:3000
+    ```
+
+1. Log in using the default credentials of `admin` for both the username and password:
+
+    ![Grafana login page showing fields for entering username and password.](grafana-login-page.png)
+
+1. After logging in, you are prompted to enter a secure replacement for the default password:
+
+    ![Grafana user interface prompting for a new password after the first login.](grafana-new-password-prompt.png)
+
+1. Add Prometheus as a data source by expanding the **Home** menu, navigating to the **Connections** entry, and clicking **Add new connection**:
+
+    ![Grafana home menu with the option to add a new connection under the Connections section.](grafana-add-new-connection.png)
+
+1. Search for and select **Prometheus**.
+
+1. Click **Add new data source**.
+
+    ![Grafana interface with Add New Data Source options, displaying Prometheus configuration fields.](grafana-add-datasource.png)
+
+1. In the **URL** field, enter `http://localhost:9090`.
+
+1. Click **Save & Test** to confirm the connection.
+
+    ![Grafana test result confirming successful connection to a Prometheus data source.](grafana-connection-test-success.png)
+
+    If successful, your Grafana installation is now connected to the Prometheus installation running on the same Linode.
+
+## Configure Example Flask Server
+
+This guide demonstrates the migration process using an example Flask app running on a separate instance from which metrics and logs can be collected.
+
+1. Log in to the instance running the example Flask application as a user with `sudo` privileges.
+
+1. Create a directory for the project named `example-flask-app` and navigate into it:
+
+    ```command
+    mkdir example-flask-app
+    cd example-flask-app
+    ```
+
+1. Using a text editor of your choice, create a file called `app.py`:
+
+    ```command
+    nano app.py
+    ```
+
+    Give it the following contents. 
Replace {{< placeholder "YOUR_PROJECT_ID" >}} with your actual project ID: + + ```file {title="app.py", lang="python" hl_lines="15"} + import json + import logging + import time + + from flask import Flask, request + from google.cloud import monitoring_v3 # Note: pip install google-cloud-monitoring + + logging.basicConfig(filename='flask-app.log', level=logging.INFO) + logger = logging.getLogger(__name__) + + app = Flask(__name__) + + # Google Cloud Monitoring setup + metric_client = monitoring_v3.MetricServiceClient() + project_id = '{{< placeholder "YOUR_PROJECT_ID" >}}' # replace with your project ID + project_name = f"projects/{project_id}" + + @app.before_request + def start_timer(): + request.start_time = time.time() + + @app.after_request + def send_latency_metric(response): + latency = time.time() - request.start_time + + # Send latency metric to Google Cloud Monitoring + series = monitoring_v3.TimeSeries() + series.metric.type = 'custom.googleapis.com/EndpointLatency' + series.resource.type = 'global' + series.metric.labels['endpoint'] = request.path + series.metric.labels['method'] = request.method + + point = monitoring_v3.Point() + now = time.time() + seconds = int(now) + nanos = int((now - seconds) * 10**9) + point.interval.end_time.seconds = seconds + point.interval.end_time.nanos = nanos + point.value.double_value = latency + + series.points.append(point) + metric_client.create_time_series(name=project_name, time_series=[series]) + + return response + + @app.route('/') + def hello_world(): + logger.info("A request was received at the root URL") + return {'message': 'Hello, World!'}, 200 + + if __name__ == '__main__': + app.run(host='0.0.0.0', port=8080) + ``` + + When done, save your changes, and close the text editor. + +1. Create a separate text file called `requirements.txt`: + + ```command + nano requirements.txt + ``` + + Provide it with the following basic dependencies for the Flask application to function, and save your changes: + + ```file {title="requirements.txt"} + Flask==3.0.3 + itsdangerous==2.2.0 + Jinja2==3.1.4 + MarkupSafe==2.1.5 + Werkzeug==3.0.4 + ``` + +1. A virtual environment is required to run `pip` commands in Ubuntu 24.04 LTS. Use the following command to install `python3.12-venv`: + + ```command + sudo apt install python3.12-venv + ``` + +1. Using the `venv` utility, create a virtual environment named `venv` within the `example-flask-app` directory: + + ```command + python3 -m venv venv + ``` + +1. Activate the `venv` virtual environment: + + ```command + source venv/bin/activate + ``` + +1. Use `pip` to install the example Flask application's dependencies from the `requirements.txt` file: + + ```command + pip install -r requirements.txt + ``` + +1. Also using `pip`, install `google-cloud-monitoring`, which is required for interfacing with GCP resources: + + ```command + pip install google-cloud-monitoring + ``` + +1. Exit the virtual environment: + + ```command + deactivate + ``` + +### Create a `systemd` Service File + +1. 
Create a `systemd` service file for the example Flask app: + + ```command + sudo nano /etc/systemd/system/flask-app.service + ``` + + Provide the file with the following content, replacing {{< placeholder "USERNAME" >}} with your actual `sudo` user: + + ```file {title="/etc/systemd/system/flask-app.service"} + [Unit] + Description=Flask Application Service + After=network.target + + [Service] + User={{< placeholder "USERNAME" >}} + WorkingDirectory=/home/{{< placeholder "USERNAME" >}}/example-flask-app + ExecStart=/home/{{< placeholder "USERNAME" >}}/example-flask-app/venv/bin/python /home/{{< placeholder "USERNAME" >}}/example-flask-app/app.py + Restart=always + + [Install] + WantedBy=multi-user.target + ``` + + Save your changes when complete. + +1. Reload the `systemd` configuration files to apply the new service file, then start and enable the service: + + ```command + sudo systemctl daemon-reload + sudo systemctl start flask-app + sudo systemctl enable flask-app + ``` + +1. Verify that the `flask-app` service is `active (running)`: + + ```command + systemctl status flask-app + ``` + + ```output + ● flask-app.service - Flask Application Service + Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled) + Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago + Main PID: 4413 (python) + Tasks: 1 (limit: 9444) + Memory: 20.3M (peak: 20.3M) + CPU: 196ms + CGroup: /system.slice/flask-app.service + ``` + + Once the Flask application is running, GCP Cloud Monitoring can monitor its data. + +1. Generate data by issuing an HTTP request using the following cURL command. Replace {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running: + + ```command + curl http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080 + ``` + + You should receive the following response: + + ```output + {"message": "Hello, World!"} + ``` + +## Migrate from GCP Cloud Monitoring to Prometheus and Grafana + +Migrating from GCP Cloud Monitoring to Prometheus and Grafana requires planning to ensure the continuity of your monitoring capabilities. Transitioning from GCP Cloud Monitoring provides greater control over data storage and handling while unlocking the advanced customization and visualization features offered by Prometheus and Grafana. + +### Assess Current Monitoring Requirements + +Begin by cataloging all metrics currently monitored in GCP Cloud Monitoring. So that you can recreate similar monitoring with Prometheus, identify common metrics for web applications, such as latency, request rates, CPU usage, and memory consumption. Remember to document existing alert configurations, as alerting strategies must also be ported to [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/). + +Using the example Python Flask application, GCP Cloud Monitoring collects key metrics such as API requests, latency, and application logs. This may vary depending on your application. Below are examples of metrics visualized in GCP Cloud Monitoring dashboards: + +- **API Requests Over Time**: This dashboard tracks the total number of API requests served by the application: + + ![GCP Cloud Monitoring dashboard showing total API requests over time.](gcp-api-requests-over-time.png) + +- **CPU Utilization**: This metric monitors the CPU usage of the underlying infrastructure without requiring additional configuration. 
+ + ![GCP Cloud Monitoring dashboard displaying CPU utilization over time.](gcp-cpu-utilization.png) + +- **API Request Latency**: This dashboard visualizes the amount of time it takes to serve API requests: + + ![GCP Cloud Monitoring dashboard illustrating API request latency over time.](gcp-api-request-latency.png) + +The metrics shown above are typically tracked in a web application. GCP Cloud Monitoring provides these metrics by default when deployed in a GCP Compute Engine, without the need to modify the application code. Documenting these existing metrics and alerts can help you configure equivalent monitoring using Prometheus and Grafana. + +### Export Existing Cloud Monitoring Logs and Metrics + +[GCP Cloud Logging](https://cloud.google.com/logging?hl=en) integrates with Cloud Monitoring and allows you to [create sinks that export logs to different destinations](https://cloud.google.com/logging/docs/export/configure_export_v2). Sinks can be configured to filter logs for a specific application, exporting only relevant entries. Below is an example sink that facilitates the export of logs from GCP: + +![The GCP Cloud Logging interface showing the configuration of a log export sink.](gcp-create-log-sink.png) + +The [Cloud Monitoring API](https://cloud.google.com/monitoring/api/v3) allows you to programmatically retrieve metric data. Once this data is retrieved, it can be stored locally or sent to another monitoring system. The [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus) includes an adapter to fetch GCP metrics directly. This avoids the need for manual exporting or scripts, providing real-time observability as if the metrics were local to Prometheus. + +GCP Cloud Monitoring has default data retention policies that may limit the availability of historical data. Ensure the exported data frequency meets system requirements, especially when using the API since data may need to be reformatted to match the destination’s schema. For example, some destinations may require data formatted as JSON, while others may need CSV. + +To avoid unexpected costs, review GCP’s billing policies. GCP may charge for API calls and data exports, especially when querying metrics at high frequency. + +### Expose Application Metrics to Prometheus + +Prometheus works differently from GCP Cloud Monitoring: instead of *pushing* data like GCP Cloud Monitoring, Prometheus *pulls* metrics from the monitored application. After assessing or exporting metrics as needed, modify the application to enable Prometheus metric scraping so that it collects the same metrics previously sent to GCP Cloud Monitoring. This process varies from application to application. + +For the example Flask application in this guide, the [`prometheus_flask_exporter` library](https://github.com/rycus86/prometheus_flask_exporter) is a standard library that can be used for instrumenting Flask applications to expose Prometheus metrics. + +1. Reactivate the `venv` virtual environment: + + ```command + source venv/bin/activate + ``` + +1. Use `pip` to install the `prometheus_client` and `prometheus_flask_exporter` libraries: + + ```command + pip install prometheus_client prometheus_flask_exporter + ``` + +1. Exit the virtual environment: + + ```command + deactivate + ``` + +1. 
Using a text editor of your choice, open the `app.py` file for the Flask application:
+
+    ```command
+    nano app.py
+    ```
+
+    Replace the file's current GCP Cloud Monitoring-specific contents with the Prometheus-specific code below:
+
+    ```file {title="app.py" lang="python"}
+    import logging
+    import random
+    import time
+
+    from flask import Flask
+    from prometheus_flask_exporter import PrometheusMetrics
+
+    logging.basicConfig(filename="flask-app.log", level=logging.INFO)
+    logger = logging.getLogger(__name__)
+
+    app = Flask(__name__)
+    metrics = PrometheusMetrics(app)
+
+    metrics.info("FlaskApp", "Application info", version="1.0.0")
+
+
+    @app.route("/")
+    def hello_world():
+        logger.info("A request was received at the root URL")
+        return {"message": "Hello, World!"}, 200
+
+
+    @app.route("/long-request")
+    def long_request():
+        n = random.randint(1, 5)
+        logger.info(
+            f"A request was received at the long-request URL. Slept for {n} seconds"
+        )
+        time.sleep(n)
+        return {"message": f"Long running request with {n=}"}, 200
+
+
+    if __name__ == "__main__":
+        app.run(host="0.0.0.0", port=8080)
+    ```
+
+    These lines use the `prometheus_flask_exporter` library to:
+
+    - Instrument the Flask app for Prometheus metrics.
+    - Expose default and application-specific metrics at the `/metrics` endpoint.
+    - Provide metadata such as version information via `metrics.info`.
+
+1. Save and close the file, then restart the `flask-app` service:
+
+    ```command
+    sudo systemctl restart flask-app
+    ```
+
+1. Verify that the `flask-app` service is `active (running)`:
+
+    ```command
+    systemctl status flask-app
+    ```
+
+    ```output
+    ● flask-app.service - Flask Application Service
+         Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled)
+         Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago
+       Main PID: 4413 (python)
+          Tasks: 1 (limit: 9444)
+         Memory: 20.3M (peak: 20.3M)
+            CPU: 196ms
+         CGroup: /system.slice/flask-app.service
+    ```
+
+1. Test to see if the Flask app is accessible by issuing the following cURL command. Replace {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running:
+
+    ```command
+    curl http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080
+    ```
+
+    You should receive the following response:
+
+    ```output
+    {"message": "Hello, World!"}
+    ```
+
+1. To view the metrics, open a web browser and visit the following URL:
+
+    ```command
+    http://{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080/metrics
+    ```
+
+    The metrics shown include `http_request_duration_seconds` (request latency) and `http_requests_total` (total number of requests).
+
+### Configure Prometheus to Ingest Application Metrics
+
+1. Log back in to the Prometheus and Grafana instance. Using a text editor, open and modify the Prometheus configuration at `/etc/prometheus/prometheus.yml` to include the Flask application as a scrape target:
+
+    ```command
+    sudo nano /etc/prometheus/prometheus.yml
+    ```
+
+    Append the following content to the `scrape_configs` section of the file, replacing {{< placeholder "FLASK_APP_IP_ADDRESS" >}} with the IP address of the instance where the Flask app is running:
+
+    ```file {title="/etc/prometheus/prometheus.yml"}
+      - job_name: 'flask_app'
+        static_configs:
+          - targets: ['{{< placeholder "FLASK_APP_IP_ADDRESS" >}}:8080']
+    ```
+
+    This configuration tells Prometheus to scrape metrics from the Flask application running on port `8080`.
+
+1. 
Save the file, and restart Prometheus to apply the changes: + + ```command + sudo systemctl restart prometheus + ``` + +1. To verify that Prometheus is successfully scraping the Flask app, open a web browser and navigate to the Prometheus user interface on port 9090. This is the default port used for Prometheus. Replace {{< placeholder "INSTANCE_IP_ADDRESS" >}} with the IP of your instance: + + ```command + http://{{< placeholder "INSTANCE_IP_ADDRESS" >}}:9090 + ``` + +1. In the Prometheus UI click the **Status** tab and select **Targets**. You should see the Flask application service listed as a target with a status of `UP`, indicating that Prometheus is successfully scraping metrics from the application. + + ![Prometheus UI showing the status and targets of monitored services.](prometheus-ui-targets.png) + +### Create a Grafana Dashboard with Application Metrics + +Grafana serves as the visualization layer, providing an interface for creating dashboards from Prometheus metrics. + +1. Open a web browser and visit the following URL to access the Grafana UI on port 3000 (the default port for Grafana). Replace {{< placeholder "INSTANCE_IP_ADDRESS" >}} with the IP of your instance: + + ```command + http://{{< placeholder "INSTANCE_IP_ADDRESS" >}}:3000 + ``` + +1. Navigate to the **Dashboards** page: + + ![Grafana home menu with the Dashboards section selected.](grafana-home-menu-dashboards.png) + +1. Create a new dashboard in Grafana by clicking **Create dashboard**: + + ![Grafana Dashboards page with an option to create a new dashboard.](grafana-dashboards-overview.png) + +1. Click **Add visualization**: + + ![Grafana interface showing the Add Visualization dialog for creating a new graph.](grafana-add-visualization.png) + +1. In the resulting dialog, select the **prometheus** data source: + + ![Grafana data source selection dialog with Prometheus highlighted.](grafana-prometheus-datasource.png) + +1. To duplicate the GCP Cloud Monitoring metrics for the Flask application, first click on the **Code** tab in the right-hand side of the panel editor: + + ![Grafana panel editor with the Code tab selected for entering a PromQL query.](grafana-panel-editor-query-code.png) + +1. Input the following PromQL query to calculate the average latency for an endpoint: + + ```command + flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"} / + flask_http_request_duration_seconds_count{method="GET",path="/",status="200"} + ``` + +1. After entering the formula, click **Run queries** to execute the PromQL query. The chart should update with data pulled from Prometheus: + + ![Grafana dashboard displaying a latency graph for a Flask application, based on Prometheus data.](grafana-latency-dashboard.png) + + This visualization replicates GCP Cloud Monitoring's latency metrics, detailing the average latency over time for a specific endpoint. Prometheus also provides default labels such as method, path, and status codes, for additional granularity in analysis. + +## Additional Considerations and Concerns + +### Cost Management + +GCP Cloud Monitoring incurs [costs](https://cloud.google.com/stackdriver/pricing) for log storage and retention, data ingestion, API calls, and alerting policies. Migrating to Prometheus and Grafana eliminates these charges but introduces infrastructure costs for compute, storage, maintenance, and network traffic. Additionally, since Prometheus is primarily designed for short-term data storage, setting up a long-term storage solution may also increase costs. 
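+
+As one concrete example of controlling storage costs, local retention can be capped by adding a retention flag to the `ExecStart` line of the Prometheus `systemd` unit created earlier in this guide. The 30-day value below is an illustrative assumption, not a recommendation; tune it to your own visibility and budget requirements:
+
+```file {title="/etc/systemd/system/prometheus.service"}
+ExecStart=/usr/local/bin/prometheus \
+    --config.file=/etc/prometheus/prometheus.yml \
+    --storage.tsdb.path=/var/lib/prometheus \
+    --storage.tsdb.retention.time=30d \
+    --web.console.templates=/etc/prometheus/consoles \
+    --web.console.libraries=/etc/prometheus/console_libraries
+```
+
+After editing the unit file, run `sudo systemctl daemon-reload` followed by `sudo systemctl restart prometheus` for the change to take effect.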
+
+**Recommendation**:
+
+- Estimate infrastructure costs for Prometheus and Grafana by assessing current GCP Cloud Monitoring data volume and access usage.
+- Access the [Google Cloud Billing](https://console.cloud.google.com/billing) report to determine a baseline for costs related to GCP Cloud Monitoring and Cloud Logging.
+- Use Prometheus’s default short-term storage for real-time data, and configure a long-term storage solution for essential data to optimize costs.
+- Employ Grafana’s alerting and dashboards strategically to reduce high-frequency scrapes and unnecessary data retention.
+- Regularly review and refine retention policies and scraping intervals to balance cost against visibility needs.
+
+### Data Consistency and Accuracy
+
+GCP Cloud Monitoring automates metric collection with built-in aggregation, whereas Prometheus relies on manual configuration through exporters and application instrumentation. Prometheus stores raw data with high granularity, but does not provide the same level of aggregated historical data as GCP Cloud Monitoring. This may lead to gaps in insights if retention isn’t properly managed.
+
+**Recommendation**:
+
+- Set up Prometheus exporters such as the [Node Exporter](https://prometheus.io/docs/guides/node-exporter/) (for host metrics) or [custom exporters](https://prometheus.io/docs/instrumenting/writing_exporters/) (for application metrics).
+- Configure scrape intervals to capture data at a consistent, predictable frequency.
+- Verify that custom instrumentation is accurate for critical metrics such as latency, requests, and resource usage.
+- Use the [remote-write capability](https://prometheus.io/docs/specs/remote_write_spec/) from Prometheus to write data to a remote storage backend like [Thanos](https://thanos.io/) or [Cortex](https://cortexmetrics.io/) for historical data retention. This ensures that older data remains accessible and aggregated at a lower resolution, which is similar to GCP's approach to historical data.
+
+### GCP Cloud Monitoring Aggregated Data Versus Prometheus Raw Data
+
+GCP Cloud Monitoring aggregates data automatically and can provide a straightforward approach to historical trend analysis. Prometheus captures high-resolution, raw data, which can require custom queries to derive similar insights.
+
+**Recommendation**:
+
+- Leverage Grafana’s dashboards to create aggregated views of Prometheus metrics.
+- Apply queries to aggregate data over larger time windows to create a summarized view similar to GCP Cloud Monitoring.
+- Use Prometheus [query functions](https://prometheus.io/docs/prometheus/latest/querying/functions/) such as [`rate`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate), [`avg_over_time`](https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_time), and [`sum_over_time`](https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_time) to replicate GCP Cloud Monitoring's aggregated trends.
+
+### Alert System Migration
+
+GCP Cloud Monitoring alerts are configured with thresholds and conditions that must be translated into query-based alert rules in Prometheus.
+
+**Recommendation**:
+
+- Audit existing GCP Cloud Monitoring alerts and replicate them as Prometheus alerting rules, routed through Alertmanager.
+- Refine alert thresholds based on the type and granularity of data collected by Prometheus.
+- Integrate Alertmanager with any existing notification systems (e.g. email, Slack, etc.) to maintain consistency in how teams are alerted to critical events.
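+
+As a sketch of what a migrated alert might look like, the following Prometheus rule file fires when the example Flask application's average request latency stays above 500ms for five minutes. The file name, threshold, and duration are illustrative assumptions; the expression reuses the same `flask_http_request_duration_seconds` counters queried in the Grafana dashboard above:
+
+```file {title="/etc/prometheus/rules/flask-app-alerts.yml" lang="yaml"}
+groups:
+  - name: flask-app-alerts
+    rules:
+      - alert: FlaskHighRequestLatency
+        # Average latency over the last 5 minutes for the root endpoint.
+        expr: |
+          rate(flask_http_request_duration_seconds_sum{path="/"}[5m])
+            / rate(flask_http_request_duration_seconds_count{path="/"}[5m]) > 0.5
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Average request latency on / has exceeded 500ms for 5 minutes"
+```
+
+Reference the rule file from the `rule_files` section of `/etc/prometheus/prometheus.yml`, then point Prometheus at your Alertmanager instance so that alert notifications continue to reach the same channels used with GCP Cloud Monitoring.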
+ +### Security and Access Controls + +GCP Cloud Monitoring integrates with GCP’s Identity and Access Management (IAM) system for Role-Based Access Control (RBAC). This helps manage who can view, edit, or delete logs and metrics. Prometheus and Grafana require manual configuration of security and access controls. + +Securing Prometheus and Grafana involves setting up user authentication (e.g. OAuth, LDAP, etc.) and ensuring metrics and dashboards are only accessible to authorized personnel. Additionally, data in transit should be encrypted using TLS to maintain security. + +**Recommendation**: + +- Configure Grafana with an RBAC policy and integrate it with an authentication system like OAuth or LDAP. +- Enable TLS for Prometheus to secure data in transit. + +### Separate Log and Metric Responsibilities + +Prometheus is primarily designed for metrics collection and does not include built-in capabilities for managing logs. Since GCP Cloud Monitoring natively combines logs and metrics, migration requires decoupling those functions. + +**Recommendation**: + +- Use a specialized log aggregation solution alongside Prometheus and Grafana for collecting, aggregating, and querying logs: + - [**Grafana Loki**](https://grafana.com/grafana/loki/) is designed to integrate with Grafana. It provides log querying capabilities within Grafana's existing interface, giving a unified view of metrics and logs in a single dashboard. + - [**Fluentd**](https://www.fluentd.org/) is a log aggregator that can forward logs to multiple destinations, including object storage for long-term retention, and can work with both Loki and ELK. \ No newline at end of file diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png new file mode 100644 index 00000000000..3752662dfc0 Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/prometheus-ui-overview.png differ diff --git a/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png new file mode 100644 index 00000000000..0ad9028307d Binary files /dev/null and b/docs/guides/uptime/monitoring/migrating-from-gcp-cloud-monitoring-to-prometheus-and-grafana-on-akamai/prometheus-ui-targets.png differ diff --git a/package-lock.json b/package-lock.json index 786e4fadba1..ac59311000b 100644 --- a/package-lock.json +++ b/package-lock.json @@ -2869,9 +2869,9 @@ "dev": true }, "node_modules/cross-spawn": { - "version": "7.0.3", - "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.3.tgz", - "integrity": "sha512-iRDPJKUPVEND7dHPO8rkbOnPpyDygcDFtWjpeWNCgy8WP2rXcxXL8TskReQl6OrB2G7+UJrags1q15Fudc7G6w==", + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", "dev": true, "dependencies": { "path-key": "^3.1.0", @@ -4539,9 +4539,9 @@ } }, "node_modules/micromatch": { - "version": "4.0.7", - "resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.7.tgz", - "integrity": 
"sha512-LPP/3KorzCwBxfeUuZmaR6bG2kdeHSbe0P2tY3FLRU4vYrjYz5hI4QZwV0njUx3jeuKe67YukQ1LSPZBKDqO/Q==", + "version": "4.0.8", + "resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.8.tgz", + "integrity": "sha512-PXwfBhYu0hBCPw8Dn0E+WDYb7af3dSLVWKi3HGv84IdF4TyFoC0ysxFd0Goxw7nSv4T/PzEJQxsYsEiFCKo2BA==", "dev": true, "dependencies": { "braces": "^3.0.3", @@ -8401,9 +8401,9 @@ "dev": true }, "cross-spawn": { - "version": "7.0.3", - "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.3.tgz", - "integrity": "sha512-iRDPJKUPVEND7dHPO8rkbOnPpyDygcDFtWjpeWNCgy8WP2rXcxXL8TskReQl6OrB2G7+UJrags1q15Fudc7G6w==", + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", "dev": true, "requires": { "path-key": "^3.1.0", @@ -9597,9 +9597,9 @@ "dev": true }, "micromatch": { - "version": "4.0.7", - "resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.7.tgz", - "integrity": "sha512-LPP/3KorzCwBxfeUuZmaR6bG2kdeHSbe0P2tY3FLRU4vYrjYz5hI4QZwV0njUx3jeuKe67YukQ1LSPZBKDqO/Q==", + "version": "4.0.8", + "resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.8.tgz", + "integrity": "sha512-PXwfBhYu0hBCPw8Dn0E+WDYb7af3dSLVWKi3HGv84IdF4TyFoC0ysxFd0Goxw7nSv4T/PzEJQxsYsEiFCKo2BA==", "dev": true, "requires": { "braces": "^3.0.3",