From 10afecd0c97b414d1fe740201bb0b0d416e9c877 Mon Sep 17 00:00:00 2001 From: helen frank Date: Tue, 17 Jun 2025 17:24:39 +0800 Subject: [PATCH] docs: add log_collection_for_troubleshooting Signed-off-by: helen frank --- src/app/_meta.global.ts | 5 +++ .../log_collection_for_troubleshooting.mdx | 34 +++++++++++++++++++ 2 files changed, 39 insertions(+) create mode 100644 src/content/guide/troubleshooting/log_collection_for_troubleshooting.mdx diff --git a/src/app/_meta.global.ts b/src/app/_meta.global.ts index 2178281..b9fd432 100644 --- a/src/app/_meta.global.ts +++ b/src/app/_meta.global.ts @@ -47,6 +47,11 @@ export default { node_pod_evictor: {} } }, + troubleshooting: { + items: { + log_collection_for_troubleshooting: {} + } + }, tips: { items: { ecr_auto_create: {}, diff --git a/src/content/guide/troubleshooting/log_collection_for_troubleshooting.mdx b/src/content/guide/troubleshooting/log_collection_for_troubleshooting.mdx new file mode 100644 index 0000000..50806ee --- /dev/null +++ b/src/content/guide/troubleshooting/log_collection_for_troubleshooting.mdx @@ -0,0 +1,34 @@ +--- +title: Log Collection for Troubleshooting +--- + +# Log Collection for Troubleshooting +This document provides a procedure to collect CloudPilot AI component logs from a Kubernetes cluster. The logs contain detailed operational information for troubleshooting CloudPilot AI-specific issues, such as why nodes are provisioned or de-provisioned. These logs do not include any sensitive system information. + +## Export Collected Log Steps + +1. **Connect to the target Kubernetes cluster** with administrative privileges. +2. **Export collected logs for sharing**: + ```bash + kubectl -n cloudpilot port-forward svc/cloudpilot-workload-log-collector-svc 8080:8080 & PID=$!; timeout=10; found=0; while [ $timeout -gt 0 ]; do if nc -z -w 1 localhost 8080 2>/dev/null; then found=1; break; fi; sleep 1; ((timeout--)); done; [ $found -eq 1 ] && curl -s http://localhost:8080/logs > "logs_$(date +%Y%m%d%H%M%S).tgz" || echo "Timeout"; kill $PID 2>/dev/null; wait $PID 2>/dev/null + ``` + **Command breakdown**: + - Establishes a port-forward tunnel to the `cloudpilot-workload-log-collector-svc` service. + - Waits up to 10 seconds for the service endpoint to become reachable. + - Uses `curl` to fetch logs via the exposed HTTP endpoint. + - Saves logs as a timestamped `.tgz` file (e.g., `logs_20231015143045.tgz`). + - Gracefully terminates the background port-forward process. + + After running this command, you will get one tgz file containing the detailed logs (e.g., logs_20231015143045.tgz). You can now share it with the CloudPilot AI Support team for any further questions. + +If you encounter issues while exporting logs, please check the following: +- Your Kubernetes context is set to the correct cluster. +- The `cloudpilot-workload-log-collector-svc` service exists (`kubectl -n cloudpilot get svc`). +- Your permissions allow port-forwarding and pod access in the `cloudpilot` namespace. + +## Security Considerations + +The logs contain operational metadata (e.g., pod statuses, API call traces) but exclude: +- SSensitive workload data, such as pod environment variables +- Secrets or credentials +- Network traffic payloads