
docs: add log_collection_for_troubleshooting #89


Merged 1 commit on Jul 8, 2025
5 changes: 5 additions & 0 deletions src/app/_meta.global.ts
@@ -47,6 +47,11 @@ export default {
       node_pod_evictor: {}
     }
   },
+  troubleshooting: {
+    items: {
+      log_collection_for_troubleshooting: {}
+    }
+  },
   tips: {
     items: {
       ecr_auto_create: {},
@@ -0,0 +1,34 @@
---
title: Log Collection for Troubleshooting
---

# Log Collection for Troubleshooting
This document describes how to collect CloudPilot AI component logs from a Kubernetes cluster. The logs contain detailed operational information for troubleshooting CloudPilot AI-specific issues, such as why nodes were provisioned or de-provisioned. They do not include any sensitive system information.

## Steps to Export Collected Logs

1. **Connect to the target Kubernetes cluster** with administrative privileges.
2. **Export collected logs for sharing**:
```bash
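# Forward local port 8080 to the log collector service in the background.
kubectl -n cloudpilot port-forward svc/cloudpilot-workload-log-collector-svc 8080:8080 &
PID=$!

# Wait up to 10 seconds for the forwarded port to accept connections.
timeout=10; found=0
while [ "$timeout" -gt 0 ]; do
  if nc -z -w 1 localhost 8080 2>/dev/null; then found=1; break; fi
  sleep 1
  timeout=$((timeout - 1))
done

# Download the archive, or report why nothing usable was saved.
if [ "$found" -eq 1 ]; then
  curl -sf http://localhost:8080/logs > "logs_$(date +%Y%m%d%H%M%S).tgz" || echo "Download failed"
else
  echo "Timeout"
fi

# Stop the background port-forward.
kill "$PID" 2>/dev/null; wait "$PID" 2>/dev/null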
```
**Command breakdown**:
- Establishes a port-forward tunnel to the `cloudpilot-workload-log-collector-svc` service.
- Waits up to 10 seconds for the service endpoint to become reachable.
- Uses `curl` to fetch logs via the exposed HTTP endpoint.
- Saves logs as a timestamped `.tgz` file (e.g., `logs_20231015143045.tgz`).
- Gracefully terminates the background port-forward process.

After the command completes, you will have a single `.tgz` file containing the detailed logs (e.g., `logs_20231015143045.tgz`). You can share this file with the CloudPilot AI Support team, along with any further questions.
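
Before sharing the archive, it can be worth confirming that the download is a readable archive rather than an empty or truncated file. The following is a minimal sketch, assuming the `.tgz` file is a standard gzip-compressed tar archive; `check_log_archive` is a hypothetical helper name and is not part of CloudPilot AI.

```bash
# Hypothetical helper: verify that a downloaded log archive is readable.
check_log_archive() {
  archive="$1"
  # Reject a missing or zero-byte file (for example, after a failed download).
  if [ ! -s "$archive" ]; then
    echo "missing or empty: $archive"
    return 1
  fi
  # tar -tzf lists entries without extracting; it fails on a corrupt archive.
  if tar -tzf "$archive" >/dev/null 2>&1; then
    echo "ok: $archive"
  else
    echo "not a valid .tgz: $archive"
    return 1
  fi
}
```

For example, `check_log_archive logs_20231015143045.tgz` prints a one-line result and returns a non-zero status if the file is unusable.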

If you encounter issues while exporting logs, please check the following:
- Your Kubernetes context is set to the correct cluster.
- The `cloudpilot-workload-log-collector-svc` service exists (`kubectl -n cloudpilot get svc`).
- Your permissions allow port-forwarding and pod access in the `cloudpilot` namespace.
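
The checks above can be run in one pass with a small script. This is only a sketch: `preflight_log_export` is a hypothetical helper name, and the permission check assumes that the ability to create the `pods/portforward` subresource is what governs port-forwarding in your cluster's RBAC setup.

```bash
# Hypothetical helper: run the three checks above and report each result.
preflight_log_export() {
  ns="cloudpilot"
  svc="cloudpilot-workload-log-collector-svc"
  rc=0

  # 1. Show which cluster context kubectl is currently pointing at.
  echo "context: $(kubectl config current-context)"

  # 2. Confirm the log collector service exists in the namespace.
  if kubectl -n "$ns" get svc "$svc" >/dev/null 2>&1; then
    echo "service: found"
  else
    echo "service: missing"; rc=1
  fi

  # 3. Ask the API server whether port-forwarding to pods is permitted.
  if kubectl -n "$ns" auth can-i create pods --subresource=portforward >/dev/null 2>&1; then
    echo "port-forward: allowed"
  else
    echo "port-forward: denied"; rc=1
  fi
  return "$rc"
}
```

Running `preflight_log_export` prints one line per check and returns a non-zero status if the service is missing or port-forwarding is denied. The context line is informational, so confirm by eye that it names the intended cluster.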

## Security Considerations

The logs contain operational metadata (e.g., pod statuses, API call traces) but exclude:
- Sensitive workload data, such as pod environment variables
- Secrets or credentials
- Network traffic payloads