This Helm chart deploys the TensorFusion Remote/Local vGPU Benchmark application, which includes a deployment for running the benchmark tests and a cronjob for automated testing.
- Kubernetes 1.19+
- Helm 3.2.0+
- PV provisioner support in the underlying infrastructure
- A GPU node with NVIDIA drivers installed
To install the chart with the release name my-release
:
helm install my-release ./helm/torchbench
The command deploys the benchmark application on the Kubernetes cluster with default configuration.
The following table lists the configurable parameters of the chart and their default values.
Parameter | Description | Default |
---|---|---|
replicaCount |
Number of replicas | 1 |
image.repository |
Image repository | crpi-wpzfqfci37r0ad3n.cn-hangzhou.personal.cr.aliyuncs.com/tensorfusionrobin/tensorfusionrobin |
image.tag |
Image tag | latest |
image.pullPolicy |
Image pull policy | Always |
serviceAccount.create |
Create service account | true |
serviceAccount.name |
Service account name | cronjob-sa |
podAnnotations |
Pod annotations | See values.yaml |
podLabels |
Pod labels | See values.yaml |
resources |
Pod resource requests and limits | See values.yaml |
nodeSelector |
Node selector | kubernetes.io/hostname: gpu-2 |
cronjob.schedule |
Cronjob schedule | 0 0 * * * |
cronjob.concurrencyPolicy |
Cronjob concurrency policy | Allow |
cronjob.successfulJobsHistoryLimit |
Number of successful jobs to keep | 3 |
cronjob.failedJobsHistoryLimit |
Number of failed jobs to keep | 1 |
The benchmark will run automatically according to the cronjob schedule. You can also manually trigger a benchmark run by:
- Finding the cronjob:
kubectl get cronjob
- Creating a job from the cronjob:
kubectl create job --from=cronjob/my-release-torchbench-test-runner manual-run
To view the benchmark results:
kubectl logs -l app=my-release-torchbench
To customize the configuration, create a custom values file:
helm install my-release ./helm/torchbench -f custom-values.yaml
To uninstall/delete the deployment:
helm uninstall my-release
If you encounter any issues:
- Check the pod status:
kubectl get pods -l app=my-release-torchbench
- Check the pod logs:
kubectl logs -l app=my-release-torchbench
- Check the cronjob status:
kubectl get cronjob
kubectl get jobs
- Check the service account:
kubectl get serviceaccount