diff --git a/cmd/xpumanager_sidecar/README.md b/cmd/xpumanager_sidecar/README.md
index fbc4b37c0..b5ecf7aef 100644
--- a/cmd/xpumanager_sidecar/README.md
+++ b/cmd/xpumanager_sidecar/README.md
@@ -23,7 +23,7 @@ Intel GPUs can be interconnected via an XeLink. In some workloads it is benefici
 | -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
 | -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels. i.e. **gpu.intel.com**/xe-links |
 | -allow-subdeviceless-links | bool | false | Include xelinks also for devices that do not have subdevices |
-| -use-https | bool | false | Use HTTPS protocol when connecting to XPU Manager |
+| -cert | string | "" | Use HTTPS and verify server's endpoint |
 
 The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.
 
@@ -50,7 +50,7 @@ See [the development guide](../../DEVEL.md) for details if you want to deploy a
 Install XPU Manager daemonset with the XeLink sidecar
 
 ```bash
-$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref='
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http?ref='
 ```
 
 Please see XPU Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).
@@ -60,7 +60,7 @@ Please see XPU Manager Kubernetes files for additional info on [installation](ht
 Use patch to add sidecar into the XPU Manager daemonset.
 
 ```bash
-$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref='
+$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml?ref='
 ```
 
 NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
@@ -76,7 +76,25 @@ master,0.0-1.0_0.1-1.1
 
 ### Use HTTPS with XPU Manager
 
-XPU Manager can be configured to use HTTPS on the metrics interface. For the gunicorn sidecar, cert and key files have to be added to the command:
+There is an alternative deployment that uses HTTPS instead of HTTP. The reference deployment requires `cert-manager` to provide a certificate for TLS. To deploy:
+
+```bash
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/cert-manager?ref='
+```
+
+The deployment requests a certificate and key from `cert-manager`. They are then provided to the gunicorn container as secrets and are used in the HTTPS interface. The sidecar container uses the same certificate to verify the server.
+
+> *NOTE*: The HTTPS deployment uses self-signed certificates. For production use, the certificates should be properly set up.
+
+
+Enabling HTTPS manually
+
+If one doesn't want to use `cert-manager`, the same can be achieved manually by creating certificates with openssl and then adding it to the deployment. The steps are roughly:
+1) Create a certificate with [openssl](https://www.linode.com/docs/guides/create-a-self-signed-tls-certificate/)
+1) Create a secret from the [certificate & key](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/).
+1) Change the deployment:
+
+* Add certificate and key to gunicorn container:
 ```
 - command:
   - gunicorn
@@ -87,8 +105,7 @@ XPU Manager can be configured to use HTTPS on the metrics interface. For the gun
   - xpum_rest_main:main()
 ```
 
-The gunicorn container will also need the tls.crt and tls.key files within the container. For example:
-
+* Add secret mounting to the Pod:
 ```
 containers:
 - name: python-exporter
@@ -101,44 +118,19 @@ The gunicorn container will also need the tls.crt and tls.key files within the c
   secret:
     defaultMode: 420
     secretName: xpum-server-cert
-```
-
-In this case, the secret providing the certificate and key is called `xpum-server-cert`.
-
-The certificate and key can be [added manually to a secret](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/). Another way to achieve a secret is to leverage [cert-manager](https://cert-manager.io/).
-
-
-Example for the Cert-manager objects
-
-Cert-manager will create a self-signed certificate and the private key, and store them into a secret called `xpum-server-cert`.
+  ```
+* Add use-https and cert to sidecar
 ```
-apiVersion: cert-manager.io/v1
-kind: Issuer
-metadata:
-  name: selfsigned-issuer
-spec:
-  selfSigned: {}
----
-apiVersion: cert-manager.io/v1
-kind: Certificate
-metadata:
-  name: serving-cert
-spec:
-  dnsNames:
-  - xpum.svc
-  - xpum.svc.cluster.local
-  issuerRef:
-    kind: Issuer
-    name: selfsigned-issuer
-  secretName: xpum-server-cert
+  name: xelink-sidecar
+  volumeMounts:
+  - mountPath: /certs
+    name: certs
+    readOnly: true
+  args:
+...
+  - --cert=/certs/tls.crt
+...
 ```
-
-For the XPU Manager sidecar, `use-https` has to be added to the arguments. Then the sidecar will leverage HTTPS with the connection to the metrics interface.
-```
-  args:
-    - -v=2
-    - -use-https
-```
diff --git a/cmd/xpumanager_sidecar/main.go b/cmd/xpumanager_sidecar/main.go
index 98a82e806..4366d0f1c 100644
--- a/cmd/xpumanager_sidecar/main.go
+++ b/cmd/xpumanager_sidecar/main.go
@@ -19,6 +19,7 @@ import (
 	"bytes"
 	"context"
 	"crypto/tls"
+	"crypto/x509"
 	"flag"
 	"fmt"
 	"io"
@@ -61,12 +62,12 @@ type xpuManagerSidecar struct {
 	dstFilePath             string
 	labelNamespace          string
 	url                     string
+	certFile                string
 	interval                uint64
 	startDelay              uint64
 	xpumPort                uint64
 	laneCount               uint64
 	allowSubdevicelessLinks bool
-	useHTTPS                bool
 }
 
 func (e *invalidEntryErr) Error() string {
@@ -78,12 +79,30 @@ func (xms *xpuManagerSidecar) getMetricsDataFromXPUM() []byte {
 		Timeout: 5 * time.Second,
 	}
 
-	if xms.useHTTPS {
-		customTransport := http.DefaultTransport.(*http.Transport).Clone()
-		//#nosec
-		customTransport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
+	if len(xms.certFile) > 0 {
+		cert, err := os.ReadFile(xms.certFile)
+		if err != nil {
+			klog.Warning("Failed to read cert: ", err)
+
+			return nil
+		}
 
-		client.Transport = customTransport
+		certPool := x509.NewCertPool()
+		if !certPool.AppendCertsFromPEM(cert) {
+			klog.Warning("Adding server cert to pool failed")
+
+			return nil
+		}
+
+		tr := &http.Transport{
+			TLSClientConfig: &tls.Config{
+				MinVersion: tls.VersionTLS12,
+				RootCAs:    certPool,
+				ServerName: "127.0.0.1",
+			},
+		}
+
+		client.Transport = tr
 	}
 
 	ctx := context.Background()
@@ -380,7 +399,7 @@ func main() {
 	flag.Uint64Var(&xms.laneCount, "lane-count", 4, "minimum lane count for xelink")
 	flag.StringVar(&xms.labelNamespace, "label-namespace", "gpu.intel.com", "namespace for the labels")
 	flag.BoolVar(&xms.allowSubdevicelessLinks, "allow-subdeviceless-links", false, "allow xelinks that are not tied to subdevices (=1 tile GPUs)")
-	flag.BoolVar(&xms.useHTTPS, "use-https", false, "Use HTTPS protocol to connect to xpumanager")
+	flag.StringVar(&xms.certFile, "cert", "", "Use HTTPS and verify server's endpoint")
 
 	klog.InitFlags(nil)
 	flag.Parse()
@@ -390,7 +409,8 @@ func main() {
 	}
 
 	protocol := "http"
-	if xms.useHTTPS {
+
+	if len(xms.certFile) > 0 {
 		protocol = "https"
 	}
 
diff --git a/deployments/xpumanager_sidecar/base/kustomization.yaml b/deployments/xpumanager_sidecar/base/kustomization.yaml
new file mode 100644
index 000000000..7a45ac242
--- /dev/null
+++ b/deployments/xpumanager_sidecar/base/kustomization.yaml
@@ -0,0 +1,2 @@
+resources:
+- https://github.com/intel/xpumanager/deployment/kubernetes/daemonset/base/?ref=V1.2.38
diff --git a/deployments/xpumanager_sidecar/kustomization.yaml b/deployments/xpumanager_sidecar/kustomization.yaml
deleted file mode 100644
index a72b9631c..000000000
--- a/deployments/xpumanager_sidecar/kustomization.yaml
+++ /dev/null
@@ -1,7 +0,0 @@
-resources:
-- https://github.com/intel/xpumanager/deployment/kubernetes/daemonset/base/?ref=V1.2.29
-namespace: monitoring
-apiVersion: kustomize.config.k8s.io/v1beta1
-kind: Kustomization
-patches:
-- path: kustom/kustom_xpumanager.yaml
diff --git a/deployments/xpumanager_sidecar/overlays/cert-manager/certs.yaml b/deployments/xpumanager_sidecar/overlays/cert-manager/certs.yaml
new file mode 100644
index 000000000..79650f9cb
--- /dev/null
+++ b/deployments/xpumanager_sidecar/overlays/cert-manager/certs.yaml
@@ -0,0 +1,20 @@
+apiVersion: cert-manager.io/v1
+kind: Issuer
+metadata:
+  name: selfsigned-issuer
+spec:
+  selfSigned: {}
+---
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  name: serving-cert
+spec:
+  ipAddresses:
+  - "127.0.0.1"
+  privateKey:
+    rotationPolicy: Always
+  issuerRef:
+    kind: Issuer
+    name: selfsigned-issuer
+  secretName: xpum-server-cert
diff --git a/deployments/xpumanager_sidecar/overlays/cert-manager/kustomization.yaml b/deployments/xpumanager_sidecar/overlays/cert-manager/kustomization.yaml
new file mode 100644
index 000000000..d482aadc5
--- /dev/null
+++ b/deployments/xpumanager_sidecar/overlays/cert-manager/kustomization.yaml
@@ -0,0 +1,8 @@
+resources:
+- ../../base
+- certs.yaml
+namespace: monitoring
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+patches:
+- path: xpumanager.yaml
diff --git a/deployments/xpumanager_sidecar/overlays/cert-manager/xpumanager.yaml b/deployments/xpumanager_sidecar/overlays/cert-manager/xpumanager.yaml
new file mode 100644
index 000000000..54e83771e
--- /dev/null
+++ b/deployments/xpumanager_sidecar/overlays/cert-manager/xpumanager.yaml
@@ -0,0 +1,60 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  labels:
+    app: intel-xpumanager
+  name: intel-xpumanager
+spec:
+  template:
+    spec:
+      volumes:
+      - name: features-d
+        hostPath:
+          path: "/etc/kubernetes/node-feature-discovery/features.d/"
+      - name: xpum-cert
+        secret:
+          secretName: xpum-server-cert
+      containers:
+      - name: python-exporter
+        volumeMounts:
+        - name: xpum-cert
+          mountPath: "/cert"
+        command:
+        - gunicorn
+        - --bind
+        - 0.0.0.0:29999
+        - --worker-connections
+        - "64"
+        - --worker-class
+        - gthread
+        - --workers
+        - "1"
+        - --threads
+        - "4"
+        - --keyfile=/cert/tls.key
+        - --certfile=/cert/tls.crt
+        - xpum_rest_main:main()
+        startupProbe:
+          httpGet:
+            scheme: HTTPS
+        livenessProbe:
+          httpGet:
+            scheme: HTTPS
+      - name: xelink-sidecar
+        image: intel/intel-xpumanager-sidecar:devel
+        imagePullPolicy: IfNotPresent
+        args:
+          - -v=2
+          - --cert=/cert/tls.crt
+        volumeMounts:
+        - name: features-d
+          mountPath: "/etc/kubernetes/node-feature-discovery/features.d/"
+        - name: xpum-cert
+          mountPath: "/cert"
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+            - ALL
+          readOnlyRootFilesystem: true
+          runAsUser: 0
diff --git a/deployments/xpumanager_sidecar/overlays/http/kustomization.yaml b/deployments/xpumanager_sidecar/overlays/http/kustomization.yaml
new file mode 100644
index 000000000..0d5ac1bb2
--- /dev/null
+++ b/deployments/xpumanager_sidecar/overlays/http/kustomization.yaml
@@ -0,0 +1,7 @@
+resources:
+- ../../base
+namespace: monitoring
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+patches:
+- path: xpumanager.yaml
diff --git a/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml b/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml
similarity index 94%
rename from deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml
rename to deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml
index 3ce726271..ce326f147 100644
--- a/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml
+++ b/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml
@@ -14,7 +14,7 @@ spec:
       containers:
       - name: xelink-sidecar
         image: intel/intel-xpumanager-sidecar:devel
-        imagePullPolicy: Always
+        imagePullPolicy: IfNotPresent
         args:
           - -v=2
         volumeMounts:
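
Review note: the client-side TLS setup that this patch adds to `getMetricsDataFromXPUM` can be exercised in isolation. The sketch below is not part of the patch. `newPinnedClient` is a hypothetical helper name, and the `httptest` server is only a stand-in for the gunicorn endpoint, assumed here because its built-in test certificate is valid for `127.0.0.1`, the same address the new `certs.yaml` requests. The helper mirrors the patch: load a PEM certificate into a fresh pool, use that pool as `RootCAs`, and require at least TLS 1.2.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// newPinnedClient is a hypothetical helper (not in the patch). It returns an
// HTTP client that trusts only the certificates contained in pemCert and
// verifies the server against serverName.
func newPinnedClient(pemCert []byte, serverName string) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemCert) {
		return nil, errors.New("no usable PEM certificate found")
	}

	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion: tls.VersionTLS12,
				RootCAs:    pool,
				ServerName: serverName,
			},
		},
	}, nil
}

func main() {
	// Stand-in for the HTTPS metrics endpoint (assumption for this sketch).
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	// PEM-encode the server certificate, as it would appear in tls.crt.
	pemCert := pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: srv.Certificate().Raw,
	})

	client, err := newPinnedClient(pemCert, "127.0.0.1")
	if err != nil {
		fmt.Println("client setup failed:", err)
		return
	}

	resp, err := client.Get(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, string(body))
}
```

Unlike the removed `InsecureSkipVerify: true` path, a client built this way rejects any server whose certificate does not chain to the supplied file, which is why the cert-manager overlay mounts the same `xpum-server-cert` secret into both containers.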