Skip to content

[k8s_cloud] Add sshjump host support. #2201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from

Conversation

aviweit
Copy link
Contributor

@aviweit aviweit commented Jul 9, 2023

Add an initial support for ssh jump host.

By this, only single port (nodeport) is opened on a target k8s cluster where ssh connections established from skypilot user targeted to ray cluster - are tunneled through ssh jump host pod.

@aviweit aviweit force-pushed the k8s_cloud-sshjumphost2 branch 3 times, most recently from 9063150 to e66cda1 Compare July 9, 2023 19:05
@aviweit aviweit changed the title Add sshjump host support. [k8s_cloud] Add sshjump host support. Jul 9, 2023
@@ -958,7 +957,6 @@ def write_cluster_config(

# Kubernetes only:
'skypilot_k8s_image': k8s_image,
'ssh_key_secret_name': ssh_key_secret_name,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssh_key_secret_name doesn't seem to be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for catching - this is removed in the current latest k8s_cloud branch.

@@ -17,7 +17,7 @@ provider:
module: sky.skylet.providers.kubernetes.KubernetesNodeProvider

# Use False if running from outside of k8s cluster
use_internal_ips: false
use_internal_ips: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes ssh itself (not its proxy command) to target internal pod ipaddress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to use ssh_proxy_command_config handled here. Our ssh_proxy seems to be dynamic; port is dynamic NodePort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume port assigned to nodeport doesn't change (since the service is created only once)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The port is the same as long as the sshjump service is running.

@aviweit aviweit force-pushed the k8s_cloud-sshjumphost2 branch from e66cda1 to 832818c Compare July 10, 2023 07:57
@aviweit aviweit force-pushed the k8s_cloud-sshjumphost2 branch from 832818c to 73e35d6 Compare July 10, 2023 09:29
@aviweit
Copy link
Contributor Author

aviweit commented Jul 10, 2023

I confirm that launching the following task against kind, works well.

SSH interactions with ray head during provision phase, is done through sshjump host as well as user can access ray head with ssh where head is being reached via sshjump pod.

hello.yaml:

num_nodes: 1

resources:
  memory: 2
  cpus: 1
  # Optional; if left out, pick from the available clouds.
  cloud: kubernetes
workdir: .

setup: |
  echo "Running setup."

run: |
  echo "Hello, User, how are you?"
  conda env list

create a local kind k8s cluster:

sky local up

ensure cluster is up:

kubectl get nodes | grep skypilot-control-plane

launch the task

sky launch hello.yaml

Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aviweit! This is great. I skimmed through the code and left some comments. Haven't tried running yet. Will take another pass soon.

Can you also merge the latest changes from k8s_cloud branch? It has a few fixes, and some changes in how ssh auth works (especially after #2119), so would be good to verify this still works.

Comment on lines 1 to 24
FROM continuumio/miniconda3:22.11.1

RUN apt update -y && \
apt install sudo vim openssh-server -y && \
rm -rf /var/lib/apt/lists/*

# Setup SSH and generate hostkeys
RUN mkdir -p /var/run/sshd && \
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd && \
cd /etc/ssh/ && \
ssh-keygen -A

RUN useradd -m -s /bin/bash sky && \
echo "sky ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
echo 'Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"' > /etc/sudoers.d/sky

# Switch to sky user
USER sky

# Add /home/sky/.local/bin/ to PATH
RUN echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc

WORKDIR /home/sky
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious - what do you think about reusing the image generated from Dockerfile_k8s (us-central1-docker.pkg.dev/skypilot-375900/skypilotk8s/skypilot:latest) instead of creating a new one?

Pros: Reduces image maintenance burden for us, may be faster to pull since cluster would likely already have the main skypilot:latest image for running tasks
Cons: Dockerfile_k8s is quite bloated because it installs SkyPilot and its dependencies.

Might be worth looking at image sizes for both before taking a call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @romilbhardwaj ,

I built both images using build_image.sh and noticed the below size differences. I agree with your Pros and Cons.

$docker images
REPOSITORY                     TAG       IMAGE ID       CREATED        SIZE
...
sshjump                        latest    9c4a7ca3d121   4 hours ago    479MB
skypilot                       latest    448c7c3b7553   4 hours ago    1.32GB
...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aviweit. Since this image will be pulled only once, lets remove this and reuse Dockerfile_k8s (us-central1-docker.pkg.dev/skypilot-375900/skypilotk8s/skypilot:latest).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romilbhardwaj , done.

@@ -10,6 +10,8 @@
from typing import Any, Dict, Tuple
import uuid

from urllib.parse import urlparse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since urllib is standard library, perhaps should be added at L11 before import uuid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @romilbhardwaj , I removed import urlparse, as it is now being used by clouds.Kubernetes.get_external_ip() at setup_kubernetes_authentication.

sshjump_name = clouds.Kubernetes.SKY_SSH_JUMP_NAME
sshjump_image = clouds.Kubernetes.SSH_JUMP_IMAGE

def _to_sshjump_pod_spec(name, image, secret):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define this pod spec as a jinja template (e.g., templates/k8s_ssh_jump.j2) that is filled in with name, image and secret through a function call? Same for the service spec below. Might be able to put them in the same file too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I pushed the changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume port assigned to nodeport doesn't change (since the service is created only once)?

@@ -958,7 +957,6 @@ def write_cluster_config(

# Kubernetes only:
'skypilot_k8s_image': k8s_image,
'ssh_key_secret_name': ssh_key_secret_name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for catching - this is removed in the current latest k8s_cloud branch.

aviweit added 3 commits July 13, 2023 12:45
- it seems not to be used anymore
- template file to store sshjump pod and service specs
- update setup_kubernetes_authentication to use it
@aviweit
Copy link
Contributor Author

aviweit commented Jul 13, 2023

Hi @romilbhardwaj , I have merged the PR with k8s_cloud branch and tested again (with the additional commits) - with the task yaml that is mentioned at #2201 (comment).

It worked well with skypilot installed on Ubuntu 20.04 host - against an external kubernetes cluster (deployed via kubeadm), and a one deployed locally (kind).

@aviweit aviweit marked this pull request as ready for review July 13, 2023 15:10
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aviweit! We probably also need to handle the lifecycle of the jump pod - left some comments. Also, let's use us-central1-docker.pkg.dev/skypilot-375900/skypilotk8s/skypilot:latest image for the ssh jump pod.

Comment on lines 1 to 24
FROM continuumio/miniconda3:22.11.1

RUN apt update -y && \
apt install sudo vim openssh-server -y && \
rm -rf /var/lib/apt/lists/*

# Setup SSH and generate hostkeys
RUN mkdir -p /var/run/sshd && \
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd && \
cd /etc/ssh/ && \
ssh-keygen -A

RUN useradd -m -s /bin/bash sky && \
echo "sky ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
echo 'Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"' > /etc/sudoers.d/sky

# Switch to sky user
USER sky

# Add /home/sky/.local/bin/ to PATH
RUN echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc

WORKDIR /home/sky
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aviweit. Since this image will be pulled only once, lets remove this and reuse Dockerfile_k8s (us-central1-docker.pkg.dev/skypilot-375900/skypilotk8s/skypilot:latest).

content = json.loads(_content)

try:
kubernetes.core_api().create_namespaced_pod('default', content['pod_spec'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think a bit about the lifecycle of this jump pod.

  • Creation - Creation of this jump ssh pod happens when the user creates their first SkyPilot cluster in the Kubernetes cluster.
  • Run - The jump SSH pod is kept alive as long as there is a SkyPilot cluster running (i.e., there exists a Kubernetes pod with label parent: skypilot in the same namespace)
  • Termination - If there hasn't been a pod with label parent: skypilot in the same namespace since the past n minutes (say, 10 minutes), this jump pod should terminate itself.

We can do this by running a script like this in the jump ssh pod as the command for the pod:

import os
import datetime
import pytz
import time
from kubernetes import client, config

# Load kube config
config.load_incluster_config()

v1 = client.CoreV1Api()

# Get the current namespace from the pod service account
with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f:
    current_namespace = f.read()

# Set the time delta for checking last active pods
time_delta = datetime.timedelta(minutes=10)

# Set delay for each retry
retry_delay = 60  # In seconds

while True:
    # Get the current time
    now = datetime.datetime.now(pytz.UTC)

    found = False
    # List the pods in the current namespace
    ret = v1.list_namespaced_pod(current_namespace)
    for i in ret.items:
        if i.metadata.labels and 'parent' in i.metadata.labels and i.metadata.labels['parent'] == 'skypilot':
            # Calculate the elapsed time since the pod was last active
            elapsed_time = now - i.metadata.creation_timestamp
            # If the pod was active in the last 10 minutes, set found to True
            if elapsed_time < time_delta:
                found = True
                break

    # If no active pods were found with the specified label, exit the script
    if not found:
        print("No active pods found with label 'parent: skypilot' in the past 10 minutes. Exiting...")
        exit(1)

    # If pods were found, sleep for the specified delay and then retry
    print(f"Active pods found with label 'parent: skypilot'. Retrying in {retry_delay} seconds...")
    time.sleep(retry_delay)

What do you think?

@@ -0,0 +1,74 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this file as a YAML instead of a json? Trying to have something that can also be easily used with kubectl apply -f for debugging/testing (of course, after replacing template variables).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romilbhardwaj , yes. I changed. Thanks.

@aviweit
Copy link
Contributor Author

aviweit commented Jul 19, 2023

Hi @romilbhardwaj

Thanks a lot for the comments. I updated the code to reuse Dockerfile_k8s and updated the template format to yaml. I re-tested the recent changes in our environment and it seems to work well.

Your suggestion on sshjump lifecycle seems to be fine for me. There seems to be additional work such as defining RBAC for the pod to list other pods. I tested it and got that failure.

I would like to ask whether it would be ok to merge this PR and to open another PR to handle sshjump lifecycle? Other PRs can be probably opened to handle additional enhancements as needed.
What do you think?

Thanks again.

@romilbhardwaj
Copy link
Collaborator

@aviweit - this is great! Tried it out - works nicely.

There seems to be additional work such as defining RBAC for the pod to list other pods. I tested it and got that failure.

Is there some way to upload kubeconfig credentials so that jump pod can use those to authenticate with the control plane and list pods?

Asking because I think lifecycle management of the jump host is important to get right before we ship this. We do not want to leave an orphan ssh jump pod running on users' clusters after they are done using SkyPilot. If we can figure out a quick solution to the above problem, it'll be great to have in this PR.

If you think it will be an involved process, then it'll be good to have a draft solution verifying this kind of lifecycle management is possible. Once we can verify it can be done, then we can merge this PR and create another one to address it.

What do you think? Thanks again for adding this!

@romilbhardwaj
Copy link
Collaborator

@aviweit - I've made some updates to k8s_cloud branch. We use the autoscaler service account which has full permissions over the namespace. Is it possible to use the same autoscaler service account with this jump pod and use it to manage the lifecycle of the pod?

@aviweit
Copy link
Contributor Author

aviweit commented Jul 23, 2023

Hi @romilbhardwaj , I merged k8s_cloud changes into this PR and added lifecycle management of the jump host. It seems to work well in my environment. Can you please review?

Regarding reusing autoscaler service account, I think it would be good to define a dedicated one that will contain only the permissions needed by the sshjump host lifecycle logic.

I would like to ask whether this PR can be merged and if needed, additional PRs will be opened? This will save us time on keeping this PR up to date in respect to upstream changes.

@romilbhardwaj , I think that you will need to re-build and push Dockerfile_k8s because of adding sshjump-lcm.py.

Thanks again.

@aviweit
Copy link
Contributor Author

aviweit commented Jul 26, 2023

Hi @romilbhardwaj , I have updated the code to create sshjump related resources in the correct namespace (and a small update to if statement in lcm script).

Thanks a lot.

@aviweit
Copy link
Contributor Author

aviweit commented Jul 28, 2023

@romilbhardwaj , I added sshjump lcm to monitor only ray pods it is responsible for.
The changes are currently under sub-branch k8s_cloud-sshjumphost2-multiuser: https://github.com/aviweit/skypilot/tree/k8s_cloud-sshjumphost2-multiuser
commit: aviweit@21879c5.

If it is ok, it can be merged to this PR. Thanks.

@romilbhardwaj romilbhardwaj deleted the branch skypilot-org:k8s_cloud August 2, 2023 10:57
@Michaelvll
Copy link
Collaborator

We should probably re-open this PR by changing the target branch. The original k8s_cloud branch has been deleted as it was merged to the master.

@romilbhardwaj
Copy link
Collaborator

Hey @aviweit ! This PR got closed when we merged #2096 into master.

For our next sprint, we'll be switching to k8s_cloud_beta1 branch. Can you re-open this PR on that branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants