-
Notifications
You must be signed in to change notification settings - Fork 633
[k8s] Add Kubernetes support for ports #2713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this exciting feature! 🚀 Left several comments on the ports
related part 👀
sky/backends/cloud_vm_ray_backend.py
Outdated
# so that our backend do not need to know the specific details | ||
# for different clouds. | ||
if isinstance(cloud, (clouds.AWS, clouds.GCP, clouds.Azure)): | ||
if provision_lib.supports(repr(cloud), 'cleanup_ports'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! 🚀
sky/cli.py
Outdated
@@ -1750,17 +1761,23 @@ def status(all: bool, refresh: bool, ip: bool, show_spot_jobs: bool, | |||
show_all=False, | |||
limit_num_jobs_to_show=not all, | |||
is_called_by_user=False)) | |||
if ip: | |||
if ip or ports: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To discuss: Probably we could have an enum containing ip
and ports
, and use sth like --target ip
/--target ports
in the CLI?
sky/templates/kubernetes-ray.yml.j2
Outdated
@@ -25,6 +28,18 @@ provider: | |||
|
|||
ssh_jump_image: {{k8s_ssh_jump_image}} | |||
|
|||
ingress_name: {{cluster_name_on_cloud}}-skypilot-ingress | |||
|
|||
{%- if ports is not none %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current ports implementation won't put ports in the provider config. Instead, we use handle.launched_resources.ports
to keep track of all opened ports. Reference:
skypilot/sky/backends/cloud_vm_ray_backend.py
Lines 3067 to 3077 in bf33602
prev_ports = None | |
if prev_handle is not None: | |
prev_ports = prev_handle.launched_resources.ports | |
current_ports = handle.launched_resources.ports | |
open_new_ports = bool( | |
resources_utils.port_ranges_to_set(current_ports) - | |
resources_utils.port_ranges_to_set(prev_ports)) | |
if open_new_ports: | |
with rich_utils.safe_status( | |
'[bold cyan]Launching - Opening new ports'): | |
self._open_ports(handle) |
Can we change to the same implementation logic to support
sky launch --ports <new-ports>
to open new ports on an existing cluster? It seems like we can construct all other fields using port
and cluster_name
so keeping track of ports should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 186193c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hemildesai! This is super cool. Tested some functionality and seems to work nicely for a simple http server. Jupyter seems to have some problems, see comments below. Will take a closer look at the code soon.
def delete_namespaced_service(namespace: str, service_name: str) -> None: | ||
"""Deletes a service resource.""" | ||
core_api = kubernetes.core_api() | ||
core_api.delete_namespaced_service(service_name, namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into a bug here - my previous provisioning had failed due to a library mismatch and ended up not creating the service in k8s.
When I tried to terminate the skypilot cluster (stuck in a bad INIT
state), I got:
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'f463ef5b-3fd6-488d-8c32-1ef4bcce7240', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'c57dbea3-732c-471b-840e-3fd589a06584', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7187fedc-79b8-4e30-933f-705406be228f', 'Date': 'Thu, 19 Oct 2023 17:28:55 GMT', 'Content-Length': '240'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"test-2ea4-skypilot-service--8888\" not found","reason":"NotFound","details":{"name":"test-2ea4-skypilot-service--8888","kind":"services"},"code":404}
We should probably handle any deletion failures due to port not existing by wrapping them into a new sky.exceptions.PortDoesNotExist class here, re-raising it and handling it upstream (perhaps by simply pass
) when provision_lib.cleanup_ports
is called in cloud_vm_ray_backend. cc @cblmemo - does that make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! The current implementation just prints some warning messages in the provision lib API. I think it is a good idea to handle them in a unified place 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 8a70791
sky/cli.py
Outdated
@@ -1672,6 +1677,12 @@ def _get_spot_jobs( | |||
'clusters, the returned IP address is the internal IP ' | |||
'of the head pod, and may not be accessible from outside ' | |||
'the cluster.')) | |||
@click.option('--ports', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm for some reason getting two endpoints listed:
$ sky status --ports test2
8888: http://35.222.205.70/skypilot/test2-2ea4_8888, https://35.222.205.70/skypilot/test2-2ea4_8888
Repro:
# task.yaml
resources:
ports: 8888
run: python -m http.server 8888
sky launch -c test2 task.yaml --cloud kubernetes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, this currently returns HTTP and HTTPS url but will default to HTTP for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3d411a1
sky/provision/__init__.py
Outdated
def query_ports( | ||
provider_name: str, | ||
cluster_name_on_cloud: str, | ||
provider_config: Optional[Dict[str, Any]] = None, | ||
) -> Dict[str, Tuple[str, str]]: | ||
"""Query details about ports on a cluster. | ||
|
||
Returns a dict with port as the key and the value as the http and https url to access the port. | ||
""" | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! We should also probably add something for gcp and aws too, simply returning <ip>:<port>
so sky status --endpoints
can be used consistently across clouds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3d411a1
sky/cli.py
Outdated
@@ -1672,6 +1677,12 @@ def _get_spot_jobs( | |||
'clusters, the returned IP address is the internal IP ' | |||
'of the head pod, and may not be accessible from outside ' | |||
'the cluster.')) | |||
@click.option('--ports', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought - we should support this API for easy programmatic use too. For example, many users query their skypilot endpoints with:
$ curl http://$(sky status mycluster --ip):8888/myapi
Since the current output is of the format 8888: http://35.222.205.70/skypilot/test2-2ea4_8888
, it would require some parsing on the user's part.
Can we:
- Rename
--ports
to--endpoints
- If a port number is specified to
--endpoints
, then we simply return the URL. E.g.,:
$sky status test2 --endpoint 8888
http://35.222.205.70/skypilot/test2-2ea4_8888
- If a port number is not specified, then we list all ports open on the cluster:
$ sky status --endpoints test2
8888: http://35.222.205.70/skypilot/test2-2ea4_8888
8889: https://35.222.205.70/skypilot/test2-2ea4_8889
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the latest CLI command be sky status --endpoints test2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - that should work. Would be nice to change --ports
to --endpoint(s)
.
from sky.adaptors import kubernetes | ||
|
||
|
||
def create_namespaced_ingress(namespace: str, ingress_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jupyter does not seems to play nicely with ingress:
#jupyter.yaml
resources:
ports: 8888
setup: pip install jupyter
run: jupyter lab --port 8888 --no-browser --ip=0.0.0.0
On accessing https://35.222.205.70/skypilot/test-2ea4_8888/
in Chrome, I get 404:

Note that it tried to redirect me to https://35.222.205.70/lab, which is incorrect.
On accessing the exact suburl (/lab?token=....
), I get an empty white page but loading some js/css is likely failing with 404:
Can we configure the ingress to correctly redirect to the sub-path we have under the ingress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt fix! Left some other comments ;)
sky/provision/kubernetes/network.py
Outdated
assert provider_config is not None, cluster_name_on_cloud | ||
for port in ports: | ||
service_name = f"{cluster_name_on_cloud}-skypilot-service--{port}" | ||
ingress_name = f"{cluster_name_on_cloud}-skypilot-ingress--{port}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is there any way in k8s that we can register multiple ports in a single service? That way, we could get rid of the ports
argument; But if there is no such API I'm happy with the current implementation 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible. But, thinking about it, this becomes tricky for maintenance and modification. If we want to add ports or remove some, we will have to edit the service in place.
The benefit of multiple ports under a single service is that it shares the IP, but since the services are internal and the ingress routes are port specific, I don't think a shared IP provides much benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this wasn't so bad and is the default way for opening ports using the LoadBalancer
service type.
To resolve the issue with Jupyter and similar services, we decided to support LoadBalancer service type (which creates an external LB per cluster) so that each cluster can have a dedicated IP. The user can configure whether to use LoadBalancer or Ingress using skypilot config. Related changes in 55f2ece and 430e6a7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome @hemildesai! This works very nicely - I had a quick go with both nginx ingress and loadbalancer. There are a few edge cases to handle, mentioned in comments below. We should also update the --ports
flag to --endpoint(s)
.
Will take a closer look at the code soon.
sky/cli.py
Outdated
head_ip = handle.external_ips()[0] | ||
if ports: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should perhaps add some error handling when clusters have different port modes. E.g., if user switches their global config.yaml
from loadbalancer
to ingress
or vice-versa.
For instance, I launched one cluster with ports: loadbalancer
, then changed my config to ports: ingress
. On running sky status --ports jupyter
, I get this error. It would be nice to either fix this or cleanly ask the user to not mix port modes.
(base) ➜ ~ sky status --ports jupyter
D 11-02 16:31:37 skypilot_config.py:157] Using config path: /Users/romilb/.sky/config.yaml
D 11-02 16:31:37 skypilot_config.py:160] Config loaded:
D 11-02 16:31:37 skypilot_config.py:160] {'kubernetes': {'ports': 'loadbalancer'}}
D 11-02 16:31:37 skypilot_config.py:166] Config syntax check passed.
D 11-02 16:31:38 skypilot_config.py:93] User config: kubernetes.ports -> loadbalancer
D 11-02 16:31:38 skypilot_config.py:157] Using config path: /Users/romilb/.sky/config.yaml
D 11-02 16:31:38 skypilot_config.py:160] Config loaded:
D 11-02 16:31:38 skypilot_config.py:160] {'kubernetes': {'ports': 'loadbalancer'}}
D 11-02 16:31:38 skypilot_config.py:166] Config syntax check passed.
Traceback (most recent call last):
File "/Users/romilb/tools/anaconda3/bin/sky", line 8, in <module>
sys.exit(cli())
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/utils/common_utils.py", line 311, in _record
return f(*args, **kwargs)
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/cli.py", line 1163, in invoke
return super().invoke(ctx)
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/utils/common_utils.py", line 332, in _record
return f(*args, **kwargs)
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/cli.py", line 1821, in status
port_details = provision_lib.query_ports(
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/__init__.py", line 41, in _wrapper
return impl(*args, **kwargs)
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/kubernetes/network.py", line 162, in query_ports
return _query_ports_for_loadbalancer(
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/kubernetes/network.py", line 184, in _query_ports_for_loadbalancer
external_ip = network_utils.get_loadbalancer_ip(
File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/kubernetes/network_utils.py", line 169, in get_loadbalancer_ip
service = core_api.read_namespaced_service(service_name, namespace)
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 25141, in read_namespaced_service
return self.read_namespaced_service_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 25228, in read_namespaced_service_with_http_info
return self.api_client.call_api(
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
return self.rest_client.GET(url,
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/rest.py", line 244, in GET
return self.request("GET", url,
File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/rest.py", line 238, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ba277f1b-fb86-41c8-888b-32896a6f1e9c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'cf43db27-7222-4741-adea-8b7f8f0e681e', 'X-Kubernetes-Pf-Prioritylevel-Uid': '17aff9c6-0fa7-4909-a811-4644194e9001', 'Date': 'Thu, 02 Nov 2023 23:31:38 GMT', 'Content-Length': '244'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"jupyter-2ea4-skypilot-loadbalancer\" not found","reason":"NotFound","details":{"name":"jupyter-2ea4-skypilot-loadbalancer","kind":"services"},"code":404}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can persist the mode for each cluster in the provider config during creation time. This will avoid this issue and allow users to have separate modes for separate clusters. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixed in 4ae6602
class KubernetesPortMode(enum.Enum): | ||
"""Enum for the different types of modes supported for opening | ||
ports on Kubernetes. | ||
""" | ||
INGRESS = 'ingress' | ||
LOADBALANCER = 'loadbalancer' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic! I love the flexibility this provides. Would be nice to document this in docs/source/reference/config.rst
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, will update.
sky/cli.py
Outdated
@@ -1672,6 +1677,12 @@ def _get_spot_jobs( | |||
'clusters, the returned IP address is the internal IP ' | |||
'of the head pod, and may not be accessible from outside ' | |||
'the cluster.')) | |||
@click.option('--ports', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - that should work. Would be nice to change --ports
to --endpoint(s)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awesome work!! 🚀 Left one comment to discuss 🫡
sky/provision/kubernetes/network.py
Outdated
selector_key='skypilot-cluster', | ||
selector_value=cluster_name_on_cloud, | ||
) | ||
network_utils.create_namespaced_service( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current ports implementation is incremental, i.e. if I do sky launch -c mycluster --ports 8080
and then sky launch -c mycluster --ports 8081
, the second sky launch
will update the ports to '8080-8081'
. In
skypilot/sky/backends/cloud_vm_ray_backend.py
Lines 3009 to 3017 in 53ba415
def _open_ports(self, handle: CloudVmRayResourceHandle) -> None: | |
cloud = handle.launched_resources.cloud | |
logger.debug( | |
f'Opening ports {handle.launched_resources.ports} for {cloud}') | |
config = common_utils.read_yaml(handle.cluster_yaml) | |
provider_config = config['provider'] | |
provision_lib.open_ports(repr(cloud), handle.cluster_name_on_cloud, | |
handle.launched_resources.ports, | |
provider_config) |
We passed the updated list to the
open_ports
API ('8080-8081'
in the example above). Therefore, the open_ports
API should update the ports to what ports
describe. I'm not very familiar with k8s API, but the API is named create_xxx
rather than create_or_update_xxx
; so just want to make sure, does this API support update semantic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, I will update the ingress implementation to be create_or_update
. LoadBalancer is already create_or_update
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 1d95434
Co-authored-by: Tian Xia <[email protected]>
LoadBalancer service
430e6a7
to
da8fad6
Compare
sky/clouds/kubernetes.py
Outdated
deploy_vars = { | ||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we retain the deploy_vars
so the return statement remains clean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4ae6602
@@ -123,12 +131,28 @@ def open_ports( | |||
def cleanup_ports( | |||
provider_name: str, | |||
cluster_name_on_cloud: str, | |||
ports: List[str], | |||
provider_config: Optional[Dict[str, Any]] = None, | |||
) -> None: | |||
"""Delete any opened ports.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the docstr and make this an optional arg? Then the semantics can be:
- If ports are specified, clean up only the specific ports
- If not specified, clean up all ports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. However, for loadbalancer based services, a single service contains all ports. In that case, we'll have to get all_ports - cleanup_ports
and recreate the service using that set.
Also, we'll have to support this for other clouds as well. Maybe we can leave it out of scope for this PR and do it in a followup PR. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good - we should update the docstr and mark it as a TODO to implement the desired behavior
sky/provision/__init__.py
Outdated
) -> Dict[str, Tuple[str, str]]: | ||
"""Query details about ports on a cluster. | ||
|
||
Returns a dict with port as the key and the value as the http and https url to access the port. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ports may not necessarily be exposing both HTTP and HTTPS urls. Should we change the return type to Dict[str, List[Endpoint]]
, which is a dict of {port: [endpoint]}?
Endpoint is new class like so:
class Endpoint:
pass
class SocketEndpoint(Endpoint):
def __init__(self, host, port):
self.host = host
self.port = port
def __str__(self):
return f'{self.host}:{self.port}'
class HTTPEndpoint(Endpoint):
def __init__(self, url):
self.url = url
def __str__(self):
return self.url
class HTTPSEndpoint(Endpoint):
def __init__(self, url):
self.url = url
def __str__(self):
return self.url
Example return values:
# Example for k8s
{8080:
[
HTTPEndpoint("http://www.example.com/mycluster/8080"),
HTTPSEndpoint("https://www.example.com/mycluster/8080")
]
}
# Example for raw socket
{8080:
[
SocketEndpoint(ip='1.1.1.1', port=8080),
]
}
cc @cblmemo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4ae6602
if (cloud.PROVISIONER_VERSION >= clouds.ProvisionerVersion. | ||
RAY_PROVISIONER_SKYPILOT_TERMINATOR or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was recently added, not sure if it's necessary along with if provision_lib.supports(repr(cloud), 'cleanup_ports')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cblmemo - can you take a quick look at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is added in #2681, https://github.com/skypilot-org/skypilot/pull/2681/files#diff-f14940b9d929546868fb41cb0b27a1db08b321b7bbc0cb3ab3b864589aa82aa8
I still think the provision_lib.supports(cloud, 'cleanup_ports')
is needed since clouds.ProvisionerVersion
is not related to what cloud supports open ports. The new condition here is essentially still isinstance(cloud, (GCP, AWS, Azure))
. The current implementation LGTM; cc @Michaelvll for a double-check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we change the check to the following:
try:
cloud.check_features_are_supported(handle.launched_resources, {clouds.CloudImplementationFeatures.OPEN_PORTS})
provision_lib.cleanup_ports(repr(cloud),
cluster_name_on_cloud,
handle.launched_resources.ports,
config['provider'])
except exceptions.NotSupportedError:
pass
except ...
The provision_lib.supports
seems duplicated with the CloudImplementationFeatures
in Cloud
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Let's use cloud.check_features_are_supported
. This should be fine since all ports related functions is implemented in new provisioned API.
This reverts commit 7302e45.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hemildesai! Pushed some additional fixes for error msg handling. Ran smoke tests, LGTM!
EDIT - testing some final local kubernetes integration before merging
Made some changes to error handling and disallowed opening ports |
Fixed other comments and removed |
Background for this can be found here
This PR adds Kubernetes support for ports. It uses Ingress to accomplish this. For each port, it creates a new service and a new ingress rule. The port can then be accessed by visiting the url obtained by running
sky status {cluster_name} --ports
. In the initial PR, we assume that the ingress controller is already created by the Kubernetes cluster admin, but in the future we will add support for creating one automatically if none exists. We can add documentation on how to manually install the ingress controller for now. Currently, the ingress controller of choice is nginx.The new cli command looks like
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh