Launch Pythia Platform #213

Closed
kmpaul opened this issue Feb 10, 2022 · 10 comments
Labels
infrastructure: Infrastructure related issue
needs info: If more info has been requested from the author, apply this label.
needs triage: This can be kept if the triager is unsure which next steps to take

Comments

@kmpaul
Collaborator

kmpaul commented Feb 10, 2022

We should move ahead with launching our own BinderHub in the cloud. @brian-rose and @ktyle have experience with their own JupyterHub/BinderHub instances, and their experience will be invaluable. There are some outstanding questions that need answers before we can proceed, however:

  1. What cloud platform should we use?
  2. If we are providing a public resource, we probably need authentication to prevent things like bitcoin mining. Authentication can be done with GitHub. But what other considerations need to be addressed if this is what we want to pursue?
    • Should users have their own permanent storage space? How much? Or should it be ephemeral?
    • GitHub is an option, but what authentication do we need?
    • How do we allow people to create an account / sign up to use the Pythia hub? (Auth is one thing, but allowing specific individuals access may be necessary if we are providing parallel resources.)
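On the distinction between authentication and access raised in the last bullet, here is a minimal sketch, assuming GitHub has already confirmed who the user is and the hub only needs to decide whether that login may start a server. The class name `HubAccessPolicy`, the login, and the organization name are all hypothetical and chosen for illustration; a real JupyterHub or BinderHub deployment would normally express this through the hub's authenticator configuration rather than hand-written code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HubAccessPolicy:
    """Hypothetical allow-list applied after GitHub has authenticated a user."""

    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_orgs: frozenset = field(default_factory=frozenset)

    def is_allowed(self, username, orgs=()):
        # GitHub logins are case-insensitive, so compare lowercased names.
        name = username.lower()
        if name in {u.lower() for u in self.allowed_users}:
            return True
        # Otherwise admit the user if they belong to any allowed organization.
        user_orgs = {o.lower() for o in orgs}
        return bool(user_orgs & {o.lower() for o in self.allowed_orgs})


# Illustrative values only; neither name comes from the discussion above.
policy = HubAccessPolicy(
    allowed_users=frozenset({"workshop-student-1"}),
    allowed_orgs=frozenset({"example-workshop-org"}),
)
```

Keeping the allow-list separate from the login step means access could be granted per workshop (by organization) or per person (by login) without changing how people sign in.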

What else?

kmpaul added the infrastructure label on Feb 10, 2022
@brian-rose
Member

I'm way out of my league on these questions, but I like the idea of using GitHub for authentication just because it reinforces the message that GitHub is our preferred collaboration platform. It's a way to get more of our users onto the platform.

Pangeo used ORCID as the authentication, on the assumption that most users would be active researchers who already have an ORCID. It's another option to consider. But at least some of our users may be students who haven't yet published and so haven't yet signed up for an ORCID.

@clyne
Contributor

clyne commented Feb 10, 2022

Is there a compelling use case for users having their own persistent space in the context of Pythia as a training resource? I'm out of my league as well, but to me this seems like it could be a pretty big lift with on-going maintenance required. It would be good to know that it would be useful for our target audience.

@brian-rose
Member

> Is there a compelling use case for users having their own persistent space in the context of Pythia as a training resource? I'm out of my league as well, but to me this seems like it could be a pretty big lift with on-going maintenance required. It would be good to know that it would be useful for our target audience.

I can imagine this being useful in the context of multi-day workshops or training sessions, where users may be working systematically through examples in Foundations along with some workshop-specific content.

Here I'm drawing on my own experience serving Jupyter notebooks in the classroom. I distribute an entire semester of notes and assignments through a Jupyter book. I lead students through lecture notes in class, but they customize things as they go with their own explanations and their own solutions to in-line code exercises. Being able to persist that personal copy of the notes from one class to the next is an essential part of the workflow.

@clyne
Contributor

clyne commented Feb 10, 2022

@brian-rose that use case makes complete sense to me. Thanks. I still wonder about the effort to implement and maintain persistent cloud storage in this framework, but this area is definitely not my jam :-) Someone else will have to do the cost-benefit analysis.

@brian-rose
Member

Yes, it certainly doesn't make sense for us to promise open-ended persistent storage for all comers.

I wonder if we should be thinking more along the lines of developing a "proof of concept" for a Pythia platform, with the idea that actual resources (compute, storage) making use of the platform could be furnished to groups (e.g. people wanting to host a workshop focussed on specific datasets) on a fee-for-service basis. That's basically what 2i2c exists to do, if I understand correctly.

Basically the Platform shouldn't be one specific BinderHub instance, but instead a portable product that can be deployed where it's needed.

@clyne
Contributor

clyne commented Feb 11, 2022

This sounds more in line with what we promised and what I think we might reasonably deliver. However, we were pretty vague about exactly what we would deliver, and probably have more latitude here than with the portal. That's my 2 cents' worth, but people far more knowledgeable about this stuff should probably weigh in :-)

@kmpaul
Collaborator Author

kmpaul commented Feb 15, 2022

> Basically the Platform shouldn't be one specific BinderHub instance, but instead a portable product that can be deployed where it's needed.

@brian-rose: That's basically what 2i2c and Pangeo have been trying to provide, though it's not a "push-button-deploy" product. It's more like an instruction set for how to deploy this kind of platform on various cloud platforms.

In light of discussions happening in the Pangeo organization and here, I think that maybe we can leverage Pangeo's hubs for our use. Why not just partner with them?

@brian-rose
Member

I am 100% +1 for partnering and avoiding duplication of effort.

jukent added the needs triage and needs info labels on Oct 5, 2022
@jukent
Contributor

jukent commented Oct 5, 2022

Are we waiting on Pangeo for anything on this front? What can we do to make this partnership move forward?

@brian-rose
Member

Closing as we've moved on from this discussion.

Projects
Status: Done
4 participants