
Added option for st2packs-PersistentVolumes to allow for seamless pack updates #160


Closed
wants to merge 1 commit

Conversation

@moonrail (Contributor) commented Nov 20, 2020

Related to the #118 implementation discussion

Hello everyone,

We need to change st2packs frequently via various pipelines and therefore found the default initContainers/sidecar approach a poor fit, as the actionrunner, sensor & api containers were restarted on every update.

Restarts are undesirable because running Executions/Workflows are marked "Abandoned", so we looked for a k8s-native solution.

This Pull Request adds the option to use two shared PersistentVolumes (one for Packs, one for VirtualEnvironments) instead of initContainers per component.

The job-st2-register-content container is the only container allowed to write (RW) onto these PVs, and it will register content as usual after each upgrade/install via initContainers.

We've successfully updated packs and virtual environments while Executions/Workflows were running, with no negative impact.

As not everybody can use PersistentVolumes and this should remain compatible with existing installations, the default pack handling via initContainers is untouched.
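For illustration only, a rough sketch of how such an option could be toggled in values.yaml - the key names and defaults below are placeholders, not the chart's actual ones, which are defined in this PR's values.yaml:

st2:
  packs:
    # hypothetical switch: share packs & virtualenvs via ReadWriteMany
    # PersistentVolumes instead of copying them into every component
    # through initContainers
    persistence:
      enabled: true
      storageClassName: nfs-client   # any RWX-capable storage class
      packsSize: 1Gi
      virtualenvsSize: 5Gi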

Thanks to @angrydeveloper in #118 for the idea to deduplicate initContainer & Volume.

Tested with 3.3dev on Kubernetes 1.17, 1.18 & 1.19.

Please let me know if there is something to improve. :)

@pull-request-size bot added the size/L label (PR that changes 100-499 lines; requires some effort to review) on Nov 20, 2020
@moonrail (Contributor, Author)

Hmm, as packs are on a PersistentVolume, no packs (or their contents) will ever be removed, even if they are no longer present in the st2packs container.
This should also be an issue in @angrydeveloper's NFS solution in #118, unless I'm missing something.

So e.g.:

  • first the user finds a pack "some_pack" in version 1.0.0 and wants to use it;
    let's assume the pack contains an action "do_something"
  • the user changes the Dockerfile-Args to include this pack in this version
  • it is deployed and usable

Case 1:

  • the user wants to remove the pack "some_pack", so the user changes the Dockerfile-Args
  • a new image is built and deployed
  • the cp -aR ... commands in the initContainers copy data from the new container onto the existing PersistentVolume and its packs
  • the pack "some_pack" is still present and still usable in StackStorm

Case 2:

  • pack "some_pack" released a new Version 2.0.0, where the action "do_something" was removed
  • the user wants to upgrade the pack "some_pack" to 2.0.0, so the user changes Dockerfile-Args
  • new image will be build and deployed
  • the commands cp -aR ... in initContainers will copy data from new container onto the existing PersistentVolume and its packs
  • the pack "some_pack" is now present in Version 2.0.0, but in a dirty state, as old action "do_something" was not cleaned up

I've had a look at the current st2actionrunner image and found that rsync is installed.
We could change cp -aR ... to rsync -a --delete ..., so removed files would be cleaned up properly without touching any files still needed by running executions (we assume the user knows about any removed action and has no executions using it at this point).
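A minimal sketch of what the switched command could look like inside a register-content initContainer - names, image and paths below are illustrative, not the chart's actual ones, and rsync would of course have to be available in the pack image:

initContainers:
  - name: st2-packs-sync                  # illustrative name
    image: example.org/st2packs:latest    # illustrative pack image
    volumeMounts:
      - name: st2-packs-vol               # the shared RWX PersistentVolume
        mountPath: /opt/stackstorm/packs-shared
    command: ["/bin/sh", "-ec"]
    args:
      # mirror the packs shipped in the image onto the shared volume and
      # delete files that no longer exist in the image
      - rsync -a --delete /opt/stackstorm/packs/ /opt/stackstorm/packs-shared/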

Would this be acceptable as a solution?

@arm4b (Member) commented Nov 25, 2020

Sure, rsync sounds good to me.

@moonrail (Contributor, Author)

@armab
I cannot see what failed in CircleCI, could you have a look?

I have a question regarding your preferred size of pull requests:
We have other changes that are loosely based on the outcome of this PR.
Not logically, just syntactically, as we've deduplicated some code here into _helpers.tpl and our changes touch exactly that deduplicated code.
Would it be better if we integrated these additions here, or should we do a separate PR against the current master code, which would not be compatible with this PR?

In detail:
We found the st2packs containers to be pretty large (over 120MB for each virtual environment).
To be able to build, upload and download st2packs containers frequently and in reasonable time, we imagine splitting packs up into multiple st2packs images would be beneficial for larger deployments.
This way one could easily build another image for a single pack and deploy it quickly, instead of dealing with images of 2GB and more, which can also lead to Helm timing out.

We'd like to add the option, defaulting to the current behaviour, of providing a list of st2packs images to pull.
So simply "st2.packs.images" containing a list of items that have the same attributes as "st2.packs.image".
If you'd like, we could of course name it "st2.packs.extraImages" (or something like this) and append these to a list together with "st2.packs.image".
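Roughly, the proposed change would turn the single image setting into a list, e.g. (the sub-keys below are only an illustration of the idea, not a final schema):

st2:
  packs:
    # today: one (potentially huge) image carrying all packs & virtualenvs
    image:
      repository: example.org/st2packs-everything
      tag: "1.0"
    # proposed: several smaller images, each built and released independently
    images:
      - repository: example.org/st2packs-core
        tag: "1.0"
      - repository: example.org/st2packs-monitoring
        tag: "2.3"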

@arm4b (Member) commented Nov 25, 2020

@moonrail It might be a temporary issue; I restarted the e2e build and it's ✔️ now.

Good that you asked!
First I need to dedicate more time to try it myself and understand the experience and pros/cons. Then I'll be looking for feedback from the community, so they can try it and share their point of view too.

Based on that data, the tradeoffs, the overall experience, feedback and discussions, we can collectively choose the path forward with the community.

For instance, we definitely don't want to maintain many conflicting options for distributing st2 pack content: immutable containerized packs as is, containerized packs with persistent volumes, an NFS-like approach.
We've seen a lot of confusion in the community about building packs based on Docker images, so #118 might be a path forward? Or maybe we'll replace the existing docker-packs approach with the solution you're providing, as it has some serious benefits.

Having said that, this all may take time to evaluate. Hope that gives an idea.
Additionally, having your opinion on the tradeoffs between the 3 options, and your operational experience, would be helpful too!

name: st2-venv-pvol
spec:
accessModes:
- ReadWriteMany
@arm4b (Member) commented Nov 25, 2020

Ah, so that looks like the main point behind the implementation.
ReadWriteMany (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) means that some kind of network-like filesystem is still required for this to work. Does that mean it's a new possible point of failure that may affect the overall availability formula?

I guess it's a hybrid between the current immutable containerized pack-content approach and the mutable/dynamic way demonstrated in #118, where packs are installed in-place.

@arm4b (Member) commented Nov 25, 2020

Considering it's a ReadWriteMany volume shared between all the Pods that need it, do you think live pack install could work in such an environment as well?

I mean getting the "hybrid" best of the 2 approaches, so the user can have all the options and satisfy all the needs:

  1. Installing the packs in-place with st2 pack install and from the UI (easiest & best experience for the majority of users - can solve a lot of community pain points as a default solution)
  2. Optionally relying on pre-built docker images for packs (best for deployment, repeatability, versioning, availability?)
  3. Can do 1 or 2 or both.

WDYT?

@moonrail (Contributor, Author)

Exactly - this approach requires a storageClass supporting ReadWriteMany - this should be added to README.md as well.

Hmmm, you're right - why go the container direction for packs if it prevents using API & UI pack handling?
Well, it's the k8s way, the containerized way.
You enforce IaC: users strictly define their packs and versions via code.
If you allow changes on the fly via API/UI, there would be a "split-brain" between your code and the actually installed & configured packs.

Our PR here only aims to improve resilience and availability by not modifying and re-creating system-critical containers on pack changes, as that leads to service interruptions.

Concerning the approaches you described:

1. Would lead to inconsistencies, but would allow installing/updating/removing packs via API & UI.

2. Slower approach for sure, but more resilient, as long as users are prevented from using API & UI to modify/add pack contents.
Scales better for (internal) self-service deployments with several stakeholders/departments (our case).

3. Would be somewhat OK, as long as one only handles custom packs (those not on Exchange) via containers and everything else via API & UI.
I do not see how StackStorm could properly distinguish between "containerized" packs and "modifiable" packs for API & UI.
So this would lead to inconsistencies.

Our & My opinion:
I think there is no right or wrong answer to it.

For me personally I was confused when I saw how StackStorm handles Packs (Enterprise-UI and API), in combination with Git-Packs.
The user is able to modify Actions (Workflows) and Rules via API & UI for Git-Packs, but those will be discarded the moment, the user updates the Pack from Git.
I do see problems arising from this, so I've already planned, how I could limit Users via RBAC to not handle any resources that reside in Packs.

I would stick to building containers and would like to be able to configure an option in StackStorm itself for hindering/locking API & UI-Pack-(Un-)Installations, or Modifications to Actions, Rules altogether.
This way Packs would always be consistent with external Versions and Users could be shown a proper error message. Right now the error messages are not optimal due to simply setting Filesystem to ReadOnly :)
So either Approach 1 or 2, but not both and certainly not FreeForAll via Option 3, from my point of view.

@moonrail (Contributor, Author)

@armab
As mentioned, we think having the option to use multiple independent st2packs-containers is beneficial.
Therefore I've added this change to this PR as well.

This is currently a breaking change, as st2.packs.image is renamed to st2.packs.images and is now a list of images.

What do you think of this?

@moonrail (Contributor, Author)

Reverted from rsync back to the cp command, as st2packs-runtime is based on Alpine and doesn't ship with rsync.
rsync would have to be installed here first:
https://github.com/StackStorm/st2packs-dockerfiles/blob/master/st2packs-runtime/Dockerfile

@arm4b (Member) commented Nov 30, 2020

@moonrail Yeah, right. Having the packs: [] breaking change as an isolated PR would definitely work better!

The situation is already pretty difficult here in stackstorm-ha: we were thinking of reducing overall complexity by switching to an NFS-only approach (also used by the bigger adopters) and deprecating containerized packs, since the community is confused by them :) But I see the point in going the other way too, and every pack having its own CI/CD pipeline resulting in a docker image makes perfect sense to me.

@moonrail (Contributor, Author) commented Dec 1, 2020

@armab
So the current plan is to have three options?

  1. Direct use of an (external) NFS volume (see "Allow mounting of NFS volumes instead of using st2packs for pack management" #118)
  • How is this filled initially? Via st2actionrunner's base packs & cp -aR?
  • Any additional packs would be handled via API/Client/UI?
  • #118 currently seems to rely on the packs container, as the volumes are readOnly, so here you're thinking of removing readOnly and packs-container support?
  • Components will not be affected by pack handling
  2. Use of a ReadWriteMany PersistentVolume (e.g. via NFS) and multiple packs containers
  • Packs will be in containers that fill a PersistentVolume (which will be removed when the cluster is removed)
  • API/Client/UI pack installation is prohibited
  • Components will not be affected by pack handling
  3. As 2, but without a PersistentVolume
  • Components will be restarted when packs containers change

I do not know how to modify this PR to fit your plans, please give me some hints. :)

@arm4b (Member) commented Dec 14, 2020

@moonrail It's best to keep this PR as is, so that I can rely on the conflicting #118 and #160 stories in future research and try to find some middle ground between them.

FYI your other changes were released in v0.40.0 chart version: https://github.com/StackStorm/stackstorm-ha/releases/tag/v0.40.0. Thanks for contributing those enhancements!

If you're interested in including packs: [] support via multiple Docker images, please do that in a dedicated PR.
I don't see any blockers there and I could merge that feature with no problem as a next step.

@arm4b added the status:on hold label (feature, design or idea with undecided status to include in the project functionality) on Dec 14, 2020
@moonrail (Contributor, Author)

@armab
I've reverted this PR to introduce only PV handling for packs.
I cannot see why linting is failing, though - on my end helm lint runs just fine.

@ghost commented Dec 23, 2020

Edit: I had a 5 second look; I will have a better look tonight and see if I can figure it out.

@moonrail I think it's erroring at Chart.yaml, where the apiVersion has been set to v2. I think this has been mixed up with PR #163, where I updated the charts to Helm v3.

The CircleCI log shows:
==> Linting .
[ERROR] templates/: error unpacking deprecation-0.2.2.tgz in stackstorm-ha: apiVersion 'v2' is not valid. The value must be "v1"

Error: 1 chart(s) linted, 1 chart(s) failed

Exited with code exit status 1
CircleCI received exit code 1
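For reference, the apiVersion in question sits at the top of Chart.yaml: the Helm 2 linter only accepts apiVersion: v1, while charts converted for Helm 3 declare apiVersion: v2 - roughly (values shown are just an example):

# accepted by the Helm 2 linter (and still readable by Helm 3)
apiVersion: v1
name: stackstorm-ha
version: 0.40.0

# Helm 3-only chart definition - rejected by Helm 2's `helm lint`
# apiVersion: v2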

@moonrail (Contributor, Author)

@Christopher-Russell
Aah, makes sense.
So we can just wait until your PR is merged, rebase on top of it and try again. :)

@chris3081

@moonrail I think a rebase against the current master branch might have this one sorted.

@manisha-tanwar (Contributor)

Hi @moonrail, is it possible for you to use a StatefulSet with a persistentVolumeClaim template instead?
Using what you've suggested deletes the PVC when the chart is deleted.
I tried using

annotations:
helm.sh/resource-policy: keep

& it works fine until we reinstall the chart which fails with:

release stackstorm failed: persistentvolumeclaims "st2-packs-pvol-stackstorm" already exists

Or is there any way we can reuse a PV with dynamic provisioning?
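For reference, the StatefulSet feature alluded to here is volumeClaimTemplates: the controller creates one PVC per replica and leaves it in place even when the StatefulSet (or the chart) is deleted. A minimal sketch with placeholder names, not tied to this chart:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: st2packs-store              # placeholder name
spec:
  serviceName: st2packs-store
  replicas: 1
  selector:
    matchLabels:
      app: st2packs-store
  template:
    metadata:
      labels:
        app: st2packs-store
    spec:
      containers:
        - name: packs-holder
          image: busybox
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          volumeMounts:
            - name: st2-packs
              mountPath: /opt/stackstorm/packs
  volumeClaimTemplates:
    - metadata:
        name: st2-packs
      spec:
        accessModes: ["ReadWriteMany"]
        resources:
          requests:
            storage: 1Gi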

@cognifloyd (Member) left a comment

Earlier today, I submitted #199. I'd seen #118, but I missed this one when I was researching the earlier conversations about how to change the packs volumes.

It looks like this PR and #199 are somewhat similar. But, in #199, I opted for treating the volume definition as an opaque value, leaving the admin to manage whatever storage solution they are using in their cluster (persistentVolumeClaims, NFS, rook-ceph, or any of the other many storage solutions available now).

Maybe there's a way we could merge the two approaches so that this chart can still create persistentVolumeClaims like this one does, if an admin requests that?

Also, it looks like some of these changes in this PR have already been merged, like the packs-initContainers and packs-volumes templates. So, the next step might be to rebase this on master to trim it down to the core persistentVolumeClaims bits?

@cognifloyd (Member)

@moonrail could you review #199? Is there anything missing from #199 that you included in this PR, or that you think would be better a different way?
If #199 is satisfactory for you, then I think we could close this PR. What do you think?

@moonrail (Contributor, Author)

@cognifloyd
Unable to test or evaluate right now or in the coming weeks, but I've looked through your PR and its documentation and cannot find anything missing.
Props for the good docs incl. examples.
Looks good from my point of view, so I'll close this PR here, hoping yours will go forward. :)

@moonrail closed this on Jul 22, 2021
@cognifloyd (Member)

Thanks for taking a look! I hope it goes forward too :)

@chris3081

I will hopefully have time to test against AKS next week; I'll report back once done.
