
Added option for st2packs-PersistentVolumes to allow for seamless pack updates #160


Closed
wants to merge 1 commit

Conversation

@moonrail (Contributor) commented Nov 20, 2020

Related to the #118 implementation discussion

Hello everyone,

We need to change st2packs frequently via various pipelines and therefore found the default initContainers/sidecar approach a poor fit, as the actionrunner, sensor & api containers were restarted on every update.

Restarts are undesirable because running Executions/Workflows are marked "Abandoned", so we looked for a k8s-native solution.

This Pull Request adds the option to use two shared PersistentVolumes (one for Packs, one for VirtualEnvironments) instead of initContainers per component.

The job-st2-register-content container is the only container allowed to write (RW) onto these PVs, and it will register content as usual after each upgrade/install via initContainers.

We've successfully updated packs and virtual environments while Executions/Workflows were running, with no negative impact.

As not everybody can use PersistentVolumes and this should remain compatible with existing installations, the default pack handling via initContainers is untouched.
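For illustration only, a rough sketch of how such an option could be toggled in values.yaml - the key names and defaults below are placeholders, not the chart's actual ones, which are defined in this PR's values.yaml:

st2:
  packs:
    # hypothetical switch: share packs & virtualenvs via ReadWriteMany
    # PersistentVolumes instead of copying them into every component
    # through initContainers
    persistence:
      enabled: true
      storageClassName: nfs-client   # any RWX-capable storage class
      packsSize: 1Gi
      virtualenvsSize: 5Gi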

Thanks to @angrydeveloper in #118 for the idea to deduplicate initContainer & Volume.

Tested with 3.3dev on Kubernetes 1.17, 1.18 & 1.19.

Please let me know if there is something to improve. :)

@pull-request-size bot added the size/L label (PR that changes 100-499 lines; requires some effort to review) on Nov 20, 2020
@moonrail (Contributor, Author)

Hmm, as packs are on a PersistentVolume, no packs (or their contents) will ever be removed, even if they are no longer present in the st2packs container.
This should also be an issue in @angrydeveloper's NFS solution in #118, unless I'm missing something.

So e.g.:

  • first the user finds a pack "some_pack" in version 1.0.0 and wants to use it;
    let's assume the pack contains an action "do_something"
  • the user changes the Dockerfile-Args to include this pack in this version
  • it is deployed and usable

Case 1:

  • the user wants to remove the pack "some_pack", so the user changes the Dockerfile-Args
  • a new image is built and deployed
  • the cp -aR ... commands in the initContainers copy data from the new container onto the existing PersistentVolume and its packs
  • the pack "some_pack" is still present and still usable in StackStorm

Case 2:

  • pack "some_pack" released a new Version 2.0.0, where the action "do_something" was removed
  • the user wants to upgrade the pack "some_pack" to 2.0.0, so the user changes Dockerfile-Args
  • new image will be build and deployed
  • the commands cp -aR ... in initContainers will copy data from new container onto the existing PersistentVolume and its packs
  • the pack "some_pack" is now present in Version 2.0.0, but in a dirty state, as old action "do_something" was not cleaned up

I've had a look at the current st2actionrunner image and found that rsync is installed.
We could change cp -aR ... to rsync -a --delete ..., so removed files would be cleaned up properly without touching any files still needed by running executions (we assume the user knows about any removed action and has no executions using it at this point).
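A minimal sketch of what the switched command could look like inside a register-content initContainer - names, image and paths below are illustrative, not the chart's actual ones, and rsync would of course have to be available in the pack image:

initContainers:
  - name: st2-packs-sync                  # illustrative name
    image: example.org/st2packs:latest    # illustrative pack image
    volumeMounts:
      - name: st2-packs-vol               # the shared RWX PersistentVolume
        mountPath: /opt/stackstorm/packs-shared
    command: ["/bin/sh", "-ec"]
    args:
      # mirror the packs shipped in the image onto the shared volume and
      # delete files that no longer exist in the image
      - rsync -a --delete /opt/stackstorm/packs/ /opt/stackstorm/packs-shared/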

Would this be acceptable as a solution?

@arm4b (Member) commented Nov 25, 2020

Sure, rsync sounds good to me.

@moonrail (Contributor, Author)

@armab
I cannot see what failed in CircleCI, could you have a look?

I have a question regarding your preferred size of pull requests:
We have other changes that are loosely based on the outcome of this PR.
Not logically, just syntactically, as we've deduplicated some code here into _helpers.tpl and our changes touch exactly that deduplicated code.
Would it be better if we integrated these additions here, or should we do a separate PR against the current master code, which would not be compatible with this PR?

In detail:
We found the st2packs containers to be pretty large (over 120MB for each virtual environment).
To be able to build, upload and download st2packs containers frequently and in reasonable time, we imagine splitting packs up into multiple st2packs images would be beneficial for larger deployments.
This way one could easily build another image for a single pack and deploy it quickly, instead of dealing with images of 2GB and more, which can also lead to Helm timing out.

We'd like to add the option, defaulting to the current behaviour, of providing a list of st2packs images to pull.
So simply "st2.packs.images" containing a list of items that have the same attributes as "st2.packs.image".
If you'd like, we could of course name it "st2.packs.extraImages" (or something like this) and append these to a list together with "st2.packs.image".
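Roughly, the proposed change would turn the single image setting into a list, e.g. (the sub-keys below are only an illustration of the idea, not a final schema):

st2:
  packs:
    # today: one (potentially huge) image carrying all packs & virtualenvs
    image:
      repository: example.org/st2packs-everything
      tag: "1.0"
    # proposed: several smaller images, each built and released independently
    images:
      - repository: example.org/st2packs-core
        tag: "1.0"
      - repository: example.org/st2packs-monitoring
        tag: "2.3"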

@arm4b (Member) commented Nov 25, 2020

@moonrail It might be a temporary issue; I restarted the e2e build and it's ✔️ now.

Good that you asked!
First I need to dedicate more time to try it myself and understand the experience and pros/cons. Then I'll be looking for feedback from the community, so they can try it and share their point of view too.

Based on that data, the tradeoffs, the overall experience, feedback and discussions, we can collectively choose the path forward with the community.

For instance, we definitely don't want to maintain many conflicting options for distributing st2 pack content: immutable containerized packs as is, containerized packs with persistent volumes, an NFS-like approach.
We've seen a lot of confusion in the community about building packs based on Docker images, so #118 might be a path forward? Or maybe we'll replace the existing docker-packs approach with the solution you're providing, as it has some serious benefits.

Having said that, this all may take time to evaluate. Hope that gives an idea.
Additionally, having your opinion on the tradeoffs between the 3 options, and your operational experience, would be helpful too!

name: st2-venv-pvol
spec:
accessModes:
- ReadWriteMany
@arm4b (Member) commented Nov 25, 2020

Ah, so that looks like the main point behind the implementation.
ReadWriteMany (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) means that some kind of network-like filesystem is still required for this to work. Does that mean it's a new possible point of failure that may affect the overall availability formula?

I guess it's a hybrid between the current immutable containerized pack-content approach and the mutable/dynamic way demonstrated in #118, where packs are installed in-place.

@arm4b (Member) commented Nov 25, 2020

Considering it's a ReadWriteMany volume shared between all the Pods that need it, do you think live pack install could work in such an environment as well?

I mean getting the "hybrid" best of the 2 approaches, so the user can have all the options and satisfy all the needs:

  1. Installing the packs in-place with st2 pack install and from the UI (easiest & best experience for the majority of users - can solve a lot of community pain points as a default solution)
  2. Optionally relying on pre-built docker images for packs (best for deployment, repeatability, versioning, availability?)
  3. Can do 1 or 2 or both.

WDYT?

@moonrail (Contributor, Author)

Exactly - this approach requires a storageClass supporting ReadWriteMany - this should be added to README.md as well.

Hmmm, you're right - why go the container direction for packs if it prevents using API & UI pack handling?
Well, it's the k8s way, the containerized way.
You enforce IaC: users strictly define their packs and versions via code.
If you allow changes on the fly via API/UI, there would be a "split-brain" between your code and the actually installed & configured packs.

Our PR here only aims to improve resilience and availability by not modifying and re-creating system-critical containers on pack changes, as that leads to service interruptions.

Concerning the approaches you described:

1. Would lead to inconsistencies, but would allow installing/updating/removing packs via API & UI.

2. Slower approach for sure, but more resilient, as long as users are prevented from using API & UI to modify/add pack contents.
Scales better for (internal) self-service deployments with several stakeholders/departments (our case).

3. Would be somewhat OK, as long as one only handles custom packs (those not on Exchange) via containers and everything else via API & UI.
I do not see how StackStorm could properly distinguish between "containerized" packs and "modifiable" packs for API & UI.
So this would lead to inconsistencies.

Our & My opinion:
I think there is no right or wrong answer to it.

For me personally I was confused when I saw how StackStorm handles Packs (Enterprise-UI and API), in combination with Git-Packs.
The user is able to modify Actions (Workflows) and Rules via API & UI for Git-Packs, but those will be discarded the moment, the user updates the Pack from Git.
I do see problems arising from this, so I've already planned, how I could limit Users via RBAC to not handle any resources that reside in Packs.

I would stick to building containers and would like to be able to configure an option in StackStorm itself for hindering/locking API & UI-Pack-(Un-)Installations, or Modifications to Actions, Rules altogether.
This way Packs would always be consistent with external Versions and Users could be shown a proper error message. Right now the error messages are not optimal due to simply setting Filesystem to ReadOnly :)
So either Approach 1 or 2, but not both and certainly not FreeForAll via Option 3, from my point of view.

@moonrail (Contributor, Author)

@armab
As mentioned, we think having the option to use multiple independent st2packs-containers is beneficial.
Therefore I've added this change to this PR as well.

This is currently a breaking change, as st2.packs.image is renamed to st2.packs.images and is now a list of images.

What do you think of this?

@moonrail (Contributor, Author)

Reverted from rsync back to the cp command, as st2packs-runtime is based on Alpine and doesn't ship with rsync.
rsync would have to be installed here first:
https://github.com/StackStorm/st2packs-dockerfiles/blob/master/st2packs-runtime/Dockerfile

@arm4b (Member) commented Nov 30, 2020

@moonrail Yeah, right. Having the packs: [] breaking change as an isolated PR would definitely work better!

The situation is already pretty difficult here in stackstorm-ha: we were thinking of reducing overall complexity by switching to an NFS-only approach (also used by the bigger adopters) and deprecating containerized packs, since the community is confused by them :) But I see the point in going the other way too, and every pack having its own CI/CD pipeline resulting in a docker image makes perfect sense to me.

@moonrail (Contributor, Author) commented Dec 1, 2020

@armab
So the current plan is to have three options?

  1. Direct use of an (external) NFS volume (see "Allow mounting of NFS volumes instead of using st2packs for pack management" #118)
  • How is this filled initially? Via st2actionrunner's base packs & cp -aR?
  • Any additional packs would be handled via API/Client/UI?
  • #118 currently seems to rely on the packs container, as the volumes are readOnly, so here you're thinking of removing readOnly and packs-container support?
  • Components will not be affected by pack handling
  2. Use of a ReadWriteMany PersistentVolume (e.g. via NFS) and multiple packs containers
  • Packs will be in containers that fill a PersistentVolume (which will be removed when the cluster is removed)
  • API/Client/UI pack installation is prohibited
  • Components will not be affected by pack handling
  3. As 2, but without a PersistentVolume
  • Components will be restarted when packs containers change

I do not know how to modify this PR to fit your plans, please give me some hints. :)

@arm4b (Member) commented Dec 14, 2020

@moonrail It's best to keep this PR as is, so that I can rely on the conflicting #118 and #160 stories in future research and try to find some middle ground between them.

FYI your other changes were released in v0.40.0 chart version: https://github.com/StackStorm/stackstorm-ha/releases/tag/v0.40.0. Thanks for contributing those enhancements!

If you're interested in including packs: [] support via multiple Docker images, please do that in a dedicated PR.
I don't see any blockers there and I could merge that feature with no problem as a next step.

@arm4b added the status:on hold label (feature, design or idea with undecided status to include in the project functionality) on Dec 14, 2020
@moonrail (Contributor, Author)

@armab
I've reverted this PR to introduce only PV handling for packs.
I cannot see why linting is failing, though - on my end helm lint runs just fine.

@ghost commented Dec 23, 2020

Edit: I had a 5 second look; I will have a better look tonight and see if I can figure it out.

@moonrail I think it's erroring at Chart.yaml, where the apiVersion has been set to v2. I think this has been mixed up with PR #163, where I updated the charts to Helm v3.

The CircleCI log shows:
==> Linting .
[ERROR] templates/: error unpacking deprecation-0.2.2.tgz in stackstorm-ha: apiVersion 'v2' is not valid. The value must be "v1"

Error: 1 chart(s) linted, 1 chart(s) failed

Exited with code exit status 1
CircleCI received exit code 1
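For reference, the apiVersion in question sits at the top of Chart.yaml: the Helm 2 linter only accepts apiVersion: v1, while charts converted for Helm 3 declare apiVersion: v2 - roughly (values shown are just an example):

# accepted by the Helm 2 linter (and still readable by Helm 3)
apiVersion: v1
name: stackstorm-ha
version: 0.40.0

# Helm 3-only chart definition - rejected by Helm 2's `helm lint`
# apiVersion: v2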

@moonrail (Contributor, Author)

@Christopher-Russell
Aah, makes sense.
So we can just wait until your PR is merged, rebase on top of it and try again. :)

@chris3081

@moonrail I think a rebase against the current master branch might have this one sorted.

@manisha-tanwar (Contributor)

Hi @moonrail, is it possible for you to use a StatefulSet with a persistentVolumeClaim template instead?
Using what you've suggested deletes the PVC when the chart is deleted.
I tried using

annotations:
helm.sh/resource-policy: keep

& it works fine until we reinstall the chart which fails with:

release stackstorm failed: persistentvolumeclaims "st2-packs-pvol-stackstorm" already exists

Or is there any way we can reuse a PV with dynamic provisioning?
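For reference, the StatefulSet feature alluded to here is volumeClaimTemplates: the controller creates one PVC per replica and leaves it in place even when the StatefulSet (or the chart) is deleted. A minimal sketch with placeholder names, not tied to this chart:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: st2packs-store              # placeholder name
spec:
  serviceName: st2packs-store
  replicas: 1
  selector:
    matchLabels:
      app: st2packs-store
  template:
    metadata:
      labels:
        app: st2packs-store
    spec:
      containers:
        - name: packs-holder
          image: busybox
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          volumeMounts:
            - name: st2-packs
              mountPath: /opt/stackstorm/packs
  volumeClaimTemplates:
    - metadata:
        name: st2-packs
      spec:
        accessModes: ["ReadWriteMany"]
        resources:
          requests:
            storage: 1Gi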

@cognifloyd (Member) left a comment

Earlier today, I submitted #199. I'd seen #118, but I missed this one when I was researching the earlier conversations about how to change the packs volumes.

It looks like this PR and #199 are somewhat similar. But, in #199, I opted for treating the volume definition as an opaque value, leaving the admin to manage whatever storage solution they are using in their cluster (persistentVolumeClaims, NFS, rook-ceph, or any of the other many storage solutions available now).

Maybe there's a way we could merge the two approaches so that this chart can still create persistentVolumeClaims like this one does, if an admin requests that?

Also, it looks like some of these changes in this PR have already been merged, like the packs-initContainers and packs-volumes templates. So, the next step might be to rebase this on master to trim it down to the core persistentVolumeClaims bits?

@cognifloyd (Member)

@moonrail could you review #199? Is there anything missing from #199 that you included in this PR, or that you think would be better a different way?
If #199 is satisfactory for you, then I think we could close this PR. What do you think?

@moonrail (Contributor, Author)

@cognifloyd
Unable to test or evaluate right now or in the coming weeks, but I've looked through your PR and its documentation and cannot find anything missing.
Props for the good docs incl. examples.
Looks good from my point of view, so I'll close this PR here, hoping yours will go forward. :)

@moonrail closed this on Jul 22, 2021
@cognifloyd (Member)

Thanks for taking a look! I hope it goes forward too :)

@chris3081

I will hopefully have time to test against AKS next week; I'll report back once done.
