Skip to content

🌱 Monorepo pt2: fully consolidate image pull/cache implementations #1731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

joelanford
Copy link
Member

@joelanford joelanford commented Feb 7, 2025

Description

This PR is a follow-up from #1690. It completely removes the source packages that existed in both catalogd and operator-controller, eliminates the Unpacker abstraction that is unused and leftover from rukpak's support for multiple remote fetch protocols, adds new common abstractions for an image puller and an image cache, adds a new containers/image puller implementation, and adds a new disk-based cache implementation.

This is still a work-in-progress because the unit tests need to be consolidated and de-duplicated. Because the tests were written with the unpacker interface in mind, they also couple the pull and cache concepts together, so more work can be done to tease those tests apart and make them more appropriate for the new interfaces and implementations.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 7, 2025
Copy link

netlify bot commented Feb 7, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit aea6012
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/67ae3cf8844a8d0008df260d
😎 Deploy Preview https://deploy-preview-1731--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

if err != nil {
unpackErr := fmt.Errorf("source catalog content: %w", err)
updateStatusProgressing(&catalog.Status, catalog.GetGeneration(), unpackErr)
return ctrl.Result{}, unpackErr
}

switch unpackResult.State {
case source.StateUnpacked:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source.StateUnpacked was also a rukpak leftover that we left behind. In rukpak, we had async image pulling based on running a pod. Now, we have synchronous image pulling in-process, so we know that if we've gotten this far we have an image unpacked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we are moving from ``r.Unpacker.UnpacktoImagePuller.Pull`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 75.38071% with 97 lines in your changes missing coverage. Please review.

Project coverage is 68.29%. Comparing base (43dfc65) to head (aea6012).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/shared/util/image/pull.go 66.29% 42 Missing and 18 partials ⚠️
catalogd/cmd/catalogd/main.go 0.00% 11 Missing ⚠️
internal/shared/util/image/cache.go 89.81% 9 Missing and 2 partials ⚠️
internal/shared/util/image/mocks.go 46.15% 7 Missing ⚠️
...logd/controllers/core/clustercatalog_controller.go 89.47% 3 Missing and 1 partial ⚠️
internal/shared/util/image/filters.go 90.62% 2 Missing and 1 partial ⚠️
internal/shared/util/error/terminal.go 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1731      +/-   ##
==========================================
+ Coverage   67.45%   68.29%   +0.84%     
==========================================
  Files          61       63       +2     
  Lines        5245     5116     -129     
==========================================
- Hits         3538     3494      -44     
+ Misses       1446     1392      -54     
+ Partials      261      230      -31     
Flag Coverage Δ
e2e 51.71% <47.53%> (-0.44%) ⬇️
unit 55.92% <71.31%> (+0.91%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -927,7 +920,7 @@ func TestPollingReconcilerUnpack(t *testing.T) {
},
Status: successfulUnpackStatus(),
},
storedCatalogData: successfulStoredCatalogData(metav1.Now()),
storedCatalogData: successfulStoredCatalogData(time.Now()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These metav1.Now() to time.Now() changes are there because our image handling libraries should not be coupled to kubernetes-isms. I moved conversion to metav1.Time into the reconciler code, iirc.

cl, reconciler := newClientAndReconciler(t)
reconciler.Unpacker = &MockUnpacker{
result: &source.Result{
State: "unexpected",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the rukpak source.Result struct is gone and there is no concept of unpack state anymore, this entire test is no longer relevant.

Comment on lines 26 to 54
func CatalogCache(basePath string) Cache {
return &diskCache{
basePath: basePath,
filterFunc: func(ctx context.Context, srcRef reference.Named, image ocispecv1.Image) (archive.Filter, error) {
_, specIsCanonical := srcRef.(reference.Canonical)
dirToUnpack, ok := image.Config.Labels[ConfigDirLabel]
if !ok {
// If the spec is a tagged ref, retries could end up resolving a new digest, where the label
// might show up. If the spec is canonical, no amount of retries will make the label appear.
// Therefore, we treat the error as terminal if the reference from the spec is canonical.
return nil, errorutil.WrapTerminal(fmt.Errorf("catalog image is missing the required label %q", ConfigDirLabel), specIsCanonical)
}

return allFilters(
onlyPath(dirToUnpack),
forceOwnershipRWX(),
), nil
},
}
}

func BundleCache(basePath string) Cache {
return &diskCache{
basePath: basePath,
filterFunc: func(_ context.Context, _ reference.Named, _ ocispecv1.Image) (archive.Filter, error) {
return forceOwnershipRWX(), nil
},
}
}
Copy link
Member Author

@joelanford joelanford Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These filter funcs are literally the only difference between catalog and bundle image unpacking. 😄

return filepath.Join(a.idPath(id), digest.String())
}

func (a *diskCache) Store(ctx context.Context, id string, srcRef reference.Named, canonicalRef reference.Canonical, imgCfg ocispecv1.Image, layers iter.Seq[LayerData]) (fs.FS, time.Time, error) {
Copy link
Member Author

@joelanford joelanford Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First use of a go1.23 iterator!

I implemented this interface/function this way in order to eliminate the cache's knowledge of the puller's internals. The puller provides the layers iterator to the cache. And the cache simply needs to iterate the layers.

Copy link
Member Author

@joelanford joelanford Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not actually new. It existed before as layers.go, but I moved a bunch of it into the disk cache file. Only change in this PR was to unexport the filters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file still needs to be consolidated with pull_catalog_test.go. We can likely re-write all the tests now that the abstractions are better separated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially a move/update of the containers_image_test.go from operator-controller's original rukpak/source package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially a move/update of the containers_image_test.go from catalogd's original rukpak/source package. Thankfully, it looks like it didn't change enough for GH to decide it was a different file.

@joelanford joelanford force-pushed the consolidate-layer-handling-pt2 branch from 1909622 to 19d10a0 Compare February 7, 2025 20:34
@joelanford joelanford changed the title Monorepo pt2: fully consolidate image pull/cache implementations 🌱 Monorepo pt2: fully consolidate image pull/cache implementations Feb 7, 2025
@joelanford joelanford marked this pull request as ready for review February 7, 2025 23:01
@joelanford joelanford requested a review from a team as a code owner February 7, 2025 23:01
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 7, 2025
@joelanford joelanford force-pushed the consolidate-layer-handling-pt2 branch from fb9cc60 to 1d61874 Compare February 8, 2025 01:19
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2025
@joelanford joelanford force-pushed the consolidate-layer-handling-pt2 branch 2 times, most recently from e1021fb to 5da55b0 Compare February 11, 2025 20:31
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2025
@joelanford joelanford force-pushed the consolidate-layer-handling-pt2 branch 2 times, most recently from bc48637 to d9682c0 Compare February 12, 2025 16:15
"github.com/operator-framework/operator-controller/internal/catalogd/storage"
imageutil "github.com/operator-framework/operator-controller/internal/shared/util/image"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 What we use from source is now in image

@@ -272,7 +286,7 @@ func (r *ClusterCatalogReconciler) getCurrentState(catalog *catalogdv1.ClusterCa
// Set expected status based on what we see in the stored catalog
clearUnknownConditions(expectedStatus)
if hasStoredCatalog && r.Storage.ContentExists(catalog.Name) {
updateStatusServing(expectedStatus, storedCatalog.unpackResult, r.Storage.BaseURL(catalog.Name), storedCatalog.observedGeneration)
updateStatusServing(expectedStatus, storedCatalog.ref, storedCatalog.lastUnpack, r.Storage.BaseURL(catalog.Name), storedCatalog.observedGeneration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we covered all places where have before unpackResult https://github.com/search?q=repo%3Aoperator-framework%2Foperator-controller+unpackResult&type=code 👍

camilamacedo86
camilamacedo86 previously approved these changes Feb 12, 2025
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 12, 2025
@LalatenduMohanty
Copy link
Member

/hold adding a hold for couple of hours to finish reviewing the PR. The hold is there to make sure it does not get merged in between.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2025
Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks fine.

return ctrl.Result{}, err
}
if catalog.Spec.Source.Image == nil {
err := reconcile.TerminalError(fmt.Errorf("error parsing catalog, catalog %s has a nil image source", catalog.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := reconcile.TerminalError(fmt.Errorf("error parsing catalog, catalog %s has a nil image source", catalog.Name))
err := reconcile.TerminalError(fmt.Errorf("error parsing ClusterCatalog %s, image source is nil", catalog.Name))

"catalog" is repeated, and you probably want to indicate the actual resource type.

Comment on lines -253 to -259
bundleSource := &rukpaksource.BundleSource{
Name: ext.GetName(),
Type: rukpaksource.SourceTypeImage,
Image: &rukpaksource.ImageSource{
Ref: resolvedBundle.Image,
},
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for getting rid of this.

Comment on lines +56 to +61
if err != nil {
return false, err
}
if !keep {
return false, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
return false, err
}
if !keep {
return false, nil
}
if err != nil || !keep {
return false, err
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can personally go either way on this. The condensed version, IMO, is ever so slightly harder to reason about because I would assume that most go developers will have their eye drawn primarily to err != nil and return false, err and initialize their memory model with "this code path returns an error", only then to see !keep and have to reason their way through to understand that err may or may not be nil.

I think I'll leave as is for clarity unless there is broad consensus to condense.

// copy.Image can concurrently pull all the layers.
//
//////////////////////////////////////////////////////
layoutDir, err := os.MkdirTemp("", "oci-layout-*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, rather than using the bundle or catalog name, you're using something random, which presumably will behave better if you do things multiple times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was random before as well (the * just lets the caller be explicit about where the random name piece should go). By default the randomness of the name is appended, so me adding this * here is essentailly a no-op.

I notice that we used to include what is now the ownerID in the name. I'll add that back in.

}); err != nil {
return nil, nil, time.Time{}, fmt.Errorf("error copying image: %w", err)
}
l.Info("pulled image")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought this was missing the image ref, but it's done way up on lines 51/52.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a double-edged sword probably. We can:

  1. Incrementally add k/v pairs as the context fills in, meaning later Info and Error calls don't have to keep repeating that context over and over (and potentially differently)
  2. Leave the incremental k/v pairs out, and repeat them every time (potentially forgetting important k/v pairs in certain places).

(1) has the problem that we may forget about the logger already being setup with k/v pairs, which might lead us to duplicating the same information with a slightly different key. (e.g. ref and reference both showing up in a log line with the same value)

I'm almost convincing myself that the tests should include tests of the structured fields of the logger, but maybe we can add that later if we discover that we aren't being consistent with our logging.

Comment on lines +160 to +170
fsys, modTime, err = p.applyImage(ctx, ownerID, dockerRef, canonicalRef, layoutImgRef, cache, srcCtx)
if err != nil {
return nil, nil, time.Time{}, fmt.Errorf("error applying image: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like another stage, where a large comment might be useful; the old code indicated:

        //////////////////////////////////////////////////////
        //
        // Mount the image we just pulled
        //
        //////////////////////////////////////////////////////

@@ -273,14 +274,16 @@ func main() {
os.Exit(1)
}

unpackCacheBasePath := filepath.Join(cacheDir, source.UnpackCacheDir)
unpackCacheBasePath := filepath.Join(cacheDir, "unpack")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks odd. we are moving from using a variable to a hard coded string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place that constant was used. Nothing in the source package used it, nothing in the new imageutil package uses it.

err := reconcile.TerminalError(fmt.Errorf("unknown source type %q", catalog.Spec.Source.Type))
updateStatusProgressing(&catalog.Status, catalog.GetGeneration(), err)
return ctrl.Result{}, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we never checked the catalog.Spec.Source.Type or did we move this from other place?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That used to be part of the source.ContainersImage unpacker code:

if catalog.Spec.Source.Type != catalogdv1.SourceTypeImage {
panic(fmt.Sprintf("programmer error: source type %q is unable to handle specified catalog source type %q", catalogdv1.SourceTypeImage, catalog.Spec.Source.Type))
}
if catalog.Spec.Source.Image == nil {
return nil, reconcile.TerminalError(fmt.Errorf("error parsing catalog, catalog %s has a nil image source", catalog.Name))
}

unpacker := &source.ContainersImageRegistry{
BaseCachePath: filepath.Join(cachePath, "unpack"),
SourceContextFunc: func(logger logr.Logger) (*types.SystemContext, error) {
imageCache := imageutil.BundleCache(filepath.Join(cachePath, "unpack"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using unpackCacheBasePath := filepath.Join(cacheDir, "unpack") also in catalogd main.go . we should use a constant or variable for unpack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These unpack paths are decided/defined/used purely in the context of each main.go. It's just a coincidence/convention that they are the same. But they don't have to be.

@LalatenduMohanty
Copy link
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2025
@joelanford joelanford force-pushed the consolidate-layer-handling-pt2 branch from d9682c0 to aea6012 Compare February 13, 2025 18:41
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2025
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2025
@LalatenduMohanty
Copy link
Member

/test e2e-kind

Copy link

openshift-ci bot commented Feb 13, 2025

@LalatenduMohanty: No presubmit jobs available for operator-framework/operator-controller@main

In response to this:

/test e2e-kind

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@LalatenduMohanty LalatenduMohanty added this pull request to the merge queue Feb 13, 2025
Merged via the queue into operator-framework:main with commit ee8d821 Feb 13, 2025
23 checks passed
@joelanford joelanford deleted the consolidate-layer-handling-pt2 branch February 14, 2025 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants