Skip to content

🌱 (catalogd) add unit tests for indexing algo for query endpoint #1702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 0 additions & 14 deletions catalogd/internal/storage/index.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,20 +50,6 @@ func (s *section) UnmarshalJSON(b []byte) error {
return nil
}

func (i index) Size() int64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we removing this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I realized it's not being used anywhere, so at least in the context of existing code, it's dead code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I was using that in my prototype to estimate how big the index would be in memory. No need for it anymore.

size := 0
for k, v := range i.BySchema {
size += len(k) + len(v)*16
}
for k, v := range i.ByPackage {
size += len(k) + len(v)*16
}
for k, v := range i.ByName {
size += len(k) + len(v)*16
}
return int64(size)
}

func (i index) Get(r io.ReaderAt, schema, packageName, name string) io.Reader {
sectionSet := i.getSectionSet(schema, packageName, name)

Expand Down
285 changes: 285 additions & 0 deletions catalogd/internal/storage/index_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,285 @@
package storage

import (
"bytes"
"encoding/json"
"io"
"testing"

"github.com/stretchr/testify/require"

"github.com/operator-framework/operator-registry/alpha/declcfg"
)

func TestIndexCreation(t *testing.T) {
// Create test Meta objects
metas := []*declcfg.Meta{
{
Schema: "olm.package",
Package: "test",
Name: "test-package",
Blob: []byte(`{"test": "data1"}`),
},
{
Schema: "olm.bundle",
Package: "test",
Name: "test-bundle",
Blob: []byte(`{"test": "data2"}`),
},
}

// Create channel and feed Metas
metasChan := make(chan *declcfg.Meta, len(metas))
for _, meta := range metas {
metasChan <- meta
}
close(metasChan)

// Create index
idx := newIndex(metasChan)

// Verify schema index
require.Len(t, idx.BySchema, 2, "Expected 2 schema entries, got %d", len(idx.BySchema))
require.Len(t, idx.BySchema["olm.package"], 1, "Expected 1 olm.package entry, got %d", len(idx.BySchema["olm.package"]))
require.Len(t, idx.BySchema["olm.bundle"], 1, "Expected 1 olm.bundle entry, got %d", len(idx.BySchema["olm.bundle"]))

// Verify package index
require.Len(t, idx.ByPackage["test"], 2, "Expected 2 package entries, got %d", len(idx.ByPackage))

// Verify name index
require.Len(t, idx.ByName["test-package"], 1, "Expected 1 entry for name 'test-package', got %d", len(idx.ByName["test-package"]))
require.Len(t, idx.ByName["test-bundle"], 1, "Expected 1 entry for name 'test-bundle', got %d", len(idx.ByName["test-bundle"]))
}

func TestIndexGet(t *testing.T) {
// Test data structure that represents a catalog
metas := []*declcfg.Meta{
{
// Package definition
Schema: "olm.package",
Name: "test-package",
Blob: createBlob(t, map[string]interface{}{
"schema": "olm.package",
"name": "test-package",
"defaultChannel": "stable-v6.x",
}),
},
{
// First channel (stable-5.x)
Schema: "olm.channel",
Package: "test-package",
Name: "stable-5.x",
Blob: createBlob(t, map[string]interface{}{
"schema": "olm.channel",
"name": "stable-5.x",
"package": "test-package",
"entries": []map[string]interface{}{
{"name": "test-bunble.v5.0.3"},
{"name": "test-bundle.v5.0.4", "replaces": "test-bundle.v5.0.3"},
},
}),
},
{
// Second channel (stable-v6.x)
Schema: "olm.channel",
Package: "test-package",
Name: "stable-v6.x",
Blob: createBlob(t, map[string]interface{}{
"schema": "olm.channel",
"name": "stable-v6.x",
"package": "test-package",
"entries": []map[string]interface{}{
{"name": "test-bundle.v6.0.0", "skipRange": "<6.0.0"},
},
}),
},
{
// Bundle v5.0.3
Schema: "olm.bundle",
Package: "test-package",
Name: "test-bundle.v5.0.3",
Blob: createBlob(t, map[string]interface{}{
"schema": "olm.bundle",
"name": "test-bundle.v5.0.3",
"package": "test-package",
"image": "test-image@sha256:a5d4f",
"properties": []map[string]interface{}{
{
"type": "olm.package",
"value": map[string]interface{}{
"packageName": "test-package",
"version": "5.0.3",
},
},
},
}),
},
{
// Bundle v5.0.4
Schema: "olm.bundle",
Package: "test-package",
Name: "test-bundle.v5.0.4",
Blob: createBlob(t, map[string]interface{}{
"schema": "olm.bundle",
"name": "test-bundle.v5.0.4",
"package": "test-package",
"image": "test-image@sha256:f4233",
"properties": []map[string]interface{}{
{
"type": "olm.package",
"value": map[string]interface{}{
"packageName": "test-package",
"version": "5.0.4",
},
},
},
}),
},
{
// Bundle v6.0.0
Schema: "olm.bundle",
Package: "test-package",
Name: "test-bundle.v6.0.0",
Blob: createBlob(t, map[string]interface{}{
"schema": "olm.bundle",
"name": "test-bundle.v6.0.0",
"package": "test-package",
"image": "test-image@sha256:d3016b",
"properties": []map[string]interface{}{
{
"type": "olm.package",
"value": map[string]interface{}{
"packageName": "test-package",
"version": "6.0.0",
},
},
},
}),
},
}

// Create and populate the index
metasChan := make(chan *declcfg.Meta, len(metas))
for _, meta := range metas {
metasChan <- meta
}
close(metasChan)

idx := newIndex(metasChan)

// Create a reader from the metas
var combinedBlob bytes.Buffer
for _, meta := range metas {
combinedBlob.Write(meta.Blob)
}
fullData := bytes.NewReader(combinedBlob.Bytes())

tests := []struct {
name string
schema string
packageName string
blobName string
wantCount int
validate func(t *testing.T, entry map[string]interface{})
}{
{
name: "filter by schema - olm.package",
schema: "olm.package",
wantCount: 1,
validate: func(t *testing.T, entry map[string]interface{}) {
if entry["schema"] != "olm.package" {
t.Errorf("Expected olm.package schema blob got %v", entry["schema"])
}
},
},
{
name: "filter by schema - olm.channel",
schema: "olm.channel",
wantCount: 2,
validate: func(t *testing.T, entry map[string]interface{}) {
if entry["schema"] != "olm.channel" {
t.Errorf("Expected olm.channel schema blob got %v", entry["schema"])
}
},
},
{
name: "filter by schema - olm.bundle",
schema: "olm.bundle",
wantCount: 3,
validate: func(t *testing.T, entry map[string]interface{}) {
if entry["schema"] != "olm.bundle" {
t.Errorf("Expected olm.bundle schema blob got %v", entry["schema"])
}
},
},
{
name: "filter by package",
packageName: "test-package",
wantCount: 5,
validate: func(t *testing.T, entry map[string]interface{}) {
if entry["package"] != "test-package" {
t.Errorf("Expected blobs with package name test-package, got blob with package name %v", entry["package"])
}
},
},
{
name: "filter by specific bundle name",
blobName: "test-bundle.v5.0.3",
wantCount: 1,
validate: func(t *testing.T, entry map[string]interface{}) {
if entry["schema"] != "olm.bundle" && entry["name"] != "test-bundle.v5.0.3" {
t.Errorf("Expected blob with schema=olm.bundle and name=test-bundle.v5.0.3, got %v", entry)
}
},
},
{
name: "filter by schema and package",
schema: "olm.bundle",
packageName: "test-package",
wantCount: 3,
validate: func(t *testing.T, entry map[string]interface{}) {
if entry["schema"] != "olm.bundle" && entry["package"] != "test-package" {
t.Errorf("Expected blob with schema=olm.bundle and package=test-package, got %v", entry)
}
},
},
{
name: "no matches",
schema: "non.existent",
packageName: "not-found",
wantCount: 0,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
reader := idx.Get(fullData, tt.schema, tt.packageName, tt.blobName)
Copy link
Contributor

@grokspawn grokspawn Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blobName and packageName are not consistently provided in the tests. Are they needed?
Looks like blobName is actually bundleName. Rename?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSection() code handles these cases:

func (i *index) getSectionSet(schema, packageName, name string) sets.Set[section] {
	.
        .
	// Filter by package name if specified
	if packageName != "" {
		.
                .
	}

	// Filter by name if specified
	if name != "" {
		.
                .
	}
	return sectionSet
}

content, err := io.ReadAll(reader)
require.NoError(t, err, "Failed to read content: %v", err)

var count int
decoder := json.NewDecoder(bytes.NewReader(content))
for decoder.More() {
var entry map[string]interface{}
err := decoder.Decode(&entry)
require.NoError(t, err, "Failed to decode result: %v", err)
count++

if tt.validate != nil {
tt.validate(t, entry)
}
}

require.Equal(t, tt.wantCount, count, "Got %d entries, want %d", count, tt.wantCount)
})
}
}

// createBlob is a helper function that creates a JSON blob with a trailing newline
func createBlob(t *testing.T, data map[string]interface{}) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also use the declcfg.WriteJSON method in operator-registry which utilizes the underlying JSON encoder which turns each blob into a JSONLines-compliant format by appending a newline suffix here.
This is how catalogd ends up coincidentally JSONLines-compliant.

Copy link
Contributor Author

@anik120 anik120 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I studied the WriteJSON function and looks like it requires a declCfg.DeclConfig as an argument. Overkill imho if we create a declCfg.DeclConfig from test data just to use WriteJSON.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how catalogd ends up coincidentally JSONLines-compliant.

On a side note this sounds fragile because it looks like they only add the /n to make the output look nicer and could easily pull this out in a future release in which case WriteJSON will stop being JSONLines compliant.

Not sure how many places this function is actually being used, but I guess at this point this is a "we'll cross the bridge if we ever get there" case :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I initially included the note just for the context because this is totally an accident and could just as easily go away. So we used to have tests to ensure we would recognize any change from that accident.
We need to restore those tests, since presumably any behavior change will come from a dependency bump, so we want the CI to fail early/loud if the behavior changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't related to this PR, so I created #1709 for them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

blob, err := json.Marshal(data)
if err != nil {
t.Fatalf("Failed to create blob: %v", err)
}
return append(blob, '\n')
}
Loading