Skip to content

[usage] List workspaces for each workspace instance in usage period #10495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 9, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 67 additions & 3 deletions components/usage/pkg/controller/reconciler.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ package controller

import (
"context"
"errors"
"fmt"
"github.com/gitpod-io/gitpod/common-go/log"
"github.com/gitpod-io/gitpod/usage/pkg/db"
Expand Down Expand Up @@ -38,6 +39,8 @@ type UsageReconcileStatus struct {

WorkspaceInstances int
InvalidWorkspaceInstances int

Workspaces int
}

func (u *UsageReconciler) Reconcile() error {
Expand Down Expand Up @@ -71,11 +74,59 @@ func (u *UsageReconciler) ReconcileTimeRange(ctx context.Context, from, to time.
if len(invalidInstances) > 0 {
log.WithField("invalid_workspace_instances", invalidInstances).Errorf("Detected %d invalid instances. These will be skipped in the current run.", len(invalidInstances))
}

log.WithField("workspace_instances", instances).Debug("Successfully loaded workspace instances.")

workspaces, err := u.loadWorkspaces(ctx, instances)
if err != nil {
return nil, fmt.Errorf("failed to load workspaces for workspace instances in time range: %w", err)
}
status.Workspaces = len(workspaces)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, at this point we have:

  • a collection of all "fully hydrated" instances
  • a collection of all "fully hydrated" workspaces, each with an attached sub-collection of all its corresponding "fully hydrated" instances (although these will just be references to a single instance in memory, right, not a full copy of the object?)

I'm still slightly uneasy with loading so many large objects into memory, and iterating over them multiple times. So, I'm looking forward to seeing exactly how we'll use these collections, and see if we can later reduce the amount of used memory and the number of iterations. 👀


return status, nil
}

type workspaceWithInstances struct {
Workspace db.Workspace
Instances []db.WorkspaceInstance
}
Comment on lines +88 to +91
Copy link
Contributor

@jankeromnes jankeromnes Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: I wonder if we actually want to load into memory the "fully-hydrated" shapes of both Workspace and WorkspaceInstance.

In the past, we've had issues with memory consumption, because we were keeping full shapes in memory, while only a few fields were actually relevant. This is especially true for workspace instances, because there are so many of them.

A usage record will probably look something like this:

export type WorkspaceUsageRecord = {
    instanceId: string;
    workspaceId: string;
    userId: string;
    projectId?: string;
    teamId?: string;
    fromTime: string;
    toTime: string;
};

So, in my mind, the most efficient flow to get these usage records is:

  1. Query only instanceId, workspaceId, creationTime, stoppedTime from d_b_workspace_instance (for the current period) -- all the other fields are unnecessary
  2. Trim the time bounds between beginningOfMonth and min(getCurrentTime(), endOfMonth) (also set the toTime to the maximum bound when there is no stoppedTime yet)
  3. For each unique workspaceId query only the owner userId and possible projectId from d_b_workspace
  4. For each userId, query only the possible teamId (randomly pick one team for now)

If we query and then load the full Workspace and WorkspaceInstance shapes into memory, and start passing them around a lot, I worry that we'll have a bad time (because many workspace instances are created every month, and that number is rapidly increasing over time).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this makes sense. But I don't think it makes sense to do this now.

For me, this is an optimisation once we have something working. For now, I'd prefer to waste memory to get to a working version and then measure what is in fact the cost of these queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, thanks. (I don't really see why it's better to ship a less optimized version first when we know how to easily write a more optimized one, but I also agree about shipping this sooner than later, and maybe my suggestion could already be considered premature optimization). So, fine to defer this to later. 👍

Copy link
Member Author

@easyCZ easyCZ Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely because I think the current handling will go away anyway once we do add attribution to the WSI. If we optimized here, it would then be wasted work because it would also be replaced by the attribution work from WSI.

I don't really see why it's better to ship a less optimized version first when we know how to easily write a more optimized one

We don't have any hard data to know it is in fact an optimisation. Suboptimal, but with data is more valuable than optimal but without baseline metrics.


func (u *UsageReconciler) loadWorkspaces(ctx context.Context, instances []db.WorkspaceInstance) ([]workspaceWithInstances, error) {
var workspaceIDs []string
for _, instance := range instances {
workspaceIDs = append(workspaceIDs, instance.WorkspaceID)
}

workspaces, err := db.ListWorkspacesByID(ctx, u.conn, toSet(workspaceIDs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the IDs unique! I would have preferred to make the collection a unique "set" before all of the appends, in order to iterate just once over all instances (not once, then again in toSet to create the map, then another time over the unique IDs to convert them back to a list) -- but okay for quick iterations without worrying too much about complexity this early. 🛹

if err != nil {
return nil, fmt.Errorf("failed to find workspaces for provided workspace instances: %w", err)
}

// Map workspaces to corresponding instances
workspacesWithInstancesByID := map[string]workspaceWithInstances{}
for _, workspace := range workspaces {
workspacesWithInstancesByID[workspace.ID] = workspaceWithInstances{
Workspace: workspace,
}
}

// We need to also add the instances to corresponding records, a single workspace can have multiple instances
for _, instance := range instances {
item, ok := workspacesWithInstancesByID[instance.WorkspaceID]
if !ok {
return nil, errors.New("encountered instance without a corresponding workspace record")
}
item.Instances = append(item.Instances, instance)
}

// Flatten results into a list
var workspacesWithInstances []workspaceWithInstances
for _, w := range workspacesWithInstancesByID {
workspacesWithInstances = append(workspacesWithInstances, w)
}

return workspacesWithInstances, nil
}

func (u *UsageReconciler) loadWorkspaceInstances(ctx context.Context, from, to time.Time) ([]db.WorkspaceInstance, []invalidWorkspaceInstance, error) {
log.Infof("Gathering usage data from %s to %s", from, to)
instances, err := db.ListWorkspaceInstancesInRange(ctx, u.conn, from, to)
Expand Down Expand Up @@ -132,8 +183,8 @@ func trimStartStopTime(instances []db.WorkspaceInstance, maximumStart, minimumSt
var updated []db.WorkspaceInstance

for _, instance := range instances {
if instance.StartedTime.Time().Before(maximumStart) {
instance.StartedTime = db.NewVarcharTime(maximumStart)
if instance.CreationTime.Time().Before(maximumStart) {
instance.CreationTime = db.NewVarcharTime(maximumStart)
}

if instance.StoppedTime.Time().After(minimumStop) {
Expand All @@ -144,3 +195,16 @@ func trimStartStopTime(instances []db.WorkspaceInstance, maximumStart, minimumSt
}
return updated
}

func toSet(items []string) []string {
m := map[string]struct{}{}
for _, i := range items {
m[i] = struct{}{}
}

var result []string
for s := range m {
result = append(result, s)
}
return result
}
21 changes: 17 additions & 4 deletions components/usage/pkg/controller/reconciler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ package controller
import (
"context"
"github.com/gitpod-io/gitpod/usage/pkg/db"
"github.com/gitpod-io/gitpod/usage/pkg/db/dbtest"
"github.com/google/uuid"
"github.com/stretchr/testify/require"
"testing"
Expand All @@ -15,22 +16,30 @@ import (

func TestUsageReconciler_Reconcile(t *testing.T) {
conn := db.ConnectForTests(t)
workspaceID := "gitpodio-gitpod-gyjr82jkfnd"
instanceStatus := []byte(`{"phase": "stopped", "conditions": {"deployed": false, "pullingImages": false, "serviceExists": false}}`)
startOfMay := time.Date(2022, 05, 1, 0, 00, 00, 00, time.UTC)
startOfJune := time.Date(2022, 06, 1, 0, 00, 00, 00, time.UTC)
workspace := dbtest.NewWorkspace(t, "gitpodio-gitpod-gyjr82jkfnd")
instances := []db.WorkspaceInstance{
// Ran throughout the reconcile period
{
ID: uuid.New(),
WorkspaceID: workspaceID,
WorkspaceID: workspace.ID,
CreationTime: db.NewVarcharTime(time.Date(2022, 05, 1, 00, 00, 00, 00, time.UTC)),
StoppedTime: db.NewVarcharTime(time.Date(2022, 06, 1, 1, 0, 0, 0, time.UTC)),
Status: instanceStatus,
},
// Still running
{
ID: uuid.New(),
WorkspaceID: workspace.ID,
CreationTime: db.NewVarcharTime(time.Date(2022, 05, 30, 00, 00, 00, 00, time.UTC)),
Status: instanceStatus,
},
// No creation time, invalid record
{
ID: uuid.New(),
WorkspaceID: workspaceID,
WorkspaceID: workspace.ID,
StoppedTime: db.NewVarcharTime(time.Date(2022, 06, 1, 1, 0, 0, 0, time.UTC)),
Status: instanceStatus,
},
Expand All @@ -39,14 +48,18 @@ func TestUsageReconciler_Reconcile(t *testing.T) {
tx := conn.Create(instances)
require.NoError(t, tx.Error)

tx = conn.Create(&workspace)
require.NoError(t, tx.Error)

reconciler := NewUsageReconciler(conn)

status, err := reconciler.ReconcileTimeRange(context.Background(), startOfMay, startOfJune)
require.NoError(t, err)
require.Equal(t, &UsageReconcileStatus{
StartTime: startOfMay,
EndTime: startOfJune,
WorkspaceInstances: 1,
WorkspaceInstances: 2,
InvalidWorkspaceInstances: 1,
Workspaces: 1,
}, status)
}
24 changes: 24 additions & 0 deletions components/usage/pkg/db/dbtest/workspace.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
// Copyright (c) 2022 Gitpod GmbH. All rights reserved.
// Licensed under the GNU Affero General Public License (AGPL).
// See License-AGPL.txt in the project root for license information.

package dbtest

import (
"github.com/gitpod-io/gitpod/usage/pkg/db"
"github.com/google/uuid"
"testing"
)

func NewWorkspace(t *testing.T, id string) db.Workspace {
t.Helper()

return db.Workspace{
ID: id,
OwnerID: uuid.New(),
Type: "prebuild",
ContextURL: "https://github.com/gitpod-io/gitpod",
Context: []byte(`{"title":"[usage] List workspaces for each workspace instance in usage period","repository":{"cloneUrl":"https://github.com/gitpod-io/gitpod.git","host":"github.com","name":"gitpod","owner":"gitpod-io","private":false},"ref":"mp/usage-list-workspaces","refType":"branch","revision":"586f22ecaeeb3b4796fd92f9ae1ca3512ca1e330","nr":10495,"base":{"repository":{"cloneUrl":"https://github.com/gitpod-io/gitpod.git","host":"github.com","name":"gitpod","owner":"gitpod-io","private":false},"ref":"mp/usage-validate-instances","refType":"branch"},"normalizedContextURL":"https://github.com/gitpod-io/gitpod/pull/10495","checkoutLocation":"gitpod"}`),
Config: []byte(`{"image":"eu.gcr.io/gitpod-core-dev/dev/dev-environment:me-me-image.1","workspaceLocation":"gitpod/gitpod-ws.code-workspace","checkoutLocation":"gitpod","ports":[{"port":1337,"onOpen":"open-preview"},{"port":3000,"onOpen":"ignore"},{"port":3001,"onOpen":"ignore"},{"port":3306,"onOpen":"ignore"},{"port":4000,"onOpen":"ignore"},{"port":5900,"onOpen":"ignore"},{"port":6080,"onOpen":"ignore"},{"port":7777,"onOpen":"ignore"},{"port":9229,"onOpen":"ignore"},{"port":9999,"onOpen":"ignore"},{"port":13001,"onOpen":"ignore"},{"port":13444}],"tasks":[{"name":"Install Preview Environment kube-context","command":"(cd dev/preview/previewctl && go install .)\npreviewctl install-context\nexit\n"},{"name":"Add Harvester kubeconfig","command":"./dev/preview/util/download-and-merge-harvester-kubeconfig.sh\nexit 0\n"},{"name":"Java","command":"if [ -z \"$RUN_GRADLE_TASK\" ]; then\n read -r -p \"Press enter to continue Java gradle task\"\nfi\nleeway exec --package components/supervisor-api/java:lib --package components/gitpod-protocol/java:lib -- ./gradlew --build-cache build\nleeway exec --package components/ide/jetbrains/backend-plugin:plugin --package components/ide/jetbrains/gateway-plugin:publish --parallel -- ./gradlew --build-cache buildPlugin\n"},{"name":"TypeScript","before":"scripts/branch-namespace.sh","init":"yarn --network-timeout 100000 && yarn build"},{"name":"Go","before":"pre-commit install --install-hooks","init":"leeway exec --filter-type go -v -- go mod verify","openMode":"split-right"}],"vscode":{"extensions":["bradlc.vscode-tailwindcss","EditorConfig.EditorConfig","golang.go","hashicorp.terraform","ms-azuretools.vscode-docker","ms-kubernetes-tools.vscode-kubernetes-tools","stkb.rewrap","zxh404.vscode-proto3","matthewpi.caddyfile-support","heptio.jsonnet","timonwong.shellcheck","vscjava.vscode-java-pack","fwcd.kotlin","dbaeumer.vscode-eslint","esbenp.prettier-vscode"]},"jetbrains":{"goland":{"prebuilds":{"version":"stable"}}},"_origin":"repo","_featureFlags":[]}`),
}
}
20 changes: 19 additions & 1 deletion components/usage/pkg/db/workspace.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,19 @@
package db

import (
"context"
"database/sql"
"fmt"
"github.com/google/uuid"
"gorm.io/datatypes"
"gorm.io/gorm"
"time"
)

// Workspace represents the underlying DB object
type Workspace struct {
ID string `gorm:"primary_key;column:id;type:char;size:36;" json:"id"`
OwnerID string `gorm:"column:ownerId;type:char;size:36;" json:"ownerId"`
OwnerID uuid.UUID `gorm:"column:ownerId;type:char;size:36;" json:"ownerId"`
ProjectID sql.NullString `gorm:"column:projectId;type:char;size:36;" json:"projectId"`
Description string `gorm:"column:description;type:varchar;size:255;" json:"description"`
Type string `gorm:"column:type;type:char;size:16;default:regular;" json:"type"`
Expand Down Expand Up @@ -46,3 +50,17 @@ type Workspace struct {
func (d *Workspace) TableName() string {
return "d_b_workspace"
}

func ListWorkspacesByID(ctx context.Context, conn *gorm.DB, ids []string) ([]Workspace, error) {
if len(ids) == 0 {
return nil, nil
}

var workspaces []Workspace
tx := conn.WithContext(ctx).Where(ids).Find(&workspaces)
if tx.Error != nil {
return nil, fmt.Errorf("failed to list workspaces by id: %w", tx.Error)
}

return workspaces, nil
}
52 changes: 52 additions & 0 deletions components/usage/pkg/db/workspace_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,10 @@
package db_test

import (
"context"
"fmt"
"github.com/gitpod-io/gitpod/usage/pkg/db"
"github.com/gitpod-io/gitpod/usage/pkg/db/dbtest"
"github.com/stretchr/testify/require"
"gorm.io/gorm"
"strings"
Expand Down Expand Up @@ -83,3 +85,53 @@ func stringToVarchar(t *testing.T, s string) db.VarcharTime {
require.NoError(t, err)
return converted
}

func TestListWorkspacesByID(t *testing.T) {
conn := db.ConnectForTests(t)

workspaces := []db.Workspace{
dbtest.NewWorkspace(t, "gitpodio-gitpod-aaaaaaaaaaa"),
dbtest.NewWorkspace(t, "gitpodio-gitpod-bbbbbbbbbbb"),
}
tx := conn.Create(workspaces)
require.NoError(t, tx.Error)

for _, scenario := range []struct {
Name string
QueryIDs []string
Expected int
}{
{
Name: "no query ids returns empty results",
QueryIDs: nil,
Expected: 0,
},
{
Name: "not found id returns emtpy results",
QueryIDs: []string{"gitpodio-gitpod-xxxxxxxxxxx"},
Expected: 0,
},
{
Name: "one matching returns results",
QueryIDs: []string{workspaces[0].ID},
Expected: 1,
},
{
Name: "one matching and one non existent returns one found result",
QueryIDs: []string{workspaces[0].ID, "gitpodio-gitpod-xxxxxxxxxxx"},
Expected: 1,
},
{
Name: "multiple matching ids return results for each",
QueryIDs: []string{workspaces[0].ID, workspaces[1].ID},
Expected: 2,
},
} {
t.Run(scenario.Name, func(t *testing.T) {
results, err := db.ListWorkspacesByID(context.Background(), conn, scenario.QueryIDs)
require.NoError(t, err)
require.Len(t, results, scenario.Expected)
})

}
}