-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[usage] List workspaces for each workspace instance in usage period #10495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ package controller | |
|
||
import ( | ||
"context" | ||
"errors" | ||
"fmt" | ||
"github.com/gitpod-io/gitpod/common-go/log" | ||
"github.com/gitpod-io/gitpod/usage/pkg/db" | ||
|
@@ -38,6 +39,8 @@ type UsageReconcileStatus struct { | |
|
||
WorkspaceInstances int | ||
InvalidWorkspaceInstances int | ||
|
||
Workspaces int | ||
} | ||
|
||
func (u *UsageReconciler) Reconcile() error { | ||
|
@@ -71,11 +74,59 @@ func (u *UsageReconciler) ReconcileTimeRange(ctx context.Context, from, to time. | |
if len(invalidInstances) > 0 { | ||
log.WithField("invalid_workspace_instances", invalidInstances).Errorf("Detected %d invalid instances. These will be skipped in the current run.", len(invalidInstances)) | ||
} | ||
|
||
log.WithField("workspace_instances", instances).Debug("Successfully loaded workspace instances.") | ||
|
||
workspaces, err := u.loadWorkspaces(ctx, instances) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to load workspaces for workspace instances in time range: %w", err) | ||
} | ||
status.Workspaces = len(workspaces) | ||
|
||
return status, nil | ||
} | ||
|
||
type workspaceWithInstances struct { | ||
Workspace db.Workspace | ||
Instances []db.WorkspaceInstance | ||
} | ||
Comment on lines
+88
to
+91
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thought: I wonder if we actually want to load into memory the "fully-hydrated" shapes of both In the past, we've had issues with memory consumption, because we were keeping full shapes in memory, while only a few fields were actually relevant. This is especially true for workspace instances, because there are so many of them. A usage record will probably look something like this: export type WorkspaceUsageRecord = {
instanceId: string;
workspaceId: string;
userId: string;
projectId?: string;
teamId?: string;
fromTime: string;
toTime: string;
}; So, in my mind, the most efficient flow to get these usage records is:
If we query and then load the full There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, this makes sense. But I don't think it makes sense to do this now. For me, this is an optimisation once we have something working. For now, I'd prefer to waste memory to get to a working version and then measure what is in fact the cost of these queries. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair point, thanks. (I don't really see why it's better to ship a less optimized version first when we know how to easily write a more optimized one, but I also agree about shipping this sooner than later, and maybe my suggestion could already be considered premature optimization). So, fine to defer this to later. 👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Largely because I think the current handling will go away anyway once we do add attribution to the WSI. If we optimized here, it would then be wasted work because it would also be replaced by the attribution work from WSI.
We don't have any hard data to know it is in fact an optimisation. Suboptimal, but with data is more valuable than optimal but without baseline metrics. |
||
|
||
func (u *UsageReconciler) loadWorkspaces(ctx context.Context, instances []db.WorkspaceInstance) ([]workspaceWithInstances, error) { | ||
var workspaceIDs []string | ||
easyCZ marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for _, instance := range instances { | ||
workspaceIDs = append(workspaceIDs, instance.WorkspaceID) | ||
} | ||
|
||
workspaces, err := db.ListWorkspacesByID(ctx, u.conn, toSet(workspaceIDs)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for making the IDs unique! I would have preferred to make the collection a unique "set" before all of the appends, in order to iterate just once over all instances (not once, then again in |
||
if err != nil { | ||
return nil, fmt.Errorf("failed to find workspaces for provided workspace instances: %w", err) | ||
} | ||
|
||
// Map workspaces to corresponding instances | ||
workspacesWithInstancesByID := map[string]workspaceWithInstances{} | ||
for _, workspace := range workspaces { | ||
workspacesWithInstancesByID[workspace.ID] = workspaceWithInstances{ | ||
Workspace: workspace, | ||
} | ||
} | ||
|
||
// We need to also add the instances to corresponding records, a single workspace can have multiple instances | ||
for _, instance := range instances { | ||
item, ok := workspacesWithInstancesByID[instance.WorkspaceID] | ||
if !ok { | ||
return nil, errors.New("encountered instance without a corresponding workspace record") | ||
} | ||
item.Instances = append(item.Instances, instance) | ||
} | ||
|
||
// Flatten results into a list | ||
var workspacesWithInstances []workspaceWithInstances | ||
for _, w := range workspacesWithInstancesByID { | ||
workspacesWithInstances = append(workspacesWithInstances, w) | ||
} | ||
|
||
return workspacesWithInstances, nil | ||
} | ||
|
||
func (u *UsageReconciler) loadWorkspaceInstances(ctx context.Context, from, to time.Time) ([]db.WorkspaceInstance, []invalidWorkspaceInstance, error) { | ||
log.Infof("Gathering usage data from %s to %s", from, to) | ||
instances, err := db.ListWorkspaceInstancesInRange(ctx, u.conn, from, to) | ||
jankeromnes marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
@@ -132,8 +183,8 @@ func trimStartStopTime(instances []db.WorkspaceInstance, maximumStart, minimumSt | |
var updated []db.WorkspaceInstance | ||
|
||
for _, instance := range instances { | ||
if instance.StartedTime.Time().Before(maximumStart) { | ||
instance.StartedTime = db.NewVarcharTime(maximumStart) | ||
if instance.CreationTime.Time().Before(maximumStart) { | ||
instance.CreationTime = db.NewVarcharTime(maximumStart) | ||
} | ||
|
||
if instance.StoppedTime.Time().After(minimumStop) { | ||
|
@@ -144,3 +195,16 @@ func trimStartStopTime(instances []db.WorkspaceInstance, maximumStart, minimumSt | |
} | ||
return updated | ||
} | ||
|
||
func toSet(items []string) []string { | ||
m := map[string]struct{}{} | ||
for _, i := range items { | ||
m[i] = struct{}{} | ||
} | ||
|
||
var result []string | ||
for s := range m { | ||
result = append(result, s) | ||
} | ||
return result | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
// Copyright (c) 2022 Gitpod GmbH. All rights reserved. | ||
// Licensed under the GNU Affero General Public License (AGPL). | ||
// See License-AGPL.txt in the project root for license information. | ||
|
||
package dbtest | ||
|
||
import ( | ||
"github.com/gitpod-io/gitpod/usage/pkg/db" | ||
"github.com/google/uuid" | ||
"testing" | ||
) | ||
|
||
func NewWorkspace(t *testing.T, id string) db.Workspace { | ||
t.Helper() | ||
|
||
return db.Workspace{ | ||
ID: id, | ||
OwnerID: uuid.New(), | ||
Type: "prebuild", | ||
ContextURL: "https://github.com/gitpod-io/gitpod", | ||
Context: []byte(`{"title":"[usage] List workspaces for each workspace instance in usage period","repository":{"cloneUrl":"https://github.com/gitpod-io/gitpod.git","host":"github.com","name":"gitpod","owner":"gitpod-io","private":false},"ref":"mp/usage-list-workspaces","refType":"branch","revision":"586f22ecaeeb3b4796fd92f9ae1ca3512ca1e330","nr":10495,"base":{"repository":{"cloneUrl":"https://github.com/gitpod-io/gitpod.git","host":"github.com","name":"gitpod","owner":"gitpod-io","private":false},"ref":"mp/usage-validate-instances","refType":"branch"},"normalizedContextURL":"https://github.com/gitpod-io/gitpod/pull/10495","checkoutLocation":"gitpod"}`), | ||
Config: []byte(`{"image":"eu.gcr.io/gitpod-core-dev/dev/dev-environment:me-me-image.1","workspaceLocation":"gitpod/gitpod-ws.code-workspace","checkoutLocation":"gitpod","ports":[{"port":1337,"onOpen":"open-preview"},{"port":3000,"onOpen":"ignore"},{"port":3001,"onOpen":"ignore"},{"port":3306,"onOpen":"ignore"},{"port":4000,"onOpen":"ignore"},{"port":5900,"onOpen":"ignore"},{"port":6080,"onOpen":"ignore"},{"port":7777,"onOpen":"ignore"},{"port":9229,"onOpen":"ignore"},{"port":9999,"onOpen":"ignore"},{"port":13001,"onOpen":"ignore"},{"port":13444}],"tasks":[{"name":"Install Preview Environment kube-context","command":"(cd dev/preview/previewctl && go install .)\npreviewctl install-context\nexit\n"},{"name":"Add Harvester kubeconfig","command":"./dev/preview/util/download-and-merge-harvester-kubeconfig.sh\nexit 0\n"},{"name":"Java","command":"if [ -z \"$RUN_GRADLE_TASK\" ]; then\n read -r -p \"Press enter to continue Java gradle task\"\nfi\nleeway exec --package components/supervisor-api/java:lib --package components/gitpod-protocol/java:lib -- ./gradlew --build-cache build\nleeway exec --package components/ide/jetbrains/backend-plugin:plugin --package components/ide/jetbrains/gateway-plugin:publish --parallel -- ./gradlew --build-cache buildPlugin\n"},{"name":"TypeScript","before":"scripts/branch-namespace.sh","init":"yarn --network-timeout 100000 && yarn build"},{"name":"Go","before":"pre-commit install --install-hooks","init":"leeway exec --filter-type go -v -- go mod verify","openMode":"split-right"}],"vscode":{"extensions":["bradlc.vscode-tailwindcss","EditorConfig.EditorConfig","golang.go","hashicorp.terraform","ms-azuretools.vscode-docker","ms-kubernetes-tools.vscode-kubernetes-tools","stkb.rewrap","zxh404.vscode-proto3","matthewpi.caddyfile-support","heptio.jsonnet","timonwong.shellcheck","vscjava.vscode-java-pack","fwcd.kotlin","dbaeumer.vscode-eslint","esbenp.prettier-vscode"]},"jetbrains":{"goland":{"prebuilds":{"version":"stable"}}},"_origin":"repo","_featureFlags":[]}`), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, at this point we have:
I'm still slightly uneasy with loading so many large objects into memory, and iterating over them multiple times. So, I'm looking forward to seeing exactly how we'll use these collections, and see if we can later reduce the amount of used memory and the number of iterations. 👀