-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Stop running prebuilds for inactive projects (10+ weeks) #9219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cd00c1b
to
3de43d3
Compare
3de43d3
to
192747b
Compare
On a side note, we now have both I considered adding the timestamps to |
3cd8f44
to
815b525
Compare
fyi: I've opened a follow up issue with a feature request to surface this automatic pause action better on the dashboard as this could be confusing to users coming back to inactive projects in the dashboard, see #9232. Feedback is welcome! |
1a3b8f3
to
36a1a6f
Compare
/werft run with-clean-slate-deployment 👍 started the job as gitpod-build-jx-inactive-projects.7 |
36a1a6f
to
93a960c
Compare
93a960c
to
904d37a
Compare
Fixes #8911 Fixes prebuild rate limit
904d37a
to
436ceea
Compare
@PrimaryColumn(TypeORM.UUID_COLUMN_TYPE) | ||
projectId: string; | ||
|
||
@Column("varchar") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's historically the reason for using varchar to store timestamps? Is it possible to use a DATETIME?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question -- I don't have a strong opinion here, and was just following established practice 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I ask is that when prototyping an ORM for Golang, Varchar timestamps make it much more complicated - this is further coupled with the somewhat non-standard usage of ISO8601 in javascript to represent a timestamp, which isn't quite RFC3339 compatible.
Definitely not trying to block this PR on this, mostly looking to understand historical reasoning for the varchar choice. Feel free to ignore this comment to land the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I ask is that when prototyping an ORM for Golang, Varchar timestamps make it much more complicated - this is further coupled with the somewhat non-standard usage of ISO8601 in javascript to represent a timestamp, which isn't quite RFC3339 compatible.
I think your concerns are valid, and there is definitely room for improvement in our DB types. 👍
Definitely not trying to block this PR on this, mostly looking to understand historical reasoning for the varchar choice. Feel free to ignore this comment to land the change.
Thanks! I would love to, but first I need a review. 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reviewing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's awesome, thanks a bunch 🙏
I'll try to practice our new review process by assigning this Pull Request to you then (but please feel free to un-assign yourself again if that didn't make sense!)
One alternative approach we could consider is to load the
WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good, and worked for me based on the test specified. Leaving a couple of questions for my own context. Approving with a hold.
/hold
Posting our slack conversation about splitting PRs here as well for visibility.
For the PR, with regards to tiny PRs. I'd split it into the following:
- Add a method to call to updateProjectUsage but don't write to the DB, only increment a metric
- Create the table, it's unused
Get data that this works - Wire them together - updateProjectUsage actually writes to the DB
projectId: string; | ||
|
||
@Column("varchar") | ||
lastWebhookReceived: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field strictly needed for our first version? We do set it, but we're not using it in read paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it's not strictly needed here indeed. However, since I modified very related code, I thought this small drive-by change could be a good first step toward fixing #7010
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it up to you.
For surfacing which webhook we've received, I'd expose it as a first class-table with webhooks (and webhook payloads), such that we can show what we've received in the webhook as well (would definitely be useful for our debugging).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I'll leave this as is, and we can decide later whether we want a full-fledged table of webhook events, or whether showing a simple timestamp of the last event is good enough. 🛹
export class ProjectUsage1649667202321 implements MigrationInterface { | ||
public async up(queryRunner: QueryRunner): Promise<void> { | ||
await queryRunner.query( | ||
"CREATE TABLE IF NOT EXISTS `d_b_project_usage` (`projectId` char(36) NOT NULL, `lastWebhookReceived` varchar(255) NOT NULL DEFAULT '', `lastWorkspaceStart` varchar(255) NOT NULL DEFAULT '', `deleted` tinyint(4) NOT NULL DEFAULT '0', `_lastModified` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6), PRIMARY KEY (`projectId`), KEY `ind_dbsync` (`_lastModified`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's we're setting a limit on projectId
of 36 chars, should this also be reflected in the precision: 36
field on the Column of the ORM model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too familiar with the intricacies of our ORM model, so I wouldn't know the pros and cons of doing this.
Briefly looking through our code, it seems that we typically use precision
for timestamps, but not for IDs. If you think this should be changed, could you please open an issue for this? 🙏
@@ -116,6 +116,15 @@ export class GitHubEnterpriseApp { | |||
): Promise<StartPrebuildResult | undefined> { | |||
const span = TraceContext.startSpan("GitHubEnterpriseApp.handlePushHook", ctx); | |||
try { | |||
const cloneURL = payload.repository.clone_url; | |||
const projectAndOwner = await this.findProjectAndOwner(cloneURL, user); | |||
if (projectAndOwner.project) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the point of this call, how do we know that the project specified in the payload in fact belongs to the correct user? Is it possible to poison our data across projects here?
Mostly lacking context, so trying to understand better how the payload from the webhook is authhorized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the point of this call, how do we know that the project specified in the payload in fact belongs to the correct user?
That's a good question. I don't know about webhook events from the GitHub App, but for GitLab, we look at the provided (user + project) token in order to make sure that the project specified in the payload actually belongs to the user who installed this webhook:
gitpod/components/server/ee/src/prebuilds/gitlab-app.ts
Lines 61 to 97 in e8fa6d2
protected async findUser(ctx: TraceContext, context: GitLabPushHook, req: express.Request): Promise<User> { | |
const span = TraceContext.startSpan("GitLapApp.findUser", ctx); | |
try { | |
const secretToken = req.header("X-Gitlab-Token"); | |
span.setTag("secret-token", secretToken); | |
if (!secretToken) { | |
throw new Error("No secretToken provided."); | |
} | |
const [userid, tokenValue] = secretToken.split("|"); | |
const user = await this.userDB.findUserById(userid); | |
if (!user) { | |
throw new Error("No user found for " + secretToken + " found."); | |
} else if (!!user.blocked) { | |
throw new Error(`Blocked user ${user.id} tried to start prebuild.`); | |
} | |
const identity = user.identities.find((i) => i.authProviderId === TokenService.GITPOD_AUTH_PROVIDER_ID); | |
if (!identity) { | |
throw new Error(`User ${user.id} has no identity for '${TokenService.GITPOD_AUTH_PROVIDER_ID}'.`); | |
} | |
const tokens = await this.userDB.findTokensForIdentity(identity); | |
const token = tokens.find((t) => t.token.value === tokenValue); | |
if (!token) { | |
throw new Error(`User ${user.id} has no token with given value.`); | |
} | |
if ( | |
token.token.scopes.indexOf(GitlabService.PREBUILD_TOKEN_SCOPE) === -1 || | |
token.token.scopes.indexOf(context.repository.git_http_url) === -1 | |
) { | |
throw new Error( | |
`The provided token is not valid for the repository ${context.repository.git_http_url}.`, | |
); | |
} | |
return user; | |
} finally { | |
span.finish(); | |
} | |
} |
We do something similar for GitHub Enterprise webhooks, where the secret token is not sent as part of the webhook payload, but is used to sign the webhook payload (which we can verify against the user's tokens):
gitpod/components/server/ee/src/prebuilds/github-enterprise-app.ts
Lines 89 to 106 in e8fa6d2
// Verify the webhook signature | |
const signature = req.header("X-Hub-Signature-256"); | |
const body = (req as any).rawBody; | |
const tokenEntries = (await this.userDB.findTokensForIdentity(gitpodIdentity)).filter((tokenEntry) => { | |
return tokenEntry.token.scopes.includes(GitHubService.PREBUILD_TOKEN_SCOPE); | |
}); | |
const signingToken = tokenEntries.find((tokenEntry) => { | |
const sig = | |
"sha256=" + | |
createHmac("sha256", user.id + "|" + tokenEntry.token.value) | |
.update(body) | |
.digest("hex"); | |
return timingSafeEqual(Buffer.from(sig), Buffer.from(signature ?? "")); | |
}); | |
if (!signingToken) { | |
throw new Error(`User ${user.id} has no token matching the payload signature.`); | |
} | |
return user; |
Is it possible to poison our data across projects here?
I don't think it is possible to send forged webhook events to Gitpod.
However, your question does remind me of an issue where Pull Requests can come from forks, but trigger a Prebuild for the main repository, which can be problematic in some cases: https://github.com/gitpod-io/security/issues/26
Many thanks for your very helpful review and suggestions @easyCZ! 🙏
I like the idea of doing these things more async, because it could make our system simpler and much less subtle/magical. (For example, I initially updated However, it feels like we're lacking some good framework code for such asynchronous jobs that regularly process large amounts of data in the background. I agree that we have to start somewhere, but I'm a bit hesitant to volunteer this Pull Request to be the first "async background processing job", given that it can already provide value as is. 🛹 I've also replied to your questions in-line. Many thanks for diving deeper into all these systems! I find your questions and instincts for improvements quite relevant, and am keen to help improve things or provide context whever I can. 🙂 Also, removing the hold, as I think this is good to go as is. Happy to improve anything in follow-ups. 🚀 /unhold |
Description
Stop running prebuilds for projects that did not start a workspace in the last 10+ weeks.
Related Issue(s)
Fixes #8911
Drive-by fixes prebuild rate limits again
How to test
lastWorkspaceStart
andlastWebhookReceived
are properly set ind_b_project_usage
lastWorkspaceStart
to a date more than 10 weeks ago ind_b_project_usage
Release Notes
Documentation