Skip to content

add migration 0490_job_st_fin_allrows_backpop #4367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

risicle
Copy link
Member

@risicle risicle commented Jan 31, 2025

A follow-up to #4323

This updates all finished jobs more than 24h since scheduled_for to the new finished all notifications created status. why 24h? 24h is how far back the check-for-missing-rows-in-completed-jobs celery task will look and so anything older than that won't get converted to finished all notifications created automatically.

There shouldn't be any jobs that make it to 24h without all rows or being moved to a failed state by then anyway...

@risicle
Copy link
Member Author

risicle commented Jan 31, 2025

Is the squawk failure a matter of it being confused by the %(...)s parameter syntax that sqlalchemy is using?

@risicle risicle marked this pull request as ready for review January 31, 2025 15:28
@quis
Copy link
Member

quis commented Feb 12, 2025

There’s already a 487 now so you’ll need to update the version number in this PR

@risicle risicle force-pushed the ris-job-status-finished-notifications-created-back-population branch from 3ffb771 to 417eefb Compare February 20, 2025 16:41
@risicle risicle changed the title add migration 0487_job_st_fin_allrows_backpop add migration 0490_job_st_fin_allrows_backpop Feb 20, 2025
jobs.update()
.where(
jobs.c.job_status == "finished",
jobs.c.scheduled_for < datetime.datetime.utcnow() - datetime.timedelta(days=1),
Copy link
Member

@quis quis Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most jobs don’t have a scheduled_for date (only ones where the user choose a later time to send them). The processing_started column is a more reliable indicator of when they were sent.

Suggested change
jobs.c.scheduled_for < datetime.datetime.utcnow() - datetime.timedelta(days=1),
jobs.c.processing_started < datetime.datetime.utcnow() - datetime.timedelta(days=1),

Some example jobs from the Notify service in production here
job_status scheduled_for processing_started processing_finished
finished all notifications created 2025-02-13 15:06:00.585552 2025-02-13 15:06:00.890576
finished all notifications created 2025-02-13 14:34:31.734979 2025-02-13 14:34:32.065425
finished all notifications created 2025-02-13 14:04:45.689966 2025-02-13 14:04:46.028345
finished all notifications created 2025-02-12 13:15:40.374586 2025-02-12 13:15:40.552505
finished all notifications created 2025-02-12 13:13:25.01338 2025-02-12 13:13:25.207884
finished all notifications created 2025-02-12 13:10:19.987049 2025-02-12 13:10:20.18051
finished all notifications created 2025-02-12 13:08:14.185305 2025-02-12 13:08:14.375147
finished all notifications created 2025-02-12 13:05:08.103867 2025-02-12 13:05:08.251264
finished all notifications created 2025-02-12 12:59:40.527564 2025-02-12 12:59:40.693566
finished all notifications created 2025-02-12 12:53:26.41888 2025-02-12 12:53:26.601478
finished all notifications created 2025-02-12 12:48:01.024616 2025-02-12 12:48:01.197442
finished all notifications created 2025-02-04 10:44:56.358388 2025-02-04 10:44:56.570331
finished all notifications created 2025-02-04 10:42:47.799703 2025-02-04 10:42:48.065459
finished all notifications created 2025-01-21 13:39:11.814203 2025-01-21 13:39:17.399959
finished all notifications created 2025-01-17 09:40:43.544483 2025-01-17 09:41:00.544936
finished 2024-12-19 16:56:52.850092 2024-12-19 16:57:07.990041
finished 2024-12-11 16:27:15.68192 2024-12-11 16:27:16.155975
finished 2024-12-11 16:10:47.650919 2024-12-11 16:10:48.851847
finished 2024-12-04 16:22:31.478881 2024-12-04 16:25:53.121038
finished 2024-11-28 13:55:19.573986 2024-11-28 13:55:25.801151
finished 2024-11-21 16:47:27.065497 2024-11-21 16:47:31.796688
finished 2024-11-19 15:36:51.598946 2024-11-19 15:36:57.850349
finished 2024-11-14 11:04:40.956831 2024-11-14 11:04:47.690653
finished 2024-11-05 15:00:52.806905 2024-11-05 15:04:49.979648
finished 2024-10-18 11:58:45.601174 2024-10-18 11:58:45.905006
finished 2024-10-17 15:49:58.747644 2024-10-17 15:50:01.29511
finished 2024-10-15 08:00:00 2024-10-15 08:00:04.948226 2024-10-15 08:00:37.527912
finished 2024-10-15 08:00:00 2024-10-15 08:00:02.910461 2024-10-15 08:01:38.208428
finished 2024-10-15 08:00:00 2024-10-15 08:00:02.486629 2024-10-15 08:00:09.805449
finished 2024-10-15 08:00:00 2024-10-15 08:00:02.384441 2024-10-15 08:02:17.828845
finished 2024-08-06 08:00:00 2024-08-06 08:00:02.735349 2024-08-06 08:00:04.670112
finished 2024-08-01 14:31:09.607592 2024-08-01 14:31:20.69365
finished 2024-07-17 08:00:00 2024-07-17 08:00:01.380056 2024-07-17 08:00:05.30163
finished 2024-07-09 10:31:34.372538 2024-07-09 10:31:43.973822
finished 2024-06-25 11:09:05.333171 2024-06-25 11:11:45.1229
select job_status, scheduled_for, processing_started, processing_finished from jobs where service_id = 'd6aa2c68-a2d9-4437-ab19-3ae8eb202553
 ' and job_status != 'cancelled' order by 3 desc limit 35

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmmmmmmmm the problem is there isn't an index on processing_started, so there isn't a way of picking them out without a table scan which might take ages 😢

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select count(*) from jobs;
count
2255339

Regardless of how you select them, is it even safe to update that many in one go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to pg_stats this column's null_frac (the proportion of values that are null) is 0.004, so there are really very few and I could probably add a second statement catching those that are null and make the timestamp condition based on another column.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(perhaps this is all overthinking anyway because this clause has an extremely low selectivity meaning a table scan isn't much worse)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rows where processing_started is null will probably have a job_status of cancelled so will be caught by your job_status == "finished" clause anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit more analysis because I’m not sure you’re getting the right thing from pg_stats, or you’re looking at the wrong column

Column null not null
scheduled_for 91% 9%
processing_started 0.5% 99.5%

select job_status, count(*) from jobs
where processing_started is null
group by 1 order by 2 desc;
job_status count
cancelled 9149
scheduled 267
sent to dvla 103
pending 27
sending limits exceeded 9

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I was talking about processing_started having a null_frac of 0.004 which is ~ 0.5%. Not totally sure what my point was thereon though..

this updates all "finished" jobs more than 24h since scheduled_for
to the new "finished all notifications created" status. why 24h?
24h is how far back the check-for-missing-rows-in-completed-jobs
celery task will look and so anything older than that won't get
converted to "finished all notifications created" automatically.

there *shouldn't* be any jobs that make it to 24h without all rows
or being moved to a failed state by then anyway...
@risicle risicle force-pushed the ris-job-status-finished-notifications-created-back-population branch from 417eefb to 07aee6e Compare February 21, 2025 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants