Skip to content

Optimize user file size calculation query #5069

@bjester

Description

@bjester

This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.

Current behavior

The following query performs poorly on Studio's hotfixes DB. Two queries averaged 17.16 minutes per call.

SELECT
  SUM("__col1")
FROM (
  SELECT
    DISTINCT "contentcuration_file"."checksum" AS Col1,
    "contentcuration_file"."file_size" AS "__col1"
  FROM
    "contentcuration_file"
  INNER JOIN
    "contentcuration_contentnode"
  ON
    ("contentcuration_file"."contentnode_id" = "contentcuration_contentnode"."id")
  WHERE
    ("contentcuration_file"."uploaded_by_id" = $1
      AND "contentcuration_contentnode"."tree_id" = $2
      AND NOT ("contentcuration_file"."checksum" IN (
        WITH
          RECURSIVE "cte" AS (
          SELECT
            DISTINCT "contentcuration_contentnode"."tree_id" AS "tree_id"
          FROM
            "contentcuration_channel"
          INNER JOIN
            "contentcuration_channel_editors"
          ON
            ("contentcuration_channel"."id" = "contentcuration_channel_editors"."channel_id")
          LEFT OUTER JOIN
            "contentcuration_contentnode"
          ON
            ("contentcuration_channel"."main_tree_id" = "contentcuration_contentnode"."id")
          WHERE
            ("contentcuration_channel_editors"."user_id" = $3
              AND NOT ("contentcuration_channel"."deleted")))
        SELECT
          DISTINCT U0."checksum"
        FROM
          "contentcuration_file" U0
        INNER JOIN
          "contentcuration_contentnode" U2
        ON
          (U0."contentnode_id" = U2."id")
        INNER JOIN
          "cte" U3
        ON
          U2."tree_id" = U3."tree_id"
        WHERE
          U0."uploaded_by_id" = $4)))) subquery

Desired behavior

The query appears to be calculating the size of a user's uploads. This may be the primary size calculation query, which has been a problematic query for a while.

It is suggested to attempt to reproduce this query on hotfixes, identify where it comes from, and see where GCP Cloud SQL will produce a query plan, to assist in how to best optimize the query.

Value add

This allows us to continue the process of upgrading our postgres server to a more recent version.

References

Query ID: 7243823605537814967

Unfortunately, the query does not have a query plan available. Query plans are based of frequent samples, and there are not enough for GCP to provide one.

Metadata

Metadata

Assignees

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions