on-pr docker build stuck with user is not authorized to BatchGetImage #148771

Closed
@malfet

Description

🐛 Describe the bug

The build https://github.com/pytorch/pytorch/actions/runs/13724713611/job/38388317099?pr=148740 is stuck at the "Calculate docker image" step, where it repeatedly checks whether the image already exists:

+ [[ 1741362495 -lt 1741364292 ]]
+ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-focal-cuda12.6-cudnn9-py3-gcc11:c097a94c03da3be2f692f9ff22e3963e933633cf
denied: User: arn:aws:sts::391835788720:assumed-role/ghci-lf-github-action-runners-runner-role/i-0e98877505f067739 is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:us-east-1:308535385114:repository/pytorch/pytorch-linux-focal-cuda12.6-cudnn9-py3-gcc11 because no resource-based policy allows the ecr:BatchGetImage action
+ '[' false == true ']'
+ sleep 300
++ date +%s

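This polling logic was added by pytorch/test-infra#6013, but it does not appear to work right now, possibly because of a security-related change: the runner role is denied ecr:BatchGetImage, so every check fails and the step just sleeps and retries. (Though all runners should have read access to ECR, shouldn't they?)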
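For illustration, here is a minimal sketch of how such a polling loop could fail fast on an authorization error instead of sleeping until the deadline. It is not the code from pytorch/test-infra#6013: the function names `is_auth_denied` and `wait_for_image`, the `INSPECT_CMD` override, and the return codes are all hypothetical, and matching on the "is not authorized to perform" substring is an assumption based only on the log above.

```shell
# Hypothetical helpers; NOT the actual logic from pytorch/test-infra#6013.

# Succeeds if the captured output looks like the ECR authorization denial
# seen in the log above (substring match is an assumption).
is_auth_denied() {
  case "$1" in
    *"is not authorized to perform"*) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_image IMAGE DEADLINE_EPOCH [INTERVAL_SECONDS]
# Returns 0 once the manifest is readable, 2 on an authorization denial,
# and 1 if the deadline passes without the image appearing.
wait_for_image() {
  local image="$1" deadline="$2" interval="${3:-300}" output
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # INSPECT_CMD is a hypothetical override so the check can be stubbed.
    if output="$(${INSPECT_CMD:-docker manifest inspect} "$image" 2>&1)"; then
      return 0
    fi
    if is_auth_denied "$output"; then
      echo "not authorized to read $image; failing fast" >&2
      return 2
    fi
    sleep "$interval"
  done
  return 1
}
```

A call such as `wait_for_image "$IMAGE" "$(( $(date +%s) + 1800 ))"` would then surface a permissions problem right away as exit code 2, rather than looking like a hung job.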

Versions

CI

cc @seemethere @pytorch/pytorch-dev-infra

Metadata

Labels

module: ci (Related to continuous integration)
module: docker
module: regression (It used to work, and now it doesn't)
security
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Projects

Status

Done
