For Cross Account S3 buckets - DVC Add gives put object access denied exception even after providing ACL of bucket-owner-full-control. #4887


Closed
Nasreen23 opened this issue Nov 12, 2020 · 11 comments
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments


Nasreen23 commented Nov 12, 2020

Bug Report

Please provide information about your setup

Working with DVC remote s3.
The input file and cache location are on AWS S3 bucket.

Output of `dvc version`:

$ dvc version
1.9.1

Additional Information (if any):
Performing dvc add with an input file located on a cross-account s3 bucket; the cache path also points at the same bucket.
We use the acl config bucket-owner-full-control for the s3 remote and, per our requirements, are not allowed to use the other grant options.

Steps to reproduce:

dvc init --no-scm
dvc remote add -d test-remote s3://test-bucket/dvc-test/cache
dvc config cache.s3 test-remote
dvc remote modify test-remote acl bucket-owner-full-control
dvc add -v s3://test-bucket/all/dvc-test/input/test_1232.csv --external --file test_1232.csv.dvc

If applicable, please also provide a `--verbose` output of the command, eg: `dvc add --verbose`.

dvc add -v s3://test-bucket/input/test_1232.csv --external --file test_1232.csv.dvc
2020-11-06 08:46:41,149 DEBUG: Check for update is enabled.
2020-11-06 08:46:41,440 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2020-11-06 08:46:42,145 DEBUG: Spawned '['daemon', '-q', 'updater']'
2020-11-06 08:46:42,149 DEBUG: fetched: [(3,)]
2020-11-06 08:46:48,673 DEBUG: {'s3://test-bucket/input/test_1232.csv': 'modified'}
2020-11-06 08:46:49,902 DEBUG: Computed stage: 'test_1232.csv.dvc' md5: 'None'
2020-11-06 08:46:50,207 DEBUG: Saving 's3://test-bucket/input/test_1232.csv' to 's3://test-bucket/dvc-test/cache_new/02/312185d395208731aca8ea4d632498'.
2020-11-06 08:46:54,713 DEBUG: cache 's3://test-bucket/dvc-test/cache_new/02/312185d395208731aca8ea4d632498' expected 'HashInfo(name='etag', value='02312185d395208731aca8ea4d632498', dir_info=None)' actual 'None'
2020-11-06 08:46:55,840 DEBUG: Removing s3://test-bucket/input/test_1232.csv
Adding...
2020-11-06 08:46:56,766 DEBUG: fetched: [(0,)]
2020-11-06 08:46:56,772 ERROR: unexpected error - An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
[dvc_add_logs.txt](https://github.com/iterative/dvc/files/5530596/dvc_add_logs.txt)

Please find below the solution, as discussed in the dev-talk forum.
Solution: line 238 of s3.py has been modified as shown below, so that the ACL parameter is passed to the obj.put call.

This is required in scenarios where we are only allowed to use the bucket-owner ACL and cannot use any other grant permissions due to the restrictions imposed by our AWS account policies.

Below is the method where the change was made.

    def makedirs(self, path_info):
        # We need to support creating empty directories, which means
        # creating an object with an empty body and a trailing slash `/`.
        #
        # We are not creating directory objects for every parent prefix,
        # as it is not required.
        if not path_info.path:
            return

        dir_path = path_info / ""
        with self._get_obj(dir_path) as obj:
            obj.put(Body="", ACL=self.extra_args["ACL"])  # change done by Nasreen
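For context, here is a minimal sketch of how a per-remote `acl` setting could be translated into the keyword arguments that boto3-style put calls accept. `build_extra_args` is a hypothetical helper name for illustration, not DVC's actual API:

```python
# Hypothetical sketch: map a remote's `acl` config entry to the
# ExtraArgs-style keyword arguments accepted by boto3 put calls.
def build_extra_args(remote_config):
    extra = {}
    acl = remote_config.get("acl")
    if acl:
        # Would become obj.put(Body="", ACL="bucket-owner-full-control")
        extra["ACL"] = acl
    return extra

args = build_extra_args({"url": "s3://test-bucket/", "acl": "bucket-owner-full-control"})
print(args)  # {'ACL': 'bucket-owner-full-control'}
```

One caveat with the one-line patch above: `self.extra_args["ACL"]` raises a KeyError when no acl is configured, so guarding on the key's presence (as sketched here) keeps the default behavior intact for remotes without an ACL setting.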
@pared
Contributor

pared commented Nov 12, 2020

@Nasreen23 are you able to execute some pure s3 command?
Are you able to execute, for example, aws s3 ls s3://test-bucket?

pared added the awaiting response label Nov 12, 2020
@efiop
Contributor

efiop commented Nov 12, 2020

@pared The thing here is that @Nasreen23 is using dvc add s3://, and the s3:// url in it is in no way connected to the remote they have defined above.

dvc add -v s3://test-bucket/all/dvc-test/input/test_1232.csv --external --file test_1232.csv.dvc

@Nasreen23 Could you elaborate on your scenario, please? dvc add --external is a very advanced feature that people often misuse and we don't usually recommend it.

@shcheklein
Member

More context on this (I think), @Nasreen23 correct me if I put the wrong links:

https://discord.com/channels/485586884165107732/485596304961962003/774103813161353246
https://discord.com/channels/485586884165107732/485596304961962003/775708232156446730

@Nasreen23 I've started looking into this, but I don't have enough experience with S3 ACL to be honest. So it would take a while on my end.

@Nasreen23
Author

Nasreen23 commented Nov 30, 2020

@shcheklein : Yes Ivan, that's right. As discussed in the dvc dev-talk forum, I have created this issue and provided the solution.
Could you let us know when this solution can be implemented and released on your end?

@efiop
Contributor

efiop commented Nov 30, 2020

@Nasreen23 Could you elaborate on why you are using --external in the first place? Also, does the same issue arise when you dvc push to that bucket? --external is an advanced feature that people tend to misuse pretty often when they are actually looking for #4520 feature.

@Nasreen23
Author

Nasreen23 commented Nov 30, 2020

@efiop - This is our use case for which I am trying to use dvc add --external.
The use case is to track and version the input data set that lives on an s3 bucket, providing the capability to check out older versions of the data set at a given point in time.

To achieve this, we are using the dvc add --external feature, which caches the input data set in a remote cache location (i.e. on s3). We aren't saving local copies of the input data set; instead we use Git to maintain the .dvc files.

How can we do a dvc push without a dvc add, which is what creates the cache file in the s3 cache location?

@efiop
Contributor

efiop commented Dec 1, 2020

@Nasreen23 Thanks, I'm just making sure you are aware of the potential consequences of using --external. For example, if anyone on your team runs dvc checkout, it will change the file on s3 for everyone. Is that desired in your case?

@sajjap

sajjap commented Dec 1, 2020

> @Nasreen23 Thanks, I'm just making sure you are aware of the potential consequences of using --external. For example, if anyone on your team runs dvc checkout, it will change the file on s3 for everyone. Is that desired in your case?

Hello efiop, yes that is the desired outcome for us.

@Nasreen23
Author

@efiop - Having explained the use case for which we are using dvc add --external feature, did you happen to look at the suggested solution?

@efiop
Contributor

efiop commented Dec 13, 2020

@Nasreen23 Sorry for the delay. Unfortunately I wasn't able to look deeper into your case, but the reason I asked whether the regular dvc workflow works with dvc push is that I wanted to make sure you are able to access the bucket in general. If you can, then the problem is that you are using an s3:// url, and dvc doesn't know that it should apply the acl settings you've set. So you need to do something like:

dvc remote add test s3://test-bucket/
dvc remote modify test acl bucket-owner-full-control

dvc remote add test-cache remote://test/dvc-test/cache
dvc config cache.s3 test-cache

dvc add remote://test/all/dvc-test/input/test_1232.csv --external

instead of what you did before. Overall, please be aware that this is not a core dvc scenario: it might be buggy, and we might change the behavior in future versions. So you are venturing into this more or less at your own risk. We can only recommend the regular workflow without --external.
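To illustrate the distinction described above, here is a minimal, hypothetical sketch (not DVC's actual resolver) of why a remote:// URL carries the remote's settings, such as acl, while a bare s3:// URL carries none:

```python
# Hypothetical sketch of remote:// URL resolution. The `remotes` dict and
# `resolve` function are illustrative, not DVC internals.
remotes = {
    "test": {"url": "s3://test-bucket/", "acl": "bucket-owner-full-control"},
}

def resolve(url):
    if url.startswith("remote://"):
        # remote://<name>/<path> -> look up the named remote's config,
        # inheriting settings like `acl` along with the base URL.
        name, _, path = url[len("remote://"):].partition("/")
        cfg = remotes[name]
        return cfg["url"].rstrip("/") + "/" + path, cfg.get("acl")
    # A bare s3:// URL has no remote config attached, so no ACL is applied.
    return url, None

print(resolve("remote://test/all/dvc-test/input/test_1232.csv"))
# ('s3://test-bucket/all/dvc-test/input/test_1232.csv', 'bucket-owner-full-control')
print(resolve("s3://test-bucket/all/dvc-test/input/test_1232.csv"))
# ('s3://test-bucket/all/dvc-test/input/test_1232.csv', None)
```

This is why the dvc add s3://... invocation in the original report hit AccessDenied: the put went out without the bucket-owner-full-control ACL the remote config specified.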

@efiop
Contributor

efiop commented Apr 14, 2021

Closing as stale.

@efiop efiop closed this as completed Apr 14, 2021