HADOOP-13278. S3AFileSystem mkdirs does not need to validate parent path components #100
According to S3 semantics, there is no conflict if a bucket contains a key named `a/b` and also a directory named `a/b/c`. "Directories" in S3 are, after all, nothing but prefixes.
However, the `mkdirs` call in `S3AFileSystem` goes out of its way to traverse every parent path component of the directory it is trying to create, making sure there is no file with that name. This is suboptimal for three main reasons; in particular, when you call `mkdirs`, even on a prefix that you have access to, the traversal up the path will cause you to eventually hit the root of the bucket, which will fail with a 403 - even though the directory creation call itself would have succeeded.
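For illustration, here is a rough sketch of the shape of that parent-path validation, written against the generic Hadoop `FileSystem` API; this is a simplified stand-in, not the actual `S3AFileSystem` code removed by this patch:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParentPathCheckSketch {
    // Walks from the new directory up toward the bucket root, rejecting
    // the mkdirs if any ancestor component exists as a file.
    static void validateParents(FileSystem fs, Path dir) throws IOException {
        Path current = dir.getParent();
        while (current != null) {
            try {
                // Under an IAM policy scoped to a prefix, this metadata
                // lookup starts returning 403 once `current` walks up past
                // the granted prefix, so the whole mkdirs fails even though
                // creating the directory itself would have been allowed.
                FileStatus status = fs.getFileStatus(current);
                if (status.isFile()) {
                    throw new FileAlreadyExistsException(
                        "Can't make directory under file " + current);
                }
            } catch (FileNotFoundException ignored) {
                // Nothing exists at this component; keep walking upward.
            }
            current = current.getParent();
        }
    }
}
```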
I've opened a ticket on the Hadoop JIRA. This pull request is a simple patch that removes that portion of the check. I have tested it with my team's instance of Spark + Luigi, and can confirm that it works and resolves the aforementioned permissions issue for a bucket on which we only had prefix access.

This is my first ticket/pull request against Hadoop, so let me know if I'm not following some convention properly :)