-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-18399. S3A Prefetch - SingleFilePerBlockCache to use LocalDirAllocator #5054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tested against
and
|
|
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
416ce37
to
e94d0ad
Compare
@steveloughran, requesting your review |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the allocator something passed in to the SingleFilePerBlockCache; create it in the owning class. so no need to pass down as a param, have a singleton, synchronized init etc.
|
||
private final PrefetchingStatistics prefetchingStatistics; | ||
|
||
private static LocalDirAllocator localDirAllocator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be static, as that would be shared across all fs instances, even those of different users.
better: constructor takes an allocator, which is created in s3a caching block manager.
* @throws IOException if there is an error writing the given block. | ||
*/ | ||
void put(int blockNumber, ByteBuffer buffer) throws IOException; | ||
void put(int blockNumber, ByteBuffer buffer, Configuration conf, String bufferDirConf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the allocator should be per instance for the cache; and so no need to pass in binding info here
1dc0b6e
to
e5ca459
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Addressed the above comments. |
@steveloughran could you please take another look? |
Re-tested the full test suit with and without prefetch against us-west-2, test results look good. |
FYI @mehakmeet @mukund-thakur if you have some bandwidth to review this PR |
Sorry for the late reply. We are really busy with other important stuff like data correctness issue in Azure connector and many others internally. So I can't prioritize this as prefetching is not an active feature. |
We'd love to take a look but this week is pretty busy for our team. I should have more time next week, will see if I can do earlier but I want to be realistic :') |
No worries, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments, otherwise looks good
|
||
this.ops = new BlockOperations(); | ||
this.ops.setDebug(false); | ||
this.conf = conf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add requireNonNull() checks here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After applying this change, a minor tests that intercepts (asserts) some arg based Exceptions, failed.
Fixed in the latest revision.
createObjectAttributes(path, fileStatus), | ||
createInputStreamCallbacks(auditSpan), | ||
inputStreamStats)); | ||
inputStreamStats, getConf())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put on a new line for consistency
|
||
@Override | ||
protected Path getCacheFilePath() throws IOException { | ||
protected Path getCacheFilePath(Configuration conf, LocalDirAllocator localDirAllocator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think this should be indented more
|
||
private static final Logger LOG = | ||
LoggerFactory.getLogger(ITestS3APrefetchingInputStream.class); | ||
private static final Logger LOG = LoggerFactory.getLogger(ITestS3APrefetchingInputStream.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed; just revert to make patch smaller. unless you really, really want this
} | ||
|
||
@Test | ||
public void testCacheFileExistence() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any cleanup of these files after? i know a maven clean will do it, but these may be be big files and create problems on docker container runs. best to do an rm of the dir, or at least the files, afterwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable to let the test clean it up rather than rely on mvn cleanup. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I had to remove this change in the latest revision.
The problem could arise with the file cleanup: we would not know which exact cache file was created by this test, meaning what if another test (with -Dprefetch option) running in parallel also creates a new cache file and we remove it as part of this test?
Edit: This is now taken care of in the latest revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: this is now taken care of with the latest commit. The test has been included with sequential-integration-tests
.
0922a44
to
399ad71
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
ee06d3f
to
c9331a2
Compare
🎊 +1 overall
This message was automatically generated. |
@mehakmeet I have addressed all comments. Could you please take a look? |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
mvninstall and shadedclient errors above are related to this latest commit 72b0122 |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented. main changes
- somehow get that allocator from s3afs all the way down
- tests to properly isolate temp dirs used including cleanup; all test suites which use
prepareTestConfiguration()
have this already, so: use its defined buffer dir, and call the method in new unit tests
|
||
private final PrefetchingStatistics prefetchingStatistics; | ||
|
||
// File attributes attached to any intermediate temporary file created during index creation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use javadocs
.../hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java
Show resolved
Hide resolved
.../hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java
Show resolved
Hide resolved
LOG.debug("Creating in caching input stream for {}", context.getPath()); | ||
final String bufferDir = | ||
conf.get(BUFFER_DIR) != null ? BUFFER_DIR : HADOOP_TMP_DIR; | ||
final LocalDirAllocator localDirAllocator = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are pretty expensive. s3afs creates one in directoryAllocator on demand...that code could be pulled out from createTmpFileForWrite() and also passed down here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i missed this one in the first place, should have passed down S3AFS dir allocator only.
Done.
|
||
testFile = new Path(testFileUri); | ||
prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE); | ||
fs = new S3AFileSystem(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
superclass setup creates an fs already; no need for another one.
@Override | ||
public synchronized void teardown() throws Exception { | ||
super.teardown(); | ||
File tmpFileDir = new File("target/build/test"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, here is why the tests are stamping on each other.
if you look at S3ATestUtils.prepareTestConfiguration()
you can see how it patches test configs with custom temp dirs. extract that from the fs used and delete it only
private static final class BlockManagerForTesting | ||
extends S3ACachingBlockManager { | ||
|
||
private static final Configuration CONF = new Configuration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use prepareTestConfiguration() to isolate the temp dir. this always sets BUFFER_DIR so no need for the checks on L142
S3AInputStreamStatistics stats = | ||
readContext.getS3AStatisticsContext().newInputStreamStatistics(); | ||
|
||
Configuration conf = new Configuration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as above on prepareTestConfiguration()
|
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Re-run against us-west-2:
|
assertEquals(0, cache.size()); | ||
assertFalse(cache.containsBlock(0)); | ||
cache.put(0, buffer1); | ||
cache.put(0, buffer1, CONF, new LocalDirAllocator(HADOOP_TMP_DIR)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Test* classes, using BUFFER_DIR
is not helpful as they don't use S3ATestUtils#prepareTestConfiguration
.
Hence, using HADOOP_TMP_DIR
for Test* classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
+1
…llocator (#5054) Contributed by Viraj Jasani
…llocator (#5054) Contributed by Viraj Jasani
…llocator (apache#5054) Contributed by Viraj Jasani
No description provided.