steveloughran
Contributor

backport branch for S3A prefetching; initial big commit

=========

Contains

HADOOP-18028. High performance S3A input stream (#4109)
Contributed by Bhalchandra Pandit.

HADOOP-18180. Replace use of twitter util-core with java futures (#4115)
Contributed by PJ Fanning.

HADOOP-18177. Document prefetching architecture. (#4205)
Contributed by Ahmar Suhail

HADOOP-18175. fix test failures with prefetching s3a input stream (#4212)
Contributed by Monthon Klongklaew

HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386)

* adds in new test for prefetching input stream
* creates streamStats before opening stream
* updates numBlocks calculation method
* fixes ITestS3AOpenCost.testOpenFileLongerLength
* drains stream async
* fixes failing unit test

Contributed by Ahmar Suhail
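
For readers new to the block model, the numBlocks item above can be pictured as a ceiling division of the file size by the block size. The sketch below shows that general idea only; it is an assumption about the shape of the calculation, not the code in the patch, and the names `BlockCountSketch`, `numBlocks` and `blockLength` are hypothetical.

```java
/** Hypothetical helper for illustration; not a class from the S3A prefetching code. */
final class BlockCountSketch {
  private BlockCountSketch() {}

  /** Number of fixed-size blocks needed to cover fileSize bytes. */
  static int numBlocks(long fileSize, int blockSize) {
    if (fileSize < 0) {
      throw new IllegalArgumentException("fileSize must be >= 0");
    }
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be > 0");
    }
    if (fileSize == 0) {
      return 0; // an empty file has no blocks
    }
    // Ceiling division, written so that no intermediate sum can overflow.
    return Math.toIntExact((fileSize - 1) / blockSize + 1);
  }

  /** Length of the block at the given index; only the last block may be short. */
  static int blockLength(long fileSize, int blockSize, int index) {
    int count = numBlocks(fileSize, blockSize);
    if (index < 0 || index >= count) {
      throw new IndexOutOfBoundsException("block index " + index);
    }
    long start = (long) index * blockSize;
    return (int) Math.min(blockSize, fileSize - start);
  }
}
```

The edge cases worth checking in any such calculation are an empty file, a file that is an exact multiple of the block size, and a file that is one byte longer than a multiple.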

HADOOP-18254. Disable S3A prefetching by default. (#4469)
Contributed by Ahmar Suhail

HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458)

This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with
new statistic names in StreamStatistics.

This stream is not (yet) IOStatisticsContext aware.

Contributed by Ahmar Suhail

HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk
HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums.
HADOOP-18318. Update class names to be clear they belong to S3A prefetching
Contributed by Steve Loughran

Change-Id: I3eca19564dc0c0cb83184f4a42605dbafd908937

Description of PR

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

This is the preview release of the HADOOP-18028 S3A performance input stream.
It is still stabilizing, but ready to test.

@ahmarsuhail
Contributor

looks good so far. not sure if this is helpful, but the patches that came after this big commit are (listed in the order they were committed to trunk):

  • ITestS3ACannedACLs failure; not in a span: JIRA, PR
  • fs.s3a.prefetch.block.size to be read through longBytesOption: JIRA, PR
  • s3a prefetching to use SemaphoredDelegatingExecutor for submitting work: JIRA, PR
  • hadoop-aws maven build to add a prefetch profile to run all tests with prefetching: JIRA, PR
  • s3a prefetching Executor should be closed: JIRA, PR & PR
  • Implement readFully(long position, byte[] buffer, int offset, int length): JIRA, PR
  • S3PrefetchingInputStream to support status probes when closed: JIRA, PR
  • assertion failure in ITestS3APrefetchingInputStream: JIRA, PR
  • Remove lower limit on s3a prefetching/caching block size: JIRA, PR
  • S3A prefetching: Error logging during reads: JIRA, PR
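
As a rough illustration of the readFully(long position, byte[] buffer, int offset, int length) item in the list above: a positioned readFully usually loops over a positioned read until the requested range is filled, and fails with an EOFException if the data ends first. This is a minimal sketch under that assumption, not the code from the patch; `PositionedSource`, `ReadFullySketch` and `chunked` are hypothetical names invented for the example.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stand-in for a stream that supports positioned reads. */
interface PositionedSource {
  /** Reads up to length bytes starting at position; returns the count, or -1 at end of data. */
  int read(long position, byte[] buffer, int offset, int length) throws IOException;
}

final class ReadFullySketch {
  private ReadFullySketch() {}

  /** Fills buffer[offset, offset + length) from the source, or throws EOFException. */
  static void readFully(PositionedSource in, long position,
      byte[] buffer, int offset, int length) throws IOException {
    if (position < 0) {
      throw new IllegalArgumentException("negative position");
    }
    if (offset < 0 || length < 0 || length > buffer.length - offset) {
      throw new IndexOutOfBoundsException("offset/length outside buffer");
    }
    int done = 0;
    while (done < length) {
      int n = in.read(position + done, buffer, offset + done, length - done);
      if (n < 0) {
        throw new EOFException("EOF after " + done + " of " + length + " bytes");
      }
      if (n == 0) {
        // Guard against a source that never makes progress.
        throw new IOException("read returned 0 bytes");
      }
      done += n;
    }
  }

  /** In-memory source that returns at most maxChunk bytes per call, to exercise the loop. */
  static PositionedSource chunked(byte[] data, int maxChunk) {
    return (position, buffer, offset, length) -> {
      if (position >= data.length) {
        return -1;
      }
      int n = (int) Math.min(Math.min(length, maxChunk), data.length - position);
      System.arraycopy(data, (int) position, buffer, offset, n);
      return n;
    };
  }
}
```

The chunked source exists only so the loop has to run more than once; a real stream may return short reads for other reasons.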

Patch available, but not merged yet:
SingleFilePerBlockCache to use LocalDirAllocator for file allocation: JIRA, PR

@steveloughran
Contributor Author

thanks for the list; will xref in the jira and apply the patches.

@steveloughran
Contributor Author

pushed the pr up as a feature branch

feature-HADOOP-18028-s3a-prefetch-branch-3.3
