From b089cf68fb46f7fdbb5d1d88b1b7d87d3988ebfc Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Wed, 20 Apr 2022 12:05:51 +0100 Subject: [PATCH 1/8] adds in prefetching architecture --- .../markdown/tools/hadoop-aws/prefetching.md | 137 ++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md new file mode 100644 index 0000000000000..6a976749a482a --- /dev/null +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -0,0 +1,137 @@ +# S3A Prefetching


This document explains the `S3PrefetchingInputStream` and the various components it uses.

This input stream implements prefetching and caching to improve read performance of the input stream. A high level overview of this feature can also be found in [this](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) blog post.

With prefetching, we divide the file into blocks of a fixed size (default is 8MB), associate buffers to these blocks, and then read data into these buffers asynchronously. We also potentially cache these blocks.

### Basic Concepts

* **File** : A binary blob of data stored on some storage device.
* **Block :** A file is divided into a number of blocks. The default size of a block is 8MB, but this can be configured. The size of the first n-1 blocks is the same, and the last block may be the same size or smaller.
* **Block based reading** : The granularity of read is one block. That is, either an entire block is read and returned, or none at all. Multiple blocks may be read in parallel. 
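The block arithmetic implied by these concepts can be sketched as follows. This is a hypothetical illustration of the rule above (all blocks equal-sized except possibly the last), not the actual Hadoop `BlockData` code:

```java
// Hypothetical sketch of the block layout described above: a file is
// split into fixed-size blocks, and only the last block may be smaller.
public class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed to cover a file (ceiling division).
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size of a given block; equals blockSize for all but possibly the last.
    static long blockSize(long blockNumber, long fileSize, long blockSize) {
        return Math.min(blockSize, fileSize - blockNumber * blockSize);
    }

    public static void main(String[] args) {
        // A 40MB file with the default 8MB block size has 5 equal blocks.
        System.out.println(blockCount(40 * MB, 8 * MB));        // 5
        // A 20MB file has 3 blocks: 8MB, 8MB and a final 4MB block.
        System.out.println(blockSize(2, 20 * MB, 8 * MB) / MB); // 4
    }
}
```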
+ +### Configuring the stream + +|Property |Meaning |Default | +|--- |--- |--- | +|fs.s3a.prefetch.enabled |Enable the prefetch input stream |TRUE | +|fs.s3a.prefetch.block.size |Size of a block |8MB | +|fs.s3a.prefetch.block.count |Number of blocks to prefetch |8 | + +### Key Components: + +`S3PrefetchingInputStream` - When prefetching is enabled, S3AFileSystem will return an instance of this class as the input stream. Depending on the file size, it will either use the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream. + +`S3InMemoryInputStream` - Underlying input stream used when the file size < configured block size. Will read the entire file into memory. + +`S3CachingInputStream` - Underlying input stream used when file size > configured block size. Uses asynchronous prefetching of blocks and caching to improve performance. + +`BlockData` - Holds information about the blocks in a file, such as: + +* Number of blocks in the file +* Block size +* State of each block (initially all blocks have state *NOT_READY*). Other states are: Queued, Ready, Cached. + +`BufferData` - Holds the buffer and additional information about it such as: + +* The block number this buffer is for +* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). Initial state of a buffer is blank. + +`CachingBlockManager` - Implements reading data into the buffer, prefetching and caching. + +`BufferPool` - Manages a fixed sized pool of buffers. It’s used by `CachingBlockManager` to acquire buffers. + +`S3File` - Implements operations to interact with S3 such as opening and closing the input stream to the S3 file. + +`S3Reader` - Implements reading from the stream opened by `S3File`. Reads from this input stream in blocks of 64KB. + +`FilePosition` - Provides functionality related to tracking the position in the file. Also gives access to the current buffer in use. 
`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. Each cache block is stored on the local disk as a separate file.

### Operation

### S3InMemoryInputStream

If we have a file of size 5MB and block size = 8MB, then since the file size is less than the block size, the `S3InMemoryInputStream` will be used.

If the caller makes the following read calls:

```
in.read(buffer, 0, 3MB);
in.read(buffer, 0, 2MB);
```

When the first read is issued, there is no buffer in use yet. We get the data in this file by calling the `ensureCurrentBuffer()` method, which ensures that a buffer with data is available to be read from.

The `ensureCurrentBuffer()` then:

* Reads data into a buffer by calling `S3Reader.read(ByteBuffer buffer, long offset, int size)`.
* `S3Reader` uses `S3File` to open an input stream to the S3 file by making a `getObject()` request with range as `(0, filesize)`.
* The `S3Reader` reads the entire file into the provided buffer, and once reading is complete closes the S3 stream and frees all underlying resources.
* Now that the entire file is in a buffer, this data is set in `FilePosition` so it can be accessed by the input stream.

The read operation now just gets the required bytes from the buffer in `FilePosition`.

When the second read is issued, there is already a valid buffer which can be used. Don’t do anything else, just read the required bytes from this buffer.

### S3CachingInputStream

Now, if we have a file of size 40MB and block size = 8MB, the `S3CachingInputStream` will be used.

### Sequential Reads

If the caller makes the following calls:

```
in.read(buffer, 0, 5MB)
in.read(buffer, 0, 8MB)
```

For the first read call, there is no valid buffer yet. `ensureCurrentBuffer()` is called, and for the first read(), prefetch count is set as 1. 
The current block (block 0) is read synchronously, while the block to be prefetched (block 1) is read asynchronously.

The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data into them. This process of acquiring a buffer from the pool works as follows:

* The buffer pool keeps a map of allocated buffers and a pool of available buffers. The size of this pool is prefetch block count + 1. For the default value of 8, we will have a buffer pool of size 9.
* If the pool is not yet at capacity, create a new buffer and add it to the pool.
* If it’s at capacity, check if any buffers with state = done can be released. Releasing a buffer means removing it from allocated and returning it back to the pool of available buffers.
* If we have no buffers with state = done currently then nothing will be released, so retry the above step at a fixed interval a few times till a buffer becomes available.
* If after multiple retries we still don’t have an available buffer, release a buffer in the ready state. The buffer for the block furthest from the current block is released.

Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in a *READY* state, we can just return it. This means that data was already read into this buffer asynchronously by a prefetch. If its state is *BLANK*, then data is read into it using `S3Reader.read(ByteBuffer buffer, long offset, int size)`.

For the second read call, `in.read(buffer, 0, 8MB)`, since the block size is 8MB and we have only read 5MB of block 0 so far, the remaining 3MB of block 0 will be read first. Once we’re done with this block, we’ll request the next block (block 1), which will already have been prefetched and so we can just start reading the remaining 5MB from it. Also, while reading from block 1 we will also issue prefetch requests for the next blocks. 
The number of blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`, with the default being 8. + + +### Random Reads + +If the caller makes the following calls: + + +``` +in.read(buffer, 0, 5MB) +in.seek(10MB) +in.read(buffer, 0, 4MB) +in.seek(2MB) +in.read(buffer, 0, 4MB) +``` + + +The `CachingInputStream` also caches prefetched blocks. This happens when a `seek()` is issued for outside the current block and the current block still has not been fully read. + +For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read completely so we should cache it as we will probably want to read from it again. + +When `seek(2MB)` is called, the position is back inside block 0. The next read can now be satisfied from the locally cached block, which is typically orders of magnitude faster than a network based read. + + From e5e9ea3075349df41fb52b5ae48b9f9cc3ed2d78 Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Wed, 20 Apr 2022 13:50:06 +0100 Subject: [PATCH 2/8] fixes formatting errors --- .../markdown/tools/hadoop-aws/prefetching.md | 22 +++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index 6a976749a482a..b21d150e3363b 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -1,3 +1,17 @@ + + # S3A Prefetching @@ -15,11 +29,11 @@ With prefetching, we divide the file into blocks of a fixed size (default is 8MB ### Configuring the stream -|Property |Meaning |Default | +|Property |Meaning |Default | |--- |--- |--- | -|fs.s3a.prefetch.enabled |Enable the prefetch input stream |TRUE | -|fs.s3a.prefetch.block.size |Size of a block |8MB | -|fs.s3a.prefetch.block.count |Number of blocks to prefetch |8 | +|fs.s3a.prefetch.enabled |Enable the 
prefetch input stream |TRUE | +|fs.s3a.prefetch.block.size |Size of a block |8MB | +|fs.s3a.prefetch.block.count |Number of blocks to prefetch |8 | ### Key Components: From 11d33c3d7ece25246642773efa3583c97eef10f7 Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Mon, 25 Apr 2022 17:31:55 +0100 Subject: [PATCH 3/8] updates doc as per review comments --- .../markdown/tools/hadoop-aws/prefetching.md | 152 +++++++++++------- 1 file changed, 94 insertions(+), 58 deletions(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index b21d150e3363b..ea0ad2e91678c 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -14,94 +14,114 @@ # S3A Prefetching - This document explains the `S3PrefetchingInputStream` and the various components it uses. -This input stream implements prefetching and caching to improve read performance of the input stream. A high level overview of this feature can also be found on [this](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) blogpost. +This input stream implements prefetching and caching to improve read performance of the input +stream. A high level overview of this feature was published in +[Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) +blogpost. -With prefetching, we divide the file into blocks of a fixed size (default is 8MB), associate buffers to these blocks, and then read data into these buffers asynchronously. We also potentially cache these blocks. 
+With prefetching, the input stream divides the remote file into blocks of a fixed size, associates +buffers to these blocks and then reads data into these buffers asynchronously. It also potentially +caches these blocks. ### Basic Concepts -* **File** : A binary blob of data stored on some storage device. -* **Block :** A file is divided into a number of blocks. The default size of a block is 8MB, but can be configured. The size of the first n-1 blocks is same, and the size of the last block may be same or smaller. -* **Block based reading** : The granularity of read is one block. That is, we read an entire block and return or none at all. Multiple blocks may be read in parallel. +* **Remote File**: A binary blob of data stored on some storage device. +* **Block**: A file is divided into a number of blocks. The size of the first n-1 blocks is same, + and the size of the last block may be same or smaller. +* **Block based reading**: The granularity of read is one block. That is, either an entire block is + read and returned or none at all. Multiple blocks may be read in parallel. ### Configuring the stream |Property |Meaning |Default | |--- |--- |--- | -|fs.s3a.prefetch.enabled |Enable the prefetch input stream |TRUE | -|fs.s3a.prefetch.block.size |Size of a block |8MB | -|fs.s3a.prefetch.block.count |Number of blocks to prefetch |8 | +|fs.s3a.prefetch.enabled |Enable the prefetch input stream |`true` | +|fs.s3a.prefetch.block.size |Size of a block |`8M` | +|fs.s3a.prefetch.block.count |Number of blocks to prefetch |`8` | -### Key Components: +### Key Components -`S3PrefetchingInputStream` - When prefetching is enabled, S3AFileSystem will return an instance of this class as the input stream. Depending on the file size, it will either use the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream. +`S3PrefetchingInputStream` - When prefetching is enabled, S3AFileSystem will return an instance of +this class as the input stream. 
Depending on the remote file size, it will either use +the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream. -`S3InMemoryInputStream` - Underlying input stream used when the file size < configured block size. Will read the entire file into memory. +`S3InMemoryInputStream` - Underlying input stream used when the remote file size < configured block +size. Will read the entire remote file into memory. -`S3CachingInputStream` - Underlying input stream used when file size > configured block size. Uses asynchronous prefetching of blocks and caching to improve performance. +`S3CachingInputStream` - Underlying input stream used when remote file size > configured block size. +Uses asynchronous prefetching of blocks and caching to improve performance. -`BlockData` - Holds information about the blocks in a file, such as: +`BlockData` - Holds information about the blocks in a remote file, such as: -* Number of blocks in the file +* Number of blocks in the remote file * Block size -* State of each block (initially all blocks have state *NOT_READY*). Other states are: Queued, Ready, Cached. +* State of each block (initially all blocks have state *NOT_READY*). Other states are: Queued, + Ready, Cached. `BufferData` - Holds the buffer and additional information about it such as: * The block number this buffer is for -* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). Initial state of a buffer is blank. +* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). Initial state of a buffer + is blank. `CachingBlockManager` - Implements reading data into the buffer, prefetching and caching. -`BufferPool` - Manages a fixed sized pool of buffers. It’s used by `CachingBlockManager` to acquire buffers. +`BufferPool` - Manages a fixed sized pool of buffers. It’s used by `CachingBlockManager` to acquire +buffers. 
-`S3File` - Implements operations to interact with S3 such as opening and closing the input stream to the S3 file. +`S3File` - Implements operations to interact with S3 such as opening and closing the input stream to +the remote file in S3. -`S3Reader` - Implements reading from the stream opened by `S3File`. Reads from this input stream in blocks of 64KB. +`S3Reader` - Implements reading from the stream opened by `S3File`. Reads from this input stream in +blocks of 64KB. -`FilePosition` - Provides functionality related to tracking the position in the file. Also gives access to the current buffer in use. +`FilePosition` - Provides functionality related to tracking the position in the file. Also gives +access to the current buffer in use. -`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. Each cache block is stored on the local disk as a separate file. +`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. Each cache +block is stored on the local disk as a separate file. ### Operation -### S3InMemoryInputStream +#### S3InMemoryInputStream -If we have a file with size 5MB, and block size = 8MB. Since file size is less than the block size, the `S3InMemoryInputStream` will be used. +For a remote file with size 5MB, and block size = 8MB, since file size is less than the block size, +the `S3InMemoryInputStream` will be used. If the caller makes the following read calls: - ``` in.read(buffer, 0, 3MB); in.read(buffer, 0, 2MB); ``` -When the first read is issued, there is no buffer in use yet. We get the data in this file by calling the `ensureCurrentBuffer()` method, which ensures that a buffer with data is available to be read from. +When the first read is issued, there is no buffer in use yet. The `S3InMemoryInputStream` gets the +data in this remote file by calling the `ensureCurrentBuffer()` method, which ensures that a buffer +with data is available to be read from. 
The `ensureCurrentBuffer()` then: -* Reads data into a buffer by calling `S3Reader.read(ByteBuffer buffer, long offset, int size)` -* `S3Reader` uses `S3File` to open an input stream to the S3 file by making a `getObject()` request with range as `(0, filesize)`. -* The S3Reader reads the entire file into the provided buffer, and once reading is complete closes the S3 stream and frees all underlying resources. -* Now the entire file is in a buffer, set this data in `FilePosition` so it can be accessed by the input stream. +* Reads data into a buffer by calling `S3Reader.read(ByteBuffer buffer, long offset, int size)`. +* `S3Reader` uses `S3File` to open an input stream to the remote file in S3 by making + a `getObject()` request with range as `(0, filesize)`. +* The `S3Reader` reads the entire remote file into the provided buffer, and once reading is complete + closes the S3 stream and frees all underlying resources. +* Now the entire remote file is in a buffer, set this data in `FilePosition` so it can be accessed + by the input stream. The read operation now just gets the required bytes from the buffer in `FilePosition`. -When the second read is issued, there is already a valid buffer which can be used. Don’t do anything else, just read the required bytes from this buffer. - -### S3CachingInputStream +When the second read is issued, there is already a valid buffer which can be used. Don’t do anything +else, just read the required bytes from this buffer. +#### S3CachingInputStream +If there is a remote file with size 40MB and block size = 8MB, the `S3CachingInputStream` will be +used. -[Image: image.png] - -Now, if we have a file with size 40MB, and block size = 8MB. The `S3CachingInputStream` will be used. - -### Sequential Reads +##### Sequential Reads If the caller makes the following calls: @@ -110,29 +130,42 @@ in.read(buffer, 0, 5MB) in.read(buffer, 0, 8MB) ``` -For the first read call, there is no valid buffer yet. 
`ensureCurrentBuffer()` is called, and for the first read(), prefetch count is set as 1. - -The current block (block 0) is read synchronously, while the blocks to be prefetched (block 1) is read asynchronously. +For the first read call, there is no valid buffer yet. `ensureCurrentBuffer()` is called, and for +the first `read()`, prefetch count is set as 1. -The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data into them. This process of acquiring the buffer pool works as follows: +The current block (block 0) is read synchronously, while the blocks to be prefetched (block 1) is +read asynchronously. +The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data +into them. This process of acquiring the buffer pool works as follows: -* The buffer pool keeps a map of allocated buffers and a pool of available buffers. The size of this pool is = prefetch block count + 1. For the default value of 8, we will have a buffer pool of size 9. +* The buffer pool keeps a map of allocated buffers and a pool of available buffers. The size of this + pool is = prefetch block count + 1. If the prefetch block count is 8, the buffer pool has a size + of 9. * If the pool is not yet at capacity, create a new buffer and add it to the pool. -* If it’s at capacity, check if any buffers with state = done can be released. Releasing a buffer means removing it from allocated and returning it back to the pool of available buffers. -* If we have no buffers with state = done currently then nothing will be released, so retry the above step at a fixed interval a few times till a buffer becomes available. -* If after multiple retries we still don’t have an available buffer, release a buffer in the ready state. The buffer for the block furthest from the current block is released. - -Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in a *READY* state, we can just return it. 
This means that data was already read into this buffer asynchronously by a prefetch. If it’s state is *BLANK,* then data is read into it using `S3Reader.read(ByteBuffer buffer, long offset, int size).` - -For the second read call, `in.read(buffer, 0, 8MB)`, since the block sizes are of 8MB and we have only read 5MB of block 0 so far, 3MB of the required data will be read from the current block 0. Once we’re done with this block, we’ll request the next block (block 1), which will already have been prefetched and so we can just start reading from it. Also, while reading from block 1 we will also issue prefetch requests for the next blocks. The number of blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`, with the default being 8. - - -### Random Reads +* If it’s at capacity, check if any buffers with state = done can be released. Releasing a buffer + means removing it from allocated and returning it back to the pool of available buffers. +* If there are no buffers with state = done currently then nothing will be released, so retry the + above step at a fixed interval a few times till a buffer becomes available. +* If after multiple retries there are still no available buffers, release a buffer in the ready + state. The buffer for the block furthest from the current block is released. + +Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in a *READY* state, it is +returned. This means that data was already read into this buffer asynchronously by a prefetch. If +it’s state is *BLANK,* then data is read into it +using `S3Reader.read(ByteBuffer buffer, long offset, int size).` + +For the second read call, `in.read(buffer, 0, 8MB)`, since the block sizes are of 8MB and only `5MB` +of block 0 has been read so far, 3MB of the required data will be read from the current block 0. 
+Once all data has been read from this block, `S3CachingInputStream` requests the next block ( +block 1), which will already have been prefetched and so it can just start reading from it. Also, +while reading from block 1 it will also issue prefetch requests for the next blocks. The number of +blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`. + +##### Random Reads If the caller makes the following calls: - ``` in.read(buffer, 0, 5MB) in.seek(10MB) @@ -141,11 +174,14 @@ in.seek(2MB) in.read(buffer, 0, 4MB) ``` +The `CachingInputStream` also caches prefetched blocks. This happens when a `seek()` is issued for +outside the current block and the current block still has not been fully read. -The `CachingInputStream` also caches prefetched blocks. This happens when a `seek()` is issued for outside the current block and the current block still has not been fully read. - -For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read completely so we should cache it as we will probably want to read from it again. +For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read +completely so cache it as the caller will probably want to read from it again. -When `seek(2MB)` is called, the position is back inside block 0. The next read can now be satisfied from the locally cached block, which is typically orders of magnitude faster than a network based read. +When `seek(2MB)` is called, the position is back inside block 0. The next read can now be satisfied +from the locally cached block, which is typically orders of magnitude faster than a network based +read. 
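The caching decision described in the Random Reads section can be sketched as a small predicate. This is a hypothetical simplification of the rule above (cache the current block when a `seek()` leaves it before it is fully read), not the actual `S3CachingInputStream` code:

```java
// Hypothetical sketch of the caching rule described above: when seek()
// moves outside the current block before that block has been fully read,
// the block is cached locally because it is likely to be read again.
public class SeekCachingRule {
    static boolean shouldCache(long seekTarget, long blockStart, long blockEnd,
                               boolean blockFullyRead) {
        boolean outsideCurrentBlock = seekTarget < blockStart || seekTarget >= blockEnd;
        return outsideCurrentBlock && !blockFullyRead;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        // seek(10MB) while positioned in block 0 (bytes 0-8MB), block not fully read:
        System.out.println(shouldCache(10 * MB, 0, 8 * MB, false)); // true
        // seek(2MB) stays inside block 0, so nothing new is cached:
        System.out.println(shouldCache(2 * MB, 0, 8 * MB, false));  // false
    }
}
```

With this rule, the example sequence above caches block 0 at `seek(10MB)`, so the later `seek(2MB)` read is served from local disk.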
From 24380d95f1da4aab12af6183f4302e513e87f6dc Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Mon, 25 Apr 2022 17:46:05 +0100 Subject: [PATCH 4/8] fixes typo --- .../src/site/markdown/tools/hadoop-aws/prefetching.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index ea0ad2e91678c..eb52ac57a9564 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -19,7 +19,7 @@ This document explains the `S3PrefetchingInputStream` and the various components This input stream implements prefetching and caching to improve read performance of the input stream. A high level overview of this feature was published in [Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) -blogpost. +. With prefetching, the input stream divides the remote file into blocks of a fixed size, associates buffers to these blocks and then reads data into these buffers asynchronously. 
It also potentially From ba1d26a0f77b6d3795cdeb923c1403dfa31e7ded Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Tue, 26 Apr 2022 10:01:55 +0100 Subject: [PATCH 5/8] adds in block file description --- .../src/site/markdown/tools/hadoop-aws/prefetching.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index eb52ac57a9564..432454ab06d6d 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -28,6 +28,7 @@ caches these blocks. ### Basic Concepts * **Remote File**: A binary blob of data stored on some storage device. +* **Block File**: Local file containing a block of the remote file. * **Block**: A file is divided into a number of blocks. The size of the first n-1 blocks is same, and the size of the last block may be same or smaller. * **Block based reading**: The granularity of read is one block. That is, either an entire block is @@ -81,7 +82,7 @@ blocks of 64KB. access to the current buffer in use. `SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. Each cache -block is stored on the local disk as a separate file. +block is stored on the local disk as a separate block file. ### Operation @@ -181,7 +182,5 @@ For the above read sequence, when the `seek(10MB)` call is issued, block 0 has n completely so cache it as the caller will probably want to read from it again. When `seek(2MB)` is called, the position is back inside block 0. The next read can now be satisfied -from the locally cached block, which is typically orders of magnitude faster than a network based -read. - - +from the locally cached block file, which is typically orders of magnitude faster than a network +based read. 
\ No newline at end of file From 125152ef64d24083abb7308978302285c1720fb6 Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Tue, 26 Apr 2022 10:41:01 +0100 Subject: [PATCH 6/8] updates formatting --- .../markdown/tools/hadoop-aws/prefetching.md | 110 +++++++++--------- 1 file changed, 58 insertions(+), 52 deletions(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index 432454ab06d6d..be081c36da879 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -17,39 +17,42 @@ This document explains the `S3PrefetchingInputStream` and the various components it uses. This input stream implements prefetching and caching to improve read performance of the input -stream. A high level overview of this feature was published in -[Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) -. +stream. +A high level overview of this feature was published in +[Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0). With prefetching, the input stream divides the remote file into blocks of a fixed size, associates -buffers to these blocks and then reads data into these buffers asynchronously. It also potentially -caches these blocks. +buffers to these blocks and then reads data into these buffers asynchronously. +It also potentially caches these blocks. ### Basic Concepts * **Remote File**: A binary blob of data stored on some storage device. 
* **Block File**: Local file containing a block of the remote file. -* **Block**: A file is divided into a number of blocks. The size of the first n-1 blocks is same, - and the size of the last block may be same or smaller. -* **Block based reading**: The granularity of read is one block. That is, either an entire block is - read and returned or none at all. Multiple blocks may be read in parallel. +* **Block**: A file is divided into a number of blocks. +The size of the first n-1 blocks is the same, and the last block may be the same size or smaller. +* **Block based reading**: The granularity of read is one block. +That is, either an entire block is read and returned or none at all. +Multiple blocks may be read in parallel. ### Configuring the stream |Property |Meaning |Default | |--- |--- |--- | -|fs.s3a.prefetch.enabled |Enable the prefetch input stream |`true` | -|fs.s3a.prefetch.block.size |Size of a block |`8M` | -|fs.s3a.prefetch.block.count |Number of blocks to prefetch |`8` | +|`fs.s3a.prefetch.enabled` |Enable the prefetch input stream |`true` | +|`fs.s3a.prefetch.block.size` |Size of a block |`8M` | +|`fs.s3a.prefetch.block.count` |Number of blocks to prefetch |`8` | ### Key Components `S3PrefetchingInputStream` - When prefetching is enabled, S3AFileSystem will return an instance of -this class as the input stream. Depending on the remote file size, it will either use +this class as the input stream. +Depending on the remote file size, it will either use the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream. `S3InMemoryInputStream` - Underlying input stream used when the remote file size < configured block -size. Will read the entire remote file into memory. +size. +Will read the entire remote file into memory. `S3CachingInputStream` - Underlying input stream used when remote file size > configured block size. Uses asynchronous prefetching of blocks and caching to improve performance. 
@@ -58,31 +61,31 @@ Uses asynchronous prefetching of blocks and caching to improve performance. * Number of blocks in the remote file * Block size -* State of each block (initially all blocks have state *NOT_READY*). Other states are: Queued, - Ready, Cached. +* State of each block (initially all blocks have state *NOT_READY*). +Other states are: Queued, Ready, Cached. `BufferData` - Holds the buffer and additional information about it such as: * The block number this buffer is for -* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). Initial state of a buffer - is blank. +* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). +Initial state of a buffer is blank. `CachingBlockManager` - Implements reading data into the buffer, prefetching and caching. -`BufferPool` - Manages a fixed sized pool of buffers. It’s used by `CachingBlockManager` to acquire -buffers. +`BufferPool` - Manages a fixed sized pool of buffers. +It’s used by `CachingBlockManager` to acquire buffers. `S3File` - Implements operations to interact with S3 such as opening and closing the input stream to the remote file in S3. -`S3Reader` - Implements reading from the stream opened by `S3File`. Reads from this input stream in -blocks of 64KB. +`S3Reader` - Implements reading from the stream opened by `S3File`. +Reads from this input stream in blocks of 64KB. -`FilePosition` - Provides functionality related to tracking the position in the file. Also gives -access to the current buffer in use. +`FilePosition` - Provides functionality related to tracking the position in the file. +Also gives access to the current buffer in use. -`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. Each cache -block is stored on the local disk as a separate block file. +`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. +Each cache block is stored on the local disk as a separate block file. 
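The selection rule described under Key Components can be illustrated with a small helper. This is a hypothetical sketch, not the actual `S3AFileSystem` code; the behaviour when file size exactly equals the block size is an assumption here, since the document only specifies the strict `<` and `>` cases:

```java
// Hypothetical sketch of how the underlying stream is chosen, per the
// rule above: a file smaller than one block is read fully into memory,
// anything larger uses the caching/prefetching stream. Treating the
// equal-size boundary case as "caching" is an assumption.
public class StreamChooser {
    static String chooseStream(long fileSize, long blockSize) {
        return fileSize < blockSize ? "S3InMemoryInputStream" : "S3CachingInputStream";
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        System.out.println(chooseStream(5 * MB, 8 * MB));  // S3InMemoryInputStream
        System.out.println(chooseStream(40 * MB, 8 * MB)); // S3CachingInputStream
    }
}
```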
### Operation

@@ -98,9 +101,9 @@ in.read(buffer, 0, 3MB);
in.read(buffer, 0, 2MB);
```

-When the first read is issued, there is no buffer in use yet. The `S3InMemoryInputStream` gets the
-data in this remote file by calling the `ensureCurrentBuffer()` method, which ensures that a buffer
-with data is available to be read from.
+When the first read is issued, there is no buffer in use yet.
+The `S3InMemoryInputStream` gets the data in this remote file by calling the `ensureCurrentBuffer()`
+method, which ensures that a buffer with data is available to be read from.

The `ensureCurrentBuffer()` then:

@@ -114,8 +117,8 @@ The `ensureCurrentBuffer()` then:

The read operation now just gets the required bytes from the buffer in `FilePosition`.

-When the second read is issued, there is already a valid buffer which can be used. Don’t do anything
-else, just read the required bytes from this buffer.
+When the second read is issued, there is already a valid buffer which can be used.
+Don’t do anything else; just read the required bytes from this buffer.

#### S3CachingInputStream

@@ -131,8 +134,8 @@ in.read(buffer, 0, 5MB)
in.read(buffer, 0, 8MB)
```

-For the first read call, there is no valid buffer yet. `ensureCurrentBuffer()` is called, and for
-the first `read()`, prefetch count is set as 1.
+For the first read call, there is no valid buffer yet.
+`ensureCurrentBuffer()` is called, and for the first `read()`, the prefetch count is set to 1.

The current block (block 0) is read synchronously, while the block to be prefetched (block 1) is
read asynchronously.

@@ -140,28 +143,30 @@ read asynchronously.

The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data
into them. This process of acquiring a buffer from the pool works as follows:

-* The buffer pool keeps a map of allocated buffers and a pool of available buffers. The size of this
- pool is = prefetch block count + 1. If the prefetch block count is 8, the buffer pool has a size
- of 9.
+* The buffer pool keeps a map of allocated buffers and a pool of available buffers.
+The size of this pool is prefetch block count + 1.
+If the prefetch block count is 8, the buffer pool has a size of 9.
* If the pool is not yet at capacity, create a new buffer and add it to the pool.
-* If it’s at capacity, check if any buffers with state = done can be released. Releasing a buffer
- means removing it from allocated and returning it back to the pool of available buffers.
+* If it’s at capacity, check if any buffers with state = done can be released.
+Releasing a buffer means removing it from allocated and returning it to the pool of available
+buffers.
* If there are currently no buffers with state = done, then nothing will be released, so retry the
above step at a fixed interval a few times until a buffer becomes available.
-* If after multiple retries there are still no available buffers, release a buffer in the ready
- state. The buffer for the block furthest from the current block is released.
+* If after multiple retries there are still no available buffers, release a buffer in the ready state.
+The buffer for the block furthest from the current block is released.

Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in a *READY* state, it is
-returned. This means that data was already read into this buffer asynchronously by a prefetch. If
-it’s state is *BLANK,* then data is read into it
-using `S3Reader.read(ByteBuffer buffer, long offset, int size).`
+returned.
+This means that data was already read into this buffer asynchronously by a prefetch.
+If its state is *BLANK*, then data is read into it using
+`S3Reader.read(ByteBuffer buffer, long offset, int size)`.

-For the second read call, `in.read(buffer, 0, 8MB)`, since the block sizes are of 8MB and only `5MB`
+For the second read call, `in.read(buffer, 0, 8MB)`, since the block size is 8MB and only 5MB
of block 0 has been read so far, 3MB of the required data will be read from the current block 0.
Once all data has been read from this block, `S3CachingInputStream` requests the next block (
-block 1), which will already have been prefetched and so it can just start reading from it. Also,
-while reading from block 1 it will also issue prefetch requests for the next blocks. The number of
-blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`.
+block 1), which will already have been prefetched and so it can just start reading from it.
+While reading from block 1, it will also issue prefetch requests for the next blocks.
+The number of blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`.

##### Random Reads

@@ -175,12 +180,13 @@ in.seek(2MB)
in.read(buffer, 0, 4MB)
```

-The `CachingInputStream` also caches prefetched blocks. This happens when a `seek()` is issued for
-outside the current block and the current block still has not been fully read.
+The `S3CachingInputStream` also caches prefetched blocks.
+This happens when a `seek()` is issued for a position outside the current block and the current
+block still has not been fully read.

For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read
completely so cache it as the caller will probably want to read from it again.

-When `seek(2MB)` is called, the position is back inside block 0. The next read can now be satisfied
-from the locally cached block file, which is typically orders of magnitude faster than a network
-based read.
+When `seek(2MB)` is called, the position is back inside block 0.
+The next read can now be satisfied from the locally cached block file, which is typically orders of +magnitude faster than a network based read. \ No newline at end of file From 658396acb384ba50103e240e890f31cf1355388a Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Tue, 26 Apr 2022 13:13:40 +0100 Subject: [PATCH 7/8] fixes yetus errors --- .../markdown/tools/hadoop-aws/prefetching.md | 62 +++++++++---------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index be081c36da879..d1d735b6922f1 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -22,23 +22,23 @@ A high level overview of this feature was published in [Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0). With prefetching, the input stream divides the remote file into blocks of a fixed size, associates -buffers to these blocks and then reads data into these buffers asynchronously. +buffers to these blocks and then reads data into these buffers asynchronously. It also potentially caches these blocks. ### Basic Concepts * **Remote File**: A binary blob of data stored on some storage device. * **Block File**: Local file containing a block of the remote file. -* **Block**: A file is divided into a number of blocks. +* **Block**: A file is divided into a number of blocks. The size of the first n-1 blocks is same, and the size of the last block may be same or smaller. -* **Block based reading**: The granularity of read is one block. -That is, either an entire block is read and returned or none at all. 
+* **Block based reading**: The granularity of read is one block. +That is, either an entire block is read and returned or none at all. Multiple blocks may be read in parallel. ### Configuring the stream |Property |Meaning |Default | -|--- |--- |--- | +|---|---|---| |`fs.s3a.prefetch.enabled` |Enable the prefetch input stream |`true` | |`fs.s3a.prefetch.block.size` |Size of a block |`8M` | |`fs.s3a.prefetch.block.count` |Number of blocks to prefetch |`8` | @@ -46,12 +46,12 @@ Multiple blocks may be read in parallel. ### Key Components `S3PrefetchingInputStream` - When prefetching is enabled, S3AFileSystem will return an instance of -this class as the input stream. +this class as the input stream. Depending on the remote file size, it will either use the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream. `S3InMemoryInputStream` - Underlying input stream used when the remote file size < configured block -size. +size. Will read the entire remote file into memory. `S3CachingInputStream` - Underlying input stream used when remote file size > configured block size. @@ -61,30 +61,30 @@ Uses asynchronous prefetching of blocks and caching to improve performance. * Number of blocks in the remote file * Block size -* State of each block (initially all blocks have state *NOT_READY*). +* State of each block (initially all blocks have state *NOT_READY*). Other states are: Queued, Ready, Cached. `BufferData` - Holds the buffer and additional information about it such as: * The block number this buffer is for -* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). +* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). Initial state of a buffer is blank. `CachingBlockManager` - Implements reading data into the buffer, prefetching and caching. -`BufferPool` - Manages a fixed sized pool of buffers. +`BufferPool` - Manages a fixed sized pool of buffers. It’s used by `CachingBlockManager` to acquire buffers. 
`S3File` - Implements operations to interact with S3 such as opening and closing the input stream to the remote file in S3. -`S3Reader` - Implements reading from the stream opened by `S3File`. +`S3Reader` - Implements reading from the stream opened by `S3File`. Reads from this input stream in blocks of 64KB. -`FilePosition` - Provides functionality related to tracking the position in the file. +`FilePosition` - Provides functionality related to tracking the position in the file. Also gives access to the current buffer in use. -`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. +`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system. Each cache block is stored on the local disk as a separate block file. ### Operation @@ -101,8 +101,8 @@ in.read(buffer, 0, 3MB); in.read(buffer, 0, 2MB); ``` -When the first read is issued, there is no buffer in use yet. -The `S3InMemoryInputStream` gets the data in this remote file by calling the `ensureCurrentBuffer()` +When the first read is issued, there is no buffer in use yet. +The `S3InMemoryInputStream` gets the data in this remote file by calling the `ensureCurrentBuffer()` method, which ensures that a buffer with data is available to be read from. The `ensureCurrentBuffer()` then: @@ -117,7 +117,7 @@ The `ensureCurrentBuffer()` then: The read operation now just gets the required bytes from the buffer in `FilePosition`. -When the second read is issued, there is already a valid buffer which can be used. +When the second read is issued, there is already a valid buffer which can be used. Don’t do anything else, just read the required bytes from this buffer. #### S3CachingInputStream @@ -134,7 +134,7 @@ in.read(buffer, 0, 5MB) in.read(buffer, 0, 8MB) ``` -For the first read call, there is no valid buffer yet. +For the first read call, there is no valid buffer yet. `ensureCurrentBuffer()` is called, and for the first `read()`, prefetch count is set as 1. 
The current block (block 0) is read synchronously, while the blocks to be prefetched (block 1) is @@ -143,29 +143,29 @@ read asynchronously. The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data into them. This process of acquiring the buffer pool works as follows: -* The buffer pool keeps a map of allocated buffers and a pool of available buffers. -The size of this pool is = prefetch block count + 1. +* The buffer pool keeps a map of allocated buffers and a pool of available buffers. +The size of this pool is = prefetch block count + 1. If the prefetch block count is 8, the buffer pool has a size of 9. * If the pool is not yet at capacity, create a new buffer and add it to the pool. -* If it’s at capacity, check if any buffers with state = done can be released. -Releasing a buffer means removing it from allocated and returning it back to the pool of available +* If it's at capacity, check if any buffers with state = done can be released. +Releasing a buffer means removing it from allocated and returning it back to the pool of available buffers. * If there are no buffers with state = done currently then nothing will be released, so retry the above step at a fixed interval a few times till a buffer becomes available. -* If after multiple retries there are still no available buffers, release a buffer in the ready state. +* If after multiple retries there are still no available buffers, release a buffer in the ready state. The buffer for the block furthest from the current block is released. Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in a *READY* state, it is -returned. -This means that data was already read into this buffer asynchronously by a prefetch. -If it’s state is *BLANK,* then data is read into it using +returned. +This means that data was already read into this buffer asynchronously by a prefetch. 
+If its state is *BLANK*, then data is read into it using
`S3Reader.read(ByteBuffer buffer, long offset, int size).`

For the second read call, `in.read(buffer, 0, 8MB)`, since the block sizes are of 8MB and only 5MB
of block 0 has been read so far, 3MB of the required data will be read from the current block 0.
Once all data has been read from this block, `S3CachingInputStream` requests the next block (
-block 1), which will already have been prefetched and so it can just start reading from it. 
-Also, while reading from block 1 it will also issue prefetch requests for the next blocks. 
+block 1), which will already have been prefetched and so it can just start reading from it.
+Also, while reading from block 1 it will also issue prefetch requests for the next blocks.
The number of blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`.

##### Random Reads

@@ -180,13 +180,13 @@ in.seek(2MB)
in.read(buffer, 0, 4MB)
```

-The `CachingInputStream` also caches prefetched blocks. 
-This happens when a `seek()` is issued for outside the current block and the current block still has 
+The `CachingInputStream` also caches prefetched blocks.
+This happens when a `seek()` is issued for outside the current block and the current block still has
not been fully read.

For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read
completely so cache it as the caller will probably want to read from it again.

-When `seek(2MB)` is called, the position is back inside block 0. 
-The next read can now be satisfied from the locally cached block file, which is typically orders of 
+When `seek(2MB)` is called, the position is back inside block 0.
+The next read can now be satisfied from the locally cached block file, which is typically orders of
magnitude faster than a network based read.
\ No newline at end of file From 9558361263d879b0bef5f452528dca5f18abdc15 Mon Sep 17 00:00:00 2001 From: Ahmar Suhail Date: Tue, 26 Apr 2022 13:47:06 +0100 Subject: [PATCH 8/8] update quotation marks --- .../src/site/markdown/tools/hadoop-aws/prefetching.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md index d1d735b6922f1..db1a52b88e711 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md @@ -73,7 +73,7 @@ Initial state of a buffer is blank. `CachingBlockManager` - Implements reading data into the buffer, prefetching and caching. `BufferPool` - Manages a fixed sized pool of buffers. -It’s used by `CachingBlockManager` to acquire buffers. +It's used by `CachingBlockManager` to acquire buffers. `S3File` - Implements operations to interact with S3 such as opening and closing the input stream to the remote file in S3. @@ -118,7 +118,7 @@ The `ensureCurrentBuffer()` then: The read operation now just gets the required bytes from the buffer in `FilePosition`. When the second read is issued, there is already a valid buffer which can be used. -Don’t do anything else, just read the required bytes from this buffer. +Don't do anything else, just read the required bytes from this buffer. #### S3CachingInputStream
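The buffer-acquisition policy described for `BufferPool` and `CachingBlockManager` (pool capacity = prefetch block count + 1; prefer creating a new buffer, then releasing a buffer with state = done, then, as a last resort, the ready buffer whose block is furthest from the requested one) can be sketched as a simplified, single-threaded model. All names below are illustrative, not the hadoop-aws implementation, and the real code retries at a fixed interval before evicting a ready buffer:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of buffer acquisition from a fixed-size pool.
public class PoolSketch {
    public enum State { BLANK, READY, DONE }

    private final int capacity;                        // prefetch block count + 1
    private final Map<Integer, State> allocated = new HashMap<>(); // block -> state

    public PoolSketch(int prefetchBlockCount) {
        this.capacity = prefetchBlockCount + 1;        // e.g. count 8 -> pool of 9
    }

    /** Try to acquire a buffer for blockNumber, evicting another if needed. */
    public boolean acquire(int blockNumber) {
        if (allocated.size() < capacity) {             // room left: create a buffer
            allocated.put(blockNumber, State.BLANK);
            return true;
        }
        Integer victim = null;
        for (Map.Entry<Integer, State> e : allocated.entrySet()) {
            if (e.getValue() == State.DONE) {          // prefer releasing a DONE buffer
                victim = e.getKey();
                break;
            }
        }
        if (victim == null) {                          // else: furthest READY buffer
            int bestDistance = -1;
            for (Map.Entry<Integer, State> e : allocated.entrySet()) {
                int distance = Math.abs(e.getKey() - blockNumber);
                if (e.getValue() == State.READY && distance > bestDistance) {
                    victim = e.getKey();
                    bestDistance = distance;
                }
            }
        }
        if (victim == null) {
            return false;                              // real code would retry here
        }
        allocated.remove(victim);
        allocated.put(blockNumber, State.BLANK);
        return true;
    }

    public void setState(int blockNumber, State state) {
        allocated.put(blockNumber, state);
    }

    public boolean holds(int blockNumber) {
        return allocated.containsKey(blockNumber);
    }
}
```

The extra slot in the capacity (count + 1) presumably leaves one buffer for the block currently being read in addition to the prefetched blocks; evicting the furthest block first matches the intuition that, for a mostly sequential reader, distant blocks are the least likely to be needed again soon.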