Checksum validation of archives (#2281) #3546

Merged
merged 28 commits into from
Oct 3, 2022
Commits
1fefabe
Add CRC32C checksum methods from sigurdm's branch
nehalvpatel Sep 1, 2022
8c3bffc
Add CRC32C checksum validation for packages served from GCS
nehalvpatel Sep 1, 2022
7755d07
Replace stream_transform dependency with compact implementation
nehalvpatel Sep 2, 2022
7d84f89
Update lib/src/source/hosted.dart
nehalvpatel Sep 6, 2022
999d804
Update lib/src/source/hosted.dart
nehalvpatel Sep 6, 2022
3c6eed1
Update lib/src/source/hosted.dart
nehalvpatel Sep 6, 2022
25fa20e
Recompose functionality without extension
nehalvpatel Sep 12, 2022
415481b
Add cache-control request header to golden test log
nehalvpatel Sep 12, 2022
dfabdac
Add basic checksum validation tests and ability to pass arbitrary res…
nehalvpatel Sep 16, 2022
f49da42
Add checksum header and log line to golden test log
nehalvpatel Sep 16, 2022
e5240d4
Add tests for invalid checksum headers
nehalvpatel Sep 16, 2022
6b451bb
Add file name to checksum IO log
nehalvpatel Sep 16, 2022
b38352f
Document header in archive request
nehalvpatel Sep 19, 2022
aba4b79
Improve error handling, logging, and tests
nehalvpatel Sep 19, 2022
499e4ef
Retry downloads after checksum validation errors
nehalvpatel Sep 20, 2022
d505b29
Simplify checksum logic and improve error handling
nehalvpatel Sep 20, 2022
0d1b87f
Use max retries environment variable
nehalvpatel Sep 21, 2022
f1d40d5
Improve tests
nehalvpatel Sep 22, 2022
0265897
Test CRC32C checksum related retries
nehalvpatel Sep 22, 2022
2656135
Swap request header line to "accept-encoding" in golden log
nehalvpatel Sep 22, 2022
cb5d6ef
Fix CRC32C fine log RegExp in golden log test
nehalvpatel Sep 22, 2022
6a12e42
Remove explicit Accept-Encoding request header because it is included…
nehalvpatel Sep 22, 2022
79f5a9b
Add test for redundant gzip compression
nehalvpatel Sep 27, 2022
9b0e5ec
Update Content-Type response headers to match Hosted Pub Repository spec
nehalvpatel Sep 27, 2022
ce3a0ec
Update lib/src/source/hosted.dart
nehalvpatel Sep 27, 2022
8e44bbb
Update documentation for autoCompress
nehalvpatel Sep 29, 2022
0ed432d
Update error message for checksum mismatch
nehalvpatel Sep 29, 2022
63e5f29
Add visibleForTesting annotation to checksumHeaderName
nehalvpatel Sep 30, 2022
103 changes: 103 additions & 0 deletions lib/src/crc32c.dart
@@ -0,0 +1,103 @@
// Copyright (c) 2022, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

/// Computes a crc32c checksum.
class Crc32c {
int _current = mask;
static const mask = 0xFFFFFFFF;

// Algorithm based on https://en.wikipedia.org/wiki/Cyclic_redundancy_check
void update(List<int> data) {
for (var i = 0; i < data.length; i++) {
final lookupIndex = (_current ^ data[i]) & 0xff;
_current = (_current >> 8) ^ _crcTable[lookupIndex];
}
}

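int finalize() {
// Finalize the CRC32C value by inverting all the bits. The parentheses make
// the intended order explicit, because `&` binds tighter than `^` in Dart.
return (_current ^ mask) & mask;
}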

/// Consumes the entirety of "stream" and returns the CRC32C checksum of its
/// data once the stream is finished.
static Future<int> computeByConsumingStream(Stream<List<int>> stream) async {
final checksumComputer = Crc32c();

await for (final chunk in stream) {
checksumComputer.update(chunk);
}

return checksumComputer.finalize();
}
}

// Generated by ./pycrc.py --algorithm=table-driven --model=crc-32c --generate=c
// See: https://pycrc.org/
const _crcTable = [
0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4, //
0xc79a971f, 0x35f1141c, 0x26a1e7e8, 0xd4ca64eb,
0x8ad958cf, 0x78b2dbcc, 0x6be22838, 0x9989ab3b,
0x4d43cfd0, 0xbf284cd3, 0xac78bf27, 0x5e133c24,
0x105ec76f, 0xe235446c, 0xf165b798, 0x030e349b,
0xd7c45070, 0x25afd373, 0x36ff2087, 0xc494a384,
0x9a879fa0, 0x68ec1ca3, 0x7bbcef57, 0x89d76c54,
0x5d1d08bf, 0xaf768bbc, 0xbc267848, 0x4e4dfb4b,
0x20bd8ede, 0xd2d60ddd, 0xc186fe29, 0x33ed7d2a,
0xe72719c1, 0x154c9ac2, 0x061c6936, 0xf477ea35,
0xaa64d611, 0x580f5512, 0x4b5fa6e6, 0xb93425e5,
0x6dfe410e, 0x9f95c20d, 0x8cc531f9, 0x7eaeb2fa,
0x30e349b1, 0xc288cab2, 0xd1d83946, 0x23b3ba45,
0xf779deae, 0x05125dad, 0x1642ae59, 0xe4292d5a,
0xba3a117e, 0x4851927d, 0x5b016189, 0xa96ae28a,
0x7da08661, 0x8fcb0562, 0x9c9bf696, 0x6ef07595,
0x417b1dbc, 0xb3109ebf, 0xa0406d4b, 0x522bee48,
0x86e18aa3, 0x748a09a0, 0x67dafa54, 0x95b17957,
0xcba24573, 0x39c9c670, 0x2a993584, 0xd8f2b687,
0x0c38d26c, 0xfe53516f, 0xed03a29b, 0x1f682198,
0x5125dad3, 0xa34e59d0, 0xb01eaa24, 0x42752927,
0x96bf4dcc, 0x64d4cecf, 0x77843d3b, 0x85efbe38,
0xdbfc821c, 0x2997011f, 0x3ac7f2eb, 0xc8ac71e8,
0x1c661503, 0xee0d9600, 0xfd5d65f4, 0x0f36e6f7,
0x61c69362, 0x93ad1061, 0x80fde395, 0x72966096,
0xa65c047d, 0x5437877e, 0x4767748a, 0xb50cf789,
0xeb1fcbad, 0x197448ae, 0x0a24bb5a, 0xf84f3859,
0x2c855cb2, 0xdeeedfb1, 0xcdbe2c45, 0x3fd5af46,
0x7198540d, 0x83f3d70e, 0x90a324fa, 0x62c8a7f9,
0xb602c312, 0x44694011, 0x5739b3e5, 0xa55230e6,
0xfb410cc2, 0x092a8fc1, 0x1a7a7c35, 0xe811ff36,
0x3cdb9bdd, 0xceb018de, 0xdde0eb2a, 0x2f8b6829,
0x82f63b78, 0x709db87b, 0x63cd4b8f, 0x91a6c88c,
0x456cac67, 0xb7072f64, 0xa457dc90, 0x563c5f93,
0x082f63b7, 0xfa44e0b4, 0xe9141340, 0x1b7f9043,
0xcfb5f4a8, 0x3dde77ab, 0x2e8e845f, 0xdce5075c,
0x92a8fc17, 0x60c37f14, 0x73938ce0, 0x81f80fe3,
0x55326b08, 0xa759e80b, 0xb4091bff, 0x466298fc,
0x1871a4d8, 0xea1a27db, 0xf94ad42f, 0x0b21572c,
0xdfeb33c7, 0x2d80b0c4, 0x3ed04330, 0xccbbc033,
0xa24bb5a6, 0x502036a5, 0x4370c551, 0xb11b4652,
0x65d122b9, 0x97baa1ba, 0x84ea524e, 0x7681d14d,
0x2892ed69, 0xdaf96e6a, 0xc9a99d9e, 0x3bc21e9d,
0xef087a76, 0x1d63f975, 0x0e330a81, 0xfc588982,
0xb21572c9, 0x407ef1ca, 0x532e023e, 0xa145813d,
0x758fe5d6, 0x87e466d5, 0x94b49521, 0x66df1622,
0x38cc2a06, 0xcaa7a905, 0xd9f75af1, 0x2b9cd9f2,
0xff56bd19, 0x0d3d3e1a, 0x1e6dcdee, 0xec064eed,
0xc38d26c4, 0x31e6a5c7, 0x22b65633, 0xd0ddd530,
0x0417b1db, 0xf67c32d8, 0xe52cc12c, 0x1747422f,
0x49547e0b, 0xbb3ffd08, 0xa86f0efc, 0x5a048dff,
0x8ecee914, 0x7ca56a17, 0x6ff599e3, 0x9d9e1ae0,
0xd3d3e1ab, 0x21b862a8, 0x32e8915c, 0xc083125f,
0x144976b4, 0xe622f5b7, 0xf5720643, 0x07198540,
0x590ab964, 0xab613a67, 0xb831c993, 0x4a5a4a90,
0x9e902e7b, 0x6cfbad78, 0x7fab5e8c, 0x8dc0dd8f,
0xe330a81a, 0x115b2b19, 0x020bd8ed, 0xf0605bee,
0x24aa3f05, 0xd6c1bc06, 0xc5914ff2, 0x37faccf1,
0x69e9f0d5, 0x9b8273d6, 0x88d28022, 0x7ab90321,
0xae7367ca, 0x5c18e4c9, 0x4f48173d, 0xbd23943e,
0xf36e6f75, 0x0105ec76, 0x12551f82, 0xe03e9c81,
0x34f4f86a, 0xc69f7b69, 0xd5cf889d, 0x27a40b9e,
0x79b737ba, 0x8bdcb4b9, 0x988c474d, 0x6ae7c44e,
0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351
];
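
As a sanity check on the table-driven update loop in `Crc32c`, here is a minimal sketch that rebuilds the lookup table at runtime instead of hard-coding it. It assumes the reflected Castagnoli polynomial `0x82F63B78` and the standard CRC32C check value for the ASCII string "123456789". The names `buildCrc32cTable` and `crc32cOf` are hypothetical and not part of this PR.

```dart
import 'dart:convert';

/// Builds the 256-entry CRC32C lookup table from the reflected Castagnoli
/// polynomial. Assumption: this reproduces the hard-coded `_crcTable`.
List<int> buildCrc32cTable() {
  const polynomial = 0x82F63B78;
  return List<int>.generate(256, (index) {
    var entry = index;
    for (var bit = 0; bit < 8; bit++) {
      // Shift right one bit; fold in the polynomial when a 1 falls off.
      entry = (entry & 1) == 1 ? (entry >> 1) ^ polynomial : entry >> 1;
    }
    return entry;
  });
}

/// Lazily built table shared by every call to [crc32cOf].
final List<int> crc32cTable = buildCrc32cTable();

/// One-shot CRC32C over [data], using the same loop shape as `Crc32c.update`
/// followed by the bit inversion done in `Crc32c.finalize`.
int crc32cOf(List<int> data) {
  var current = 0xFFFFFFFF;
  for (final byte in data) {
    current = (current >> 8) ^ crc32cTable[(current ^ byte) & 0xff];
  }
  return (current ^ 0xFFFFFFFF) & 0xFFFFFFFF;
}

/// Usage: the well-known check input for CRC32C.
int checkValue() => crc32cOf(utf8.encode('123456789')); // 0xE3069283
```

The class in the diff splits this into `update` and `finalize` so the checksum can be fed chunk by chunk while the archive downloads, rather than needing the whole payload in memory.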
9 changes: 9 additions & 0 deletions lib/src/exceptions.dart
@@ -104,6 +104,15 @@ class PackageNotFoundException extends WrappedException {
String toString() => 'Package not available ($message).';
}

/// A class for exceptions where a package's checksum could not be validated.
class PackageIntegrityException extends WrappedException {
PackageIntegrityException(
String message, {
Object? innerError,
StackTrace? innerTrace,
}) : super(message, innerError, innerTrace);
}

/// Returns whether [error] is a user-facing error object.
///
/// This includes both [ApplicationException] and any dart:io errors.
2 changes: 1 addition & 1 deletion lib/src/io.dart
@@ -172,7 +172,7 @@ List<int> readBinaryFile(String file) {
}

/// Reads the contents of the binary file [file] as a [Stream].
Stream<List<int>> readBinaryFileAsSream(String file) {
Stream<List<int>> readBinaryFileAsStream(String file) {
log.io('Reading binary file $file.');
var contents = File(file).openRead();
return contents;
143 changes: 127 additions & 16 deletions lib/src/source/hosted.dart
@@ -5,15 +5,19 @@
import 'dart:async';
import 'dart:convert';
import 'dart:io' as io;
import 'dart:math' as math;
import 'dart:typed_data';

import 'package:collection/collection.dart'
show maxBy, IterableNullableExtension;
import 'package:http/http.dart' as http;
import 'package:meta/meta.dart';
import 'package:path/path.dart' as p;
import 'package:pub_semver/pub_semver.dart';
import 'package:stack_trace/stack_trace.dart';

import '../authentication/client.dart';
import '../crc32c.dart';
import '../exceptions.dart';
import '../http.dart';
import '../io.dart';
@@ -851,27 +855,53 @@ class HostedSource extends CachedSource {
'Package $packageName has no version $version');
}

var url = versionInfo.archiveUrl;
log.io('Get package from $url.');
final archiveUrl = versionInfo.archiveUrl;
log.io('Get package from $archiveUrl.');
log.message('Downloading ${log.bold(id.name)} ${id.version}...');

// Download and extract the archive to a temp directory.
await withTempDir((tempDirForArchive) async {
var archivePath =
p.join(tempDirForArchive, '$packageName-$version.tar.gz');
var response = await withAuthenticatedClient(
cache,
Uri.parse(description.url),
(client) => client.send(http.Request('GET', url)));

// We download the archive to disk instead of streaming it directly into
// the tar unpacking. This simplifies stream handling.
// Package:tar cancels the stream when it reaches end-of-archive, and
// cancelling a http stream makes it not reusable.
// There are ways around this, and we might revisit this later.
await createFileFromStream(response.stream, archivePath);
var fileName = '$packageName-$version.tar.gz';
var archivePath = p.join(tempDirForArchive, fileName);

// The client from `withAuthenticatedClient` will retry HTTP requests.
// This wrapper is one layer up and will retry checksum validation errors.
await retry(
// Attempt to download archive and validate its checksum.
() async {
final request = http.Request('GET', archiveUrl);
final response = await withAuthenticatedClient(cache,
Uri.parse(description.url), (client) => client.send(request));
final expectedChecksum = _parseCrc32c(response.headers, fileName);

Stream<List<int>> stream = response.stream;
if (expectedChecksum != null) {
stream = _validateStream(
response.stream, expectedChecksum, id, archiveUrl);
}

// We download the archive to disk instead of streaming it directly
// into the tar unpacking. This simplifies stream handling.
// Package:tar cancels the stream when it reaches end-of-archive, and
// cancelling a http stream makes it not reusable.
// There are ways around this, and we might revisit this later.
await createFileFromStream(stream, archivePath);
},
// Retry if the checksum response header was malformed or the actual
// checksum did not match the expected checksum.
retryIf: (e) => e is PackageIntegrityException,
onRetry: (e, retryCount) => log
.io('Retry #${retryCount + 1} because of checksum error with GET '
'$archiveUrl...'),
maxAttempts: math.max(
1, // Having less than 1 attempt doesn't make sense.
int.tryParse(io.Platform.environment['PUB_MAX_HTTP_RETRIES'] ?? '') ??
7,
),
);

var tempDir = cache.createTempDir();
await extractTarGz(readBinaryFileAsSream(archivePath), tempDir);
await extractTarGz(readBinaryFileAsStream(archivePath), tempDir);

// Now that the get has succeeded, move it to the real location in the
// cache.
@@ -1100,3 +1130,84 @@ class _RefAndCache {
@override
bool operator ==(Object other) => other is _RefAndCache && other.ref == ref;
}

@visibleForTesting
const checksumHeaderName = 'x-goog-hash';

/// Adds a checksum validation "tap" to the response stream and returns a
/// wrapped `Stream` object, which should be used to consume the incoming data.
///
/// As chunks are received, a CRC32C checksum is updated.
/// Once the download is completed, the final checksum is compared with
/// the one present in the checksum response header.
///
/// Throws [PackageIntegrityException] if there is a checksum mismatch.
Stream<List<int>> _validateStream(Stream<List<int>> stream,
int expectedChecksum, PackageId id, Uri archiveUrl) async* {
final crc32c = Crc32c();

await for (final chunk in stream) {
crc32c.update(chunk);
yield chunk;
}

final actualChecksum = crc32c.finalize();

log.fine(
'Computed checksum $actualChecksum for ${id.name} ${id.version} with '
'expected CRC32C of $expectedChecksum.');

if (actualChecksum != expectedChecksum) {
throw PackageIntegrityException(
'Package archive for ${id.name} ${id.version} downloaded from '
'"$archiveUrl" has "x-goog-hash: crc32c=$expectedChecksum", which '
'doesn\'t match the checksum of the archive downloaded.');
}
}
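
The "tap" pattern used by `_validateStream` can be sketched in isolation. This is a simplified illustration, not the PR's code: it swaps CRC32C for a trivial byte sum so the block stays self-contained, and `byteSum`, `verifyWhileStreaming`, and `drain` are hypothetical names. It shows that every chunk is forwarded before the mismatch surfaces, so a consumer only learns about the failure once the stream ends.

```dart
import 'dart:async';

/// Hypothetical stand-in for a real checksum: sums all bytes modulo 2^32.
int byteSum(int running, List<int> chunk) =>
    chunk.fold<int>(running, (sum, byte) => (sum + byte) & 0xFFFFFFFF);

/// Forwards every chunk unchanged, then throws if the running checksum does
/// not match [expected] once the source stream is exhausted.
Stream<List<int>> verifyWhileStreaming(
    Stream<List<int>> source, int expected) async* {
  var running = 0;
  await for (final chunk in source) {
    running = byteSum(running, chunk);
    yield chunk;
  }
  if (running != expected) {
    throw FormatException('checksum mismatch: got $running, want $expected');
  }
}

/// Drains [stream] and reports how many chunks arrived before it ended,
/// and whether it ended normally or with a checksum error.
Future<String> drain(Stream<List<int>> stream) async {
  var chunks = 0;
  try {
    await for (final _ in stream) {
      chunks++;
    }
    return 'ok after $chunks chunks';
  } on FormatException {
    return 'mismatch after $chunks chunks';
  }
}
```

Because the error arrives only after the last chunk, anything the consumer wrote along the way has to be treated as untrusted. In the diff this is handled by writing to a temporary archive path and re-downloading on retry.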

/// Parses response [headers] and returns the archive's CRC32C checksum.
///
/// In most cases, GCS provides both MD5 and CRC32C checksums in its response
/// headers. It uses the header name "x-goog-hash" for these values. It has
/// been documented and observed that GCS will send multiple response headers
/// with the same "x-goog-hash" token as the key.
/// https://cloud.google.com/storage/docs/xml-api/reference-headers#xgooghash
///
/// Additionally, when the Dart http client encounters multiple response
/// headers with the same key, it concatenates their values with a comma
/// before inserting a single item with that key and concatenated value into
/// its response "headers" Map.
/// See https://github.com/dart-lang/http/issues/24
/// https://github.com/dart-lang/http/blob/06649afbb5847dbb0293816ba8348766b116e419/pkgs/http/lib/src/base_response.dart#L29
///
/// Throws [PackageIntegrityException] if the CRC32C checksum cannot be parsed.
int? _parseCrc32c(Map<String, String> headers, String fileName) {
final checksumHeader = headers[checksumHeaderName];
if (checksumHeader == null) return null;

final parts = checksumHeader.split(',');
for (final part in parts) {
if (part.startsWith('crc32c=')) {
final undecoded = part.substring('crc32c='.length);

try {
final bytes = base64Decode(undecoded);

// CRC32C must be 32 bits, or 4 bytes.
if (bytes.length != 4) {
throw FormatException('CRC32C checksum has invalid length', bytes);
}

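// The header carries the checksum as base64-encoded big-endian bytes, and
// `getUint32` reads big-endian by default.
return ByteData.view(bytes.buffer).getUint32(0);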
} on FormatException catch (e, s) {
throw PackageIntegrityException(
'Package archive "$fileName" has a malformed CRC32C checksum in '
'its response headers',
innerError: e,
innerTrace: s);
}
}
}

return null;
}
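
To make the header format concrete, here is a minimal standalone sketch of the parsing steps in `_parseCrc32c`. It is an illustration under assumptions: `parseCrc32cHeader` is a hypothetical name, it throws a plain `FormatException` rather than `PackageIntegrityException`, and it trims whitespace around each comma-separated part, which the code in the diff does not do. The sample header value is made up for the example.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Extracts the CRC32C checksum from a comma-joined "x-goog-hash" value.
/// Returns null when the header is absent or has no "crc32c=" part.
int? parseCrc32cHeader(String? headerValue) {
  if (headerValue == null) return null;

  for (final part in headerValue.split(',')) {
    // Assumption: tolerate optional whitespace after the comma.
    final trimmed = part.trim();
    if (!trimmed.startsWith('crc32c=')) continue;

    // base64Decode throws FormatException on malformed input.
    final bytes = base64Decode(trimmed.substring('crc32c='.length));
    if (bytes.length != 4) {
      throw FormatException('CRC32C checksum has invalid length', bytes);
    }

    // Read the four bytes as a big-endian unsigned 32-bit integer.
    return ByteData.sublistView(bytes).getUint32(0);
  }
  return null;
}

/// Usage: "4waSgw==" is the base64 form of the bytes E3 06 92 83.
int? example() =>
    parseCrc32cHeader('crc32c=4waSgw==,md5=JfnnlDI7RTiF9RgfG2JNCw==');
```

Returning null for a missing checksum and throwing for a malformed one keeps the two cases distinct, which matches how the diff skips validation in the first case and retries the download in the second.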
60 changes: 60 additions & 0 deletions lib/src/utils.dart
@@ -638,3 +638,63 @@ Map<K2, V2> mapMap<K1, V1, K2, V2>(
key(entry.key, entry.value): value(entry.key, entry.value),
};
}

/// Calls [fn], retrying as long as [retryIf] returns `true` for the exception
/// thrown, up to [maxAttempts] times.
///
/// Defaults to 8 attempts, sleeping as follows after the 1st, 2nd, 3rd, ...,
/// 7th attempt:
/// 1. 400 ms +/- 25%
/// 2. 800 ms +/- 25%
/// 3. 1600 ms +/- 25%
/// 4. 3200 ms +/- 25%
/// 5. 6400 ms +/- 25%
/// 6. 12800 ms +/- 25%
/// 7. 25600 ms +/- 25%
///
/// ```dart
/// final response = await retry(
/// // Make a GET request
/// () => http
/// .get(Uri.parse('https://google.com'))
/// .timeout(Duration(seconds: 5)),
/// // Retry on SocketException or TimeoutException
/// retryIf: (e) => e is SocketException || e is TimeoutException,
/// );
/// print(response.body);
/// ```
///
/// If no [retryIf] function is given, this will retry for any [Exception]
/// thrown. To retry on an [Error], the error must be caught and _rethrown_
/// as an [Exception].
///
/// See https://github.com/google/dart-neats/blob/master/retry/lib/retry.dart
Future<T> retry<T>(
FutureOr<T> Function() fn, {
Duration delayFactor = const Duration(milliseconds: 200),
double randomizationFactor = 0.25,
Duration maxDelay = const Duration(seconds: 30),
int maxAttempts = 8,
FutureOr<bool> Function(Exception)? retryIf,
FutureOr<void> Function(Exception, int retryCount)? onRetry,
}) async {
var attempt = 0;
// ignore: literal_only_boolean_expressions
while (true) {
attempt++; // first invocation is the first attempt
try {
return await fn();
} on Exception catch (e) {
if (attempt >= maxAttempts || (retryIf != null && !(await retryIf(e)))) {
rethrow;
}
if (onRetry != null) {
await onRetry(e, attempt);
}
}

// Sleep for a delay
final rf = randomizationFactor * (random.nextDouble() * 2 - 1) + 1;
final exp = math.min(attempt, 31); // prevent overflows.
final delay = delayFactor * math.pow(2.0, exp) * rf;
await Future.delayed(delay < maxDelay ? delay : maxDelay);
}
}
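
The delay schedule listed in the doc comment follows directly from the formula at the end of the loop. The sketch below isolates that computation so it can be checked by hand. `backoffDelay` and its `jitter` parameter are hypothetical: `jitter` stands in for `random.nextDouble() * 2 - 1` so the result is deterministic.

```dart
import 'dart:math' as math;

/// Delay slept after the given 1-based [attempt] fails, using the same
/// formula as `retry`. [jitter] must lie in [-1, 1]; 0 gives the nominal
/// delay without randomization.
Duration backoffDelay(
  int attempt, {
  Duration delayFactor = const Duration(milliseconds: 200),
  double randomizationFactor = 0.25,
  Duration maxDelay = const Duration(seconds: 30),
  double jitter = 0,
}) {
  final rf = randomizationFactor * jitter + 1;
  final exp = math.min(attempt, 31); // Same overflow guard as `retry`.
  final delay = delayFactor * math.pow(2.0, exp) * rf;
  return delay < maxDelay ? delay : maxDelay;
}

/// Usage: nominal delays after attempts 1 through 7 with the defaults,
/// in milliseconds: 400, 800, 1600, 3200, 6400, 12800, 25600.
List<int> nominalScheduleMs() =>
    [for (var attempt = 1; attempt <= 7; attempt++)
      backoffDelay(attempt).inMilliseconds];
```

With the default `maxAttempts` of 8 there is no sleep after the last attempt, so the 30 second cap only comes into play when a caller raises `maxAttempts` or `delayFactor`.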
12 changes: 12 additions & 0 deletions test/embedding/embedding_test.dart
@@ -357,6 +357,18 @@ String _filter(String input) {
RegExp(r'Writing \d+ characters', multiLine: true),
r'Writing $N characters',
)
.replaceAll(
RegExp(r'x-goog-hash(.*)$', multiLine: true),
r'x-goog-hash: $CHECKSUM_HEADER',
)
.replaceAll(
RegExp(
r'Computed checksum \d+ for foo 1\.0\.0 with expected CRC32C of '
r'\d+\.',
multiLine: true),
r'Computed checksum $CRC32C for foo 1.0.0 with expected CRC32C of '
r'$CRC32C.',
)

/// TODO(sigurdm): This hack suppresses differences in stack-traces
/// between dart 2.17 and 2.18. Remove when 2.18 is stable.