[native_assets_builder] Caching strategy for hook invocations #1593
Comments
The way this is handled dynamically is that certain targets output a depfile during execution, and that depfile contains the set of resolved dependencies. If the depfile is missing, you assume that the target needs to be rerun. If it is present, it contains the set of all input/output files, which lets you discover dependencies dynamically. This is used for the Dart dependencies, since we don't want to treat every single Dart file as an input.
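A minimal Dart sketch of the depfile rule described above. It assumes a Make/Ninja-style depfile (`outputs: inputs`, spaces escaped as `\ `, lines continued with a trailing backslash) and POSIX paths. `Depfile`, `parseDepfile`, and `needsRerun` are hypothetical names, not flutter_tools' API. The make-style timestamp comparison inside `needsRerun` is also an assumption; the same depfile could just as well feed a stored-hash comparison.

```dart
import 'dart:io';

/// Outputs and inputs recorded in a depfile (hypothetical helper type).
class Depfile {
  Depfile(this.outputs, this.inputs);
  final List<String> outputs;
  final List<String> inputs;
}

/// Splits whitespace-separated paths, treating `\ ` as an escaped space.
List<String> _splitPaths(String text) {
  final paths = <String>[];
  final current = StringBuffer();
  for (var i = 0; i < text.length; i++) {
    final char = text[i];
    if (char == '\\' && i + 1 < text.length && text[i + 1] == ' ') {
      current.write(' ');
      i++; // Skip the escaped space.
    } else if (char.trim().isEmpty) {
      if (current.isNotEmpty) {
        paths.add(current.toString());
        current.clear();
      }
    } else {
      current.write(char);
    }
  }
  if (current.isNotEmpty) paths.add(current.toString());
  return paths;
}

/// Parses `out1 out2: in1 in2`, joining backslash-newline continuations.
/// Assumes POSIX paths: a Windows drive letter such as `C:` is not handled.
Depfile parseDepfile(String contents) {
  final joined = contents.replaceAll(RegExp(r'\\\r?\n'), ' ');
  // The first colon followed by whitespace (or the end) separates the sides.
  final separator = joined.indexOf(RegExp(r':(?=\s|$)'));
  if (separator < 0) {
    throw FormatException('Missing ":" separator', contents);
  }
  return Depfile(
    _splitPaths(joined.substring(0, separator)),
    _splitPaths(joined.substring(separator + 1)),
  );
}

/// No depfile, or any listed file missing, means "rerun". Otherwise this
/// sketch reruns when an input is newer than the oldest output (make-style).
bool needsRerun(File depfile) {
  if (!depfile.existsSync()) return true;
  final deps = parseDepfile(depfile.readAsStringSync());
  final paths = [...deps.outputs, ...deps.inputs];
  if (paths.any((path) => !File(path).existsSync())) return true;
  if (deps.outputs.isEmpty) return true;
  final oldestOutput = deps.outputs
      .map((path) => File(path).lastModifiedSync())
      .reduce((a, b) => a.isBefore(b) ? a : b);
  return deps.inputs
      .any((path) => File(path).lastModifiedSync().isAfter(oldestOutput));
}
```

Because the inputs come from the depfile rather than a static list, Dart files discovered during the previous run are checked automatically on the next one, without declaring every Dart file as an input up front.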
Thanks so much for your input @jonahwilliams! Some notes from the discussion:
@jtmcdole when I did a hot reload on a Flutter project with a moderate number of assets (Flutter Gallery) in 2020, it took more than a second to hot reload due to re-hashing. That is too slow.
Notes from discussion with @jtmcdole! 🙏
I'm not disagreeing that it was too much. What I'm looking for is an actual measuring stick so we can validate and make informed decisions. How many files? What sizes? @dcharkes - I just wrote a silly hash_it program in Dart using md5, sha1, sha256, and that xxh3 package. I ran it over 1000 files of varying size from 0 to 32KB (totaling 18MB):
for i in {1..1000}; do head -c ${(l:6::0:)$((${RANDOM}))} /dev/urandom > "$i.dat"; done
I chose to await all the files' data first and only hash once every file's bytes had been read. They took 90ms to 140ms.
If you await each file read before hashing it, they are all slower, but with the same results: they took 125ms to 175ms.
What I have not tried: using FFI to run xxh3 natively to compare speeds. Instead I just did some CLI magic:
md5: 34ms
FFI would probably be better than pure Dart.
Going with a bigger fileset (random*10 ≈ 164MB):
native:
dart:
The big win is "use the native libraries".
Thanks for the exploration @jtmcdole! Using native libraries in dartdev/flutter_tools will require a bundling strategy for native libraries. Currently, native assets cannot yet be used by flutter_tools and dartdev themselves. The easiest alternative would be to bake xxh3 into
Maybe it's better to temporarily go with the fastest hashing in Dart (md5) and swap to xxh3 at a later stage, when native assets can be used inside the Dart/Flutter command-line tooling.
Although native can be faster, especially if vectorized, I'm surprised by a 10x difference. Maybe our compiler is doing a bad job here. /cc @mraleph
@jtmcdole do you have your benchmark somewhere?
FWIW, with a simple fix to the package code, XXH3 goes from 8ns/byte on my machine to ~0.6ns/byte in AOT mode. In JIT mode this reveals an inefficient compilation of the
I'll post it here (it's the dumbest of benches).
- int dataVal = readLE64(input, inputOffset + (8 * i));
+ int dataVal = readLE64(input, inputOffset + (i * 8));
Me in college: "don't try to outsmart the compiler". How does swapping these two make a difference?
@jtmcdole the bulk of the improvement is due to
import 'dart:io';
import 'dart:typed_data';

import 'package:args/args.dart';
import 'package:crypto/crypto.dart';
import 'package:xxh3/xxh3.dart';

const String version = '0.0.1';

ArgParser buildParser() {
  return ArgParser()
    ..addFlag(
      'help',
      abbr: 'h',
      negatable: false,
      help: 'Print this usage information.',
    )
    ..addOption(
      'hash',
      defaultsTo: 'xxh',
      allowed: ['xxh', 'md5', 'sha1', 'sha256'],
    );
}

void printUsage(ArgParser argParser) {
  print('Usage: dart hash_it.dart <flags> [arguments]');
  print(argParser.usage);
}

void main(List<String> arguments) async {
  final ArgParser argParser = buildParser();
  late final String hashOption;
  try {
    final ArgResults results = argParser.parse(arguments);
    // Process the parsed arguments.
    if (results.wasParsed('help')) {
      printUsage(argParser);
      return;
    }
    hashOption = results['hash'] as String;
  } on FormatException catch (e) {
    // Print usage information if an invalid argument was provided.
    print(e.message);
    print('');
    printUsage(argParser);
    return;
  }

  final watch = Stopwatch()..start();
  // Each hasher returns an int; the crypto digests keep one byte of the digest.
  int Function(Uint8List bytes) hasher;
  switch (hashOption) {
    case 'sha1':
      hasher = (bytes) => sha1.convert(bytes).bytes[1];
    case 'sha256':
      hasher = (bytes) => sha256.convert(bytes).bytes[1];
    case 'md5':
      hasher = (bytes) => md5.convert(bytes).bytes[1];
    case 'xxh':
    default:
      hasher = (bytes) => xxh3(bytes);
  }

  // The generated *.dat files are expected in ./data.
  final dir = Directory('data');
  // true: start every read, await them all, then hash.
  // false: await each read before hashing that file.
  const awaitAllReadsFirst = true;
  if (awaitAllReadsFirst) {
    final futures = <Future<Uint8List>>[];
    await for (final entity in dir.list()) {
      if (entity is File) {
        futures.add(entity.readAsBytes());
      }
    }
    final contents = await Future.wait(futures);
    final hashes = List.filled(contents.length, 0);
    var counter = 0;
    for (final bytes in contents) {
      hashes[counter++] = hasher(bytes);
    }
  } else {
    final hashes = <int>[];
    await for (final entity in dir.list()) {
      if (entity is File) {
        final bytes = await entity.readAsBytes();
        hashes.add(hasher(bytes));
      }
    }
  }
  print('done: ${watch.elapsed}');
}
@mraleph - do we want to put up a friendly PR to the xxh3 library?
dart + patches:
sha1: 900ms
Well, that's a win.
Version 1.1.0 just got published with some other optimizations as well: https://pub.dev/packages/xxh3
We need to decide on a robust strategy for caching hook invocations.
Requirements
Some typical use cases and corner cases:
Non-goal:
Design options
1. Use dependencies last-modified timestamp and the last-build-started timestamp (current implementation).
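Option 1 can be sketched as follows, assuming the caller records the time the previous build started and collects each dependency's last-modified time. `needsRerunSinceBuildStart` and `readLastModified` are hypothetical names, not the native_assets_builder API.

```dart
import 'dart:io';

/// Option 1 (make/ninja style): rerun if any dependency was modified at or
/// after the moment the previous build started.
bool needsRerunSinceBuildStart(
  DateTime? lastBuildStarted,
  Map<String, DateTime?> dependencyLastModified,
) {
  // Never built before: nothing to compare against.
  if (lastBuildStarted == null) return true;
  for (final modified in dependencyLastModified.values) {
    // A missing dependency (null) or a change at/after the last build start
    // forces a rerun.
    if (modified == null || !modified.isBefore(lastBuildStarted)) return true;
  }
  return false;
}

/// Reads last-modified timestamps from disk; missing files map to null.
Map<String, DateTime?> readLastModified(Iterable<String> paths) => {
      for (final path in paths)
        path: File(path).existsSync() ? File(path).lastModifiedSync() : null,
    };
```

Only one timestamp is stored per build, which keeps the cache tiny. The downside listed below follows directly: a dependency whose lastModified ends up before lastBuildStarted compares as unchanged even if its content differs.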
This is the strategy used by make and ninja.
Downsides:
lastModified on a timestamp that is before the moment the last build was run (use case 1).
2. Use dependencies last-modified, and save the last seen timestamp for each dependency
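A sketch of option 2. It is loosely modeled on the idea behind flutter_tools' FileStoreStrategy.timestamp, not on its actual code. The store remembers the exact timestamp seen for each dependency and treats any difference, older or newer, as a change. `TimestampStore` and its methods are hypothetical, and the JSON round-trip stands in for however the cache would really be persisted.

```dart
import 'dart:convert';

/// Remembers the last-modified time (ms since epoch) seen per dependency.
class TimestampStore {
  TimestampStore([Map<String, int>? seen]) : _seen = {...?seen};

  final Map<String, int> _seen;

  /// True if the dependency set changed, a dependency is missing (null),
  /// or any timestamp differs from the recorded one in either direction.
  bool needsRerun(Map<String, DateTime?> current) {
    if (current.length != _seen.length) return true;
    for (final entry in current.entries) {
      final modified = entry.value;
      if (modified == null) return true;
      if (_seen[entry.key] != modified.millisecondsSinceEpoch) return true;
    }
    return false;
  }

  /// Replaces the recorded timestamps after a successful hook invocation.
  void record(Map<String, DateTime> current) {
    _seen
      ..clear()
      ..addAll(current.map(
          (path, modified) => MapEntry(path, modified.millisecondsSinceEpoch)));
  }

  String toJson() => jsonEncode(_seen);

  static TimestampStore fromJson(String source) =>
      TimestampStore(Map<String, int>.from(jsonDecode(source) as Map));
}
```

The `!=` comparison against a per-file record is what lets this option notice a file that became older. A file restored with exactly the previously recorded timestamp still compares as unchanged.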
This strategy is used by flutter_tools with FileStoreStrategy.timestamp.
Downsides:
lastModified on exactly the timestamp that the previous version of the file had.
Flutter uses this only when dependencies are created by our own tools. For example, a package_config.json is the input for generating dart_plugin_registrant.dart.
3. Use file-content hashing, and save the last seen hash for each dependency
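A sketch of option 3. To stay dependency-free it uses 32-bit FNV-1a as a stand-in content hash. The thread above compares md5 (package:crypto) and xxh3 (package:xxh3), and either could replace it. `fnv1a32` and `HashStore` are illustrative names, not flutter_tools' FileStoreStrategy.hash implementation.

```dart
import 'dart:io';
import 'dart:typed_data';

/// 32-bit FNV-1a, a stand-in for md5/xxh3. Masking after each step keeps
/// the hash below 2^32, so the product stays below 2^57; this assumes
/// 64-bit native ints (Dart VM, not the web).
int fnv1a32(Uint8List bytes) {
  var hash = 0x811c9dc5;
  for (final byte in bytes) {
    hash ^= byte;
    hash = (hash * 0x01000193) & 0xffffffff;
  }
  return hash;
}

/// Remembers the content hash last seen for each dependency.
class HashStore {
  final Map<String, int> _seen = {};

  /// Hashes each dependency; a missing file maps to null.
  static Map<String, int?> hashAll(Iterable<String> paths) => {
        for (final path in paths)
          path: File(path).existsSync()
              ? fnv1a32(File(path).readAsBytesSync())
              : null,
      };

  /// True if the dependency set changed, a file is missing, or any hash
  /// differs from the recorded one.
  bool needsRerun(Iterable<String> dependencies) {
    final current = hashAll(dependencies);
    if (current.length != _seen.length) return true;
    return current.entries
        .any((entry) => entry.value == null || _seen[entry.key] != entry.value);
  }

  /// Records hashes after the hook has finished. The files are re-read here,
  /// so an edit made while the hook was running is recorded as if the hook
  /// had seen it, which is the weakness of this option.
  void record(Iterable<String> dependencies) {
    _seen.clear();
    for (final entry in hashAll(dependencies).entries) {
      final hash = entry.value;
      if (hash != null) _seen[entry.key] = hash;
    }
  }
}
```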
This strategy is used by flutter_tools with FileStoreStrategy.hash.
Downsides:
Running dart compile kernel --packages=package_config.json --output=foo.dill --depfile=dependencies.d foo.dart and modifying one of the files listed as dynamic dependencies in dependencies.d while dart compile kernel is running goes unnoticed.
Most Targets in flutter_tools have static dependencies (List<Source> get inputs). I am not entirely sure how dynamic dependencies are handled (List<String> get depfiles).
4. Use file-content hashing, save the last seen hash for each dependency, and error on last-modified during build
This strategy is used by Bazel/Blaze. (Surfaces as file modified during build.)
This prevents the downside of 3 by checking the last-modified timestamps of dependencies to verify that they did not change after the build started.
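Option 4 could look roughly like this. After the hook finishes, each dependency's last-modified time is compared with the time the build started before its hash is stored, and any later modification is an error. `hashAfterBuild`, `FileModifiedDuringBuild`, and the injected `hashBytes` function are hypothetical. Filesystem timestamp resolution varies, so a real implementation would likely need some slack around `buildStarted`.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Thrown when a dependency changed after the build started (hypothetical).
class FileModifiedDuringBuild implements Exception {
  FileModifiedDuringBuild(this.path);
  final String path;

  @override
  String toString() => 'File modified during build: $path';
}

/// Hashes every dependency for the cache, failing if any of them was
/// modified at or after [buildStarted]. [hashBytes] stands in for md5/xxh3.
Map<String, int> hashAfterBuild(
  DateTime buildStarted,
  Iterable<String> dependencies,
  int Function(Uint8List bytes) hashBytes,
) {
  final hashes = <String, int>{};
  for (final path in dependencies) {
    final file = File(path);
    // Read first, then check the timestamp: an edit after the read either
    // fails the check or leaves a stale hash the next build sees as changed.
    final bytes = file.readAsBytesSync();
    if (!file.lastModifiedSync().isBefore(buildStarted)) {
      throw FileModifiedDuringBuild(path);
    }
    hashes[path] = hashBytes(bytes);
  }
  return hashes;
}
```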
Downside:
5. Use file-content hashing, save the last seen hash for each dependency, and don't cache on last-modified during build (proposed solution)
Solution 4, but don't error; simply don't cache.
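The only change in option 5 is what happens on a late modification: instead of throwing, the hashes are simply not cached, so the next invocation reruns the hook. A minimal sketch, with a hypothetical `hashesToCache` function and an injected `hashBytes` standing in for md5/xxh3:

```dart
import 'dart:io';
import 'dart:typed_data';

/// Returns the hashes to store for this invocation, or null if any
/// dependency was modified at or after [buildStarted]. A null result means
/// "write no cache entry", so the next build reruns the hook.
Map<String, int>? hashesToCache(
  DateTime buildStarted,
  Iterable<String> dependencies,
  int Function(Uint8List bytes) hashBytes,
) {
  final hashes = <String, int>{};
  for (final path in dependencies) {
    final file = File(path);
    // Read first, then check the timestamp: an edit after the read either
    // fails the check or leaves a stale hash the next build sees as changed.
    final bytes = file.readAsBytesSync();
    if (!file.lastModifiedSync().isBefore(buildStarted)) return null;
    hashes[path] = hashBytes(bytes);
  }
  return hashes;
}
```

Compared with option 4, the build itself never fails. The trade-off is that a dependency that keeps being edited during every build never gets cached, so its hook reruns each time.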
Downside:
Other solutions?
We could consider having to list all dependencies upfront, but that seems too restrictive.
We could consider requiring hooks to output all the file content hashes, but that seems too unfriendly for users.
Maybe we say that reverting a file to an older last modified during a build is not supported.
@jonahwilliams You wrote some of the caching in flutter, is my understanding correct? Any suggestions?
@mkustermann Any other build systems we should investigate?
cc @blaugold @HosseinYousefi