-
Notifications
You must be signed in to change notification settings - Fork 1.7k
building long strings through concatenation is quadratic in complexity #50304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
In Dart, string concatenation is a bit slow. I'd suggest using StringBuffer to build larger strings:
This takes less than a second on my machine. |
Yes.. correct. My friend also had another fix with some similar but different code. Also.. I came about this problem while trying to parse a JSON file of 200k records. This is another problem. I think something should change.. 10 minutes versus a second .. yes.. that is a bug to me. |
I'm just a Dart user, not on the team or anything, but it's pretty common to use a StringBuffer in other languages when building large strings. I guess I personally don't have an issue using it in these cases, which I've only ran into once. It is interesting why JS is so much faster using the + operator though. |
a string and a buffer should be the same, but i guess it is not. My friend said that 200k objects are being made with this method. It also gets cumulatively slower and slower as more records (concatenations) are made. so each copy has more "string" |
Yep, your friend's right:
There is a lint that warns you not to use String foo() {
final buffer = '';
for (int i = 0; i < 10; i++) {
buffer += 'a'; // LINT
}
return buffer;
} vs String foo() {
final buffer = StringBuffer();
for (int i = 0; i < 10; i++) {
buffer.write('a');
}
return buffer.toString();
}
Currently, Hopefully that helps |
JavaScript is actually a relative outlier that decided to optimise string concatenation because people don't understand that building long strings through repetitive concatenation of immutable strings has quadratic time complexity. Most languages don't do that. Most JavaScript engines use complicated zoo of different string representations to optimise things like concatenation and slicing. This improves performance of the code like the one you have written but comes with a lot of complexity and comes with its own performance pitfalls. Dart decided that it is not worth going this route. The solution, as explained, is to use @lrhn do we want to update |
The reason JavaScript is "faster" at string I worked on V8 years ago, and back then its internal string model was something like:
This means you can do string concatenation in constant time. Yey. It also means that you cannot do most simple string operations efficiently on a concatenated string. For example doing Because of that, a lot of string operations start by flattening their receiver, if it's a concat-string. Basically, a concat-string is a "lazy concatenation", which gets forced when you try to use the string. In that way, it works like an implicit When the string has been flattened, the concat string object becomes (header, full string, empty string), so that existing references to the object keeps working, but there is only a small constant overhead when using the string. (If the string gets moved by garbage collection, it just becomes the flattened string without the trivial concat wrapper). You can write badly performing code in JavaScript too, like: // Fast code:
var t0 = Date.now();
var s = "";
for (var i = 0; i < 50000; i++) {
s = s + "xyzzy";
var c = "xyzzy".charCodeAt("xyzzy".length >> 1); // Simple constant time operation on a string.
}
var t1 = Date.now();
// Slow code:
var s2 = "";
for (var i = 0; i < 50000; i++) {
s2 = s2 + "xyzzy";
var c = s2.charCodeAt(s2.length >> 2); // Still simple, right?
}
var t2 = Date.now();
console.log(t1-t0, t2-t1); // For me: 18, 5342 By forcing the implicit flattening at every step, you get the same quadratic behavior that Dart's Dart (native) chose to not do what JavaScript does here, because it wanted a more predictable performance (and a smaller overhead) on string operations, which then doesn't have to first check if the string is a concatenation or not. Instead Dart followed Java's strategy and introduced a When compiled to JavaScript, there is no way around the JavaScript behavior, but you are still recommended to use a |
Well, when we consider 10 minutes versus 1 second, then there is something to consider besides "pure" coding ethics. So you can close it.. I guess the experts will decide about lint string buffer. The other (original) issue I had was reading and parsing a very large json using the fromJson method of the model. It too was very slow and seem to stop filling json arrays at near to 87,000 . 50k was fine. This whole exercise of creating a 200k csv was to make my own data was free from corruption problems due to font conversion (myanmar to roman, etc). I guess I'll create a json programatically and then see if size matters. However, maybe there is a reverse issue of buffers? I'm using the factory in the model by usual dart documentation way and it seems that factories do not really create objects and that is the point. |
Yes! Thank you for the awesome explanations. I like that Dart has predictable performance, unlike JavaScript where there are a bunch of arcane performance tricks that change between runtimes and versions. It would be cool if The docs actually do hint at when to use StringBuffer, but it's never said explicitly. Maybe this would be a good step forward. StringBuffer does say it's "A class for concatenating strings efficiently." However, it doesn't really say when those advantages show. Also, String does not show examples of when StringBuffer would be a better option. I could see putting the example from the use_string_buffers lint into one or both of these places. I think that doc change with the lint would reduce the confusion a lot. |
According to dart-lang/sdk#50304 (comment) and https://gist.github.com/mraleph/3397008, V8 (the JavaScript engine of Node.js) optimizes repeated string concatenation to be `O(n)` instead of `O(n^2)` (with some caveats on indexing into the string between concatenations). Going forward, this will be taken advantage of whenever it makes things more convenient.
I have a dictionary with 200,000 records.
Normally I process these records in js and I can read and write an 18MB file within a few seconds. (Note:.. older file about 168,000).
However, when I tried doing this in Dart for our re-written product, I ran into problems with .fromJson parsing. There seemed to be a limit of less than 90,000.
Further investigation had me try processing a csv manually.
Further investigation had me write a csv manually in case my data was bad.
Further investigation had me write a dart cli ...
The code is shown below and takes roughly 1 second per 1000 merely to do string processing in memory.
The flutter code of the same thing takes about 2.5 seconds per 1000.
So in flutter (for my real life dictionary import feature), we are talking about 600 seconds or 10 minutes. I have not ran any of these scripts to completion. They appear as "hung" even though they are just taking a long time.
Meanwhile.. I can read an sql file and pass db.Update(sql); in a few seconds.
For now, I will need to store my data online for downloading dictionary updates as Raw SQL transaction inserts.
I never thought I would report speed as a bug. But when we are talking a few seconds in js versus 10 minutes, this appears as "hung" and counts as a bug. I love dart flutter, but had I known about this limitation, I might have opted for something else. on the other hand the framework and using db's does not require us to feel this pain until now.
Hardware: Lenovo IdeaPad 3 14ALC6
Ubuntu 22.04.1
[✓] Flutter (Channel stable, 3.3.5, on Ubuntu 22.04.1 LTS 5.15.0-52-generic, locale en_US.UTF-8)
The text was updated successfully, but these errors were encountered: