Description
The current approach uses polymorphism to make RopeString objects. This is pretty inefficient for the typical small string use-case. To efficiently construct a C-style string in the current framework, one makes the current output stream a memio object and then prints to it. The general pattern I've used is to write a print_whatever function and then wrap it in a whatever function that returns a string using print_to_string. Should we stick with this pattern? It has the advantage of allowing the printing version to be very efficient, but it's kind of awkward to write. Should we figure out a different pattern? Something like C#'s StringBuilder pattern?
Perhaps it suffices to make strcat check the size and encodings of its arguments and use the print_to_string approach to concatenate them into a copied string where appropriate, namely when the arguments are of compatible encodings (e.g. any mixture of ASCIIString and UTF8String) and the concatenated result would be below some size threshold. For larger strings, we should continue to use the RopeString approach. Also, string slices should copy their contents unless the resulting string is above the "large string" threshold, in which case they can continue to use the current SubString, with the known issue that this pins the superstring in memory.
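The print_whatever / whatever pattern described above can be sketched as follows. This is only an illustration, not the code from the issue: it uses present-day Julia's `sprint`, assumed here to play the role of print_to_string, and the function names `print_bracketed` and `bracketed` are hypothetical.

```julia
# Printing version: writes directly to any IO destination.
# (`print_bracketed` is a hypothetical name used for illustration.)
function print_bracketed(io::IO, items)
    print(io, '[')
    for (i, x) in enumerate(items)
        i > 1 && print(io, ", ")
        print(io, x)
    end
    print(io, ']')
end

# String-returning version: wraps the printing version.
# `sprint` calls the function with an in-memory buffer as its first
# argument and returns the buffer contents as a String.
bracketed(items) = sprint(print_bracketed, items)
```

The printing version can target a file or socket without building a temporary string, while the wrapper covers the case where a string is actually wanted.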
Activity
JeffBezanson commented on May 3, 2011
Changing types based on string lengths makes it too hard to infer the types of these rather common operations. Instead, we should have the option to wrap a string as BigString(s) if s might be large, and BigString can use the memory-saving versions of these operations.
There's not much difference between print_to_string and StringBuilder. with_output_stream can be skipped in some cases by using write() with an explicit destination argument. It's also nice to be able to write output directly to an I/O endpoint without building a temporary string first.
Simple string building cases can also be handled by pushing characters into an array.
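The two alternatives mentioned in this comment (writing to an explicit destination, and pushing characters into an array) might look like the sketch below. It assumes present-day Julia names: `IOBuffer` and `take!` stand in for memio and takebuf_string, and the helper names `join_with_write` and `only_letters` are hypothetical.

```julia
# Write pieces to an explicit in-memory destination instead of
# redirecting the current output stream.
function join_with_write(parts::Vector{String}, sep::Char)
    io = IOBuffer()
    for (i, p) in enumerate(parts)
        i > 1 && write(io, sep)   # write the separator between pieces
        write(io, p)              # write the raw bytes of each piece
    end
    return String(take!(io))      # extract the buffer contents as a String
end

# Simple case: push characters into an array, then convert once at the end.
function only_letters(s::AbstractString)
    chars = Char[]
    for c in s
        isletter(c) && push!(chars, c)
    end
    return String(chars)
end
```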
StefanKarpinski commented on May 3, 2011
Makes sense. I can make the BigString change easily.
Is this an argument for continuing to implement core string building functionality by writing the printing version first and then defining the string creating version by applying print_to_string to the printing version?
JeffBezanson commented on May 3, 2011
Somewhat, but multiple approaches can be used. For example, if you're just
combining strings and characters you can use write() instead of going
through print_to_string. We might want to provide some nicer names for
memio, takebuf_string, and write, and make it look more like StringBuilder.
Or for something like strcat I would determine the size of the result,
allocate it once, and use memcpy.
The trouble is that if I do something like
write(io, strcat(a,b,c))
what you ideally want is to write each string without forming the temporary.
Even if strcat is written using an i/o buffer you don't get that
automatically here. I might have to say
strcat_to(io, a, b, c)
but that's not a very nice interface. If a, b, or c is a BigString though,
the strcat is done lazily and you get the desired behavior of writing all
the pieces with no copying. This seems to convince me that there's no
advantage to writing all the string functions in terms of printing. So do
whatever's simplest/fastest/convenient, and let BigString handle other
concerns. How's that sound?
print_escaped is a bit different since we know that a main use of it is
doing output. So strcat etc. doesn't necessarily need to imitate it.
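A rough sketch of the two strategies contrasted in this comment, written in present-day Julia as an assumption (the thread predates these names). `strcat_to` follows the signature suggested above; `strcat_copy` is a hypothetical name for the "determine the size, allocate once, copy" variant, with `copyto!` standing in for memcpy.

```julia
# Write each piece straight to `io`; no temporary concatenated string is formed.
function strcat_to(io::IO, parts::AbstractString...)
    for p in parts
        write(io, p)
    end
    return nothing
end

# Determine the size of the result, allocate it once, then copy the bytes in.
function strcat_copy(parts::String...)
    total = 0
    for p in parts
        total += ncodeunits(p)
    end
    buf = Vector{UInt8}(undef, total)
    pos = 1
    for p in parts
        n = ncodeunits(p)
        copyto!(buf, pos, codeunits(p), 1, n)  # bulk byte copy into the result
        pos += n
    end
    return String(buf)
end
```

With these, `write(io, strcat_copy(a, b, c))` builds the temporary that the comment wants to avoid, whereas `strcat_to(io, a, b, c)` does not.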
ViralBShah commented on Jul 9, 2011
This seems like a 2.0 thing.
StefanKarpinski commented on Jul 9, 2011
We're actually pretty good on this at this point. All strcat and stringref (substring) operations on ASCIIString and UTF8String objects use memcpy now, so they're fast and they don't create exotic string objects (RopeString, SubString, etc.). Repeating a string does create a RepString object, but I think that's probably acceptable. I could make a copying implementation of that rather easily.
If someone wants to use a StringBuilder pattern, they can write the printing version and then use print_to_string on it. I feel like that's a reasonable approach if one is worried about strcat efficiency, with the added bonus of providing a version of the same functionality that can print without having to build a string at all.
I think this issue is not fully addressed, but well enough for v1.0 for now. Will reassign to v2.0.
JeffBezanson commented on Jul 9, 2011
Can I replace memcpy(a) with copy(a)?
StefanKarpinski commented on Jul 9, 2011
Is copy(a::Array{Uint8,1}) as efficient as memcpy is?
JeffBezanson commented on Jul 9, 2011
It should be now that we changed copy_to to use memcpy for arrays where possible.
StefanKarpinski commented on Jul 9, 2011
We can get rid of memcpy entirely then. I'll do it.
ViralBShah commented on Jul 9, 2011
We also need to experiment with some sizes at which memcpy is faster. It is actually slower for small arrays. Copy_to should have these smarts.