-
Notifications
You must be signed in to change notification settings - Fork 18k
proposal: Go 2: builtin add func resize(v Type, size) #41551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thank you for raising this proposal. I don’t think there is much value in resizing slices as they grow geometrical when appended to. If this is not good enough making a new slice and copying the contents is two lines. Maps grow by doubling (I think) and provide a size hint. Perhaps this is good enough, if not, some benchmarks showing that map rehashing is dominating insert performance would be valuable for this proposal. I cannot see how a slice could be resized safely without additional locking. There are probably some unanswered questions when a channel is resized from unbuffered to buffered. |
If performance is a concern that can solved without a language change: With generics a generic map copy function can be defined not requiring more typing than a builtin and no additional language change. Can you pleas spell out in more detail what resize does and where it is useful and how often that comes up? |
For language change proposals, please fill out the template at https://go.googlesource.com/proposal/+/refs/heads/master/go2-language-changes.md . When you are done, please reply to the issue with @gopherbot please remove label WaitingForInfo. Thanks! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I agree that slices have the least value in this proposal
I can redo the benchmarks but I did see
well no slice operation is thread safe.
A. I did not know this is possible :).
the map scenerio comes a lot in functions like func (c myCollection) AddMultiple(objs []Objects) {
for _, obj := range objs {
c[obj.key()] = obj.value()
}
} or even: func (c myCollection) Merge(other myCollection) {
for key, value := range other {
c[key] = value
}
}
obviously just like with about pointers/references it should work just like a natural resizing caused by appending/inserting (so yeah it's kinda weird in slices) EDIT: @martisch Should I write a comment here/edit the OP with answers to all the questions in the template? |
The map examples given just show adding elements. There would need to be an additional resize call that is trying to more then double the capacity. How would that look like in the code above and what capacity would it target? If its not more than doubling the internal resize mechanism of maps would do the same as the resize on the fly already. A resize will need to also copy all elements to a new map allocation in the current design of maps to grow the map. If performance is lost on alot of loops adding elements to a map an option would also be for the compiler to automatically initiate a growth to more than 2 times the map size if the two be added elements in that loop are more than double the size of the map. This wouldnt need an explicit hint by the programmer. More details can be added in an extra comment or in the top comment. |
This seems difficult to implement for channels. And I find the rationale unconvincing. Does any code ever really adjust the amount of workers on the fly while in the middle of processing? If not in the middle of processing, it's easy to simply create a new channel. As noted, it's not that useful for slices. Even more maps, changing the map size means rehashing all the elements, so although it may look simpler than a loop assigning to elements to a new map, it won't actually run any faster. So I'm not seeing a significant benefit here. |
1: (hint to increase by X) func (c myCollection) Merge(other myCollection) {
resize(c, len(other))
for key, value := range other {
c[key] = value
}
} 2: (hint the max cap we want) func (c myCollection) Merge(other myCollection) {
resize(c, len(c)+len(other))
for key, value := range other {
c[key] = value
}
} it could also act similar to |
The use case is for example a server that can accept arbitrary connections, and a few routines at the side processing data from those connections via a channel, and every X new connections i'd like to increase the channel capacity so the connections won't block too much on the channel, but I don't want to start with a 2GB channel even if I only have 1 connection.
If a map has 4 capacity and I'm going to push 5K elements it will reallocate + copy 11 times instead if I resize before it will check if the cap is enough for that and if not will reallocate+copy beforehand, thus avoiding doing it over and over |
A few questions as the semantics of resize havent been spelled out completly yet: Assuming: "2: (hint the max cap we want)" What happens when the hinted at capacity is smaller than the existing capacity: Does resize downsize the map? Does the resize happen for the whole map immediatly or is it done by having each write and delete do a bit of the resize like current map growth? What happens if the hint provided is larger than a map is allowed to be (exceeding memory limits)? Does resize require an exact capacity or is it only a hint e.g. for map it is allowed to have next power of 2 capacity like the current make on a map allocates? |
This question is why I prefer passing the additional capacity, not the max, but if not I see multiple possible semantics:
whole map immediately, I don't see any benefits to resizing "slowly", this is counter-productive to the proposal.
the same behavior that will happen with the
it can just pass the next power of 2 but as you said the user will gain nothing from doing that, but of course the compiler doesn't have to increase, it might use heuristics to increase even more / a little less to allow better performance (ie allocate a whole page if the specified capacity is close enough to a full page) AFAIU this is the exact same semantics as |
So when discussing further should it be assumed the proposal means the hint is to be used for additional capacity? Can you spell out how the language specification would explain the semantics of the resize on maps? Can you elaborate what that then means for the current hash map implementation. e.g. is the hint making sure there is at least that much open slots somewhere in the map buckets including overflow buckets? Note that depending on hash collisions that does not mean adding that much elements doesnt cause a map growth to happen. Is a resize on a nil map allowed? |
I'm suggesting using My point is that that is exactly what the suggested |
As @networkimprov said earlier, this can be done by interposing another goroutine that manages the buffer space. I would be less concerned about this if there were an easy way to safely change the buffer size of a channel. But since I don't see an easy way, I think we need a very compelling benefit to pay the cost of writing and maintaining an implementation. |
Ian, what's complicated about changing a channel's buffer size? |
No, it is currently not possible to check the capacity of I know that capacity for maps doesn't always make perfect sense, but it still make some, and that depends on the map implementation (ie capacity makes a lot of sense in a SwissTable implementation), |
@networkimprov The issue with changing channel buffer size is what I mentioned above: changing between a buffered and an unbuffered channel requires waking every goroutine currently sleeping on the channel and letting those goroutines requeue themselves with changes appropriate for the difference between buffered and unbuffered channels. @elichai If the only issue is supporting |
Why not only allow channel buffer resize for channels with non-zero buffer? If you have a channel that may need to resize, I don't know why you'd ever start with no buffer. |
It's confusing to have capabilities with mysterious restrictions. We would need to have a really big benefit to be worth paying that cost. |
How would this be reported at runtime? Would it be permitted to resize a channel capacity downwards? |
It's expensive to require a 4K goroutine for each channel that normally needs 64 bytes of buffer and sees occasional traffic spikes requiring larger buffers. It's a 50% increase in goroutines beyond the producer and consumer using the channel. Maybe the overhead of GC swamps this in practice? |
Seems like the easiest solution would be to forgo resizing in favor of dynamically sizing the buffer up to the requested capacity. The make() function already has a 3 parameter variant for doing exactly that if channels were to obey it. If I could allocate a channel that could grow to 10 million entries, but start with 4000, then I wouldn't need to resize, secure in the knowledge that any channel that exceeds the current capacity will grow up to my specified maximum. Even better if it will also shrink back down over time. What I don't want, in my app which has very many channels feeding a large number of goroutine workers, is for every channel to start out with a very large buffer when the vast majority don't need it. There is no way to safely resize a channel, since wrapping channel access in a mutex, so that the old channel might be replaced with a new channel while holding the lock, means that any goroutine which blocks on channel access would do so while holding the mutex, thereby causing a deadlock. If resize happens internally to the channel, it can also ensure that blocking on the channel never happens while holding any kind of exclusive access. |
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
The
make
builtin allow you to specify a capacity for maps,slices and channels, but once you specify you cannot currently change that.With slices you could make a new slice, copy over and assign to the current slice, but with maps/channels it is harder.
Maps will auto-scale with inserts but sometimes you know you're going to add X elements to the map so you can tell it to reallocate/resize beforehand.
Channels are even worse, you currently are stuck with the buffer you choose when you created the channel, sometimes a single channel is used to communicate with multiple go routines, and you want to resize the buffer if you launch more routines s.t. no routine will block on the channel.
I think a way to resize/realloc a type can be very useful, both for performance and usability,
an alternative for a new builtin can be to add this support to the
make
builtinThe text was updated successfully, but these errors were encountered: