Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of implementing StringView #5374
@XiangpengHao implemented gc, which compacts all the strings in a StringView/BinaryView into contiguous storage, in #5513.
However, that functionality does not deduplicate/intern the strings; it just copies them over.
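As a rough illustration of that behavior, the sketch below models a StringView array in plain Rust. The `View` and `ViewArray` types and the `from_strs`, `value` and `compact` methods are hypothetical stand-ins, not the arrow-rs API. The model assumes strings of at most 12 bytes are kept inline in the view and longer ones are referenced by buffer index, offset and length. `compact` copies every non-inline string into one new buffer, the way a gc-style compaction does, so a string that appears twice is copied twice.

```rust
/// Hypothetical, simplified model of a StringView array (not the arrow-rs types).
#[derive(Clone, Debug, PartialEq)]
pub enum View {
    /// Short string (at most INLINE_LIMIT bytes) stored inside the view itself.
    Inline(Vec<u8>),
    /// Longer string stored at buffers[buffer_index][offset..offset + len].
    Ref { buffer_index: usize, offset: usize, len: usize },
}

/// Assumed inline threshold for this model.
pub const INLINE_LIMIT: usize = 12;

pub struct ViewArray {
    pub views: Vec<View>,
    pub buffers: Vec<Vec<u8>>,
}

impl ViewArray {
    /// Builds an array, appending every long string to a single data buffer.
    pub fn from_strs(values: &[&str]) -> Self {
        let mut data: Vec<u8> = Vec::new();
        let views: Vec<View> = values
            .iter()
            .map(|s| {
                let bytes = s.as_bytes();
                if bytes.len() <= INLINE_LIMIT {
                    View::Inline(bytes.to_vec())
                } else {
                    let offset = data.len();
                    data.extend_from_slice(bytes);
                    View::Ref { buffer_index: 0, offset, len: bytes.len() }
                }
            })
            .collect();
        ViewArray { views, buffers: vec![data] }
    }

    /// Returns the bytes of the string at position `i`.
    pub fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline(bytes) => bytes.as_slice(),
            View::Ref { buffer_index, offset, len } => {
                &self.buffers[*buffer_index][*offset..*offset + *len]
            }
        }
    }

    /// gc-style compaction: copies each long string into one new contiguous buffer.
    pub fn compact(&self) -> Self {
        let mut data: Vec<u8> = Vec::new();
        let views: Vec<View> = self
            .views
            .iter()
            .enumerate()
            .map(|(i, view)| match view {
                View::Inline(bytes) => View::Inline(bytes.clone()),
                View::Ref { len, .. } => {
                    // No lookup of earlier copies: every occurrence is appended again.
                    let offset = data.len();
                    data.extend_from_slice(self.value(i));
                    View::Ref { buffer_index: 0, offset, len: *len }
                }
            })
            .collect();
        ViewArray { views, buffers: vec![data] }
    }
}
```

With two copies of the 22-byte string "a long repeated string" and one short inline string, the compacted buffer in this model holds 44 bytes, one copy per occurrence.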
Describe the solution you'd like
We should make it easy to deduplicate the strings in a StringView.
I do not think we should change gc to do deduplication without an explicit ask (as deduplication is expensive).
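One way to keep deduplication from happening implicitly is to make it an explicit option of the compaction step. The sketch below uses a simplified model (hypothetical `View`/`ViewArray` types, not arrow-rs) and a hypothetical `dedup` flag on a gc-style `compact`: when the flag is set, each non-inline string is looked up in a `HashMap` keyed by its bytes, and repeats reuse the offset of the first copy. That hash and lookup on every long string is the extra work that makes deduplication worth opting into rather than doing by default.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of a StringView array (not the arrow-rs types).
#[derive(Clone, Debug, PartialEq)]
pub enum View {
    /// Short string (at most INLINE_LIMIT bytes) stored inside the view itself.
    Inline(Vec<u8>),
    /// Longer string stored at buffers[buffer_index][offset..offset + len].
    Ref { buffer_index: usize, offset: usize, len: usize },
}

/// Assumed inline threshold for this model.
pub const INLINE_LIMIT: usize = 12;

pub struct ViewArray {
    pub views: Vec<View>,
    pub buffers: Vec<Vec<u8>>,
}

impl ViewArray {
    /// Builds an array, appending every long string to a single data buffer.
    pub fn from_strs(values: &[&str]) -> Self {
        let mut data: Vec<u8> = Vec::new();
        let views: Vec<View> = values
            .iter()
            .map(|s| {
                let bytes = s.as_bytes();
                if bytes.len() <= INLINE_LIMIT {
                    View::Inline(bytes.to_vec())
                } else {
                    let offset = data.len();
                    data.extend_from_slice(bytes);
                    View::Ref { buffer_index: 0, offset, len: bytes.len() }
                }
            })
            .collect();
        ViewArray { views, buffers: vec![data] }
    }

    /// Returns the bytes of the string at position `i`.
    pub fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline(bytes) => bytes.as_slice(),
            View::Ref { buffer_index, offset, len } => {
                &self.buffers[*buffer_index][*offset..*offset + *len]
            }
        }
    }

    /// gc-style compaction into one new buffer. With `dedup == false` every long
    /// string is copied once per occurrence; with `dedup == true` repeats reuse the first copy.
    pub fn compact(&self, dedup: bool) -> Self {
        let mut data: Vec<u8> = Vec::new();
        // Maps string bytes (borrowed from the input) to their offset in `data`.
        let mut seen: HashMap<&[u8], usize> = HashMap::new();
        let mut views = Vec::with_capacity(self.views.len());
        for (i, view) in self.views.iter().enumerate() {
            let new_view = match view {
                View::Inline(bytes) => View::Inline(bytes.clone()),
                View::Ref { len, .. } => {
                    let bytes = self.value(i);
                    let offset = if dedup {
                        // One hash lookup per long string: the cost of deduplicating.
                        *seen.entry(bytes).or_insert_with(|| {
                            let off = data.len();
                            data.extend_from_slice(bytes);
                            off
                        })
                    } else {
                        let off = data.len();
                        data.extend_from_slice(bytes);
                        off
                    };
                    View::Ref { buffer_index: 0, offset, len: *len }
                }
            };
            views.push(new_view);
        }
        ViewArray { views, buffers: vec![data] }
    }
}
```

For the input ["a long repeated string", "a long repeated string", "another long string", "short"], compacting without `dedup` produces a 63-byte buffer in this model, while compacting with `dedup` produces 41 bytes and the first two views become identical.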
Describe alternatives you've considered
- Do nothing (users can implement their own version of this code without any additional APIs)
- Add a new function (e.g. GenericBinaryView::dedupe) that deduplicates such arrays (likely without moving any strings, just updating the views)
- Add an argument to GenericBinaryView::gc that controls the behavior (for example, specifying whether to also deduplicate while doing gc)
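The dedupe alternative (updating views without moving strings) could look roughly like the sketch below, again on a hypothetical `View`/`ViewArray` model rather than the arrow-rs types; the `dedupe_views` name is also illustrative. It leaves the data buffers as they are and rewrites each non-inline view to point at the first location where the same bytes were seen, possibly in a different buffer. In this model the buffers are simply cloned; a real implementation would presumably share them instead of copying.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of a StringView array (not the arrow-rs types).
#[derive(Clone, Debug, PartialEq)]
pub enum View {
    /// Short string stored inside the view itself.
    Inline(Vec<u8>),
    /// Longer string stored at buffers[buffer_index][offset..offset + len].
    Ref { buffer_index: usize, offset: usize, len: usize },
}

pub struct ViewArray {
    pub views: Vec<View>,
    pub buffers: Vec<Vec<u8>>,
}

impl ViewArray {
    /// Returns the bytes of the string at position `i`.
    pub fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline(bytes) => bytes.as_slice(),
            View::Ref { buffer_index, offset, len } => {
                &self.buffers[*buffer_index][*offset..*offset + *len]
            }
        }
    }

    /// Rewrites views so equal long strings share one (buffer_index, offset);
    /// no string bytes are moved or copied into new buffers.
    pub fn dedupe_views(&self) -> Self {
        // First location seen for each distinct long string, keyed by its bytes.
        let mut first: HashMap<&[u8], (usize, usize)> = HashMap::new();
        let views: Vec<View> = self
            .views
            .iter()
            .enumerate()
            .map(|(i, view)| match view {
                View::Inline(bytes) => View::Inline(bytes.clone()),
                View::Ref { buffer_index, offset, len } => {
                    let (bi, off) = *first
                        .entry(self.value(i))
                        .or_insert((*buffer_index, *offset));
                    View::Ref { buffer_index: bi, offset: off, len: *len }
                }
            })
            .collect();
        ViewArray { views, buffers: self.buffers.clone() }
    }
}
```

After this step, equal long strings share a single (buffer_index, offset) pair, although the duplicate bytes still occupy space in the buffers.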
Additional context
@alexwilcoxson-rel asked in a comment on #5904:
"Can/will this incorporate deduping/interning/implicitly using the gc function that landed recently?"