Closed
Description
Is your feature request related to a problem or challenge?
We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.
Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8, and when called with a StringView argument DataFusion will cast the argument back to DataType::Utf8, which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support DataType::Utf8View directly.
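To illustrate why the cast is costly, here is a minimal sketch using a simplified, hypothetical model of the two layouts. `Utf8Column`, `Utf8ViewColumn`, and both `starts_with_*` functions are made up for illustration and are not the arrow-rs or DataFusion types; the real Utf8View layout is more involved than the plain `(offset, len)` pairs assumed here.

```rust
// Simplified, hypothetical model of the two layouts; NOT the arrow-rs types.

/// Contiguous layout in the spirit of Utf8: one data buffer plus offsets.
struct Utf8Column {
    data: Vec<u8>,
    offsets: Vec<usize>, // row i occupies data[offsets[i]..offsets[i + 1]]
}

/// View layout in the spirit of Utf8View: each row is an (offset, len) pair
/// pointing into a shared buffer, so rows may overlap or appear in any order.
struct Utf8ViewColumn {
    buffer: Vec<u8>,
    views: Vec<(usize, usize)>,
}

impl Utf8Column {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, row: usize) -> &str {
        let bytes = &self.data[self.offsets[row]..self.offsets[row + 1]];
        std::str::from_utf8(bytes).expect("valid UTF-8")
    }
}

impl Utf8ViewColumn {
    fn value(&self, row: usize) -> &str {
        let (start, len) = self.views[row];
        std::str::from_utf8(&self.buffer[start..start + len]).expect("valid UTF-8")
    }

    /// Models the cast back to the contiguous layout: every row's bytes
    /// are copied into a freshly allocated buffer.
    fn cast_to_utf8(&self) -> Utf8Column {
        let mut data = Vec::new();
        let mut offsets = vec![0];
        for row in 0..self.views.len() {
            data.extend_from_slice(self.value(row).as_bytes());
            offsets.push(data.len());
        }
        Utf8Column { data, offsets }
    }
}

/// The cast path: copy everything first, then evaluate the predicate.
fn starts_with_via_cast(col: &Utf8ViewColumn, prefix: &str) -> Vec<bool> {
    let cast = col.cast_to_utf8();
    (0..cast.len()).map(|row| cast.value(row).starts_with(prefix)).collect()
}

/// The native path: evaluate the predicate directly on the views, no copy.
fn starts_with_native(col: &Utf8ViewColumn, prefix: &str) -> Vec<bool> {
    (0..col.views.len()).map(|row| col.value(row).starts_with(prefix)).collect()
}
```

Both functions return the same answer; the difference is that the cast path allocates and copies every row's bytes before doing any work, which is the overhead this issue aims to remove.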
Describe the solution you'd like
Port all string functions
- Implement native support StringView for character length #11676
- Initial support for regex_replace on StringViewArray #11556
- Support starts_with for Utf8View #11786
- Update the ASCII scalar function to support Utf8View #11834
- Update the BTRIM scalar function to support Utf8View #11835
- Update the CONCAT scalar function to support Utf8View #11836
- Update concat_ws scalar function to support Utf8View #11837
- Update CONTAINS scalar function to support Utf8View #11838
- Update ENDS_WITH scalar function to support Utf8View #11852
- Update INITCAP scalar function to support Utf8View #11853
- Update levenshtein scalar function to support Utf8View #11854
- Update LOWER scalar function to support Utf8View #11855
- Update LTRIM scalar function to support Utf8View #11856
- Update LPAD scalar function to support Utf8View #11857
- Update OCTET_LENGTH scalar function to support Utf8View #11858
- Update SPLIT_PART scalar function to support Utf8View #11950
- Update STRPOS scalar function to support Utf8View #11951
- Update SUBSTR scalar function to support Utf8View #11952
- Update TRANSLATE scalar function to support Utf8View #11953
- Update FIND_IN_SET scalar function to support Utf8View #11954
- Implement native support StringView for REPEAT #11962
- Support Utf8View for string function bit_length #13195
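As a rough picture of what "support Utf8View" could mean for any one of the functions above, the sketch below writes the kernel once against `&str` and adds a dispatch arm for the view representation instead of casting. The `StringColumn` enum, `char_len_kernel`, and `character_length` are hypothetical stand-ins for illustration and do not reflect DataFusion's actual function signatures.

```rust
/// Hypothetical stand-in for the three string representations named in this issue.
enum StringColumn {
    Utf8(Vec<String>),
    LargeUtf8(Vec<String>),
    /// Simplified view layout: (byte offset, byte length) pairs into one buffer.
    Utf8View { buffer: String, views: Vec<(usize, usize)> },
}

/// Shared kernel written once against `&str`, so it is independent of layout.
fn char_len_kernel<'a>(values: impl Iterator<Item = &'a str>) -> Vec<usize> {
    values.map(|s| s.chars().count()).collect()
}

/// Dispatch on the representation; the view arm reads slices in place
/// rather than first converting to the contiguous representation.
fn character_length(col: &StringColumn) -> Vec<usize> {
    match col {
        StringColumn::Utf8(values) | StringColumn::LargeUtf8(values) => {
            char_len_kernel(values.iter().map(String::as_str))
        }
        StringColumn::Utf8View { buffer, views } => {
            // Assumes every view falls on UTF-8 character boundaries.
            char_len_kernel(views.iter().map(move |&(start, len)| &buffer[start..start + len]))
        }
    }
}
```

Under this shape, porting a function mostly amounts to adding the view arm and reusing the existing kernel, which is why the work splits naturally into one item per function.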
Describe alternatives you've considered
No response
Additional context
See coordination plan with @tshauck and myself here: #11787 (comment)