python string wrapper? #218

ExpandingMan · 2022-09-07T16:43:45Z

Converting strings from Python is of course really expensive because it involves a lot of copying, and not even of contiguous blocks of data. Once you get above a few MB of data, converting strings starts to look like a really bad option. The Py objects can do a lot, but they are not AbstractStrings, so they don't really look like strings on the Julia end until you convert them.

Any interest in creating some kind of PyStr wrapper that provides an AbstractString interface for Py objects?

cjdoris · 2022-09-07T18:20:38Z

I've just done a quick benchmark with x = "a"^100_000_000 and y = Py(x):

x * "b" takes 17 ms
Py(x) takes 33 ms
pyconvert(String, y) takes 41 ms

So conversion to/from Python strings appears to be within a small factor of optimal (roughly 2x to 2.5x the cost of the plain Julia copy x * "b").

Is it that you don't want to pay this conversion cost at all, or that you have limited memory and therefore want a lazy wrapper?

I'm not against adding PyString, but AFAIU Python strings have no defined storage representation, so there is no good way to access their code points or do fast random access to their characters. I think the best you can do is explicitly encode the string as UTF-8 first, which is exactly what conversion to String does.

We could have a PyString wrapper type which simply uses ordinary Python indexing to access substrings and characters, but this will be sloooow.
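To illustrate the point above from the Python side, here is a minimal sketch (not PythonCall's actual conversion code): a str is indexed by code point, while its UTF-8 form is a separate bytes object that has to be produced by an explicit encode, i.e. a copy. The sample string and variable names are made up for the example.

```python
# A str with a few non-ASCII characters: "naïve café ☕"
s = "na\u00efve caf\u00e9 \u2615"

# Explicitly encode to UTF-8; this allocates a new bytes object.
utf8 = s.encode("utf-8")

n_chars = len(s)     # number of code points in the str
n_bytes = len(utf8)  # number of bytes in the UTF-8 encoding

# Indexing the str goes through ordinary Python indexing and
# returns a new one-character str.
third_char = s[2]

# The same character occupies two bytes in the UTF-8 buffer,
# so code point index 2 does not map to a single byte offset in general.
third_char_bytes = third_char.encode("utf-8")

print(n_chars, n_bytes, third_char, third_char_bytes)
```

Because characters have variable width in UTF-8, a wrapper over the encoded bytes would still need Julia-style byte indexing, and a wrapper over the str itself would pay a Python call per index.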

ExpandingMan · 2022-09-07T18:40:28Z

I think the problem is that it doesn't scale well. I don't have an MWE, but I have observed converting large sets of strings to be much more expensive. I think the reasons for this are the burden on the garbage collector and that the memory holding the strings is not, in general, contiguous, unlike your example with one large string.

I'm aware that PyString would still have enormous disadvantages, but it might be "good enough" in some cases. For example, I recently had to get $\sim 10^7$ strings from Python but there were only $\sim 100$ distinct strings, and my task was much faster by encoding them with a hash or whatever rather than trying to copy them. In my case not having a PyString didn't matter much, but sometimes it may... on the other hand maybe in cases where you would really need a PyString it's just not worth it and you should just convert 🤷

I'm not sure it's the right approach, it was just a thought.
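For the case described above (many strings, few distinct values), a rough Python-side sketch of the encoding idea: build a table of distinct strings plus a contiguous array of integer codes, so that only the distinct strings would ever need converting. The function name encode_categories and the sample data are hypothetical, and this is only one way to do it, not necessarily what was used in practice.

```python
from array import array


def encode_categories(strings):
    """Return (levels, codes): the distinct strings in first-seen order,
    and one integer code per input string indexing into levels."""
    index = {}          # maps each distinct string to its code
    levels = []         # distinct strings, in order of first appearance
    codes = array("q")  # contiguous buffer of signed 64-bit integers
    for s in strings:
        code = index.get(s)
        if code is None:
            code = len(levels)
            index[s] = code
            levels.append(s)
        codes.append(code)
    return levels, codes


# Tiny stand-in for the ~10^7 strings with ~100 distinct values.
data = ["red", "green", "red", "blue", "green", "red"]
levels, codes = encode_categories(data)

# The original sequence can be rebuilt from the two pieces.
decoded = [levels[c] for c in codes]
print(levels, list(codes))
```

Only levels contains strings; codes is a flat integer buffer, which is much cheaper to hand across a language boundary than one string object per element.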

cjdoris · 2022-09-07T19:33:41Z

The main use for a wrapper is to provide a zero-copy interface to a mutable object. A secondary use is to access only a small portion of a large container. I can't think of more uses than these two. If you don't have either of these uses (i.e. you are only reading, and will read most of the container) then usually you're better off eagerly converting the container instead.

Strings are immutable, which leaves only the second use, i.e. reading a small portion of a large string. But then you may as well just take the relevant substrings on the Python side before converting. Maybe that's not always possible (e.g. when this is happening inside a function which acts generically on strings).
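A minimal sketch of "take the substring on the Python side first", assuming you know which part you need before crossing into Julia. The string contents and variable names are invented for illustration.

```python
# A large string of which only the first line is of interest.
big = "header\n" + "x" * 1_000_000 + "\nfooter"

# Slice in Python; only this short str would then need converting.
first_line = big[: big.index("\n")]

print(len(big), len(first_line))
```

Here the conversion cost would scale with the 6-character slice rather than with the full string.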

PallHaraldsson · 2022-10-10T13:51:10Z

https://discuss.python.org/t/pep-686-make-utf-8-mode-default/14435/43
According to that thread, UTF-8 mode will become the default in "Python 3.15 instead of 3.13"; some wanted it for Python 3.12. It seems it was postponed (as the default, that is; the mode is already available as a non-default option in all currently supported Python versions).

I don't know when strings will be UTF-8 internally (as opposed to just for I/O), but I think they want to change that in a similar time frame.
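For reference, a small sketch of how to check whether UTF-8 mode is active in a given interpreter. As far as I know the mode (enabled with -X utf8 or PYTHONUTF8=1) only changes default encodings for I/O and the filesystem, not how str objects are stored in memory, so it does not by itself make a zero-copy wrapper possible.

```python
import sys

# sys.flags.utf8_mode is 1 when UTF-8 mode is enabled, 0 otherwise.
utf8_mode_on = bool(sys.flags.utf8_mode)

print("UTF-8 mode enabled:", utf8_mode_on)
```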

cjdoris · 2022-10-10T19:03:43Z

That's interesting. If strings ever use UTF-8 internally then we could add a PyString wrapper. Until then I don't think a wrapper would gain much, so I'm going to close this issue for now. Feel free to reopen whenever.

cjdoris closed this as completed Oct 10, 2022
