Skip to content

gh-107801: Improve io.*.seek docs and docstrings #107899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 38 additions & 25 deletions Doc/library/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,19 @@ High-level Module Interface
:func:`os.stat`) if possible.


.. data:: SEEK_SET
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants are defined in os and io modules. Instead of duplicating the description, I think that it is better to document them in one place and refer to them from other module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are exposed API in both modules, so in my opinion they should be documented in both io and os docs. I think .. seealso:: for cross linking is fine.

SEEK_CUR
SEEK_END

Parameters to the :meth:`~IOBase.seek` method,
used to specify the relative position
Their values are 0, 1, and 2, respectively.

.. seealso::

:data:`os.SEEK_SET`, :data:`os.SEEK_CUR`, and :data:`os.SEEK_END`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the os constants were added in 2.5 for the os.lseek function (and io's in 2.7 & 3.1 (#48822)). Given the values are the same there's little difference, but should we provide which module's constants you're 'meant' to use in which cases?

Edit: Note that fnctl.lockf, mmap.seek and sqlite3.Blob.seek use the os constants.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As os constants were added earlier, I think we can just refer to them from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no opinion on which constants to use, but my gut reaction would be to use the constants from io for io stuff.



.. function:: open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

This is an alias for the builtin :func:`open` function.
Expand Down Expand Up @@ -405,18 +418,17 @@ I/O Base Classes

.. method:: seek(offset, whence=SEEK_SET, /)

Change the stream position to the given byte *offset*. *offset* is
interpreted relative to the position indicated by *whence*. The default
value for *whence* is :data:`SEEK_SET`. Values for *whence* are:
Change the stream position to the given byte *offset*,
and return the new absolute position as an integer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning the new position is secondary and was already documented at the end of the method description. Is it so important to move it to the beginning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a function or method returns a value, surely it makes sense to start the text with that information, together with a succinct description of what the function/method does.

*offset* is interpreted relative to the position indicated by *whence*.
Valid values for *whence* are:

* :data:`SEEK_SET` or ``0`` -- start of the stream (the default);
* :data:`SEEK_SET` -- seek from the start of the stream (the default);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the literal values. They are standard and widely used. Removing the literal values will make the reader confused when they see file.seek(0, 2) in someone's other code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text above contains: "offset is interpreted relative to the position indicated by whence." The text in this paragraph describes that position. So if you consider it as a continuation of the former phrase, "seek from" may be not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the literal values. They are standard and widely used. Removing the literal values will make the reader confused when they see file.seek(0, 2) in someone's other code.

Good call. I agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also keep the literal values.

*offset* should be zero or positive
* :data:`SEEK_CUR` or ``1`` -- current stream position; *offset* may
be negative
* :data:`SEEK_END` or ``2`` -- end of the stream; *offset* is usually
negative

Return the new absolute position.
* :data:`SEEK_CUR` -- seek from current stream position;
*offset* may be negative
* :data:`SEEK_END` -- seek from the end of the stream;
*offset* is usually negative
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*offset* is usually negative
*offset* is usually zero or negative


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the pitfalls of seeking with files in text-mode, perhaps IOBase.seek should link to TextIOBase.seek, which has more specific information on the limitations here:

Suggested change
.. note::
When working with a text stream :py:meth:`!seek` has limitations
beyond the general interface described here,
which are described in more detail in :py:meth:`TextIOBase.seek`.

.. versionadded:: 3.1
The ``SEEK_*`` constants.
Expand Down Expand Up @@ -911,21 +923,22 @@ Text I/O

.. method:: seek(offset, whence=SEEK_SET, /)

Change the stream position to the given *offset*. Behaviour depends on
the *whence* parameter. The default value for *whence* is
:data:`SEEK_SET`.

* :data:`SEEK_SET` or ``0``: seek from the start of the stream
(the default); *offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero. Any other *offset* value
produces undefined behaviour.
* :data:`SEEK_CUR` or ``1``: "seek" to the current position;
*offset* must be zero, which is a no-operation (all other values
are unsupported).
* :data:`SEEK_END` or ``2``: seek to the end of the stream;
*offset* must be zero (all other values are unsupported).

Return the new absolute position as an opaque number.
Change the stream position to the given byte *offset*,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a byte offset.

I suggest also to change the name of the first argument to something other. For example to "cookie" as it is use in the code. Or to "pos" as it was in older documentation. But "offset" is misleading.

Removing references to the name further in the documentation and using "the first argument" instead of the name if it is impossible can also help, because it is hard to came with good name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing docs/docstrings use offset, cookie, and pos. I don't understand what's meant by cookie. I understand what's meant by offset or position. If there is no need to use the same name, there is no need for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think offset is fine, as it is still an offset into the file even if not an integer byte offset. I agree we should stay away from cookie et al.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cookie" means an opaque value. No, it is not an offset. It is several values (position of the buffer in the file, position in the buffer, state of the encoder and decoder, packed in a 257-bit integer. See code of _pack_cookie() and _unpack_cookie() and very complex code of tell() and seek() in _pyio.py.

This PR is needed because semantic of tell() and seek() in binary and text files is so different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, yes TextIOWrapper is a different thing. Very unfortunate.

and return the new absolute position as an opaque number.
*offset* is interpreted relative to the position indicated by *whence*.
Valid values for *whence* are:

* :data:`SEEK_SET` -- seek from the start of the stream (the default);
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero.
Comment on lines +932 to +933
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to describe what will happen exactly.

Suggested change
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero.
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, in which case the stream is returned to the position before calling :meth:`!tell()`, or zero, in which case the stream is rewound to the start.

Any other *offset* value produces undefined behaviour.
* :data:`SEEK_CUR` -- seek from current stream position;
*offset* must be zero, which is a no-operation.
Comment on lines +935 to +936
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* :data:`SEEK_CUR` -- seek from current stream position;
*offset* must be zero, which is a no-operation.
* :data:`SEEK_CUR` -- just return the current position; *offset* must be zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it silly? "Seek, but offset must be zero, so actually do not seek." Why write seek if it does not seek? I think it only confuses users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested "seek from the current stream position; offset must be zero, which does not change the stream position. All other values are unsupported, and will raise an exception." above -- would this be an alright alternative?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why say "seek" if it does not seek?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you convinced me. How about dropping 'just':

Suggested change
* :data:`SEEK_CUR` -- seek from current stream position;
*offset* must be zero, which is a no-operation.
* :data:`SEEK_CUR` -- return the current position; *offset* must be zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you. Only do not include contradictions.

Since seek() also returns the current position in all other cases, "return the current position" may be superfluous (it stress the difference which does not exist). Maybe just "no-operation", "does nothing" or "does not change the current position"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just "no-operation", "does nothing" or "does not change the current position"?

Yes, I think a shorter sentence is the way to go.

All other values are unsupported.
* :data:`SEEK_END` -- seek from the end of the stream;
*offset* must be zero.
All other values are unsupported.
Comment on lines +931 to +940
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* :data:`SEEK_SET` -- seek from the start of the stream (the default);
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero.
Any other *offset* value produces undefined behaviour.
* :data:`SEEK_CUR` -- seek from current stream position;
*offset* must be zero, which is a no-operation.
All other values are unsupported.
* :data:`SEEK_END` -- seek from the end of the stream;
*offset* must be zero.
All other values are unsupported.
* :data:`SEEK_SET` -- seek from the start of the stream (the default);
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero.
Any other *offset* value produces undefined behaviour.
* :data:`SEEK_CUR` -- seek from the current stream position;
*offset* must be zero, which does not change the stream position.
All other values are unsupported, and will raise an exception.
* :data:`SEEK_END` -- seek from the end of the stream;
*offset* must be zero, which does not change the stream position.
All other values are unsupported, and will raise an exception.

Copy link
Contributor Author

@erlend-aasland erlend-aasland Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better:

Suggested change
* :data:`SEEK_SET` -- seek from the start of the stream (the default);
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero.
Any other *offset* value produces undefined behaviour.
* :data:`SEEK_CUR` -- seek from current stream position;
*offset* must be zero, which is a no-operation.
All other values are unsupported.
* :data:`SEEK_END` -- seek from the end of the stream;
*offset* must be zero.
All other values are unsupported.
* :data:`SEEK_SET` -- seek from the start of the stream (the default);
*offset* must either be a number returned by
:meth:`TextIOBase.tell`, or zero.
Any other *offset* value produces undefined behaviour.
* :data:`SEEK_CUR` -- seek from the current stream position;
*offset* must be zero, leaving the stream position unchanged.
All other values are unsupported, and will raise an exception.
* :data:`SEEK_END` -- seek from the end of the stream;
*offset* must be zero, leaving the stream position at the end of the stream.
All other values are unsupported, and will raise an exception.



.. versionadded:: 3.1
The ``SEEK_*`` constants.
Expand Down
14 changes: 8 additions & 6 deletions Doc/tutorial/inputoutput.rst
Original file line number Diff line number Diff line change
Expand Up @@ -433,10 +433,12 @@ an opaque number when in text mode.

To change the file object's position, use ``f.seek(offset, whence)``. The position is computed
from adding *offset* to a reference point; the reference point is selected by
the *whence* argument. A *whence* value of 0 measures from the beginning
of the file, 1 uses the current file position, and 2 uses the end of the file as
the reference point. *whence* can be omitted and defaults to 0, using the
beginning of the file as the reference point. ::
the *whence* argument.
A *whence* value of :data:`io.SEEK_SET` measures from the beginning of the file,
:data:`io.SEEK_CUR` uses the current file position,
and :data:`io.SEEK_END` uses the end of the file as the reference point.
*whence* can be omitted and defaults to :data:`!io.SEEK_SET`,
using the beginning of the file as the reference point. ::
Comment on lines +436 to +441
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do find it strange that the tutorial delves into weird low-level APIs like seek. I can't even remember the last time I needed seek() for a real use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to show that it's possible, especially after covering tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we introduce tell, surely we must also talk about seek. But I think an "how to do advanced I/O" is more fitting for these APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, conceptually it's not really advanced nor is it difficult to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept is easy enough to grasp. The APIs are a pain, though.


>>> f = open('workfile', 'rb+')
>>> f.write(b'0123456789abcdef')
Expand All @@ -445,14 +447,14 @@ beginning of the file as the reference point. ::
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
>>> f.seek(-3, io.SEEK_END) # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'

In text files (those opened without a ``b`` in the mode string), only seeks
relative to the beginning of the file are allowed (the exception being seeking
to the very file end with ``seek(0, 2)``) and the only valid *offset* values are
to the very file end with ``seek(0, io.SEEK_END)``) and the only valid *offset* values are
those returned from the ``f.tell()``, or zero. Any other *offset* value produces
undefined behaviour.

Expand Down
4 changes: 2 additions & 2 deletions Doc/whatsnew/3.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -367,8 +367,8 @@ New, Improved, and Deprecated Modules

(Contributed by Benjamin Peterson and Antoine Pitrou.)

* The :mod:`io` module has three new constants for the :meth:`seek`
method :data:`SEEK_SET`, :data:`SEEK_CUR`, and :data:`SEEK_END`.
* The :mod:`io` module has three new constants for the :meth:`~io.IOBase.seek`
method :data:`~io.SEEK_SET`, :data:`~io.SEEK_CUR`, and :data:`~io.SEEK_END`.

* The :data:`sys.version_info` tuple is now a named tuple::

Expand Down
6 changes: 3 additions & 3 deletions Modules/_io/bufferedio.c
Original file line number Diff line number Diff line change
Expand Up @@ -1284,14 +1284,14 @@ _io__Buffered_tell_impl(buffered *self)

/*[clinic input]
_io._Buffered.seek
target as targetobj: object
whence: int = 0
offset as targetobj: object
whence: int(c_default='0') = io.SEEK_SET
/
[clinic start generated code]*/

static PyObject *
_io__Buffered_seek_impl(buffered *self, PyObject *targetobj, int whence)
/*[clinic end generated code: output=7ae0e8dc46efdefb input=a9c4920bfcba6163]*/
/*[clinic end generated code: output=7ae0e8dc46efdefb input=d9ac57b8f178bc05]*/
{
Py_off_t target, n;
PyObject *res = NULL;
Expand Down
14 changes: 5 additions & 9 deletions Modules/_io/bytesio.c
Original file line number Diff line number Diff line change
Expand Up @@ -636,22 +636,18 @@ bytesio_iternext(bytesio *self)

/*[clinic input]
_io.BytesIO.seek
pos: Py_ssize_t
whence: int = 0
offset as pos: Py_ssize_t
whence: int(c_default='0') = io.SEEK_SET
/

Change stream position.
Change stream position and return the new absolute position.

Seek to byte offset pos relative to position indicated by whence:
0 Start of stream (the default). pos should be >= 0;
1 Current position - pos may be negative;
2 End of stream - pos usually negative.
Returns the new absolute position.
Set the byte offset 'offset', relative to position indicated by 'whence':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to keep a consistent terminology instead of alternating between "offset" and "position". "position" is probably more easily understood IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Position is used to describe the whence param. Would it not be confusing if 'position' is used to describe both parameters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the summary string does say "Change stream position and return the new absolute position". In the next line "stream position" suddenly becomes "byte offset".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this could also be "Set the stream position to the byte 'offset', relative to ...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Set the byte offset 'offset', relative to position indicated by 'whence':
Set the stream position to the byte offset 'offset', relative to the position indicated by 'whence':

[clinic start generated code]*/

static PyObject *
_io_BytesIO_seek_impl(bytesio *self, Py_ssize_t pos, int whence)
/*[clinic end generated code: output=c26204a68e9190e4 input=1e875e6ebc652948]*/
/*[clinic end generated code: output=c26204a68e9190e4 input=dbeeba9f5f031b6e]*/
{
CHECK_CLOSED(self);

Expand Down
4 changes: 2 additions & 2 deletions Modules/_io/clinic/bufferedio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 4 additions & 8 deletions Modules/_io/clinic/bytesio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 4 additions & 13 deletions Modules/_io/clinic/fileio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 18 additions & 9 deletions Modules/_io/clinic/iobase.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 10 additions & 8 deletions Modules/_io/clinic/stringio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 13 additions & 3 deletions Modules/_io/clinic/textio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 3 additions & 13 deletions Modules/_io/fileio.c
Original file line number Diff line number Diff line change
Expand Up @@ -953,24 +953,14 @@ portable_lseek(fileio *self, PyObject *posobj, int whence, bool suppress_pipe_er

/*[clinic input]
_io.FileIO.seek
pos: object
whence: int = 0
offset as pos: object
whence: int(c_default='SEEK_SET') = io.SEEK_SET
/

Move to new file position and return the file position.

Argument offset is a byte count. Optional argument whence defaults to
SEEK_SET or 0 (offset from start of file, offset should be >= 0); other values
are SEEK_CUR or 1 (move relative to current position, positive or negative),
and SEEK_END or 2 (move relative to end of file, usually negative, although
many platforms allow seeking beyond the end of a file).

Note that not all file objects are seekable.
[clinic start generated code]*/

static PyObject *
_io_FileIO_seek_impl(fileio *self, PyObject *pos, int whence)
/*[clinic end generated code: output=c976acdf054e6655 input=0439194b0774d454]*/
/*[clinic end generated code: output=c976acdf054e6655 input=ec74977caf7c656b]*/
{
if (self->fd < 0)
return err_closed();
Expand Down
Loading