keunwoochoi
Contributor

This is a specialized file opener and decoder that

  • works with several source types (S3, GCS, local paths)
  • opens any text file and streams its content line by line.
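For readers skimming the thread, the line-by-line streaming described above can be sketched roughly as follows. This is a minimal illustration for local paths only, not the code in this PR: the function name `stream_lines` and the record layout (`"data"`, `"metadata"`, `"file_path"`, `"line_number"`) are assumptions made for the example. Remote sources such as S3 or GCS would need a different opener in place of the built-in `open`.

```python
from typing import Any, Dict, Iterator


def stream_lines(path: str, encoding: str = "utf-8") -> Iterator[Dict[str, Any]]:
    """Yield one record per line of a local text file (hypothetical helper)."""
    # The with-block closes the handle once the generator is exhausted or closed.
    with open(path, "r", encoding=encoding) as handle:
        for line_number, line in enumerate(handle):
            yield {
                # Strip only the trailing newline so other whitespace is preserved.
                "data": line.rstrip("\n"),
                # Record layout is an assumption for this sketch.
                "metadata": {"file_path": path, "line_number": line_number},
            }
```

Because the function is a generator, only one line is held in memory at a time, which is the point of streaming large text files.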

@facebook-github-bot added the CLA Signed label on Jul 22, 2025

pytorch-bot bot commented Aug 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/data/1500

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 8832f44 with merge base 9295079:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


# We got a valid line, return it
return line_data

Contributor

We can add a shutdown method

Contributor Author

ok, added in the new commit.
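A shutdown method of the kind suggested in this exchange might look like the sketch below. It is illustrative only: the class name `LineReader`, the method `next_line`, and the attributes `_handle` and `_done` are hypothetical and are not taken from the PR. The idea shown is that the reader owns its file handle and exposes an idempotent `shutdown()` that releases it.

```python
from typing import IO, Optional


class LineReader:
    """Hypothetical reader that owns at most one open text handle."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        self._path = path
        self._encoding = encoding
        self._handle: Optional[IO[str]] = None
        self._done = False

    def next_line(self) -> Optional[str]:
        """Return the next line without its newline, or None when finished."""
        if self._done:
            return None
        if self._handle is None:
            # Open lazily so that constructing the reader holds no resources.
            self._handle = open(self._path, "r", encoding=self._encoding)
        line = self._handle.readline()
        if line == "":
            # readline() returns an empty string only at end of file.
            self.shutdown()
            return None
        return line.rstrip("\n")

    def shutdown(self) -> None:
        """Close the handle if it is open; safe to call more than once."""
        if self._handle is not None:
            self._handle.close()
            self._handle = None
        self._done = True
```

Calling `shutdown()` at end of file as well as on request means a caller that stops early and a caller that reads everything both leave no open handle behind.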

self.SOURCE_KEY: self.source.state_dict(),
self.CURRENT_FILE_KEY: self._current_file,
"current_line": self._current_line,
}
Contributor

shall we also save METADATA_KEY?

Contributor Author

i) Like self.CURRENT_FILE_KEY, I added self.CURRENT_LINE_KEY.

ii) Are you sure you meant saving METADATA_KEY? DATA_KEY and METADATA_KEY play similar roles, and I don't think either needs to be saved.
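To make the state discussion concrete, here is a rough sketch of checkpointing a line reader by file and line position. It is an assumption-laden illustration rather than the PR's implementation: the class `ResumableLines` and the string values of the two keys are invented for the example, and the upstream source state (saved under `SOURCE_KEY` in the snippet under review) is left out for brevity.

```python
from typing import Any, Dict, Iterator, Optional

# Key values are assumptions for this sketch.
CURRENT_FILE_KEY = "current_file"
CURRENT_LINE_KEY = "current_line"


class ResumableLines:
    """Hypothetical line iterator that can resume from a saved position."""

    def __init__(self, path: str, state: Optional[Dict[str, Any]] = None) -> None:
        self._current_file = path
        self._current_line = 0
        # Only trust the saved line count if it refers to the same file.
        if state is not None and state.get(CURRENT_FILE_KEY) == path:
            self._current_line = state[CURRENT_LINE_KEY]

    def __iter__(self) -> Iterator[str]:
        skip = self._current_line
        with open(self._current_file, "r", encoding="utf-8") as handle:
            for index, line in enumerate(handle):
                if index < skip:
                    continue  # already consumed before the checkpoint
                self._current_line = index + 1
                yield line.rstrip("\n")

    def state_dict(self) -> Dict[str, Any]:
        """Return the position needed to resume after the last yielded line."""
        return {
            CURRENT_FILE_KEY: self._current_file,
            CURRENT_LINE_KEY: self._current_line,
        }
```

Only the position is stored. Names of output fields are constants of the class rather than progress through the data, which is consistent with the view in the reply above that they do not belong in the saved state.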

@keunwoochoi
Contributor Author

Thanks for the review. Made some changes and pushed three commits.

@keunwoochoi
Contributor Author

(note to myself)


(1018 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================== short test summary info ===========================
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_metadata - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpcyw34h51\\test1.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_state_management - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpe4dio47s\\test1.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_empty_file - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpivu3sp7w\\normal.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_encoding - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmp73b21cs9\\utf8.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_error_handling - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmp4oq398ov\\valid.txt'
===== 5 failed, 383 passed, 15 skipped, 27 warnings in 1320.71s (0:22:00) =====
Error: Process completed with exit code 1.
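WinError 32 is the error Windows raises when a file is still in use, for example when something tries to delete a file that has an open handle. One plausible reading of the log above, stated here as an assumption rather than a confirmed diagnosis, is that the tests leave a handle open while the temporary directory is being cleaned up. The sketch below shows a test shape that avoids this by closing the handle before cleanup runs. The helper names `read_first_line` and `check_first_line` are hypothetical.

```python
import os
import tempfile
from typing import Tuple


def read_first_line(path: str) -> str:
    """Read one line and always release the handle, even on error."""
    handle = open(path, "r", encoding="utf-8")
    try:
        return handle.readline().rstrip("\n")
    finally:
        # Closing here means nothing holds the file when the directory is removed.
        handle.close()


def check_first_line() -> Tuple[str, bool]:
    """Write a temp file, read it back, and report whether cleanup succeeded."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "test1.txt")
        with open(path, "w", encoding="utf-8") as out:
            out.write("line1\nline2\n")
        first = read_first_line(path)
    # The directory and its files are deleted when the with-block exits.
    return first, os.path.exists(path)
```

The same pattern applies to a reader object: call its shutdown method (or use it as a context manager, if it supports that) inside the temporary-directory block, before the block exits.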
