Refactor test suite to be more readable? #175
Comments
Or even more readable:

    class TestLineReader:
        @pytest.fixture
        def files_with_text(self):
            return [
                ("file1", "Line1\nLine2"),
                ("file2", "Line2,1\nLine2,2\nLine2,3"),
            ]

        def make_str_dp(self, files_with_text):
            return IterableWrapper([(file, io.StringIO(text)) for file, text in files_with_text])

        def make_bytes_dp(self, files_with_text):
            return IterableWrapper([(file, io.BytesIO(text.encode("utf-8"))) for file, text in files_with_text])

        def test_functional_read_lines_correctly(self, files_with_text):
            line_reader_dp = self.make_str_dp(files_with_text).readlines()
            expected = []
            for file, text in files_with_text:
                expected.extend((file, line) for line in text.splitlines())
            assert expected == list(line_reader_dp)

        def test_functional_strip_new_lines_for_bytes(self, files_with_text):
            line_reader_dp = self.make_bytes_dp(files_with_text).readlines()
            expected = []
            for file, text in files_with_text:
                expected.extend((file, line.encode("utf-8")) for line in text.splitlines())
            assert expected == list(line_reader_dp)

        def test_functional_do_not_strip_newlines(self, files_with_text):
            line_reader_dp = self.make_str_dp(files_with_text).readlines(strip_newline=False)
            expected = []
            for file, text in files_with_text:
                expected.extend((file, line) for line in text.splitlines(keepends=True))
            assert expected == list(line_reader_dp)

        def test_reset(self, files_with_text):
            line_reader_dp = LineReader(self.make_str_dp(files_with_text))
            expected = []
            for file, text in files_with_text:
                expected.extend((file, line) for line in text.splitlines())
            n_elements_before_reset = 2
            res_before_reset, res_after_reset = reset_after_n_next_calls(line_reader_dp, n_elements_before_reset)
            assert expected[:n_elements_before_reset] == res_before_reset
            assert expected == res_after_reset

        def test_len(self, files_with_text):
            line_reader_dp = LineReader(self.make_str_dp(files_with_text))
            with pytest.raises(TypeError, match="has no len"):
                len(line_reader_dp)
I like this idea!
Ah, that might be an issue. In PyTorch core you cannot rely on
I believe we can do
Thanks for the suggestion! I think this is cleaner than what we have. It will take quite a bit of manual refactoring of each DataPipe to get there. I am wondering if we can do something even better: a standard template for testing DataPipes with less manual code (maybe just specifying the inputs), similar to what
FWIW, we've started something similar in torchtext. See here if you're interested.
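To make the "just specify the inputs" idea from the comments above more concrete, here is a minimal table-driven sketch. It is not the torchdata or torchtext implementation: `read_lines` is a hypothetical stand-in for the DataPipe under test so that the example runs without torchdata, and `CASES`, `expected_lines`, `run_case` and `check_all_cases` are names invented for illustration. Each case only states its inputs, and the expected output is derived from the shared sample files.

```python
import io

# Hypothetical stand-in for the DataPipe under test (not the torchdata API):
# yields (file name, line) pairs from (file name, stream) pairs.
def read_lines(named_streams, strip_newline=True):
    for name, stream in named_streams:
        for line in stream:
            if strip_newline:
                line = line.rstrip("\n" if isinstance(line, str) else b"\n")
            yield name, line

# Shared sample input, same data as in the class above.
FILES_WITH_TEXT = [
    ("file1", "Line1\nLine2"),
    ("file2", "Line2,1\nLine2,2\nLine2,3"),
]

# Each case only specifies its inputs: (as_bytes, strip_newline).
CASES = [
    (False, True),   # text streams, newlines stripped
    (True, True),    # byte streams, newlines stripped
    (False, False),  # text streams, newlines kept
]

def expected_lines(files_with_text, as_bytes, strip_newline):
    # Derive the expected (file, line) pairs directly from the sample text.
    result = []
    for name, text in files_with_text:
        for line in text.splitlines(keepends=not strip_newline):
            result.append((name, line.encode("utf-8") if as_bytes else line))
    return result

def run_case(as_bytes, strip_newline):
    # Build fresh streams for every case so cases do not share state.
    streams = []
    for name, text in FILES_WITH_TEXT:
        stream = io.BytesIO(text.encode("utf-8")) if as_bytes else io.StringIO(text)
        streams.append((name, stream))
    return list(read_lines(streams, strip_newline=strip_newline))

def check_all_cases():
    for as_bytes, strip_newline in CASES:
        actual = run_case(as_bytes, strip_newline)
        expected = expected_lines(FILES_WITH_TEXT, as_bytes, strip_newline)
        assert actual == expected, (as_bytes, strip_newline)

check_all_cases()
```

With pytest available, the same `CASES` list could be passed to `pytest.mark.parametrize` so that every case is reported as its own test; the plain loop is used here only to keep the sketch free of third-party dependencies.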
While working on #174, I also worked on the test suite. In there we have ginormous tests that are hard to parse, because they do so many things at the same time:
data/test/test_datapipe.py
Lines 382 to 426 in c06066a
I was wondering if there is a reason for that. Can't we split this into multiple smaller ones? Utilizing `pytest`, placing the following class in the test module is equivalent to the test above:
This is a lot more readable, since we now actually have 5 separate test cases that can individually fail. Plus, while writing this I also found that `test_reset` and `test_len` were somewhat dependent on `test_functional_do_not_strip_newlines`, since they define neither `line_reader_dp` nor `expected_result` themselves.