Closed
Description
I'm streaming a large CSV file from Dropbox and feeding it into a csv.DictReader(). Well into the file, iter_lines randomly yields an incomplete line, which causes csv.DictReader to stop processing the file.
This is how my code looks:
def dropbox_yielder(dbx_file):
    for line in dbx_file.iter_lines(chunk_size=1024 * 1024):  # also tried with no chunk_size
        encoding = cchardet.detect(line)
        decoded = codecs.decode(line, encoding['encoding'] or 'UTF8', 'ignore')
        logger.info("Yielded %d bytes - < %s ... %s >", len(decoded), decoded[:9], decoded[-9:])
        yield decoded
2019-09-08 09:37:24,364.364 INFO import_service_v2 - dropbox_yielder: Yielded 121 bytes - < asdasdasd ... 11/2004 00:00:00,"", >
2019-09-08 09:37:24,366.366 INFO import_service_v2 - dropbox_yielder: Yielded 122 bytes - < asdasdasd ... 11/2004 00:00:00,"", >
2019-09-08 09:37:24,368.368 INFO import_service_v2 - dropbox_yielder: Yielded 18 bytes - < asdasdasd ... asdasdasd >
The CSV lines are supposed to end with a timestamp, but the last line does not. It's too short: only 18 bytes. Has anyone else had similar problems? I'm not sure whether this is an issue with Dropbox or with requests.
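For comparison, here is a minimal sketch of a variant that takes line splitting and per-line encoding detection out of the picture. It does not diagnose the cause. It assumes dbx_file is a requests.Response (as the iter_lines call suggests), so raw byte chunks are available from iter_content, and that the whole file is UTF-8. The name iter_text_lines is hypothetical. The sketch decodes the stream incrementally, splits only on "\n", and holds back a trailing partial line until the next chunk arrives. The demo uses hard-coded chunks as a stand-in for the download.

```python
import codecs
import csv


def iter_text_lines(chunks, encoding="utf-8"):
    """Yield complete text lines from an iterable of byte chunks.

    Hypothetical helper: splits only on "\n" and buffers a trailing
    partial line until a later chunk completes it.
    """
    # An incremental decoder copes with multi-byte characters that are
    # split across two chunks.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pending = ""
    for chunk in chunks:
        pending += decoder.decode(chunk)
        # Everything before the last "\n" is complete; the rest waits.
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line + "\n"
    # Flush whatever the decoder and the buffer still hold at end of stream.
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending


# Stand-in for dbx_file.iter_content(chunk_size=1024 * 1024): the chunk
# boundaries deliberately fall mid-line and mid-character.
chunks = [
    b'name,when\r\nasd',
    b'asd,11/2004 00:00:00\r\n"caf\xc3',
    b'\xa9",12/2004 00:00:00\r\n',
]
rows = list(csv.DictReader(iter_text_lines(chunks)))
```

With a real download, the call would look like csv.DictReader(iter_text_lines(dbx_file.iter_content(chunk_size=1024 * 1024))). If a short line still shows up with this variant, the truncated bytes are already present in the stream itself rather than being introduced by line splitting or decoding.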