Strange behaviour, dbx_file.iter_lines() ends prematurely #170

Closed
@thestick613

Description


I'm streaming a large CSV file from Dropbox and feeding it into a csv.DictReader(). Well into the file, iter_lines randomly yields an incomplete line, which causes csv.DictReader to stop processing the file.

This is how my code looks:

import codecs
import logging
import cchardet

logger = logging.getLogger(__name__)

def dropbox_yielder(dbx_file):
    for line in dbx_file.iter_lines(chunk_size=1024 * 1024):  # also tried with no chunk_size
        # Guess the encoding of each raw line, then decode, ignoring errors
        encoding = cchardet.detect(line)
        decoded = codecs.decode(line, encoding['encoding'] or 'UTF8', 'ignore')
        logger.info("Yielded %d bytes - < %s ... %s >", len(decoded), decoded[:9], decoded[-9:])
        yield decoded
2019-09-08 09:37:24,364.364 INFO import_service_v2 - dropbox_yielder: Yielded 121 bytes - < asdasdasd ... 11/2004 00:00:00,"", >
2019-09-08 09:37:24,366.366 INFO import_service_v2 - dropbox_yielder: Yielded 122 bytes - < asdasdasd ... 11/2004 00:00:00,"", >
2019-09-08 09:37:24,368.368 INFO import_service_v2 - dropbox_yielder: Yielded 18 bytes - < asdasdasd ... asdasdasd >

The CSV lines are supposed to end with a timestamp, but the last yielded line does not; at only 18 bytes, it is clearly truncated. Has anyone else had similar problems? I'm not sure whether this is an issue with dropbox or with requests.
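One workaround to try (a sketch, not a confirmed fix): bypass iter_lines entirely, buffer the raw chunks from iter_content, and only yield lines once a terminating newline has actually arrived. This assumes dbx_file exposes a requests-style iter_content(chunk_size=...); the function name buffered_line_yielder is just illustrative.

```python
import codecs

def buffered_line_yielder(dbx_file, chunk_size=1024 * 1024):
    """Yield only complete lines from a streamed response.

    A trailing partial line stays in the buffer until the next chunk
    (or the end of the stream) completes it, so a chunk boundary can
    never produce a truncated line.
    """
    buffer = b""
    for chunk in dbx_file.iter_content(chunk_size=chunk_size):
        buffer += chunk
        # Split on \n explicitly; everything before the last \n is complete,
        # the remainder goes back into the buffer.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield codecs.decode(line.rstrip(b"\r"), "utf-8", "ignore")
    if buffer:
        # Final line without a trailing newline.
        yield codecs.decode(buffer.rstrip(b"\r"), "utf-8", "ignore")
```

If this version never yields a short line, the problem is in iter_lines' delimiter handling rather than in the data Dropbox sends.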
