some samples fail with --force_sratools_download due to changes in prefetch results #98

@dmalzl

Description of the bug

I have been handling my data with fetchngs for over a month now and am pretty satisfied with the results. However, I recently ran into problems when forcing data download via sratools. Everything worked fine in May, but I had to reprocess and thus redownload some of the samples, and the pipeline now fails with an error when fetching the data with prefetch. I vaguely remember reading that the SRA changed its data storage policies (or something similar) around the beginning of June. The error and the timing (the same command that worked in May fails in June) both hint at a connection to this change.

The .command.log file of the failing jobs shows the core of the issue: prefetch does not download the typical *.sra file but something called *.sralite. prefetch puts this file directly in the temp directory and not in the ./temp_dir/SRAsomething directory that vdb-validate expects, so vdb-validate does not find it and the pipeline fails.

I haven't looked into whether vdb-validate also accepts the *.sralite file. If it does, the problem could be resolved by checking whether prefetch generated the expected folder or the *.sralite file and handling each case accordingly. Downloading the failing samples via the ENA FTP is still possible, so my temporary fix is to download everything I can with sratools and fetch the rest from the FTP.
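To make the suggested check concrete, here is a rough sketch of how the prefetch output could be located in either layout. The function name `find_prefetch_output` and the four candidate paths are my own assumptions based on what I see in the temp directory, not something taken from the pipeline or the sratools documentation. It only finds the file; whether vdb-validate accepts an *.sralite file is still untested.

```python
from pathlib import Path
from typing import Optional

# Assumed layouts, based only on the behaviour described in this report:
#   <temp_dir>/<accession>/<accession>.sra   (folder layout that worked in May)
#   <temp_dir>/<accession>.sralite           (flat file seen in June)
SUFFIXES = (".sra", ".sralite")


def find_prefetch_output(temp_dir: str, accession: str) -> Optional[Path]:
    """Return the first existing prefetch output for `accession`, or None."""
    root = Path(temp_dir)
    candidates = []
    for suffix in SUFFIXES:
        # Nested layout: file inside a directory named after the accession.
        candidates.append(root / accession / f"{accession}{suffix}")
        # Flat layout: file placed directly in the temp directory.
        candidates.append(root / f"{accession}{suffix}")
    for path in candidates:
        if path.is_file():
            return path
    return None
```

A caller could then branch on the suffix of the returned path (`.sra` vs `.sralite`) and fall back to the ENA FTP download when the function returns `None`.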

Command used and terminal output

nextflow run nf-core/fetchngs ... --force_sratools_download
2022-06-24T14:44:39 prefetch.2.11.0 int: self NULL while reading file within network system module - cannot Make Compute Environment Token

2022-06-24T14:44:40 prefetch.2.11.0: 1) Downloading 'ERR1141695.sralite'...
2022-06-24T14:44:40 prefetch.2.11.0:  Downloading via HTTPS...
|-------------------------------------------------- 100%
2022-06-24T14:45:04 prefetch.2.11.0:  HTTPS download succeed
2022-06-24T14:45:05 prefetch.2.11.0:  'ERR1141695.sralite' is valid
2022-06-24T14:45:05 prefetch.2.11.0: 1) 'ERR1141695.sralite' was downloaded successfully
2022-06-24T14:45:06 vdb-validate.2.11.0 info: 'ERR1141695' could not be found

Relevant files

No response

System information

No response


Labels

bug
