Description
I'm trying to run transformAlignments on a BAM file stored in S3, e.g.:
adam-submit transformAlignments s3://1000genomes/phase3/data/HG00154/alignment/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam s3://<mybucket>/1000genomes/adam/bam=HG00154/
It fails with:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 92 in stage 0.0 failed 60 times, most recent failure: Lost task 92.59 in stage 0.0 (TID 1266, ip-10-184-8-118.ec2.internal, executor 1): java.nio.file.ProviderNotFoundException: Provider "s3" not found
	at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
	at org.seqdoop.hadoop_bam.util.NIOFileUtil.asPath(NIOFileUtil.java:40)
	...
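The exception comes from java.nio rather than from the Hadoop filesystem layer: the trace shows NIOFileUtil.asPath calling FileSystems.newFileSystem, which throws ProviderNotFoundException when no java.nio FileSystemProvider is registered for the URI scheme. Below is a minimal standalone sketch for checking that on a given JVM. The class name NioProviderCheck and the example bucket/key are made up for illustration, and this only approximates what hadoop-bam does internally.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystems;
import java.nio.file.ProviderNotFoundException;
import java.nio.file.spi.FileSystemProvider;
import java.util.Collections;

// Hypothetical diagnostic class; not part of ADAM or hadoop-bam.
final class NioProviderCheck {

    /** Returns true if a java.nio FileSystemProvider is installed for the given URI scheme. */
    public static boolean hasProvider(String scheme) {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if (provider.getScheme().equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Makes the same java.nio call that appears in the stack trace.
     * Returns null if a filesystem could be opened, otherwise a description of the failure.
     */
    public static String tryOpen(String uri) {
        try {
            FileSystems.newFileSystem(URI.create(uri), Collections.<String, Object>emptyMap()).close();
            return null;
        } catch (ProviderNotFoundException e) {
            // No provider registered for this scheme: the failure reported above.
            return e.getMessage();
        } catch (IOException | RuntimeException e) {
            // A provider exists but could not open the filesystem (or rejected the URI).
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Placeholder URI; substitute the real input path when testing.
        String uri = args.length > 0 ? args[0] : "s3://example-bucket/example.bam";
        String scheme = URI.create(uri).getScheme();
        System.out.println("provider installed for " + scheme + ": " + hasProvider(scheme));
        System.out.println("newFileSystem result: " + tryOpen(uri));
    }
}
```

If this reports no provider for "s3" when run with the same classpath as the executors, that would be consistent with the output path working (it presumably goes through the Hadoop FileSystem API) while the input path does not.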
If I stage the input BAM file on HDFS first, the problem goes away (the S3 output path works fine; only the S3 input path causes problems):
hadoop fs -cp s3://1000genomes/phase3/data/HG00154/alignment/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam /adam/HG00154.bam
adam-submit transformAlignments /adam/HG00154.bam s3://<mybucket>/1000genomes/adam/bam=HG00154/
Do you have any pointers or fixes to get transformAlignments to support S3 input BAM files?