Commit 1ec12d1

Merge pull request #62 from Igosuki/avro
Add basic AVRO files (translated copies of the parquet testing files to avro)
2 parents 2c29a73 + a150499 commit 1ec12d1

18 files changed, +37 -0 lines changed

data/avro/README.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
This directory contains Avro files corresponding to the Parquet testing files at https://github.com/apache/parquet-testing/blob/master/data/

These files were created with Spark, using the commands from https://gist.github.com/Igosuki/324b011f40185269d3fc552350d21744

Roughly:
```scala
import com.github.mrpowers.spark.daria.sql.DariaWriters
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.commons.io.FilenameUtils

// `sc` and `spark` are the context and session provided by spark-shell.
// The input glob and output directory are passed in as Spark conf entries.
val fileGlobs = sc.getConf.get("spark.driver.globs")
val dest = sc.getConf.get("spark.driver.out")

// Resolve the glob to the list of Parquet files to convert.
val fs = FileSystem.get(new Configuration(true))
val status = fs.globStatus(new Path(fileGlobs))
for (fileStatus <- status) {
  val path = fileStatus.getPath().toString()
  try {
    val dfin = spark.read.format("parquet").load(path)
    // Only the last extension is removed, so foo.snappy.parquet becomes foo.snappy.avro.
    val fileName = fileStatus.getPath().getName()
    val fileNameWithOutExt = FilenameUtils.removeExtension(fileName)
    val destination = s"${dest}/${fileNameWithOutExt}.avro"
    println(s"Converting $path to avro at $destination")
    // Write a single Avro file (not a directory of part files);
    // format "avro" needs the spark-avro module on the classpath.
    DariaWriters.writeSingleFile(
      df = dfin,
      format = "avro",
      sc = spark.sparkContext,
      tmpFolder = s"/tmp/dw/${fileName}",
      filename = destination
    )
  } catch {
    // Report the failure and continue with the next file.
    case e: Throwable => println(s"failed to convert $path : ${e.getMessage}")
  }
}
```
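
The destination naming in the loop above explains why a file such as `data/avro/datapage_v2.snappy.avro` keeps `.snappy` in its name: only the final extension is dropped before `.avro` is appended. The following is a minimal sketch of that naming step without any Spark or Commons IO dependency. `stripLastExtension` and `avroDestination` are hypothetical helpers introduced here for illustration, and `stripLastExtension` only approximates `FilenameUtils.removeExtension` for plain file names with no directory component.

```scala
// Hypothetical stand-in for FilenameUtils.removeExtension, valid for plain
// file names only: drops everything from the last '.' onward.
def stripLastExtension(fileName: String): String = {
  val dot = fileName.lastIndexOf('.')
  if (dot < 0) fileName else fileName.substring(0, dot)
}

// Hypothetical helper mirroring the destination string built in the loop.
def avroDestination(dest: String, fileName: String): String =
  s"$dest/${stripLastExtension(fileName)}.avro"

// The output directory used here is an assumption for the example.
println(avroDestination("data/avro", "datapage_v2.snappy.parquet"))
// prints "data/avro/datapage_v2.snappy.avro"
```

Keeping the intermediate extension means each Avro file name maps back to its Parquet source by swapping only the final suffix.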

data/avro/alltypes_dictionary.avro

765 Bytes
Binary file not shown.

data/avro/alltypes_plain.avro

868 Bytes
Binary file not shown.
766 Bytes
Binary file not shown.

data/avro/binary.avro

236 Bytes
Binary file not shown.

data/avro/datapage_v2.snappy.avro

456 Bytes
Binary file not shown.
213 Bytes
Binary file not shown.
436 Bytes
Binary file not shown.
433 Bytes
Binary file not shown.

data/avro/int32_decimal.avro

392 Bytes
Binary file not shown.
