Add Avro Support #4886

@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Avro is a widely used binary, row-oriented data encoding. It is similar to protobuf and has seen wide adoption in the data ecosystem, especially for streaming workloads.

Describe the solution you'd like

A new arrow_avro crate will provide vectorised support for reading and writing Avro data. The APIs should be designed to work with the different framing mechanisms for Avro-encoded data, including single-object encoding, object container files, and message framing, even if first-class support is not provided for all of them.
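
To illustrate the idea, here is a minimal sketch of a framing-agnostic decoder that turns row-oriented Avro bytes into column buffers. It is not the proposed arrow_avro API: the names `Columns`, `read_long`, `read_string`, `decode_records` and `strip_single_object_header` are hypothetical, the schema `{id: long, name: string}` is assumed for the example, and plain `Vec`s stand in for Arrow array builders.

```rust
/// Hypothetical column buffers for an assumed schema `{id: long, name: string}`.
#[derive(Default, Debug)]
struct Columns {
    ids: Vec<i64>,
    names: Vec<String>,
}

/// Reads one Avro `long` (zigzag, variable-length) from the front of `buf`.
fn read_long(buf: &mut &[u8]) -> Option<i64> {
    let mut value: u64 = 0;
    let mut shift = 0;
    loop {
        if shift >= 64 {
            return None; // more than 10 bytes: malformed varint
        }
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo the zigzag mapping (0, -1, 1, -2, ... were stored as 0, 1, 2, 3, ...).
    Some(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Reads one Avro `string`: a `long` byte length followed by UTF-8 bytes.
fn read_string(buf: &mut &[u8]) -> Option<String> {
    let len = read_long(buf)?;
    if len < 0 || len as u64 > buf.len() as u64 {
        return None;
    }
    let (bytes, rest) = buf.split_at(len as usize);
    *buf = rest;
    String::from_utf8(bytes.to_vec()).ok()
}

/// Decodes `count` back-to-back record bodies into the column buffers.
/// It only sees the raw Avro binary payload, so it does not depend on
/// which framing mechanism delivered the bytes.
fn decode_records(mut body: &[u8], count: usize, out: &mut Columns) -> Option<()> {
    for _ in 0..count {
        let id = read_long(&mut body)?;
        let name = read_string(&mut body)?;
        out.ids.push(id);
        out.names.push(name);
    }
    Some(())
}

/// Strips the single-object encoding header: the marker bytes C3 01
/// followed by an 8-byte schema fingerprint (not verified in this sketch).
fn strip_single_object_header(msg: &[u8]) -> Option<&[u8]> {
    match msg {
        [0xC3, 0x01, rest @ ..] if rest.len() >= 8 => Some(&rest[8..]),
        _ => None,
    }
}

fn main() {
    // Two hand-encoded records: (1, "a") and (-2, "bc").
    let body = [0x02, 0x02, b'a', 0x03, 0x04, b'b', b'c'];
    let mut cols = Columns::default();
    decode_records(&body, 2, &mut cols).expect("valid records");
    println!("{:?}", cols);
}
```

Under this split, each framing mechanism only needs a thin layer (such as `strip_single_object_header`) that hands payload bytes and a record count to the shared decoder.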

Describe alternatives you've considered

Additional context

DataFusion has some Avro support; however, it is based on the row-based apache_avro crate and is therefore likely far from optimal for producing columnar data.

FYI @Samrose-Ahmed @sarutak @devinjdangelo I intend to work on this, but any help with reviews / testing would be most welcome

Labels: enhancement