Add Unstruct functionality to flatten a nested struct  #23

@AdrianOlosutean

Background

Currently, there is no way to flatten a struct field at a given level of nesting.

Feature

When used inside df.nestedMapColumn(), the unstruct function should project the fields of a nested struct onto the same level as the struct's parent, replacing the struct itself.

Example

For a dataset of the following shape:

root
|-- id: long (nullable = true)
|-- my_array: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- a: long (nullable = true)
|    |    |-- b: string (nullable = true)
|    |    |-- c: struct (nullable = true)
|    |    |    |-- nestedField1: string (nullable = true)
|    |    |    |-- nestedField2: long (nullable = true)

Applying df.nestedMapColumn("my_array.c", "my_array", c => unstruct(c)) should result in:

root
|-- id: long (nullable = true)
|-- my_array: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- a: long (nullable = true)
|    |    |-- b: string (nullable = true)
|    |    |-- nestedField1: string (nullable = true)
|    |    |-- nestedField2: long (nullable = true)
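
For comparison, the requested result can be reproduced by hand with plain Spark, which may help pin down the expected semantics. The sketch below is not the proposed unstruct implementation. It assumes Spark 3.0 or later (for the `transform` higher-order function) and hard-codes the field names from the example. The case classes, `UnstructSketch`, and `flattenC` are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, transform}

// Hypothetical case classes mirroring the example schema.
case class Inner(nestedField1: String, nestedField2: Long)
case class Element(a: Long, b: String, c: Inner)
case class Record(id: Long, my_array: Seq[Element])

object UnstructSketch {
  // Rebuilds every element of my_array so that the fields of `c`
  // sit next to `a` and `b`, and `c` itself is dropped.
  def flattenC(df: DataFrame): DataFrame =
    df.withColumn(
      "my_array",
      transform(col("my_array"), elem =>
        struct(
          elem.getField("a").as("a"),
          elem.getField("b").as("b"),
          elem.getField("c").getField("nestedField1").as("nestedField1"),
          elem.getField("c").getField("nestedField2").as("nestedField2")
        )
      )
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("unstruct-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Record(1L, Seq(Element(10L, "x", Inner("n1", 100L))))).toDF()

    // Field names and nesting match the expected schema above; the
    // nullability flags may differ because the data comes from case classes.
    flattenC(df).printSchema()

    spark.stop()
  }
}
```

The point of the requested unstruct function is to avoid this kind of hand-written projection: it would discover the fields of `c` from the schema instead of listing them explicitly.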

Labels: enhancement (New feature or request)