
Incorrect conversion of pyarrow interval value to datafusion literal #665

Closed
@timsaucer


Describe the bug
When creating a literal interval value from a pyarrow scalar, the month, day, and nanosecond values are not assigned to the correct fields of the resulting literal. The following minimal example reproduces the problem. It appears to be limited to datafusion-python and does not affect the Rust implementation.

To Reproduce

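import pyarrow as pa
from datafusion import SessionContext, lit

# Assumption: the original snippet does not show how df was created;
# any DataFrame with at least one row will do.
ctx = SessionContext()
df = ctx.from_pydict({"a": [1]})

print("Setting 1 month interval:")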
pa_interval = pa.scalar((1, 0, 0), type=pa.month_day_nano_interval())
print("pa_interval:", pa_interval)

lit_interval = lit(pa_interval)
print("lit_interval:", lit_interval)

df.select(lit_interval).limit(1).show()

print("Setting 1 day interval:")
pa_interval = pa.scalar((0, 1, 0), type=pa.month_day_nano_interval())
print("pa_interval:", pa_interval)

lit_interval = lit(pa_interval)
print("lit_interval:", lit_interval)

df.select(lit_interval).limit(1).show()

print("Setting 1 nanosecond interval:")
pa_interval = pa.scalar((0, 0, 1), type=pa.month_day_nano_interval())
print("pa_interval:", pa_interval)

lit_interval = lit(pa_interval)
print("lit_interval:", lit_interval)

df.select(lit_interval).limit(1).show()

Produces the following result:

Setting 1 month interval:
pa_interval: MonthDayNano(months=1, days=0, nanoseconds=0)
lit_interval: Expr(IntervalMonthDayNano("1"))
DataFrame()
+-------------------------------------------------------+
| IntervalMonthDayNano("1")                             |
+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000001 secs |
+-------------------------------------------------------+
Setting 1 day interval:
pa_interval: MonthDayNano(months=0, days=1, nanoseconds=0)
lit_interval: Expr(IntervalMonthDayNano("4294967296"))
DataFrame()
+-------------------------------------------------------+
| IntervalMonthDayNano("4294967296")                    |
+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 4.294967296 secs |
+-------------------------------------------------------+
Setting 1 nanosecond interval:
pa_interval: MonthDayNano(months=0, days=0, nanoseconds=1)
lit_interval: Expr(IntervalMonthDayNano("18446744073709551616"))
DataFrame()
+-------------------------------------------------------+
| IntervalMonthDayNano("18446744073709551616")          |
+-------------------------------------------------------+
| 0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
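The three literal values above (1, 2^32, and 2^64) suggest a field-order mismatch when the three components are packed into a single 128-bit integer. The sketch below is one reading of that output, not a confirmed root cause: it assumes the conversion places months in the lowest 32 bits while the displayed result is decoded with months in the highest 32 bits. The function names are hypothetical, and only non-negative values are handled.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def pack_fields_low_to_high(months, days, nanos):
    """Pack with months in bits 0-31, days in bits 32-63, nanoseconds in bits 64-127.

    Assumed layout, inferred from the literal values printed in this issue.
    """
    return (months & MASK32) | ((days & MASK32) << 32) | ((nanos & MASK64) << 64)


def unpack_months_high(value):
    """Decode with months in bits 96-127, days in bits 64-95, nanoseconds in bits 0-63.

    Assumed layout, inferred from how the DataFrame output renders the literals.
    """
    months = (value >> 96) & MASK32
    days = (value >> 64) & MASK32
    nanos = value & MASK64
    return months, days, nanos


for fields in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    packed = pack_fields_low_to_high(*fields)
    # Prints the input fields, the packed integer, and the fields as decoded
    # under the other layout.
    print(fields, packed, unpack_months_high(packed))
```

Under these assumptions, 1 month packs to 1 and decodes as 1 nanosecond, 1 day packs to 4294967296 and decodes as 4294967296 nanoseconds, and 1 nanosecond packs to 18446744073709551616 and decodes as 1 day, which matches the three tables above.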

Expected behavior
An interval of 1 month set in pyarrow should appear as 1 month in the datafusion DataFrame, and likewise for the day and nanosecond values.


Labels: bug (Something isn't working)
