Default window frames do not match PostgreSQL #688

Closed
@timsaucer

Describe the bug
When no window frame is specified in the Python implementation, we default to unbounded preceding to current row. If we are to follow the PostgreSQL implementation, we should use that default only when order_by is specified, and otherwise default to unbounded preceding to unbounded following.

To Reproduce

from datafusion import SessionContext, WindowFrame, col, lit, functions as F
import pyarrow as pa

ctx = SessionContext()

# create a RecordBatch and a new DataFrame from it
batch = pa.RecordBatch.from_arrays(
    [pa.array([1.0, 10.0, 20.0])],
    names=["a"],
)

df = ctx.create_dataframe([[batch]])

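# explicit frame: rows from unbounded preceding to unbounded following
window_frame = WindowFrame("rows", None, None)

# compare the default frame (no_frame) with the explicit frame (with_frame)
df = df.select(
    col("a"),
    F.window("avg", [col("a")]).alias("no_frame"),
    F.window("avg", [col("a")], window_frame=window_frame).alias("with_frame"),
)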

df.show()

Produces:

DataFrame()
+------+--------------------+--------------------+
| a    | no_frame           | with_frame         |
+------+--------------------+--------------------+
| 1.0  | 1.0                | 10.333333333333334 |
| 10.0 | 5.5                | 10.333333333333334 |
| 20.0 | 10.333333333333334 | 10.333333333333334 |
+------+--------------------+--------------------+
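
The two columns in the table above can be reproduced with plain arithmetic, which makes the difference between the frames easier to see: `no_frame` is a running average (unbounded preceding to current row), while `with_frame` averages the whole partition. This is a minimal pure-Python sketch that does not use DataFusion; the helper name `frame_avg` is hypothetical.

```python
values = [1.0, 10.0, 20.0]


def frame_avg(values, i, end_at_current_row):
    """Average over the frame for row i; the frame always starts at the first row."""
    # The frame ends at row i (inclusive) or at the last row of the partition.
    stop = i + 1 if end_at_current_row else len(values)
    frame = values[:stop]
    return sum(frame) / len(frame)


# unbounded preceding to current row (the current default)
no_frame = [frame_avg(values, i, True) for i in range(len(values))]
# unbounded preceding to unbounded following (the explicit frame)
with_frame = [frame_avg(values, i, False) for i in range(len(values))]

print(no_frame)
print(with_frame)
```

The first list grows row by row (1.0, then 11/2, then 31/3), while the second is 31/3 for every row.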

Expected behavior
When order_by is not specified, the frame should default to unbounded preceding to unbounded following.
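
The rule being asked for could be sketched roughly as follows. This is only an illustration of the selection logic, not the datafusion-python code: the function name `default_frame_bounds` is hypothetical, and the bound encoding (`None` for unbounded, `0` for current row) is an assumption of this sketch, chosen to mirror the `None` values passed to `WindowFrame` in the reproduction.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding for this sketch: None means "unbounded", 0 means "current row".
Bounds = Tuple[Optional[int], Optional[int]]


def default_frame_bounds(order_by: Optional[Sequence[str]]) -> Bounds:
    """Pick (start, end) frame bounds when the caller gave no window frame."""
    if order_by:
        # With an ordering: unbounded preceding to current row.
        return (None, 0)
    # Without an ordering: unbounded preceding to unbounded following.
    return (None, None)


print(default_frame_bounds(["a"]))
print(default_frame_bounds(None))
```

An empty `order_by` list is treated the same as `None` here, so both fall back to the whole-partition frame.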

Additional context
The offending line of code appears to be here:

https://github.com/apache/datafusion-python/blob/main/src/functions.rs#L230

Labels: bug (Something isn't working)