Describe the bug
When no window frame is specified in the Python implementation, we default to unbounded preceding to current row. If we are to follow the PostgreSQL implementation, we should use this default only when order_by is specified, and otherwise default to unbounded preceding to unbounded following.
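A minimal Python sketch of the proposed defaulting rule. The helper resolve_default_frame and the (start, end) tuple encoding are hypothetical and are not part of the datafusion-python API. Frame units are also left out: PostgreSQL's actual default with ORDER BY is RANGE-based, so rows that tie with the current row on the sort key fall inside the frame as well.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical frame representation for illustration only:
# (start, end), where None means "unbounded" in that direction
# and 0 means "current row".
Bounds = Tuple[Optional[int], Optional[int]]

UNBOUNDED_PRECEDING_TO_CURRENT_ROW: Bounds = (None, 0)
UNBOUNDED_PRECEDING_TO_UNBOUNDED_FOLLOWING: Bounds = (None, None)


def resolve_default_frame(order_by: Optional[Sequence[str]] = None) -> Bounds:
    """Return the frame to use when the caller did not pass window_frame."""
    if order_by:
        # An ordering exists: aggregate from the partition start up to the current row.
        return UNBOUNDED_PRECEDING_TO_CURRENT_ROW
    # No ordering: every row sees the whole partition.
    return UNBOUNDED_PRECEDING_TO_UNBOUNDED_FOLLOWING


print(resolve_default_frame())        # (None, None)
print(resolve_default_frame(["a"]))   # (None, 0)
```

An empty order_by list is treated the same as None here, since an empty ordering imposes no order on the partition.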
To Reproduce
from datafusion import SessionContext, WindowFrame, col, functions as F
import pyarrow as pa

ctx = SessionContext()

# create a RecordBatch and a new DataFrame from it
batch = pa.RecordBatch.from_arrays(
    [pa.array([1.0, 10.0, 20.0])],
    names=["a"],
)
df = ctx.create_dataframe([[batch]])

# explicit frame: unbounded preceding to unbounded following
window_frame = WindowFrame("rows", None, None)

df = df.select(
    col("a"),
    # no window_frame and no order_by: falls back to the default frame
    F.window("avg", [col("a")]).alias("no_frame"),
    # same aggregate with the explicit whole-partition frame
    F.window("avg", [col("a")], window_frame=window_frame).alias("with_frame"),
)
df.show()
Produces:
DataFrame()
+------+--------------------+--------------------+
| a    | no_frame           | with_frame         |
+------+--------------------+--------------------+
| 1.0  | 1.0                | 10.333333333333334 |
| 10.0 | 5.5                | 10.333333333333334 |
| 20.0 | 10.333333333333334 | 10.333333333333334 |
+------+--------------------+--------------------+
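The values in the table can be reproduced by hand. This confirms that no_frame is a running average over the rows in input order, while with_frame averages the whole partition. A plain-Python sketch with no DataFusion involved:

```python
from itertools import accumulate

a = [1.0, 10.0, 20.0]

# no_frame: unbounded preceding to current row, taken in input order,
# i.e. a running mean over the rows seen so far.
running_sums = list(accumulate(a))
no_frame = [s / (i + 1) for i, s in enumerate(running_sums)]

# with_frame: unbounded preceding to unbounded following,
# i.e. the mean of the whole partition repeated on every row.
with_frame = [sum(a) / len(a)] * len(a)

print(no_frame)    # [1.0, 5.5, 10.333333333333334]
print(with_frame)  # [10.333333333333334, 10.333333333333334, 10.333333333333334]
```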
Expected behavior
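When order_by is not specified, default to unbounded preceding to unbounded following. In the example above, the no_frame column should then match with_frame (10.333333333333334 on every row).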
Additional context
The offending line of code appears to be here:
https://github.com/apache/datafusion-python/blob/main/src/functions.rs#L230