Closed
Description
Is your feature request related to a problem or challenge?
DataFusion supports this type of syntax, such as count(a, b)
and count(distinct a, b)
.
However, its behavior may not be well-defined. A counterintuitive behavior is that count(distinct a, b)
returns more rows than count(a, b)
.
DataFusion CLI v39.0.0
> create table t(a int, b int) as values(1, NULL), (NULL, 10), (NULL, NULL);
> select count(distinct a,b) from t;
+-------------------------+
| count(DISTINCT t.a,t.b) |
+-------------------------+
| 1 |
+-------------------------+
> select count(a,b) from t;
+----------------+
| count(t.a,t.b) |
+----------------+
| 0 |
+----------------+
Additionally, PostgreSQL does not support this type of syntax. MySQL only supports count(distinct a, b)
, but its result is different from DataFusion.
Server version: 8.3.0 Homebrew
mysql> create table t(a int, b int);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t values(1, NULL), (NULL, 10), (NULL, NULL);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select count(distinct a,b) from t;
+---------------------+
| count(distinct a,b) |
+---------------------+
| 0 |
+---------------------+
1 row in set (0.00 sec)
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response