Skip to content

Review the behavior of count with multiple arguments #11303

Closed
@jonahgao

Description

@jonahgao

Is your feature request related to a problem or challenge?

DataFusion supports this type of syntax, such as count(a, b) and count(distinct a, b).
However, its behavior may not be well-defined. A counterintuitive behavior is that count(distinct a, b) returns more rows than count(a, b).

DataFusion CLI v39.0.0
> create table t(a int, b int) as values(1, NULL), (NULL, 10), (NULL, NULL);

> select count(distinct a,b) from t;
+-------------------------+
| count(DISTINCT t.a,t.b) |
+-------------------------+
| 1                       |
+-------------------------+

> select count(a,b) from t;
+----------------+
| count(t.a,t.b) |
+----------------+
| 0              |
+----------------+

Additionally, PostgreSQL does not support this type of syntax. MySQL only supports count(distinct a, b), but its result is different from DataFusion.

Server version: 8.3.0 Homebrew

mysql> create table t(a int, b int);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t  values(1, NULL), (NULL, 10), (NULL, NULL);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(distinct a,b) from t;
+---------------------+
| count(distinct a,b) |
+---------------------+
|                   0 |
+---------------------+
1 row in set (0.00 sec)

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions