IDataView Cleanup: Remove lazy parameter from get row count #1531

Closed
@TomFinley

Description

How do you tell the number of rows in a data view?

There is this very interesting method here.

long? GetRowCount(bool lazy = true);

The semantics of this are somewhat odd (though logical, in their way), and boil down to this: under the default value of lazy=true, return the row count only if doing so is essentially an O(1) operation, and return null otherwise. But what if this returns null? Then you have the lazy=false operation! This is a hint that the data view ought to expend more effort, but only if doing so entails less work than just iterating over every row and counting directly.

Indeed, this is what this utility function does:

public static long ComputeRowCount(IDataView view)

First it asks (with lazy=false!) for the row count, and failing that will actually open a cursor (with no columns active) and directly count the rows.
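To make that fallback concrete, here is a minimal sketch of the ask-then-count pattern. The types IDataViewSketch, IRowCursorSketch, and UnknownCountView are hypothetical stand-ins written for this illustration, not the real ML.NET interfaces, and the actual utility may differ in its details.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, pared-down stand-ins for the real interfaces; illustration only.
public interface IRowCursorSketch : IDisposable
{
    bool MoveNext();
}

public interface IDataViewSketch
{
    // null means "the count is not known without more work than we are allowed".
    long? GetRowCount(bool lazy = true);

    // The predicate says which columns must be materialized by the cursor.
    IRowCursorSketch GetRowCursor(Func<int, bool> columnNeeded);
}

public static class RowCountSketch
{
    public static long ComputeRowCount(IDataViewSketch view)
    {
        // Ask with lazy=false first: the view may spend some effort,
        // as long as it is cheaper than a full iteration.
        long? known = view.GetRowCount(lazy: false);
        if (known.HasValue)
            return known.Value;

        // Otherwise iterate with no columns active, so no values are fetched.
        long count = 0;
        using (IRowCursorSketch cursor = view.GetRowCursor(col => false))
        {
            while (cursor.MoveNext())
                count++;
        }
        return count;
    }
}

// Toy view over an in-memory list that never reports its count up front.
public sealed class UnknownCountView : IDataViewSketch
{
    private readonly IReadOnlyList<string> _rows;

    public UnknownCountView(IReadOnlyList<string> rows) { _rows = rows; }

    public long? GetRowCount(bool lazy = true) => null;

    public IRowCursorSketch GetRowCursor(Func<int, bool> columnNeeded)
        => new Cursor(_rows.Count);

    private sealed class Cursor : IRowCursorSketch
    {
        private readonly int _total;
        private int _position = -1;

        public Cursor(int total) { _total = total; }

        public bool MoveNext()
        {
            if (_position + 1 >= _total)
                return false;
            _position++;
            return true;
        }

        public void Dispose() { }
    }
}
```

Since the toy view always answers null, calling RowCountSketch.ComputeRowCount on it exercises the cursor path rather than the shortcut.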

This is all fairly logical. However, as a practical matter, no one ever bothered to implement a different code path for lazy=false, as far as I am aware. This is not to say they couldn't have -- you might imagine some text loader that, without trying to parse anything, merely counts the number of newline characters in a file, which would be much faster than an iteration -- but again, as a practical matter, no one did.
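For illustration only, the imagined newline-counting path might look roughly like the sketch below. NewlineCountSketch is a hypothetical helper, not something in the codebase. It assumes one row per line, no header line, no newlines embedded inside quoted fields, a trailing newline after the last row, and an ASCII-compatible encoding such as UTF-8.

```csharp
using System;
using System.IO;

// Hypothetical helper; nothing like this exists in the loader being discussed.
public static class NewlineCountSketch
{
    // Counts '\n' bytes in a stream without decoding or parsing any fields.
    public static long CountNewlines(Stream stream)
    {
        var buffer = new byte[64 * 1024];
        long count = 0;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == (byte)'\n')
                    count++;
            }
        }
        return count;
    }
}
```

Even this cheap path still reads the whole file, so it is O(size of file) rather than O(1), which is exactly why it would only have made sense under lazy=false.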

This suggests removing this lazy parameter to simplify the interface. It would still have the same semantics as today's default (lazy=true), just without all the complication of explaining lazy. (Though we'd still need to make clear to people, in the implementation notes, that they should be lazy.)
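A rough sketch of what the simplified shape could look like follows. ISimplifiedDataView, ArrayBackedViewSketch, and FilteredViewSketch are hypothetical names used only for this illustration, and the real interface has other members that are omitted here.

```csharp
using System;

// Hypothetical sketch of the method after the change; not the actual declaration.
public interface ISimplifiedDataView
{
    // Returns the row count only if it is cheap (roughly O(1)) to obtain,
    // and null otherwise. Implementations should not scan the data to answer.
    long? GetRowCount();
}

// A view whose size is known up front can answer immediately.
public sealed class ArrayBackedViewSketch : ISimplifiedDataView
{
    private readonly float[] _values;

    public ArrayBackedViewSketch(float[] values) { _values = values; }

    public long? GetRowCount() => _values.Length;
}

// A filtering view cannot know how many rows survive without a scan,
// so it stays lazy and reports null.
public sealed class FilteredViewSketch : ISimplifiedDataView
{
    public ISimplifiedDataView Source { get; }

    public FilteredViewSketch(ISimplifiedDataView source) { Source = source; }

    public long? GetRowCount() => null;
}
```

Callers that truly need a number would still fall back to iterating and counting when the answer is null.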

/cc @Zruty0 @shauheen @terrajobst
