Column name validation should run only once: early during initialization #849

Closed
@sebastian-peter

Description

This came up originally in #807, which is only a quick fix for part 1 of the problem:

Ok, I just realized we already have a validation in Factory#validateParameters, which is called by Factory#get(D), which in turn is called every time an entity is parsed from some source. The problem with IdCoordinateSources is that these validations run only after other steps, such as duplicate removal, which already access the data and fail if a column is missing.

Looking at Factory.get() as a whole, I'd describe two main issues:

  1. Validation is sometimes called too late, as in the case of IdCoordinateSource.
  2. Validation is executed for every single data point parsed via Factory.get(), which is pretty costly.

So I think there's a way to fix both at the same time: Validate column names once during initialization (and early enough).
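
A minimal sketch of that idea, assuming a source that knows its column names up front. The class `ValidatedSource`, its constructor parameters and `validateColumns` are hypothetical names made up for illustration; they are not the existing Factory or IdCoordinateSource API. The point is only the ordering: the column check runs once in the constructor, before duplicate removal or per-row parsing touch the data.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only, not the project's actual classes.
final class ValidatedSource<T> {
  private final List<Map<String, String>> rows;
  private final Function<Map<String, String>, T> rowParser;

  ValidatedSource(
      Set<String> actualColumns,
      Set<String> requiredColumns,
      List<Map<String, String>> rows,
      Function<Map<String, String>, T> rowParser) {
    // Validate once, during initialization, before any other data access.
    validateColumns(actualColumns, requiredColumns);
    this.rows = rows;
    this.rowParser = rowParser;
  }

  // Fails fast if any required column is absent from the source.
  static void validateColumns(Set<String> actual, Set<String> required) {
    Set<String> missing = new LinkedHashSet<>(required);
    missing.removeAll(actual);
    if (!missing.isEmpty()) {
      throw new IllegalArgumentException("Missing required columns: " + missing);
    }
  }

  // Duplicate removal and parsing can now assume the columns exist;
  // no per-row column validation is repeated here.
  List<T> getAll() {
    return rows.stream().distinct().map(rowParser).collect(Collectors.toList());
  }
}
```

With this shape, a source with a missing column fails at construction with a message naming the column, rather than later inside duplicate removal, and parsing many rows does not repeat the check.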

Originally posted by @sebastian-peter in #807 (comment)

Metadata

Labels

bug (Something isn't working), code quality (Code readability or structure is improved)
