Closed
Description
This came up originally in #807, which is only a quick fix for part 1 of the problem:
Ok, I just realized we already have a validation in `Factory#validateParameters`, which is called by `Factory#get(D)`, which in turn is called every time an entity is parsed from some source. The problem with `IdCoordinateSource`s is that these validations only run after other steps, such as duplicate removal, have already accessed the data and failed because some column is missing.
Looking at `Factory.get()` as a whole, I'd describe two main issues:
- validation is sometimes, as in the case of `IdCoordinateSource`, called too late
- validation is executed for every single data point parsed via `Factory.get()`, which is pretty costly
So I think there's a way to fix both at the same time: Validate column names once during initialization (and early enough).
Originally posted by @sebastian-peter in #807 (comment)
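
A minimal sketch of the proposed direction, not the project's actual code: the column names are checked once, in the constructor, before any later step such as duplicate removal reads the data. The class `ColumnCheckedSource`, its `REQUIRED_COLUMNS` set (`id`, `lat`, `lon`) and the method `distinctIds` are hypothetical names chosen for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical source that validates its column names once, at construction. */
final class ColumnCheckedSource {
    // Assumed column names, for illustration only.
    static final Set<String> REQUIRED_COLUMNS = Set.of("id", "lat", "lon");

    private final List<Map<String, String>> rows;

    ColumnCheckedSource(Set<String> headline, List<Map<String, String>> rows) {
        // Validate once and early: nothing has touched the rows yet.
        Set<String> missing = new TreeSet<>(REQUIRED_COLUMNS);
        missing.removeAll(headline);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing required columns: " + missing);
        }
        this.rows = rows;
    }

    /** Duplicate removal by id; safe because the "id" column was checked above. */
    Set<String> distinctIds() {
        Set<String> ids = new LinkedHashSet<>();
        for (Map<String, String> row : rows) {
            ids.add(row.get("id"));
        }
        return ids;
    }
}
```

With a check like this in place, per-row parsing would no longer need to repeat the column validation, and a missing column surfaces as one clear error instead of a failure inside a later processing step.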