This repository was archived by the owner on Jun 2, 2025. It is now read-only.

ml_id is nan when using pvnet_site_datapipe and leads to unnecessary fills #321

@AUdaltsova

Describe the issue

In ocf_datapipes.load.pv._load_pv_metadata, when ml_id is absent from the metadata file, the column is added and filled with np.nan. When using pv_site with just one site, that value is saved as np.nan and is then imputed as 0 during training each time an observation is loaded.

This does not affect results, since ml_id is not used in this case anyway, but it leads to many small unnecessary operations.
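
The behaviour described above can be illustrated with a minimal sketch. This is not the ocf_datapipes code: the helper names `add_missing_ml_id` and `fill_nans_with_zeros`, the metadata columns, and the sample keys are all hypothetical stand-ins, assumed only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical single-site metadata that has no ml_id column.
metadata = pd.DataFrame(
    {"latitude": [52.1], "longitude": [-1.3], "capacity_kwp": [5000.0]},
    index=pd.Index([1001], name="system_id"),
)


def add_missing_ml_id(df: pd.DataFrame) -> pd.DataFrame:
    """Mimic the reported behaviour: a missing ml_id column is created as NaN."""
    df = df.copy()
    if "ml_id" not in df.columns:
        df["ml_id"] = np.nan
    return df


def fill_nans_with_zeros(sample: dict) -> tuple[dict, set]:
    """Hypothetical stand-in for the NaN fill applied when a sample is loaded.

    Returns the filled sample and the set of keys that needed filling.
    """
    filled_keys = set()
    filled_sample = {}
    for key, value in sample.items():
        arr = np.asarray(value, dtype=float)
        mask = np.isnan(arr)
        if mask.any():
            arr = np.where(mask, 0.0, arr)
            filled_keys.add(key)
        filled_sample[key] = arr
    return filled_sample, filled_keys


metadata = add_missing_ml_id(metadata)
sample = {"pv_ml_id": metadata["ml_id"].to_numpy(), "pv": np.array([0.2, 0.4])}

# The same fill runs again for every sample, because the stored value stays NaN.
filled_sample, filled_keys = fill_nans_with_zeros(sample)
print(f"Filled NaNs with zeros - {filled_keys}")
```

In this sketch only the ml_id entry triggers the fill, which matches the repeated log message shown under To Reproduce.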

To Reproduce

Visible when training with batches with pv_site data. Example log:

Epoch 6:  89%|████████▉ | 267/300 [19:05<02:21,  0.23it/s, v_num=only]
[2024-05-31 13:24:46,228][pvnet_site_datapipe][INFO] - Filtering out samples with no data
[2024-05-31 13:24:46,340][ocf_datapipes.training.common][INFO] - Filled NaNs with zeros - {'BatchKey.pv_ml_id'}
[2024-05-31 13:24:46,426][pvnet_site_datapipe][INFO] - Filtering out samples with no data
[2024-05-31 13:24:46,537][ocf_datapipes.training.common][INFO] - Filled NaNs with zeros - {'BatchKey.pv_ml_id'}
[2024-05-31 13:24:46,616][pvnet_site_datapipe][INFO] - Filtering out samples with no data

Suggested fix

When creating the ml_id column (here for pvnet and here for windnet), fill it with 0 instead of np.nan.
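
A minimal sketch of the suggested change, assuming the metadata is held in a pandas DataFrame. The function name `ensure_ml_id` and the example columns are hypothetical and are not taken from ocf_datapipes; the real change would go wherever the ml_id column is currently created.

```python
import pandas as pd


def ensure_ml_id(metadata: pd.DataFrame, fill_value: int = 0) -> pd.DataFrame:
    """Add an ml_id column filled with a constant when the file does not provide one.

    Hypothetical helper: using 0 instead of np.nan means no NaN is stored,
    so nothing needs to be imputed later when samples are loaded.
    """
    metadata = metadata.copy()
    if "ml_id" not in metadata.columns:
        metadata["ml_id"] = fill_value
    return metadata


# Hypothetical single-site metadata without an ml_id column.
meta = pd.DataFrame(
    {"capacity_kwp": [5000.0]},
    index=pd.Index([1001], name="system_id"),
)
fixed = ensure_ml_id(meta)
print(fixed["ml_id"].tolist())
```

Note that a constant fill gives every site the same ml_id, so this only suits cases like the single-site one described here, where ml_id is not used.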
