Skip to content

As a user I want a clearer user-facing warning #714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
HansVRP opened this issue Jan 27, 2025 · 15 comments
Closed

As a user I want a clearer user-facing warning #714

HansVRP opened this issue Jan 27, 2025 · 15 comments

Comments

@HansVRP
Copy link
Contributor

HansVRP commented Jan 27, 2025

When trying to create a database, which already exist, we currently don't give a clear cut error to our users.

Image

@HansVRP
Copy link
Contributor Author

HansVRP commented Jan 27, 2025

thoughts @VincentVerelst @soxofaan ?

@VincentVerelst
Copy link
Collaborator

Hi @HansVRP , when trying to create a new job database, the pre-defined classes in the Python Client (like CsvJobDatabase) have a convenience method called initialize_from_df. See: https://github.com/Open-EO/openeo-python-client/blob/master/openeo/extra/job_management/__init__.py#L811 for example.

This method will throw a FileExistsError in case your database already exists, and you have set on_exists='error, which is the default.

However, is the error in your screenshot above related to the database existing or not?

@HansVRP
Copy link
Contributor Author

HansVRP commented Jan 27, 2025

its a snapshot of a user who tried to rerun a preexisting notebook,

I think the situation is that the csv already exists and that the jobs were completed, and we try to run the job manager again of the existing csv file

@VincentVerelst
Copy link
Collaborator

Does the user have the full logs? Locally, I've managed to restart jobs from existing CSV files in the past

@soxofaan
Copy link
Member

do you have a copy of that CSV on which this fails?

@HansVRP
Copy link
Contributor Author

HansVRP commented Jan 28, 2025

recreated the issue:

1) initiate a database and run all jobs to completion.

job_tracker = 'jobs.csv'
job_db = CsvJobDatabase(path=job_tracker)
job_db.initialize_from_df(job_df)

manager.run_jobs(start_job=start_job, job_db=job_db)

2) keep the database with fulfilled jobs and run the the job mnager again:

manager.run_jobs(start_job=start_job, job_db=job_db)

results in:

_ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_35328\1710827948.py in ?()
---> 15 import openeo
     16 from openeo.extra.job_management import MultiBackendJobManager, CsvJobDatabase
     17 
     18 # Authenticate and add the backend

c:\Users\VROMPAYH\AppData\Local\anaconda3\envs\eotdl_openeo_env\lib\site-packages\openeo\extra\job_management\__init__.py in ?(self, df, start_job, job_db, **kwargs)
    491 
    492         # TODO: support user-provided `stats`
    493         stats = collections.defaultdict(int)
    494 
--> 495         while sum(job_db.count_by_status(statuses=["not_started", "created", "queued", "running"]).values()) > 0:
    496             self._job_update_loop(job_db=job_db, start_job=start_job, stats=stats)
    497             stats["run_jobs loop"] += 1
    498 

c:\Users\VROMPAYH\AppData\Local\anaconda3\envs\eotdl_openeo_env\lib\site-packages\openeo\extra\job_management\__init__.py in ?(self, statuses)
    835     def count_by_status(self, statuses: Iterable[str] = ()) -> dict:
--> 836         status_histogram = self.df.groupby("status").size().to_dict()
    837         statuses = set(statuses)
    838         if statuses:
    839             status_histogram = {k: v for k, v in status_histogram.items() if k in statuses}
...
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()._

jobs.csv

@HansVRP
Copy link
Contributor Author

HansVRP commented Jan 28, 2025

To summarize, it might be useful for users to create a more clear value error, or am I overlooking something?

@soxofaan
Copy link
Member

I agree that that error message is quite unhelpful, but at first sight it seems an internal pandas thing, it's a bit unclear at the moment to me how we could improve that.
And instead of attempting to improve the error message, we might actually fix the root problem.
But anyway, needs some more investigation

@soxofaan
Copy link
Member

ok I can actually reproduce much simpler with that CSV:

from openeo.extra.job_management import CsvJobDatabase
CsvJobDatabase(path='jobs.csv').read()

which fails with

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [6], line 1
----> 1 CsvJobDatabase(path=path).read()

File ~/src/openeo/openeo-python-client/openeo/extra/job_management/__init__.py:907, in CsvJobDatabase.read(self)
    904     import geopandas
    906     # `df.to_csv()` in `persist()` has encoded geometries as WKT, so we decode that here.
--> 907     df = geopandas.GeoDataFrame(df, geometry=geopandas.GeoSeries.from_wkt(df["geometry"]))
    908 return df

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/geopandas/geodataframe.py:188, in GeoDataFrame.__init__(self, data, geometry, crs, *args, **kwargs)
    180     if (
    181         hasattr(geometry, "crs")
    182         and geometry.crs
    183         and crs
    184         and not geometry.crs == crs
    185     ):
    186         raise ValueError(crs_mismatch_error)
--> 188     self.set_geometry(geometry, inplace=True, crs=crs)
    190 if geometry is None and crs:
    191     raise ValueError(
    192         "Assigning CRS to a GeoDataFrame without a geometry column is not "
    193         "supported. Supply geometry using the 'geometry=' keyword argument, "
    194         "or by providing a DataFrame with column name 'geometry'",
    195     )

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/geopandas/geodataframe.py:347, in GeoDataFrame.set_geometry(self, col, drop, inplace, crs)
    345 # Check that we are using a listlike of geometries
    346 level = _ensure_geometry(level, crs=crs)
--> 347 frame[geo_column_name] = level
    348 frame._geometry_column_name = geo_column_name
    349 if not inplace:

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/geopandas/geodataframe.py:1440, in GeoDataFrame.__setitem__(self, key, value)
   1438 try:
   1439     crs = getattr(self, "crs", None)
-> 1440     value = _ensure_geometry(value, crs=crs)
   1441 except TypeError:
   1442     warnings.warn("Geometry column does not contain geometry.")

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/geopandas/geodataframe.py:55, in _ensure_geometry(data, crs)
     52     if data.crs is None and crs is not None:
     53         # Avoids caching issues/crs sharing issues
     54         data = data.copy()
---> 55         data.crs = crs
     56     return data
     57 else:

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/pandas/core/generic.py:6313, in NDFrame.__setattr__(self, name, value)
   6311 try:
   6312     object.__getattribute__(self, name)
-> 6313     return object.__setattr__(self, name, value)
   6314 except AttributeError:
   6315     pass

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/geopandas/base.py:173, in GeoPandasBase.crs(self, value)
    170 @crs.setter
    171 def crs(self, value):
    172     """Sets the value of the crs"""
--> 173     self.geometry.values.crs = value

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/geopandas/array.py:339, in GeometryArray.crs(self, value)
    336 @crs.setter
    337 def crs(self, value):
    338     """Sets the value of the crs"""
--> 339     self._crs = None if not value else CRS.from_user_input(value)

File ~/src/openeo/openeo-python-client/venv/lib/python3.9/site-packages/pandas/core/generic.py:1577, in NDFrame.__nonzero__(self)
   1575 @final
   1576 def __nonzero__(self) -> NoReturn:
-> 1577     raise ValueError(
   1578         f"The truth value of a {type(self).__name__} is ambiguous. "
   1579         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

so it is failing in trying to read that file as geopandas dataframe
(not sure why those parts were missing from your stack trace dump)

@soxofaan
Copy link
Member

failing to read the CSV is apparently because of the presence of a column named "crs":

I can confirm that removing the "crs" column eliminates the reading problem

@soxofaan
Copy link
Member

soxofaan commented Jan 28, 2025

not sure how to address this properly

if I understand geopandas/geopandas#2944 correctly, the situation should be better with geopandas>0.13.2
my test environment has geopandas 0.12.1 at the moment FYI

what geopandas version are you using atm?

@HansVRP
Copy link
Contributor Author

HansVRP commented Jan 28, 2025

1.0.1

@soxofaan
Copy link
Member

soxofaan commented Jan 28, 2025

hmm ok , it seems this bug geopandas/geopandas#2942 was only fixed for this usage pattern:

geopandas.GeoDataFrame(
    {
        "crs": [1],
        "geometry": [shapely.geometry.Point(2, 3)]
    }
)

but it still exists (e.g. in geopandas 1.0.1) for the following (which we use in openeo python client):

geopandas.GeoDataFrame(
    {
        "crs": [1],
    },
    geometry=[shapely.geometry.Point(2, 3)]
)

@soxofaan
Copy link
Member

@soxofaan
Copy link
Member

added workaround with 0d295dc

soxofaan added a commit that referenced this issue Jan 28, 2025
Workaround for #714 isn't available on Python 3.8 due to dropped support
in geopandas==0.14
soxofaan added a commit that referenced this issue Jan 28, 2025
Workaround for #714 requires at least geopandas 0.14 but that version is
not available on Python 3.8 (#717)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants