Skip to content

load gzipped models #49

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 2, 2020
Merged

load gzipped models #49

merged 1 commit into from
Jun 2, 2020

Conversation

kba
Copy link
Contributor

@kba kba commented May 31, 2020

pyrnn models are generally distributed gzipped. If loading gzipped models is inefficient, I can also adapt OCR-D/ocrd_all#103 to gunzip models after download.

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merge came to fast for me. I suggest reverting.

@@ -131,7 +131,7 @@ def process(self):
Produce a new output file by serialising the resulting hierarchy.
"""
# from ocropus-rpred:
self.network = load_object(self.get_model(), verbose=1)
self.network = load_object(self.get_model(), zip=1, verbose=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. The default setting zip=0 already decompressed files if they are named *.gz. See here:

if zip==0 and fname.endswith(".gz"):

Ignoring the file name will now make parameter files (or workflow definitions) fail which don't use .gz suffix, for example because of an old workaround to #41

@finkf
Copy link
Contributor

finkf commented Jun 3, 2020

fine by me.

@kba
Copy link
Contributor Author

kba commented Jun 3, 2020

I was a bit too eager with the PR, this was not meant to be merged as such, I had a lot of trouble getting ocropy to find the models and this fixed it for me. Ocropy should gunzip even with zip==0 if model ends with .gz and should expand the filename with a .gz suffix when searching. But either there is a flaw in the logic somewhere (proper unit tests in ocropy would help) or in my setup.

@bertsky
Copy link
Collaborator

bertsky commented Jun 3, 2020

@kba In #41 I wrote this summary of what gets searched.

I agree we should test this, document this, and probably even generalise a little (all ocrolib search directories except OCROPUS_DATA are unrealistic in OCR-D context). Do we have a consensus/spec on where to search relative path names of model files yet? This seems important for deployment...

@kba kba deleted the load-gz branch June 10, 2020 12:16
@kba
Copy link
Contributor Author

kba commented Jun 10, 2020

Do we have a consensus/spec on where to search relative path names of model files yet?

You mean like https://ocr-d.de/en/models#ocropy--ocrd_cis ?

@bertsky
Copy link
Collaborator

bertsky commented Jun 10, 2020

Do we have a consensus/spec on where to search relative path names of model files yet?

You mean like https://ocr-d.de/en/models#ocropy--ocrd_cis ?

No, locally, like $VIRTUAL_ENV/lib/python3.6/site-packages/$MODULE or $VIRTUAL_ENV/lib/python3.6/site-packages/ocrd_models/processors or under a common environment variable. I mean, this is a problem for all processors that need model files. You usually cannot distribute them as part of package_data via PyPI or even in the same repo, because they are too big. But the package can expect some installer (presumably within ocrd_all) to download and copy it somewhere for it to find at runtime under a relative path.

@finkf
Copy link
Contributor

finkf commented Jun 10, 2020

I totally agree with @bertsky. But I guess this should be discussed in a separate issue. Maybe even on ocrd_core?

@bertsky
Copy link
Collaborator

bertsky commented Jun 10, 2020

But I guess this should be discussed in a separate issue.

Right. See OCR-D/spec#160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants