load gzipped models #49

kba · 2020-05-31T16:52:11Z

pyrnn models are generally distributed gzipped. If loading gzipped models is inefficient, I can also adapt OCR-D/ocrd_all#103 to gunzip models after download.

bertsky

This merge came too fast for me. I suggest reverting.

bertsky · 2020-06-02T20:58:18Z

ocrd_cis/ocropy/recognize.py

@@ -131,7 +131,7 @@ def process(self):
        Produce a new output file by serialising the resulting hierarchy.
        """
        # from ocropus-rpred:
-        self.network = load_object(self.get_model(), verbose=1)
+        self.network = load_object(self.get_model(), zip=1, verbose=1)


I don't think this is necessary. The default setting zip=0 already decompresses files if they are named *.gz. See here:

ocrd_cis/ocrd_cis/ocropy/ocrolib/common.py

Line 447 in b7bba57

if zip==0 and fname.endswith(".gz"):

Ignoring the file name will now break parameter files (or workflow definitions) that don't use the .gz suffix, for example because of an old workaround for #41.
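To make the difference concrete, here is a minimal sketch of the two behaviours. `load_pickled` is a hypothetical stand-in, not the actual `load_object` from ocrolib; it only mirrors the suffix check quoted above and assumes the model is a plain pickle.

```python
import gzip
import pickle


def load_pickled(fname, zip=0):
    """Hypothetical stand-in for load_object (assumption: model is a pickle).

    zip=0: decompress only if the file is named *.gz (suffix-based detection).
    zip>0: always treat the file as gzipped, whatever its name.
    """
    if zip == 0 and fname.endswith(".gz"):
        zip = 1  # same condition as the line quoted from common.py
    opener = gzip.open if zip > 0 else open
    with opener(fname, "rb") as stream:
        return pickle.load(stream)
```

With this sketch, `load_pickled("model.pyrnn")` and `load_pickled("model.pyrnn.gz")` both work with the default, while `load_pickled("model.pyrnn", zip=1)` raises an `OSError` on an uncompressed file because gzip decoding is forced.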

finkf · 2020-06-03T09:37:59Z

fine by me.

kba · 2020-06-03T12:40:44Z

I was a bit too eager with the PR; it was not meant to be merged as is. I had a lot of trouble getting ocropy to find the models, and this change fixed it for me. Ocropy should gunzip even with zip==0 if the model name ends with .gz, and it should also try the filename with a .gz suffix appended when searching. But there is a flaw either in that logic somewhere (proper unit tests in ocropy would help) or in my setup.
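A minimal sketch of the search behaviour I would expect, which could also serve as a starting point for a unit test. `find_model` and its `search_dirs` argument are hypothetical names for illustration; this is not the ocrolib implementation.

```python
import os


def find_model(name, search_dirs):
    """Return the first existing path for `name`, also trying `name + '.gz'`.

    Hypothetical helper: `search_dirs` is an ordered list of directories.
    """
    # Do not append a second .gz if the caller already passed one.
    suffixes = [""] if name.endswith(".gz") else ["", ".gz"]
    for directory in search_dirs:
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    raise FileNotFoundError(name)
```

The uncompressed name is tried first in each directory, so an existing plain model is preferred over a gzipped one in the same place.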

bertsky · 2020-06-03T23:22:23Z

@kba In #41 I wrote this summary of what gets searched.

I agree we should test this, document this, and probably even generalise a little (all ocrolib search directories except OCROPUS_DATA are unrealistic in OCR-D context). Do we have a consensus/spec on where to search relative path names of model files yet? This seems important for deployment...

kba · 2020-06-10T12:18:38Z

> Do we have a consensus/spec on where to search relative path names of model files yet?

You mean like https://ocr-d.de/en/models#ocropy--ocrd_cis ?

bertsky · 2020-06-10T12:27:23Z

> > Do we have a consensus/spec on where to search relative path names of model files yet?
>
> You mean like https://ocr-d.de/en/models#ocropy--ocrd_cis ?

No, locally, like $VIRTUAL_ENV/lib/python3.6/site-packages/$MODULE or $VIRTUAL_ENV/lib/python3.6/site-packages/ocrd_models/processors or under a common environment variable. I mean, this is a problem for all processors that need model files. You usually cannot distribute them as part of package_data via PyPI or even in the same repo, because they are too big. But the package can expect some installer (presumably within ocrd_all) to download the models and copy them somewhere the package can find them at runtime under a relative path.
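As a rough sketch of what such a lookup could look like, assuming a common environment variable takes precedence over the virtualenv layout mentioned above. The variable name `EXAMPLE_MODELS_DIR` and the function `model_search_dirs` are made up for illustration; nothing here is an agreed convention.

```python
import os
import sys


def model_search_dirs(module, env_var="EXAMPLE_MODELS_DIR"):
    """List candidate directories for a module's model files, in priority order.

    `env_var` is a hypothetical variable name, not an existing OCR-D setting.
    """
    dirs = []
    root = os.environ.get(env_var)
    if root:
        # Common models directory, one subdirectory per module.
        dirs.append(os.path.join(root, module))
    venv = os.environ.get("VIRTUAL_ENV")
    if venv:
        # Mirrors $VIRTUAL_ENV/lib/pythonX.Y/site-packages/$MODULE
        pyver = "python%d.%d" % sys.version_info[:2]
        dirs.append(os.path.join(venv, "lib", pyver, "site-packages", module))
    return dirs
```

A processor would then resolve a relative model name against each returned directory in turn, and an installer only needs to know one of these locations to put the files there.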

finkf · 2020-06-10T12:39:47Z

I totally agree with @bertsky. But I guess this should be discussed in a separate issue. Maybe even on ocrd_core?

bertsky · 2020-06-10T13:26:20Z

> But I guess this should be discussed in a separate issue.

Right. See OCR-D/spec#160

load gzipped models

3faaf7f

kba mentioned this pull request May 31, 2020

use pyrnn.gz models instead of pyrnn bertsky/workflow-configuration#14

Draft

finkf merged commit b7bba57 into cisocrgroup:dev Jun 2, 2020

bertsky reviewed Jun 2, 2020

bertsky mentioned this pull request Jun 3, 2020

Revert "load gzipped models" #52

Merged

kba deleted the load-gz branch June 10, 2020 12:16
