Skip to content
This repository was archived by the owner on Aug 15, 2019. It is now read-only.

Add model store managers #1038

Merged
merged 11 commits into from
May 17, 2018
Merged

Add model store managers #1038

merged 11 commits into from
May 17, 2018

Conversation

caisq
Copy link
Collaborator

@caisq caisq commented May 15, 2018

  • Add interface
  • Add implementation for local storage:
    tf.io.browserLocalStorageManager()
  • Add implementation for IndexedDB:
    tf.io.browserIndexedDBManager()
  • Add public API for
    • Listing models: e.g., tf.io.listModels();
    • Copying model: e.g., tf.io.copyModel('indexeddb://foo', 'localstorage://bar');
    • Moving model: e.g., tf.io.moveModel('indexeddb://foo', 'localstorage://bar');
    • Removing model: e.g., tf.io.removeModel('indexeddb://foo');

Each of the two implementation supports:

  • Listing all models: their paths and some meta data
    (ModelArtifactsInfo), such as topology type,
    byte sizes of topology and weights, etc.
  • Deleting a model by URL.
  • Copying a model by URL.

Change the IndexedDB to two tables (object stores) for
efficient look up of ModelArtifactsInfo:

  • One for ModelArtifactsInfo (faster access)
  • One for the actual artifacts (slower access)

Also, change "KerasJSON" to "JSON"

Towards: tensorflow/tfjs#13


This change is Reviewable

caisq added 3 commits May 15, 2018 15:41
- Add interface
- Add implementation for local storage:
  tf.io.browserLocalStorageManager()
- Add implementation for IndexedDB:
  tf.io.browserIndexedDB()

Each of the two implementation supports:
- Listing all models: their paths and some meta data
  (ModelArtifactsInfo), such as topology type,
  byte sizes of topology of weights, etc.
- Deleting a model by path.
- Copying a model by path.

Change the IndexedDB to two tables (object stores) for
efficient look up of ModelArtifactsInfo:
- One for ModelArtifactsInfo (faster access)
- One for the actual artifacts (slower access)

Also, change "KerasJSON" to "JSON"
1. Simplify copyModel() code for local storage and IndexedDB by
   reusing the load() and save() logic.
2. In IndexedDB.save(), roll back incomplete results on failure
   at second step.
@dsmilkov
Copy link
Contributor

Really nice work Shanqing. Few small comments, 1 more on the design side regarding having a registry of ModelStores instead of API specific methods for each type of ModelStore.


Reviewed 7 of 9 files at r1.
Review status: 7 of 9 files reviewed at latest revision, all discussions resolved, some commit checks failed.


src/io/indexed_db.ts, line 34 at r1 (raw file):

// 1. The object store for model data: topology, weights and weight manifests.
const MODEL_STORE_NAME = 'models_store';
// 2. The object store forModelArtifactsInfo, including meta-information such as

typo: s/forModelArtifactsInfo/for ModelArtifactsInfo/. Also s/meta-information/metadata/ for consistency.


src/io/io.ts, line 41 at r1 (raw file):

  browserFiles,
  browserHTTPRequest,
  browserIndexedDBManager,

no need to expose these to the API. Instead expose global listModels/removeModel methods that take storage type as protocol prefix.


src/io/local_storage.ts, line 335 at r1 (raw file):

 *
 * // Create a manager and use it to list existing models.
 * const manager = tf.io.browserLocalStorageManager();

A great benefit to having an interface ModelStorageManager is to hide the need for having storage-specific API (e.g. tf.io.xxxxxxStorageManager()). In this case, I would have generic method tf.io.listModel(), tf.io.deleteModel() etc, that take a path with protocol as prefix (e.g. localstorage://). The protocol points to the actual store that is registered in a registry. I missed your previous PR but IIUC, you already seem to be using registry when saving a model (e.g. model.save(protocol://...)). The registry design completely decouples storage mechanisms from the core API - which is great for future growth and external users registering their own mechanisms.


src/io/types.ts, line 223 at r1 (raw file):

 * - Copying a model in the store, from one path to another.
 */
export interface ModelStoreManager {

nice job on this clean interface


src/io/types.ts, line 253 at r1 (raw file):

   *   if `oldPath` and `newPath` are identical.
   */
  copyModel(oldPath: string, newPath: string): Promise<ModelArtifactsInfo>;

just curious, what's the use-case for copying models?


Comments from Reviewable

Copy link
Contributor

@bileschi bileschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some nits & some thoughts on the API.

// 2. The object store forModelArtifactsInfo, including meta-information such as
// the type of topology (JSON vs binary), byte size of the topology, byte
// size of the weights, etc.
const INFO_STORE_NAME = 'model_info_store';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for bikeshedding style naming suggestions, but:
MODEL_METADATA_STORE_NAME perhaps?

const getRequest = store.get(this.modelPath);
const modelTx = db.transaction(
MODEL_STORE_NAME,
modelArtifacts == null ? 'readonly' : 'readwrite');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modelArtifacts is guaranteed to be null here ...

// First, put ModelArtifactsInfo into info store.
const infoTx = db.transaction(
INFO_STORE_NAME,
modelArtifacts == null ? 'readonly' : 'readwrite');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modelArtifacts is guaranteed to not be null here

// Second, put model data into model store.
modelTx = db.transaction(
MODEL_STORE_NAME,
modelArtifacts == null ? 'readonly' : 'readwrite');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here also

resolve({modelArtifactsInfo});
};
putModelRequest.onerror = error => {
// If the put-model request fails, roll back the info entry as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic makes sense, but requires significant mental effort to follow. If it gets more complicated than this I could see it becoming difficult to manage. Is there a way to modularize or pull some of this out for greater readability?

If not, or if it would be even worse, no big deal.

src/io/types.ts Outdated
* Delete a model specified by `path`.
*
* @param path
* @returns ModelArtifactsInfo of the deleted model (if and only if deletion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does deleting return the ModelArtifactsInfo? Is this a common pattern for the deletion of assets in JS? Are we creating a difficult-to-fix problem if some storage medium can sense presence & delete much faster than access the ModelArtifactsInfo?

src/io/types.ts Outdated
*
* @param oldPath
* @param newPath
* @returns ModelArtifactsInfo of the copied model (if and only if copying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above, should these methods return some sort of status object different from the ModelArtifactsInfo themselves?

@nsthorat
Copy link
Contributor

Review status: 7 of 9 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


src/io/indexed_db.ts, line 185 at r1 (raw file):

            });
            putModelRequest.onsuccess = () => {
              resolve({modelArtifactsInfo});

you can one-line this


src/io/indexed_db.ts, line 406 at r1 (raw file):

 * // Delete a model; list models again.
 * await manager.deleteModel('demo/local-storage-manager/model1');
 * console.log(await manager.listModels());

is the idea here that you're going to expand the API?

Why not just have a top-level list method that takes a storage mechanism and lists the model names. Then you can have another "remove" method that takes a the string path like tf.io.deleteModel('indexeddb://my-model');


src/io/io.ts, line 41 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

no need to expose these to the API. Instead expose global listModels/removeModel methods that take storage type as protocol prefix.

+1, see my comment above


Comments from Reviewable

@caisq
Copy link
Collaborator Author

caisq commented May 16, 2018

Review status: 7 of 9 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


src/io/indexed_db.ts, line 34 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: s/forModelArtifactsInfo/for ModelArtifactsInfo/. Also s/meta-information/metadata/ for consistency.

Done.


src/io/indexed_db.ts, line 37 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

Apologies for bikeshedding style naming suggestions, but:
MODEL_METADATA_STORE_NAME perhaps?

The names here don't really matter that much. They are private to this file. The current name is shorter. So I vote to keep it.


src/io/indexed_db.ts, line 143 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

modelArtifacts is guaranteed to be null here ...

Done.


src/io/indexed_db.ts, line 168 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

modelArtifacts is guaranteed to not be null here

Done.


src/io/indexed_db.ts, line 177 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

here also

Done.


src/io/indexed_db.ts, line 185 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

you can one-line this

Done.


src/io/indexed_db.ts, line 188 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

This logic makes sense, but requires significant mental effort to follow. If it gets more complicated than this I could see it becoming difficult to manage. Is there a way to modularize or pull some of this out for greater readability?

If not, or if it would be even worse, no big deal.

I agree it's a little unreadable. But this is the price we have to pay for keeping two tables. I don't see a way to combine two transactions into an atomic one in IndexedDB API.


src/io/indexed_db.ts, line 406 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

is the idea here that you're going to expand the API?

Why not just have a top-level list method that takes a storage mechanism and lists the model names. Then you can have another "remove" method that takes a the string path like tf.io.deleteModel('indexeddb://my-model');

I've made changes to adopt the suggestion. See my reply to Daniel's comments.


src/io/io.ts, line 41 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

+1, see my comment above

Great point. In this revision I removed this exports and added the new methods such as:

console.log(await tf.io.listModels());

await tf.io.removeModel('indexeddb://foo');
await tf.io.copyModel('indexeddb://foo', 'localstorage://bar');  // Copying between mediums is supported.
await tf.io.moving('indexeddb://foo', 'localstorage://bar');  // Moving between mediums is supported.

Thanks for the great suggestion.


src/io/local_storage.ts, line 335 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

A great benefit to having an interface ModelStorageManager is to hide the need for having storage-specific API (e.g. tf.io.xxxxxxStorageManager()). In this case, I would have generic method tf.io.listModel(), tf.io.deleteModel() etc, that take a path with protocol as prefix (e.g. localstorage://). The protocol points to the actual store that is registered in a registry. I missed your previous PR but IIUC, you already seem to be using registry when saving a model (e.g. model.save(protocol://...)). The registry design completely decouples storage mechanisms from the core API - which is great for future growth and external users registering their own mechanisms.

See my reply to your comment above.


src/io/types.ts, line 237 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

Why does deleting return the ModelArtifactsInfo? Is this a common pattern for the deletion of assets in JS? Are we creating a difficult-to-fix problem if some storage medium can sense presence & delete much faster than access the ModelArtifactsInfo?

We could also return nothing. Returning a ModelArtifactsInfo to the caller is useful (e.g., for the caller to keep track of how much more space has been freed from IndexedDB or Local Storage).


src/io/types.ts, line 248 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

Same question as above, should these methods return some sort of status object different from the ModelArtifactsInfo themselves?

I think returning ModelArtifactsInfo should be sufficient. The presence of a return value indicates that the copy or move succeeded. (In case of failure, an Error will be thrown). It is unclear what other pieces of info would be helpful. The source and destination URLs are known to the caller.


src/io/types.ts, line 253 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

just curious, what's the use-case for copying models?

Good question. I discussed with others offline. We agree that both moving (i.e, renaming) and copying models will be useful, within and between storage mediums. The new API now has both supported.


Comments from Reviewable

@bileschi
Copy link
Contributor

Review status: 3 of 11 files reviewed at latest revision, 8 unresolved discussions.


src/io/model_management.ts, line 65 at r2 (raw file):

    }
    assert(scheme.length > 0, 'scheme must not be an empty string.');
    ModelStoreManagerRegistry.getInstance().managers[scheme] = manager;

protect against multiple managers attempting to use the same scheme?


src/io/model_management_test.ts, line 268 at r2 (raw file):

Successful moveModel between mediums

moveModel with identical src/dest medium has a special code path. Is there a test that this works? I could have missed it.


src/io/model_management_test.ts, line 316 at r2 (raw file):

to 

nit: copying from invalid URL ...


src/io/model_management_test.ts, line 367 at r2 (raw file):

 done();

suggestion : test that the source has not been deleted in this case.


Comments from Reviewable

@bileschi
Copy link
Contributor

Great! Some small comments.


Comments from Reviewable

@bileschi
Copy link
Contributor

:lgtm_strong:


Comments from Reviewable

@caisq
Copy link
Collaborator Author

caisq commented May 16, 2018

Review status: 3 of 11 files reviewed at latest revision, 12 unresolved discussions, some commit checks pending.


src/io/model_management.ts, line 65 at r2 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

protect against multiple managers attempting to use the same scheme?

Done.


src/io/model_management_test.ts, line 268 at r2 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…
Successful moveModel between mediums

moveModel with identical src/dest medium has a special code path. Is there a test that this works? I could have missed it.

Moving / copying model within the same medium is covered in the unit tests in local_storage_test.ts and indexed_db_test.ts.


src/io/model_management_test.ts, line 316 at r2 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…
to 

nit: copying from invalid URL ...

Done.


src/io/model_management_test.ts, line 367 at r2 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…
 done();

suggestion : test that the source has not been deleted in this case.

Good suggestion! Done.


Comments from Reviewable

@nsthorat
Copy link
Contributor

:lgtm_strong:


Review status: 3 of 11 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


Comments from Reviewable

@caisq caisq merged commit b400eed into tensorflow:master May 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants