Offline synchronization #71

Closed
sebastianhaas opened this issue Apr 8, 2015 · 18 comments

@sebastianhaas

Hey!
Are there any plans to add offline synchronization capabilities as described here [1] to the Android/iOS SDKs? I found some Google Groups threads from 2014 on that topic, but nothing has changed since then. Is this feature already scheduled?

Thank you!

[1] http://docs.strongloop.com/display/public/LB/Synchronization

@sebastianhaas
Author

If not, is this something which you think could be easily done by looking at the existing browser version? Would you accept PRs on that issue?

@bajtos
Contributor

bajtos commented Apr 21, 2015

Hey! We don't have plans for adding offline sync to the Android SDK in the near future, but we will gladly accept PRs implementing this feature!

Beware, the implementation won't be exactly easy. If you decide to use the same design as we have in the javascript/browser version, then you need to:

  • Implement a persistence layer that supports LoopBack PersistedModel API (CRUD + our querying language).
  • Refactor the current Adapter design to support pluggable adapters. I think the current implementation is too coupled with REST and may not allow users to swap REST with local persistence.
  • Reimplement the replication algorithm in Java.

Having said that, I think the isomorphic "replicate from anywhere to anywhere" design of the JavaScript version is a bit overkill for Android apps. A simpler design focused on offline sync may work better:

  • Implement new base classes Model/ModelRepository that store data locally instead of calling REST.
  • Implement a background sync component that takes one or more of your new ModelRepository instances and a RestAdapter instance, and performs whatever is needed to replicate the changes. This component can make direct REST API calls to the backend, there's no need to go through Model/ModelRepository abstraction.
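
To make the shape of this simpler design concrete, here is a rough sketch. It is written in JavaScript only for brevity; an Android client would use Java. `LocalRepository`, `syncPending` and `rest.send` are hypothetical names invented for illustration and are not part of any LoopBack SDK.

```javascript
// Hypothetical sketch: a repository that stores data locally and records
// every change so a background component can push it later.
class LocalRepository {
  constructor() {
    this.items = new Map(); // id -> locally stored instance
    this.pending = [];      // changes not yet pushed to the server
  }

  save(item) {
    this.items.set(item.id, Object.assign({}, item));
    this.pending.push({type: 'upsert', id: item.id, data: Object.assign({}, item)});
  }

  remove(id) {
    this.items.delete(id);
    this.pending.push({type: 'delete', id: id});
  }
}

// Background sync: pushes pending changes in order through an injected REST
// client. `rest.send` stands in for whatever direct REST calls are needed.
async function syncPending(repository, rest) {
  let pushed = 0;
  while (repository.pending.length > 0) {
    const change = repository.pending[0];
    await rest.send(change);    // if this throws, the change stays queued
    repository.pending.shift(); // only dequeue after a successful push
    pushed++;
  }
  return pushed;
}
```

The application only talks to the repository; the sync function is the only piece that knows about the network.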

@afaucogney

Did you ever consider using the RoboSpice component for this? It would also handle caching of the data repo pretty well. I'm not a strong enough developer to do this myself, but it could definitely help.

@TrevorPage

TrevorPage commented Jan 5, 2017

@bajtos In the suggestion you make:

> This component can make direct REST API calls to the backend, there's no need to go through Model/ModelRepository abstraction.

Are you suggesting that the Android app could still make use of LoopBack's Synchronisation API (such that the Android client "looks like" a client LoopBack instance, but actually isn't), or rather are you suggesting it would be better to simply roll your own sync strategy?

I've stumbled across this issue, and the documentation about the LoopBack Offline Synchronisation feature, because I want to synchronise objects between my Android application and LoopBack backend.

I am using the classic combination of SQLiteOpenHelper, ContentProvider, and SyncAdapter to handle local storage and server synchronisation. For my purposes I only need simplistic synchronisation. For the client to check for data changes at the backend in a way that minimises network use, what I was planning to do is to set the if-modified-since header in the client with the time of last sync, and have a Loopback beforeRemote hook to check that header and only send response data if the server collection is newer.
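
For illustration, the comparison such a hook would need can be sketched as a plain function. `isNotModified` and the `lastModified` value (the time of the newest change in the collection) are assumptions made for this example; how the server tracks that time is not covered here.

```javascript
// Returns true when nothing has changed since the time given in the
// If-Modified-Since header, so the server could answer 304 with no body.
function isNotModified(ifModifiedSince, lastModified) {
  const since = Date.parse(ifModifiedSince || '');
  if (Number.isNaN(since)) return false; // missing or unparseable header
  // HTTP dates have one-second resolution, so drop the milliseconds.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000;
  return modified <= since;
}
```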

However, I was just curious about whether I could simply enable Sync on the models in my Loopback server to automatically provide me with extra endpoints to provide diffs, revisions, etc.

@bajtos
Contributor

bajtos commented Jan 5, 2017

@TrevorPage

> Are you suggesting that the Android app could still make use of LoopBack's Synchronisation API (such that the Android client "looks like" a client LoopBack instance, but actually isn't), or rather are you suggesting it would be better to simply roll your own sync strategy?

IIRC, my idea was to leverage the server-side Synchronisation API that's already available and implement a custom Android client invoking this API. I am not familiar with Android's ContentProvider/SyncAdapter feature, but after a quick read of https://developer.android.com/training/sync-adapters/index.html I think they may be the right abstractions to use.

The synchronisation (we call it "change replication") implemented by LoopBack does not rely on timestamps (like if-modified-since); we use the concept of checkpoints instead. A checkpoint is a number that is increased whenever a change replication occurs. The client can then query for changes made after the last checkpoint (see the since argument of PersistedModel.changes()).
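
The idea can be illustrated with a toy in-memory change log. This is not LoopBack's implementation; the class, its method names, and the convention that a change is tagged with the checkpoint current at the time it was made are all assumptions chosen for the example.

```javascript
// Toy change log: every change is tagged with the current checkpoint number.
class ChangeLog {
  constructor() {
    this.current = 1;  // current checkpoint number
    this.changes = []; // entries of the form {modelId, checkpoint}
  }

  // Record a change to a model instance under the current checkpoint.
  record(modelId) {
    this.changes.push({modelId: modelId, checkpoint: this.current});
  }

  // Start a new checkpoint and return its number.
  checkpoint() {
    this.current += 1;
    return this.current;
  }

  // Ids of instances changed at or after the given checkpoint.
  changesSince(since) {
    const ids = this.changes
      .filter((c) => c.checkpoint >= since)
      .map((c) => c.modelId);
    return Array.from(new Set(ids));
  }
}

const log = new ChangeLog();
log.record('a');               // tagged with checkpoint 1
const cp = log.checkpoint();   // the client remembers 2
log.record('b');               // tagged with checkpoint 2
console.log(log.changesSince(cp)); // [ 'b' ]
```

No clocks are compared, so the scheme does not depend on client and server time being in sync.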

If you are looking for one-way synchronisation only (Android client fetching updates from the server), then I think it may be reasonably easy to implement a SyncAdapter using the current change-replication API.

> However, I was just curious about whether I could simply enable Sync on the models in my Loopback server to automatically provide me with extra endpoints to provide diffs, revisions, etc.

Yes, the flag is called trackChanges. See Enable change tracking in our documentation. I think the other changes described in that section may not be needed if all you want is to fetch changes from the server.

Our docs also contain a list of methods that are related to change replication (link); see also the source code of the method implementing change replication: lib/persisted-model.js#L1120-L1308.

I think the simplified algorithm for pulling the changes from the server would be along these lines:

  1. Create a new server checkpoint by calling POST /api/mymodels/checkpoint
  2. Get a list of changes made since the last checkpoint we pulled (CP) by calling GET /api/mymodels/changes?since=CP
  3. You can skip the diff step because there are no changes made locally to diff against
  4. Get a list of updates to perform locally via POST /api/mymodels/createUpdates, sending the list of changes from step 2 in the request body (I am not entirely sure about the format, the request data may require additional manipulation)
  5. Apply the updates returned by the server (update/create/delete affected model instances), see the implementation of bulkUpdate to learn more.
  6. Save the checkpoint number returned in step 1 as the CP value to use in the next run.
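
The steps above can be sketched as a single function. This is an untested illustration, not a reference client: `http` (with `get` and `post` returning promises of parsed JSON) and `store` (with `upsert` and `remove`) are hypothetical helpers, and the response shapes (`seq` on the checkpoint, `type`/`change`/`data` on each update) and the createUpdates request body are assumptions that should be checked against a real server, for example in the API Explorer.

```javascript
// Pulls server changes made since `lastCheckpoint` and applies them locally.
// Returns the checkpoint number to persist for the next run.
async function pullChanges(http, store, lastCheckpoint) {
  // Step 1: create a new server checkpoint.
  const created = await http.post('/api/mymodels/checkpoint', {});
  const nextCheckpoint = created.seq; // assumed field name

  // Step 2: list changes made since the last checkpoint we pulled.
  const changes = await http.get('/api/mymodels/changes?since=' + lastCheckpoint);

  // Step 3 (diff) is skipped: there are no local changes to diff against.

  if (changes.length > 0) {
    // Step 4: ask the server which updates to perform locally.
    // The body format is an assumption and may need extra manipulation.
    const updates = await http.post('/api/mymodels/createUpdates', changes);

    // Step 5: apply each update to the local store.
    for (const update of updates) {
      if (update.type === 'delete') {
        store.remove(update.change.modelId);
      } else {
        store.upsert(update.data);
      }
    }
  }

  // Step 6: the caller saves this value as CP for the next run.
  return nextCheckpoint;
}
```

Injecting `http` and `store` keeps the algorithm independent of the HTTP library and of the local database, which also makes it easy to exercise with fakes.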

@TrevorPage

TrevorPage commented Jan 5, 2017

@bajtos Thank you very much for the detailed guidance.

As you suggested I have enabled trackChanges on one of my models (and also changed the id property to an auto-generated GUID as per the documentation). I then found it incredibly useful to be able to use the StrongLoop API Explorer to see the new endpoints this introduced and experiment with them. I'm ever so impressed by how all this works.

One concern I have is that, for a client to see if there are new changes at the server using the algorithm you propose, the client would be bumping up the checkpoint number every time it polls for changes. Does this risk creating any unnecessary bloat in the database over time? The slight variation to this strategy could be for the client to simply do GET /api/mymodels/changes?since=CP in the first instance (without creating a checkpoint first); and only if there were new changes would the client then get the 'head' checkpoint (and create one if one doesn't exist at the 'head').

The other concern I have is how much the change history in the database will grow over time, which might be an unnecessary overhead if I don't need the change history (when all I really want to achieve is "is the server data newer than my local data?"). Is it possible to remove older change history?

Finally there's this 'known issue' in the documentation:

> LoopBack does not fully support fine-grained access control to a selected subset of model instances, therefore it is not possible to replicate models where the user can access only a subset of instances (for example only the instances the user has created)

Does this mean that if the myModels collection contains objects created by different users, and a particular client only wants diffs to models owned by that particular user, would I have an issue using the Sync API?

@TrevorPage

Ah! Using Robomongo I now see that the checkpoint just exists as one record. And, if I ever wanted to, it seems I can safely trim the contents of myModels-Change.

I still need to get my head around the meaning of the user access issue. Perhaps it doesn't concern me if I don't use the replication feature of the API.

@bajtos
Contributor

bajtos commented Jan 6, 2017

> One concern I have is that, for a client to see if there are new changes at the server using the algorithm you propose, the client would be bumping up the checkpoint number every time it polls for changes. Does this risk creating any unnecessary bloat in the database over time? The slight variation to this strategy could be for the client to simply do GET /api/mymodels/changes?since=CP in the first instance (without creating a checkpoint first); and only if there were new changes would the client then get the 'head' checkpoint (and create one if one doesn't exist at the 'head').

I guess if you have many clients that check for changes often, then the checkpoint number can eventually overflow the int32/int64 limit. The variation you proposed looks sensible to me, as long as the cost of the extra request is not significant.
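
A minimal sketch of that variation, under the same assumptions as before: `http` is a hypothetical client whose `get` and `post` return promises of parsed JSON, and the checkpoint response is assumed to carry its number in `seq`.

```javascript
// Polls for changes first and only creates a checkpoint when there is
// something new, so idle polling does not bump the checkpoint number.
async function pollThenCheckpoint(http, lastCheckpoint) {
  const changes = await http.get('/api/mymodels/changes?since=' + lastCheckpoint);
  if (changes.length === 0) {
    // Nothing new: keep using the old checkpoint.
    return {changes: [], checkpoint: lastCheckpoint};
  }
  const created = await http.post('/api/mymodels/checkpoint', {});
  return {changes: changes, checkpoint: created.seq}; // `seq` is assumed
}
```

One thing worth checking with this ordering is what happens to a change that lands on the server between the two requests, since it was not in the list returned by the first call.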

> Does this mean that if the myModels collection contains objects created by different users, and a particular client only wants diffs to models owned by that particular user, would I have an issue using the Sync API?

Yes. Right now, the built-in sync API does not provide any way for filtering the model instances. I.e. your change list will include instances created by other users, and the "bulkUpdate" endpoint will allow the clients to update any instances, including those created by other users.

BTW this is true for the built-in find method too - it cannot filter the results based on the currently-logged-in user and if there are ACLs using $owner role, then find returns 401 unauthorized IIRC.

I think there may be a solution though:

  • Disable bulkUpdate and any other unused replication endpoints - see https://loopback.io/doc/en/lb3/Exposing-models-over-REST.html
  • Provide a custom implementation of the changes method that fills in the filter argument in such a way that only the records of the currently logged-in user are shown.
  • Provide a custom implementation of createUpdates that will restrict the accessed models to those allowed for the current user. Alternatively, modify your client to fetch all changes individually by making one request per each changed model, instead of calling createUpdates. This may be actually a better option, see below.
// common/models/my-model.js
module.exports = function(MyModel) {
  // the code here is not tested, may not work out-of-the-box

  // Hide the built-in replication endpoints. These are static methods,
  // hence the second argument (isStatic) is true.
  MyModel.disableRemoteMethod('changes', true);
  MyModel.disableRemoteMethod('createUpdates', true);
  MyModel.disableRemoteMethod('bulkUpdate', true);
  // etc.

  // Return only the changes to records owned by the current user.
  MyModel.myChanges = function(since, options, cb) {
    var currentUserId = options && options.accessToken && options.accessToken.userId;
    if (!currentUserId) {
      var err = new Error('Not Authorized');
      err.statusCode = 401;
      return cb(err);
    }

    // assuming "owner" is the FK mapping to users
    var filter = {where: {owner: currentUserId}};
    this.changes(since, filter, cb);
  };

  MyModel.remoteMethod('myChanges', {
    accepts: [
      {arg: 'since', type: 'number', description:
        'Only return changes since this checkpoint'},
      {arg: 'options', type: 'object', http: 'optionsFromRequest'},
    ],
    returns: {arg: 'changes', type: 'array', root: true},
    http: {verb: 'get', path: '/my-changes'},
  });
};

(The code is relying on strongloop/loopback#1495 which was not published to npmjs.org yet.)

Loosely related:

@TrevorPage if you manage to implement an Android client using my suggestions and release it in a form that other LoopBack users can use, then we would be happy to feature it on our http://loopback.io/doc/en/community/index.html page.

@TrevorPage

TrevorPage commented Jan 14, 2017

@bajtos Thank you again for taking the time to provide the further advice.

Your proposed myChanges method will be extremely useful. I have already implemented a new remote method called 'mine' to get all objects that only belong to that owner, accessed at /myObjects/mine. So the endpoint you propose above will sit alongside that quite nicely. I believe that the if (!currentUserId) check you're doing can be eliminated -- I assume that currentUserId simply has to exist, because my ACL restricts the endpoint to $authenticated.

Another idea is to expand my MyUser model to provide each user with a private checkpoint of sorts.

I will definitely try to split a stand-alone Android demonstration of this from my project if I can.

A few questions/concerns that still exist are:

  • I wonder if Loopback will ever support user ownership of objects 'out of the box'? At the moment, the fact that the Sync feature only works in terms of every client/user having access to the same 'global' collection seems to be at odds with the fact that many/most applications are private to each user. For example a "todo list" application (like in the Loopback example) might in the real world provide a separate todo list per user instead.

  • Using the Sync feature requires me to enable automatic assignment of GUIDs, but personally I'd like to continue using the MongoDB ObjectID format. Actually I doubt this is an issue because my client already self-generates and assigns the ObjectIDs to the objects before they are submitted to the API.

  • Sync feature requires strict validation, which conflicts with my need to have a lot of arbitrary properties. I've avoided turning on strict validation and the library hasn't issued a warning. EDIT: With strict validation turned off, at least the /changes endpoint still works exactly as it says on the tin, which is actually going to be sufficient for my needs.

@bajtos
Contributor

bajtos commented Jan 16, 2017

> I wonder if Loopback will ever support user ownership of objects 'out of the box'? At the moment, the fact that the Sync feature only works in terms of every client/user having access to the same 'global' collection seems to be at odds with the fact that many/most applications are private to each user. For example a "todo list" application (like in the Loopback example) might in the real world provide a separate todo list per user instead.

I think this is definitely a feature that we would like to eventually implement, but unfortunately it's not on our near-term roadmap.

cc @raymondfeng @ritch ☝️

> Using the Sync feature requires me to enable automatic assignment of GUIDs, but personally I'd like to continue using the MongoDB ObjectID format. Actually I doubt this is an issue because my client already self-generates and assigns the ObjectIDs to the objects before they are submitted to the API.

Agreed - having the client generate globally-unique ids (like ObjectIDs) and the server letting MongoDB generate these ids should be equivalent to using LoopBack's GUID generator.

> Sync feature requires strict validation, which conflicts with my need to have a lot of arbitrary properties. I've avoided turning on strict validation and the library hasn't issued a warning. EDIT: With strict validation turned off, at least the /changes endpoint still works exactly as it says on the tin, which is actually going to be sufficient for my needs.

The strict validation is required to reliably apply changes through LoopBack API.

Consider the following use case:

  1. Original model: { handle: 'bajtos', facebook: 'bajtos' }
  2. Updated data - user is no longer using Facebook: { handle: 'bajtos' }.

bulkUpdate uses the updateAll method under the hood, and this method cannot delete properties. (The combination of strict validation + persisting undefined as nulls solves this problem.)
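
The effect can be reproduced with plain objects. In this sketch `mergeUpdate` merely imitates an update that overwrites the supplied properties and leaves the rest alone; it is an illustration, not LoopBack's updateAll, and `withNulls` is a hypothetical helper.

```javascript
// Imitates a partial update: supplied properties overwrite stored ones,
// properties that are not supplied are left untouched.
function mergeUpdate(stored, data) {
  return Object.assign({}, stored, data);
}

const original = {handle: 'bajtos', facebook: 'bajtos'};

// Sending the new data as-is cannot remove `facebook`.
const merged = mergeUpdate(original, {handle: 'bajtos'});
console.log(merged.facebook); // prints: bajtos

// With a fixed (strict) property list, a missing value can be sent as an
// explicit null, which the update can then persist.
const knownProperties = ['handle', 'facebook'];
function withNulls(data) {
  const out = {};
  for (const name of knownProperties) {
    out[name] = data[name] === undefined ? null : data[name];
  }
  return out;
}

const cleared = mergeUpdate(original, withNulls({handle: 'bajtos'}));
console.log(cleared.facebook); // prints: null
```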

Also:

  1. Original model: { handle: 'bajtos' }
  2. Server side data (change was made at the same time while we are applying client changes): { handle: 'bajtos', facebook: 'mbajtos' }
  3. Client data (update to apply): { handle: 'bajtos', facebook: 'bajtos' }

bulkUpdate verifies that the data in the database matches the data assumed by the client before making the change, and reports a conflict if another party changed the record "under our hands". We don't have any mechanism for detecting that extra properties were added; we can only detect that the value of a known property was changed.
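
A simplified illustration of why the property list matters for this check. `hasConflict` is a hypothetical helper that compares only the properties it knows about; it is not the code bulkUpdate runs.

```javascript
// Reports a conflict when any known property differs between what the
// client assumed and what the database holds now.
function hasConflict(knownProperties, assumed, current) {
  return knownProperties.some((name) => assumed[name] !== current[name]);
}

const assumed = {handle: 'bajtos'};                      // client's view
const current = {handle: 'bajtos', facebook: 'mbajtos'}; // server, meanwhile

// Without strict validation `facebook` is not a known property, so the
// concurrent change goes unnoticed.
console.log(hasConflict(['handle'], assumed, current)); // false

// With `facebook` declared on the model, the same change is detected.
console.log(hasConflict(['handle', 'facebook'], assumed, current)); // true
```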

I think you should be able to code your application in a way that avoids these two problems.

@crandmck There is quite a lot of useful information in this thread, I think we should capture it in our documentation. Are my comments clear enough so that you can write the first draft yourself, or should I write it myself?

@crandmck
Contributor

crandmck commented Jan 25, 2017

@bajtos Let me try to throw something together and then I'll hand it off to you.
Here is the draft I started: https://github.com/strongloop/loopback.io/wiki/Offline-sync-for-Android-SDK-%5Bworking-draft%5D

It's mostly just the last few comments in this issue. I made a few edits, but didn't have time to really make it coherent. Perhaps you could take a swing at it? Once we get it in better shape, I will move it into the actual docs, but for now the wiki page is an easy place to collaborate.

@crandmck
Contributor

@bajtos I haven't had much time to work on this. Could you make a pass through https://github.com/strongloop/loopback.io/wiki/Offline-sync-for-Android-SDK-%5Bworking-draft%5D, remove anything irrelevant, and add anything missing? Then I'll make another pass and we can open a PR from there.

@bajtos bajtos self-assigned this Jan 31, 2017
@bajtos
Contributor

bajtos commented Jan 31, 2017

Added to the backlog, it may take a while before I'll have time to work on this. Is the wiki open for editing by people outside of our org? Can e.g. @TrevorPage review and edit that wiki page?

@superkhau
Contributor

@bajtos No wikis are open to the public ATM on any project (I found some garbage in the past and cleaned it up before). However, in your case I suggest opening it up temporarily and locking it down again afterwards.

@crandmck
Contributor

Actually the loopback.io wiki should be editable by any GH acct, because I have not checked the Restrict editing to users in teams with push access only option. @TrevorPage give it a go if you can!

@TrevorPage

I'd love to contribute to this any way I can. I'll see what I can do. :)

@superkhau
Contributor

@TrevorPage The wiki is now open, feel free to edit this repo's wiki or the loopback.io wiki, we can sort out the organization later.

@crandmck
Contributor

For this issue, I'd start with https://github.com/strongloop/loopback.io/wiki/Offline-sync-for-Android-SDK-%5Bworking-draft%5D which has most of the pertinent content. It just needs cleanup and editing.
