Optimize replication by sending multiple smaller requests to the server #2961

kobaska · 2016-11-21T08:50:52Z

Description

Optimize replication by paging and chunking requests to the remote server. This allows large amounts of data to be replicated while avoiding network timeouts and oversized payloads.

Add a new model-level setting "replicationChunkSize" which allows users to configure change replication algorithm to issue several smaller requests to fetch changes and upload updates.
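For illustration only, here is a minimal sketch of where the setting might sit. It mirrors the properties/settings shape of the model creation call quoted in the test diff in this thread; the value 100 is an assumption, not a recommended default.

```javascript
'use strict';

// Model properties, copied from the test setup quoted in this thread.
var properties = {
  id: {id: true, type: String, defaultFn: 'guid'},
};

// Model-level settings. trackChanges comes from the same test setup;
// the value 100 for replicationChunkSize is an assumed example value.
var settings = {
  trackChanges: true,
  replicationChunkSize: 100,
};

console.log(settings.replicationChunkSize);
```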

Related issues

None

Checklist

New tests added or existing tests modified to cover all changes
Code conforms with the style guide

slnode · 2016-11-21T08:50:53Z

Can one of the admins verify this patch?

bajtos

Hi @kobaska, thank you for the pull request. The proposed changes look good in general. I will be on vacation for the next few days, so please expect at least a week or two before I can make a more detailed review.

bajtos · 2016-11-22T11:59:23Z

test/replication.test.js

-      'SourceModel-' + tid,
-      { id: { id: true, type: String, defaultFn: 'guid' } },
-      { trackChanges: true });
+  describe('Replication without chunking', function() {


This adds too many whitespace changes. Can we find a way to preserve the existing tests unchanged and only add tests for the new "chunk" mode?

bajtos · 2016-11-22T12:03:57Z

@slnode ok to test

bajtos · 2016-12-01T14:54:49Z

test/util/utils.test.js

+        assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
+      });
+
+      function processFunction(array, cb) {


The usual convention for structuring unit tests is arrange-act-assert. I find your tests difficult to follow, because the processFunction ("act" phase) is defined after the assertions.

Could you please re-arrange all tests in this file to follow arrange-act-assert? For example:

// option A
it('should call process function with the chunked arrays', function() {
  var largeArray = ['item1', 'item2', 'item3'];
  var calls = [];
  function processFunction(array, cb) {
    calls.push(array);
    cb();
  }
  utils.uploadInChunks(largeArray, 1, processFunction, function() {
    assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
  });
});

// or even
it('should call process function with the chunked arrays', function() {
  var largeArray = ['item1', 'item2', 'item3'];
  var calls = [];
  utils.uploadInChunks(
    largeArray,
    1,
    function processFunction(smallArray, cb) {
      calls.push(smallArray);
      cb();
    },
    function onDone() {
      assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
    });
});

You should also check that uploadInChunks did not fail with an error. Also many (if not all) callback-based functions don't call the callback in the same tick of the event loop, therefore you should make all tests async:

it('should call process function with the chunked arrays', function(done) {
  var largeArray = ['item1', 'item2', 'item3'];
  var calls = [];
  utils.uploadInChunks(
    largeArray,
    1,
    function processFunction(smallArray, cb) {
      calls.push(smallArray);
      cb();
    },
    function finished(err) {
      if (err) return done(err);
      assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
      done();
    });
});

The same comment applies to other new tests in this file too.

kobaska · 2016-12-14T01:22:41Z

@bajtos I have made the changes you requested.

bajtos

Hi @kobaska, thank you for the update. I reviewed your pull request in more detail, see the comments below.

My main concern is about backwards compatibility and possibility of introducing chunking-related bugs to applications that do not need chunks.

I would prefer to play this safe, make chunkSize default to -1 (or undefined) and modify the implementation to disable chunking in such case.

Thoughts?
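A minimal sketch of the opt-in behaviour proposed above, assuming the setting is read from a plain settings object. The helper names resolveChunkSize and isChunkingEnabled are hypothetical and not part of the patch.

```javascript
'use strict';

// Hypothetical helper: returns the configured chunk size, or -1 when the
// setting is missing or not a positive number, meaning "do not chunk".
function resolveChunkSize(settings) {
  var size = settings && settings.replicationChunkSize;
  return (typeof size === 'number' && size >= 1) ? size : -1;
}

// Hypothetical helper: chunking applies only for a positive chunk size.
function isChunkingEnabled(chunkSize) {
  return chunkSize >= 1;
}

console.log(resolveChunkSize({}));                          // -1
console.log(resolveChunkSize({replicationChunkSize: 50}));  // 50
console.log(isChunkingEnabled(resolveChunkSize(undefined))); // false
```

With this shape, applications that never set the option keep the existing single-request code path.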

bajtos · 2016-12-21T14:49:21Z

lib/persisted-model.js

@@ -1155,6 +1156,11 @@ module.exports = function(registry) {
    var Change = sourceModel.getChangeModel();
    var TargetChange = targetModel.getChangeModel();
    var changeTrackingEnabled = Change && TargetChange;
+    var chunkSize = CHUNK_SIZE;
+
+    if (this.settings && this.settings.chunkSize) {


I believe "this" is undefined in this function. I think you should be using sourceModel.chunkSize instead?

It would be great to have a unit-test verifying this setting - if there was such a test, then the problem with "this" would have been discovered a long time ago.

I also find the name chunkSize not descriptive enough. This setting will usually be configured in the model JSON file, where it will "sit" among other general settings and it won't be obvious that chunkSize is related to change replication.

I am proposing replicationChunkSize, but feel free to come up with a different name that makes it clear that the chunk size refers to change replication/sync.

bajtos · 2016-12-21T14:52:09Z

lib/persisted-model.js

@@ -1176,7 +1182,9 @@ module.exports = function(registry) {
    async.waterfall(tasks, done);

    function getSourceChanges(cb) {
-      sourceModel.changes(since.source, options.filter, debug.enabled ? log : cb);
+      utils.downloadInChunks(options.filter, chunkSize, function(filter, pagingCallback) {


When function arguments don't fit on a single line, we apply the "one arg per line" rule, see http://loopback.io/doc/en/contrib/style-guide.html#one-argument-per-line

utils.downloadInChunks(
  options.filter,
  chunkSize,
  function(filter, pagingCallback) {
    sourceModel.changes(since.source, filter, pagingCallback);
  },
  debug.enabled ? log : cb);

This applies to all other utils.*InChunks calls below too.

bajtos · 2016-12-21T14:56:29Z

lib/utils.js

+
+    async.waterfall(tasks, cb);
+  } else {
+    processFunction.call(self, largeArray, cb);


Short "else" blocks after a long "if" block make it difficult to build a full picture. Please use reverse-logic with early-return instead:

if (largeArray.length <= chunkSize) {
  return processFunction.call(self, largeArray, cb);
}
// the rest of the code is indented one level less

Also IIRC, the fn.call(...) method has performance overhead compared to regular function calls fn(...). Also AFAICT, you are not using "this" in any of the callbacks. Therefore I would prefer to use processFunction(largeArray, cb). Unless I am missing something?
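To make the suggestion concrete, here is a rough sketch of a chunked upload helper that uses the early return and a direct processFunction(...) call. It is not the code from the patch: it replaces async.waterfall with a small recursive helper so the example is self-contained, and the extra guard for a missing or non-positive chunkSize is an assumption.

```javascript
'use strict';

// Sketch only. processFunction(chunk, cb) is called once per chunk, one chunk
// at a time; cb(err) is called after the last chunk or on the first error.
function uploadInChunks(largeArray, chunkSize, processFunction, cb) {
  // Early return: chunking disabled, or the array already fits in one chunk.
  if (!chunkSize || chunkSize < 1 || largeArray.length <= chunkSize) {
    return processFunction(largeArray, cb);
  }

  var offset = 0;
  next();

  function next(err) {
    if (err) return cb(err);
    if (offset >= largeArray.length) return cb();
    var chunk = largeArray.slice(offset, offset + chunkSize);
    offset += chunkSize;
    processFunction(chunk, next);
  }
}

// Usage: three items with a chunk size of 2 produce two calls.
var calls = [];
uploadInChunks(['item1', 'item2', 'item3'], 2, function(chunk, done) {
  calls.push(chunk);
  done();
}, function(err) {
  if (err) throw err;
  console.log(JSON.stringify(calls)); // [["item1","item2"],["item3"]]
});
```

Because next() recurses directly, a process function that calls back synchronously deepens the call stack with every chunk; a real implementation would likely defer each step.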

bajtos · 2016-12-21T14:57:09Z

lib/utils.js

+  function pageAndConcatResults(err, pagedResults) {
+    if (err) {
+      cb(err);
+    } else {


if (err) {
  return cb(err);
}
// the rest of the code is indented one level less
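A sketch of how a paged download helper could look with the early-return style applied. The skip/limit paging fields and the "short page means done" stop condition are assumptions made for illustration; the thread does not show how the patch pages its requests.

```javascript
'use strict';

// Sketch only. processFunction(filter, cb) fetches one page described by
// filter.skip / filter.limit and calls cb(err, items).
function downloadInChunks(filter, chunkSize, processFunction, cb) {
  // Shallow-copy the caller's filter so paging fields never leak back.
  var pageFilter = Object.assign({}, filter);

  // Early return: chunking disabled, fetch everything in one request.
  if (!chunkSize || chunkSize < 1) {
    return processFunction(pageFilter, cb);
  }

  var results = [];
  pageFilter.skip = 0;
  pageFilter.limit = chunkSize;
  processFunction(Object.assign({}, pageFilter), pageAndConcatResults);

  function pageAndConcatResults(err, pagedResults) {
    if (err) {
      return cb(err);
    }
    results = results.concat(pagedResults);
    // A page shorter than chunkSize means there is nothing left to fetch.
    if (pagedResults.length < chunkSize) {
      return cb(null, results);
    }
    pageFilter.skip += pagedResults.length;
    processFunction(Object.assign({}, pageFilter), pageAndConcatResults);
  }
}

// Usage: page through an in-memory list, two items at a time.
var source = ['a', 'b', 'c', 'd', 'e'];
var seenFilters = [];
var downloaded = null;
downloadInChunks({where: {}}, 2, function(filter, done) {
  seenFilters.push(filter);
  done(null, source.slice(filter.skip, filter.skip + filter.limit));
}, function(err, all) {
  if (err) throw err;
  downloaded = all;
  console.log(all.join(',')); // a,b,c,d,e
});
```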

bajtos · 2016-12-21T14:58:33Z

test/replication.test.js

+            options, function(err, conflicts) {
+              if (err) return done(err);
+
+              assertTargetModelEqualsSourceModel(conflicts, test.SourceModel,


This isn't really verifying that correct chunkSize was applied, is it?

I think you should add some sort of an observer to one of the models and check how many times a replication-related method was called.

Thoughts?

Oh, I see, a few lines below you are asserting the number of times the bulkUpdate method was called 👍
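One way to check how many times a method was called is to wrap it with a counter, roughly as sketched here. The targetModel object is a hypothetical stand-in rather than a real LoopBack model, and the loop only simulates a chunked upload.

```javascript
'use strict';

// Hypothetical stand-in for a target model; only bulkUpdate is modeled.
var targetModel = {
  bulkUpdate: function(updates, cb) { cb(); },
};

// Wrap a method so that each invocation is counted before delegating.
function countCalls(obj, methodName) {
  var original = obj[methodName];
  var counter = {count: 0};
  obj[methodName] = function() {
    counter.count++;
    return original.apply(this, arguments);
  };
  return counter;
}

var counter = countCalls(targetModel, 'bulkUpdate');

// Simulate uploading 5 updates with a chunk size of 2.
var updates = [1, 2, 3, 4, 5];
for (var i = 0; i < updates.length; i += 2) {
  targetModel.bulkUpdate(updates.slice(i, i + 2), function() {});
}

console.log(counter.count); // 3
```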

kobaska · 2016-12-23T05:05:27Z

@bajtos, I'm going to be away for a month on holiday. So will do it after, unless someone can continue on from here.

bajtos · 2017-01-02T16:11:40Z

@kobaska No worries, enjoy your vacation 🏖

kobaska · 2017-01-24T03:23:18Z

@bajtos I have done the changes you suggested. Can you review this please?

kobaska · 2017-02-02T01:01:44Z

@bajtos Is there any more work needed for this feature? The PR Builder failed, but the link doesn't seem to work

bajtos · 2017-02-03T06:34:01Z

Thank you for the update, I'll take a closer look next week.

bajtos

Hi @kobaska, sorry for the long delay. I took another look at your patch. The code looks mostly good, I would like you to improve the coding style a bit - see my comments below.

bajtos · 2017-02-17T15:00:33Z

test/replication.test.js

+    });
+
+    describe('Model.replicate(since, targetModel, options, callback)', function() {
+      it('Replicate data using the source model with chunking', function(done) {


The test name should read like a sentence, where "it" stands for the subject of "describe".

it('calls bulkUpdate multiple times')

bajtos · 2017-02-17T15:01:05Z

test/replication.test.js

+            options, function(err, conflicts) {
+              if (err) return done(err);
+
+              assertTargetModelEqualsSourceModel(conflicts, test.SourceModel,


Oh, I see, a few lines below you are asserting the number of times the bulkUpdate method was called 👍

bajtos · 2017-02-17T15:02:09Z

test/replication.test.js

+    });
+
+    describe('Model.replicate(since, targetModel, options, callback)', function() {
+      it('Replicate data using the source model without chunking', function(done) {


it('calls bulkUpdate only once')

bajtos · 2017-02-17T15:04:45Z

test/replication.test.js

@@ -1803,4 +1908,36 @@ describe('Replication / Change APIs', function() {
  function getIds(list) {
    return getPropValue(list, 'id');
  }
+
+  function assertTargetModelEqualsSourceModel(conflicts, sourceModel,


AFAICT, this helper is defined on L289 too (source). What is the difference between these two variants of the helper? Is there any reason preventing us from having only a single shared copy of this function?

bajtos · 2017-02-17T15:06:11Z

test/util/utils.test.js

+
+describe('Utils', function() {
+  describe('uploadInChunks', function() {
+    it('should call process function with the chunked arrays', function(done) {


We don't use "should" in our test names, see http://loopback.io/doc/en/contrib/style-guide.html#test-naming

it('calls process function for each chunk', function(done) {
  // ...
});

bajtos · 2017-02-17T15:06:43Z

test/util/utils.test.js

+      });
+    });
+
+    it('should call process function once when array less than chunk size', function(done) {


it('calls process function only once when array is smaller than chunk size')

bajtos · 2017-02-17T15:07:06Z

test/util/utils.test.js

+      cb(null, results);
+    }
+
+    it('should call process function with the correct filters', function(done) {


it('calls process function with the correct filter')

bajtos · 2017-02-17T15:07:37Z

test/util/utils.test.js

+      });
+    });
+
+    it('should concat the results from each call to the process function', function(done) {


it('concats the results of all calls of the process function')

bajtos · 2017-02-17T15:08:10Z

test/util/utils.test.js

+  });
+
+  describe('concatResults', function() {
+    it('should concat arrays', function() {


it('concats regular arrays')

bajtos · 2017-02-17T15:08:22Z

test/util/utils.test.js

+      assert.deepEqual(concatResults, ['item1', 'item2', 'item3', 'item4']);
+    });
+
+    it('should concat objects with arrays', function() {


it('concats objects containing arrays')
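The two concatResults test names above suggest behaviour along the following lines. This is a guess reconstructed from the test names only, not the code under review, and the keys deltas and conflicts in the usage are hypothetical.

```javascript
'use strict';

// Sketch only: merge one page of results into the results gathered so far.
function concatResults(previousResults, newResults) {
  // Regular arrays are appended.
  if (Array.isArray(newResults)) {
    return (previousResults || []).concat(newResults);
  }
  // Objects are merged key by key; array values are appended,
  // any other value is overwritten by the newer page.
  if (newResults !== null && typeof newResults === 'object') {
    var merged = Object.assign({}, previousResults);
    Object.keys(newResults).forEach(function(key) {
      if (Array.isArray(newResults[key])) {
        merged[key] = (merged[key] || []).concat(newResults[key]);
      } else {
        merged[key] = newResults[key];
      }
    });
    return merged;
  }
  return newResults;
}

var arrays = concatResults(['item1', 'item2'], ['item3', 'item4']);
var objects = concatResults(
  {deltas: [1], conflicts: []},
  {deltas: [2], conflicts: [3]});
console.log(JSON.stringify(arrays));
console.log(JSON.stringify(objects));
```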

kobaska · 2017-02-21T01:38:18Z

@bajtos Thanks for the review. I have improved the coding style. Please let me know if any more changes are required.

bajtos · 2017-02-22T14:03:25Z

@kobaska the patch LGTM now, thanks!

Just now I noticed that you are targeting 2.x branch. I am afraid the 2.x release line is in LTS mode now and we are no longer landing new features (semver-minor change) there.

I am happy to land this to master though. Would you like to open a new pull request against master yourself? Should I do it myself (preserving your commit authorship of course)? Let me know what works best for you.

bajtos · 2017-02-22T14:10:55Z

Ah, never mind, I see GitHub allows us to change the target branch of a pull request. I have rebased your patch against master and cleaned up the commit message while I was at it rewriting git history.

Note that git pull will not work for you in your local cloned branch, you can run e.g. git reset --hard origin/optimise-replication instead.

Add a new model-level setting "replicationChunkSize" which allows users to configure change replication algorithm to issue several smaller requests to fetch changes and upload updates.

bajtos · 2017-02-22T14:33:45Z

Landed 🎉 , thank you for the contribution!

kobaska · 2017-02-23T01:06:01Z

Thanks @bajtos for your help :) . Was hoping to get it in 2.x as that is the version we are using. I'll start moving to latest loopback version.

crandmck added the #community contribution label Nov 21, 2016

bajtos suggested changes Nov 22, 2016

bajtos self-assigned this Nov 22, 2016

bajtos added the replication label Nov 22, 2016

bajtos suggested changes Dec 1, 2016

bajtos added the stale label Dec 8, 2016

kobaska force-pushed the optimise-replication branch from 0a2f97d to 5ed003e Compare December 14, 2016 00:44

bajtos changed the title from "Optimise replication" to "Optimize replication by sending multiple smaller requests to the server" Dec 21, 2016

bajtos suggested changes Dec 21, 2016

bajtos removed the stale label Dec 21, 2016

bajtos mentioned this pull request Jan 6, 2017

Offline synchronization strongloop-community/loopback-sdk-android#71

Closed

cgole removed #community contribution labels Jan 7, 2017

kobaska force-pushed the optimise-replication branch 2 times, most recently from fe66312 to b1fdd53 Compare January 24, 2017 03:17

bajtos added the feature label Jan 26, 2017

bajtos suggested changes Feb 17, 2017

kobaska force-pushed the optimise-replication branch from b1fdd53 to 9d27b75 Compare February 21, 2017 01:33

bajtos approved these changes Feb 22, 2017

bajtos changed the base branch from 2.x to master February 22, 2017 14:04

bajtos force-pushed the optimise-replication branch from 9d27b75 to 656fe28 Compare February 22, 2017 14:09

Optimise replication

7078c5d

Add a new model-level setting "replicationChunkSize" which allows users to configure change replication algorithm to issue several smaller requests to fetch changes and upload updates.

bajtos force-pushed the optimise-replication branch from 656fe28 to 7078c5d Compare February 22, 2017 14:13

bajtos merged commit 37b4919 into strongloop:master Feb 22, 2017
