Optimize replication by sending multiple smaller requests to the server #2961

Merged
merged 1 commit into from
Feb 22, 2017

kobaska
Contributor

@kobaska kobaska commented Nov 21, 2016

Description

Optimize replication by paging and chunking requests to the remote server. This allows large amounts of data to be replicated while avoiding network timeouts and overly large payloads.

Add a new model-level setting "replicationChunkSize" which allows users to configure the change replication algorithm to issue several smaller requests to fetch changes and upload updates.
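
To illustrate the idea behind the setting, here is a minimal sketch (not the code from this PR) of splitting a large array into pieces of at most chunkSize items, each of which would then be sent as a separate, smaller request. The helper name splitIntoChunks and the rule that a non-positive chunkSize disables chunking are assumptions made for this example.

```javascript
// Hypothetical helper: split an array into consecutive chunks of at most
// chunkSize items. Assumption: chunkSize <= 0 means "no chunking", so the
// whole array is returned as a single chunk.
function splitIntoChunks(items, chunkSize) {
  if (!(chunkSize > 0) || items.length <= chunkSize) {
    return [items];
  }
  var chunks = [];
  for (var i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Each chunk would be sent to the server as a separate request.
splitIntoChunks(['a', 'b', 'c', 'd', 'e'], 2); // returns [['a', 'b'], ['c', 'd'], ['e']]
```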

Related issues

  • None

Checklist

  • New tests added or existing tests modified to cover all changes
  • Code conforms with the style guide

@slnode

slnode commented Nov 21, 2016

Can one of the admins verify this patch?

Member

@bajtos bajtos left a comment

Hi @kobaska, thank you for the pull request. The proposed changes look good in general. I will be on vacation for the next few days, so please expect at least a week or two before I can do a more detailed review.

'SourceModel-' + tid,
{ id: { id: true, type: String, defaultFn: 'guid' } },
{ trackChanges: true });
describe('Replication without chunking', function() {
Member

This adds too many whitespace changes. Can we find a way to preserve the existing tests unchanged and only add tests for the new "chunk" mode?

@bajtos
Member

bajtos commented Nov 22, 2016

@slnode ok to test

@bajtos bajtos self-assigned this Nov 22, 2016
assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
});

function processFunction(array, cb) {
Member

The usual convention for structuring unit tests is arrange-act-assert. I find your tests difficult to follow, because the processFunction ("act" phase) is defined after the assertions.

Could you please re-arrange all tests in this file to follow arrange-act-assert? For example:

// option A
    it('should call process function with the chunked arrays', function() {
      var largeArray = ['item1', 'item2', 'item3'];
      var calls = [];

      function processFunction(array, cb) {
        calls.push(array);
        cb();
      }

      utils.uploadInChunks(largeArray, 1, processFunction, function() {
        assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
      });
    });

// or even
    it('should call process function with the chunked arrays', function() {
      var largeArray = ['item1', 'item2', 'item3'];
      var calls = [];

      utils.uploadInChunks(
        largeArray, 1,
        function processFunction(smallArray, cb) {
          calls.push(smallArray);
          cb();
        },
        function onDone() {
          assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
        });
    });

You should also check that uploadInChunks did not fail with an error. Also, many (if not all) callback-based functions don't call the callback in the same tick of the event loop; therefore, you should make all tests async:

    it('should call process function with the chunked arrays', function(done) {
      var largeArray = ['item1', 'item2', 'item3'];
      var calls = [];

      utils.uploadInChunks(
        largeArray, 1,
        function processFunction(smallArray, cb) {
          calls.push(smallArray);
          cb();
        },
        function finished(err) {
          if (err) return done(err);
          assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
          done();
        });
    });

The same comment applies to other new tests in this file too.
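
A tiny, self-contained illustration of the tick issue mentioned above. The function name fetchLater is hypothetical, and setImmediate stands in for a real remote call.

```javascript
// A callback-style function that, like most I/O-bound code, reports its
// result on a later tick of the event loop (simulated with setImmediate).
function fetchLater(value, cb) {
  setImmediate(function() {
    cb(null, value);
  });
}

var received = [];

fetchLater('item1', function(err, value) {
  received.push(value);
});

// The callback has not run yet, so a synchronous assertion placed here would
// see an empty array. Asserting inside the callback and then signalling
// completion (e.g. with mocha's done) is what makes such a test reliable.
var receivedSynchronously = received.slice(); // []
```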

@bajtos bajtos added the stale label Dec 8, 2016
@kobaska kobaska force-pushed the optimise-replication branch from 0a2f97d to 5ed003e Compare December 14, 2016 00:44
@kobaska
Contributor Author

kobaska commented Dec 14, 2016

@bajtos I have made the changes you requested.

@bajtos bajtos changed the title from "Optimise replication" to "Optimize replication by sending multiple smaller requests to the server" Dec 21, 2016
Member

@bajtos bajtos left a comment

Hi @kobaska, thank you for the update. I reviewed your pull request in more detail; see the comments below.

My main concern is about backwards compatibility and possibility of introducing chunking-related bugs to applications that do not need chunks.

I would prefer to play it safe: make chunkSize default to -1 (or undefined) and modify the implementation to disable chunking in that case.

Thoughts?

@@ -1155,6 +1156,11 @@ module.exports = function(registry) {
var Change = sourceModel.getChangeModel();
var TargetChange = targetModel.getChangeModel();
var changeTrackingEnabled = Change && TargetChange;
var chunkSize = CHUNK_SIZE;

if (this.settings && this.settings.chunkSize) {
Member

I believe this is undefined in this function. I think you should be using sourceModel.chunkSize instead?

It would be great to have a unit test verifying this setting; if there were such a test, the problem with this would have been discovered a long time ago.

I also find the name chunkSize not descriptive enough. This setting will usually be configured in the model JSON file, where it will "sit" among other general settings, and it won't be obvious that chunkSize is related to change replication.

I am proposing replicationChunkSize, but feel free to come up with a different name that makes it clear that the chunk size refers to change replication/sync.
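
A minimal sketch of reading the setting from the source model's settings instead of from this, with chunking disabled by default as suggested earlier in this review. The model objects are plain stand-ins rather than real LoopBack models, and the helper name and the -1 default are assumptions based on this discussion.

```javascript
// Assumed default: a non-positive value disables chunking, so models that
// do not configure the setting keep the existing single-request behavior.
var DEFAULT_REPLICATION_CHUNK_SIZE = -1;

// Hypothetical helper: read replicationChunkSize from the model settings
// (e.g. as loaded from the model JSON file) instead of relying on this.
function getReplicationChunkSize(sourceModel) {
  var settings = sourceModel && sourceModel.settings;
  if (settings && settings.replicationChunkSize > 0) {
    return settings.replicationChunkSize;
  }
  return DEFAULT_REPLICATION_CHUNK_SIZE;
}

// Stand-ins for models defined with and without the setting.
var ChunkedModel = { settings: { trackChanges: true, replicationChunkSize: 100 } };
var PlainModel = { settings: { trackChanges: true } };

getReplicationChunkSize(ChunkedModel); // 100
getReplicationChunkSize(PlainModel); // -1
```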

@@ -1176,7 +1182,9 @@ module.exports = function(registry) {
async.waterfall(tasks, done);

function getSourceChanges(cb) {
sourceModel.changes(since.source, options.filter, debug.enabled ? log : cb);
utils.downloadInChunks(options.filter, chunkSize, function(filter, pagingCallback) {
Member

When function arguments don't fit on a single line, then we are applying "one arg per line" rule, see http://loopback.io/doc/en/contrib/style-guide.html#one-argument-per-line

utils.downloadInChunks(
  options.filter, chunkSize, 
  function(filter, pagingCallback) {
    sourceModel.changes(since.source, filter, pagingCallback);
  },
  debug.enabled ? log : cb);

This applies to all other utils.*InChunks calls below too.
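
For the download side, one way a helper with the downloadInChunks(filter, chunkSize, processFunction, cb) shape could work is to page through the results by adding limit and skip to a copy of the filter, stop once a page comes back shorter than chunkSize, and concatenate all pages. This is a sketch under those assumptions (LoopBack-style limit/skip filter keys, array results), not the implementation from this PR; the in-memory fetch function stands in for sourceModel.changes.

```javascript
// Sketch of a paging download helper. processFunction(filter, cb) fetches
// one page; cb(err, allResults) receives the concatenated pages.
function downloadInChunks(filter, chunkSize, processFunction, cb) {
  if (!(chunkSize > 0)) {
    // Chunking disabled: a single request with the original filter.
    return processFunction(filter, cb);
  }

  var results = [];
  var skip = 0;

  function fetchPage() {
    // Copy the filter so the caller's object is not mutated.
    var pageFilter = Object.assign({}, filter, { limit: chunkSize, skip: skip });
    processFunction(pageFilter, function pageAndConcatResults(err, page) {
      if (err) {
        return cb(err);
      }
      results = results.concat(page);
      // A short page means there is nothing left to fetch.
      if (page.length < chunkSize) {
        return cb(null, results);
      }
      skip += chunkSize;
      fetchPage();
    });
  }

  fetchPage();
}

// Usage with an in-memory stand-in for sourceModel.changes():
var data = [1, 2, 3, 4, 5];
var filtersSeen = [];
var downloaded;

downloadInChunks(
  { where: {} }, 2,
  function(filter, pagingCallback) {
    filtersSeen.push(filter);
    pagingCallback(null, data.slice(filter.skip, filter.skip + filter.limit));
  },
  function(err, allResults) {
    if (err) throw err;
    downloaded = allResults; // [1, 2, 3, 4, 5], from pages at skip 0, 2 and 4
  });
```

Note that with this stopping rule, a total that is an exact multiple of chunkSize costs one extra request that returns an empty page.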

lib/utils.js Outdated

async.waterfall(tasks, cb);
} else {
processFunction.call(self, largeArray, cb);
Member

Short "else" blocks after a long "if" block make it difficult to build a full picture. Please use reverse-logic with early-return instead:

if (largeArray.length <= chunkSize) {
  return processFunction.call(self, largeArray, cb);
}

// the rest of the code is indented one level less

Also, IIRC, fn.call(...) has some performance overhead compared to a regular function call fn(...). And AFAICT, you are not using this in any of the callbacks. Therefore, I would prefer to use processFunction(largeArray, cb). Unless I am missing something?
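
Applying both suggestions (early return, direct call), a sequential upload helper could look roughly like the sketch below. It is illustrative only, not the code that was merged; it assumes that chunkSize <= 0 disables chunking, in line with the backwards-compatibility suggestion made earlier in this review, and that processFunction reports only an error.

```javascript
// Sketch only: an upload helper structured with early returns.
// processFunction(chunk, cb) is called directly (no .call(self, ...)),
// one chunk at a time; the final callback gets the first error, if any.
function uploadInChunks(largeArray, chunkSize, processFunction, cb) {
  // Chunking disabled or unnecessary: a single call with the whole array.
  if (!(chunkSize > 0) || largeArray.length <= chunkSize) {
    return processFunction(largeArray, cb);
  }

  var offset = 0;

  function processNextChunk() {
    if (offset >= largeArray.length) {
      return cb(null);
    }

    var chunk = largeArray.slice(offset, offset + chunkSize);
    offset += chunkSize;

    processFunction(chunk, function(err) {
      if (err) {
        return cb(err);
      }
      processNextChunk();
    });
  }

  processNextChunk();
}

// Usage mirroring the test suggested earlier in this review:
var calls = [];
var uploadError;

uploadInChunks(
  ['item1', 'item2', 'item3'], 1,
  function processFunction(smallArray, cb) {
    calls.push(smallArray);
    cb();
  },
  function finished(err) {
    uploadError = err || null;
    // calls is now [['item1'], ['item2'], ['item3']]
  });
```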

lib/utils.js Outdated
function pageAndConcatResults(err, pagedResults) {
if (err) {
cb(err);
} else {
Member

if (err) {
  return cb(err);
}

// the rest of the code is indented one level less

options, function(err, conflicts) {
if (err) return done(err);

assertTargetModelEqualsSourceModel(conflicts, test.SourceModel,
Member

This isn't really verifying that correct chunkSize was applied, is it?

I think you should add some sort of an observer to one of the models and check how many times a replication-related method was called.

Thoughts?
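
One lightweight way to build such an observer, assuming no spying library is used, is to wrap the method and count invocations. The countCalls helper and the targetModel object with its bulkUpdate method below are hypothetical stand-ins, not a real LoopBack model.

```javascript
// Replace obj[methodName] with a wrapper that counts calls and delegates to
// the original method. Returns an object exposing the current count.
function countCalls(obj, methodName) {
  var original = obj[methodName];
  var counter = { count: 0 };
  obj[methodName] = function() {
    counter.count++;
    return original.apply(this, arguments);
  };
  return counter;
}

// Stand-in target model: bulkUpdate just records the updates it receives.
var targetModel = {
  received: [],
  bulkUpdate: function(updates, cb) {
    this.received = this.received.concat(updates);
    cb();
  },
};

var bulkUpdateCalls = countCalls(targetModel, 'bulkUpdate');

// Simulate replicating 5 updates with a chunk size of 2.
var updates = ['u1', 'u2', 'u3', 'u4', 'u5'];
for (var i = 0; i < updates.length; i += 2) {
  targetModel.bulkUpdate(updates.slice(i, i + 2), function() {});
}

// bulkUpdateCalls.count is 3 (chunks of 2, 2 and 1 updates)
```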

Member

Oh, I see, a few lines below you are asserting how many times the bulkUpdate method was called 👍

@bajtos bajtos removed the stale label Dec 21, 2016
@kobaska
Contributor Author

kobaska commented Dec 23, 2016

@bajtos, I'm going to be away on holiday for a month, so I will make these changes when I'm back, unless someone else can take over from here.

@bajtos
Member

bajtos commented Jan 2, 2017

@kobaska No worries, enjoy your vacation 🏖

@kobaska
Contributor Author

kobaska commented Jan 24, 2017

@bajtos I have done the changes you suggested. Can you review this please?

@bajtos bajtos added the feature label Jan 26, 2017
@kobaska
Contributor Author

kobaska commented Feb 2, 2017

@bajtos Is there any more work needed for this feature? The PR Builder failed, but the link doesn't seem to work.

@bajtos
Member

bajtos commented Feb 3, 2017

Thank you for the update, I'll take a closer look next week.

Member

@bajtos bajtos left a comment

Hi @kobaska, sorry for the long delay. I took another look at your patch. The code looks mostly good, I would like you to improve the coding style a bit - see my comments below.

});

describe('Model.replicate(since, targetModel, options, callback)', function() {
it('Replicate data using the source model with chunking', function(done) {
Member

The test name should read like a sentence, where it stands for the subject of the "describe" block.

it('calls bulkUpdate multiple times')

});

describe('Model.replicate(since, targetModel, options, callback)', function() {
it('Replicate data using the source model without chunking', function(done) {
Member

it('calls bulkUpdate only once')

@@ -1803,4 +1908,36 @@ describe('Replication / Change APIs', function() {
function getIds(list) {
return getPropValue(list, 'id');
}

function assertTargetModelEqualsSourceModel(conflicts, sourceModel,
Member

AFAICT, this helper is defined on L289 too (source). What is the difference between these two variants of the helper? Is there any reason preventing us from having only a single shared copy of this function?


describe('Utils', function() {
describe('uploadInChunks', function() {
it('should call process function with the chunked arrays', function(done) {
Member

We don't use should in our test names, see http://loopback.io/doc/en/contrib/style-guide.html#test-naming

it('calls process function for each chunk', function(done) {
  // ...
});

});
});

it('should call process function once when array less than chunk size', function(done) {
Member

it('calls process function only once when array is smaller than chunk size')

cb(null, results);
}

it('should call process function with the correct filters', function(done) {
Member

it('calls process function with the correct filter')

});
});

it('should concat the results from each call to the process function', function(done) {
Member

it('concats the results of all calls of the process function')

});

describe('concatResults', function() {
it('should concat arrays', function() {
Member

it('concats regular arrays')

assert.deepEqual(concatResults, ['item1', 'item2', 'item3', 'item4']);
});

it('should concat objects with arrays', function() {
Member

it('concats objects containing arrays')
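
To make the two concat cases concrete, here is a hypothetical sketch of what such a helper could do. It assumes that "objects containing arrays" means objects whose properties are arrays to be merged key by key; the key names deltas and conflicts are only illustrative.

```javascript
// Hypothetical concatResults: merges the results of two paged calls.
// - arrays are concatenated;
// - for plain objects, array-valued properties with the same key are
//   concatenated, and other properties are kept from the first object that
//   has them.
function concatResults(previousResults, currentResults) {
  if (Array.isArray(previousResults)) {
    return previousResults.concat(currentResults);
  }
  var merged = Object.assign({}, previousResults);
  Object.keys(currentResults).forEach(function(key) {
    if (Array.isArray(merged[key]) && Array.isArray(currentResults[key])) {
      merged[key] = merged[key].concat(currentResults[key]);
    } else if (!(key in merged)) {
      merged[key] = currentResults[key];
    }
  });
  return merged;
}

concatResults(['item1', 'item2'], ['item3', 'item4']);
// returns ['item1', 'item2', 'item3', 'item4']

concatResults(
  { deltas: ['d1'], conflicts: [] },
  { deltas: ['d2'], conflicts: ['c1'] });
// returns { deltas: ['d1', 'd2'], conflicts: ['c1'] }
```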

@kobaska kobaska force-pushed the optimise-replication branch from b1fdd53 to 9d27b75 Compare February 21, 2017 01:33
@kobaska
Contributor Author

kobaska commented Feb 21, 2017

@bajtos Thanks for the review. I have improved the coding style. Please let me know if any more changes are required.

@bajtos
Member

bajtos commented Feb 22, 2017

@kobaska the patch LGTM now, thanks!

Just now I noticed that you are targeting 2.x branch. I am afraid the 2.x release line is in LTS mode now and we are no longer landing new features (semver-minor change) there.

I am happy to land this to master though. Would you like to open a new pull request against master yourself? Should I do it myself (preserving your commit authorship of course)? Let me know what works best for you.

@bajtos bajtos changed the base branch from 2.x to master February 22, 2017 14:04
@bajtos bajtos force-pushed the optimise-replication branch from 9d27b75 to 656fe28 Compare February 22, 2017 14:09
@bajtos
Member

bajtos commented Feb 22, 2017

Ah, never mind, I see GitHub allows us to change the target branch of a pull request. I have rebased your patch onto master and cleaned up the commit message while I was rewriting git history.

Note that git pull will not work in your local clone of the branch; you can run, e.g., git reset --hard origin/optimise-replication instead.

Add a new model-level setting "replicationChunkSize" which allows
users to configure change replication algorithm to issue several
smaller requests to fetch changes and upload updates.
@bajtos bajtos force-pushed the optimise-replication branch from 656fe28 to 7078c5d Compare February 22, 2017 14:13
@bajtos bajtos merged commit 37b4919 into strongloop:master Feb 22, 2017
@bajtos
Member

bajtos commented Feb 22, 2017

Landed 🎉 , thank you for the contribution!

@kobaska
Contributor Author

kobaska commented Feb 23, 2017

Thanks @bajtos for your help :). I was hoping to get it into 2.x, as that is the version we are using. I'll start moving to the latest LoopBack version.


5 participants