-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Optimize replication by sending multiple smaller requests to the server #2961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can one of the admins verify this patch? |
3 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kobaska, thank you for the pull request. The proposed changes look good in general. I will be on vacation in the next few days, please expect at least a week or two before I can make a more detailed review.
test/replication.test.js
Outdated
'SourceModel-' + tid, | ||
{ id: { id: true, type: String, defaultFn: 'guid' } }, | ||
{ trackChanges: true }); | ||
describe('Replication without chunking', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is adding too much whitespace changes, can we find a way how to preserve existing tests without changes and only add tests for the new "chunk" mode?
@slnode ok to test |
test/util/utils.test.js
Outdated
assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]); | ||
}); | ||
|
||
function processFunction(array, cb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual convention for structuring unit tests is arrange-act-assert. I find your tests difficult to follow, because the processFunction
("act" phase) is defined after the assertions.
Could you please re-arrange all tests in this file to follow arrange-act-assert? For example:
// option A
it('should call process function with the chunked arrays', function() {
var largeArray = ['item1', 'item2', 'item3'];
var calls = [];
function processFunction(array, cb) {
calls.push(array);
cb();
}
utils.uploadInChunks(largeArray, 1, processFunction, function() {
assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
});
});
// or even
it('should call process function with the chunked arrays', function() {
var largeArray = ['item1', 'item2', 'item3'];
var calls = [];
utils.uploadInChunks(
largeArray, 1,
function processFunction(smallArray, cb) {
calls.push(smallArray);
cb();
},
function onDone() {
assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
});
});
You should also check that uploadInChunks
did not fail with an error. Also many (if not all) callback-based functions don't call the callback in the same tick of the event loop, therefore you should make all tests async:
it('should call process function with the chunked arrays', function(done) {
var largeArray = ['item1', 'item2', 'item3'];
var calls = [];
utils.uploadInChunks(
largeArray, 1,
function processFunction(smallArray, cb) {
calls.push(smallArray);
cb();
},
function finished(err) {
if (err) return done(err);
assert.deepEqual(calls, [['item1'], ['item2'], ['item3']]);
done();
});
});
The same comment applies to other new tests in this file too.
0a2f97d
to
5ed003e
Compare
@bajtos I have made the changes you requested for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kobaska, thank you for the update. I reviewed your pull request in more details, see the comments below.
My main concern is about backwards compatibility and possibility of introducing chunking-related bugs to applications that do not need chunks.
I would prefer to play this safe, make chunkSize
default to -1
(or undefined) and modify the implementation to disable chunking in such case.
Thoughts?
lib/persisted-model.js
Outdated
@@ -1155,6 +1156,11 @@ module.exports = function(registry) { | |||
var Change = sourceModel.getChangeModel(); | |||
var TargetChange = targetModel.getChangeModel(); | |||
var changeTrackingEnabled = Change && TargetChange; | |||
var chunkSize = CHUNK_SIZE; | |||
|
|||
if (this.settings && this.settings.chunkSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this
is undefined
in this function. I think you should be using sourceModel.chunkSize
instead?
It would be great to have a unit-test verifying this setting - if there was such test, then the problem with this
would have been discovered long time ago.
I also find the name chunkSize
as not descriptive enough. This setting will be usually configured in model JSON file, where it will "sit" among other general settings and it won't be obvious that chunkSize
is related to change replication.
I am proposing replicationChunkSize
, but feel free to come up with a different name that makes it clear that the chunk size refers to change replication/sync.
lib/persisted-model.js
Outdated
@@ -1176,7 +1182,9 @@ module.exports = function(registry) { | |||
async.waterfall(tasks, done); | |||
|
|||
function getSourceChanges(cb) { | |||
sourceModel.changes(since.source, options.filter, debug.enabled ? log : cb); | |||
utils.downloadInChunks(options.filter, chunkSize, function(filter, pagingCallback) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When function arguments don't fit on a single line, then we are applying "one arg per line" rule, see http://loopback.io/doc/en/contrib/style-guide.html#one-argument-per-line
utils.downloadInChunks(
options.filter, chunkSize,
function(filter, pagingCallback) {
sourceModel.changes(since.source, filter, pagingCallback);
},
debug.enabled ? log : cb);
This applies to all other utils.*InChunks
calls below too.
lib/utils.js
Outdated
|
||
async.waterfall(tasks, cb); | ||
} else { | ||
processFunction.call(self, largeArray, cb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short "else" blocks after a long "if" block make it difficult to build a full picture. Please use reverse-logic with early-return instead:
if (largeArray.length <= chunkSize) {
return processFunction.call(self, largeArray, cb);
}
// the rest of the code is indented one level less
Also IIRC, fn.call(...)
method has performance overhead compared to regular function calls fn(...)
. Also AFAICT, you are not using this
in any of the callbacks. Therefore I would prefer to use processFunction(largeArray, cb)
. Unless I am missing something?
lib/utils.js
Outdated
function pageAndConcatResults(err, pagedResults) { | ||
if (err) { | ||
cb(err); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (err) {
return cb(err);
}
// the rest of the code is indented one level less
options, function(err, conflicts) { | ||
if (err) return done(err); | ||
|
||
assertTargetModelEqualsSourceModel(conflicts, test.SourceModel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really verifying that correct chunkSize
was applied, is it?
I think you should add some sort of an observer to one of the models and check how many times a replication-related method was called.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, few lines below you are asserting the number of calls the bulkUpdate
method was called 👍
@bajtos, I'm going to be away for a month on holiday. So will do it after, unless someone can continue on from here. |
@kobaska No worries, enjoy your vacation 🏖 |
fe66312
to
b1fdd53
Compare
@bajtos I have done the changes you suggested. Can you review this please? |
@bajtos Is there any more work needed for this feature? The PR Builder failed, but the link doesn't seem to work |
Thank you for the update, I'll take a closer look next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kobaska, sorry for the long delay. I took another look at your patch. The code looks mostly good, I would like you to improve the coding style a bit - see my comments below.
test/replication.test.js
Outdated
}); | ||
|
||
describe('Model.replicate(since, targetModel, options, callback)', function() { | ||
it('Replicate data using the source model with chunking', function(done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name should read like a sentence, where it
stand for the subject of "describe".
it('calls bulkUpdate multiple times')
options, function(err, conflicts) { | ||
if (err) return done(err); | ||
|
||
assertTargetModelEqualsSourceModel(conflicts, test.SourceModel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, few lines below you are asserting the number of calls the bulkUpdate
method was called 👍
test/replication.test.js
Outdated
}); | ||
|
||
describe('Model.replicate(since, targetModel, options, callback)', function() { | ||
it('Replicate data using the source model without chunking', function(done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('calls bulkUpdate only once')
@@ -1803,4 +1908,36 @@ describe('Replication / Change APIs', function() { | |||
function getIds(list) { | |||
return getPropValue(list, 'id'); | |||
} | |||
|
|||
function assertTargetModelEqualsSourceModel(conflicts, sourceModel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, this helper is defined on L289 too (source). What is the difference between these two variants of the helper? Is there any reason prevent us from having only a single shared copy of this function?
test/util/utils.test.js
Outdated
|
||
describe('Utils', function() { | ||
describe('uploadInChunks', function() { | ||
it('should call process function with the chunked arrays', function(done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use should
in our test names, see http://loopback.io/doc/en/contrib/style-guide.html#test-naming
it('calls process function for each chunk', function(done) {
// ...
});
test/util/utils.test.js
Outdated
}); | ||
}); | ||
|
||
it('should call process function once when array less than chunk size', function(done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('calls process function only once when array is smaller than chunk size')
test/util/utils.test.js
Outdated
cb(null, results); | ||
} | ||
|
||
it('should call process function with the correct filters', function(done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('calls process function with the correct filter')
test/util/utils.test.js
Outdated
}); | ||
}); | ||
|
||
it('should concat the results from each call to the process function', function(done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('concats the results of all calls of the process function')
test/util/utils.test.js
Outdated
}); | ||
|
||
describe('concatResults', function() { | ||
it('should concat arrays', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('concats regular arrays')
test/util/utils.test.js
Outdated
assert.deepEqual(concatResults, ['item1', 'item2', 'item3', 'item4']); | ||
}); | ||
|
||
it('should concat objects with arrays', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('concats objects containing arrays')
b1fdd53
to
9d27b75
Compare
@bajtos Thanks for the review. I have improved the coding style. Please let me know if any more changes are required. |
@kobaska the patch LGTM to me now, thanks! Just now I noticed that you are targeting 2.x branch. I am afraid the 2.x release line is in LTS mode now and we are no longer landing new features (semver-minor change) there. I am happy to land this to |
9d27b75
to
656fe28
Compare
Ah, never mind, I see GitHub allows us to change the target branch of a pull request. I have rebased your patch against Note that |
Add a new model-level setting "replicationChunkSize" which allows users to configure change replication algorithm to issue several smaller requests to fetch changes and upload updates.
656fe28
to
7078c5d
Compare
Landed 🎉 , thank you for the contribution! |
Thanks @bajtos for your help :) . Was hoping to get it in 2.x as that is the version we are using. I'll start moving to latest loopback version. |
Description
Optimize replication to page and chunk requests to remote server. This allows for large amounts of data to be replicated and avoid network timeouts and payload being high.
Add a new model-level setting "replicationChunkSize" which allows users to configure change replication algorithm to issue several smaller requests to fetch changes and upload updates.
Related issues
Checklist
guide