Add delta snapshot implementation #2799
Conversation
56a54ed
to
6e82884
Compare
Can you remove the final commit (the one marked [DO NOT MERGE]) from this PR? I get what this commit is doing and why it's important for benchmarking, but we do not want to merge it :)
6e82884
to
62f69a0
Compare
Done.
scheduler_listers "k8s.io/kubernetes/pkg/scheduler/listers" | ||
scheduler_nodeinfo "k8s.io/kubernetes/pkg/scheduler/nodeinfo" | ||
scheduler_volumebinder "k8s.io/kubernetes/pkg/scheduler/volumebinder" | ||
|
nit: why add newline here?
Auto-formatted on save ¯\_(ツ)_/¯
GetAllNodes() ([]*apiv1.Node, error) | ||
GetAllPods() []*apiv1.Pod | ||
// GetAllNodes returns list of all the nodes in snapshot | ||
GetAllNodes() []*schedulernodeinfo.NodeInfo |
Doesn't that duplicate .NodeInfos().List(), which snapshot already exposes? I understand how version returning Nodes is different, but this one not so much.
It's mostly a relic of the original GetSchedulerListers() in the interface, which required error-checking twice before the list could be used. I think it's still nice to have a version without the return error value enforced by scheduler interfaces, but I don't feel very strongly about it.
Historical context makes sense, but I'd rather not have 2 getters for the same thing.
I may no longer be a Python developer, but I still subscribe to the 'there should be one (and preferably only one) obvious way to do it' part of the Zen of Python :)
Removed GetAllNodes for consistency, based on 'special cases aren’t special enough to break the rules' - we'd only need this call in one place, where a more generic NodeInfo can also be used.
Also removed GetAllPods, since the same argument about two getters applies to it.
assert.NoError(b, err) | ||
} | ||
|
||
for _, pod := range scheduledPods { |
It seems to me that scheduledPods is always empty?
It was :( Fixed.
b.ResetTimer() | ||
|
||
for i := 0; i < b.N; i++ { | ||
if stillPending, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil { |
Shouldn't you reset snapshot between runs (technically I guess fork+revert around filterOutSchedulable calls)?
I'll rename things to clarify this, but all of the pending pods are unschedulable in this scenario (too big for the existing nodes).
Fork() is problematic because, for the basic snapshot, it can dominate the cost (I've tried using b.StopTimer() and b.StartTimer(), but they seem to hang if the cost of the excluded operation is too big). In a real use case we wouldn't be forking at this point, so including it in the measurement doesn't make much sense.
Makes sense if all pods are unschedulable. Maybe add a comment to make this part more obvious?
Done
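For illustration, a minimal sketch of the benchmark structure described in this thread, with all setup (including any Fork()) kept outside the timed loop; setupFilterBenchmark is a hypothetical helper standing in for the actual test setup:

func BenchmarkFilterOutSchedulable(b *testing.B) {
	// Hypothetical helper: builds the snapshot, predicate checker and the
	// (entirely unschedulable) pending pods before measurement starts.
	clusterSnapshot, predicateChecker, pendingPods := setupFilterBenchmark(b)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// No Fork()/Revert() inside the loop: since every pending pod is
		// unschedulable, the snapshot is never modified by this call.
		if _, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil {
			assert.NoError(b, err)
		}
	}
}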
assert.NoError(b, err) | ||
} else { | ||
if len(stillPending) < tc.pendingPods { | ||
assert.Equal(b, len(stillPending), tc.pendingPods) |
Is it even possible for this assertion not to fail?
See above - all pending pods in this benchmark are unschedulable (request 10x of the node type's memory).
|
||
for i := 0; i < b.N; i++ { | ||
if stillPending, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil { | ||
assert.NoError(b, err) |
Is this a strange way of failing the test? If so please just use b.Error()/b.Fatal() directly.
It's admittedly a strange way of keeping sanity checks (ignoring errors could let the benchmark succeed and report super-fast times if the test setup is faulty) without defeating the purpose of the benchmarks (for some of the faster benchmarks, always calling assert.NoError seemed to increase the reported time per operation by 5x or more).
I don't mind explicit checks, but in this case I'd prefer calling b.Fatal() explicitly. It's much more obvious than a doomed-to-fail assertion.
b.Error()/b.Fatal() don't really print a nice message when passed the original error.
b.Error(fmt.Errorf("hello")):
basic_cluster_snapshot_test.go:81: hello
assert.NoError(b, fmt.Errorf("hello")):
basic_cluster_snapshot_test.go:81:
Error Trace: basic_cluster_snapshot_test.go:81
benchmark.go:190
benchmark.go:230
asm_amd64.s:1357
Error: Received unexpected error:
hello
Test: BenchmarkAddNodes/basic:_AddNode()_5000
I don't feel particularly strongly about this, other than that assert.NoError seems to work well for this case (and names the scenario which failed).
Ok, fair point about the nicer stack trace. I would just include the scenario name and all relevant data in Fatalf(), but this works too.
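For comparison, the b.Fatalf() variant suggested in this thread could look roughly like this (the message format is illustrative):

if _, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil {
	// Fails the benchmark immediately and names the scenario, at the cost of
	// testify's richer stack trace.
	b.Fatalf("%s with %d pending pods: filterOutSchedulableByPacking failed: %v",
		snapshotName, len(pendingPods), err)
}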
// state of the cluster. Removes from the map nodes that are no longer in the | ||
// nodes list. | ||
func (sd *ScaleDown) updateUnremovableNodes(nodes []*apiv1.Node) { | ||
func (sd *ScaleDown) updateUnremovableNodes(nodes []*schedulernodeinfo.NodeInfo) { |
nit: It's probably not super important, but I think passing Node was more elegant than NodeInfo. The way I see it, NodeInfo is a Node + a list of pods; if we don't care about pods, we should use Node directly.
Returning a list of Nodes requires the snapshot to either build it on demand each time or maintain two cached lists. It seemed simpler to just reuse NodeInfos here.
I don't mind building on demand - the cost should be negligible unless we're calling it in the middle of a loop. It feels like a very minor implementation inconvenience and I think using nodes is more elegant (encapsulate NodeInfos from pieces of code that don't need to know about them), but as I said I don't think it's critical. So I can let this be if you feel it's a better way.
See above, got rid of GetAllNodes method
limitations under the License. | ||
*/ | ||
|
||
package simulator |
General comments related to this file:
- Can we rename it, given that it seems to tests both basic and delta to the same extent?
- I'd like some more unittests (not only benchmarks).
- Maybe it would be worth moving benchmarks and correctness unittests to separate files?
- Done
- Not yet done
- Done
}) | ||
} | ||
} | ||
for snapshotName, snapshotFactory := range snapshots { |
This seems to be an exact copy-paste of loop above?
It compares AddNode() vs AddNodes(). Originally I was trying to speed up adding nodes in batch, but it turned out to have negligible impact.
func BenchmarkForkAddRevert(b *testing.B) { | ||
nodeTestCases := []int{1, 10, 100, 1000, 5000, 15000, 100000} | ||
podTestCases := []int{0, 1, 30} | ||
snapshots := map[string]snapshotFactory{ |
nit: Any reason why this couldn't be done once on module level?
Done
} | ||
} | ||
|
||
type snapshotFactory func() ClusterSnapshot |
I'd prefer if this was defined before it's used.
Removed
_ = clusterSnapshot.AddNode(node) | ||
} | ||
forkNodes := clusterSnapshot.GetAllNodes() | ||
assert.Equal(t, nodeCount+len(extraNodes), len(forkNodes)) |
It's fine as a sanity check in benchmarks, but in correctness tests I'd prefer to compare list contents too and not just check length.
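A sketch of such a content check, comparing node names via testify's assert.ElementsMatch (assuming forkNodes holds NodeInfos and that nodes/extraNodes are the test's input node slices):

wantNames := []string{}
for _, n := range nodes {
	wantNames = append(wantNames, n.Name)
}
for _, n := range extraNodes {
	wantNames = append(wantNames, n.Name)
}
gotNames := []string{}
for _, ni := range forkNodes {
	gotNames = append(gotNames, ni.Node().Name)
}
// Order-independent comparison of the actual contents, not just the lengths.
assert.ElementsMatch(t, wantNames, gotNames)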
t.Run(fmt.Sprintf("%s: fork should not affect base data: adding nodes", name), | ||
func(t *testing.T) { | ||
clusterSnapshot := snapshotFactory() | ||
_ = clusterSnapshot.AddNodes(nodes) |
assert.NoError() on all relevant calls.
Done.
|
||
func BenchmarkBuildNodeInfoList(b *testing.B) { | ||
testCases := []struct { | ||
nodeCount int |
Why struct with a single int?
for _, tc := range testCases { | ||
b.Run(fmt.Sprintf("fork add 1000 to %d", tc.nodeCount), func(b *testing.B) { | ||
nodes := createTestNodes(tc.nodeCount + 1000) | ||
snapshot := NewDeltaClusterSnapshot() |
Why not run that for both implementations like other benchmarks?
} | ||
}) | ||
} | ||
for _, tc := range testCases { |
Seems like the shared part is minimal - maybe split into two functions?
Done
27233e3
to
1105f4b
Compare
// commit - O(n) | ||
// list all pods (no filtering) - O(n), cached | ||
// list all pods (with filtering) - O(n) | ||
// list node infos - O(n), cached |
Wish I'd get to write code where big-O considerations come into play :(
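For orientation, a rough sketch of the layered structure these complexities refer to; the field names follow the ones visible in this diff, while the exact types are guesses:

type internalDeltaSnapshotData struct {
	baseData *internalDeltaSnapshotData // previous layer; nil at the bottom of the chain

	// Changes recorded since the last Fork().
	nodeInfoMap         map[string]*schedulernodeinfo.NodeInfo // nodes added in this layer
	modifiedNodeInfoMap map[string]*schedulernodeinfo.NodeInfo // base nodes modified in this layer
	deletedNodeInfos    map[string]bool                        // base nodes deleted in this layer

	// Caches, dropped whenever nodes or pods change in this layer.
	nodeInfoList         []*schedulernodeinfo.NodeInfo
	podList              []*apiv1.Pod
	havePodsWithAffinity []*schedulernodeinfo.NodeInfo
}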
havePodsWithAffinity []*schedulernodeinfo.NodeInfo | ||
} | ||
|
||
var errNodeNotFound = fmt.Errorf("node not found") |
nit: I tend to use errors.New() if there's no formatting (on the presumption that Somethingf() should not be used without formatting).
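The errors.New() form being suggested is a drop-in replacement here, since no formatting is involved:

var errNodeNotFound = errors.New("node not found")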
var nodeInfoList []*schedulernodeinfo.NodeInfo | ||
|
||
if len(data.deletedNodeInfos) > 0 { | ||
nodeInfoList = make([]*schedulernodeinfo.NodeInfo, 0, totalLen+100) |
Why +100? I don't see how the result could ever be longer than totalLen, so I'm guessing it's to optimize future addNode() calls? If so maybe make 100 a const and add a comment?
Ping.
Also: if I'm guessing right, this feels premature - the amortized cost of add should already be small in the general case. Especially in this particular case, as any cluster likely to have a 100-node expansion option will already have totalLen large enough that you'd see at most a single re-alloc. Have you actually seen any impact from this (I don't mind keeping it, as the memory cost is negligible; I'm just curious)?
Removed this for now. Will add this in another PR with benchmark, if it turns out it matters.
} | ||
nodeInfoList = append(nodeInfoList, bni) | ||
} | ||
} else { |
Is it really worth having a separate else branch here? Assuming lookups in an empty map are reasonably cheap, it seems like we could get away with always using the first code branch.
With copy():
BenchmarkBuildNodeInfoList/fork_add_1000_to_1000-12 17793 33766 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_5000-12 13588 43650 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_15000-12 10000 75878 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_100000-12 2824 369051 ns/op
Without (always using the first branch):
BenchmarkBuildNodeInfoList/fork_add_1000_to_1000-12 8946 61269 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_5000-12 3625 157285 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_15000-12 840 686720 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_100000-12 54 9803347 ns/op
Multiply by number of node groups considered in scale-up simulation for real-world usage.
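A sketch of the two paths being compared above (names follow the diff; handling of nodes modified in the fork is omitted for brevity, and getNodeInfoList on the base layer is an assumed accessor for its cached list):

func (data *internalDeltaSnapshotData) buildNodeInfoList() []*schedulernodeinfo.NodeInfo {
	baseList := data.baseData.getNodeInfoList()
	totalLen := len(baseList) + len(data.nodeInfoMap)

	var nodeInfoList []*schedulernodeinfo.NodeInfo
	if len(data.deletedNodeInfos) > 0 {
		// Slow path: every base node has to be checked against the deletion set.
		nodeInfoList = make([]*schedulernodeinfo.NodeInfo, 0, totalLen)
		for _, bni := range baseList {
			if data.deletedNodeInfos[bni.Node().Name] {
				continue
			}
			nodeInfoList = append(nodeInfoList, bni)
		}
	} else {
		// Fast path: nothing was deleted in this fork, so a single copy() of the
		// base list suffices - this is what the numbers above are measuring.
		nodeInfoList = make([]*schedulernodeinfo.NodeInfo, len(baseList), totalLen)
		copy(nodeInfoList, baseList)
	}
	for _, dni := range data.nodeInfoMap {
		nodeInfoList = append(nodeInfoList, dni)
	}
	return nodeInfoList
}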
} | ||
|
||
for _, dni := range data.nodeInfoMap { | ||
nodeInfoList = append(nodeInfoList, dni) |
Don't we need some deduplication somewhere? It seems to me that we'll end up with duplicate nodeInfo for any nodeInfo that was modified in fork (had a pod scheduled on it). Am I missing something obvious here?
If I'm right please also add a test case that would detect this.
|
||
func (data *internalDeltaSnapshotData) addNodeInfo(nodeInfo *schedulernodeinfo.NodeInfo) error { | ||
if _, found := data.nodeInfoMap[nodeInfo.Node().Name]; found { | ||
return fmt.Errorf("node %s already in snapshot", nodeInfo.Node().Name) |
What if node was already present in base snapshot? Seems like it will be just silently overridden, which is inconsistent.
Fixed
|
||
if !foundInBase && !foundInDelta { | ||
// Node not found in the chain. | ||
return errNodeNotFound |
I wonder if we even want to treat this as an error?
Context: #2709 (review).
I don't know. It feels like it should depend on our use of this method, but our current use of it is "never remove a node from snapshot", since scale down simulations don't yet use packing in snapshot. Can we resolve it when we migrate that part of the code?
chunkCount := len(data.nodeInfoMap) + len(basePodChunks) | ||
podChunks := make([][]*apiv1.Pod, chunkCount, chunkCount) | ||
copy(podChunks, basePodChunks) | ||
for _, node := range data.nodeInfoMap { |
Once again it seems like some sort of deduplication is missing here. I think you're including pods from nodes that exist in the base snapshot but were deleted in the fork, and also double-counting pods on nodeInfos that were modified in the fork.
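One way the deduplication could look; getAllNodeInfos on the base layer is an assumed accessor, the rest of the names follow the diff:

func (data *internalDeltaSnapshotData) getAllPods() []*apiv1.Pod {
	var pods []*apiv1.Pod
	if data.baseData != nil {
		for _, bni := range data.baseData.getAllNodeInfos() {
			name := bni.Node().Name
			if data.deletedNodeInfos[name] {
				continue // node was removed in this fork
			}
			if _, modified := data.modifiedNodeInfoMap[name]; modified {
				continue // superseded by the modified copy appended below
			}
			pods = append(pods, bni.Pods()...)
		}
	}
	for _, ni := range data.nodeInfoMap {
		pods = append(pods, ni.Pods()...)
	}
	for _, ni := range data.modifiedNodeInfoMap {
		pods = append(pods, ni.Pods()...)
	}
	return pods
}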
return podList | ||
} | ||
|
||
func (data *internalDeltaSnapshotData) getAllNodes() []*apiv1.Node { |
I think this is no longer used?
Removed
a4a2443
to
46da94d
Compare
46da94d
to
de63103
Compare
limitations under the License. | ||
*/ | ||
|
||
package simulator |
This still feels like very limited coverage, especially for such a complex and easily-testable (isolated) data structure.
Some tests cases that would be nice to cover (I don't claim it's a complete list, just stuff I was able to list in 2 minutes):
- Add pods, fork, add pods (current test cases don't mix pods added in fork and in base)
- Add node with pods in fork (basic use-case in CA)
- Remove nodes in fork
- Add / remove pods outside of fork
- Testing HavePodsWithAffinityList()
- Including verifying that cache is correctly invalidated when needed
- Test for Clear() (including when forked)
- Maybe a basic test for FilteredList()
- Modify NI and later delete it (ex. remove pod / delete node) to make sure modifiedNI is properly cleaned up when you delete node
- Some test case which involves listing stuff in presence of non-empty addedNI, modifiedNI and deletedNI and making sure it all works together
Added some test cases; translating this into a list so we can follow up:
- Add pods, fork, add pods (current test cases don't mix pods added in fork and in base)
- Add node with pods in fork (basic use-case in CA)
- Remove nodes in fork
- Add / remove pods outside of fork
- Testing HavePodsWithAffinityList()
- Including verifying that cache is correctly invalidated when needed
- Test for Clear() (including when forked)
- Maybe a basic test for FilteredList()
- Modify NI and later delete it (ex. remove pod / delete node) to make sure modifiedNI is properly cleaned up when you delete node
- Some test case which involves listing stuff in presence of non-empty addedNI, modifiedNI and deletedNI and making sure it all works together
return data.addNodeInfo(nodeInfo) | ||
} | ||
|
||
func (data *internalDeltaSnapshotData) addNodeInfo(nodeInfo *schedulernodeinfo.NodeInfo) error { |
NodeInfo may contain pods (you're calling it on modified nodes in commit(), and those will generally have pods). It seems like you need to clear pod caches in this case.
I don't mind having this assume no pods (and hence no cache invalidation), but in that case please state that in a comment and make sure to explicitly invalidate the cache in the places where it's needed.
Fixed in addNodeInfo
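Putting this thread and the earlier duplicate-check discussion together, addNodeInfo might end up looking roughly like this (clearNodeCaches and clearPodCaches are assumed helper names, and deletedNodeInfos is assumed to be a name-keyed set):

func (data *internalDeltaSnapshotData) addNodeInfo(nodeInfo *schedulernodeinfo.NodeInfo) error {
	name := nodeInfo.Node().Name
	if _, found := data.getNodeInfoLocal(name); found {
		return fmt.Errorf("node %s already in snapshot", name)
	}
	if data.baseData != nil {
		if _, found := data.baseData.getNodeInfo(name); found && !data.deletedNodeInfos[name] {
			return fmt.Errorf("node %s already in snapshot", name)
		}
	}
	data.nodeInfoMap[name] = nodeInfo
	data.clearNodeCaches() // the node list changed
	if len(nodeInfo.Pods()) > 0 {
		data.clearPodCaches() // a NodeInfo added during commit() may already carry pods
	}
	return nil
}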
data.removeNode(node.Node().Name) | ||
} | ||
if _, found := data.modifiedNodeInfoMap[node.Node().Name]; found { | ||
data.removeNode(node.Node().Name) |
If this node exists in base (which it does if we reach this call), you'll add it to removedNI inside removeNode(). In the follow-up addNodeInfo you will add it back to modifiedNIMap, but you won't clean it from removedNI, leading to an inconsistent internal state.
Incidentally, you only ever call updateNode() on base data in commit(). You're probably better off getting rid of this method altogether, rather than trying to make it correct.
} | ||
|
||
dni.AddPod(pod) | ||
if data.podList != nil || data.havePodsWithAffinity != nil { |
Why this if? Calling clearPodCaches when the condition is not true does nothing, and I don't believe it carries any cost.
dni, found := data.getNodeInfoLocal(nodeName) | ||
if !found { | ||
bni, found := data.baseData.getNodeInfo(nodeName) | ||
if !found { |
This doesn't cover a node that was deleted in the fork. You can add a pod to it, which will implicitly un-delete it.
Fixed
dni, found := data.getNodeInfoLocal(nodeName) | ||
if !found { | ||
bni, found := data.baseData.getNodeInfo(nodeName) | ||
if !found { |
Also not safe against a node deleted in the fork.
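A sketch of a chain lookup that treats nodes deleted in the fork as missing, so AddPod()/RemovePod() cannot silently resurrect them (names follow the diff; deletedNodeInfos is assumed to be a name-keyed set):

func (data *internalDeltaSnapshotData) getNodeInfo(nodeName string) (*schedulernodeinfo.NodeInfo, bool) {
	if dni, found := data.getNodeInfoLocal(nodeName); found {
		return dni, true
	}
	if data.deletedNodeInfos[nodeName] {
		return nil, false // deleted in this fork; do not fall through to base
	}
	if data.baseData != nil {
		return data.baseData.getNodeInfo(nodeName)
	}
	return nil, false
}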
|
||
// Maybe consider deleting from the list in the future. Maybe not. | ||
postAffinityPods := len(dni.PodsWithAffinity()) | ||
if preAffinityPods == 1 && postAffinityPods == 0 { |
I wonder if we need this logic at all. I question how likely we are to call removePod() while having cached havePodsWithAffinity, but not podList.
return forkedData | ||
} | ||
|
||
func (data *internalDeltaSnapshotData) commit() *internalDeltaSnapshotData { |
This is way more tricky than it seems once you start following the implementation of the helper methods. I would very much like to see a test that does all sorts of modifications on fork (add/delete pod on pre-existing nodes, add/delete pod on pre-existing node and later delete this node) and verifies that everything is applied correctly on commit.
} | ||
for node := range data.deletedNodeInfos { | ||
data.baseData.removeNode(node) | ||
} |
Invalidate caches on baseData
I'm having second thoughts about it :s If we're using the same methods as for other updates, we should be able to trust those methods to invalidate caches if needed.
And yet I don't :) As per my earlier comment, addNodeInfo must either drop pod caches or explicitly assume the nodeInfos it adds have no pods (in which case you should add a comment stating that and drop pod caches here).
Problematic sequence (maybe add some list calls to the tests you already have to make them verify cache invalidation?):
snapshot.Pods().List()
snapshot.Fork()
snapshot.AddNode(node)
snapshot.AddPod(pod, node.Name())
snapshot.Commit()
snapshot.Pods().List() # my theory is that this will miss the pod we added in the fork
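That sequence, written out as a test sketch (BuildTestNode/BuildTestPod-style helpers and the exact method signatures are assumptions; error handling on Fork()/Commit() is elided):

func TestCommitInvalidatesPodCache(t *testing.T) {
	snapshot := NewDeltaClusterSnapshot()
	node := BuildTestNode("node-1", 1000, 1000) // assumed test helpers
	pod := BuildTestPod("pod-1", 100, 100)

	podsBefore, _ := snapshot.Pods().List(labels.Everything()) // warm the pod cache
	assert.Equal(t, 0, len(podsBefore))

	snapshot.Fork()
	assert.NoError(t, snapshot.AddNode(node))
	assert.NoError(t, snapshot.AddPod(pod, node.Name))
	snapshot.Commit()

	podsAfter, _ := snapshot.Pods().List(labels.Everything())
	// A stale cache carried over by commit() would make the fork's pod disappear here.
	assert.Equal(t, 1, len(podsAfter))
}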
assert.NoError(t, err) | ||
|
||
err = snapshot.RemoveNode("node") | ||
assert.NoError(t, err) |
nit: assert empty NodeInfos().List()? No reason not to check it.
nodeCount: 15000, | ||
}, | ||
} | ||
for _, modifiedPodCount := range []int{0, 1, 100} { |
nit: the pods aren't really modified; they're just added in fork
return snapshot | ||
} | ||
|
||
func TestForking(t *testing.T) { |
+1 I think this is a nice test setup that makes it easy to cover a lot of cases.
} | ||
|
||
func TestClear(t *testing.T) { | ||
// Run with -count=1 to avoid caching. |
Is that how we run tests normally?
|
||
func TestClear(t *testing.T) { | ||
// Run with -count=1 to avoid caching. | ||
localRand := rand.New(rand.NewSource(time.Now().Unix())) |
I have mixed feelings about randomized ut, but in this case I can't name a good argument against it (you log all randomly generated numbers, so failures are easily reproducible). So I guess it's fine.
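A minimal sketch of the reproducibility mechanism described here - generate the seed once, log it, and derive all random parameters from it (test name and parameters are illustrative):

func TestClearWithLoggedSeed(t *testing.T) {
	seed := time.Now().Unix()
	t.Logf("random seed: %d", seed) // shown on failure (or with -v), so the run can be replayed
	localRand := rand.New(rand.NewSource(seed))

	nodeCount := localRand.Intn(100) + 1
	podCount := localRand.Intn(1000)
	t.Logf("nodeCount=%d podCount=%d", nodeCount, podCount)
	// ... build a snapshot from nodeCount/podCount and exercise Clear(), as in TestClear above ...
}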
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Delta snapshot stores a chain of changes-since-last-fork (although forking more than once is illegal for now, following the basic snapshot's implementation). It's optimized for the frequent fork / add a bit of stuff / run lots of predicates / revert flow by: