Add delta snapshot implementation #2799
Conversation
56a54ed
to
6e82884
Compare
Can you remove the final commit (the one marked [DO NOT MERGE]) from this PR? I get what this commit is doing and why it's important for benchmarking, but we do not want to merge it :)
6e82884
to
62f69a0
Compare
Done.
scheduler_listers "k8s.io/kubernetes/pkg/scheduler/listers" | ||
scheduler_nodeinfo "k8s.io/kubernetes/pkg/scheduler/nodeinfo" | ||
scheduler_volumebinder "k8s.io/kubernetes/pkg/scheduler/volumebinder" | ||
|
nit: why add newline here?
Auto-formatted on save ¯\_(ツ)_/¯
GetAllNodes() ([]*apiv1.Node, error) | ||
GetAllPods() []*apiv1.Pod | ||
// GetAllNodes returns list of all the nodes in snapshot | ||
GetAllNodes() []*schedulernodeinfo.NodeInfo |
Doesn't that duplicate .NodeInfos().List(), which snapshot already exposes? I understand how version returning Nodes is different, but this one not so much.
It's mostly a relic of the original GetSchedulerListers() in the interface, which required error-checking twice before the list could be used. I think it's still nice to have a version without the return error value enforced by scheduler interfaces, but I don't feel very strongly about it.
Historical context makes sense, but I'd rather not have 2 getters for the same thing.
I may no longer be a Python developer, but I still subscribe to the 'there should be one (and preferably only one) obvious way to do it' part of the Zen of Python :)
Removed GetAllNodes for consistency, based on 'special cases aren’t special enough to break the rules' - we'd only need this call in one place, where a more generic NodeInfo can also be used.
Also removed GetAllPods, since the same argument about two getters applies to it.
assert.NoError(b, err) | ||
} | ||
|
||
for _, pod := range scheduledPods { |
It seems to me that scheduledPods is always empty?
It was :( Fixed.
b.ResetTimer() | ||
|
||
for i := 0; i < b.N; i++ { | ||
if stillPending, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil { |
Shouldn't you reset snapshot between runs (technically I guess fork+revert around filterOutSchedulable calls)?
I'll rename things to clarify this, but all of the pending pods are unschedulable in this scenario (too big for the existing nodes).
Fork() is problematic because, for the basic snapshot, it can dominate the cost (I've tried using b.StopTimer() and b.StartTimer(), but they seem to hang if the cost of the excluded operation is too big). In a real use case we wouldn't be forking at this point, so including it in the measurement doesn't make much sense.
Makes sense if all pods are unschedulable. Maybe add a comment to make this part more obvious?
Done
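For illustration, a minimal sketch of the benchmark structure described in this thread, with all setup (including any Fork()) kept outside the timed loop; setupFilterBenchmark is a hypothetical helper standing in for the actual test setup:

func BenchmarkFilterOutSchedulable(b *testing.B) {
	// Hypothetical helper: builds the snapshot, predicate checker and the
	// (entirely unschedulable) pending pods before measurement starts.
	clusterSnapshot, predicateChecker, pendingPods := setupFilterBenchmark(b)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// No Fork()/Revert() inside the loop: since every pending pod is
		// unschedulable, the snapshot is never modified by this call.
		if _, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil {
			assert.NoError(b, err)
		}
	}
}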
assert.NoError(b, err) | ||
} else { | ||
if len(stillPending) < tc.pendingPods { | ||
assert.Equal(b, len(stillPending), tc.pendingPods) |
Is it even possible for this assertion not to fail?
See above - all pending pods in this benchmark are unschedulable (request 10x of the node type's memory).
|
||
for i := 0; i < b.N; i++ { | ||
if stillPending, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil { | ||
assert.NoError(b, err) |
Is this a strange way of failing the test? If so please just use b.Error()/b.Fatal() directly.
It's admittedly a strange way of keeping sanity checks (ignoring errors could let the benchmark succeed and report super-fast times if the test setup is faulty) without defeating the purpose of the benchmarks (for some of the faster benchmarks, always calling assert.NoError seemed to increase the reported time per operation by 5x or more).
I don't mind explicit checks, but in this case I'd prefer calling b.Fatal() explicitly. It's much more obvious than a doomed-to-fail assertion.
b.Error()/b.Fatal() don't really print a nice message when passed the original error.
b.Error(fmt.Errorf("hello")):
basic_cluster_snapshot_test.go:81: hello
assert.NoError(b, fmt.Errorf("hello")):
basic_cluster_snapshot_test.go:81:
Error Trace: basic_cluster_snapshot_test.go:81
benchmark.go:190
benchmark.go:230
asm_amd64.s:1357
Error: Received unexpected error:
hello
Test: BenchmarkAddNodes/basic:_AddNode()_5000
I don't feel particularly strongly about this, other than that assert.NoError seems to work well for this case (and names the scenario which failed).
Ok, fair point about the nicer stack trace. I would just include the scenario name and all relevant data in Fatalf(), but this works too.
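For comparison, the b.Fatalf() variant suggested in this thread could look roughly like this (the message format is illustrative):

if _, err := filterOutSchedulableByPacking(pendingPods, clusterSnapshot, predicateChecker, 10); err != nil {
	// Fails the benchmark immediately and names the scenario, at the cost of
	// testify's richer stack trace.
	b.Fatalf("%s with %d pending pods: filterOutSchedulableByPacking failed: %v",
		snapshotName, len(pendingPods), err)
}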
// state of the cluster. Removes from the map nodes that are no longer in the | ||
// nodes list. | ||
func (sd *ScaleDown) updateUnremovableNodes(nodes []*apiv1.Node) { | ||
func (sd *ScaleDown) updateUnremovableNodes(nodes []*schedulernodeinfo.NodeInfo) { |
nit: It's probably not super important, but I think passing Node was more elegant than NodeInfo. The way I see it, NodeInfo is a Node + a list of pods; if we don't care about pods, we should use Node directly.
Returning a list of Nodes requires the snapshot to either build it on demand each time or maintain two cached lists. It seemed simpler to just reuse NodeInfos here.
I don't mind building on demand - the cost should be negligible unless we're calling it in the middle of a loop. It feels like a very minor implementation inconvenience and I think using nodes is more elegant (encapsulate NodeInfos from pieces of code that don't need to know about them), but as I said I don't think it's critical. So I can let this be if you feel it's a better way.
See above, got rid of GetAllNodes method
limitations under the License. | ||
*/ | ||
|
||
package simulator |
General comments related to this file:
- Can we rename it, given that it seems to tests both basic and delta to the same extent?
- I'd like some more unittests (not only benchmarks).
- Maybe it would be worth moving benchmarks and correctness unittests to separate files?
- Done
- Not yet done
- Done
}) | ||
} | ||
} | ||
for snapshotName, snapshotFactory := range snapshots { |
This seems to be an exact copy-paste of loop above?
It compares AddNode() vs AddNodes(). Originally I was trying to speed up adding nodes in batch, but it turned out to have negligible impact.
func BenchmarkForkAddRevert(b *testing.B) { | ||
nodeTestCases := []int{1, 10, 100, 1000, 5000, 15000, 100000} | ||
podTestCases := []int{0, 1, 30} | ||
snapshots := map[string]snapshotFactory{ |
nit: Any reason why this couldn't be done once on module level?
Done
} | ||
} | ||
|
||
type snapshotFactory func() ClusterSnapshot |
I'd prefer if this was defined before it's used.
Removed
_ = clusterSnapshot.AddNode(node) | ||
} | ||
forkNodes := clusterSnapshot.GetAllNodes() | ||
assert.Equal(t, nodeCount+len(extraNodes), len(forkNodes)) |
It's fine as a sanity check in benchmarks, but in correctness tests I'd prefer to compare list contents too and not just check length.
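A sketch of such a content check, comparing node names via testify's assert.ElementsMatch (assuming forkNodes holds NodeInfos and that nodes/extraNodes are the test's input node slices):

wantNames := []string{}
for _, n := range nodes {
	wantNames = append(wantNames, n.Name)
}
for _, n := range extraNodes {
	wantNames = append(wantNames, n.Name)
}
gotNames := []string{}
for _, ni := range forkNodes {
	gotNames = append(gotNames, ni.Node().Name)
}
// Order-independent comparison of the actual contents, not just the lengths.
assert.ElementsMatch(t, wantNames, gotNames)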
t.Run(fmt.Sprintf("%s: fork should not affect base data: adding nodes", name), | ||
func(t *testing.T) { | ||
clusterSnapshot := snapshotFactory() | ||
_ = clusterSnapshot.AddNodes(nodes) |
assert.NoError() on all relevant calls.
Done.
|
||
func BenchmarkBuildNodeInfoList(b *testing.B) { | ||
testCases := []struct { | ||
nodeCount int |
Why struct with a single int?
for _, tc := range testCases { | ||
b.Run(fmt.Sprintf("fork add 1000 to %d", tc.nodeCount), func(b *testing.B) { | ||
nodes := createTestNodes(tc.nodeCount + 1000) | ||
snapshot := NewDeltaClusterSnapshot() |
Why not run that for both implementations like other benchmarks?
} | ||
}) | ||
} | ||
for _, tc := range testCases { |
Seems like the shared part is minimal - maybe split into two functions?
Done
27233e3
to
1105f4b
Compare
// commit - O(n) | ||
// list all pods (no filtering) - O(n), cached | ||
// list all pods (with filtering) - O(n) | ||
// list node infos - O(n), cached |
Wish I'd get to write code where big-O considerations come into play :(
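For orientation, a rough sketch of the layered structure these complexities refer to; the field names follow the ones visible in this diff, while the exact types are guesses:

type internalDeltaSnapshotData struct {
	baseData *internalDeltaSnapshotData // previous layer; nil at the bottom of the chain

	// Changes recorded since the last Fork().
	nodeInfoMap         map[string]*schedulernodeinfo.NodeInfo // nodes added in this layer
	modifiedNodeInfoMap map[string]*schedulernodeinfo.NodeInfo // base nodes modified in this layer
	deletedNodeInfos    map[string]bool                        // base nodes deleted in this layer

	// Caches, dropped whenever nodes or pods change in this layer.
	nodeInfoList         []*schedulernodeinfo.NodeInfo
	podList              []*apiv1.Pod
	havePodsWithAffinity []*schedulernodeinfo.NodeInfo
}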
havePodsWithAffinity []*schedulernodeinfo.NodeInfo | ||
} | ||
|
||
var errNodeNotFound = fmt.Errorf("node not found") |
nit: I tend to use errors.New() if there's no formatting (on the presumption that Somethingf() should not be used without formatting).
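The errors.New() form being suggested is a drop-in replacement here, since no formatting is involved:

var errNodeNotFound = errors.New("node not found")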
var nodeInfoList []*schedulernodeinfo.NodeInfo | ||
|
||
if len(data.deletedNodeInfos) > 0 { | ||
nodeInfoList = make([]*schedulernodeinfo.NodeInfo, 0, totalLen+100) |
Why +100? I don't see how the result could ever be longer than totalLen, so I'm guessing it's to optimize future addNode() calls? If so maybe make 100 a const and add a comment?
Ping.
Also: if I'm guessing right, this feels premature - the amortized cost of add should already be small in the general case. Especially in this particular case, as any cluster likely to have a 100-node expansion option will already have totalLen large enough that you'd see at most a single re-alloc. Have you actually seen any impact from this (I don't mind keeping it, as the memory cost is negligible; I'm just curious)?
Removed this for now. Will add this in another PR with benchmark, if it turns out it matters.
} | ||
nodeInfoList = append(nodeInfoList, bni) | ||
} | ||
} else { |
Is it really worth having a separate else branch here? Assuming lookups in an empty map are reasonably cheap, it seems like we could get away with always using the first code branch.
With copy():
BenchmarkBuildNodeInfoList/fork_add_1000_to_1000-12 17793 33766 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_5000-12 13588 43650 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_15000-12 10000 75878 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_100000-12 2824 369051 ns/op
Without (always using the first branch):
BenchmarkBuildNodeInfoList/fork_add_1000_to_1000-12 8946 61269 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_5000-12 3625 157285 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_15000-12 840 686720 ns/op
BenchmarkBuildNodeInfoList/fork_add_1000_to_100000-12 54 9803347 ns/op
Multiply by number of node groups considered in scale-up simulation for real-world usage.
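A sketch of the two paths being compared above (names follow the diff; handling of nodes modified in the fork is omitted for brevity, and getNodeInfoList on the base layer is an assumed accessor for its cached list):

func (data *internalDeltaSnapshotData) buildNodeInfoList() []*schedulernodeinfo.NodeInfo {
	baseList := data.baseData.getNodeInfoList()
	totalLen := len(baseList) + len(data.nodeInfoMap)

	var nodeInfoList []*schedulernodeinfo.NodeInfo
	if len(data.deletedNodeInfos) > 0 {
		// Slow path: every base node has to be checked against the deletion set.
		nodeInfoList = make([]*schedulernodeinfo.NodeInfo, 0, totalLen)
		for _, bni := range baseList {
			if data.deletedNodeInfos[bni.Node().Name] {
				continue
			}
			nodeInfoList = append(nodeInfoList, bni)
		}
	} else {
		// Fast path: nothing was deleted in this fork, so a single copy() of the
		// base list suffices - this is what the numbers above are measuring.
		nodeInfoList = make([]*schedulernodeinfo.NodeInfo, len(baseList), totalLen)
		copy(nodeInfoList, baseList)
	}
	for _, dni := range data.nodeInfoMap {
		nodeInfoList = append(nodeInfoList, dni)
	}
	return nodeInfoList
}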
} | ||
|
||
for _, dni := range data.nodeInfoMap { | ||
nodeInfoList = append(nodeInfoList, dni) |
Don't we need some deduplication somewhere? It seems to me that we'll end up with duplicate nodeInfo for any nodeInfo that was modified in fork (had a pod scheduled on it). Am I missing something obvious here?
If I'm right please also add a test case that would detect this.
|
||
func (data *internalDeltaSnapshotData) addNodeInfo(nodeInfo *schedulernodeinfo.NodeInfo) error { | ||
if _, found := data.nodeInfoMap[nodeInfo.Node().Name]; found { | ||
return fmt.Errorf("node %s already in snapshot", nodeInfo.Node().Name) |
What if node was already present in base snapshot? Seems like it will be just silently overridden, which is inconsistent.
Fixed
|
||
if !foundInBase && !foundInDelta { | ||
// Node not found in the chain. | ||
return errNodeNotFound |
I wonder if we even want to treat this as an error?
Context: #2709 (review).
I don't know. It feels like it should depend on our use of this method, but our current use of it is "never remove a node from snapshot", since scale down simulations don't yet use packing in snapshot. Can we resolve it when we migrate that part of the code?
chunkCount := len(data.nodeInfoMap) + len(basePodChunks) | ||
podChunks := make([][]*apiv1.Pod, chunkCount, chunkCount) | ||
copy(podChunks, basePodChunks) | ||
for _, node := range data.nodeInfoMap { |
Once again it seems like some sort of deduplication is missing here. I think you're including pods from nodes that exist in the base snapshot but were deleted in the fork, and also double-counting pods on nodeInfos that were modified in the fork.
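One way the deduplication could look; getAllNodeInfos on the base layer is an assumed accessor, the rest of the names follow the diff:

func (data *internalDeltaSnapshotData) getAllPods() []*apiv1.Pod {
	var pods []*apiv1.Pod
	if data.baseData != nil {
		for _, bni := range data.baseData.getAllNodeInfos() {
			name := bni.Node().Name
			if data.deletedNodeInfos[name] {
				continue // node was removed in this fork
			}
			if _, modified := data.modifiedNodeInfoMap[name]; modified {
				continue // superseded by the modified copy appended below
			}
			pods = append(pods, bni.Pods()...)
		}
	}
	for _, ni := range data.nodeInfoMap {
		pods = append(pods, ni.Pods()...)
	}
	for _, ni := range data.modifiedNodeInfoMap {
		pods = append(pods, ni.Pods()...)
	}
	return pods
}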
return podList | ||
} | ||
|
||
func (data *internalDeltaSnapshotData) getAllNodes() []*apiv1.Node { |
I think this is no longer used?
Removed
a4a2443
to
46da94d
Compare
46da94d
to
de63103
Compare
limitations under the License. | ||
*/ | ||
|
||
package simulator |
This still feels like very limited coverage, especially for such a complex and easily-testable (isolated) data structure.
Some tests cases that would be nice to cover (I don't claim it's a complete list, just stuff I was able to list in 2 minutes):
- Add pods, fork, add pods (current test cases don't mix pods added in fork and in base)
- Add node with pods in fork (basic use-case in CA)
- Remove nodes in fork
- Add / remove pods outside of fork
- Testing HavePodsWithAffinityList()
- Including verifying that cache is correctly invalidated when needed
- Test for Clear() (including when forked)
- Maybe a basic test for FilteredList()
- Modify NI and later delete it (ex. remove pod / delete node) to make sure modifiedNI is properly cleaned up when you delete node
- Some test case which involves listing stuff in presence of non-empty addedNI, modifiedNI and deletedNI and making sure it all works together
Added some test cases; translating this into a list so we can follow up:
- Add pods, fork, add pods (current test cases don't mix pods added in fork and in base)
- Add node with pods in fork (basic use-case in CA)
- Remove nodes in fork
- Add / remove pods outside of fork
- Testing HavePodsWithAffinityList()
- Including verifying that cache is correctly invalidated when needed
- Test for Clear() (including when forked)
- Maybe a basic test for FilteredList()
- Modify NI and later delete it (ex. remove pod / delete node) to make sure modifiedNI is properly cleaned up when you delete node
- Some test case which involves listing stuff in presence of non-empty addedNI, modifiedNI and deletedNI and making sure it all works together
return data.addNodeInfo(nodeInfo) | ||
} | ||
|
||
func (data *internalDeltaSnapshotData) addNodeInfo(nodeInfo *schedulernodeinfo.NodeInfo) error { |
NodeInfo may contain pods (you're calling it on modified nodes in commit(), and those will generally have pods). It seems like you need to clear pod caches in this case.
I don't mind having this assume no pods (and hence no cache invalidation), but in that case please state that in a comment and make sure to explicitly invalidate the cache in the places where it's needed.
Fixed in addNodeInfo
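Putting this thread and the earlier duplicate-check discussion together, addNodeInfo might end up looking roughly like this (clearNodeCaches and clearPodCaches are assumed helper names, and deletedNodeInfos is assumed to be a name-keyed set):

func (data *internalDeltaSnapshotData) addNodeInfo(nodeInfo *schedulernodeinfo.NodeInfo) error {
	name := nodeInfo.Node().Name
	if _, found := data.getNodeInfoLocal(name); found {
		return fmt.Errorf("node %s already in snapshot", name)
	}
	if data.baseData != nil {
		if _, found := data.baseData.getNodeInfo(name); found && !data.deletedNodeInfos[name] {
			return fmt.Errorf("node %s already in snapshot", name)
		}
	}
	data.nodeInfoMap[name] = nodeInfo
	data.clearNodeCaches() // the node list changed
	if len(nodeInfo.Pods()) > 0 {
		data.clearPodCaches() // a NodeInfo added during commit() may already carry pods
	}
	return nil
}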
data.removeNode(node.Node().Name) | ||
} | ||
if _, found := data.modifiedNodeInfoMap[node.Node().Name]; found { | ||
data.removeNode(node.Node().Name) |
If this node exists in base (which it does if we reach this call), you'll add it to removedNI inside removeNode(). In the follow-up addNodeInfo you will add it back to modifiedNIMap, but you won't clean it from removedNI, leading to an inconsistent internal state.
Incidentally, you only ever call updateNode() on base data in commit(). You're probably better off getting rid of this method altogether, rather than trying to make it correct.
} | ||
|
||
dni.AddPod(pod) | ||
if data.podList != nil || data.havePodsWithAffinity != nil { |
Why this if? Calling clearPodCaches when the condition is not true does nothing, and I don't believe it carries any cost.
dni, found := data.getNodeInfoLocal(nodeName) | ||
if !found { | ||
bni, found := data.baseData.getNodeInfo(nodeName) | ||
if !found { |
This doesn't cover a node that was deleted in the fork. You can add a pod to it, which will implicitly un-delete it.
Fixed
dni, found := data.getNodeInfoLocal(nodeName) | ||
if !found { | ||
bni, found := data.baseData.getNodeInfo(nodeName) | ||
if !found { |
Also not safe against a node deleted in the fork.
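A sketch of a chain lookup that treats nodes deleted in the fork as missing, so AddPod()/RemovePod() cannot silently resurrect them (names follow the diff; deletedNodeInfos is assumed to be a name-keyed set):

func (data *internalDeltaSnapshotData) getNodeInfo(nodeName string) (*schedulernodeinfo.NodeInfo, bool) {
	if dni, found := data.getNodeInfoLocal(nodeName); found {
		return dni, true
	}
	if data.deletedNodeInfos[nodeName] {
		return nil, false // deleted in this fork; do not fall through to base
	}
	if data.baseData != nil {
		return data.baseData.getNodeInfo(nodeName)
	}
	return nil, false
}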
|
||
// Maybe consider deleting from the list in the future. Maybe not. | ||
postAffinityPods := len(dni.PodsWithAffinity()) | ||
if preAffinityPods == 1 && postAffinityPods == 0 { |
I wonder if we need this logic at all. I question how likely we are to call removePod() while having cached havePodsWithAffinity, but not podList.
return forkedData | ||
} | ||
|
||
func (data *internalDeltaSnapshotData) commit() *internalDeltaSnapshotData { |
This is way more tricky than it seems once you start following the implementation of the helper methods. I would very much like to see a test that does all sorts of modifications on fork (add/delete pod on pre-existing nodes, add/delete pod on pre-existing node and later delete this node) and verifies that everything is applied correctly on commit.
} | ||
for node := range data.deletedNodeInfos { | ||
data.baseData.removeNode(node) | ||
} |
Invalidate caches on baseData
I'm having second thoughts about it :s If we're using the same methods as for other updates, we should be able to trust those methods to invalidate caches if needed.
And yet I don't :) As per my earlier comment, addNodeInfo must either drop pod caches or explicitly assume the nodeInfos it adds have no pods (in which case you should add a comment stating that and drop pod caches here).
Problematic sequence (maybe add some list calls to the tests you already have to make them verify cache invalidation?):
snapshot.Pods().List()
snapshot.Fork()
snapshot.AddNode(node)
snapshot.AddPod(pod, node.Name())
snapshot.Commit()
snapshot.Pods().List() # my theory is that this will miss the pod we added in the fork
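That sequence, written out as a test sketch (BuildTestNode/BuildTestPod-style helpers and the exact method signatures are assumptions; error handling on Fork()/Commit() is elided):

func TestCommitInvalidatesPodCache(t *testing.T) {
	snapshot := NewDeltaClusterSnapshot()
	node := BuildTestNode("node-1", 1000, 1000) // assumed test helpers
	pod := BuildTestPod("pod-1", 100, 100)

	podsBefore, _ := snapshot.Pods().List(labels.Everything()) // warm the pod cache
	assert.Equal(t, 0, len(podsBefore))

	snapshot.Fork()
	assert.NoError(t, snapshot.AddNode(node))
	assert.NoError(t, snapshot.AddPod(pod, node.Name))
	snapshot.Commit()

	podsAfter, _ := snapshot.Pods().List(labels.Everything())
	// A stale cache carried over by commit() would make the fork's pod disappear here.
	assert.Equal(t, 1, len(podsAfter))
}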
assert.NoError(t, err) | ||
|
||
err = snapshot.RemoveNode("node") | ||
assert.NoError(t, err) |
nit: assert empty NodeInfos().List()? No reason not to check it.
nodeCount: 15000, | ||
}, | ||
} | ||
for _, modifiedPodCount := range []int{0, 1, 100} { |
nit: the pods aren't really modified; they're just added in fork
return snapshot | ||
} | ||
|
||
func TestForking(t *testing.T) { |
+1 I think this is a nice test setup that makes it easy to cover a lot of cases.
} | ||
|
||
func TestClear(t *testing.T) { | ||
// Run with -count=1 to avoid caching. |
Is that how we run tests normally?
|
||
func TestClear(t *testing.T) { | ||
// Run with -count=1 to avoid caching. | ||
localRand := rand.New(rand.NewSource(time.Now().Unix())) |
I have mixed feelings about randomized ut, but in this case I can't name a good argument against it (you log all randomly generated numbers, so failures are easily reproducible). So I guess it's fine.
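A minimal sketch of the reproducibility mechanism described here - generate the seed once, log it, and derive all random parameters from it (test name and parameters are illustrative):

func TestClearWithLoggedSeed(t *testing.T) {
	seed := time.Now().Unix()
	t.Logf("random seed: %d", seed) // shown on failure (or with -v), so the run can be replayed
	localRand := rand.New(rand.NewSource(seed))

	nodeCount := localRand.Intn(100) + 1
	podCount := localRand.Intn(1000)
	t.Logf("nodeCount=%d podCount=%d", nodeCount, podCount)
	// ... build a snapshot from nodeCount/podCount and exercise Clear(), as in TestClear above ...
}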
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Delta snapshot stores a chain of changes-since-last-fork (although forking more than once is illegal for now, following the basic snapshot's implementation). It's optimized for the frequent fork / add a bit of stuff / run lots of predicates / revert flow by: