
losipiuk
Contributor

@losipiuk losipiuk commented Jan 3, 2020

  • Updating vendor against git@github.com:kubernetes/kubernetes.git:master (3fca0a6)
  • Remove github.com/aws/aws-sdk-go v1.23.18 from go.mod-extra
  • Get rid of removed testapis
  • Get rid of removed NodeLister.listWithPredicates
  • Tweaks to update-vendor.sh
  • Remove use of PredicateMetadata
  • Get rid of removed predicates.NewFailureReason
  • Rename PDBS.PodDisruptionsAllowed to DisruptionsAllowed
  • Drop IsNodeReadyAndSchedulablePredicate
  • Add temporary deprecated_scheduler_snapshot.go
  • Cleanup simulator.PredicateError
  • Initial migration of PredicateChecker to scheduler framework (with TODOs)
  • Drop ConfigurePredicateCheckerForLoop
  • Directly call scheduler plugins in FitsAny
  • Remove NoOpEventRecorder
  • Extract PredicateChecker interface
  • Remove unused priorityPredicates.
  • Implement NewTestPredicateChecker
  • Implement SnapshotClusterState
  • Define ClusterSnapshot interface
  • Pass ClusterSnapshot explicitly to PredicateChecker
  • BasicClusterSnapshot implementation
  • Add ClusterSnapshot to AutoscalingContext
  • Propagate cluster state to ClusterSnapshot
  • Pass ClusterSnapshot to BinpackingNodeEstimator
  • Add GetAllPods and GetAllNodes to ClusterSnapshot
  • Add upcoming nodes to ClusterSnapshot
  • Remove filterOutSchedulableSimple
  • Always filter out schedulable against upcoming nodes
  • Introduce IsExpendablePod helper function
  • Migrate filter_out_schedulable to use ClusterSnapshot
  • Simplify PodListProcessor interface
  • Fixes for SchedulerBasedPredicateChecker tests
  • Simulate scheduling of pods waiting for preemption in ClusterSnapshot
  • FakeNodeInfoForNodeName
  • Add AddNodeWithPods method to ClusterSnapshot
  • Cleanup/Extend tests for SchedulerBasedPredicateChecker
  • Add TODO
  • Use ClusterSnapshot in ScaleUp
  • Use ClusterSnapshot in BinpackingNodeEstimator
  • Use ClusterSnapshot in GetDaemonSetPodsForNode
  • Set kubernetes.io/hostname label on simulated node in BinpackingNodeEstimator
  • Use ClusterSnapshot in ScaleDown
  • Do not add Pods pointing to nonexistent nodes to snapshot
  • fixup! Initial migration of PredicateChecker to scheduler framework (with TODOs)
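The ClusterSnapshot commits above (define the interface, add GetAllPods, add upcoming nodes, refuse pods pointing to missing nodes) can be illustrated with a minimal sketch. It assumes simplified Pod and Node structs instead of apiv1 types and a hypothetical basicSnapshot that implements Fork by deep-copying a map; the method set is trimmed and the signatures are illustrative, not the actual cluster-autoscaler API (Commit in particular is an assumption). Only one level of Fork is supported, matching the fork-of-fork item listed below as a future improvement.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for apiv1.Pod and apiv1.Node (assumption).
type Pod struct{ Name string }
type Node struct{ Name string }

// nodeInfo groups a node with the pods placed on it in the snapshot.
type nodeInfo struct {
	node *Node
	pods []*Pod
}

// ClusterSnapshot is a hypothetical, trimmed-down version of the interface.
type ClusterSnapshot interface {
	AddNode(node *Node) error
	AddPod(pod *Pod, nodeName string) error
	GetAllPods() []*Pod
	Fork() error
	Revert() error
	Commit() error
}

// basicSnapshot implements Fork by deep-copying the node map.
type basicSnapshot struct {
	base map[string]*nodeInfo
	fork map[string]*nodeInfo // nil when the snapshot is not forked
}

var _ ClusterSnapshot = (*basicSnapshot)(nil)

func newBasicSnapshot() *basicSnapshot {
	return &basicSnapshot{base: map[string]*nodeInfo{}}
}

// current returns the map that reads and writes should target.
func (s *basicSnapshot) current() map[string]*nodeInfo {
	if s.fork != nil {
		return s.fork
	}
	return s.base
}

func (s *basicSnapshot) AddNode(node *Node) error {
	if _, ok := s.current()[node.Name]; ok {
		return fmt.Errorf("node %s already in snapshot", node.Name)
	}
	s.current()[node.Name] = &nodeInfo{node: node}
	return nil
}

// AddPod refuses pods that point at nodes missing from the snapshot.
func (s *basicSnapshot) AddPod(pod *Pod, nodeName string) error {
	ni, ok := s.current()[nodeName]
	if !ok {
		return fmt.Errorf("node %s not in snapshot", nodeName)
	}
	ni.pods = append(ni.pods, pod)
	return nil
}

func (s *basicSnapshot) GetAllPods() []*Pod {
	var pods []*Pod
	for _, ni := range s.current() {
		pods = append(pods, ni.pods...)
	}
	return pods
}

func (s *basicSnapshot) Fork() error {
	if s.fork != nil {
		return errors.New("nested fork not supported")
	}
	s.fork = make(map[string]*nodeInfo, len(s.base))
	for name, ni := range s.base {
		s.fork[name] = &nodeInfo{node: ni.node, pods: append([]*Pod(nil), ni.pods...)}
	}
	return nil
}

// Revert discards changes made since Fork; Commit keeps them.
func (s *basicSnapshot) Revert() error { s.fork = nil; return nil }

func (s *basicSnapshot) Commit() error {
	if s.fork != nil {
		s.base, s.fork = s.fork, nil
	}
	return nil
}

func main() {
	s := newBasicSnapshot()
	_ = s.AddNode(&Node{Name: "n1"})
	_ = s.AddPod(&Pod{Name: "p1"}, "n1")
	_ = s.Fork()
	_ = s.AddNode(&Node{Name: "upcoming-1"}) // e.g. a simulated upcoming node
	_ = s.AddPod(&Pod{Name: "p2"}, "upcoming-1")
	fmt.Println(len(s.GetAllPods())) // 2 while forked
	_ = s.Revert()
	fmt.Println(len(s.GetAllPods())) // 1 after Revert
}
```

Deep-copying on Fork keeps the sketch simple but costs time proportional to cluster size; that trade-off is what a delta-based snapshot would avoid.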

Follow-up todo list:

  • Clean up the mess in scale-down so that we always use ClusterSnapshot as the source of truth (see comments below)
  • Remove deprecated_scheduler_snapshot
  • Check if GPU hack still works after removing IsNodeReadyAndSchedulablePredicate
  • Address TODOs in predicates_checker_interface.go
    • Forget FakeNodeInfoForNodeName ever existed
  • Remove SnapshotClusterState() and all related code (e.g. keeping listers in PredicateChecker).
  • Pass nodeName to RemovePod to avoid costly iteration over all nodes
  • Add unit tests for BasicClusterSnapshot (share with DeltaClusterSnapshot)
  • Add a list of nodes (node names?) to FitsAny and use it to reduce the number of PreFilter calls in binpacking.
    • Also consider if we could do the same in findPlaceFor (a single FitsAny call instead of tryNodeForPod in a loop).
  • Refactor errors in basic_cluster_snapshot.go.
  • Address every TODO(scheduler_framework_migration).
  • Pass context to Estimate() rather than passing PredicateChecker and ClusterSnapshot to EstimateBuilder
  • Investigate what happens if there is a "pending" pod in the cluster with nodeName pointing to a node which does not exist in the cluster; fix if needed
  • Do not return errors from ClusterSnapshot methods for which it does not make sense (Clear, Revert?, Get*, RemovePod?, RemoveNode?)
  • Create an issue for AWS about failing TestGetRegion
  • Clean up how we calculate the podDestinations list so it works with the snapshot.
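The FitsAny item above (pass the candidate nodes in, so PreFilter runs once per pod instead of once per node during binpacking) can be sketched as follows. The plugin interface, cycleState and cpuFit types are hypothetical simplifications, not the scheduler framework's real signatures. Running PreFilter once is only valid when the state it computes does not depend on which candidate node is being checked, and when the snapshot it reads already contains every candidate node; that is the same concern raised in the review about the node under consideration not being part of the snapshot.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins (assumption): real plugins receive apiv1 objects,
// a context, a *CycleState and return a *Status.
type Pod struct {
	Name string
	CPU  int
}

type Node struct {
	Name    string
	FreeCPU int
}

// cycleState carries data computed once by PreFilter and reused by Filter.
type cycleState struct {
	requestedCPU int
}

// plugin is a hypothetical two-phase filter plugin.
type plugin interface {
	PreFilter(state *cycleState, pod *Pod) error
	Filter(state *cycleState, pod *Pod, node *Node) error
}

// cpuFit counts PreFilter calls so the saving is observable.
type cpuFit struct {
	preFilterCalls int
}

func (c *cpuFit) PreFilter(state *cycleState, pod *Pod) error {
	c.preFilterCalls++
	state.requestedCPU = pod.CPU
	return nil
}

func (c *cpuFit) Filter(state *cycleState, _ *Pod, node *Node) error {
	if node.FreeCPU < state.requestedCPU {
		return fmt.Errorf("insufficient cpu on %s", node.Name)
	}
	return nil
}

var errNoFit = errors.New("pod does not fit any of the given nodes")

// fitsAnyNode runs every PreFilter once for the pod, then runs Filter against
// each candidate node and returns the first node that passes all plugins.
func fitsAnyNode(plugins []plugin, pod *Pod, nodes []*Node) (string, error) {
	state := &cycleState{}
	for _, p := range plugins {
		if err := p.PreFilter(state, pod); err != nil {
			return "", err
		}
	}
	for _, n := range nodes {
		fits := true
		for _, p := range plugins {
			if p.Filter(state, pod, n) != nil {
				fits = false
				break
			}
		}
		if fits {
			return n.Name, nil
		}
	}
	return "", errNoFit
}

func main() {
	cf := &cpuFit{}
	nodes := []*Node{{Name: "small", FreeCPU: 1}, {Name: "big", FreeCPU: 4}}
	name, err := fitsAnyNode([]plugin{cf}, &Pod{Name: "p", CPU: 2}, nodes)
	fmt.Println(name, err, cf.preFilterCalls) // big <nil> 1
}
```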

Future improvements:

  • Support fork-of-fork in snapshot.
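For fork-of-fork, one possible approach (an assumption, not what this PR implements) is a stack of layers: Fork pushes a deep copy of the top layer, Revert pops it, and Commit folds the top layer into the one below it. The layeredSnapshot type is hypothetical and tracks only pod names per node to keep it short; a delta-based design such as the DeltaClusterSnapshot mentioned above would store only changes instead of full copies.

```go
package main

import (
	"errors"
	"fmt"
)

// layeredSnapshot is a hypothetical sketch: layers[0] is the base state and
// every Fork pushes one more layer on top.
type layeredSnapshot struct {
	layers []map[string][]string // node name -> pod names
}

var errNotForked = errors.New("snapshot is not forked")

func newLayeredSnapshot() *layeredSnapshot {
	return &layeredSnapshot{layers: []map[string][]string{{}}}
}

func (s *layeredSnapshot) top() map[string][]string { return s.layers[len(s.layers)-1] }

func (s *layeredSnapshot) AddPod(node, pod string) {
	s.top()[node] = append(s.top()[node], pod)
}

// Fork copies the top layer, including the pod slices, so later appends
// cannot leak into lower layers.
func (s *layeredSnapshot) Fork() {
	cp := make(map[string][]string, len(s.top()))
	for n, pods := range s.top() {
		cp[n] = append([]string(nil), pods...)
	}
	s.layers = append(s.layers, cp)
}

// Revert drops the most recent fork only.
func (s *layeredSnapshot) Revert() error {
	if len(s.layers) == 1 {
		return errNotForked
	}
	s.layers = s.layers[:len(s.layers)-1]
	return nil
}

// Commit replaces the layer below the top with the top layer.
func (s *layeredSnapshot) Commit() error {
	n := len(s.layers)
	if n == 1 {
		return errNotForked
	}
	s.layers[n-2] = s.layers[n-1]
	s.layers = s.layers[:n-1]
	return nil
}

func (s *layeredSnapshot) PodCount() int {
	total := 0
	for _, pods := range s.top() {
		total += len(pods)
	}
	return total
}

func main() {
	s := newLayeredSnapshot()
	s.AddPod("n1", "p1")
	s.Fork()
	s.AddPod("n1", "p2")
	s.Fork() // fork of a fork
	s.AddPod("n1", "p3")
	_ = s.Revert()            // drops p3 only
	fmt.Println(s.PodCount()) // 2
	_ = s.Commit()            // keeps p2 in the base layer
	fmt.Println(s.PodCount()) // 2
}
```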

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 3, 2020
@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch 2 times, most recently from b200caa to 3bf6113 January 3, 2020 10:58
// Ignoring error here is safe - if a test doesn't specify valid estimatorName,
// it either doesn't need one, or should fail when it turns out to be nil.
estimatorBuilder, _ := estimator.NewEstimatorBuilder(options.EstimatorName)
predicateChecker, _ := simulator.NewTestPredicateChecker()
Contributor

I don't think you should ignore the error (at least not without an explanation like the one for the call above).

Contributor Author

Changed the function to return an error (extracted to a separate PR: #2782).
If NewTestPredicateChecker returns an error, I now pass it up to the caller via return.
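The pattern agreed on in this thread, returning the constructor's error and propagating it instead of discarding it with `_`, looks roughly like this. predicateChecker, newTestPredicateChecker and newTestContext are hypothetical stand-ins for the real simulator types and test helpers; the fail flag exists only to exercise the error path.

```go
package main

import (
	"errors"
	"fmt"
)

// predicateChecker is a placeholder for simulator.PredicateChecker (assumption).
type predicateChecker struct {
	name string
}

// newTestPredicateChecker has the (*checker, error) shape the constructor got
// after #2782; the fail flag is hypothetical and only drives the error path.
func newTestPredicateChecker(fail bool) (*predicateChecker, error) {
	if fail {
		return nil, errors.New("cannot build scheduler framework")
	}
	return &predicateChecker{name: "test"}, nil
}

// newTestContext is a hypothetical test helper that wraps and returns the
// error instead of discarding it with `_`.
func newTestContext(fail bool) (*predicateChecker, error) {
	checker, err := newTestPredicateChecker(fail)
	if err != nil {
		return nil, fmt.Errorf("creating predicate checker: %w", err)
	}
	return checker, nil
}

func main() {
	if _, err := newTestContext(true); err != nil {
		fmt.Println(err) // creating predicate checker: cannot build scheduler framework
	}
}
```

Wrapping with `%w` keeps the original error inspectable by callers while adding context about which setup step failed.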

ni2.SetNode(node2)

predicateChecker := NewTestPredicateChecker()
predicateChecker, _ := NewSchedulerBasedPredicateChecker(clientsetfake.NewSimpleClientset())
Contributor

assert.NoError?

snapshot := scheduler_snapshot.NewEmptySnapshot()

sched, err := scheduler.New(
volumeBinder := scheduler_volumebinder.NewVolumeBinder(
Contributor

I wonder if we shouldn't give this some kind of fake kubeClient? It already has all the informers, it only needs kubeClient for mutating API calls. Obviously it shouldn't happen anyway, since we're only doing the filtering step and not a binding step, but it still feels safer not to give it a real client at all. WDYT?

Contributor

Ping

Contributor Author

Based on input from @ahg-g, we need to pass a real kubeClient here.

if !p.enableAffinityPredicate && predInfo.Name == affinityPredicateName {
continue
func (p *PredicateChecker) CheckPredicates(pod *apiv1.Pod, nodeInfo *scheduler_nodeinfo.NodeInfo) *PredicateError {
state := scheduler_framework.NewCycleState()
Contributor

This doesn't actually contain the node we're considering (since it's not part of the snapshot). Doesn't that basically negate any benefit of running PreFilters?

Contributor Author

@losipiuk losipiuk Jan 28, 2020

Fixed in follow-up commits where we introduce ClusterSnapshot.

@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch from 3bf6113 to 2c96cbb January 13, 2020 19:31
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 13, 2020
@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch from 2c96cbb to 8a5cadd January 16, 2020 17:01
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2020
@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch 6 times, most recently from ffd3210 to 87e581a January 27, 2020 19:14
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2020
@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch 2 times, most recently from f3d8e94 to 1ecec77 January 28, 2020 14:10
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2020
@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch from 1ecec77 to 6e95320 January 28, 2020 14:18
@MaciekPytel
Contributor

Please extract "Tweaks to update-vendor.sh" to a separate PR and rebase

schedulingError: err,
})
}

Contributor

nit: remove empty line

Contributor Author

It was only in an intermediate commit. Removed anyway.

@losipiuk
Contributor Author

> Please extract "Tweaks to update-vendor.sh" to a separate PR and rebase

#2778

ready := kube_util.IsNodeReadyAndSchedulable(nodeInfo.Node())
if !ready {
return false, []predicates.PredicateFailureReason{predicates.NewFailureReason("node is unready")}, nil
return false, []predicates.PredicateFailureReason{predicates.NewPredicateFailureError("todo", "node is unready")}, nil
Contributor

todo?

Contributor Author

@losipiuk losipiuk Jan 29, 2020

reverted.

@MaciekPytel
Contributor

"Get rid of removed predicates.NewFailureReason" doesn't really seem to do what it advertises; it's just an error message change.

}

// IsNodeReadyAndSchedulablePredicate checks if node is ready.
func IsNodeReadyAndSchedulablePredicate(pod *apiv1.Pod, meta predicates.Metadata, nodeInfo *schedulernodeinfo.NodeInfo) (bool,
Contributor

TODO: verify that this doesn't break the "GPU hack", where we override node conditions to pretend the node is still booting up.
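The interaction this TODO worries about can be shown with a simplified readiness check: a node counts as ready and schedulable when its Ready condition is True and it is not cordoned. The node and condition types, isNodeReadyAndSchedulable and markUnready are hypothetical simplifications (the real kube_util helper inspects more than this), and markUnready only imitates the idea of the GPU hack, overriding the Ready condition on a copy of the node. The override has an effect only if something in the simulation still evaluates readiness after IsNodeReadyAndSchedulablePredicate is dropped.

```go
package main

import "fmt"

// Simplified stand-ins (assumption) for the apiv1.Node fields used here.
type condition struct {
	Type   string
	Status string
}

type node struct {
	Name          string
	Unschedulable bool
	Conditions    []condition
}

// isNodeReadyAndSchedulable is a simplified sketch: Ready must be "True"
// and the node must not be cordoned.
func isNodeReadyAndSchedulable(n node) bool {
	ready := false
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			ready = c.Status == "True"
		}
	}
	return ready && !n.Unschedulable
}

// markUnready returns a copy of the node whose Ready condition is forced to
// "False", so it looks like it is still booting; the input is not modified.
func markUnready(n node) node {
	out := n
	out.Conditions = nil
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			c.Status = "False"
		}
		out.Conditions = append(out.Conditions, c)
	}
	return out
}

func main() {
	gpuNode := node{Name: "gpu-1", Conditions: []condition{{Type: "Ready", Status: "True"}}}
	fmt.Println(isNodeReadyAndSchedulable(gpuNode))              // true
	fmt.Println(isNodeReadyAndSchedulable(markUnready(gpuNode))) // false
}
```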

limitations under the License.
*/

package simulator
Contributor

TODO: make sure this is actually deleted somewhere in this chain

Contributor Author

I will do that as the last commit in the chain (maybe in a separate PR), which will also address the TODOs in the PredicateChecker interface.
I do not want to do it right now, as that commit would very likely conflict with any other change in this PR, and I expect more changes while we are still in review.

}

// GenericPredicateError returns a generic instance of PredicateError to be used in contexts where the predicate name is
// unknown (hack - to be removed)
Contributor

Is it actually removed later on? It's not entirely clear to me how you would do that.

Contributor Author

We are using it for FitsAnyNode. And yeah, removing it would require changing that interface a lot, and I do not see the benefit. Will remove the comment.
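Why FitsAnyNode needs a generic error: when a pod is checked against many nodes, each node may fail a different predicate, so there is no single predicate name to report. A minimal sketch, with PredicateError simplified to a predicate name plus reasons; the "generic" name and the message format are illustrative assumptions, not the real simulator strings.

```go
package main

import (
	"fmt"
	"strings"
)

// PredicateError is a simplified stand-in for simulator.PredicateError (assumption).
type PredicateError struct {
	predicateName string
	reasons       []string
}

func (e *PredicateError) Error() string {
	return fmt.Sprintf("%s predicate mismatch, reasons: %s", e.predicateName, strings.Join(e.reasons, ", "))
}

// GenericPredicateError covers callers such as FitsAnyNode, where failures on
// different nodes cannot be attributed to one predicate.
func GenericPredicateError() *PredicateError {
	return &PredicateError{predicateName: "generic", reasons: []string{"pod does not fit any node"}}
}

func main() {
	var err error = GenericPredicateError()
	fmt.Println(err) // generic predicate mismatch, reasons: pod does not fit any node
}
```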

@losipiuk losipiuk force-pushed the lo/scheduler-framework-poc2 branch from cd5dc18 to 6f834ac February 4, 2020 15:16
@losipiuk
Contributor Author

losipiuk commented Feb 4, 2020

Superseded by #2796


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


4 participants