Closed
Description
If there are too many parameter servers, or too many parameter server ports (or sparse ports), some parameter servers wait forever.
When such a parameter server starts up, it logs:
W0522 12:00:09.495564 35864 ParameterServer2.cpp:269] --ports_num or --ports_num_for_sparse might be too large, or total dense parameter size or sparse parameters size might be too small, this psever doesn't store any parameter.
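The warning means that no parameter blocks were assigned to this server. Here is a minimal sketch of how that can happen. It makes several assumptions that are not taken from PaddlePaddle's code: each port (from `--ports_num` or `--ports_num_for_sparse`) acts as a separate server instance, a parameter is cut into fixed-size blocks, and blocks are dealt round-robin across instances. The helper names `blocksPerServer` and `countIdleServers` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not PaddlePaddle's assignment logic: a parameter of
// `paramSize` elements is split into blocks of `blockSize` elements, and
// block i is assigned to server instance i % numServerInstances.
std::vector<std::size_t> blocksPerServer(std::size_t paramSize,
                                         std::size_t blockSize,
                                         std::size_t numServerInstances) {
  std::vector<std::size_t> counts(numServerInstances, 0);
  // Ceiling division: the last block may be only partially filled.
  std::size_t numBlocks = (paramSize + blockSize - 1) / blockSize;
  for (std::size_t i = 0; i < numBlocks; ++i) {
    ++counts[i % numServerInstances];
  }
  return counts;
}

// Number of server instances that receive no block at all, i.e. the ones
// that would hit the "doesn't store any parameter" early return.
std::size_t countIdleServers(std::size_t paramSize, std::size_t blockSize,
                             std::size_t numServerInstances) {
  std::size_t idle = 0;
  for (std::size_t c : blocksPerServer(paramSize, blockSize, numServerInstances)) {
    if (c == 0) ++idle;
  }
  return idle;
}
```

For example, take 2 pservers with 4 ports each (8 instances) and a 5000-element parameter split into 1000-element blocks. That yields 5 blocks, so instances 0 through 4 get one block each and instances 5, 6, and 7 get none. Under this model, raising the port count or shrinking the parameter increases the number of idle instances.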
In ParameterServer2.cpp:
```cpp
void ParameterServer2::setParameter(const SendParameterRequest& request,
                                    std::vector<Buffer>& inputBuffers,
                                    SendParameterResponse* response,
                                    std::vector<Buffer>* outputBuffers) {
  ...
  // No parameter blocks were assigned to this server: warn and return early.
  if (!request.blocks().size()) {
    LOG(WARNING)
        << "--ports_num or --ports_num_for_sparse might be too large, "
        << "or total dense parameter size or sparse parameters size "
        << "might be too small, this psever doesn't store any parameter.";
    return;
  }
  ...

void ParameterServer2::addGradient(const SendParameterRequest& request,
                                   std::vector<Buffer>& inputBuffers,
                                   SendParameterResponse* response,
                                   std::vector<Buffer>* outputBuffers) {
  // Barrier statistics are keyed on FLAGS_num_gradient_servers trainers.
  if (!numPassFinishClients_) {
    REGISTER_BARRIER_DELTA_SERVER_SET(
        *statSet_,
        "forwardbackwardDelta",
        FLAGS_num_gradient_servers,
        request.trainer_id(),
        request.forwardbackward_time(),
        isSparseServer_ ? "_sparseUpdater" : "_denseUpdater");
  }
```
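A server can "wait forever" if it sits at a barrier sized for a fixed number of participants (like the `FLAGS_num_gradient_servers` count above) and one participant never arrives. The sketch below is a generic counting barrier with a timed wait, so a missing participant shows up as a failure instead of an indefinite block. It is a hypothetical illustration: `TimedBarrier` is not PaddlePaddle's barrier implementation, and it does not claim this is the actual cause of the hang.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical counting barrier: each of `numParties` participants calls
// arriveAndWait(). A plain wait would block forever if a party never
// arrives; the timeout lets the caller detect and report that case.
class TimedBarrier {
 public:
  explicit TimedBarrier(std::size_t numParties) : numParties_(numParties) {}

  // Returns true if all parties arrived within `timeout`, false otherwise.
  bool arriveAndWait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t gen = generation_;
    if (++arrived_ == numParties_) {
      // Last arrival releases everyone and resets for the next round.
      arrived_ = 0;
      ++generation_;
      cv_.notify_all();
      return true;
    }
    // Wait until the generation changes (all parties arrived) or timeout.
    const bool released =
        cv_.wait_for(lock, timeout, [&] { return generation_ != gen; });
    if (!released) {
      --arrived_;  // withdraw so the next round is not miscounted
    }
    return released;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  const std::size_t numParties_;
  std::size_t arrived_ = 0;
  std::size_t generation_ = 0;
};
```

The generation counter lets the barrier be reused across rounds. Withdrawing on timeout keeps the arrival count consistent, so a late retry does not release the barrier one participant early.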
It seems the hang is caused by something other than this early return. I still need to work out the details of what happens when there are more parameter blocks than pserver instances.