-
Notifications
You must be signed in to change notification settings - Fork 699
[Partitioner] User-defined partition #3237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -171,6 +171,30 @@ struct HostConfig { | |
size_t executorThreads{3}; | ||
}; | ||
|
||
/// This is struct for user defined partition. | ||
struct PartitionConfig { | ||
/// The name of the function to be partitioned. | ||
std::string funcName; | ||
/// The number of user defined partitions. | ||
/// The partition ids are between 0 and numOfPartitions - 1, inclusive. | ||
size_t numOfPartitions; | ||
/// The backend for each partition. backendNames.size() == numOfPartitions. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is backendNames.size() == numOfPartitions. Why do we need numOfPartitions at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This means backendNames.size() should be equal to numOfPartitions. Otherwise, there is an error. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can't we just have backendNames alone and the size tells how many partitions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It also works. The reason I added this separate field is: I think it is clear for people just use a number directly as a reference. The we compare the size of partition names and backends in case something is missing by accident. |
||
std::vector<std::string> backendNames; | ||
/// The name for each partition. partitionNames.size() == numOfPartitions. | ||
std::vector<std::string> partitionNames; | ||
/// The mapping between nodes' name to Partition ids. Assume there are n nodes | ||
/// and m partitions. We have 2 types of valid mapping: 1. all nodes are | ||
/// mapped to a partition. 2. For i-th (0 <= i < m) partition, the nodes | ||
/// mapped to this partition id are not in this map, and the nodes mapped to | ||
/// other partitions ids must be in this map. The node's name should be the | ||
/// name in Glow function and may be different from the original name from | ||
/// models. Since Glow will mangle names to make them unique. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not only to make them unique. We also e.g. replace / with _. Maybe something else. |
||
llvm::StringMap<size_t> nodeToPartition; | ||
|
||
PartitionConfig() : numOfPartitions(0) {} | ||
bool enabled() { return numOfPartitions > 0; } | ||
}; | ||
|
||
} // namespace runtime | ||
} // namespace glow | ||
#endif // GLOW_RUNTIME_RUNTIMETYPES_H |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,11 +22,10 @@ | |
#include "llvm/Support/Format.h" | ||
#include "llvm/Support/raw_ostream.h" | ||
|
||
#include <fstream> | ||
|
||
#include "llvm/Support/CommandLine.h" | ||
#include "llvm/Support/raw_ostream.h" | ||
|
||
#include <fstream> | ||
namespace glow { | ||
bool GlowEnableLoadBalancedPartitioning = false; | ||
static llvm::cl::opt<bool, /* ExternalStorage */ true> | ||
|
@@ -62,6 +61,20 @@ bool sortMinMemory(const std::pair<Function *, uint64_t> &a, | |
return a.second < b.second; | ||
} | ||
|
||
static void dumpPartitionInfo(const NodeToFunctionMap &partitions) { | ||
int i = 0; | ||
for (Function *subF : partitions.getPartitions()) { | ||
LOG(INFO) << "\t Partition " << i++ << ":\n" | ||
<< "\t\t Name :\t" << subF->getName().str() << "\n" | ||
<< "\t\t BackendKind :\t" | ||
<< partitions.getPartitionBackendName(subF) << "\n" | ||
<< "\t\t Memory :\t" | ||
<< partitions.getGraphMemInfo(subF).getTotalMemSize() << "\n" | ||
<< "\t\t LogicalDeviceIDs :\t" | ||
<< partitions.getLogicalDeviceIDList(subF)[0] << "\n"; | ||
} | ||
} | ||
|
||
void Partitioner::dumpDAG(llvm::StringRef dotFilename) const { | ||
if (partitions_.size() == 0) | ||
return; | ||
|
@@ -168,9 +181,10 @@ Partitioner::Partitioner(Module *parent, const std::vector<DeviceInfo> &devices, | |
} | ||
|
||
Partitioner::Partitioner(Module *parent, const std::vector<DeviceInfo> &devices, | ||
bool saturateHost, bool optimized) | ||
bool saturateHost, bool optimized, | ||
PartitionConfig partitionConfig) | ||
: module_(parent), deviceInfo_(devices), saturateHost_(saturateHost), | ||
optimized_(optimized) { | ||
optimized_(optimized), partitionConfig_(partitionConfig) { | ||
memSize_ = module_->getConstantsSize(); | ||
logicalDeviceID_ = 0; | ||
} | ||
|
@@ -1211,7 +1225,6 @@ llvm::Error Partitioner::QuantizationProfilingPartition( | |
module_->eraseFunction(F_); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible that the function can be renamed and will no longer match the name in partitionConfig? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. If partitionConfig is set, Partitioner will go to user-defined flow without considering other options like "-load-profiling". So later renaming in the rest flow won't affect the function name. |
||
std::unique_ptr<Backend> backend(createBackend(profilingBackend)); | ||
for (Function *subF : module_->getFunctions()) { | ||
(void)subF; | ||
assert(subF->verify() && "Conversion led to invalid function"); | ||
if (!optimized_) { | ||
RETURN_IF_ERR(::glow::optimizeFunction(subF, *backend, cctx)); | ||
|
@@ -1231,6 +1244,11 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) { | |
std::vector<std::unique_ptr<Backend>> backendHolder; | ||
getBackendMap(backendMap_, backendHolder, backends); | ||
|
||
if (partitionConfig_.enabled()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can this go before There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will do it in front of selectRepFunc. However, we do need to run getBackendMap for later partition validation and logging. |
||
// Jump into user-defined partition, and skip the following auto partition. | ||
return PartitionFromConfig(); | ||
} | ||
|
||
// Step 0: Find the representative function for running partitioning | ||
// algorithm. | ||
F_ = selectRepFunc(module_, memSize_); | ||
|
@@ -1348,27 +1366,104 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) { | |
dumpDAG("DAG.dot"); | ||
} | ||
|
||
int i = 0; | ||
for (Function *subF : funcList) { | ||
(void)subF; | ||
if (logPartition) { | ||
LOG(INFO) << "\t Partition " << i << ":\n" | ||
<< "\t\t Name :\t" << subF->getName().str() << "\n" | ||
<< "\t\t BackendKind :\t" | ||
<< mapping.getPartitionBackendName(subF) << "\n" | ||
<< "\t\t Memory :\t" | ||
<< mapping.getGraphMemInfo(subF).getTotalMemSize() << "\n" | ||
<< "\t\t LogicalDeviceIDs :\t" | ||
<< mapping.getLogicalDeviceIDList(subF)[0] << "\n"; | ||
} | ||
if (dumpPartition) { | ||
subF->dumpDAG("partitionLogicalID" + | ||
std::to_string(mapping.getLogicalDeviceIDList(subF)[0]) + | ||
"__" + subF->getFilename() + "__" + | ||
mapping.getPartitionBackendName(subF) + ".dot"); | ||
} | ||
i++; | ||
assert(subF->verify() && "Conversion led to invalid function"); | ||
} | ||
if (logPartition) { | ||
dumpPartitionInfo(mapping); | ||
} | ||
return llvm::Error::success(); | ||
} | ||
|
||
llvm::Error Partitioner::PartitionFromConfig() { | ||
Function *F = module_->getFunction(partitionConfig_.funcName); | ||
RETURN_ERR_IF_NOT(F, strFormat("Can't find function %s in current module.", | ||
F->getName().str().data())); | ||
|
||
DCHECK(partitionConfig_.numOfPartitions == | ||
partitionConfig_.backendNames.size() && | ||
partitionConfig_.numOfPartitions == | ||
partitionConfig_.partitionNames.size()) | ||
<< "Invalid user-defined partition config."; | ||
|
||
NodeToFunctionMap partitionMap; | ||
std::vector<Function *> funcList; | ||
std::unordered_set<size_t> unused; | ||
std::vector<NodesSet> nodesSets(partitionConfig_.numOfPartitions); | ||
// Create partitions based on the given number and names. | ||
for (size_t i = 0; i < partitionConfig_.numOfPartitions; i++) { | ||
Function *newF = | ||
module_->createFunction(partitionConfig_.partitionNames[i]); | ||
This comment was marked as resolved.
Sorry, something went wrong. |
||
funcList.push_back(newF); | ||
partitionMap.createPartition(newF, partitionConfig_.backendNames[i]); | ||
unused.insert(i); | ||
} | ||
|
||
// Map the nodes the the partitions. | ||
std::vector<Node *> unMapped; | ||
for (auto &node : F->getNodes()) { | ||
auto iter = partitionConfig_.nodeToPartition.find(node.getName()); | ||
if (iter == partitionConfig_.nodeToPartition.end()) { | ||
// If a node in F is not in the node to partition mapping, put it into | ||
// unMaped list. | ||
unMapped.push_back(&node); | ||
} else { | ||
size_t partitionID = iter->second; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please check that partitionID is < partitionConfig_.numOfPartitions There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
DCHECK(partitionID < partitionConfig_.numOfPartitions) | ||
<< "Invalid partition id :" << partitionID; | ||
partitionMap.add(&node, funcList[partitionID]); | ||
unused.erase(partitionID); | ||
nodesSets[partitionID].insert(&node); | ||
} | ||
} | ||
|
||
// If there is unused partition and unmapped nodes, map those nodes to the | ||
// unused partition. | ||
if (unMapped.size()) { | ||
DCHECK(unused.size() == 1) << "There must be exactly 1 unused partition."; | ||
auto partitionID = *(unused.begin()); | ||
for (auto &node : unMapped) { | ||
partitionMap.add(node, funcList[partitionID]); | ||
nodesSets[partitionID].insert(node); | ||
} | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. else { There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There can be at most 1 unused partition (there is the check). Otherwise, it is invalid. Since it means we have empty functions in the module and will cause problems later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It is better to nail these problems early. Here we still can report what exactly is wrong: no nodes were mapped to a partitions #i, #j, #k... Later it will be difficult to track down where the empty function is coming from. In fact, it may fail with obscure error, so first debugging step would be to figure out that the function is empty, and only after that find out why it's empty. I really suggest to report as many problems with the partition as we can find. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I agree. So far, if there is empty function, it will cause the assertion and stop. But you are right, the error message is not enough. Will improve the debugging ability. Thanks! |
||
|
||
// Validate memory usage. | ||
for (size_t i = 0; i < partitionConfig_.numOfPartitions; i++) { | ||
GraphMemInfo cost = getGraphMemInfo(nodesSets[i]); | ||
partitionMap.setGraphMemInfo(funcList[i], cost); | ||
} | ||
RETURN_IF_ERR(memoryUsageValidation(partitionMap)); | ||
|
||
// Logical device ID validation. | ||
logicalDeviceID_ = assignLogicalDeviceID(partitionMap); | ||
RETURN_IF_ERR(logicalDevicesValidation(partitionMap)); | ||
|
||
// TODO : loop-free validation. | ||
|
||
// Do partition. | ||
doPartitioning(F->getName(), {F}, partitionMap, true); | ||
module_->eraseFunction(F); | ||
|
||
// Do optimization based on backendName. | ||
for (size_t i = 0; i < partitionConfig_.numOfPartitions; i++) { | ||
auto func = funcList[i]; | ||
assert(func->verify() && "Conversion led to invalid function"); | ||
std::unique_ptr<Backend> backend( | ||
createBackend(partitionConfig_.backendNames[i])); | ||
if (!optimized_) { | ||
CompilationContext cctx; | ||
RETURN_IF_ERR(::glow::optimizeFunction(func, *backend, cctx)); | ||
} | ||
} | ||
if (logPartition) { | ||
dumpPartitionInfo(partitionMap); | ||
} | ||
return llvm::Error::success(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about API. Why do we have to change the constructor, this common API used by everyone? Why can't we leave it as is, but pass
PartitionConfig
intoPartitionFromConfig()
method? It is a bit obscure why we acceptPartitionConfig
here, store it, but delay using it untilPartitionFromConfig()
is called.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is, we would like to leave the decision to Partitioner, and from all caller's the Partitioner interface is the same. Later, we are going to refactor the Partitioner. After that, the Partitioner will be a facade that dispatches to the desired subclass/method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? This cannot be achieved. Imagine that one caller has user-defined partitions struct, the other caller doesn't. It means that caller #1 must pass this struct to partitioner, the second caller has nothing to pass. In caller Caller #1 calls
PartitionFromConfig()
, caller #2 calls something else. So interface is already different.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For caller 1, the code should be:
For Caller 2, the code is :
The interface for both caller 1 and 2 is Partition(cctx). The partitionConfig is passed when the Partitioner is initialed. So in Partition(), it will check if "partitionConfig" is enabled, if so, PartitionFromConfig() will be called. Otherwise, just continue the auto-partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see that it's already bad API. Let's say caller 1 has only
partitionConfig
. But now it has to passfalse, false
for some other 2 flags. Caller 1 has to figure out what is saturation, and what to pass for corresponding flag to disable.It would be a good practice to make sure that incompatible flags are not passed together. E.g. saturation=false when user defined partition. But these checks can quickly explode if we have more flags/modes.
What if we later introduce another feature, third mode for partitioner? Say, PyTorch-defined partition? Then caller 3 will have to figure out what to pass for partitionConfig argument to disable user-defined partition feature.
This is why I think we should never accept a whole all-inclusive list of args in one function (in this case, partitioner constructor). I was going to propose create different method for user defined partition. But if you are going to refactor everything later into different classes, I'm also fine with that.