-
Notifications
You must be signed in to change notification settings - Fork 699
[Partitioner] User-defined partition #3237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/Partitioner/Partitioner.cpp
Outdated
@@ -52,6 +50,8 @@ static llvm::cl::opt<bool> | |||
llvm::cl::desc("Enable dumping the graph of each partitions"), | |||
llvm::cl::init(false), llvm::cl::cat(PartitionerCat)); | |||
|
|||
#include <fstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like fstream
is used. (If it is, put it with the other #include
s above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a handful of nits before landing. Thanks for prioritizing this :)
@@ -1235,6 +1250,11 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) { | |||
// algorithm. | |||
F_ = selectRepFunc(module_, memSize_); | |||
|
|||
if (partitionConfig_.enabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this go before selectRepFunc
(and maybe before getBackendMap
)? It seems like it would circumvent the whole rest of the partitioning process so it'd be good to not do any irrelevant work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do it in front of selectRepFunc. However, we do need to run getBackendMap for later partition validation and logging.
lib/Partitioner/Partitioner.cpp
Outdated
@@ -1348,27 +1368,97 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) { | |||
dumpDAG("DAG.dot"); | |||
} | |||
|
|||
int i = 0; | |||
for (Function *subF : funcList) { | |||
(void)subF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only semi-related to this diff but I don't think (void)subF;
is needed anymore, since subF
is legitimately used below, even in Debug mode.
include/glow/Runtime/RuntimeTypes.h
Outdated
/// a partition. 2. For i-th (0 <= i < m) partition, the nodes mapped to this | ||
/// partition id are not in this map, and the nodes mapped to other partitions | ||
/// ids must be in this map. | ||
llvm::StringMap<size_t> nodeToParitition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sp: Paritition
-> Partition
lib/Partitioner/Partitioner.cpp
Outdated
} | ||
} | ||
|
||
// Memory usage Validatian. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sp and inconsistent caps. // Validate memory usage.
would read better.
lib/Partitioner/Partitioner.cpp
Outdated
} | ||
RETURN_IF_ERR(memoryUsageValidation(partitionMap)); | ||
|
||
// Logical device ID validataion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sp: validation
lib/Partitioner/Partitioner.cpp
Outdated
logicalDeviceID_ = assignLogicalDeviceID(partitionMap); | ||
RETURN_IF_ERR(logicalDevicesValidation(partitionMap)); | ||
|
||
// TODO : loop-free validataion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sp: validation
include/glow/Runtime/RuntimeTypes.h
Outdated
@@ -171,6 +171,28 @@ struct HostConfig { | |||
size_t executorThreads{3}; | |||
}; | |||
|
|||
/// This is struct is for user defined partition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This is struct is for user defined partition. | |
/// This is struct for user defined partition. |
include/glow/Runtime/RuntimeTypes.h
Outdated
@@ -171,6 +171,28 @@ struct HostConfig { | |||
size_t executorThreads{3}; | |||
}; | |||
|
|||
/// This is struct is for user defined partition. | |||
struct PartitionConfig { | |||
/// The name of the functiton to be partitioned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The name of the functiton to be partitioned. | |
/// The name of the function to be partitioned. |
include/glow/Runtime/RuntimeTypes.h
Outdated
struct PartitionConfig { | ||
/// The name of the functiton to be partitioned. | ||
std::string funcName; | ||
/// The number of user defined partitions. The partition ids are [0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The number of user defined partitions. The partition ids are [0, | |
/// The number of user defined partitions. | |
/// The partition ids are between 0 and numOfPartitions - 1, inclusive. |
include/glow/Runtime/RuntimeTypes.h
Outdated
/// a partition. 2. For i-th (0 <= i < m) partition, the nodes mapped to this | ||
/// partition id are not in this map, and the nodes mapped to other partitions | ||
/// ids must be in this map. | ||
llvm::StringMap<size_t> nodeToParitition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::StringMap<size_t> nodeToParitition; | |
llvm::StringMap<size_t> nodeToPartition; |
include/glow/Runtime/RuntimeTypes.h
Outdated
std::vector<std::string> backendNames; | ||
/// The name for each partition. partitionNames.size() == numOfPartitions. | ||
std::vector<std::string> partitionNames; | ||
/// The mapping between nodes and Partition ids. Assume there are n nodes and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a map between node's name and unsigned partition number? It wasn't clear to me initially that name is used here. Also, need to clarify that Glow mangles names, meaning that Glow's node name is never the same as name from Caffe2 protobuf.
lib/Partitioner/Partitioner.cpp
Outdated
NodeToFunctionMap partitionMap; | ||
std::vector<Function *> funcList; | ||
std::unordered_set<size_t> unUsed; | ||
std::vector<NodesSet> nodesSets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<NodesSet> nodesSets; | |
std::vector<NodesSet> nodesSets(partitionConfig_.numOfPartitions); |
Then you won't need nodesSets.push_back(NodesSet{});
later.
lib/Partitioner/Partitioner.cpp
Outdated
} | ||
} | ||
|
||
// If there is un used partition and un-mapped nodes, map those nodes to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If there is un used partition and un-mapped nodes, map those nodes to the | |
// If there is unused partition and unmapped nodes, map those nodes to the |
// unMaped list. | ||
unMapped.push_back(&node); | ||
} else { | ||
size_t partitionID = iter->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that partitionID is < partitionConfig_.numOfPartitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/Partitioner/Partitioner.cpp
Outdated
} | ||
|
||
// If there is un used partition and un-mapped nodes, map those nodes to the | ||
// un-used partition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// un-used partition. | |
// unused partition. |
lib/Partitioner/Partitioner.cpp
Outdated
// If there is un used partition and un-mapped nodes, map those nodes to the | ||
// un-used partition. | ||
if (unMapped.size()) { | ||
assert(unUsed.size() && "Missing node to partition mapping."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(unUsed.size() && "Missing node to partition mapping."); | |
assert(unUsed.size() == 1 && "There must be exactly 1 unused partition."); |
bfb931b
to
c21d412
Compare
@@ -209,6 +209,9 @@ class Partitioner { | |||
// heterogeneous partition. | |||
bool optimized_; | |||
|
|||
// The struct contain user-defined partition info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doxygen style and above.
/// The number of user defined partitions. | ||
/// The partition ids are between 0 and numOfPartitions - 1, inclusive. | ||
size_t numOfPartitions; | ||
/// The backend for each partition. backendNames.size() == numOfPartitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is backendNames.size() == numOfPartitions. Why do we need numOfPartitions at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means backendNames.size() should be equal to numOfPartitions. Otherwise, there is an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just have backendNames alone and the size tells how many partitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also works. The reason I added this separate field is: I think it is clear for people just use a number directly as a reference. The we compare the size of partition names and backends in case something is missing by accident.
lib/Partitioner/Partitioner.cpp
Outdated
|
||
llvm::Error Partitioner::PartitionFromConfig() { | ||
Function *F = module_->getFunction(partitionConfig_.funcName); | ||
RETURN_ERR_IF_NOT(F, "Can't find the function in current module."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, add function name to the error message.
lib/Partitioner/Partitioner.cpp
Outdated
|
||
NodeToFunctionMap partitionMap; | ||
std::vector<Function *> funcList; | ||
std::unordered_set<size_t> unUsed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused.
// Create partitions based on the given number and names. | ||
for (size_t i = 0; i < partitionConfig_.numOfPartitions; i++) { | ||
Function *newF = | ||
module_->createFunction(partitionConfig_.partitionNames[i]); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
lib/Partitioner/Partitioner.cpp
Outdated
unMapped.push_back(&node); | ||
} else { | ||
size_t partitionID = iter->second; | ||
assert(partitionID < partitionConfig_.numOfPartitions && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no asserts. DCHECK, add partition id to the error message.
lib/Partitioner/Partitioner.cpp
Outdated
// If there is unused partition and unmapped nodes, map those nodes to the | ||
// unused partition. | ||
if (unMapped.size()) { | ||
assert(unUsed.size() == 1 && "There must be exactly 1 unused partition."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
tests/unittests/PartitionerTest.cpp
Outdated
@@ -886,3 +886,28 @@ TEST_F(PartitionerTest, memoryUsageValidation1) { | |||
auto err = myPartitioner.Partition(cctx); | |||
EXPECT_TRUE(errToBool(std::move(err))); | |||
} | |||
|
|||
/// This one tests partition from a user-defined config. | |||
TEST_F(PartitionerTest, partitionFromConfig1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: partitionFromConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beicy has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@@ -1211,7 +1225,6 @@ llvm::Error Partitioner::QuantizationProfilingPartition( | |||
module_->eraseFunction(F_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the function can be renamed and will no longer match the name in partitionConfig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If partitionConfig is set, Partitioner will go to user-defined flow without considering other options like "-load-profiling". So later renaming in the rest flow won't affect the function name.
/// mapped to this partition id are not in this map, and the nodes mapped to | ||
/// other partitions ids must be in this map. The node's name should be the | ||
/// name in Glow function and may be different from the original name from | ||
/// models. Since Glow will mangle names to make them unique. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not only to make them unique. We also e.g. replace / with _. Maybe something else.
You may write: Since Glow will mangle names according to its own conventions.
Partitioner(Module *parent, const std::vector<DeviceInfo> &devices, | ||
bool saturateHost = false, bool optimized = false); | ||
bool saturateHost = false, bool optimized = false, | ||
PartitionConfig partitionConfig = PartitionConfig()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see that it's already bad API. Let's say caller 1 has only partitionConfig
. But now it has to pass false, false
for some other 2 flags. Caller 1 has to figure out what is saturation, and what to pass for corresponding flag to disable.
It would be a good practice to make sure that incompatible flags are not passed together. E.g. saturation=false when user defined partition. But these checks can quickly explode if we have more flags/modes.
What if we later introduce another feature, third mode for partitioner? Say, PyTorch-defined partition? Then caller 3 will have to figure out what to pass for partitionConfig argument to disable user-defined partition feature.
This is why I think we should never accept a whole all-inclusive list of args in one function (in this case, partitioner constructor). I was going to propose create different method for user defined partition. But if you are going to refactor everything later into different classes, I'm also fine with that.
partitionMap.add(node, funcList[partitionID]); | ||
nodesSets[partitionID].insert(node); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else {
what if there are unused partitions? Meaning that some partitions are empty. Is it valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be at most 1 unused partition (there is the check). Otherwise, it is invalid. Since it means we have empty functions in the module and will cause problems later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will cause problems later
It is better to nail these problems early. Here we still can report what exactly is wrong: no nodes were mapped to a partitions #i, #j, #k... Later it will be difficult to track down where the empty function is coming from. In fact, it may fail with obscure error, so first debugging step would be to figure out that the function is empty, and only after that find out why it's empty. I really suggest to report as many problems with the partition as we can find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. So far, if there is empty function, it will cause the assertion and stop. But you are right, the error message is not enough. Will improve the debugging ability. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Great work!
@artemrakhov Thanks a lot for the suggestion on the interface. Let's have more discussion about the interface offline. |
Summary:
This PR added user-defined partition flow. Basically, a struct "PartitionConfig" containing the partition info is passed into Partitioner to enable this flow. Now we let users have the full control of how to do the partitioning.
To use this flow, users can write their helper function to generate PartitionConfig, and call Partitioner directly.
In the following PR, we will add passing PartitionConfig through HostManager from a yaml file.
Related to #2298
Documentation:
[Optional Fixes #issue]
Test Plan:
Added unittest. ninja test.
Please see a detailed explanation of how to fill out the fields in the relevant sections in PULL_REQUEST.md.