[Importer] Add C2 importer support for RWQ SLWS/SLS #2292
Conversation
Force-pushed from 7efa7c4 to 0cd5e54.
Note that the way this and the SLWS Node are implemented requires duplicating the […]. This could be problematic, as […].
Cool. It would be great to compare C2's and Glow's outputs for these ops (to ensure correctness).
Force-pushed from 0cd5e54 to 70300cf.
@stoklund once made a comment that it is more efficient to store each row's scale and offset together with the data, so that we'd not have to do 2 extra fetches of size 1.
Caffe2 actually has 2 operators implementing quantized SLS, and I suspect it is for this exact reason.
- https://caffe2.ai/docs/operators-catalogue.html#sparselengthssum8bitsrowwise - This one stores scales and offsets separately. It would be very easy to support now, because you've added RowwiseQuantizedSparseLengthsWeightedSumNode.
- https://caffe2.ai/docs/operators-catalogue.html#sparselengthssumfused8bitrowwise - This one fuses scales and offsets; I think we have no other choice than supporting it as a different new node.
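To make the two layouts above concrete, here is a minimal C++ sketch of the storage each operator implies and the fetch pattern per row. This is illustrative only, not Glow's or Caffe2's actual data structures; all names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Layout 1 (separate): gathering one row costs one fetch into `data` plus
// two extra small fetches into `scaleBias` -- the inefficiency noted above.
struct UnfusedRWQ {
  std::vector<uint8_t> data;    // numRows x numCols quantized values
  std::vector<float> scaleBias; // numRows x 2: {scale, bias} per row
  size_t numCols;

  float dequant(size_t row, size_t col) const {
    float scale = scaleBias[2 * row + 0];
    float bias = scaleBias[2 * row + 1];
    return data[row * numCols + col] * scale + bias;
  }
};

// Layout 2 (fused): each row's scale/bias are appended to the row itself,
// so one contiguous fetch brings in the values and their metadata.
struct FusedRWQ {
  std::vector<uint8_t> data; // numRows x (numCols + 2 * sizeof(float))
  size_t numCols;            // logical number of columns, excluding metadata

  float dequant(size_t row, size_t col) const {
    size_t rowBytes = numCols + 2 * sizeof(float);
    const uint8_t *rowPtr = data.data() + row * rowBytes;
    float scale, bias;
    std::memcpy(&scale, rowPtr + numCols, sizeof(float));
    std::memcpy(&bias, rowPtr + numCols + sizeof(float), sizeof(float));
    return rowPtr[col] * scale + bias;
  }
};
```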
@artemrakhov My understanding from @stoklund was that fusing was for efficiency at execution on an accelerator, not at compile time. Backends that need it fused could always re-fuse it later on. I was thinking this could be made more efficient/possible with huge Constants using the "offline Constants" approach we previously discussed, which seemed to be something we may need for other uses in the future too. If so, this unfusing wouldn't happen in […].

For fused Tensors in Glow, like Caffe2 uses -- it feels pretty hacky to me, just modifying the shape to include per-row scales/offsets and assuming all users of fused Tensors know what's going on. If we want to go down that path, I'd prefer creating a fused Tensor class which has the correct shape and implements less efficient […].
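A rough sketch of the kind of fused Tensor class being suggested here -- one that reports the correct logical shape and pays a per-access cost to decode the inline scale/offset. The class and method names are hypothetical, not Glow's actual Tensor API:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class FusedRowwiseQuantizedTensor {
  std::vector<uint8_t> storage_; // rows x (cols + 2 * sizeof(float))
  size_t rows_, cols_;           // logical shape, excluding scale/offset

public:
  FusedRowwiseQuantizedTensor(size_t rows, size_t cols)
      : storage_(rows * (cols + 2 * sizeof(float))), rows_(rows),
        cols_(cols) {}

  // Users see the logical shape; the fused metadata stays hidden.
  size_t dims(size_t i) const { return i == 0 ? rows_ : cols_; }

  // Element access is less efficient than on a plain Tensor: every read
  // must locate and decode the row's scale/offset before dequantizing.
  float at(size_t r, size_t c) const {
    size_t rowBytes = cols_ + 2 * sizeof(float);
    const uint8_t *row = storage_.data() + r * rowBytes;
    float scale, offset;
    std::memcpy(&scale, row + cols_, sizeof(float));
    std::memcpy(&offset, row + cols_ + sizeof(float), sizeof(float));
    return row[c] * scale + offset;
  }
};
```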
I agree, fusing is done for efficiency at execution, not compile time. At execution time, we have to fetch a lot of rows from […].

I see your point about fusing later per Backend request. But in reality this may take a long time and consume extra memory, due to the size of these Tensors. We'll definitely need a concept of deferred (lazy) optimizations, which will cancel each other out in this case.

I don't think the fused op is hacky. Caffe2 actually implements both. I think it's fine for all backends to be aware of this fused format.
Update: After an offline discussion we decided we should implement both fused and unfused nodes. Ideally, higher up in the stack we will ensure that Glow receives the weights in either fused or unfused format, depending on what the backend on the host prefers. I will update this PR to use protos that are not fused and update the Caffe2Importer case, and then will put up future PRs to support the fused version of RWQ-SLWS/SLS and its protos.
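For reference, a minimal sketch of what converting between the two formats entails, using the assumed layouts from the sketches above (a hypothetical helper, not code from this PR). Note the full copy of the payload -- this is the extra time/memory cost the deferred-optimization point above refers to:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Unfused {
  std::vector<uint8_t> data;  // rows x cols
  std::vector<float> scales;  // rows
  std::vector<float> offsets; // rows
};

// Split a fused rows x (cols + 2 * sizeof(float)) buffer into separate
// data/scales/offsets tensors. Re-fusing is the inverse copy.
Unfused unfuse(const std::vector<uint8_t> &fused, size_t rows, size_t cols) {
  Unfused out{std::vector<uint8_t>(rows * cols), std::vector<float>(rows),
              std::vector<float>(rows)};
  size_t rowBytes = cols + 2 * sizeof(float);
  for (size_t r = 0; r < rows; ++r) {
    const uint8_t *src = fused.data() + r * rowBytes;
    std::memcpy(out.data.data() + r * cols, src, cols);
    std::memcpy(&out.scales[r], src + cols, sizeof(float));
    std::memcpy(&out.offsets[r], src + cols + sizeof(float), sizeof(float));
  }
  return out;
}
```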
Force-pushed from 70300cf to 679c8c5.
I've updated the PR to not use fused scales/offsets. In a future PR I will add new nodes for the fused version, as well as Caffe2 importer support for it.
Force-pushed from 679c8c5 to c30905a.
Description: Add support in the Caffe2 model loader for SparseLengthsWeightedSum8BitsRowwise and SparseLengthsSum8BitsRowwise.
Testing: Added unit tests.
Documentation: N/A
Related to #1698