[Importer] Add C2 importer support for RWQ SLWS/SLS #2292


Merged -- 1 commit merged into pytorch:master on Feb 8, 2019

Conversation

jfix71 (Contributor) commented Jan 23, 2019

Description: Add support in the Caffe2 model loader for SparseLengthsWeightedSum8BitsRowwise and SparseLengthsSum8BitsRowwise.

Testing: Added unit tests

Documentation: N/A

Related to #1698

jfix71 (Contributor, Author) commented Jan 23, 2019

Note that the way this and the SLWS Node are implemented requires duplicating the Data input to the SLWS node -- when loading the model, the importer strips out the fused scale/offset and creates three separate Constant inputs (unfused Data, Scales, and Offsets) in order to create the SLWS Node.

This could be problematic, as Data could be very large and duplicating it could use too much host memory. We may want to use something like "offline Constants" here (deferring loading/transforming the fused Data Constant until copying the Constants to an accelerator) if it becomes an issue.
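To make the unfusing concrete, here is a minimal sketch of the transformation (names and types are hypothetical -- the real loader operates on Glow Tensor/Constant objects rather than raw vectors), assuming each fused row is laid out as [numCols uint8 values | float scale | float offset]:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of unfusing a rowwise-quantized tensor whose rows
// carry a trailing float scale and float offset.
struct UnfusedRWQ {
  std::vector<uint8_t> data;  // numRows x numCols quantized values.
  std::vector<float> scales;  // One scale per row.
  std::vector<float> offsets; // One offset per row.
};

UnfusedRWQ unfuse(const uint8_t *fused, size_t numRows, size_t numCols) {
  const size_t fusedRowSize = numCols + 2 * sizeof(float);
  UnfusedRWQ out;
  out.data.resize(numRows * numCols);
  out.scales.resize(numRows);
  out.offsets.resize(numRows);
  for (size_t r = 0; r < numRows; ++r) {
    const uint8_t *row = fused + r * fusedRowSize;
    // Copy the quantized payload; this is the duplication of Data mentioned
    // above, since the fused original is still alive at this point.
    std::memcpy(out.data.data() + r * numCols, row, numCols);
    std::memcpy(&out.scales[r], row + numCols, sizeof(float));
    std::memcpy(&out.offsets[r], row + numCols + sizeof(float), sizeof(float));
  }
  return out;
}
```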

rdzhabarov (Contributor) left a comment

Cool. It would be great to compare C2 and Glow outputs for these ops (to ensure correctness).

jfix71 force-pushed the importer_rw_q_sls branch from 0cd5e54 to 70300cf on January 23, 2019 23:39
artemrakhov-glow (Contributor) left a comment

@stoklund once made a comment that it is more efficient to store each row's scale and offset together with the data, so that we don't have to do two extra size-1 fetches per row.

Caffe2 actually has two operators implementing quantized SLS, and I suspect it's for exactly this reason.

  1. This one stores scales and offsets separately. It would be very easy to support now, because you've added RowwiseQuantizedSparseLengthsWeightedSumNode.
    https://caffe2.ai/docs/operators-catalogue.html#sparselengthssum8bitsrowwise
  2. This one fuses scales and offsets; I think we have no choice but to support it as a separate new node (a rough sketch of the two layouts follows this list).
    https://caffe2.ai/docs/operators-catalogue.html#sparselengthssumfused8bitrowwise
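For reference, a rough sketch of the two storage formats (layouts summarized from the operator docs linked above, not from this PR's code):

```cpp
// 1. SparseLengthsSum8BitsRowwise -- scales/offsets in a separate tensor:
//      DATA:       [numRows x numCols] uint8
//      SCALE_BIAS: [numRows x 2]       float (scale and offset per row)
//
// 2. SparseLengthsSumFused8BitRowwise -- scale/offset appended to each row:
//      DATA: [numRows x (numCols + 8)] uint8, where
//            row r = [numCols quantized bytes | float32 scale | float32 offset]
```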

jfix71 (Contributor, Author) commented Jan 24, 2019

@artemrakhov My understanding from @stoklund was that fusing was for efficiency at execution time on an accelerator, not at compile time. Backends that need it fused could always re-fuse it later on.

I was thinking this could be made more efficient (and feasible for huge Constants) using the "offline Constants" approach we previously discussed, which seemed like something we may need for other uses in the future too. If so, this unfusing would no longer happen in Caffe2ModelLoader -- instead, you would add an "unfuse" expression, and later on a backend could add a "fuse" expression, and the two would cancel each other out, meaning the fused data stays fused the whole time.
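As a toy illustration of the cancelling-expressions idea (entirely hypothetical -- Glow has no such API today), pending lazy transforms on a Constant could be kept as a list in which adjacent unfuse/fuse pairs are dropped before any data is materialized:

```cpp
#include <vector>

// Hypothetical pending transforms attached to a Constant.
enum class Transform { Unfuse, Fuse };

// Drop adjacent unfuse/fuse (or fuse/unfuse) pairs: the pair is a no-op, so
// the underlying fused data never needs to be rewritten.
std::vector<Transform> simplify(const std::vector<Transform> &pending) {
  std::vector<Transform> out;
  for (Transform t : pending) {
    if (!out.empty() && out.back() != t) {
      out.pop_back(); // The pair cancels; fused data stays fused.
    } else {
      out.push_back(t);
    }
  }
  return out;
}
```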

As for supporting fused Tensors in Glow, like Caffe2 uses -- it feels pretty hacky to me to just modify the shape to include the per-row scales/offsets and assume all users of fused Tensors know what's going on. If we want to go down that path, I'd prefer creating a fused Tensor class which has the correct shape and implements less efficient at(), raw(), etc. that take the fused-in scales/offsets data into account.
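A minimal sketch of what such a fused Tensor class could look like (hypothetical names, independent of Glow's actual Tensor API), assuming the same [data | scale | offset] row layout as above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// The class reports the logical data shape [numRows x numCols] while the
// backing storage keeps the per-row scale/offset fused at the end of each row.
class FusedRWQTensor {
  std::vector<uint8_t> storage_; // numRows x (numCols + 8) bytes.
  size_t numRows_, numCols_;

  size_t fusedRowSize() const { return numCols_ + 2 * sizeof(float); }

public:
  FusedRWQTensor(std::vector<uint8_t> storage, size_t numRows, size_t numCols)
      : storage_(std::move(storage)), numRows_(numRows), numCols_(numCols) {
    assert(storage_.size() == numRows_ * fusedRowSize());
  }

  // Logical shape excludes the fused scale/offset bytes.
  size_t dims(size_t i) const { return i == 0 ? numRows_ : numCols_; }

  // at() skips over each row's trailing scale/offset, so indexing is a bit
  // more expensive than for a plain dense tensor.
  uint8_t at(size_t row, size_t col) const {
    return storage_[row * fusedRowSize() + col];
  }

  float scale(size_t row) const {
    float s;
    std::memcpy(&s, &storage_[row * fusedRowSize() + numCols_], sizeof(s));
    return s;
  }

  float offset(size_t row) const {
    float o;
    std::memcpy(&o, &storage_[row * fusedRowSize() + numCols_ + sizeof(float)],
                sizeof(o));
    return o;
  }

  // Dequantize a single element using its row's fused scale/offset.
  float dequantize(size_t row, size_t col) const {
    return static_cast<float>(at(row, col)) * scale(row) + offset(row);
  }
};
```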

artemrakhov-glow (Contributor) commented

I agree, fusing is done for efficiency at execution, not at compile time. At execution time we have to fetch a lot of rows from Data, and we are limited by memory bandwidth. Having to fetch two more single numbers for each row increases memory pressure. This is true for every inference platform.
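To put rough numbers on this (illustrative values, not from this thread): with 64-dimensional uint8 rows, each row fetch is 64 bytes. In the unfused form, the two extra 4-byte loads hit a separate scales/offsets tensor, typically costing at least one additional cache-line fetch per row; in the fused form, all 72 bytes are contiguous and usually land in the same one or two cache lines.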

I see your point about fusing later per Backend request. But in reality this may take a long time and consume extra memory, due to the size of these Tensors. We'll definitely need a concept of deferred (lazy) optimizations, which will cancel each other out in this case.

I don't think the fused op is hacky. Caffe2 actually implements both. I think it's fine for all backends to be aware of this fused format.

stale bot commented Jan 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

jfix71 (Contributor, Author) commented Feb 2, 2019

Update: after an offline discussion, we decided we should implement both fused and unfused nodes. Ideally, higher up in the stack we will ensure that Glow receives the weights in either fused or unfused format, depending on which the host's backend prefers.

I will update this PR to use protos that are not fused and to update the Caffe2Importer case accordingly, and then will put up future PRs to support the fused version of RWQ-SLWS and its protos.

stale bot commented Feb 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

jfix71 (Contributor, Author) commented Feb 8, 2019

I've updated the PR to not use fused scale/offsets. In a future PR I will add new nodes for the fused version, as well as Caffe2 importer support for it.

jfix71 force-pushed the importer_rw_q_sls branch from 679c8c5 to c30905a on February 8, 2019 04:22
jfix71 changed the title from [Importer] Add C2 importer support for fused RWQ SLWS/SLS to [Importer] Add C2 importer support for RWQ SLWS/SLS on Feb 8, 2019
jfix71 merged commit b1c130c into pytorch:master on Feb 8, 2019
jfix71 deleted the importer_rw_q_sls branch on February 8, 2019 18:11
jfix71 added a commit that referenced this pull request on Feb 13, 2019:
*Description*: As noted in #2292, we decided to implement both fused and unfused versions of rowwise-quantized SLWS.

*Testing*: Added OperatorTests and Caffe2ImporterTests.

*Documentation*: Added.

Closes #1698