[Quant] Add FX support in quantization examples #5797
Conversation
Summary: Previously, the quantization examples used only eager mode quantization. This commit adds support for FX graph mode quantization as well. TODO: provide accuracy comparison.

Test Plan:

```
python train_quantization.py --device='cpu' --post-training-quantize --backend='fbgemm' --model='$MODEL'
```

where $MODEL is one of googlenet, inception_v3, resnet18, resnet50, resnext101_32x8d, shufflenet_v2_x0_5 and shufflenet_v2_x1_0.

Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
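For context, here is a minimal sketch of the two post-training quantization workflows the new flag selects between. This is an illustration using the torch.ao.quantization APIs, not this PR's actual code; the FX front-end signatures have changed across PyTorch releases (older versions take a plain qconfig_dict and no example_inputs), so treat the exact calls as assumptions.

```python
import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig, prepare, convert
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

def ptq_eager(model, calibration_loader):
    # Eager mode: the model must already contain Quant/DeQuant stubs and
    # handle operator fusion itself; the qconfig is attached by hand.
    model.eval()
    model.qconfig = get_default_qconfig("fbgemm")
    prepared = prepare(model)
    with torch.inference_mode():
        for images, _ in calibration_loader:  # calibrate the observers
            prepared(images)
    return convert(prepared)

def ptq_fx(model, calibration_loader, example_inputs):
    # FX graph mode: the model is symbolically traced, so fusion and
    # quant/dequant insertion happen automatically.
    model.eval()
    qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))
    prepared = prepare_fx(model, qconfig_mapping, example_inputs)
    with torch.inference_mode():
        for images, _ in calibration_loader:
            prepared(images)
    return convert_fx(prepared)
```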
Thanks for the work @andrewor14. Some initial questions concerning the new API below.
Looks great! Can you include the accuracy numbers in the summary and update the docs that show accuracy as well?
Summary: Previously, the quantization examples used only eager mode quantization. This commit adds support for FX graph mode quantization as well.

Test Plan:

```
# MODEL is one of googlenet, inception_v3, resnet18, resnet50, resnext101_32x8d,
# shufflenet_v2_x0_5, shufflenet_v2_x1_0, mobilenet_v2, mobilenet_v3_large

# eager
python train_quantization.py --device="cpu" --post-training-quantize --backend="fbgemm"\
  --model="$MODEL" --weights="IMAGENET1K_V1" --quantization-workflow-type="eager_mode_quantization"

# fx
python train_quantization.py --device="cpu" --post-training-quantize --backend="fbgemm"\
  --model="$MODEL" --weights="IMAGENET1K_V1" --quantization-workflow-type="fx_graph_mode_quantization"

# eager QAT (mobilenet only)
python train_quantization.py --device="cuda" --backend="fbgemm" --model="$MODEL" --epochs=10\
  --weights="IMAGENET1K_V1" --quantization-workflow-type="eager_mode_quantization"

# fx QAT (mobilenet only)
python train_quantization.py --device="cuda" --backend="fbgemm" --model="$MODEL" --epochs=10\
  --weights="IMAGENET1K_V1" --quantization-workflow-type="fx_graph_mode_quantization"
```

Results:

- The "Before" column refers to accuracies reported [here](https://github.com/pytorch/vision/blob/main/docs/source/models.rst#quantized-models)
- TODO: Add results for QAT mobilenet after it's done

![Accuracy comparison (screenshot, 2022-04-12)](https://user-images.githubusercontent.com/2133137/163091177-e1c1c666-c3f7-40c3-8866-c0743c264721.png)

Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
There was a problem with the way I set up the FX experiments. After fixing this problem I uncovered a new blocking issue that I summarized here: pytorch/pytorch#75825. I will rerun all the experiments after we fix that issue.
Thanks for the update. I've added some additional comments and questions. Let me know your thoughts.
Just marking as "Request changes" to avoid accidental merges while we clarify some remaining questions.
Hi @datumbox, continuing our discussion here:

For the …

What we want is the equivalent for FX. The only differences here are (1) we use …
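To make the shape of the proposed API concrete, here is a minimal sketch of how a `--quantization-workflow-type` flag could dispatch between the two QAT workflows. The enum and helper names are hypothetical illustrations rather than this PR's actual code, and the `prepare_qat_fx` signature varies across PyTorch releases.

```python
import enum
import torch
from torch.ao.quantization import QConfigMapping, get_default_qat_qconfig
from torch.ao.quantization.quantize_fx import prepare_qat_fx

class QuantizationWorkflowType(enum.Enum):  # hypothetical name
    EAGER_MODE_QUANTIZATION = "eager_mode_quantization"
    FX_GRAPH_MODE_QUANTIZATION = "fx_graph_mode_quantization"

def prepare_for_qat(model, backend, workflow_type, example_inputs):
    """Return a model with observers/fake-quant inserted, ready for QAT."""
    model.train()
    qconfig = get_default_qat_qconfig(backend)  # backend: "fbgemm" or "qnnpack"
    if workflow_type == QuantizationWorkflowType.EAGER_MODE_QUANTIZATION:
        # Eager mode: torchvision's quantizable models define fuse_model();
        # newer versions take is_qat=True to pick QAT-safe fusion.
        model.fuse_model(is_qat=True)
        model.qconfig = qconfig
        return torch.ao.quantization.prepare_qat(model)
    else:
        # FX graph mode: tracing handles fusion and observer insertion.
        qconfig_mapping = QConfigMapping().set_global(qconfig)
        return prepare_qat_fx(model, qconfig_mapping, example_inputs)
```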
Summary: Previously, the quantization examples used only eager mode quantization. This commit adds support for FX graph mode quantization as well.

Test Plan:

```
# ==================== PTQ ====================
# MODEL is one of googlenet, inception_v3, resnet18, resnet50, resnext101_32x8d,
# shufflenet_v2_x0_5, shufflenet_v2_x1_0

# eager
python train_quantization.py --device="cpu" --post-training-quantize --backend="fbgemm"\
  --model="$MODEL" --weights="IMAGENET1K_V1" --quantization-workflow-type="eager_mode_quantization"

# fx
python train_quantization.py --device="cpu" --post-training-quantize --backend="fbgemm"\
  --model="$MODEL" --weights="IMAGENET1K_V1" --quantization-workflow-type="fx_graph_mode_quantization"

# ==================== QAT ====================
# mobilenet_v2 eager
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v2"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.0001 --weight-decay=0.0001\
  --quantization-workflow-type="eager_mode_quantization"

# mobilenet_v2 fx
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v2"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.0001 --weight-decay=0.0001\
  --quantization-workflow-type="fx_graph_mode_quantization"

# mobilenet_v3_large eager
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v3_large"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.001 --weight-decay=0.00001\
  --quantization-workflow-type="eager_mode_quantization"

# mobilenet_v3_large fx
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v3_large"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.001 --weight-decay=0.00001\
  --quantization-workflow-type="fx_graph_mode_quantization"
```

![Accuracy comparison (screenshot, 2022-04-21)](https://user-images.githubusercontent.com/2133137/164572469-5848c86b-0813-42f6-bcb7-0298ff4bb25b.png)

Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo
| Model | Acc@1 | Acc@5 |
|---|---|---|
| ResNet 50 | 75.802 | 92.764 |
| ResNeXt 101 32x8d | 79.020 | 94.468 |
| Inception V3 | 77.206 | 93.576 |
| GoogleNet | 69.702 | 89.388 |
Worth noting that we need to estimate these values with batch-size=1 on 1 GPU to avoid variance introduced by batch padding (see #4559).
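The concern is concrete: when evaluation uses fixed batch sizes (or a distributed sampler that pads with duplicated samples), the extra samples skew the averaged accuracy. A minimal sketch of a padding-free evaluation loop (plain PyTorch, not this repo's code) that scores every sample exactly once:

```python
import torch
from torch.utils.data import DataLoader

def evaluate_top1(model, dataset):
    # batch_size=1 means there is never a partially filled (and thus padded
    # or duplicated) final batch, so the estimate has no padding variance.
    loader = DataLoader(dataset, batch_size=1, shuffle=False)
    model.eval()
    correct, total = 0, 0
    with torch.inference_mode():
        for images, target in loader:
            pred = model(images).argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.numel()
    return 100.0 * correct / total
```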
@andrewor14 Thanks a lot for your work on this. Let's discuss the design doc you shared with me offline to decide on the approach and minimize throwaway work. :)
As discussed offline, I'm closing this PR for now since the FX graph mode quantization API is subject to change. I will reopen this PR again and rerun all the experiments once the new API is ready. Thank you everyone for your comments so far!
Stack from ghstack (oldest at bottom):
Summary: Previously, the quantization examples used only eager
mode quantization. This commit adds support for FX graph mode
quantization as well.
Test Plan:
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo