Commit 362bd03

WafaaT, yanbing-j, jianan-gu, WeizhuoZhang-intel, and dmsuehir authored
sync with r2.10 (#1001)
* [RNN-T training] Enable FP32 gemm using oneDNN (#531) * Update the Readme guide for distilbert (#534) * Update the Readme guide for distilbert * Fix accuracy grep bug, and grep accuracy for distilbert Co-authored-by: Weizhuo Zhang <[email protected]> * Update end2end public dockerfile to look for IPEX in the conda directory (#535) * Notebook to script conversion example (#516) * Add notebook script conversion example * Fixed doc * Replaces custom preprocessor with built-in one * Changed tag to remove_for_custom_dataset * Add URL check prior to calling urlretrieve (#538) * Add URL check prior to calling urlretrieve Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo Signed-off-by: Abolfazl Shahbazi <[email protected]> * disable for ssd since fused cat cat kernel is slow (#537) * fix bug when adding steps in rnnt inference (#528) * Fix and updates for TensorFlow WW18-2022 SPR (#542) * Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]> * Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]> * Remove some 
repositories Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]> * switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYTORCH_WHEEL and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix and updates for PyTorch WW14-2022 SPR (#543) * Fix and updates for PyTorch WW14-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix and updates for TensorFlow WW18-2022 SPR Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix TensorFlow SPR nightly versions Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update pre-trained models download URLs Signed-off-by: Abolfazl Shahbazi <[email protected]> * Intall Python 3.8 development tools Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix OpenMPI install and setup Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Horovod Installaion for SPR and CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix Python3.8 version for CentOS Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a typo in TensorFlow 3d-unet partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix a broken partial Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add TCMalloc to TF base container for SPR and remove OpenSSL Signed-off-by: Abolfazl Shahbazi <[email protected]> * Updates required to the base image Signed-off-by: Abolfazl Shahbazi <[email protected]> * Remove some repositories Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'matplotlib' for '3d-unet' Signed-off-by: Abolfazl Shahbazi <[email protected]> * switch to build OpenMPI due to issue in Market Place provided version Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYTORCH_WHEEL 
and IPEX_WHEEL arg values Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547) * fix quickstart scripts, detect platform type, update to run with pytorch only * Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548) * Enable bert large multi stream inference (#554) * test bert multi stream module * enable input split and output concat for accuracy run * change the default num_streams batchsize cores to 56 * change ssd multi stream throughput to 1 core 1 batch * change the default parameter for rn50 ssd multi stream module * modify enable_ipex_for_squad.diff to align new multistream hint implementation * enable warmup and multi socket support * change default parameter for rn50 ssd multi stream inference * Add train-no-eval for rn50 pytorch (#555) * PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550) * Add h5py install to bert training dockerfile * documentation updates * update docs, and add input_preprocessing to the wrapper package * Update LD_PRELOAD trailing : * Fix syntax * removing unnecessary change * Update DLRM entrypoint * Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR * Fix syntax * increase shm-size to 10g * [RNN-T training] Update scripts -- run on 1S (#561) * Update maskrcnn training script to run on 1s (#562) * use single node to do ssd-rn34 training (#563) * Update training.sh (#564) * Update training.sh (#565) Use tcmalloc instead of jemalloc * use single node to do resnet50 training (#568) * add numactl -C and remove jit warm in main thread (#569) * Update unit-test.yml (#546) * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Update unit-test.yml * Fixed make command, updated pip install. 
Fixed make command to run from the root directory. Replaced pip install tox with a pip install -r requirements-tests.txt to install all dependencies for the tests. * Add tox to test dependencies. Added tox to the dependencies so that the Workflow and others may install it with pip install -r requirements-test.txt and be covered for running make lint and make unit-test. * Update unit-test.yml Changed 'make unit-test' to 'make unit_test' as that is the actual target defined in the Makefile. * Update unit-test.yml Changed apt-get install command. * re-enable int8 for api change (#579) * saperate fully convergency test from training test (#581) Co-authored-by: jianan-gu <[email protected]> * ssd enable new int8 (#580) * v1 * enable new int8 method * Revert "ssd enable new int8 (#580)" (#584) This reverts commit 9eb3211. * Revert "re-enable int8 for api change (#579)" (#583) This reverts commit 0bded92. * Update training script using 1s (#560) * Enable checkpoint during training for bert-large (#573) * minor fix * Add readme for enabling checkpoint * update phase1 to enable checkpoint by default * Update README.md * Enable ssd bf32 inference training (#589) * enable ssd bf32 inference * enable ssd bf32 train * enable RNN-T bf32 inference (#591) * Enable bf32 for bert and distilbert for inference (#593) * enable bf32 distilbert * enable bert bf32 * Enable RNN-T bf32 training (#594) * enable maskrcnn bf32 inference and training (#595) * enable resnet50 and resnext101 bf16 path (#596) * enable bert bf32 train (#600) * update resnet int8 path using new int8 api (#603) * re-enable int8 for api change (#604) Co-authored-by: jianan-gu <[email protected]> * Leslie/ssd enable new int8 (#605) * v1 * enable new int8 method * update json file * add rn50 int8 weight sharing Co-authored-by: Jiang, Xiaofei <[email protected]> * update ssd training bs to the multily of core numbers (#606) * enable bf32 for dlrm (#607) Co-authored-by: jianan-gu <[email protected]> * Update IPEX new int8 
API enabling for distilbert/bert-large (#608) * enable distilbert * enable bert * fix max-ind-range and add memory info (#609) Co-authored-by: jianan-gu <[email protected]> * Remove debug code (#610) * update training steps (#611) * fix bandit scan fails (#612) * PYT Image recognition models support on Windows (#549) * fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX * [RNN-T training] fix bandit scan fails (#614) * RNN-T inference: fix IMZ Bandit scan fails (#615) * Update unit-test.yml (#570) Changed the docker user credential to utilize GitHub Secret. * MaskRCNN: fix IMZ Bandit scan fails (#623) * Fix for horovod-related failures in TF nightly runs (#613) * cpp17 horovod failure fix * minor debugging changes * minor fixes - directory name * cleanup * addressing reviewer comments * Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624) * Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 Signed-off-by: Abolfazl Shahbazi <[email protected]> * Set 'HOROVOD_WITH_MPI=1' explicitly Signed-off-by: Abolfazl Shahbazi <[email protected]> * update GCC version to GCC 9 Signed-off-by: Abolfazl Shahbazi <[email protected]> * Add 'horovodrun --check-build' for sanity check Signed-off-by: Abolfazl Shahbazi <[email protected]> * removo force install inside Docker Signed-off-by: Abolfazl Shahbazi <[email protected]> * [RNN-T training] Fix ddp sample number issue (#625) * update BF32 usage (#627) * resnet50 training: add warm up before collecting time (#628) * image to bf16 (#629) * Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631) * Update mlpc path for SPR wheels * remove patch * Update Horovod commit id for BareMetal, Docker will be updated next (#630) Signed-off-by: Abolfazl Shahbazi <[email protected]> * fix dlrm convergence and change training performance BS to 32K (#633) Co-authored-by: jianan-gu <[email protected]> * [RNN-T training] Merge sh files 
to one (#635) * update torch-ccl into 1.12 (#636) * Liangan1/update torch ccl version (#637) * Update torch_ccl version * resnet50_distributed_training: don't set MASTER_ADDR by user (#638) * Update torch_ccl in script (#639) * Enable offline download distilbert (#632) * enable offline download distilbert * add convert * Update README.md * add accuracy.py * add file * refine download * refine path * refine path * add license * Update dlrm_s_pytorch.py (#643) * Update README.md (#649) * init pytorch T5 language model (#648) * init pytorch T5 language model * update README.md * update doc * update fpn models (#650) * pytorch resnet50: directly call ipex.quantization (#653) * fix int8 accuracy (#655) Co-authored-by: Zhang, Weizhuo <[email protected]> * Made fixes to the broken links (#652) * Update Security Center URL (#657) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Weizhuoz/fix for pt 1.12 (#656) * fix vgg11_bn accuracy syntax error * remove exact_match from roberta-base * modify maskrcnn BS to 2*num_cores * Update dlrm_s_pytorch.py (#660) * Update dlrm_s_pytorch.py Reduce int8 memory usage. * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Update dlrm_s_pytorch.py * Add BF32 DDP for bert-large (#663) * Update run_ddp_bert_pretrain_phase1.sh * Update run_ddp_bert_pretrain_phase2.sh * Update README.md * move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664) minor changes * remove rn50 ao (#665) * Re-organize models list to be grouped by framework (#654) * re-organize models list to be grouped by framework * update tensorflow ssd-resnet34 training dataset * add T5 in benchmark/README.md * mannuel set torch num threads only for int8 (#666) * Update inference_performance.sh (#669) * improve ssdrn34 perf. (#671) * improve ssdrn34 perf. * minor update. 
* Fix linting Signed-off-by: Abolfazl Shahbazi <[email protected]> * Fix unit tests too Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Use IPEX Pytorch whls instead of building IPEX from source (#674) * Use IPEX Pytorch whls instead of building IPEX from source * Corrected the link to install pytorch/IPEX * Corrected the link to install pytorch/IPEX * Updated the link with latest tutorial to install pytorch/IPEX * Update docs/general/pytorch/BareMetalSetup.md Co-authored-by: Clayne Robison <[email protected]> * Update docs/general/pytorch/BareMetalSetup.md Co-authored-by: Clayne Robison <[email protected]> * Made the suggested tweaks in the names * Adding condition to install jemalloc and tcmalloc Co-authored-by: Clayne Robison <[email protected]> * Added condition to install jemalloc, tcmalloc, vision and torch-ccl * Added some tweaks Co-authored-by: Clayne Robison <[email protected]> * Lpot2inc (#446) * draft for lpot quantization and perf analysis jupyter notebook * update with formal name of model zoo, correct wrong words, add license in python file * rm empty line * renmae LPOT to INC in text and code, and use new api * Update README.md * Update set_env.sh * Update README.md * Update ut.sh * Update local_banchmark.sh * Create local_benchmark.sh * Update README.md * Update inc_for_tensorflow.ipynb * Update ut.sh * Update README.md * rename to local_benchmark.sh * Update ut.sh * Update ut.sh * Update run_jupyter.sh * Delete lpot_for_tensorflow.ipynb * Delete lpot_quantize_model.py * Update README.md * Update README.md * Update README.md * Update inc_for_tensorflow.ipynb * Update README.md * Update README.md * Update inc_for_tensorflow.ipynb * Update requirements.txt Co-authored-by: ltsai1 <[email protected]> * Sriniva2/ssd rn34 (#682) * improve ssdrn34 perf. * minor update. * enabling synthetic data. 
* Update base_benchmark_util.py * Fix linting error Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> * Add doc updates for '--synthetic-data' option (#683) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Change checkpoint setting for Bert train phase 1 (#602) * Change checkpoint setting for Bert train phase 1 * fix model and config saving * fix error when runing gpu path (#686) * fix load pretrained model error when using torch_ccl (#688) * update py version in base spec (#678) (#690) * TF addons upgrade to 0.17.1 (#689) (#691) * updated tf adons version * remove comment * Update Dockerfiles prior to IMZ 2.8 release (#693) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update Documents prior to IMZ 2.8 release (#694) Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update README.md (#697) * change numpy version requirement (#703) * Remove MiniGo training from IMZ (#644) * remove MiniGo training scripts and unit test * [RNN-T] [Inference] optimize the batch decoder (#711) * reduce fill_ OP in rnnt embedding kernel * optimize add between int and log to reduce dtype conversion * rnnt: support dump tracing file and print profile table (#712) * add support for open SUSE leap operating system (#708) * rnnt inference: pre convert data to bf16 (#713) * remove squeeze/slice/transpose (#714) * update resnet50 training code (#710) * update resnet50 training code * not using ipex optimize for resnet50 training * use ipex.optimize() on the whole model (#718) * resnet50 bf32: calling ipex.optimize to enable bf32 path (#719) * Added batch size as an env variable to the quickstart scripts (#676) * WIP: Adding batch size as an environment variable to the quickstart scripts * Added instructions in README.md for all workloads * Update README.md * Corrected typo in launch_benchmark * Made corrections to .docs and ran model-builder * Delete .README.md.swp * Delete .fp32_accuracy.sh.swp * Update 
quickstart/image_segmentation/tensorflow/3d_unet_mlperf/inference/cpu/inference_throughput.sh Co-authored-by: Clayne Robison <[email protected]> * Update quickstart/language_translation/tensorflow/transformer_mlperf/inference/cpu/inference_realtime.sh Co-authored-by: Clayne Robison <[email protected]> * Update benchmarks/launch_benchmark.py Co-authored-by: Clayne Robison <[email protected]> * Made corrections to batch-size parameter * Made changes in launch_benchmark for batch-size arg * Made modifications to the README's * Resolved merge conflict by keeping README.md file. * Modified readme for windows * Resolved merge conflict by keeping README.md file. * Corrected SPR run.sh scripts * Removed echo from run.sh Co-authored-by: Clayne Robison <[email protected]> * Added batchsize as an env variable to quickstart scripts (#680) * Added batchsize as an env variable to quickstart scripts * Made modifications to .docs and scripts * Made modifications to README * Resolved merge conflict by incorporating both suggestions. 
* Made corrections in README.md * Made corrections in README.md * Undo changes in training.sh file * updated readme: nit fix (#723) Co-authored-by: Rahul Nair <[email protected]> * compute throughput by test_mini_batch_size (#740) * pytorch resnet50: fix bf32 training path error (#739) * Fix a subtle 'E275' style issue that causes unknown behavior (#742) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * rearrange the paragraphs and fix Markdown headers (#744) * Align Transformers version for BERT models (#738) * align transformer version(4.18) for bert models * change scripts to legacy * redo calibration * patch fix * Update README.md (#746) * Add support for stock PYT- object detection models (#732) * stock PYT and windows support for object detection models * Weizhuoz/reduce model zoo steps (#762) * reduce steps for bert-base, roberta, fpn models * modify max_iter for fpn models * reduce all img classification models steps * update new config for bert models (#763) * Addin Scipy for TensorFlow serving SSD-MobileNet model (#764) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * Update TF ResNet50v1.5 inference for SPR (baremetal) (#749) * Added matplotlib dependency to image_segmentation requirements (#768) * Update readmes for the path to output directory (#769) * update wide & deep readme for the path to pretrained model directory (#771) * add a check for ubuntu 22.04 support (#721) * Changes to add bfloat16 support for DIEN training (#679) * Changes to add bfloat16 support for DIEN training * Some for for reporting performance * Fixes for dien training and unit tests * updated tpp file withr2.8 approvals (#773) * Add Windows stock PyTorch support for TransNet v2 (#779) * update TransNet v2 to work with stock pytorch * update Windows.md path in all relevant docs * add P99 metric for LZ models (#780) Co-authored-by: Weizhuo Zhang <[email protected]> 
* Rn50 training multiple epoches output 1 KPI and add training_steps argument. (#775) * enable --training_steps and 1 training KPI output with multiple epoches * add prefix * update print freq * fix display bug * enable PyTorch resnet50 fp16 path (#783) * enable PyTorch resnet50 fp16 path * fix conflict * Extract p99 metric from log to summary (#784) * enable fp16 bert train and inference (#782) * Vruddarr/pt update windows readmes (#778) * remove bfloat16 experimental support note (#786) * Update IPEX installation path (#788) * Clean up _pycache_ files, remove symlinks, and add license headers for dien training bf16 (#787) * update readme for jemalloc and iomp path (#789) * update readme for jemalloc and iomp path * Updated IOMP path as path to the intel-openmp directory * PyTorch: fix resnext101 running script (#795) * Update 3dunet mlperf bash scripts and README (#797) * update 3dunet mlperf doc to use quickstart scripts, rename quickstart scripts for multi-instance * fix tests job (#803) * rnnt inference: align replace lstm API due to IPEX change (#802) * Adding quick start scripts to MobileNetV1 bfloat16 precision (#793) * Adding quick start scripts to MobileNetV1 bfloat16 precision * Adding executable permissions to files * Adding aikit.md to docs file * updated the comments on readme Co-authored-by: veena.mounika.ruddarraju <[email protected]> * Adding quick start scripts to ssd-mobilenet bfloat16 precision (#798) * Adding quick start scripts to ssd-mobilenet bfloat16 precision * changed file permissions * Updated comments on readme file Co-authored-by: veena.mounika.ruddarraju <[email protected]> * Update T5 model with windows quick start scripts (#790) * Update T5 model with windows quick start scripts * Updated Readme by specifying values to environment variables * Update inference int8 readme and script of 4 CV models using INC (#698) * update docs to add INC int8 models as an option * add instructions for how to quantize a fp32 model using INC * rnnt: 
fix stft due to PyTorch API change (#811) * rnnt training: fix stft due to PyTorch API change (#813) * Update BareMetalSetup.md (#817) * Gerardod/build container (#807) First phase of GHA WF to build the image of a Model Zoo workload container and push it to CAAS. * Sharvils/tf workload (#808) * TFv2.10 support added. Horovod version updated. * Vruddarr/tf add language translation bert fp32 quick start scripts (#804) * Adding quick start scripts to language translation BERT FP32 model * Corrected typo errors * Changed path to the Readme * Adding spec file <bert-fp32-inference_spec.yml> * Update spec file and model link in Readme tables * Update Readme path in windows.md * Updated TL notebooks for SPR Launch (#810) * Updates for TL PyTorch notebook * Edits for two more TL notebooks * Reverting previous change for virtualenv * Removed --no-deps and some nonexistent links * Added TFHub cache dir * Updated TL notebook README for legal/branding * Update typo in Readme (#821) * PyTorch: using ipex.optimize for bf16 training (#824) * Fix CVEs for Pillow and notebook packages (#831) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * add intel-alphafold2 optimized w/ IPEX from realm of AIDD (#737) * add alphafold2 from AIDD realm * Remove unused variable in mlperf 3DUnet performance run (#832) * Update Model Zoo name, Python version and message for IPEX (#833) Co-authored-by: veena.mounika.ruddarraju <[email protected]> * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updt… (#830) * Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updting the readme by replacing conda with Miniconda. 
* Adding comment to install torch in BareMetalSetup.md * Update models main tables (#836) *update main readmes * Adding jemalloc instructions and environment variables (#838) * DLRM hybrid gradient product (#814) * enable hybrid mergedembedding * Hybrid Merge embedding * refine code * Update model file * Fix data loader issue for distributed trianing * Update the print info * Fix lr issue for sparse table both 2/8 ranks get convergenced with 0.75 epochs Co-authored-by: root <[email protected]> * update the TTT evaluation method by excluding dataloader & metric evaluation (#844) Co-authored-by: Zhang, Liangang <[email protected]> * PyTorch: resnet50 distributed training using lars optimizer (#826) * modify dlrm's sklearn metric eval func to ipex's multi-thread version (#850) * modify recall/precision/f1/ap 's eval as optional (#856) * Port dataloader optimization for distributed training of dlrm (#847) * update the TTT evaluation method by excluding dataloader & metric evaluation * port dataloader optimization for distributed training of dlrm * modify dlrm's sklearn metric eval func to ipex's multi-thread version (#850) * modify recall/precision/f1/ap 's eval as optional (#856) * port dataloader optimization for distributed training of dlrm * delete local bs computation in evaluation stage * modify the TTT output name Co-authored-by: Zhang, Liangang <[email protected]> * Update horovod version to fix run time failure due to Status call (#859) * fix regression for dlrm single node training (#864) Co-authored-by: Weizhuo Zhang <[email protected]> * Update pytorch model zoo table of BF32 with landing zoo models (#865) * Added SNYK scan (#855) * Update SSD-ResNet34 code in start.sh(#862) * Add Distilbert base model for inference (Tensorflow) to model zoo (#815) * Add fp32 inference for distilbert base model * Fix Bert spec file (#873) * 1) Add torch.profiler (#871) 2) change the distributed_training.sh for dlrm to diamond cluster * Update Wide & Deep docs (#875) * The 
copy of #867(Porting evaluation iteration overlapping) (#876) * port evaluation overlapping * remove debug code * remove debug code * remove unused code * remove unused code * add resnet50 distributed training script (#879) * add resnet50 distributed training script * collect TTT Co-authored-by: XiaobingSuper <[email protected]> * reduce redundant bus traffic (#880) * Port all_to_all index overlapping with interaction and top mlp. (#878) * port all_to_all index overlapping with interaction and top mlp * fix seg fault * Add int8 support for distilbert (#823) * Add fp32 inference for distilbert base model Co-authored-by: syedshahbaaz <[email protected]> * Update DIEN inference docs & quickstart scripts (#869) * Update DIEN docs * update for spr ww42 Co-authored-by: WafaaT <[email protected]> * Update ResNet50v1.5 docs (#820) * Update and Validate ResNet50v1.5 Inference and training model for TF SPR * Update and validate docs for TF SPR Co-authored-by: WafaaT <[email protected]> * Update Wide & Deep using Large Dataset docs (#877) * Vruddarr/tf bfloat32 precision check (#893) * Update Wide and Deep Large Dataset Training Model docs (#881) * Vruddarr/tf update image recognition models docs (#816) * Update Inceptionv3,DenseNet 169, Inceptionv4, ResNet50, ResNet101, MobileNet V1 quickstart scripts and docs * Update and validate MobileNet v1 for TF SPR Co-authored-by: WafaaT <[email protected]> * Fix BFloat32 precision check code for Resnet50v1.5 training model (#894) * Update 3DUNet MLperf for SPR (#889) * Updated Bert Large SPR READMEs (#887) * Updated Bert Large SPR READMEs * Included tensorflow and keras versions * Updated bert large README for spr * Updated scripts and README as per reviews * Update SPR quickstart description * updated to downloaded bert checkpoints * Fix typos in MobilenetV1 scripts (#899) * modify time function to solve int8 benchmark issue on windows (#898) * modify time function to solve int8 benchmark issue on windows * Replace the time.time 
function calls to time.perf_counter to improve the time statistic resolution. Updated for the additional 5 models Co-authored-by: Ying <[email protected]> * Update DIEN Training docs (#882) * Adding permissions to scripts in DIEN and correcting pb file paths in README_SPR_baremetal (#901) * Adding SPR_baremetal_readme and fixing model paths in the tables (#904) * fix acc test for single node (#903) * fix acc test for single node * Update dlrm_s_pytorch.py Co-authored-by: Weizhuo Zhang <[email protected]> * commit cherry-picks from r2.9 (#900) * update tbb files (#843) * fix vulnerability issues reported by snyk scans (#848) * upgrade for ipex 1.13 * Update Pillow to '>=9.3.0' (#884) Signed-off-by: Abolfazl Shahbazi <[email protected]> Signed-off-by: Abolfazl Shahbazi <[email protected]> * fix some bugs for p99 (#909) * Update tensorflow benchmarks to use latest horovod commit (#908) * Update start.sh * Update start.sh * Update to use shortened commit hash * do not convert data to bf16 while using fp32 and bf32 (#911) Co-authored-by: Weizhuo Zhang <[email protected]> * Update SSD-Resnet34 training docs for SPR task (#914) * Update SSD-Resnet34 training & docs for SPR * Vruddarr/tf update ssd mobilenet docs (#846) * Update quick start scripts and spec file to run for all precisions * Update and validate SSD-Mobilenet docs for TF SPR Co-authored-by: WafaaT <[email protected]> * fix print issue (#915) Co-authored-by: Weizhuo Zhang <[email protected]> * Update rfcn docs to use same quick start scripts (#897) * Update rfcn docs to use same quick start scripts Co-authored-by: WafaaT <[email protected]> * Sharvils/spr ssd training (#917) * Dockerfile updated * Update SSD-ResNet34 Inference docs (#866) * Update ResNet34 Inference to use same scripts & docs for all precisions * Update for SPR WW42 Co-authored-by: WafaaT <[email protected]> * Update transformer_mlperf scripts and README fro SPR WW42 (#891) Co-authored-by: Wafaa Taie <[email protected]> * Update TF models spec 
files for SPR WW42 (#919) * update TF models spec files for spr ww42 * update docker partial for tf addons version * workaround rdma config for spr (#925) * remove supported OS checks (#926) * Update Model paths in main readme (#928) * Remove Linux/windows OS platform support checks (#927) * update resnet50 distributed training script (#923) * resnet50 distributed training: use logical core for ccl (#930) * Update bert scripts to add same quick start scripts to all precisions (#910) * Update MobilenetV1 SPR docs (#931) * Update Resnet50v1_5_SPR_docs (#934) * Update SSD-Mobilenet SPR docs (#935) * Update Resenet50v1.5 inference SPR docs (#933) * Fix DIEN inference.sh script and add pretrained model env var in mobilenetv1 SPR baremetal readme (#939) * Update DIEN Inference and Training SPR docs (#937) * Update SSD-Resnet34 training SPR docs (#936) * Update SSD-Resnet34 Inference SPR docs (#938) * Update README_SPR_baremetal.md remove steps and warm_up steps env vars Co-authored-by: Wafaa Taie <[email protected]> * BERT training dockerfile fixed (#921) * BERT repo version fixed for SPR container (#920) * Update spr baremetal instructions for 3dunet, bert large and transformer mlperf (#932) * Update Transformer MLPerf inference docs for pre-trained models (#940) * Fix Language Translation BERT quickstart scripts (#941) * fix scripts to detect the number of cores * Update mlperf_gnmt docs (#945) * Updating Transformer_LT_official scripts (#913) * Add support for dGPU models (#840) (#948) * Add support for dGPU models (#840) * upgrade Pillow version for Yolov4 * Update main README.md (#947) * update main readme * edit transformer_mlperf and bert SPR docs * remove workflows * Fix CVEs based on Snyk scans in TL notebooks (#951) * fix snyk critical issues in TL jupyter notebooks * Remove INC dependency for Snyk issues (#953) * removed neuralcompressorfor to avoid vulnerability in Snyk scans * Remove pointers to BERT Large int8 docs (#952) * fix int8 model link (#958) * 
Fixed num_intra_threads for bfloat16 (#959) (#960) * Fixed num_intra_threads for bfloat16 * Modified open mpi instructions * Added kmp_blocktime for bfloat16 Co-authored-by: mahathis <[email protected]> * Fix syntax error and pythonpath in ssd-resnet34 training (#962) (#965) Co-authored-by: Veena2207 <[email protected]> * fix training bkms (#967) (#968) * fix T5 inference script (#969) * Fix resnet50v1.5 weightsharing for int8 (#996) * Corrected typo in SPR quickstart scripts (#991) * fix model_init for int8 weightsharing --------- Co-authored-by: mahathis <[email protected]> * TF SPR DevCatalog READMEs (#983) * add image recognition devcats * add tf object detection devcats * add TF language translation devcats * add tf image segmentation devcats * add tf language modeling devcats * add recommendation tf devcats * fix swapped containers and precision in run command * add README_SPR to all getting started links and correct script names * rename files and point getting started to itself * fix last link * fix minor error (#994) * Update TF SPR ww42 containers partials, spec-files and dockerfiles (#998) TF SPR Containers Built and Validated * Sharvils/tf devcats fixes (#995) Minor fixes to SPR TF DevCatalogs --------- Co-authored-by: sharvil.shah * SPR PyTorch DevCatalogs (#993) Added Devcatalog files targeting SPR container launch * Delete SPR containers README_SPR.md (#999) * delete README_SPR.md * remove references in spec-files * fix for auto-merge --------- Signed-off-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: YanbingJiang <[email protected]> Co-authored-by: jianan-gu <[email protected]> Co-authored-by: Weizhuo Zhang <[email protected]> Co-authored-by: Dina Suehiro Jones <[email protected]> Co-authored-by: Melanie Buehler <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: leslie-fang-intel <[email protected]> Co-authored-by: xiaofeij <[email protected]> Co-authored-by: jiayisunx <[email protected]> 
Co-authored-by: zhuhaozhe <[email protected]> Co-authored-by: XiaobingZhang <[email protected]> Co-authored-by: Sean-Michael Riesterer <[email protected]> Co-authored-by: liangan1 <[email protected]> Co-authored-by: Chunyuan WU <[email protected]> Co-authored-by: blzheng <[email protected]> Co-authored-by: Om Thakkar <[email protected]> Co-authored-by: mahathis <[email protected]> Co-authored-by: Srini511 <[email protected]> Co-authored-by: Clayne Robison <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: Neo Zhang Jianyu <[email protected]> Co-authored-by: ltsai1 <[email protected]> Co-authored-by: Jitendra Patil <[email protected]> Co-authored-by: Kanvi Khanna <[email protected]> Co-authored-by: Rahul Nair <[email protected]> Co-authored-by: Veena2207 <[email protected]> Co-authored-by: jojivk-intel-nervana <[email protected]> Co-authored-by: xiangdong <[email protected]> Co-authored-by: Huang, Zhiwei <[email protected]> Co-authored-by: gera-aldama <[email protected]> Co-authored-by: Sharvil Shah <[email protected]> Co-authored-by: wyang2 <[email protected]> Co-authored-by: Yimei Sun <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: tangleintel <[email protected]> Co-authored-by: Syed Shahbaaz Ahmed <[email protected]> Co-authored-by: Er-Xin (Edwin) Shang <[email protected]> Co-authored-by: Ying <[email protected]> Co-authored-by: sevdeawesome <[email protected]> Co-authored-by: DiweiSun <[email protected]> Co-authored-by: Tyler Titsworth <[email protected]> Co-authored-by: Srikanth Ramakrishna <[email protected]>
1 parent f6bd1ea commit 362bd03

121 files changed: +6273 -1766 lines changed


benchmarks/image_recognition/tensorflow/resnet50v1_5/inference/int8/model_init.py

Lines changed: 8 additions & 7 deletions
@@ -77,14 +77,15 @@ def run_benchmark_or_accuracy(self):
             cmd = os.path.join(
                 self.args.intelai_models, self.args.mode,
                 "eval_image_classifier_inference_weight_sharing.py")
-        if self.args.gpu:
-            cmd = os.path.join(
-                self.args.intelai_models, self.args.mode, self.args.precision,
-                "eval_image_classifier_inference.py")
         else:
-            cmd = os.path.join(
-                self.args.intelai_models, self.args.mode,
-                "eval_image_classifier_inference.py")
+            if self.args.gpu:
+                cmd = os.path.join(
+                    self.args.intelai_models, self.args.mode, self.args.precision,
+                    "eval_image_classifier_inference.py")
+            else:
+                cmd = os.path.join(
+                    self.args.intelai_models, self.args.mode,
+                    "eval_image_classifier_inference.py")

         cmd = self.get_command_prefix(self.args.socket_id) + self.python_exe + " " + cmd
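The effect of this hunk is that the weight-sharing script path is no longer overwritten by the GPU/CPU branch that used to follow it. A minimal sketch of the resulting selection logic is below. The function name `select_eval_script` and its parameters are hypothetical, and the condition guarding the first branch is not visible in the hunk, so the `weight_sharing` flag is an assumption.

```python
import os


def select_eval_script(models_dir, mode, precision, weight_sharing, gpu):
    """Return the evaluation script path (hypothetical helper, not repo code)."""
    if weight_sharing:
        # Assumed guard: the hunk does not show the condition for this branch.
        return os.path.join(
            models_dir, mode,
            "eval_image_classifier_inference_weight_sharing.py")
    if gpu:
        # The GPU script lives in a precision-specific subdirectory.
        return os.path.join(
            models_dir, mode, precision,
            "eval_image_classifier_inference.py")
    return os.path.join(
        models_dir, mode, "eval_image_classifier_inference.py")
```

Before the change, the equivalent of the second and third branches ran unconditionally, so the weight-sharing path was always replaced.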

benchmarks/language_modeling/tensorflow/bert_large/README.md

Lines changed: 1 addition & 0 deletions
@@ -3,5 +3,6 @@
 The following documents have instructions for running BERT large:
 * [BFloat16 Inference](/benchmarks/language_modeling/tensorflow/bert_large/inference/bfloat16/README.md)
 * [FP32 Inference](/benchmarks/language_modeling/tensorflow/bert_large/inference/fp32/README.md)
+* [Int8 Inference](/benchmarks/language_modeling/tensorflow/bert_large/inference/int8/README.md)
 * [BFloat16 Training](/benchmarks/language_modeling/tensorflow/bert_large/training/bfloat16/README.md)
 * [FP32 Training](/benchmarks/language_modeling/tensorflow/bert_large/training/fp32/README.md)
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+<!-- 50. Launch benchmark instructions -->
+Once your environment is set up, navigate to the `benchmarks` directory of
+the model zoo and set environment variables for the dataset, checkpoint
+directory, frozen graph, and an output directory where log files will be written.
+```
+cd benchmarks
+
+export DATASET_DIR=<path to the squad dataset>
+export CHECKPOINT_DIR=<path to the pretrained model checkpoints>
+export PRETRAINED_MODEL=<path to the frozen graph .pb file>
+export OUTPUT_DIR=<directory where log files will be saved>
+```
+
+<model name> <mode> can be run in three different modes:
+
+* Benchmark
+```
+python launch_benchmark.py \
+    --model-name=bert_large \
+    --precision=int8 \
+    --mode=inference \
+    --framework=tensorflow \
+    --batch-size=32 \
+    --data-location $DATASET_DIR \
+    --checkpoint $CHECKPOINT_DIR \
+    --in-graph $PRETRAINED_MODEL \
+    --output-dir $OUTPUT_DIR \
+    --docker-image <docker image> \
+    --benchmark-only \
+    -- infer_option=SQuAD
+```
+* Profile
+```
+python launch_benchmark.py \
+    --model-name=bert_large \
+    --precision=int8 \
+    --mode=inference \
+    --framework=tensorflow \
+    --batch-size=32 \
+    --data-location $DATASET_DIR \
+    --checkpoint $CHECKPOINT_DIR \
+    --in-graph $PRETRAINED_MODEL \
+    --output-dir $OUTPUT_DIR \
+    --docker-image <docker image> \
+    --accuracy-only \
+    -- profile=True infer_option=SQuAD
+```
+* Accuracy
+```
+python launch_benchmark.py \
+    --model-name=bert_large \
+    --precision=int8 \
+    --mode=inference \
+    --framework=tensorflow \
+    --batch-size=32 \
+    --data-location $DATASET_DIR \
+    --checkpoint $CHECKPOINT_DIR \
+    --in-graph $PRETRAINED_MODEL \
+    --output-dir $OUTPUT_DIR \
+    --docker-image <docker image> \
+    --accuracy-only \
+    -- infer_option=SQuAD
+```
+
+Output files and logs are saved to the ${OUTPUT_DIR} directory.
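Because every command in this fragment depends on the four exported variables, it can help to confirm they are set before launching. The following is a small sketch of such a check, not part of the model zoo; the helper name `missing_vars` is hypothetical.

```python
import os

# Variables the launch commands in the fragment above rely on.
REQUIRED_VARS = ("DATASET_DIR", "CHECKPOINT_DIR", "PRETRAINED_MODEL", "OUTPUT_DIR")


def missing_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Set these variables first: " + ", ".join(missing))
```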
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+<!-- 70. Model args -->
+Note that args specific to this model are specified after ` -- ` at
+the end of the command (like the `profile=True` arg in the Profile
+command above). Below is a list of all of the model specific args and
+their default values:
+
+| Model arg | Default value |
+|-----------|---------------|
+| doc_stride | `128` |
+| max_seq_length | `384` |
+| profile | `False` |
+| config_file | `bert_config.json` |
+| vocab_file | `vocab.txt` |
+| predict_file | `dev-v1.1.json` |
+| init_checkpoint | `model.ckpt-3649` |
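To illustrate the ` -- ` convention described in this fragment, here is a rough sketch of how `key=value` tokens after the separator could be merged over the defaults in the table. It is an assumption made for illustration, not the actual parsing code in `launch_benchmark.py`; `DEFAULT_MODEL_ARGS` and `parse_model_args` are hypothetical names.

```python
# Defaults copied from the model args table above.
DEFAULT_MODEL_ARGS = {
    "doc_stride": "128",
    "max_seq_length": "384",
    "profile": "False",
    "config_file": "bert_config.json",
    "vocab_file": "vocab.txt",
    "predict_file": "dev-v1.1.json",
    "init_checkpoint": "model.ckpt-3649",
}


def parse_model_args(argv, defaults=DEFAULT_MODEL_ARGS):
    """Merge key=value tokens found after a standalone "--" over the defaults."""
    merged = dict(defaults)
    if "--" not in argv:
        return merged
    for token in argv[argv.index("--") + 1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        merged[key] = value
    return merged
```

With this sketch, `parse_model_args(["--benchmark-only", "--", "profile=True"])` overrides only `profile` and leaves the other defaults in place.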
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+<!--- 0. Title -->
+<!-- This document is auto-generated using markdown fragments and the model-builder -->
+<!-- To make changes to this doc, please change the fragments instead of modifying this doc directly -->
+# BERT Large Int8 inference - Advanced Instructions
+
+<!-- 10. Description -->
+This document has advanced instructions for running BERT Large Int8
+inference, which provides more control over the individual parameters that
+are used. For more information on using [`/benchmarks/launch_benchmark.py`](/benchmarks/launch_benchmark.py),
+see the [launch benchmark documentation](/docs/general/tensorflow/LaunchBenchmark.md).
+
+Prior to using these instructions, please follow the setup instructions from
+the model's [README](README.md) and/or the
+[AI Kit documentation](/docs/general/tensorflow/AIKit.md) to get your environment
+set up (if running on bare metal) and download the dataset, pretrained model, etc.
+If you are using AI Kit, please exclude the `--docker-image` flag from the
+commands below, since you will be running the TensorFlow conda environment
+instead of docker.
+
+<!-- 55. Docker arg -->
+Any of the `launch_benchmark.py` commands below can be run on bare metal by
+removing the `--docker-image` arg. Ensure that you have all of the
+[required prerequisites installed](README.md#run-the-model) in your environment
+before running without the docker container.
+
+If you are new to docker and are running into issues with the container,
+see [this document](/docs/general/docker.md) for troubleshooting tips.
+
+<!-- 50. Launch benchmark instructions -->
+Once your environment is set up, navigate to the `benchmarks` directory of
+the model zoo and set environment variables for the dataset, checkpoint
+directory, frozen graph, and an output directory where log files will be written.
+```
+cd benchmarks
+
+export DATASET_DIR=<path to the squad dataset>
+export CHECKPOINT_DIR=<path to the pretrained model checkpoints>
+export PRETRAINED_MODEL=<path to the frozen graph .pb file>
+export OUTPUT_DIR=<directory where log files will be saved>
+```
+
+BERT Large inference can be run in three different modes:
+
+* Benchmark
+```
+python launch_benchmark.py \
+    --model-name=bert_large \
+    --precision=int8 \
+    --mode=inference \
+    --framework=tensorflow \
+    --batch-size=32 \
+    --data-location $DATASET_DIR \
+    --checkpoint $CHECKPOINT_DIR \
+    --in-graph $PRETRAINED_MODEL \
+    --output-dir $OUTPUT_DIR \
+    --docker-image intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f \
+    --benchmark-only \
+    -- infer_option=SQuAD
+```
+* Profile
+```
+python launch_benchmark.py \
+    --model-name=bert_large \
+    --precision=int8 \
+    --mode=inference \
+    --framework=tensorflow \
+    --batch-size=32 \
+    --data-location $DATASET_DIR \
+    --checkpoint $CHECKPOINT_DIR \
+    --in-graph $PRETRAINED_MODEL \
+    --output-dir $OUTPUT_DIR \
+    --docker-image intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f \
+    --accuracy-only \
+    -- profile=True infer_option=SQuAD
+```
+* Accuracy
+```
+python launch_benchmark.py \
+    --model-name=bert_large \
+    --precision=int8 \
+    --mode=inference \
+    --framework=tensorflow \
+    --batch-size=32 \
+    --data-location $DATASET_DIR \
+    --checkpoint $CHECKPOINT_DIR \
+    --in-graph $PRETRAINED_MODEL \
+    --output-dir $OUTPUT_DIR \
+    --docker-image intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f \
+    --accuracy-only \
+    -- infer_option=SQuAD
+```
+
+Output files and logs are saved to the ${OUTPUT_DIR} directory.
+
+<!-- 70. Model args -->
+Note that args specific to this model are specified after ` -- ` at
+the end of the command (like the `profile=True` arg in the Profile
+command above). Below is a list of all of the model specific args and
+their default values:
+
+| Model arg | Default value |
+|-----------|---------------|
+| doc_stride | `128` |
+| max_seq_length | `384` |
+| profile | `False` |
+| config_file | `bert_config.json` |
+| vocab_file | `vocab.txt` |
+| predict_file | `dev-v1.1.json` |
+| init_checkpoint | `model.ckpt-3649` |
+
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#
+# -*- coding: utf-8 -*-
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# SPDX-License-Identifier: EPL-2.0
+#
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+    "optimization_parameters": {
+        "KMP_AFFINITY": "fine,verbose,compact,1,0",
+        "KMP_BLOCKTIME": 1,
+        "KMP_SETTINGS": 1
+    }
+}
+
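The `KMP_*` keys in this config are OpenMP runtime settings that are normally passed to a process as environment variables. As a hedged illustration only (how the model zoo actually consumes this file is not shown in the diff), the sketch below reads the `optimization_parameters` block and copies each entry into an environment mapping. `CONFIG_TEXT` and `apply_optimization_parameters` are hypothetical names.

```python
import json
import os

# Same content as the config file added in the hunk above.
CONFIG_TEXT = """
{
    "optimization_parameters": {
        "KMP_AFFINITY": "fine,verbose,compact,1,0",
        "KMP_BLOCKTIME": 1,
        "KMP_SETTINGS": 1
    }
}
"""


def apply_optimization_parameters(config_text, environ=None):
    """Copy each optimization parameter into the environment as a string."""
    environ = os.environ if environ is None else environ
    params = json.loads(config_text).get("optimization_parameters", {})
    for name, value in params.items():
        # Environment values must be strings, so the integer settings are converted.
        environ[name] = str(value)
    return environ
```

Passing a plain dictionary as `environ` keeps the sketch free of side effects, which is convenient when checking the values before a real run.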
