
AMD / ROCm Changes:

  • Improve hip-clang support in build_amd.py (23835).
  • For int64_t atomicAdd, use the available compiler builtin on ROCm. (24854).
  • Use correct WARP_SIZE for ROCm for EmbeddingBag (24868).
  • Switch to rocThrust for thrust/cub APIs (25620).
  • rocBLAS deprecated the last two parameters. (25726).
  • Enable two tests that were skipped because of rocThrust bugs fixed in ROCm 2.7 (25724).
  • Enable jit fusion on ROCm (22872).
  • Remove NULL arguments that have been marked deprecated by rocBLAS (25866).
  • Make sparse coalesce warp size aware (25918).
  • Make spatial depthwise convolution warp size aware (25922).
  • Make lookup table warp size aware (25926).
  • Make persistent softmax WARP_SIZE aware. (25937).
  • Enable unit tests (25963).
  • Enable Unique operator tests on ROCm (26046).
  • Enable more mGPU tests (26055).
  • Make regular softmax warp size aware (25956).
  • Disable test_cuda.test_stream_event_nogil on ROCm (26087).
  • Use MIOpen for transpose convolutions (26172).
  • Switch to the new profiler infrastructure (26174).
  • Enable basic GPU profiling capability on ROCm. (26300).
  • Fix compiler unwrapping step in jenkins build scripts for Caffe2/PyTorch on ROCm (25409).
  • Split PyTorch ROCm tests as 2 CI jobs to run in parallel (26380).
  • Puts ROCm tests on default stream (26394).

Bug Fixes:

  • at::view creates an empty tensor and sets storage instead of cloning (23452).
  • Fix set_grad for extension backends (23516).
  • torch.is_pinned / pin_memory should not copy on already pinned tensors (23484).
  • Fix gemm call for CUDABlas for THCUNN conv, #23545 (23552).
  • Fix CTC loss for zero-length targets on GPU (23298).
  • Adam implementation minor fix (23737).
  • Add flag to temporarily disable MKL-DNN conv (23837).
  • Fix test TestCuda.test_streams_multi_gpu_query (23912).
  • Fix dataloader._shutdown_workers if not all workers are started (23761).
  • Fix crash on torch.Tensor.repeat() for 0 repeats (23766).
  • Fix master (24003).
  • Remove numpy assert that fails on Windows (older numpy versions). (24012).
  • Add missing include header in tensor_numpy.cpp (24042).
  • Fix tensor construction from array (24283).
  • Skip broken test (24453).
  • Fix Typing Error for Padding with asymmetric signatures (24895).
  • Avoid race condition in intrusive_ptr.reset_() (24464).
  • Temporarily fix hub SSL cert issue (25042).
  • Fixes test_equal (25275).
  • CUDA_KERNEL_LOOP: prevent int overflow in loop increment. (24818).
  • Issue #24962: Fix cuda method to support "None" arg for device and a … (25018).
  • Multiple fixes to test_c10d.py. (25334).
  • Attempt to fix windows build (25450).
  • Fix bug in assertNotEqual for int tensors (25412).
  • Fix pow precision (25476).
  • Fix 'in' returning true incorrectly (24156).
  • Fix Windows build (26246).
  • Fix CI (26250).
  • Fix no-auto-batching bugs: cannot bulk load; does not work with namedtuple (26065).
  • Fix cdist gradient computation if first arg is 1xn (26254).
  • Fixes big endian arch bugs. (26383).
  • Fix CI (26593).
  • Fix annotation regex for flake8 (26694).
  • Fix to operate on cuda kernel with clang and libc++ (25553).
  • Do not call cpuinfo_initialize() on non-x86 architectures. (26265).
  • Fix Vec256::abs() for floating point when applied on -0.0 (26422).

Build / CI:

  • Refactor the pytorch_doc_push_script to take a branch (23556).
  • Let user be able to change MKLDNN "-m" flags back and forth in subsequent builds (23608).
  • Fix CPU-only binary testing by properly installing cpu-only first. (23611).
  • Omit local version identifier for default configuration. (23654).
  • add setup metadata to help PyPI flesh out content on pypi package page (22085).
  • Reduce input sets for tests to speed them up. (23692).
  • add appropriate install_requires (23722).
  • cpu binary builds are built with cu100 docker image now instead of cu80 (23772).
  • allow INSTALL_TEST to pass through from env to cmake (23793).
  • Remove unnecessary fetch and reset on builder checkout. (23792).
  • Add CUDA 10.1 to CI. (23791).
  • Remove nightly suffix from nightlies; upload to pytorch-nightly. (23752).
  • Delete Travis CI config (23788).
  • Rename cpu-only to cpuonly, as dash features are not supported. (23879).
  • Roll master to 1.3.0 (23895).
  • No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (23806).
  • Add python_requires to help pip (23863).
  • Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. (23568).
  • Fix build failure on OSX (23998).
  • Don't add local version to Conda packages. (24014).
  • print clang tidy output to stderr (24052).
  • When matching a line in CMakeCache.txt, ensure A=B and "A"=B are matched (23745).
  • Move dict_test.cpp to test folder and fix dict_test.cpp for Aten includes (24071).
  • Build option USE_NUMA should only show up on Linux. (23673).
  • Do not force USE_SYSTEM_EIGEN_INSTALL to be OFF in Python build scripts (23990).
  • Suppress implicit-fallthrough warning on g++ >= 7 in caffe2/utils/math_cpu.cc (24053).
  • Send flake8 to stderr (24100).
  • Move iOS.cmake to the cmake folder (24029).
  • Ignore bugprone-lambda-function-name in clang-tidy. (24190).
  • Ignoring the test logs in case the tests are run from the parent directory (24212).
  • Remove escape_path in our build system. (24044).
  • Enable QNNPACK for iOS (24030).
  • Fix Z7_MSVC_OVERRIDE for C source files (24389).
  • Fix Caffe2 Windows build by switching to ninja. (24330).
  • Configure pytorch-probot (24423).
  • Fix CUDNN location related build issue on Antergos Linux (based on Arch) (24300).
  • Set CUDA arch correctly when building with torch.utils.cpp_extension (23408); see the sketch after this list.
  • Move the search of cuDNN files to FindCUDNN.cmake. (24293).
  • Ensure proper file executable permissions in CI. (24214).
  • Respect pre-defined DOCKER_IMAGE value in binary_populate_env.sh (24787).
  • Remove support for old architectures in cpp_extension and CMake (24442).
  • Build libtorch binary with new ABI (23908).
  • Fix cmake backslash syntax error on Windows. (24420).
  • Move the detection of cuDNN to FindCUDNN.cmake (24784).
  • Attempt to fix windows build. (24916).
  • Move CPU-only jobs to xenial (24506).
  • Skip setting CUDA_NVCC_EXECUTABLE if CACHE_WRAPPER_DIR not set. (25006).
  • disable custom class logic for mobile build to avoid rtti (24994).
  • Turn off fbgemm for libtorch android build (25113).
  • Fix clang-tidy failing all the time on random lines (25078).
  • Fix clang-tidy failing on master (25121).
  • Fix lint checker breakage caused by #25111 (25122).
  • Update QNNPACK submodule to 901e9d4 (25044).
  • Add a skip_override option to should_run_job.py (25118).
  • Switch hub to use requests because of SSL (25083).
  • Ensure tests get passed on Windows (25145).
  • prevent generating caffe2::mkl multiple times (25167).
  • Add myself as a CODEOWNER for better discoverability (25231).
  • Move the detection of cuDNN to FindCUDNN.cmake (24938).
  • Specify width for st.floats in hypothesis_utils.tensor (25188).
  • Add USE_CUDNN check to AT_CUDNN_ENABLED definition (25037).
  • Disable cuda_distributions_test and converter_nomigraph_test on Windows. (25305).
  • Re-enable libtorch tests on Windows (25377).
  • Upgrade to circleci version 2.1 configs (25336).
  • Fix binaries build for BUILD_CAFFE2_MOBILE=OFF (25229).
  • Skip useless macros from Windows.h (25444).
  • Add windows docs for the binaries (23150).
  • Turn off warnings on Windows CI. (24331).
  • Parameterize CircleCI config (25446).
  • Remove BUILD_ATEN_ONLY build option (24441).
  • Fix windows build error when TBB enabled and Windows SDK installed (25398).
  • Remove PYTHON_VERSION (25494).
  • Remove MULTI_GPU (25509).
  • Add set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "-Wl,--no-as-needed") to CMakeLists.txt (25445).
  • Clean up binaries/cmake for mobile (25651).
  • Move USE_STATIC_DISPATCH from CI script to master cmake (25696).
  • Do not pass down USE_GLOO_IBVERBS to CMake (25720).
  • Correctly gate CUDA_ARCH with defined() (25729).
  • Fix cudnn static linkage (25848).
  • Fix invalid function cast warnings that show up with GCC 8/9 (25483).
  • Upgrade NVIDIA driver on CI to 430.40 (24242).
  • Remove tools/setup_helpers/dist_check.py (25879).
  • Remove pthreadpool dependency in aten/CMake (25894).
  • Remove protobuf from Dependencies.cmake for libtorch mobile build (25958).
  • Fix typo in OpenBLAS cmake detection (25966).
  • Simplify code generation - phase 1 (25961).
  • Remove pthreadpool.a from install directory (25977).
  • Remove trailing whitespace in CircleCI configuration files (25987).
  • Change brew update logic to run much faster (25988).
  • Refactor macOS build and test (25930).
  • Run PyTorch macOS CPU-only build/test on all PRs (26096).
  • Use CircleCI commands for brew update/install (26159).
  • Turn should_run_job into command (26160).
  • Turn setup_linux_system_environment into command (26162).
  • Turn setup_ci_environment into command (26163).
  • Nightly build for iOS (26074).
  • Use expected_wrapper only if CMAKE_C_COMPILER and/or CMAKE_CXX_COMPILER is not set by the user (26306).
  • Fix remaining invalid function cast warnings that show up with GCC 8/9 (26104).
  • Rebase CircleCI to master if it is gcc5_4 (26321).
  • Emergency Docker upgrade to version 347. (26466).
  • Use github actions for flake8 (25824).
  • Add a CI Job to Check BC Changes in Function Schemas (26329).
  • prevent generating caffe2::mkldnn multiple times (25257).
  • Add namedtensor build & tests to default sets (26633).
  • Fix github actions for forked PRs (26562).
  • Remove tools/setup_helpers/cudnn.py (25876).
  • Allow building docker without torchvision (26168).
  • Validate Docker version in CI. (26496).
  • Fix CI docker builds (26704).
  • Cuda101 upgrade (26823).
  • Fix building with PARALLEL_BACKEND=NATIVE_TBB (26742).
  • Fix typo in job name: nigthly->nightly (26881).
  • Get rid of -u (expansion of undefined variable) setting (26907).
  • Switch internal CUDA build to C++14 (26757).
  • No sccache (26059).
  • Fix c10 registration binary size (26827).
  • Improve binary size of function schema inference (26860).
  • Fix shared_ptr binary size in op registration (26869).
  • Fix binary size in schema inference (26878).
  • Switch nightly jobs to trigger on 'nightly' branch rather than cron. (26830).
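
A minimal sketch related to the torch.utils.cpp_extension CUDA arch item above (23408), assuming hypothetical local sources my_op.cpp and my_op_kernel.cu exist: the TORCH_CUDA_ARCH_LIST environment variable is what cpp_extension consults to decide which CUDA architectures to compile for, instead of probing the local GPU.

```python
import os

# Pin the target CUDA architectures before building the extension.
# "7.0;7.5" is an illustrative choice (Volta + Turing).
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.0;7.5"

from torch.utils.cpp_extension import load

# JIT-compiles the (hypothetical) sources; the .cu file triggers a CUDA build
# whose nvcc arch flags are derived from TORCH_CUDA_ARCH_LIST.
my_op = load(
    name="my_op",
    sources=["my_op.cpp", "my_op_kernel.cu"],
    verbose=True,
)
```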

Caffe2:

  • Add Cast Op (23548).
  • Remove the confused CPU op (23575).
  • Remove ONNX & Turn on NO_API for mobile build (23546).
  • Include protobuf-defined outputs in the graph cutting algorithm (23557).
  • Support Copy Op (23705).
  • Format only change (23685).
  • Add LambdaRank DCG Loss Option (23679).
  • Fix the bug in regularizer matching (23485).
  • Fix SliceGradientOp to handle properly empty batches (23784).
  • Set caffe2_tvm_min_ops to 8 (23893).
  • Support Gather different indices for different examples in one batch (23813).
  • Add aligned option to RoIAlign (23706).
  • Minor comment fix (22140).
  • SumOp for int32 (23995).
  • Fix typo "properlyh" (24067).
  • OpenCV 4 compatibility fix for caffe2/video (24143).
  • Implement virtual memory computation in caffe2_benchmark binary (24144).
  • Add support for using caffe2::ThreadPool in pytorch mobile QNNPACK. (23658).
  • Hypothesis tests: add ability to enforce shape inference (23935).
  • Make hashing default for bucket-weighted pooling (24266).
  • Fix rotated rect intersection error (24171).
  • Format changes (24270).
  • C2/glow: assign net_pos to a net before applying onnxifi_blacklist_ops (24262).
  • Return list of AccessedFeatures from get_accessed_features (23983).
  • Register FC/Conv DNNLowp separately for supporting both tensor type (24361).
  • Refactor and expose metadata of tum_history layer for online prediction (24290).
  • Put ParseBlackListOps() into caffe2::glow namespace (24384).
  • Implement gradient operator for GatherByKeys. (24348).
  • Add BPR loss to TTSN (24439).
  • Remove gradient value as input from SparseNormalize op (24357).
  • BlackBoxPredictor OSS part N + 1: strip fb/predictor/Transforms.h dependency (23350).
  • Make EmbeddingLookup APIs take offsets rather than lengths to match the PyTorch's EmbeddingBag (24944).
  • Support focal loss in MTML.
  • Implementation of cyclical learning rate (23914).
  • register HeatmapMaxKeypoint with C10 (25191).
  • Add the sparse feature information during logging in sparse lookup layer (24863).
  • Add Int8Transpose operator (16382).
  • Relax roi_width/roi_height check to non-negative (260).
  • Disable Int8Transpose test.
  • Format sparse_lengths_sum_benchmark (25529).
  • Add options to flush cache in SLS benchmarks (25530).
  • Change shape for some ops to reduce variance (25619).
  • Fuse two individual operators into GatherFuse8BitRowwiseQuantFloatMulLengthElim (25519).
  • Enable PiecewiseLinearTransform test on ROCm (25632).
  • Add requests as a legit dependency (25596).
  • Change shape for some ops to reduce variance (25686).
  • Move GetDimFromOrderString to caffe2/core/types.h (25671).
  • Make SparseNormalize backwards compatible (25660).
  • Cyclical learning rate multiplier: use fabs(base_lr) (25628).
  • Remove caffe2.pb.h dependency for embedding_lookup_idx.cc (25670).
  • Enable loading int8 prepacked models in PredictorContainer.
  • Get rid of protobuf dependencies (25650).
  • Fix device_option propagation (25203).
  • Increase input shape to reduce variance (25812).
  • Fix cuDNN build error with CC 3.0 platform (#25820) (25825).
  • Remove cosh_ op test (25893).
  • Enable variable size embedding (25782).
  • Add assert to ensure the divisor is not 0 (25960).
  • Increase failure threshold for timing based assert (25867).
  • Better error messages in C2 ONNX backend (25809).
  • Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (25959).
  • Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (26080).
  • Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (25970).
  • Guard dyndep with a lock (26153).
  • Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler (26260).
  • Average Pooling 3D AVX2 Implementation (26111).
  • Back out "Back out "[Caffe2] Fix device_option propagation"" (25908).
  • Support unpickle py2 NetDef object in py3 (26147).
  • Tvm operator dynolog (26295).
  • Add support for real4bits quant (25426).
  • Add DimType info in dumped debug nets (26589).
  • BlobReference getattr can only throw AttributeError (26654).
  • "fixing" gcc bug introduced with cuda 10.1 (26445).
  • Whitelist ATen/core sources and headers for Caffe2 (26609).
  • Adding OpProfile proto into ProfDAGProtos to support storing operation cost (26677).
  • Use new fbgemm PackedDepthWiseConvMatrix without template parameter (26760).
  • Rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool.
  • Enable batch_size = 0 support in DNNLOWP Concat operator (26849).
  • Use new depthwise conv fbgemm interface (26898).
  • Fix the weird bug in control_flow_op_test.py (26931).
  • Disable cudnn transpose for int types (26934).
  • Expose PiecewiseLinearTransform to PyTorch (26903).
  • Remove LOG(INFO) from math_cpu.cc (27001).
  • Add fakefp16 transformation.

BC-Breaking:

  • Improve handling of mixed-type tensor operations (22273).
  • Migrate comparison ops from the TH to Aten. Added support for type promotion. (26981).
  • Changed tensor comparison return type from uint8 to bool (21113); see the sketch after this list.
  • Add align_corners option to grid_sample and affine_grid, change default to False (24931).
  • torch.pow: port operator from the TH to Aten (23492).
  • torch.flatten returns a 1-dim tensor on a 0-dim tensor (25406).
  • Change schedulers to chainable form (24352).
  • Make options.name_ private, and change all callsites to use options.name() (26419).
  • Remove deprecated torch.gels (26480).
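
A short sketch of two of the behavior changes above; the values in the comments are what the new behavior produces.

```python
import torch

# Comparison ops now return torch.bool tensors rather than torch.uint8 (21113).
mask = torch.tensor([1, 2, 3]) > 2
print(mask.dtype)                    # torch.bool

# torch.flatten on a 0-dim tensor now returns a 1-dim, 1-element tensor (25406).
scalar = torch.tensor(5)
print(torch.flatten(scalar).shape)   # torch.Size([1])
```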

C++ API Parity:

  • Support custom autograd functions in C++ (23572).
  • Allow empty Variables to be saved for backwards (23618).
  • Tests for C++ custom autograd function API (23628).
  • Hooks for C++ API (24393).
  • C++ ModuleList (24317).
  • Build libtorch binary with new ABI (23908).
  • Templatize Tensor.data_ptr() (24847).
  • bind autograd functions into C++ (24342).
  • Deprecate tensor.data(), and codemod tensor.data() to tensor.data_ptr() (24886).
  • Add Python/C++ torch.nn API parity test harness (23852).
  • Add Python/C++ API parity tracker for torch.nn (25289).
  • Use constructor in test_params for C++ API parity test (25749).
  • Map module options between Python and C++ in API parity test (25784).
  • Make various improvements to C++ API parity test harness (25828).
  • C++ Fold nn module (24160).
  • Fix LBFGS on GPU (25909).
  • L1Loss module (25902).
  • C++ MaxPool Module (24860).
  • C++ Average Pool Module (25800).
  • C++ unregister_module function for Module (26088).
  • C++ API parity: at::Tensor::data (26008).
  • Re-organize C++ API torch::nn folder structure (26262).
  • C++ API parity: at::Tensor::grad (26150).
  • C++ API parity: at::Tensor::is_leaf (26186).
  • C++ API parity: at::Tensor::output_nr (26216).
  • Support multidimensional inputs to torch::tensor (26210).
  • Distance module (26424).
  • Fix options usage in C++ module / optimizer constructors (26483).
  • C++ API parity: at::Tensor::version (26217).
  • Minor improvement to C++ nn::Distance tests (26539).
  • C++ API parity: at::Tensor::version (26561).
  • C++ API parity: at::Tensor::detach (26251).
  • C++ API parity: at::Tensor::set_data (26647).
  • Add tests for C++ functional cosine_similarity and pairwise_distance, and clean up functional test code (26559).
  • Add C++ nn::Identity (26713).
  • Add comments for multidim tensor factory limitations, and rename ListInitTensor for better clarity (26756).
  • Improve C++ maxpool and avgpool (26521).
  • C++ API parity: TensorTest.Data fix (26920).
  • C++ API parity: AdaptiveMaxPool1d (26755).
  • C++ API parity: AdaptiveMaxPool2d (26772).
  • C++ API parity: AdaptiveMaxPool3d (26775).

Distributed:

  • Extract common classes and functions from test_c10d to common_distributed (23660).
  • Sync and async torch.distributed.rpc for builtin operators (23228).
  • python udf over rpc (23569).
  • Fix naming convention inconsistency and formats in test_rpc.py (24407).
  • Use c10::ThreadPool to send and receive messages (23968).
  • Use snake names for all files in distributed.rpc (24502).
  • throw remote exception on client side (24138).
  • Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (25012).
  • Return a message instead of void from rpc udf (25283).
  • Basic framework for Distributed Autograd context. (24875).
  • Add missing call to DistAutogradContainer::init (25391).
  • Remove an unused member var (stop_) in process_group_agent (25392).
  • Assign each RpcAgent a unique ID, and use ID for sending RPC messages. (24195).
  • Cuda devices should have same dtype (25470).
  • Multiple fixes to test_c10d.py. (25441).
  • Move worker name collection code from Python to C++ (24260).
  • Attach 'send' autograd function to the autograd graph as part of RPC. (24876).
  • Error phrasing in torch.distributed helper functions (25574).
  • Make scatter/gather arguments optional (25575).
  • Run clang-format on torch/csrc/distributed (25647).
  • Build torch.distributed with Gloo backend on macOS (25260).
  • Adding RRef as return value for builtin operators (25169).
  • Only default USE_DISTRIBUTED=True on Linux (25725).
  • Adds a -m flag to pytorch.distributed.launch (24910).
  • Use whitelist instead of blacklist for USE_DISTRIBUTED (25759).
  • Change worker name constraint (25780).
  • Make Python RPC handler not hold module in a global variable (25458).
  • Retry connecting to TCP store on ECONNRESET (25707).
  • Make python rpc handler a singleton class (25742).
  • Disable flaky test_invalid_names in test_rpc.py (25916).
  • Remove global group name tracking for ProcessGroupNCCL (25905).
  • Dynamic registration of RPC backends (25734).
  • Add ProcessGroupGloo::createDefaultDevice (26166).
  • Clarified ambiguous docstring in NegativeBinomial (25923).
  • Make ProcessGroupAgent take num_send_recv_threads as constructor argument (26313).
  • Remove extra get_worker_id call in distributed rpc init (26381).
  • Make destructor virtual for class with virtual function (26504).
  • Use timeout in connect function to prevent against (26364).
  • Corrected variable name and added test (26503).
  • Add timeout parameter to connect function in TCPStore (26554).
  • Added test case for reinit (26506).
  • Add function to get NCCL version for logging (26583).
  • Add bitwise distributed reduction ops (26824); see the sketch after this list.
  • RPC Backend Registry (26919).
  • Acquire GIL before creating py::object in RPC python handler (26988).
  • Support re-creating/destroying process groups when some trainers recover after failures (26912).
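
A minimal sketch of the bitwise reduction ops item above (26824). It assumes the script is launched with one process per rank (for example via torch.distributed.launch, which sets MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE) and that the chosen backend supports the bitwise ops.

```python
import torch
import torch.distributed as dist

# Rendezvous using environment variables provided by the launcher.
dist.init_process_group(backend="gloo")

# Each rank contributes one bit; ReduceOp.BOR ORs them together across ranks.
flags = torch.tensor([1 << dist.get_rank()], dtype=torch.int64)
dist.all_reduce(flags, op=dist.ReduceOp.BOR)
print(flags)  # every rank ends up with the OR of all per-rank bit masks
```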

Distributions:

  • Fix log_prob() in torch.distributions.Uniform, HalfCauchy and Gamma (23017).
  • Implement bool_tensor.bernoulli_ (25076); see the sketch after this list.
  • Fix CUDA distributions test on Windows (25539).
  • Fix the Bernoulli distribution sampler (26864).
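
A tiny sketch of the bool_tensor.bernoulli_ item above (25076): bernoulli_ can fill a bool tensor in place, with each element independently True with the given probability.

```python
import torch

# Sample a boolean mask directly, without going through uint8 or float.
mask = torch.empty(2, 3, dtype=torch.bool).bernoulli_(0.25)
print(mask.dtype)  # torch.bool
print(mask)        # e.g. tensor([[False,  True, False], [False, False, False]])
```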

Documentation:

  • Fix typos in .circleci/README.md (23588).
  • Documentation for Tensor.record_stream() (24078).
  • Use prerendered KaTeX in docs. (23376).
  • Documentation cleanup (23148).
  • Fix a typo in Functions.cpp (23615).
  • Slightly improve dataloader docs on when auto-batching is disabled (23671).
  • Adjust maintainers list (23693).
  • Fix align_corners doc (23707).
  • Document empty_strided (23735).
  • Updated docs and added deprecation warnings to acknowledge a bool tensor (22261).
  • Document bool tensors for bitwise_not. (23800).
  • Fix typos in op_registration.h (23770).
  • fix torch.frac documentation (23830).
  • Delete placeholder so top-level CONTRIBUTING.md is used (23869).
  • Replace descriptions of args in doc with template (23439).
  • Fix docstring for argmax (23775).
  • Adds torch.random to docs/toc (23553).
  • Migration doc fixes (24033).
  • Updated SGD docs with subscripts (23985).
  • Add interfaces in lr_scheduler.pyi (23934).
  • Document benchmarking practice for CUDA (23910).
  • Added .pyi file for flatten (24459).
  • Test if descriptions of args are in the template (24161).
  • Add docs to CI (24435).
  • Add ASAN instructions to CONTRIBUTING.md (24848).
  • Fixed Error in Transformer Example (24837).
  • Fix the lint error in transformer doc. (25027).
  • Typo correction in cuda_deterministic_backward.rst (25011).
  • Fix typo (25238).
  • Added documentation for nn.functional.bilinear (24951).
  • Describe the relation between fold and unfold operations. (24840).
  • logical_xor doc cleanup (25364).
  • Fixed flatten docs (I think) (25544).
  • Add copy logic for LibTorch to avoid issues on Windows (25556).
  • Update index.rst (24245).
  • Documentation for cdist (25221).
  • Update Transformer.py comments to include a full example (25411).
  • Alphabetize Package Reference section in Docs (25666).
  • Add CosineAnnealingWarmRestarts to optim documentation (25421).
  • add torch.nn.Identity to init.pyi.in (25777).
  • Documentation change of torch.where (25554).
  • Fix typo: toDense --> to_dense (25832).
  • Argument 't', mis-referenced to 'torch.t()' (25885).
  • Fix typo in dataloader.py docs. (26263).
  • Clarify and correct the doc of atan2. (26180).
  • Add warning to anomaly_mode doc (26615).
  • Add instructions for building documentation (26553).
  • Highlighting in the doc that square root comes before adding epsilon (26735).
  • Add documentation for overload names (23844).

Improvements:

  • Port resize_as_ and clone from TH to Aten (23027).
  • support Gather different indices for different examples in one batch (23285).
  • Remove old Type based backend extensions (22009).
  • Update MKL to 2019.4 for Windows (23583).
  • Remove AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF, which isn't used anymore. (22932).
  • Rename AT_FORALL_SCALAR_TYPES_WITH_COMPLEX to AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_STUBS (23336).
  • Allowing batching for det/logdet/slogdet operations (22909).
  • Use dst dir for temp file (23629).
  • Add overload names to native_functions.yaml (23532).
  • Adam/AdamW implementation minor fix (22628).
  • Remove useless code from shape info (23663).
  • Move addcmul to Aten (22874).
  • Migrate neg's CUDA implementation to ATen. (23617).
  • Bump Gloo (23400).
  • Zero sized tensor support for repeat_interleave (23717).
  • Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (23701).
  • Channels last stored in tensor (23391).
  • Recommend ~ and bitwise_not() when user tries to apply neg (-) on a bool tensor. (23621).
  • Negate halves on GPU using __hneg() when possible, instead of using float conversion. (23626).
  • Rename previously THNN conv kernels to have naive_ prefix. (23790).
  • Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (23833).
  • Remove K and N function arguments for fbgemm_pack_quantized_matrix (22956).
  • cleanup torch/nn/functional.py (23977).
  • Move addcmul to Aten(CUDA) (23814).
  • Enable Add, sub, mul, and div on CPU for bfloat16 type. (22851).
  • Removing deprecated warning message from torch.h (24002).
  • port atan2 from TH to ATen (23558).
  • Port addcdiv operator from the TH code to Aten (23683).
  • Add instruction on how to nest nn::Sequential (23939).
  • Refactor randperm test (23526).
  • Delete unnecessary file split_types.py (23754).
  • Make all at::Tensor in-place methods const (23945).
  • Fix scale and zero_point names (23991).
  • Allow forward functions with single output to return Variable (23803).
  • Fix regression in triangular_solve when number of batches = 1 for CUDA (23953).
  • make more iterator attributes private (23744).
  • Fixed Bool in IsIntegralType bug (plus review comments) (23942).
  • Support torch::tensor and at::tensor with bool and BFloat16 dtypes. (23337).
  • Don't redefine unnecessary type stub. (23338).
  • Port addcdiv operator from the TH code to Aten (24086).
  • Added type annotations to unpooling layers (24101).
  • Unboxed kernels in c10 (23447).
  • Allow kernels that don't have a boxed version (23665).
  • c10 dispatcher stores autograd kernels (23666).
  • Move TensorOptions to ATen/core (22020).
  • Optimizing out the division in the fusion (23275).
  • add function name to error messages generated by checked_tensor_unwrap (24187).
  • Fix C412 lint from flake8-comprehensions update. (24184).
  • Align AT_FORALL macros with AT_DISPATCH macros. (23339).
  • Remove unused parameter from FORALL macros and rename STUBS to QINTS. (23340).
  • Thread local debug info (22365).
  • Cleanup warnings (24133).
  • Enable FBGEMM tests under UBSAN as well (23570).
  • toString(FunctionSchema) shows overload name (23694).
  • Disambiguate tensor and string ops (23748).
  • Simplify tests that should cover all possible devices (23824).
  • reduce memory usage for centered rmsprop (24170).
  • Enabled comparison ops for bfloat16 dtype on CPU (24182).
  • Rename torchtest.test_all_device_types to torchtest.for_all_device_types (24337).
  • Fix expansion of stride argument in max_pool2d (23954).
  • Fix expansion of stride argument in max_pool3d (23960).
  • Fix expansion of stride argument in avg_pool2d (23961).
  • Fix expansion of stride argument in avg_pool3d (23963).
  • Resolve unused variables in tests (24075).
  • Sanity fixes for bitwise_not (24296).
  • Fix issue with single memory location being written multiple times (23574).
  • Add logical_not operator (23839).
  • Add logical_xor operator (23847).
  • Recommend logical_not() instead of bitwise_not() when applying sub and neg on bool tensors. (23860).
  • Enabled masked methods for bfloat16 (24183).
  • Make aten_to_numpy_dtype in tensor_numpy.h public. (23943).
  • Let logical_not support non-bool tensors. (23916).
  • Let logical_xor support non-bool tensors. (23978).
  • Exposing the API for use with pytorch/tvm repo. (24430).
  • Assert weight_observer has the correct dtype (24436).
  • Enabled torch.mm and torch.mv for bfloat16 (24224).
  • Don't require slow test reporting in run_tests.py --pytest (24448).
  • Modify symmetric eigendecomposition derivative (23018).
  • Remove unused files from THNN and THCUNN (24820).
  • Allow SyncBatchNorm without DDP in inference mode (24815).
  • Enable torch.eye for bool and half (24148).
  • Allow torch.tril / triu to handle bool and half inputs (24163).
  • TensorIterator::binary_op input-output overlap check (24058).
  • Make SobolEngine use random seed if not specified (24884).
  • Add static dispatch mode to reduce mobile code size (22335).
  • Use a ptr to store autograd profiler rng (24889).
  • Fix deprecation warnings (24841).
  • Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (22907).
  • Remove unused ATen headers for mobile (24850).
  • Improve c10 dispatcher lookup perf (24882).
  • Add epsilon argument to Adagrad optimizer (24980).
  • Migrate erfinv and erfinv_ from the TH to Aten(CPU) (24908).
  • Fix for cdist backward for non-batch tensors (22915).
  • Remove deprecated TH(topk) code. #24778 (24857).
  • Disable tsan for test_dataloader.py. (25005).
  • Fixed test_numba_integration (25017).
  • pin_memory thread now uses 1 thread only (25111).
  • print padding_mode for Conv modules if not zeros (23996).
  • Use the EmbeddingLookup API which takes the offsets instead of lengths (24945).
  • torch.from_numpy fix for np.int (25139).
  • Creates Torch-friendly Event class and adds Stream tracking to autograd (25130).
  • generic overrideable convolution for backends (23562).
  • Optimize LeftRight and either (25133).
  • data -> data_ptr: upgrade the deprecated APIs (25223).
  • Moving sign function to ATen (22861).
  • Add TORCH_WARN_ONCE, and use it in Tensor.data() (25207).
  • Upgrade the deprecated data to data_ptr APIs (25295).
  • upgrade MKL-DNN to v0.20.3 (22910).
  • Remove some unused plugins. (25201).
  • Disable the copy constructor and = operator of DispatchStub (24932).
  • Fix infer np scalar dtype mem leak (24267).
  • Align AT_FORALL macros with DISPATCH macros wrt Half. (25268).
  • Implementation of cpu_serial_kernel for TensorIterator (25125).
  • Migrate digamma, digamma_, polygamma, and polygamma_ from the TH to Aten (CPU) (25048).
  • note location (25311).
  • Migrate erfinv and erfinv_ from the TH to Aten (CUDA) (24943).
  • Support all_reduce on a list of same-device tensors (#21640) (24949).
  • Fix typo "takes takes" -> "takes" (24785).
  • Remove unused THTensor_(add) and similar functions code. (24864).
  • Move new_criterion_tests from test_nn.py to common_nn.py (25333).
  • Use C10_DEPRECATED_MESSAGE instead of TORCH_WARN_ONCE for Tensor.data() (25319).
  • Extend nn.Transformer to support BERT (gelu) (24181).
  • Fix possible deadlock in SharedCache inside a forked child proc (25158).
  • Fix double backward of inplace op on view (23502).
  • change LBFGS's default tolerance_grad to 1e-7 (25240).
  • Add OneCycleLR (25324).
  • Invariant typevar matching on callsite checks (25136).
  • Fix lint (25371).
  • Fix bug in assertNotEqual for int tensors (25199).
  • Kill THNN function auto generation. (25322).
  • Move the CUDA implementation of ceil to ATen. (24866).
  • Replace open registration TensorTypeId with closed enum. (25252).
  • Remove THNN sparse autograd Functions. (25323).
  • Kill ConvTransposeMixin.forward, which doesn't seem to be used. (25326).
  • Kill backend-specific lookup in CrossMapLRN2d, as it never succeeds. (25331).
  • Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (25339).
  • Adding ModuleList to modules.h (25346).
  • Fixed masking warnings in tests (25317).
  • Add support for non-affine batch norm with float stats and half inputs (22750).
  • Fix allreduce_coalesced tests in c10d (25419).
  • Remove Module._backend as it's not used anymore. (25342).
  • Delete toType(const DeprecatedTypeProperties&, ...) (25332).
  • Update QNNPACK submodule to 7d2a4e9 (25400).
  • Compare shapes of outputs and grad_outputs in autograd.grad (25349).
  • Stop initializing THNN backend. (25352).
  • Stop doing nn wrap. (25353).
  • Fixes #25454 (25456).
  • Migrate clamp and clamp_ from the TH to Aten (CPU) (25290).
  • Get rid of torch._thnn (25354).
  • Get rid of more unused plugins (25355).
  • Get rid of extract_cwarp (25356).
  • Update derivatives.yaml docs to refer to Declarations.yaml rather than Declarations.cwrap. (25357).
  • Kill non-shared cwrap tools. (25358).
  • Delete a few cases where we directly use Backend/TensorTypeId. (25467).
  • Fix implicit fallthrough warnings in FeatureLPPooling.cu (25451).
  • Update speed benchmark binary to work in USE_STATIC_DISPATCH mode (25449).
  • Migrate CPU_tensor_apply to TensorIterator in TensorCompare.cpp (25402).
  • Run clang-format on torch/lib/c10d (25382).
  • Checks requiring GPU moved to their own test (25555).
  • Test_allreduce_coalesced_stress message passed in as kwarg (25557).
  • Delete torch/csrc/nn/type_checks, which aren't used anymore (25506).
  • Create helpers for implementing unary ops whose CUDA implementation is ATen (24879).
  • Implement indexing methods for sparse tensors (24937).
  • Migrate multinomial from the TH to ATen (CPU) (25274).
  • Enable broadcasting of batch dimensions RHS and LHS tensors for lu_solve (24333).
  • Get rid of _th_reciprocal_. (25507).
  • Enable torch.cholesky for batches > 262140 (24438).
  • Don't save self in index backward (25594).
  • Eliminate magic numbers in BatchLinearAlgebra.cu (25524).
  • Allow TensorMethods.h to include Dispatcher.h (alternative) (23888).
  • Fix clang-tidy script (25652).
  • Kill unused enumerate_options_due_to_default. (25588).
  • Kill discover_sparse_tensor_operations. (25589).
  • Cpu-strided-complex support for binary-ops (25534).
  • Port new_empty to ATen (25475).
  • Port new_full to ATen (25583).
  • Rename 'mobile' to 'static_dispatch' (25695).
  • Bring back skipped bitwise dispatch (25689).
  • Align AliasInfo's operator<< with FunctionSchema (23206).
  • Migrate digamma and polygamma from the TH to Aten (CUDA) (25662).
  • Remove tools/setup_helpers/cudnn.py (25482).
  • Enable BLIS from the FLAME project as a BLAS choice. (23819).
  • Expose parse_schema and eq function to python and add round trip tests (23208).
  • Fix error message stack overflow (25146).
  • Fix typing on nn.Parameter (25586).
  • More accurately describe field invariants in OperatorEntry (25793).
  • Enable log_softmax and CrossEntropyLoss for bfloat16 (24457).
  • Fix missing str to int conversion in the commit f71ddd42 (25861).
  • Fix test_det_logdet_slogdet_batched on PowerPC (25773).
  • Unify treatment of warp size / wave size (25884).
  • Make torch checks same for both CPU and CUDA multinomial (25595).
  • In the CUDA implementation of erfinv, erfinv() should be used for double (25337).
  • Fix cpp_extensions test failures with GCC 9.1 from ArrayRef(initializer_list) (25384).
  • Rename packed tensor accessor (25654).
  • Gate static aten registerer with USE_STATIC_DISPATCH (25815).
  • Tensor type set (25308).
  • Enable libflame as a LAPACK choice (25795).
  • Fix scatter CPU kernel when (input size, src size) > index size (25839).
  • Migrate pow from TH to Aten (CUDA) (25517).
  • Fix int32 overflow in SummaryOps.cu getBin #25747 (25748).
  • Simplify header inclusion in test/cpp/api/modules.cpp (25921).
  • Compute common dtype based on inputs only (25593).
  • Updates autograd engine to respect streams set in forward (8354).
  • Make running Gloo tests conditional on availability (25913).
  • Remove superfluous check for POLLIN in TCPStore (25911).
  • The float version of calc_digamma should return float type. (25488).
  • Add VariableTensorId, store it in TensorTypeSet (25597).
  • Add torch.backends.mkldnn.enabled flag (25459).
  • Skip TestAutograd.test_deep_reentrant on macOS (25942).
  • Skip TestHub on macOS (26033).
  • Refactor torch.*solve tests (25733).
  • Enables _do_cuda_non_default_stream (25989).
  • Skip test_triangular_solve_batched (26108).
  • Stop re-ordering TH(C)Blas arguments. (25606).
  • Kill TH(C)Blas kwarg_only declarations. (25607).
  • Stop reordering TH random function arguments. (25608).
  • Fix base_lr overridden in cyclic lr (26105).
  • Kill kwarg_only declarations in Declarations.cwrap. (25609).
  • Add device check before accessing data_ptr in PackLayer (26056).
  • Add data field to Tensor pyi. (26093).
  • Kill most defaults in Declarations.cwrap. (25610).
  • Get rid of more defaults in Declarations.cwrap. (25611).
  • Kill remaining defaults in Declarations.cwrap. (25612).
  • Remove requests as dependency (26083).
  • Make schema part of RegisterOperators::Options (26114).
  • Allow overwriting catch-all kernels (25947).
  • Creates generic device type testing framework (25967).
  • Add sync to flaky test_events_multi_gpu_query (26231).
  • Add possible out of shared memory error message (25730).
  • Ports most of test_torch.py to generic device type framework (26232).
  • Add type hint for cuda.set_rng_state (26200).
  • Call aten ops through c10 dispatcher (23668).
  • Remove unboxedAutogradKernel from c10 (26130).
  • Refines test_torch.py generic device testing (26244).
  • Fix binary size of OpsAlreadyMovedToC10.cpp (26237).
  • Migrate away from using Variable( in test_nn.py (26077).
  • Enabled conv methods for bfloat16 (26167).
  • Move the CUDA implementation of round to ATen. (25041).
  • Kill defaults in nn.yaml. (26282).
  • Add s390x compiler define for s390 builds. (26233).
  • Add derivative of cholesky_solve (26185).
  • Kill 'default_init', which isn't needed anymore. (26281).
  • Adds generic device tests to test_autograd.py (26248).
  • Ensure that n is non-negative in polygamma. (26294).
  • Enable batching for pinverse (26095).
  • Make TORCH_WARN_ONCE capture variables by reference (26289).
  • Fix race in CUDA initialization (25788).
  • Kill declared_type and ignore_check from THFormal. (26284).
  • Replace simple if_true / if_false cases in Declarations.cwrap. (26285).
  • Fix typo (26298).
  • Enabled bfloat16 dtype on CUDA (26148).
  • Move more ops to c10 (26255).
  • Remove dead function (26259).
  • fix ctc_loss argument check error message (26325).
  • Skip testing triangular_solve_batched on non-default CUDA stream (26115).
  • Kill if_true / if_false in Declarations.cwrap. (26346).
  • enable xla cpp tests in CI (26347).
  • Resolve #25605 cyclic reference in _LRScheduler (25776).
  • use allgatherv for sparse all reduce (23917).
  • Removes torchtest, expands generic device testing (26374).
  • Add a float version of calc_erfinv (by templating) on CPU (26070).
  • Fix type mismatches in the CUDA version of calc_digamma and calc_trigamma (25791).
  • Adds dtypes decorators to and allows helper methods in device generic test classes (26375).
  • Fix composite learning rate (26227).
  • Move the CUDA implementation of rsqrt to ATen. (25285).
  • Add a flat hashmap (26371).
  • Preserves insertion and deletion order in flat hashmap (25675).
  • Moves more tests to TestTorchDeviceType (26435).
  • Tag files should not be deleted by "python setup.py clean". (26416).
  • Implement multiple dispatch (25653).
  • Enabled where for bool tensor on CUDA (26430).
  • Implement multiple dispatch (26468).
  • Port lgamma from TH to Aten (25138).
  • Make c10::Scalar::to() const (26406).
  • Allocate empty tensor instead of empty_like in binary ops, fix pow (26498).
  • Implement multiple dispatch (#26468) (26501).
  • Move the CUDA implementation of floor to ATen. (25372).
  • Fix for Conv shape check prints overflowed ints (25827).
  • c10::KernelFunction (26337).
  • Add two levels to use_c10_dispatcher (26272).
  • Correct the test of a big number (2 ^ 31) (26491).
  • Enable creation of boxing wrappers for some aten operators (26273).
  • ATen port of lgamma (cuda) (26600).
  • Enabled bfloat16 dtype on CUDA (26407).
  • Makes test_indexing.py device generic (26634).
  • Allow batch size of 0 in Conv (26214).
  • A few hub improvements (25980).
  • Updates and extends TestNNDeviceType (26638).
  • Enable registering stackbased kernels with lambdas (26658).
  • Move the CUDA implementation of trunc to ATen. (25423).
  • Add derivative for cholesky_inverse (26451).
  • Vectorize unary operator erfinv (26629).
  • Expands TestAutogradDeviceType (26708).
  • Enable hub tests on MacOS (26697).
  • Simplify operator sign using the helper. (25592).
  • Address review comments in https://github.com/pytorch/pytorch/pull/26272 (26587).
  • Add whitelist for backward compatible checks for function schemas (26740).
  • Expose a torch.result_type and simplify tensor iterator (26012).
  • Delete backwards compatibility Backend overload for registerOp (25914).
  • Implement multiple dispatch in boxed c10 dispatcher (26118).
  • Remove unnecessary include from TensorBody (26360).
  • Add some missing constructors to IValue. (26718).
  • Hub improvements (26723).
  • Upgrade sleef to v3.4.0. (26749).
  • Lets generic tests use multiple devices (26594).
  • Refactor checked_tensor_unwrap to take DeviceType instead of Backend (26290).
  • Port CUDA implementation of expm1 to ATen (26598).
  • Remove one unnecessary copy of the output during the type promotion. (26816).
  • Fix Future default constructor missing for ParallelNative (26739).
  • Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (26592).
  • torch.load default encoding change to 'utf-8' (26421).
  • Move the CUDA implementation of log to ATen. (26494).
  • enable double backward for non-cudnn LSTM and GRU (26660).
  • Migrate multinomial from the TH to Aten (CUDA) (26481).
  • Remove three unused declarations. (26699).
  • Make resize_as_ generic, so XLA works. (26809).
  • Add some missing constructors to IValue. (26806).
  • Change calling convention of ATenDispatch from getOp to callUnboxed. (26857).
  • Refactor dispatch structure so fallback code lives inline. (26367).
  • Fix nuclear norm with requires_grad=True (26303).
  • Choose num_threads in parallel_for based on GRAIN_SIZE (26886).
  • Use intrinsics for trigonometric functions on CPU (26431).
  • Remove an unused function propagate_names_if_namedtensor_enabled (26176).
  • Migrate lt and lt_ from the TH to Aten (25998).
  • Make TypeDefault, TypeDerived and VariableType anonymous namespaces (26882).
  • Move Generator ops to c10 (26434).
  • Add torch.can_cast(from, to) function (26805).
  • Include iteration_ in SGD optimizer serialization (26906).
  • Make repeat respect the current stream (26946).
  • Fix issues in torch::tensor constructor (26890).
  • Named tensor support for: index_fill_, index_fill, squeeze, median(Tensor) (26914).
  • Add std::variant backport as torch::variant (26836).
  • fix type annotation (26930).
  • Bring back the optimization of integer.pow({2.0, 3.0}) on CPU (26938).
  • Add torch.promote_types function (26655); see the sketch after this list.
  • Rewrite argmax and argmin as TensorIterator reductions (26181).
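
A short sketch of the type-promotion helpers listed above (torch.result_type, 26012; torch.promote_types, 26655; torch.can_cast, 26805); the commented outputs follow the standard type promotion rules.

```python
import torch

# Dtype a mixed-type binary op would produce (int32 tensor + Python float).
print(torch.result_type(torch.ones(3, dtype=torch.int32), 1.5))  # torch.float32

# Promote two dtypes directly.
print(torch.promote_types(torch.int32, torch.float32))           # torch.float32

# Whether a cast is allowed under the casting rules.
print(torch.can_cast(torch.float64, torch.int64))                # False
print(torch.can_cast(torch.int32, torch.float32))                # True
```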

Jit:

  • Cleanup interface of inlineCallTo. (23539).
  • Make ProfiledTensorType hashable (23116).
  • add log stmts to peephole.cpp (23279).
  • add docs for serialization (23456).
  • Move overview to docs/ folder (23457).
  • Include recursive class compilations in error call stack (23454).
  • add a test for inline tracing (23543).
  • format jit_type.h (23564).
  • Add logging to Alias Analysis (23383).
  • Update relative links in OVERVIEW.md (23627).
  • prefix module qualified names with module (23630).
  • allow forward hooks in tracing (23613).
  • Add in-place check to AliasDb (23210).
  • Support nn.GRU in script (23266).
  • Remove more uses of DimensionedTensorType (23060).
  • Compress debug symbols when serializing TorchScript models. (23659).
  • Fix frontend error message (23576).
  • Compress all non-Tensor components of a serialized TorchScript model. (23723).
  • Initial torchbind prototype (21098).
  • Perform string uniquing by value in pickle serialization. (23741).
  • don't try to set training after ScriptModule has been initialized. (23680).
  • Open up AliasAnalysisKind for any ops (23810).
  • make nn.LSTM accept PackedSequence instead of Tuples (23643).
  • fix some compiler warnings (23816).
  • Properly mangle nn.Module.__construct (23779).
  • Define toIValue conversion for dtype (23708).
  • format init.cpp (23840).
  • Recursive script migration guide (23892).
  • Erase shape information from class types (23362).
  • Make typing understand exceptions (23565).
  • Disable optimizer for __setstate__ (23698).
  • Make assertions refine types (23949).
  • add NotIn support in script (23637).
  • metacompile isinstance checks (23885).
  • add support for overloading functions (23886).
  • jit.script() testing and fixes (23891).
  • support tensor as key type in script (23638).
  • make _overloads importable in nn/functional (24049).
  • [jit] make sure NamedTuples have unique qualified names (23798).
  • serialize all c++ frontend modules to a single CU. (23645).
  • fix py-compat fbcode lint warnings (23530).
  • Add Pickler C++ API (23241).
  • Moves clamp from autodiff cpp to symbolic script (23927).
  • support dict augment assign in script (23639).
  • Open up AliasAnalysisKind for any ops (23834).
  • Fix builtin function reference (24056).
  • add initial support for sparse tensors (23841).
  • support grad and data attribute for tensor in script (23842).
  • JIT Serialization of nnq.Linear (24048).
  • Replace Module::copy_into with Module::clone. (24068).
  • serialize modules as classes (23098).
  • Delete WeakScriptModuleProxy (23398).
  • Fix trace docs (24191).
  • search class type for methods (23689).
  • simplify NamedType interface (23691).
  • make NamedType an interface (23696).
  • make FunctionType a NamedType (23697).
  • class_table_ to deps_table_ (23845).
  • clean up import_source (23846).
  • Use JIT function schema parser to parse builtin RPC ops (24207).
  • Remove DimensionedTensorType (24077).
  • Fix flake8 issues in ./torch/jit (24240).
  • Fix missing version < 2 guard in import (24255).
  • Add logging to autodiff (23664).
  • fix test_jit.py so it can be run in parallel (24311).
  • fix list comprehension type assumed to be the same as input type (24271).
  • simplify NamedType interface (24278).
  • make NamedType an interface (24279).
  • make FunctionType a NamedType (24280).
  • class_table_ to deps_table_ (24281).
  • clean up import_source (24282).
  • Add the ability to compile exports on traced modules (24298).
  • Cleanup documentation around script and trace (24208).
  • Add trace_module to docs (24258).
  • JIT trace testing (23987).
  • copy methods when creating a derived class type (24349).
  • kill TK_NAMED_TUPLE_DEF (24350).
  • Misc doc updates / fixes (24371).
  • remove CompleteTensorType (24169).
  • Remove type subclassing (24257).
  • fix IR parsing bug (24294).
  • pickler read guard (24433).
  • Fix test_jit_cuda_archflags failure on py27 due to changing dict order. (24501).
  • fix double copying of constants (24412).
  • jit_log: Extract a function that prefixes all lines of a string with another string. (24355).
  • Module: add dump function that recursively prints contents of the module. (24356).
  • Clear recursive error stack on each compilation (23458).
  • Add @ignore for script classes (23614).
  • Record function name as an attribute of CallFunction nodes. (24446).
  • big cpp test reorg (24801).
  • Cache node operators to speed up optimization (24827).
  • Fix VaryingShape::merge (24455).
  • Make torch.jit.Attribute work with PYTORCH_ENABLED=0 (23851).
  • Moves (most) ops to symbolic script (23794).
  • Fix unicode in comments (24218).
  • serializing function calls (23799).
  • Removes SymbolicVariable from tests (24007).
  • Merge ProfiledTensorType and TensorType (24284).
  • Remove unused DynamicDAG class. (24890).
  • extend torch.jit._overload to module methods (24259).
  • Remove torch.contrib._graph_vis (24874).
  • Fix missing super call error (24852).
  • bind autograd.grad function into TorchScript (24871).
  • restore default constructor of OutputArchive (24955).
  • Misc doc updates #2 (24445).
  • Fixing size implementation for struct slot_list_impl (24351).
  • add support for multiple assignment statements (24477).
  • Load tensors directly from pickle archive (23281).
  • Fix fbcode weak ordering (25026).
  • Fix bugs in assignment to optionals (24989).
  • Fix a bug in creating a prefix string in jit log. (25051).
  • cleanup tmp name generation (25065).
  • jni-java wrapper for TorchScript api (25084).
  • Fix python lints for generate_test_torchscripts.py (25107).
  • Clean up after running doc tests (25036).
  • fix annotated assignment (25094).
  • dictPop: dereference dict.find() iterator before calling dict.erase() (25056).
  • add some sparse tensor ops support in TorchScript (24967).
  • move some methods into function.cpp (25119).
  • SubgraphMatcher: Factor out matchAttributes. (25073).
  • SubgraphMatcher: add logging. (25074).
  • SubgraphMatcher: matching modules support. (25075).
  • Add logging to JIT CSE pass. (25141).
  • bind autograd.backward and tensor.backward in TorchScript (23913).
  • fix to logging in AA (25143).
  • Fix bugs in assignment to optionals (25059).
  • skip fstrings test if not py36 (25184).
  • Simplify NamedType (25058).
  • Add interface declarations to JIT (21972).
  • Remove insert_observers pass (24999).
  • Remove InsertQuantDeQuantNode (25000).
  • Implement a bunch of pickle serialization features that optimize for size. (23759).
  • fix closures which always throw. (25278).
  • Add interface declarations to JIT (25258).
  • Remove PythonPrint's is_method_ member (25226).
  • add serialization of interface (25227).
  • improve interface error messages (25228).
  • don't throw in constant prop (25270).
  • Add source location to class instantiation error (24990).
  • fix inliner bug (25052).
  • Pull instruction definitions out of interpreter.cpp. (25148).
  • Add GET_ATTR instruction (25151).
  • Fix old annotate() error (25261).
  • insert_observers use qconfig_dict (25069).
  • Implement FoldConvBatchnorm2d pass. (25282).
  • Remove spurious print (25378).
  • Fix AliasAnalysisKind::PURE on MSVC (25375).
  • Fix item() call in docs (25404).
  • Attempt to enable CrossMapLRN2d, as it no longer uses Module._backend. (25343).
  • Some alias analysis fixes (25425).
  • Emit script function calls during tracing. (25089).
  • torch/jit/passes/quantization.{h,cpp} and torch/jit/init.cpp (25403).
  • add tuple keyword (25474).
  • Manually implement is_zipfile (25279).
  • Removes SymbolicVariable (25077).
  • Added invert bitwise operation to JIT (22324).
  • Remove friend dependency on ClassType in InterfaceType (25617).
  • Remove forward compat code for serialization format (25440).
  • Make NoneType <: Optional[T] (25361).
  • Remove accidentally re-added file (25677).
  • move legacy deserialization code into jit/import_legacy.cpp (25649).
  • preserve ignored function return value type (25262).
  • Finish testing code examples in the docs (25668).
  • add getitem to class types (25664).
  • Make tensor key in Dict work in serialization (25442).
  • Expose an API to iterate all the registered operators (23207).
  • Fix missing newline in compiled from source range highlight (25802).
  • SubgraphMatcher: add logging to a check missed previously. (25735).
  • Fix c10 tracing (25869).
  • add torch.jit.is_scripting() api (25263); see the sketch after this list.
  • Make arguments of Module::dump easier to remember. (25740).
  • Only create a new clone of observer when we actually insert it. (25931).
  • add set_grad_enabled to TorchScript and fix data attribute (25350).
  • add torch.jit.is_scripting api (25955).
  • add support for ModuleDict (25715).
  • fix use-after-free bug (25965).
  • Fix torch.arange traced as constant (25363).
  • Preserve module names in recursive script (24505).
  • Add in membership checks for lists (25796).
  • TorchScript Serialization for dynamic LSTM module (25877).
  • print source code when a function is executed (25868).
  • make sure all out stringstreams start out empty in jit_log.hpp (25863).
  • tracing with an opt-in by file name (25895).
  • Port fuse_linear from pytorch/tvm (25623).
  • Add documentation to logging (26175).
  • Register ATen ops with c10 (26131).
  • Add isBackwardCompatibleWith for Argument and FunctionSchema (23409).
  • Add a wrapper for inspect in JIT to produce better error message (25415).
  • Enable CPU fused kernel on Windows (25578).
  • Fixed size arrays (23695).
  • fix schema matching of tuples to vartype lists (25944).
  • min(li) max(li) (26351).
  • Remove torch.save-related logic from pickler (25502).
  • Add support for lists for prim::min and prim::max (26155).
  • Add ivalue::type(), part 1 (25439).
  • Use static type information to restore type tags (25447).
  • Add filter function to subgraph rewriter runGraph (26223).
  • Implement more size-oriented opcodes in the depickler. (25786).
  • Refactor emitIsInstance (26061).
  • add setitem to class types (25750).
  • Make jit dicts ordered (26465).
  • Fixes test_wrapped_number (26523).
  • Implement more size-oriented opcodes in the depickler. (26454).
  • Make is_optional check more robust (26312).
  • Resolve NamedTuple types in Python (26443).
  • Fix jit/pass/peephole.cpp fuse addmm (26357).
  • Move unpickler related codes from pickler.h/cpp to unpickler.h/cpp (26432).
  • Whenever possible, use function pointers rather than std::function to represent Operation's. (26560).
  • Serialization for per channel qtensor (26339).
  • add CondValue to unify refinements and code emission (26145).
  • Add ObserveHelper and remove some common function parameters (26641).
  • Remove 'recurse' parameter from Inline. (26487).
  • Use std::mutex instead of std::call_once in Function when we initialize GraphExecutor. (26571).
  • Add 'optimized_graph' to Function. (26488).
  • Use optimized graph in Inline (essentially, making Inline recursive now). (26489).
  • resolve ignored module method type annotations (26683).
  • Add traces to specialize_autograd and lower_grad_of (2nd try) (22752).
  • Register values listed in constants as attributes of the Module. (26581).
  • Fix builtin lookup for Python functions (26688).
  • Improve error message in IR parser when accessing undefined variable. (26771).
  • autodiff changes to enable profiling (25397).
  • Typevar matching fix + implicit conversions from Scalar to int/float (26453).
  • support iterables, rangevalue in list comprehensions (26768).
  • Bytecode export flow (25187).
  • Use optimized_graph in graph_executor. (26705).
  • Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place. (26703).
  • Fix broken failure messages for OverloadedMethodValue (26846).
  • Improvements to GuardElimination and InsertBailouts (25430).
  • Fix circular deps in loading (26758).
  • add AutoNonVariableTypeMode guard on JIT->ATen boundary.
  • Add logging in constant propagation pass (26653).
  • fix range for non-int inputs and pow implementation (26926).
  • Move some class/functions in test_jit.py to jit_utils.py (26839).
  • Remove unimplemented passes (26978).
  • Fix race condition in torch::jit::Function (27009).
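
A minimal sketch of the torch.jit.is_scripting() API noted above (25263, 25955): it evaluates to False when a function runs eagerly and to True when the compiled TorchScript version runs, so the two paths can diverge.

```python
import torch

def scale(x):
    # False in eager mode, True inside TorchScript.
    if torch.jit.is_scripting():
        return x * 2.0
    return x * 3.0

scripted = torch.jit.script(scale)
x = torch.ones(2)
print(scale(x))     # eager path:    tensor([3., 3.])
print(scripted(x))  # scripted path: tensor([2., 2.])
```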

Mobile:

  • Add symmetric methods getDataAsIntArray, getDataAsByteArray to Tensor (25183).
  • Initial commit for android torchvision utils (25185).
  • Add libtorch android build with shared lib for 4 android abis (25192).
  • pytorch android circleci integration (25286).
  • turn off BUILD_BINARY for android CI jobs (25485).
  • Gradle tasks for publishing to bintray, jcenter, mavencentral etc. (25351).
  • remove protobuf usage from mobile build (25493).
  • Fix iOS simulator build (25633).
  • Fix OSS mobile CI (25755).
  • Add PR jobs for iOS builds (25840).
  • Clean up the iOS build script (25822).
  • Cocoapods for iOS OSS release (25847).
  • Introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile.
  • Add NO_EXPORT macro to unset visibility attribute (25816).
  • Update build_android.sh to not build host protoc for libtorch (25896).
  • Simplify build_android_gradle.sh (25897).
  • Change gradle build to use static libtorch + gc-sections (25984).
  • Use torch::from_blob instead of shareExternalPointer, nits (25973).
  • Change the source link in podspec (26089).
  • Tensor renaming to dtype, shape; support long, double (26183).
  • Fix circle CI (26225).
  • Remove armv7s build from iOS (26222).
  • CircleCI android nightly (snapshot) build publishing (26069).
  • Fix error messages; tensor creation method names with type (26219).
  • Integrate forked QNNPACK into mobile PyTorch builds. (25844).
  • Add iOS test app skeleton (26261).
  • Fix no tab check (26399).
  • Clean up the PR job script for iOS build (26353).
  • Exclude libfbjni.so from pytorch_android to avoid duplicating it (26382).
  • Add script to build mobile library with host toolchain (26440).
  • Fix JNI wrapper for IValue interface change (26448).
  • Use gradle 4.10.3 for build and publish (26473).
  • Disable bitcode for iOS CI jobs (26478).
  • Javadocs for Tensor, IValue, Module (26149).
  • Turn off autograd mode in android JNI wrapper (26477).
  • Expose USE_STATIC_DISPATCH macro to public headers.
  • Improve how pytorch_android cmake imports static lib (26525).
  • Add eigen blas for mobile build (26508).
  • Support IValue string type (26517).
  • Update android/iOS build library packing (26565).
  • Add testing script for iOS x86 build (26632).
  • Sync docker images (26651).
  • Nightly prefix for android nightly jobs (26652).
  • Refactor android torchvision: not hardcoded mean/std (26690).
  • Switch our Android CI to Clang (26656).
  • Prepare for Cocoapods 1.3 Release (26751).
  • QEngine::QNNPACK enabled, module.eval() (26855).
  • Add mobile-friendly at::parallel_for backend.
  • Remove backward functions from jit-op-registry for mobile build (26851).
  • Check if QNNPACK is supported before set (26935).
  • Fix mobile.sh build (26975).
  • Fix fbjni packaging, exclude for publishing, include by default (26995).
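
The Android/iOS entries above load TorchScript model files produced on the Python side. A minimal sketch of producing such a file follows; the model and file name are placeholders, not part of the mobile packaging flow itself.

```python
import torch

# a tiny placeholder model; in practice this would be the network being shipped
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()

# trace and save; the resulting .pt file is what the Java/Objective-C Module wrappers load
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
traced.save("mobile_model.pt")
```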

Named Tensors:

  • Fix named tensor build by enabling tensor.is_pinned and removing support for clone() (23597).
  • Add torch._C._BUILD_NAMEDTENSOR() (23623).
  • Add names to repr for named tensors (23316).
  • Add name propagation for at::alias, add tensor.set_names (23624).
  • Improve test_namedtensor.py with named tensor equality check (23801).
  • Add names argument to ones, rand, randn, zeros, full (23743).
  • Implement name inference rule for empty_like, clone (23746).
  • Named inference for contiguous(), bernoulli variants, and dropout. (23808).
  • Add name propagation for at::alias, add tensor.set_names (24105).
  • Improve test_namedtensor.py with named tensor equality check (24106).
  • Add name propagation for at::alias, add tensor.set_names (24202).
  • Add names argument to ones, rand, randn, zeros, full; fix empty (24107).
  • Implement name inference rule for empty_like, clone (24108).
  • Named inference for contiguous(), bernoulli variants, and dropout. (24109).
  • Implement tensor.align_to(names), torch.align_tensors(*tensors) (23804).
  • Rename set_names -> view_names, set_names_ -> names_ (23962).
  • Update tensor.view_names / tensor.names_ API (23973).
  • Fix out= function semantics for named tensors. (24028).
  • Name inference for softmax, log_softmax and Dimname overloads. (24087).
  • Implement name inference for t(), transpose(...) (24203).
  • Add thread-local-state NamesMode and NoNamesGuard (24367).
  • Fix named tensor build (24940).
  • Implement name inference for t(), transpose(...) (24941).
  • Add thread-local-state NamesMode and NoNamesGuard (24942).
  • Fix FIXME_default_names by storing static list of 64 none names (24885).
  • Rename Tensor::names() to Tensor::opt_names() (24907).
  • Add helper function Tensor::names() (24914).
  • Fix binary op name inference between unnamed and named tensors. (24921).
  • Implement name inference for mm, addmm (24306).
  • Implement name inference for expand (24469).
  • Implement name inference for addmv, addmv_, mv (24471).
  • Implement name inference for torch.dot (24474).
  • Fix named tensor test (25313).
  • Implement name inference for torch.bmm (25123).
  • Implement name inference for torch.matmul (25177).
  • Include the correct header for make_unique in named tensor headers (25178).
  • Fix dependency by moving Dimname.{h,cpp} NamedTensor.{h,cpp} to core/ (25280).
  • Add guard for named tensors in the JIT (25344).
  • Add guards for using named tensor with serialization and multiprocessing (25345).
  • Prepare to add some Dimname/DimnameList overloads (25405).
  • Name inference rule for mean, std, var, std_mean, var_mean (25431).
  • Name inference rule for masked select (25566).
  • Name inference for masked_fill_ / masked_fill (25567).
  • Name inference rule for torch.cat (25568).
  • Fix binary op name inference to happen before shape checks (25563).
  • Fix named tensor printing (25564).
  • Name inference rules for relu/relu_/threshold/threshold_ (25569).
  • Implement initial version of autograd with named tensors (25604).
  • Fix named tensor build (25673).
  • Move most BUILD_NAMEDTENSOR macros out of header areas (25721).
  • Rename tensor.view_names -> tensor.renamed (25711).
  • Move BUILD_NAMEDTENSOR in NamedTensorUtils.h (25781).
  • Add flatten for named tensors. (25672).
  • Quick fixes for named tensor for windows (25728).
  • Name inference for unbind (25585).
  • Fix assertion if NamedTensorMeta's num_names != tensor.dim (25778).
  • Add names= argument to torch.tensor ctor (25424).
  • Remove some more BUILD_NAMEDTENSOR flags (25919).
  • Delete tools/autograd/env.py (25920).
  • Remove unnecessary BUILD_NAMEDTENSOR from interned_strings.h (25938).
  • Add TEST_NAMEDTENSOR flag to namedtensor ci (25948).
  • Move NamedTensorMetaInterface definitions to TensorImpl.h (26030).
  • Experimental warning for named tensors (26050).
  • Implement tensor.refine_names (25842).
  • Implement tensor.align_as(other), change tensor.align_to(names) (25843).
  • Fix bug with named tensors and (no) tracer support (26106).
  • Fix namedtensor ci (26257).
  • Turn on BUILD_NAMEDTENSOR permanently (26060).
  • Implement named tensor unflatten(dim, namedshape). (25658).
  • Rename torch.namedtensor -> torch._namedtensor_internals (26349).
  • Change '*' to '...' and ... for named tensor API functions. (26350).
  • Change "named_guard" in native_functions to "supports_named_tensor" (26352).
  • ensure c10/macros included before using (26439).
  • Disable tagged names (26479).
  • Delete tagged names (26365).
  • Refactor Dimname.h API to be nicer (26366).
  • Implement resize_, resize_as_ for named tensors (26493).
  • Support torch.pow with named tensors (26541).
  • Name inference for min(Tensor, dim?) / max(Tensor, dim?) (25582).
  • Renames tensor.renamed -> rename, tensor.names_ -> rename_ (26548).
  • Fix ellipsis behavior for Tensor.align_to to glob all missing dims (26648).
  • Typo fix (26417).
  • Don't generate named tensor functions to RegistrationFunctions.h (26685).
  • Add a lot of dimname overloads (26636).
  • Wrap dimensions during named inference (26558).
  • Named tensor support for: atan2, output_nr, detach{}, requires_grad (26543).
  • Named tensor support for logsumexp, mode, kthvalue, median, min, max (26563).
  • Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (26815).
  • Fix CUDA named tensor copy_ (26829).
  • Make named tensor implementations more robust (26968).
  • Better named tensor error messages. (26974).
  • Enable named tensors for arithmetic, clone, and tensor conversion ops (23237).
  • Rename tensor.is_named to has_named, expose has_named to python. (23315).
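
The experimental named tensor API touched by the entries above can be sketched as follows. This is a minimal illustration assuming the 1.3-era surface (names= factory argument, rename, named flatten, align_to), not an exhaustive reference.

```python
import torch

# factory functions accept a names= argument (see the entries above)
imgs = torch.randn(2, 3, 32, 32, names=('N', 'C', 'H', 'W'))
print(imgs.names)  # ('N', 'C', 'H', 'W')

# tensor.rename (formerly view_names / renamed, per the renaming entries above)
x = imgs.rename(C='channels')

# named flatten: collapse several named dims into one
flat = imgs.flatten(['C', 'H', 'W'], 'features')

# align_to permutes dims by name
moved = flat.align_to('features', 'N')
```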

ONNX:

  • Fix unused imports in torch/onnx/symbolic_opset8.py (23678).
  • Support ONNX export Multinomial (23581).
  • added opset10 ORT tests (22993).
  • frobenius_norm onnx export added (23536).
  • Std opset export (22310).
  • weight_names bug fix (23848).
  • canonicalize_ops pass bugfix: copy metadata for new output (23809).
  • Provide argument in ONNX export to exclude initializers from graph inputs. (23284).
  • Fix validation of dynamic axes names (23974).
  • updated pixel_shuffle in opset 11 to use depthToSpace (23739).
  • Relax precision constraint on ONNXRuntime._gru_test (24340).
  • Add ONNX Export Support to empty and empty_like (24166).
  • Update docs for softmax in onnx supported operators (24832).
  • enable "keeps" from BoxWithNMSLimit and caffe2_fastrcnn_outputs_inference (24451).
  • cumsum (24476).
  • Fix some typos in documentation (23507).
  • Update onnxruntime CI version (24414).
  • Momentum setting in SyncBatchNorm forward (inference) pass. (24995).
  • Export Unique (25050).
  • Fix dead link and syntax in ONNX landing page (25126).
  • Fixed nondeterministic RNG for ORT RNN tests (25205).
  • Add ONNX Export Support to rsqrt (24153).
  • Add ONNX export support for torch.log1p. (25808).
  • remove "build_deps" arg from setup.py command in (26113).
  • Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (26137).
  • Export round (26126).
  • fix test_arange and bump ort ci version (26320).
  • Automatic update of fbcode/onnx to 1316afc9f972f81340faa05763e2898f38bcc3b0 (26309).
  • add pass for onnx scalar type conversion (24378).
  • Export clamp for opset 11 (25797).
  • Export gelu (24475).
  • Fix Exporting RNN/LSTM's Initial State (h0/c0) to ONNX (22813).
  • Update ONNX Export for Gather and Scatter for Opset 11 (24790).
  • Automatic update of fbcode/onnx to 23bb6ea1a71f08e200114a153f48bd7adb66d486 (26441).
  • Setting automatic default selection for ONNX IR v4 semantics in ONNX export API (26146).
  • Update ONNX Export for Interpolate in Opset 11 (24805).
  • Make ONNX_ATEN_FALLBACK also work for _export (26738).
  • Automatic update of fbcode/onnx to ab6b94203c595f74b1f126eb118eef22e4c05a57 (26736).
  • Update ONNX Export for Interpolate in Opset 11 (26778).
  • Support Negative Axis in Size in ONNX (26436).
  • Export baddbmm (25738).
  • Export index_fill and index_copy, fix caffe2 scatter (23052).
  • Add Support to Dicts and Strings in ONNX for Inputs and Outputs (25889).
  • export baddbmm (26901).
  • Updating producer_version in exported ONNX models to PyTorch 1.3. (26976).
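
A minimal export sketch touching the opset 11 and initializer-handling changes listed above. The model and file name are placeholders, and the keep_initializers_as_inputs argument is assumed from the entry on excluding initializers from graph inputs.

```python
import torch

model = torch.nn.Linear(3, 2)
dummy = torch.randn(1, 3)

# opset_version=11 exercises the opset-11 export paths listed above;
# keep_initializers_as_inputs=False keeps initializers out of the graph inputs
torch.onnx.export(model, dummy, "linear.onnx",
                  opset_version=11,
                  keep_initializers_as_inputs=False)
```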

Performance and Benchmarking:

  • Added torch.autograd.profiler.record_function() as context manager. (23428).
  • Fix regression in torch.qr (23591).
  • Fix pin_memory_thread not exiting quickly (23646).
  • Increase predefined_minimum_secs to reduce variation (23734).
  • Enhance Tensor indexSelect performance (23055).
  • Separate input shapes to reduce default execution time (24136).
  • Increase default warmup iter and iter (24272).
  • Fix perf bug with indexed assignment (index_put_) (24083).
  • Add wipe cache (24390).
  • Vectorize LowerCholeskyTransform (24131).
  • Change the location of wipe cache (24454).
  • Optimize performance for unboxed-only kernels (25055).
  • Fix iOS crash (FBCameraFramework backtrace) in caffe2::getClockTimeMilliseconds() in perf_observer.cc (24813).
  • Add speed benchmark binary for torch jit (25230).
  • Change shape for conv and unary ops (25477).
  • Add speed benchmark binary for torch jit (25486).
  • Fix operator level benchmark to have NHWC layout (26577).
  • Speed up an integer to the power of a positive integer on CPU (26020).
  • Use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
  • Use parallel_for in DepthwiseConvKernel (26879).
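
The record_function() context manager added above nests inside the existing autograd profiler; a minimal sketch:

```python
import torch

with torch.autograd.profiler.profile() as prof:
    with torch.autograd.profiler.record_function("matmul_block"):
        torch.mm(torch.randn(64, 64), torch.randn(64, 64))

# the labelled block shows up as its own row in the profiler output
print(prof.key_averages().table(sort_by="cpu_time_total"))
```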

Quantization:

  • Quantized Average Pool kernel (23143).
  • skip nn.Identity in add_observer (23500).
  • Change condition in swap module (23561).
  • make_module: First version (23288).
  • ConvBn2d/ConvBnReLU2d (23357).
  • fix conv2d (23690).
  • QAT modules take qconfig as argument and keep qconfig as member (23609).
  • Remove qconfig_dict from API (23465).
  • Fix LSTM int8 quantization model size issue (23577).
  • Expose the quantized inputs and output of dynamic quantized int8 FC operator for debugging (23566).
  • Support for non-zero zero_points for weight and activation (23541).
  • qconv operator level benchmark (22895).
  • Enable OSS quantization tests (23858).
  • Change fbgemm_linear_{int8,fp16}_weight to fbgemm_linear_{int8,fp16}_weight_fp32_activation (22955).
  • clang-format aten/src/ATen/native/quantized (23898).
  • save()/load() tests and fixes (23911).
  • Enabling inline in quantized relu (23704).
  • Fix qconv benchmark (24019).
  • Adding dequantize_val and requantize_val (23909).
  • Simplified nnq.Linear class (24046).
  • State dict serialization of nnq.Linear (24047).
  • is_quantized support in JIT (24099).
  • Re-work Conv2d (24115).
  • state_dict serialization for Conv2d + some bugfixes (24116).
  • JIT serialization for Conv2d (24117).
  • fix py2 imports in _intrinsic/modules (24206).
  • Fix incorrect type annotation on Linear setstate (24209).
  • Add out variant (23956).
  • Removing the make_module script. (23635).
  • Observer returns original tensor for post training quantization (24196).
  • test_nn_quantized -> test_quantized_nn_mods (24201).
  • Fix and test conv2d constructor and from_float (24277).
  • Add out variant (23971).
  • Add dynamic quantized Linear op in PyTorch (23464).
  • Dynamic Quantized Linear Module (23128).
  • Skip test_quantized_nn_mods tests if there's no FBGEMM (24302).
  • no_deadline on ModuleAPITests and skip on dynamic quantization test (24307).
  • Add the type matching rule for qconfig_dict (23212).
  • equal() for QuantizedCPU (24211).
  • Fix the dimension mismatch issues when running the BERT model (23330).
  • Make the default qconfig_dict (24232).
  • Remove the activation observer for default_qconfig (24299).
  • fix lint (24375).
  • test {init,from_float} on nnq{,d}.Linear (24364).
  • Fix more warnings (24291).
  • Run quantization tests first (24366).
  • Temporarily disable warnings in dynamic quantization ops (24376).
  • Fix Lint (24381).
  • Add intrinsic module mappings (23753).
  • Change return type of observer to two tensors (24339).
  • Add _pair for quantized conv module (24409).
  • Replacing axis with dim in quantized cat (24151).
  • Remove redundant assignment (24408).
  • Fix QConfig_dynamic typename (24431).
  • Baseline observer module, ensuring that (min,max) range includes zero. (24297).
  • Convert bias to float in quantized conv module (24424).
  • Fixes the adding of the observer to the FloatFunctional (24418).
  • Adds a placeholder for the 'mul' operator. (24421).
  • Increasing precision for avg pool (23906).
  • Enables inplace in the quantized relu (24374).
  • extra_repr for quantized modules (24443).
  • Change kernel_size to self.kernel_size to resolve error in quantized conv module (24499).
  • Add resnext 32x4d shapes to benchmark (24503).
  • Add the default_weight_observer for the dynamic quantization path (24231).
  • Clang formatting the code [1/2] (24867).
  • Support QScheme in script (24358).
  • Use absolute import of the parent folder without alias. (24792).
  • Added relu6 kernel (24799).
  • PrepareQuant step (24425).
  • reduce for QScheme (24969).
  • Remove Symmetric Quantizer in backend (24964).
  • gradient clipping by norm.
  • Make observer scriptable (24996).
  • Add qconv_test to benchmarking tests (24913).
  • Adding quantized mul kernel (24444).
  • Enable UBSAN test for FBGEMM in dynamic quant test (25099).
  • Per Channel quantization APIs (24935).
  • per channel quantization support (24936).
  • Add missing functions and methods for channelwise quantization (24934).
  • Support lowering of fp16 weights.
  • use avx2 for Add without broadcast and when inputs are uint8_t (25098).
  • per channel quantization support (25134).
  • insert_quant_dequant jit pass (24426).
  • quant_fusion jit pass (24427).
  • Work around for bias quantization for conv and linear operators (24789).
  • Handle empty qconfig for functional Modules (24803).
  • Update mapping dictionary to support functional modules and pooling operations (24804).
  • Support observer without any data calibration (24923).
  • Serialization for nn.quantized.functional modules (24924).
  • Move test QAT tests to double precision to ensure numerics match (25189).
  • Adding return for the observer in the functional_modules.py (25168).
  • Adding Scalar add/mul. (24447).
  • Fix scriptability for Observer (25197).
  • Integration tests for initial quantization graph mode (24428).
  • skip tests if fbgemm is not supported for test_quantizer.py (25209).
  • add import for test_quantizer.py (25222).
  • Remove deprecated graph mode quantization tests (24998).
  • Move test QAT tests to double precision to ensure numerics match (25211).
  • Update mapping dictionary to support functional modules and pooling operations (25216).
  • Fix scriptability for Observer (25219).
  • Disable flaky test_adaptive_avg_pool2d test. (25249).
  • Handle empty qconfig for functional Modules (25215).
  • Reducing the test size for adaptive avg pool (25195).
  • get rid of dynamic_cast in Quantizer (25001).
  • disable deadline checking on test_adaptive_avg_pool2d (25255).
  • Fixing the enforcement of the zero_point (25193).
  • Add new qnnpack_add and qnnpack_maxpool op to C10 registry (24103).
  • Serialization for nn.quantized.functional modules (25220).
  • int8 static quantization in the numerical debugger.
  • Work around for bias quantization for conv and linear operators (25212).
  • Refactor MinMax observer (23902).
  • Quantized comparators (24387).
  • Ensure quantized::add stride matches inputs (25265).
  • Make quantized relu ops inherit the memory format from input (25271).
  • insert_quant_dequant work with qconfig_dict (25127).
  • Integration tests for qconfig_dict (25217).
  • Removing future imports from the test fixtures. (25296).
  • Memory layout for pooling ops (25374).
  • making quant utilities inplace (25054).
  • Skip test_compare_tensor_scalar due to overflow error (25432).
  • Per Channel Quantization Support for Quantized Linear Operator (25276).
  • Skip inserting observers for Tensors inside fused op (25281).
  • Remove unnecessary checks in InsertQuantDeQuantImpl (25370).
  • Change exception to warning (25408).
  • Quantized vec256 + vectorized quantized::add (25202).
  • Minor fixes in per channel support for qconv kernel (25182).
  • Vectorized quantized relu/relu6 (25496).
  • Remove index calculation in quantized max_pool2d (25526).
  • Add the dynamic quantized LSTM module (25157).
  • Dynamic dispatch for optimized quantized op kernels (25545).
  • Rename fbgemm quantized operators to generic quantized ops (25338).
  • move no_deadline to hypothesis_utils.py (25598).
  • Rename FBGEMM quantized operators to generic quantized ops (25678).
  • Inserting observers for all methods called in forward (25503).
  • Vectorized specialization of max_pool2d for channels-last layout (25676).
  • Copy quantize routine to vec256 (25685).
  • Store bias in PackedLinearWeight struct in fbgemm (25428).
  • derandomize hypothesis tests (25513).
  • Relax scale to prevent saturation in conv/linear; add test to verify precision of numerics of quantized model with updated observer (25667).
  • Use more efficient specialized Quantize routine (25731).
  • Factor unnecessary work out of add inner loop (25751).
  • Fork QNNPACK into aten/src/ATen/native/quantized/cpu/qnnpack (25500).
  • Test scripting and tracing for dynamic linear modules (25870).
  • Store bias in PackedConvWeight in fbgemm (25626).
  • Add Dropout to blacklist (25881).
  • Add torch.nn.LSTM into the default dynamic quantize mappings (25954).
  • Change order of activation and weight in QConfig (25950).
  • indentation for hypothesis profile and proper inheritance for QuantizationTestCase (25934).
  • Improve error message when input is not in the right format (25928).
  • add the tensor_observer to record the runtime tensor for quantization … (25830).
  • Add new API for Fully Connected and Convolution Operators in QNNPACK (25862).
  • remove verbose in pytorch_ci hypothesis profile (26075).
  • Upgrade the naming for fbgemm quantized op (26064).
  • Use BytesIO instead of tempfile (25976).
  • Add Runtime flag for quantized backend. (25680).
  • TorchScript Serialization for dynamic LSTM (26084).
  • Skip inserting duplicate observers (25504).
  • Fix build warning in vec256_qint.h (26121).
  • Support quantizing any methods called (25505).
  • Add fusion for quantized linear (25624).
  • Fold quantize op into module (25625).
  • use whitelist for selecting observed values (25974).
  • Add histogram observer (23959).
  • Back out "[quant][observer] Add histogram observer" (26236).
  • fix hypothesis timeout (26280).
  • Whitelist and fusion support for quantized::linear - addmm (26208).
  • Whitelist and fusion support for quantized::linear - matmul (without bias) (26209).
  • Disable broken unit tests (26301).
  • Whitelist and fusion support for quantized::linear - matmul (with bias) (26204).
  • Dynamic quantization for bias. (26057).
  • Add missing argument for failing function call (26311).
  • Enable support for dilated convolutions (26205).
  • Adding quantized::linear function for pytorch mobile in c10 (26135).
  • Add l2 norm minimization (24022).
  • Disable QNNPACK tests if pytorch is not built with it. (26427).
  • Adding quantized::conv2d function for pytorch mobile in c10 (26152).
  • Add extra filtering for scale/zero_point/dtype in FoldQuantizeCallIntoBuffer (26224).
  • Remove quantizeBias (26388).
  • Add NoQEngine to QEngine and refactor the name of set/get qengine (26330).
  • Fix quantized::linear QuantFusion patterns (26414).
  • Add per channel observer (25887).
  • Add support to call unpack for pytorch mobile quantized FC and Conv (26211).
  • Remove quantization for bias in pattern (26415).
  • Implement more support for per-channel quantization (26240).
  • Fold weight permutation inside quantized conv operator (26241).
  • Fold activation permutation inside quantized conv operator (26242).
  • Add NoQEngine to QEngine and refactor the name of set/get qengine (26471).
  • Add the FP16 weight support for LSTM in dynamic_quantize (25975).
  • Fix quantized::conv2d patterns in QuantFusion (26515).
  • Changes to support int8 weight and fp32 bias in QNNPACK (26307).
  • Add the quantized average_pool2d support and adaptive_avg_pool2d support (25899).
  • Fix the API for record observer (26413).
  • Unify Quantization APIs for add, pool and relu (26335).
  • Compiler warnings cleanup for quantization.cpp. (26585).
  • quantize_linear -> quantize_per_tensor (26574).
  • Get scalar type from observer module (26425).
  • Add inplace argument to InsertObservers and InsertQuantDeQuant (26389).
  • Expose supportedQEngines to python (26474).
  • quantize_linear_per_channel -> quantize_per_channel (26575).
  • Skip some fragile tests (26599).
  • quantized average_pool2d and adaptive_avg_pool2d implementation (Revert d17437015) (26580).
  • _dequantize_linear -> _dequantize_per_tensor (26576).
  • Unify Quantization APIs for add, pool and relu (26586).
  • Simplify observers declaration with functools.partial (26492).
  • Import torch.quantization when one imports torch (26649).
  • NHWC specialization for quantized::cat (26524).
  • Fix the flaky test_qlinear test caused by hypothesis deadline (26663).
  • quantized torch.topk (26486).
  • remove unneeded code (26640).
  • Update qengine flag in python to string (26620).
  • _per_tensor_affine_qtensor -> _make_per_tensor_quantized_tensor (26678).
  • Skip observing bias across function call hierarchy (26642).
  • _per_channel_affine_qtensor -> _make_per_channel_quantized_tensor (26679).
  • Quantized Interpolate Kernel (upsample_nearest2d) (26617).
  • Fix _empty_per_channel_affine_quantized to be less hacky (26243).
  • Per-channel quantized tensor to have only a single axis (26675).
  • Allow per-channel QTensor accept any floating type for scales (26676).
  • Use noop observer to pass dtype for dynamic quantization (26709).
  • Remove duplicate calculation of output shape (26684).
  • Trivial quantized torch.mean implementation (26253).
  • Remove _dequantize_per_channel in the pattern (26680).
  • Un-hardcode epsilon constant in FoldConvBatchNorm2d. (26584).
  • Remove _dequantize_per_tensor (26681).
  • Add threadpool in qlinear and qconv for mobile (26728).
  • move more functions to InsertObserversHelper (26696).
  • quantized_tensor tests (25429).
  • Handle DeQuantStub() for QAT (26518).
  • Add include to resolve PRIu32 macro (26745).
  • Fake quantization enhancements for QAT/PTQ support (26420).
  • move more functions to InsertObserversHelper (26773).
  • quantized_tensor tests (26784).
  • Quantized Interpolate Kernel (upsample_bilinear2d) (26631).
  • Throw if someone tries to torch.save() quantized modules (26828).
  • Re-write of tensor-scalar quantized add (26766).
  • Try to disable annoying hypothesis warnings again (26853).
  • Remove unnecessary functions and cleanup code in quantization.cpp. (26852).
  • Add more inplace arguments to quantization top level API (26782).
  • batch size 0 support in ChannelShuffle DNNLOWP op (26858).
  • batch size 0 support in Conv DNNLOWP ops (26871).
  • batch size 0 tests for element-wise DNNLOWP ops (26870).
  • batch size 0 support in FC DNNLOWP operators (26872).
  • batch size 0 tests for Quantize/Dequantize DNNLOWP ops (26873).
  • batch size 0 support in norm operators (26894).
  • batch size 0 tests in BatchMatMul ops (26874).
  • Set quantized engine backend for mobile in speed_benchmark_torch (26911).
  • Support ceil_mode in quantized maxpool (26916).
  • Make quantized max_pool2d error message more specific and less silly (26918).
  • batch size 0 tests for etc DNNLOWP operators (26877).
  • Fake quantization enhancements for QAT/PTQ support- fix tests (26876).
  • Serialization and range reduction support for Fake Quant/Observer (26519).
  • Fix the QuantizedAVX2 build issue (26854).
  • Default histogram observer (26622).
  • Fix all factory invocations in quantized to correctly propagate options. (26966).
  • control of observer/fake-quant operations (26520).
  • Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (26840).
  • Fix misuses of TORCH_CHECK/TORCH_INTERNAL_ASSERT with string (26897).
  • Better error message for calculate_qparams (26985).
  • Add P99 method with configurable thresholds.
  • Xray image inference on multi-cpu and dumping dnnlowp tensors (22537).
  • Add int8 resize nearest 3d op in DNNLOWP (26063).
  • Re-write of tensor-scalar mul (26937).
  • Support qadd_relu on pytorch mobile (26982).
  • Add optimized quantize function for ARM (26867).
  • Add QuantFusion to graph_executor (26591).
  • Move patterns in QuantFusion to a separate file (26848).
  • PyTorch Graph Mode Quantization API (26390).
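
Two of the user-facing pieces referenced above, the quantize_per_tensor rename and dynamic quantization of Linear modules, in a minimal sketch assuming the 1.3-era torch.quantization API:

```python
import torch

# per-tensor affine quantization (quantize_linear -> quantize_per_tensor rename above)
x = torch.randn(4, 4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(qx.dequantize())

# post-training dynamic quantization of nn.Linear (dynamic quantized Linear module above)
float_model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8)
```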

Visualization:

  • Added mesh plugin (24039).
  • Update tensorboard.rst (22026).
  • Remove hard Caffe2 dependency for TensorBoard (24295).
  • Added test_tensorboard.py to TARGETS (24040).
  • Hyperparameter plugin (23134).
  • Removed external tensorboardX dependency (25259).
  • Fix empty graph problem (25599).
  • Delay external imports until we're ready to test tensorboard (25993).
  • Create TensorBoard test classes in all cases (26005).
  • Fix flaky SummaryWriter test (26395).
  • Fix the calling parameters of moviepy's write_gif function (21218).
  • Add Virtual Memory and CPU percentage computation to AIBench (23590).
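
The TensorBoard entries above (hyperparameter plugin, dropping the external tensorboardX dependency) are exercised through torch.utils.tensorboard; a minimal sketch with placeholder values:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/example")
writer.add_scalar("loss", 0.5, global_step=0)
# hyperparameter plugin (see entry 23134)
writer.add_hparams({"lr": 0.01, "batch_size": 32}, {"hparam/accuracy": 0.9})
writer.close()
```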