
AMD / ROCm Changes:

  • Improve hip-clang support in build_amd.py (23835).
  • For int64_t atomicAdd, use the available compiler builtin on ROCm. (24854).
  • Use correct WARP_SIZE for ROCm for EmbeddingBag (24868).
  • Switch to rocThrust for thrust/cub APIs (25620).
  • rocBLAS deprecated the last two parameters. (25726).
  • Enable two tests that were skipped because of rocThrust bugs fixed in ROCm 2.7 (25724).
  • Enable jit fusion on ROCm (22872).
  • Remove NULL arguments that have been marked deprecated by rocBLAS (25866).
  • Make sparse coalesce warp size aware (25918).
  • Make spatial depthwise convolution warp size aware (25922).
  • Make lookup table warp size aware (25926).
  • Make persistent softmax WARP_SIZE aware. (25937).
  • Enable unit tests (25963).
  • Enable Unique operator tests on ROCm (26046).
  • Enable more mGPU tests (26055).
  • Make regular softmax warp size aware (25956).
  • Disable test_cuda.test_stream_event_nogil on ROCm (26087).
  • Use MIOpen for transpose convolutions (26172).
  • Switch to the new profiler infrastructure (26174).
  • Enable basic GPU profiling capability on ROCm. (26300).
  • Fix compiler unwrapping step in jenkins build scripts for Caffe2/PyTorch on ROCm (25409).
  • Split PyTorch ROCm tests as 2 CI jobs to run in parallel (26380).
  • Puts ROCm tests on default stream (26394).

Bug Fixes:

  • at::view creates an empty tensor and sets storage instead of cloning (23452).
  • Fix set_grad for extension backends (23516).
  • torch.is_pinned / pin_memory should not copy on already pinned tensors (23484).
  • Fix gemm call for CUDABlas for THCUNN conv, #23545 (23552).
  • Fix CTC loss for zero-length targets on GPU (23298).
  • Adam implementation minor fix (23737).
  • Add flag to temporarily disable MKL-DNN conv (23837).
  • Fix test TestCuda.test_streams_multi_gpu_query (23912).
  • Fix dataloader._shutdown_workers if not all workers are started (23761).
  • Fix crash on torch.Tensor.repeat() for 0 repeats (23766).
  • Fix master (24003).
  • Remove numpy assert that fails on Windows (older numpy versions). (24012).
  • Add missing include header in tensor_numpy.cpp (24042).
  • Fix tensor construction from array (24283).
  • Skip broken test (24453).
  • Fix Typing Error for Padding with asymmetric signatures (24895).
  • Avoid race condition in intrusive_ptr.reset_() (24464).
  • Temporarily fix hub SSL cert issue (25042).
  • Fixes test_equal (25275).
  • CUDA_KERNEL_LOOP: prevent int overflow in loop increment. (24818).
  • Issue #24962: Fix cuda method to support "None" arg for device and a … (25018).
  • Multiple fixes to test_c10d.py. (25334).
  • Attempt to fix windows build (25450).
  • Fix bug in assertNotEqual for int tensors (25412).
  • Fix pow precision (25476).
  • Fix 'in' returning true incorrectly (24156).
  • Fix Windows build (26246).
  • Fix CI (26250).
  • Fix no-auto-batching bugs: cannot bulk load; does not work with namedtuple (26065).
  • Fix cdist gradient computation if first arg is 1xn (26254).
  • Fixes big endian arch bugs. (26383).
  • Fix CI (26593).
  • Fix annotation regex for flake8 (26694).
  • Fix to operate on cuda kernel with clang and libc++ (25553).
  • Do not call cpuinfo_initialize() on non-x86 architectures. (26265).
  • Fix Vec256::abs() for floating point when applied on -0.0 (26422).

Build / CI:

  • Refactor the pytorch_doc_push_script to take a branch (23556).
  • Let user be able to change MKLDNN "-m" flags back and forth in subsequent builds (23608).
  • Fix CPU-only binary testing by properly installing cpu-only first. (23611).
  • Omit local version identifier for default configuration. (23654).
  • add setup metadata to help PyPI flesh out content on pypi package page (22085).
  • Reduce input sets for tests to speed them up. (23692).
  • add appropriate install_requires (23722).
  • cpu binary builds are built with cu100 docker image now instead of cu80 (23772).
  • allow INSTALL_TEST to pass through from env to cmake (23793).
  • Remove unnecessary fetch and reset on builder checkout. (23792).
  • Add CUDA 10.1 to CI. (23791).
  • Remove nightly suffix from nightlies; upload to pytorch-nightly. (23752).
  • Delete Travis CI config (23788).
  • Rename cpu-only to cpuonly, as dash features are not supported. (23879).
  • Roll master to 1.3.0 (23895).
  • No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (23806).
  • Add python_requires to help pip (23863).
  • Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. (23568).
  • Fix build failure on OSX (23998).
  • Don't add local version to Conda packages. (24014).
  • print clang tidy output to stderr (24052).
  • When matching a line in CMakeCache.txt, ensure A=B and "A"=B are matched (23745).
  • Move dict_test.cpp to test folder and fix dict_test.cpp for Aten includes (24071).
  • Build option USE_NUMA should only show up on Linux. (23673).
  • Do not force USE_SYSTEM_EIGEN_INSTALL to be OFF in Python build scripts (23990).
  • Suppress implicit-fallthrough warning on g++ >= 7 in caffe2/utils/math_cpu.cc (24053).
  • Send flake8 to stderr (24100).
  • Move iOS.cmake to the cmake folder (24029).
  • Ignore bugprone-lambda-function-name in clang-tidy. (24190).
  • Ignoring the test logs in case the tests are run from the parent directory (24212).
  • Remove escape_path in our build system. (24044).
  • Enable QNNPACK for iOS (24030).
  • Fix Z7_MSVC_OVERRIDE for C source files (24389).
  • Fix Caffe2 Windows build by switching to ninja. (24330).
  • Configure pytorch-probot (24423).
  • Fix CUDNN location related build issue on Antergos Linux (based on Arch) (24300).
  • Set CUDA arch correctly when building with torch.utils.cpp_extension (23408); see the sketch after this list.
  • Move the search of cuDNN files to FindCUDNN.cmake. (24293).
  • Ensure proper file executable permissions in CI. (24214).
  • Respect pre-defined DOCKER_IMAGE value in binary_populate_env.sh (24787).
  • Remove support for old architectures in cpp_extension and CMake (24442).
  • Build libtorch binary with new ABI (23908).
  • Fix cmake backslash syntax error on Windows. (24420).
  • Move the detection of cuDNN to FindCUDNN.cmake (24784).
  • Attempt to fix windows build. (24916).
  • Move CPU-only jobs to xenial (24506).
  • Skip setting CUDA_NVCC_EXECUTABLE if CACHE_WRAPPER_DIR not set. (25006).
  • disable custom class logic for mobile build to avoid rtti (24994).
  • Turn off fbgemm for libtorch android build (25113).
  • Fix clang-tidy failing all the time on random lines (25078).
  • Fix clang-tidy failing on master (25121).
  • Fix lint checker breakage caused by #25111 (25122).
  • Update QNNPACK submodule to 901e9d4 (25044).
  • Add a skip_override option to should_run_job.py (25118).
  • Switch hub to use requests because of SSL (25083).
  • Ensure tests get passed on Windows (25145).
  • prevent generating caffe2::mkl multiple times (25167).
  • Add myself as a CODEOWNER for better discoverability (25231).
  • Move the detection of cuDNN to FindCUDNN.cmake (24938).
  • Specify width for st.floats in hypothesis_utils.tensor (25188).
  • Add USE_CUDNN check to AT_CUDNN_ENABLED definition (25037).
  • Disable cuda_distributions_test and converter_nomigraph_test on Windows. (25305).
  • Re-enable libtorch tests on Windows (25377).
  • Upgrade to circleci version 2.1 configs (25336).
  • Fix binaries build for BUILD_CAFFE2_MOBILE=OFF (25229).
  • Skip useless macros from Windows.h (25444).
  • Add windows docs for the binaries (23150).
  • Turn off warnings on Windows CI. (24331).
  • Parameterize CircleCI config (25446).
  • Remove BUILD_ATEN_ONLY build option (24441).
  • Fix windows build error when TBB enabled and Windows SDK installed (25398).
  • Remove PYTHON_VERSION (25494).
  • Remove MULTI_GPU (25509).
  • Add set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "-Wl,--no-as-needed") to CMakeLists.txt (25445).
  • Clean up binaries/cmake for mobile (25651).
  • Move USE_STATIC_DISPATCH from CI script to master cmake (25696).
  • Do not pass down USE_GLOO_IBVERBS to CMake (25720).
  • Correctly gate CUDA_ARCH with defined() (25729).
  • Fix cudnn static linkage (25848).
  • Fix invalid function cast warnings that show up with GCC 8/9 (25483).
  • Upgrade NVIDIA driver on CI to 430.40 (24242).
  • Remove tools/setup_helpers/dist_check.py (25879).
  • Remove pthreadpool dependency in aten/CMake (25894).
  • Remove protobuf from Dependencies.cmake for libtorch mobile build (25958).
  • Fix typo in OpenBLAS cmake detection (25966).
  • Simplify code generation - phase 1 (25961).
  • Remove pthreadpool.a from install directory (25977).
  • Remove trailing whitespace in CircleCI configuration files (25987).
  • Change brew update logic to run much faster (25988).
  • Refactor macOS build and test (25930).
  • Run PyTorch macOS CPU-only build/test on all PRs (26096).
  • Use CircleCI commands for brew update/install (26159).
  • Turn should_run_job into command (26160).
  • Turn setup_linux_system_environment into command (26162).
  • Turn setup_ci_environment into command (26163).
  • Nightly build for iOS (26074).
  • Use expected_wrapper only if CMAKE_C_COMPILER and/or CMAKE_CXX_COMPILER is not set by the user (26306).
  • Fix remaining invalid function cast warnings that show up with GCC 8/9 (26104).
  • Rebase CircleCI to master if it is gcc5_4 (26321).
  • Emergency Docker upgrade to version 347. (26466).
  • Use github actions for flake8 (25824).
  • Add a CI Job to Check BC Changes in Function Schemas (26329).
  • prevent generating caffe2::mkldnn multiple times (25257).
  • Add namedtensor build & tests to default sets (26633).
  • Fix github actions for forked PRs (26562).
  • Remove tools/setup_helpers/cudnn.py (25876).
  • Allow building docker without torchvision (26168).
  • Validate Docker version in CI. (26496).
  • Fix CI docker builds (26704).
  • Cuda101 upgrade (26823).
  • Fix building with PARALLEL_BACKEND=NATIVE_TBB (26742).
  • Fix typo in job name: nigthly->nightly (26881).
  • Get rid of -u (expansion of undefined variable) setting (26907).
  • Switch internal CUDA build to C++14 (26757).
  • No sccache (26059).
  • Fix c10 registration binary size (26827).
  • Improve binary size of function schema inference (26860).
  • Fix shared_ptr binary size in op registration (26869).
  • Fix binary size in schema inference (26878).
  • Switch nightly jobs to trigger on 'nightly' branch rather than cron. (26830).
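
A minimal sketch related to the torch.utils.cpp_extension CUDA arch item above (23408), assuming hypothetical local sources my_op.cpp and my_op_kernel.cu exist: the TORCH_CUDA_ARCH_LIST environment variable is what cpp_extension consults to decide which CUDA architectures to compile for, instead of probing the local GPU.

```python
import os

# Pin the target CUDA architectures before building the extension.
# "7.0;7.5" is an illustrative choice (Volta + Turing).
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.0;7.5"

from torch.utils.cpp_extension import load

# JIT-compiles the (hypothetical) sources; the .cu file triggers a CUDA build
# whose nvcc arch flags are derived from TORCH_CUDA_ARCH_LIST.
my_op = load(
    name="my_op",
    sources=["my_op.cpp", "my_op_kernel.cu"],
    verbose=True,
)
```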

Caffe2:

  • Add Cast Op (23548).
  • Remove the confused CPU op (23575).
  • Remove ONNX & Turn on NO_API for mobile build (23546).
  • Include protobuf-defined outputs in the graph cutting algorithm (23557).
  • Support Copy Op (23705).
  • Format only change (23685).
  • Add LambdaRank DCG Loss Option (23679).
  • Fix the bug in regularizer matching (23485).
  • Fix SliceGradientOp to handle properly empty batches (23784).
  • Set caffe2_tvm_min_ops to 8 (23893).
  • Support Gather different indices for different examples in one batch (23813).
  • Add aligned option to RoIAlign (23706).
  • Minor comment fix (22140).
  • SumOp for int32 (23995).
  • Fix typo "properlyh" (24067).
  • OpenCV 4 compatibility fix for caffe2/video (24143).
  • Implement virtual memory computation in caffe2_benchmark binary (24144).
  • Add support for using caffe2::ThreadPool in pytorch mobile QNNPACK. (23658).
  • Hypothesis tests: add ability to enforce shape inference (23935).
  • Make hashing default for bucket-weighted pooling (24266).
  • Fix rotated rect intersection error (24171).
  • Format changes (24270).
  • C2/glow: assign net_pos to a net before applying onnxifi_blacklist_ops (24262).
  • Return list of AccessedFeatures from get_accessed_features (23983).
  • Register FC/Conv DNNLowp separately for supporting both tensor type (24361).
  • Refactor and expose metadata of tum_history layer for online prediction (24290).
  • Put ParseBlackListOps() into caffe2::glow namespace (24384).
  • Implement gradient operator for GatherByKeys. (24348).
  • Add BPR loss to TTSN (24439).
  • Remove gradient value as input from SparseNormalize op (24357).
  • BlackBoxPredictor OSS part N + 1: strip fb/predictor/Transforms.h dependency (23350).
  • Make EmbeddingLookup APIs take offsets rather than lengths to match the PyTorch's EmbeddingBag (24944).
  • Support focal loss in MTML.
  • Implementation of cyclical learning rate (23914).
  • register HeatmapMaxKeypoint with C10 (25191).
  • Add the sparse feature information during logging in sparse lookup layer (24863).
  • Add Int8Transpose operator (16382).
  • Relax roi_width/roi_height check to non-negative (260).
  • Disable Int8Transpose test.
  • Format sparse_lengths_sum_benchmark (25529).
  • Add options to flush cache in SLS benchmarks (25530).
  • Change shape for some ops to reduce variance (25619).
  • Fuse two individual operators into GatherFuse8BitRowwiseQuantFloatMulLengthElim (25519).
  • Enable PiecewiseLinearTransform test on ROCm (25632).
  • Add requests as a legit dependency (25596).
  • Change shape for some ops to reduce variance (25686).
  • Move GetDimFromOrderString to caffe2/core/types.h (25671).
  • Make SparseNormalize backwards compatible (25660).
  • Cyclical learning rate multiplier: use fabs(base_lr) (25628).
  • Remove caffe2.pb.h dependency for embedding_lookup_idx.cc (25670).
  • Enable loading int8 prepacked models in PredictorContainer.
  • Get rid of protobuf dependencies (25650).
  • Fix device_option propagation (25203).
  • Increase input shape to reduce variance (25812).
  • Fix cuDNN build error with CC 3.0 platform (#25820) (25825).
  • Remove cosh_ op test (25893).
  • Enable variable size embedding (25782).
  • Add assert to ensure the divisor is not 0 (25960).
  • Increase failure threshold for timing based assert (25867).
  • Better error messages in C2 ONNX backend (25809).
  • Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (25959).
  • Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (26080).
  • Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (25970).
  • Guard dyndep with a lock (26153).
  • Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler (26260).
  • Average Pooling 3D AVX2 Implementation (26111).
  • Back out "Back out "[Caffe2] Fix device_option propagation"" (25908).
  • Support unpickle py2 NetDef object in py3 (26147).
  • Tvm operator dynolog (26295).
  • Add support for real4bits quant (25426).
  • Add DimType info in dumped debug nets (26589).
  • BlobReference getattr can only throw AttributeError (26654).
  • "fixing" gcc bug introduced with cuda 10.1 (26445).
  • Whitelist ATen/core sources and headers for Caffe2 (26609).
  • Adding OpProfile proto into ProfDAGProtos to support storing operation cost (26677).
  • Use new fbgemm PackedDepthWiseConvMatrix without template parameter (26760).
  • Rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool.
  • Enable batch_size = 0 support in DNNLOWP Concat operator (26849).
  • Use new depthwise conv fbgemm interface (26898).
  • Fix the weird bug in control_flow_op_test.py (26931).
  • Disable cudnn transpose for int types (26934).
  • Expose PiecewiseLinearTransform to PyTorch (26903).
  • Remove LOG(INFO) from math_cpu.cc (27001).
  • Add fakefp16 transformation.

BC-Breaking:

  • Improve handling of mixed-type tensor operations (22273).
  • Migrate comparison ops from the TH to Aten. Added support for type promotion. (26981).
  • Changed tensor comparison return type from uint8 to bool (21113); see the sketch after this list.
  • Add align_corners option to grid_sample and affine_grid, change default to False (24931).
  • torch.pow: port operator from the TH to Aten (23492).
  • torch.flatten returns a 1-dim tensor on a 0-dim tensor (25406).
  • Change schedulers to chainable form (24352).
  • Make options.name_ private, and change all callsites to use options.name() (26419).
  • Remove deprecated torch.gels (26480).
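
A short sketch of two of the behavior changes above; the values in the comments are what the new behavior produces.

```python
import torch

# Comparison ops now return torch.bool tensors rather than torch.uint8 (21113).
mask = torch.tensor([1, 2, 3]) > 2
print(mask.dtype)                    # torch.bool

# torch.flatten on a 0-dim tensor now returns a 1-dim, 1-element tensor (25406).
scalar = torch.tensor(5)
print(torch.flatten(scalar).shape)   # torch.Size([1])
```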

C++ API Parity:

  • Support custom autograd functions in C++ (23572).
  • Allow empty Variables to be saved for backwards (23618).
  • Tests for C++ custom autograd function API (23628).
  • Hooks for C++ API (24393).
  • C++ ModuleList (24317).
  • Build libtorch binary with new ABI (23908).
  • Templatize Tensor.data_ptr() (24847).
  • bind autograd functions into C++ (24342).
  • Deprecate tensor.data(), and codemod tensor.data() to tensor.data_ptr() (24886).
  • Add Python/C++ torch.nn API parity test harness (23852).
  • Add Python/C++ API parity tracker for torch.nn (25289).
  • Use constructor in test_params for C++ API parity test (25749).
  • Map module options between Python and C++ in API parity test (25784).
  • Make various improvements to C++ API parity test harness (25828).
  • C++ Fold nn module (24160).
  • Fix LBFGS on GPU (25909).
  • L1Loss module (25902).
  • C++ MaxPool Module (24860).
  • C++ Average Pool Module (25800).
  • C++ unregister_module function for Module (26088).
  • C++ API parity: at::Tensor::data (26008).
  • Re-organize C++ API torch::nn folder structure (26262).
  • C++ API parity: at::Tensor::grad (26150).
  • C++ API parity: at::Tensor::is_leaf (26186).
  • C++ API parity: at::Tensor::output_nr (26216).
  • Support multidimensional inputs to torch::tensor (26210).
  • Distance module (26424).
  • Fix options usage in C++ module / optimizer constructors (26483).
  • C++ API parity: at::Tensor::version (26217).
  • Minor improvement to C++ nn::Distance tests (26539).
  • C++ API parity: at::Tensor::version (26561).
  • C++ API parity: at::Tensor::detach (26251).
  • C++ API parity: at::Tensor::set_data (26647).
  • Add tests for C++ functional cosine_similarity and pairwise_distance, and clean up functional test code (26559).
  • Add C++ nn::Identity (26713).
  • Add comments for multidim tensor factory limitations, and rename ListInitTensor for better clarity (26756).
  • Improve C++ maxpool and avgpool (26521).
  • C++ API parity: TensorTest.Data fix (26920).
  • C++ API parity: AdaptiveMaxPool1d (26755).
  • C++ API parity: AdaptiveMaxPool2d (26772).
  • C++ API parity: AdaptiveMaxPool3d (26775).

Distributed:

  • Extract common classes and functions from test_c10d to common_distributed (23660).
  • Sync and async torch.distributed.rpc for builtin operators (23228).
  • python udf over rpc (23569).
  • Fix naming convention inconsistency and formats in test_rpc.py (24407).
  • Use c10::ThreadPool to send and receive messages (23968).
  • Use snake names for all files in distributed.rpc (24502).
  • throw remote exception on client side (24138).
  • Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (25012).
  • Return a message instead of void from rpc udf (25283).
  • Basic framework for Distributed Autograd context. (24875).
  • Add missing call to DistAutogradContainer::init (25391).
  • Remove an unused member var (stop_) in process_group_agent (25392).
  • Assign each RpcAgent a unique ID, and use ID for sending RPC messages. (24195).
  • Cuda devices should have same dtype (25470).
  • Multiple fixes to test_c10d.py. (25441).
  • Move worker name collection code from Python to C++ (24260).
  • Attach 'send' autograd function to the autograd graph as part of RPC. (24876).
  • Error phrasing in torch.distributed helper functions (25574).
  • Make scatter/gather arguments optional (25575).
  • Run clang-format on torch/csrc/distributed (25647).
  • Build torch.distributed with Gloo backend on macOS (25260).
  • Adding RRef as return value for builtin operators (25169).
  • Only default USE_DISTRIBUTED=True on Linux (25725).
  • Adds a -m flag to pytorch.distributed.launch (24910).
  • Use whitelist instead of blacklist for USE_DISTRIBUTED (25759).
  • Change worker name constraint (25780).
  • Make Python RPC handler not hold module in a global variable (25458).
  • Retry connecting to TCP store on ECONNRESET (25707).
  • Make python rpc handler a singleton class (25742).
  • Disable flaky test_invalid_names in test_rpc.py (25916).
  • Remove global group name tracking for ProcessGroupNCCL (25905).
  • Dynamic registration of RPC backends (25734).
  • Add ProcessGroupGloo::createDefaultDevice (26166).
  • Clarified ambiguous docstring in NegativeBinomial (25923).
  • Make ProcessGroupAgent take num_send_recv_threads as constructor argument (26313).
  • Remove extra get_worker_id call in distributed rpc init (26381).
  • Make destructor virtual for class with virtual function (26504).
  • Use timeout in connect function to prevent against (26364).
  • Corrected variable name and added test (26503).
  • Add timeout parameter to connect function in TCPStore (26554).
  • Added test case for reinit (26506).
  • Add function to get NCCL version for logging (26583).
  • Add bitwise distributed reduction ops (26824); see the sketch after this list.
  • RPC Backend Registry (26919).
  • Acquire GIL before creating py::object in RPC python handler (26988).
  • Support re-creating/destroying process groups when some trainers recover after failures (26912).
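
A minimal sketch of the bitwise reduction ops item above (26824). It assumes the script is launched with one process per rank (for example via torch.distributed.launch, which sets MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE) and that the chosen backend supports the bitwise ops.

```python
import torch
import torch.distributed as dist

# Rendezvous using environment variables provided by the launcher.
dist.init_process_group(backend="gloo")

# Each rank contributes one bit; ReduceOp.BOR ORs them together across ranks.
flags = torch.tensor([1 << dist.get_rank()], dtype=torch.int64)
dist.all_reduce(flags, op=dist.ReduceOp.BOR)
print(flags)  # every rank ends up with the OR of all per-rank bit masks
```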

Distributions:

  • Fix log_prob() in torch.distributions.Uniform, HalfCauchy and Gamma (23017).
  • Implement bool_tensor.bernoulli_ (25076); see the sketch after this list.
  • Fix CUDA distributions test on Windows (25539).
  • Fix the Bernoulli distribution sampler (26864).
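
A tiny sketch of the bool_tensor.bernoulli_ item above (25076): bernoulli_ can fill a bool tensor in place, with each element independently True with the given probability.

```python
import torch

# Sample a boolean mask directly, without going through uint8 or float.
mask = torch.empty(2, 3, dtype=torch.bool).bernoulli_(0.25)
print(mask.dtype)  # torch.bool
print(mask)        # e.g. tensor([[False,  True, False], [False, False, False]])
```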

Documentation:

  • Fix typos in .circleci/README.md (23588).
  • Documentation for Tensor.record_stream() (24078).
  • Use prerendered KaTeX in docs. (23376).
  • Documentation cleanup (23148).
  • Fix a typo in Functions.cpp (23615).
  • Slightly improve dataloader docs on when auto-batching is disabled (23671).
  • Adjust maintainers list (23693).
  • Fix align_corners doc (23707).
  • Document empty_strided (23735).
  • Updated docs and added deprecation warnings to acknowledge a bool tensor (22261).
  • Document bool tensors for bitwise_not. (23800).
  • Fix typos in op_registration.h (23770).
  • fix torch.frac documentation (23830).
  • Delete placeholder so top-level CONTRIBUTING.md is used (23869).
  • Replace descriptions of args in doc with template (23439).
  • Fix docstring for argmax (23775).
  • Adds torch.random to docs/toc (23553).
  • Migration doc fixes (24033).
  • Updated SGD docs with subscripts (23985).
  • Add interfaces in lr_scheduler.pyi (23934).
  • Document benchmarking practice for CUDA (23910).
  • Added .pyi file for flatten (24459).
  • Test if descriptions of args are in the template (24161).
  • Add docs to CI (24435).
  • Add ASAN instructions to CONTRIBUTING.md (24848).
  • Fixed Error in Transformer Example (24837).
  • Fix the lint error in transformer doc. (25027).
  • Typo correction in cuda_deterministic_backward.rst (25011).
  • Fix typo (25238).
  • Added documentation for nn.functional.bilinear (24951).
  • Describe the relation between fold and unfold operations. (24840).
  • logical_xor doc cleanup (25364).
  • Fixed flatten docs (I think) (25544).
  • Add copy logic for LibTorch to avoid issues on Windows (25556).
  • Update index.rst (24245).
  • Documentation for cdist (25221).
  • Update Transformer.py comments to include a full example (25411).
  • Alphabetize Package Reference section in Docs (25666).
  • Add CosineAnnealingWarmRestarts to optim documentation (25421).
  • add torch.nn.Identity to init.pyi.in (25777).
  • Documentation change of torch.where (25554).
  • Fix typo: toDense --> to_dense (25832).
  • Argument 't', mis-referenced to 'torch.t()' (25885).
  • Fix typo in dataloader.py docs. (26263).
  • Clarify and correct the doc of atan2. (26180).
  • Add warning to anomaly_mode doc (26615).
  • Add instructions for building documentation (26553).
  • Highlighting in the doc that square root comes before adding epsilon (26735).
  • Add documentation for overload names (23844).

Improvements:

  • Port resize_as_ and clone from TH to Aten (23027).
  • support Gather different indices for different examples in one batch (23285).
  • Remove old Type based backend extensions (22009).
  • Update MKL to 2019.4 for Windows (23583).
  • Remove AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF, which isn't used anymore. (22932).
  • Rename AT_FORALL_SCALAR_TYPES_WITH_COMPLEX to AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_STUBS (23336).
  • Allowing batching for det/logdet/slogdet operations (22909).
  • Use dst dir for temp file (23629).
  • Add overload names to native_functions.yaml (23532).
  • Adam/AdamW implementation minor fix (22628).
  • Remove useless code from shape info (23663).
  • Move addcmul to Aten (22874).
  • Migrate neg's CUDA implementation to ATen. (23617).
  • Bump Gloo (23400).
  • Zero sized tensor support for repeat_interleave (23717).
  • Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (23701).
  • Channels last stored in tensor (23391).
  • Recommend ~ and bitwise_not() when user tries to apply neg (-) on a bool tensor. (23621).
  • Negate halves on GPU using __hneg() when possible, instead of using float conversion. (23626).
  • Rename previously THNN conv kernels to have naive_ prefix. (23790).
  • Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (23833).
  • Remove K and N function arguments for fbgemm_pack_quantized_matrix (22956).
  • cleanup torch/nn/functional.py (23977).
  • Move addcmul to Aten(CUDA) (23814).
  • Enable Add, sub, mul, and div on CPU for bfloat16 type. (22851).
  • Removing deprecated warning message from torch.h (24002).
  • port atan2 from TH to ATen (23558).
  • Port addcdiv operator from the TH code to Aten (23683).
  • Add instruction on how to nest nn::Sequential (23939).
  • Refactor randperm test (23526).
  • Delete unnecessary file split_types.py (23754).
  • Make all at::Tensor in-place methods const (23945).
  • Fix scale and zero_point names (23991).
  • Allow forward functions with single output to return Variable (23803).
  • Fix regression in triangular_solve when number of batches = 1 for CUDA (23953).
  • make more iterator attributes private (23744).
  • Fixed Bool in IsIntegralType bug (plus review comments) (23942).
  • Support torch::tensor and at::tensor with bool and BFloat16 dtypes. (23337).
  • Don't redefine unnecessary type stub. (23338).
  • Port addcdiv operator from the TH code to Aten (24086).
  • Added type annotations to unpooling layers (24101).
  • Unboxed kernels in c10 (23447).
  • Allow kernels that don't have a boxed version (23665).
  • c10 dispatcher stores autograd kernels (23666).
  • Move TensorOptions to ATen/core (22020).
  • Optimizing out the division in the fusion (23275).
  • add function name to error messages generated by checked_tensor_unwrap (24187).
  • Fix C412 lint from flake8-comprehensions update. (24184).
  • Align AT_FORALL macros with AT_DISPATCH macros. (23339).
  • Remove unused parameter from FORALL macros and rename STUBS to QINTS. (23340).
  • Thread local debug info (22365).
  • Cleanup warnings (24133).
  • Enable FBGEMM tests under UBSAN as well (23570).
  • toString(FunctionSchema) shows overload name (23694).
  • Disambiguate tensor and string ops (23748).
  • Simplify tests that should cover all possible devices (23824).
  • reduce memory usage for centered rmsprop (24170).
  • Enabled comparison ops for bfloat16 dtype on CPU (24182).
  • Rename torchtest.test_all_device_types to torchtest.for_all_device_types (24337).
  • Fix expansion of stride argument in max_pool2d (23954).
  • Fix expansion of stride argument in max_pool3d (23960).
  • Fix expansion of stride argument in avg_pool2d (23961).
  • Fix expansion of stride argument in avg_pool3d (23963).
  • Resolve unused variables in tests (24075).
  • Sanity fixes for bitwise_not (24296).
  • Fix issue with single memory location being written multiple times (23574).
  • Add logical_not operator (23839).
  • Add logical_xor operator (23847).
  • Recommend logical_not() instead of bitwise_not() when applying sub and neg on bool tensors. (23860).
  • Enabled masked methods for bfloat16 (24183).
  • Make aten_to_numpy_dtype in tensor_numpy.h public. (23943).
  • Let logical_not support non-bool tensors. (23916).
  • Let logical_xor support non-bool tensors. (23978).
  • Exposing the API for use with pytorch/tvm repo. (24430).
  • Assert weight_observer has the correct dtype (24436).
  • Enabled torch.mm and torch.mv for bfloat16 (24224).
  • Don't require slow test reporting in run_tests.py --pytest (24448).
  • Modify symmetric eigendecomposition derivative (23018).
  • Remove unused files from THNN and THCUNN (24820).
  • Allow SyncBatchNorm without DDP in inference mode (24815).
  • Enable torch.eye for bool and half (24148).
  • Allow torch.tril / triu to handle bool and half inputs (24163).
  • TensorIterator::binary_op input-output overlap check (24058).
  • Make SobolEngine use random seed if not specified (24884).
  • Add static dispatch mode to reduce mobile code size (22335).
  • Use a ptr to store autograd profiler rng (24889).
  • Fix deprecation warnings (24841).
  • Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (22907).
  • Remove unused ATen headers for mobile (24850).
  • Improve c10 dispatcher lookup perf (24882).
  • Add epsilon argument to Adagrad optimizer (24980).
  • Migrate erfinv and erfinv_ from the TH to Aten(CPU) (24908).
  • Fix for cdist backward for non-batch tensors (22915).
  • Remove deprecated TH(topk) code. #24778 (24857).
  • Disable tsan for test_dataloader.py. (25005).
  • Fixed test_numba_integration (25017).
  • pin_memory thread now uses 1 thread only (25111).
  • print padding_mode for Conv modules if not zeros (23996).
  • Use the EmbeddingLookup API which takes the offsets instead of lengths (24945).
  • torch.from_numpy fix for np.int (25139).
  • Creates Torch-friendly Event class and adds Stream tracking to autograd (25130).
  • generic overrideable convolution for backends (23562).
  • Optimize LeftRight and either (25133).
  • data -> data_ptr: upgrade the deprecated APIs (25223).
  • Moving sign function to ATen (22861).
  • Add TORCH_WARN_ONCE, and use it in Tensor.data() (25207).
  • Upgrade the deprecated data to data_ptr APIs (25295).
  • upgrade MKL-DNN to v0.20.3 (22910).
  • Remove some unused plugins. (25201).
  • Disable the copy constructor and = operator of DispatchStub (24932).
  • Fix infer np scalar dtype mem leak (24267).
  • Align AT_FORALL macros with DISPATCH macros wrt Half. (25268).
  • Implementation of cpu_serial_kernel for TensorIterator (25125).
  • Migrate digamma, digamma_, polygamma, and polygamma_ from the TH to Aten (CPU) (25048).
  • note location (25311).
  • Migrate erfinv and erfinv_ from the TH to Aten (CUDA) (24943).
  • Support all_reduce on a list of same-device tensors (#21640) (24949).
  • Fix typo "takes takes" -> "takes" (24785).
  • Remove unused THTensor_(add) and similar functions code. (24864).
  • Move new_criterion_tests from test_nn.py to common_nn.py (25333).
  • Use C10_DEPRECATED_MESSAGE instead of TORCH_WARN_ONCE for Tensor.data() (25319).
  • Extend nn.Transformer to support BERT (gelu) (24181).
  • Fix possible deadlock in SharedCache inside a forked child proc (25158).
  • Fix double backward of inplace op on view (23502).
  • change LBFGS's default tolerance_grad to 1e-7 (25240).
  • Add OneCycleLR (25324).
  • Invariant typevar matching on callsite checks (25136).
  • Fix lint (25371).
  • Fix bug in assertNotEqual for int tensors (25199).
  • Kill THNN function auto generation. (25322).
  • Move the CUDA implementation of ceil to ATen. (24866).
  • Replace open registration TensorTypeId with closed enum. (25252).
  • Remove THNN sparse autograd Functions. (25323).
  • Kill ConvTransposeMixin.forward, which doesn't seem to be used. (25326).
  • Kill backend-specific lookup in CrossMapLRN2d, as it never succeeds. (25331).
  • Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (25339).
  • Adding ModuleList to modules.h (25346).
  • Fixed masking warnings in tests (25317).
  • Add support for non-affine batch norm with float stats and half inputs (22750).
  • Fix allreduce_coalesced tests in c10d (25419).
  • Remove Module._backend as it's not used anymore. (25342).
  • Delete toType(const DeprecatedTypeProperties&, ...) (25332).
  • Update QNNPACK submodule to 7d2a4e9 (25400).
  • Compare shapes of outputs and grad_outputs in autograd.grad (25349).
  • Stop initializing THNN backend. (25352).
  • Stop doing nn wrap. (25353).
  • Fixes #25454 (25456).
  • Migrate clamp and clamp_ from the TH to Aten (CPU) (25290).
  • Get rid of torch._thnn (25354).
  • Get rid of more unused plugins (25355).
  • Get rid of extract_cwarp (25356).
  • Update derivatives.yaml docs to refer to Declarations.yaml rather than Declarations.cwrap. (25357).
  • Kill non-shared cwrap tools. (25358).
  • Delete a few cases where we directly use Backend/TensorTypeId. (25467).
  • Fix implicit fallthrough warnings in FeatureLPPooling.cu (25451).
  • Update speed benchmark binary to work in USE_STATIC_DISPATCH mode (25449).
  • Migrate CPU_tensor_apply to TensorIterator in TensorCompare.cpp (25402).
  • Run clang-format on torch/lib/c10d (25382).
  • Checks requiring GPU moved to their own test (25555).
  • Test_allreduce_coalesced_stress message passed in as kwarg (25557).
  • Delete torch/csrc/nn/type_checks, which aren't used anymore (25506).
  • Create helpers for implementing unary ops whose CUDA implementation is ATen (24879).
  • Implement indexing methods for sparse tensors (24937).
  • Migrate multinomial from the TH to ATen (CPU) (25274).
  • Enable broadcasting of batch dimensions RHS and LHS tensors for lu_solve (24333).
  • Get rid of _th_reciprocal_. (25507).
  • Enable torch.cholesky for batches > 262140 (24438).
  • Don't save self in index backward (25594).
  • Eliminate magic numbers in BatchLinearAlgebra.cu (25524).
  • Allow TensorMethods.h to include Dispatcher.h (alternative) (23888).
  • Fix clang-tidy script (25652).
  • Kill unused enumerate_options_due_to_default. (25588).
  • Kill discover_sparse_tensor_operations. (25589).
  • Cpu-strided-complex support for binary-ops (25534).
  • Port new_empty to ATen (25475).
  • Port new_full to ATen (25583).
  • Rename 'mobile' to 'static_dispatch' (25695).
  • Bring back skipped bitwise dispatch (25689).
  • Align AliasInfo's operator<< with FunctionSchema (23206).
  • Migrate digamma and polygamma from the TH to Aten (CUDA) (25662).
  • Remove tools/setup_helpers/cudnn.py (25482).
  • Enable BLIS from the FLAME project as a BLAS choice. (23819).
  • Expose parse_schema and eq function to python and add round trip tests (23208).
  • Fix error message stack overflow (25146).
  • Fix typing on nn.Parameter (25586).
  • More accurately describe field invariants in OperatorEntry (25793).
  • Enable log_softmax and CrossEntropyLoss for bfloat16 (24457).
  • Fix missing str to int conversion in the commit f71ddd42 (25861).
  • Fix test_det_logdet_slogdet_batched on PowerPC (25773).
  • Unify treatment of warp size / wave size (25884).
  • Make torch checks same for both CPU and CUDA multinomial (25595).
  • In the CUDA implementation of erfinv, erfinv() should be used for double (25337).
  • Fix cpp_extensions test failures with GCC 9.1 from ArrayRef(initializer_list) (25384).
  • Rename packed tensor accessor (25654).
  • Gate static aten registerer with USE_STATIC_DISPATCH (25815).
  • Tensor type set (25308).
  • Enable libflame as a LAPACK choice (25795).
  • Fix scatter CPU kernel when (input size, src size) > index size (25839).
  • Migrate pow from TH to Aten (CUDA) (25517).
  • Fix int32 overflow in SummaryOps.cu getBin #25747 (25748).
  • Simplify header inclusion in test/cpp/api/modules.cpp (25921).
  • Compute common dtype based on inputs only (25593).
  • Updates autograd engine to respect streams set in forward (8354).
  • Make running Gloo tests conditional on availability (25913).
  • Remove superfluous check for POLLIN in TCPStore (25911).
  • The float version of calc_digamma should return float type. (25488).
  • Add VariableTensorId, store it in TensorTypeSet (25597).
  • Add torch.backends.mkldnn.enabled flag (25459).
  • Skip TestAutograd.test_deep_reentrant on macOS (25942).
  • Skip TestHub on macOS (26033).
  • Refactor torch.*solve tests (25733).
  • Enables _do_cuda_non_default_stream (25989).
  • Skip test_triangular_solve_batched (26108).
  • Stop re-ordering TH(C)Blas arguments. (25606).
  • Kill TH(C)Blas kwarg_only declarations. (25607).
  • Stop reordering TH random function arguments. (25608).
  • Fix base_lr overridden in cyclic lr (26105).
  • Kill kwarg_only declarations in Declarations.cwrap. (25609).
  • Add device check before accessing data_ptr in PackLayer (26056).
  • Add data field to Tensor pyi. (26093).
  • Kill most defaults in Declarations.cwrap. (25610).
  • Get rid of more defaults in Declarations.cwrap. (25611).
  • Kill remaining defaults in Declarations.cwrap. (25612).
  • Remove requests as dependency (26083).
  • Make schema part of RegisterOperators::Options (26114).
  • Allow overwriting catch-all kernels (25947).
  • Creates generic device type testing framework (25967).
  • Add sync to flaky test_events_multi_gpu_query (26231).
  • Add possible out of shared memory error message (25730).
  • Ports most of test_torch.py to generic device type framework (26232).
  • Add type hint for cuda.set_rng_state (26200).
  • Call aten ops through c10 dispatcher (23668).
  • Remove unboxedAutogradKernel from c10 (26130).
  • Refines test_torch.py generic device testing (26244).
  • Fix binary size of OpsAlreadyMovedToC10.cpp (26237).
  • Migrate away from using Variable( in test_nn.py (26077).
  • Enabled conv methods for bfloat16 (26167).
  • Move the CUDA implementation of round to ATen. (25041).
  • Kill defaults in nn.yaml. (26282).
  • Add s390x compiler define for s390 builds. (26233).
  • Add derivative of cholesky_solve (26185).
  • Kill 'default_init', which isn't needed anymore. (26281).
  • Adds generic device tests to test_autograd.py (26248).
  • Ensure that n is non-negative in polygamma. (26294).
  • Enable batching for pinverse (26095).
  • Make TORCH_WARN_ONCE capture variables by reference (26289).
  • Fix race in CUDA initialization (25788).
  • Kill declared_type and ignore_check from THFormal. (26284).
  • Replace simple if_true / if_false cases in Declarations.cwrap. (26285).
  • Fix typo (26298).
  • Enabled bfloat16 dtype on CUDA (26148).
  • Move more ops to c10 (26255).
  • Remove dead function (26259).
  • fix ctc_loss argument check error message (26325).
  • Skip testing triangular_solve_batched on non-default CUDA stream (26115).
  • Kill if_true / if_false in Declarations.cwrap. (26346).
  • enable xla cpp tests in CI (26347).
  • Resolve #25605 cyclic reference in _LRScheduler (25776).
  • use allgatherv for sparse all reduce (23917).
  • Removes torchtest, expands generic device testing (26374).
  • Add a float version of calc_erfinv (by templating) on CPU (26070).
  • Fix type mismatches in the CUDA version of calc_digamma and calc_trigamma (25791).
  • Adds dtypes decorators to and allows helper methods in device generic test classes (26375).
  • Fix composite learning rate (26227).
  • Move the CUDA implementation of rsqrt to ATen. (25285).
  • Add a flat hashmap (26371).
  • Preserves insertion and deletion order in flat hashmap (25675).
  • Moves more tests to TestTorchDeviceType (26435).
  • Tag files should not be deleted by "python setup.py clean". (26416).
  • Implement multiple dispatch (25653).
  • Enabled where for bool tensor on CUDA (26430).
  • Implement multiple dispatch (26468).
  • Port lgamma from TH to Aten (25138).
  • Make c10::Scalar::to() const (26406).
  • Allocate empty tensor instead of empty_like in binary ops, fix pow (26498).
  • Implement multiple dispatch (#26468) (26501).
  • Move the CUDA implementation of floor to ATen. (25372).
  • Fix for Conv shape check prints overflowed ints (25827).
  • c10::KernelFunction (26337).
  • Add two levels to use_c10_dispatcher (26272).
  • Correct the test of a big number (2 ^ 31) (26491).
  • Enable creation of boxing wrappers for some aten operators (26273).
  • ATen port of lgamma (cuda) (26600).
  • Enabled bfloat16 dtype on CUDA (26407).
  • Makes test_indexing.py device generic (26634).
  • Allow batch size of 0 in Conv (26214).
  • A few hub improvements (25980).
  • Updates and extends TestNNDeviceType (26638).
  • Enable registering stackbased kernels with lambdas (26658).
  • Move the CUDA implementation of trunc to ATen. (25423).
  • Add derivative for cholesky_inverse (26451).
  • Vectorize unary operator erfinv (26629).
  • Expands TestAutogradDeviceType (26708).
  • Enable hub tests on MacOS (26697).
  • Simplify operator sign using the helper. (25592).
  • Address review comments in https://github.com/pytorch/pytorch/pull/26272 (26587).
  • Add whitelist for backward compatible checks for function schemas (26740).
  • Expose a torch.result_type and simplify tensor iterator (26012).
  • Delete backwards compatibility Backend overload for registerOp (25914).
  • Implement multiple dispatch in boxed c10 dispatcher (26118).
  • Remove unnecessary include from TensorBody (26360).
  • Add some missing constructors to IValue. (26718).
  • Hub improvements (26723).
  • Upgrade sleef to v3.4.0. (26749).
  • Lets generic tests use multiple devices (26594).
  • Refactor checked_tensor_unwrap to take DeviceType instead of Backend (26290).
  • Port CUDA implementation of expm1 to ATen (26598).
  • Remove one unnecessary copy of the output during the type promotion. (26816).
  • Fix Future default constructor missing for ParallelNative (26739).
  • Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (26592).
  • torch.load default encoding change to 'utf-8' (26421).
  • Move the CUDA implementation of log to ATen. (26494).
  • enable double backward for non-cudnn LSTM and GRU (26660).
  • Migrate multinomial from the TH to Aten (CUDA) (26481).
  • Remove three unused declarations. (26699).
  • Make resize_as_ generic, so XLA works. (26809).
  • Add some missing constructors to IValue. (26806).
  • Change calling convention of ATenDispatch from getOp to callUnboxed. (26857).
  • Refactor dispatch structure so fallback code lives inline. (26367).
  • Fix nuclear norm with requires_grad=True (26303).
  • Choose num_threads in parallel_for based on GRAIN_SIZE (26886).
  • Use intrinsics for trigonometric functions on CPU (26431).
  • Remove an unused function propagate_names_if_namedtensor_enabled (26176).
  • Migrate lt and lt_ from the TH to Aten (25998).
  • Make TypeDefault, TypeDerived and VariableType anonymous namespaces (26882).
  • Move Generator ops to c10 (26434).
  • Add torch.can_cast(from, to) function (26805).
  • Include iteration_ in SGD optimizer serialization (26906).
  • Make repeat respect the current stream (26946).
  • Fix issues in torch::tensor constructor (26890).
  • Named tensor support for: index_fill_, index_fill, squeeze, median(Tensor) (26914).
  • Add std::variant backport as torch::variant (26836).
  • fix type annotation (26930).
  • Bring back the optimization of integer.pow({2.0, 3.0}) on CPU (26938).
  • Add torch.promote_types function (26655); see the sketch after this list.
  • Rewrite argmax and argmin as TensorIterator reductions (26181).
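
A short sketch of the type-promotion helpers listed above (torch.result_type, 26012; torch.promote_types, 26655; torch.can_cast, 26805); the commented outputs follow the standard type promotion rules.

```python
import torch

# Dtype a mixed-type binary op would produce (int32 tensor + Python float).
print(torch.result_type(torch.ones(3, dtype=torch.int32), 1.5))  # torch.float32

# Promote two dtypes directly.
print(torch.promote_types(torch.int32, torch.float32))           # torch.float32

# Whether a cast is allowed under the casting rules.
print(torch.can_cast(torch.float64, torch.int64))                # False
print(torch.can_cast(torch.int32, torch.float32))                # True
```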

Jit:

  • Cleanup interface of inlineCallTo. (23539).
  • Make ProfiledTensorType hashable (23116).
  • add log stmts to peephole.cpp (23279).
  • add docs for serialization (23456).
  • Move overview to docs/ folder (23457).
  • Include recursive class compilations in error call stack (23454).
  • add a test for inline tracing (23543).
  • format jit_type.h (23564).
  • Add logging to Alias Analysis (23383).
  • Update relative links in OVERVIEW.md (23627).
  • prefix module qualified names with module (23630).
  • allow forward hooks in tracing (23613).
  • Add in-place check to AliasDb (23210).
  • Support nn.GRU in script (23266).
  • Remove more uses of DimensionedTensorType (23060).
  • Compress debug symbols when serializing TorchScript models. (23659).
  • Fix frontend error message (23576).
  • Compress all non-Tensor components of a serialized TorchScript model. (23723).
  • Initial torchbind prototype (21098).
  • Perform string uniquing by value in pickle serialization. (23741).
  • don't try to set training after ScriptModule has been initialized. (23680).
  • Open up AliasAnalysisKind for any ops (23810).
  • make nn.LSTM accept PackedSequence instead of Tuples (23643).
  • fix some compiler warnings (23816).
  • Properly mangle nn.Module.__construct (23779).
  • Define toIValue conversion for dtype (23708).
  • format init.cpp (23840).
  • Recursive script migration guide (23892).
  • Erase shape information from class types (23362).
  • Make typing understand exceptions (23565).
  • Disable optimizer for __setstate__ (23698).
  • Make assertions refine types (23949).
  • add NotIn support in script (23637).
  • metacompile isinstance checks (23885).
  • add support for overloading functions (23886).
  • jit.script() testing and fixes (23891).
  • support tensor as key type in script (23638).
  • make _overloads importable in nn/functional (24049).
  • [jit] make sure NamedTuples have unique qualified names (23798).
  • serialize all c++ frontend modules to a single CU. (23645).
  • fix py-compat fbcode lint warnings (23530).
  • Add Pickler C++ API (23241).
  • Moves clamp from autodiff cpp to symbolic script (23927).
  • support dict augment assign in script (23639).
  • Open up AliasAnalysisKind for any ops (23834).
  • Fix builtin function reference (24056).
  • add initial support for sparse tensors (23841).
  • support grad and data attribute for tensor in script (23842).
  • JIT Serialization of nnq.Linear (24048).
  • Replace Module::copy_into with Module::clone. (24068).
  • serialize modules as classes (23098).
  • Delete WeakScriptModuleProxy (23398).
  • Fix trace docs (24191).
  • search class type for methods (23689).
  • simplify NamedType interface (23691).
  • make NamedType an interface (23696).
  • make FunctionType a NamedType (23697).
  • class_table_ to deps_table_ (23845).
  • clean up import_source (23846).
  • Use JIT function schema parser to parse builtin RPC ops (24207).
  • Remove DimensionedTensorType (24077).
  • Fix flake8 issues in ./torch/jit (24240).
  • Fix missing version < 2 guard in import (24255).
  • Add logging to autodiff (23664).
  • fix test_jit.py so it can be run in parallel (24311).
  • fix list comprehension type assumed to be the same as input type (24271).
  • simplify NamedType interface (24278).
  • make NamedType an interface (24279).
  • make FunctionType a NamedType (24280).
  • class_table_ to deps_table_ (24281).
  • clean up import_source (24282).
  • Add the ability to compile exports on traced modules (24298).
  • Cleanup documentation around script and trace (24208).
  • Add trace_module to docs (24258).
  • JIT trace testing (23987).
  • copy methods when creating a derived class type (24349).
  • kill TK_NAMED_TUPLE_DEF (24350).
  • Misc doc updates / fixes (24371).
  • remove CompleteTensorType (24169).
  • Remove type subclassing (24257).
  • fix IR parsing bug (24294).
  • pickler read guard (24433).
  • Fix test_jit_cuda_archflags failure on py27 due to changing dict order. (24501).
  • fix double copying of constants (24412).
  • jit_log: Extract a function that prefixes all lines of a string with another string. (24355).
  • Module: add dump function that recursively prints contents of the module. (24356).
  • Clear recursive error stack on each compilation (23458).
  • Add @ignore for script classes (23614).
  • Record function name as an attribute of CallFunction nodes. (24446).
  • big cpp test reorg (24801).
  • Cache node operators to speed up optimization (24827).
  • Fix VaryingShape::merge (24455).
  • Make torch.jit.Attribute work with PYTORCH_ENABLED=0 (23851).
  • Moves (most) ops to symbolic script (23794).
  • Fix unicode in comments (24218).
  • serializing function calls (23799).
  • Removes SymbolicVariable from tests (24007).
  • Merge ProfiledTensorType and TensorType (24284).
  • Remove unused DynamicDAG class. (24890).
  • extend torch.jit._overload to module methods (24259).
  • Remove torch.contrib._graph_vis (24874).
  • Fix missing super call error (24852).
  • bind autograd.grad function into TorchScript (24871).
  • restore default constructor of OutputArchive (24955).
  • Misc doc updates #2 (24445).
  • Fixing size implementation for struct slot_list_impl (24351).
  • add support for multiple assignment statements (24477).
  • Load tensors directly from pickle archive (23281).
  • Fix fbcode weak ordering (25026).
  • Fix bugs in assignment to optionals (24989).
  • Fix a bug in creating a prefix string in jit log. (25051).
  • cleanup tmp name generation (25065).
  • jni-java wrapper for TorchScript api (25084).
  • Fix python lints for generate_test_torchscripts.py (25107).
  • Clean up after running doc tests (25036).
  • fix annotated assignment (25094).
  • dictPop: dereference dict.find() iterator before calling dict.erase() (25056).
  • add some sparse tensor ops support in TorchScript (24967).
  • move some methods into function.cpp (25119).
  • SubgraphMatcher: Factor out matchAttributes. (25073).
  • SubgraphMatcher: add logging. (25074).
  • SubgraphMatcher: matching modules support. (25075).
  • Add logging to JIT CSE pass. (25141).
  • bind autograd.backward and tensor.backward in TorchScript (23913).
  • fix to logging in AA (25143).
  • Fix bugs in assignment to optionals (25059).
  • skip fstrings test if not py36 (25184).
  • Simplify NamedType (25058).
  • Add interface declarations to JIT (21972).
  • Remove insert_observers pass (24999).
  • Remove InsertQuantDeQuantNode (25000).
  • Implement a bunch of pickle serialization features that optimize for size. (23759).
  • fix closures which always throw. (25278).
  • Add interface declarations to JIT (25258).
  • Remove PythonPrint's is_method_ member (25226).
  • add serialization of interface (25227).
  • improve interface error messages (25228).
  • don't throw in constant prop (25270).
  • Add source location to class instantiation error (24990).
  • fix inliner bug (25052).
  • Pull instruction definitions out of interpreter.cpp. (25148).
  • Add GET_ATTR instruction (25151).
  • Fix old annotate() error (25261).
  • insert_observers use qconfig_dict (25069).
  • Implement FoldConvBatchnorm2d pass. (25282).
  • Remove spurious print (25378).
  • Fix AliasAnalysisKind::PURE on MSVC (25375).
  • Fix item() call in docs (25404).
  • Attempt to enable CrossMapLRN2d, as it no longer uses Module._backend. (25343).
  • Some alias analysis fixes (25425).
  • Emit script function calls during tracing. (25089).
  • torch/jit/passes/quantization.{h,cpp} and torch/jit/init.cpp (25403).
  • add tuple keyword (25474).
  • Manually implement is_zipfile (25279).
  • Removes SymbolicVariable (25077).
  • Added invert bitwise operation to JIT (22324).
  • Remove friend dependency on ClassType in InterfaceType (25617).
  • Remove forward compat code for serialization format (25440).
  • Make NoneType <: Optional[T] (25361).
  • Remove accidentally re-added file (25677).
  • move legacy deserialization code into jit/import_legacy.cpp (25649).
  • preserve ignored function return value type (25262).
  • Finish testing code examples in the docs (25668).
  • add getitem to class types (25664).
  • Make tensor key in Dict work in serialization (25442).
  • Expose an API to iterate all the registered operators (23207).
  • Fix missing newline in compiled from source range highlight (25802).
  • SubgraphMatcher: add logging to a check missed previously. (25735).
  • Fix c10 tracing (25869).
  • add torch.jit.is_scripting() api (25263); see the sketch after this list.
  • Make arguments of Module::dump easier to remember. (25740).
  • Only create a new clone of observer when we actually insert it. (25931).
  • add set_grad_enabled to TorchScript and fix data attribute (25350).
  • add torch.jit.is_scripting api (25955).
  • add support for ModuleDict (25715).
  • fix use-after-free bug (25965).
  • Fix torch.arange traced as constant (25363).
  • Preserve module names in recursive script (24505).
  • Add in membership checks for lists (25796).
  • TorchScript Serialization for dynamic LSTM module (25877).
  • print source code when a function is executed (25868).
  • make sure all out stringstreams start out empty in jit_log.hpp (25863).
  • tracing with an opt-in by file name (25895).
  • Port fuse_linear from pytorch/tvm (25623).
  • Add documentation to logging (26175).
  • Register ATen ops with c10 (26131).
  • Add isBackwardCompatibleWith for Argument and FunctionSchema (23409).
  • Add a wrapper for inspect in JIT to produce better error message (25415).
  • Enable CPU fused kernel on Windows (25578).
  • Fixed size arrays (23695).
  • fix schema matching of tuples to vartype lists (25944).
  • min(li) max(li) (26351).
  • Remove torch.save-related logic from pickler (25502).
  • Add support for lists for prim::min and prim::max (26155).
  • Add ivalue::type(), part 1 (25439).
  • Use static type information to restore type tags (25447).
  • Add filter function to subgraph rewriter runGraph (26223).
  • Implement more size-oriented opcodes in the depickler. (25786).
  • Refactor emitIsInstance (26061).
  • add setitem to class types (25750).
  • Make jit dicts ordered (26465).
  • Fixes test_wrapped_number (26523).
  • Implement more size-oriented opcodes in the depickler. (26454).
  • Make is_optional check more robust (26312).
  • Resolve NamedTuple types in Python (26443).
  • Fix jit/pass/peephole.cpp fuse addmm (26357).
  • Move unpickler related codes from pickler.h/cpp to unpickler.h/cpp (26432).
  • Whenever possible, use function pointers rather than std::function to represent Operation's. (26560).
  • Serialization for per channel qtensor (26339).
  • add CondValue to unify refinements and code emission (26145).
  • Add ObserveHelper and remove some common function parameters (26641).
  • Remove 'recurse' parameter from Inline. (26487).
  • Use std::mutex instead of std::call_once in Function when we initialize GraphExecutor. (26571).
  • Add 'optimized_graph' to Function. (26488).
  • Use optimized graph in Inline (essentially, making Inline recursive now). (26489).
  • resolve ignored module method type annotations (26683).
  • Add traces to specialize_autograd and lower_grad_of (2nd try) (22752).
  • Register values listed in constants as attributes of the Module. (26581).
  • Fix builtin lookup for Python functions (26688).
  • Improve error message in IR parser when accessing undefined variable. (26771).
  • autodiff changes to enable profiling (25397).
  • Typevar matching fix + implicit conversions from Scalar to int/float (26453).
  • support iterables, rangevalue in list comprehensions (26768).
  • Bytecode export flow (25187).
  • Use optimized_graph in graph_executor. (26705).
  • Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place. (26703).
  • Fix broken failure messages for OverloadedMethodValue (26846).
  • Improvements to GuardElimination and InsertBailouts (25430).
  • Fix circular deps in loading (26758).
  • add AutoNonVariableTypeMode guard on JIT->ATen boundary.
  • Add logging in constant propagation pass (26653).
  • fix range for non-int inputs and pow implementation (26926).
  • Move some class/functions in test_jit.py to jit_utils.py (26839).
  • Remove unimplemented passes (26978).
  • Fix race condition in torch::jit::Function (27009).
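
A minimal sketch of the torch.jit.is_scripting() API noted above (25263, 25955): it evaluates to False when a function runs eagerly and to True when the compiled TorchScript version runs, so the two paths can diverge.

```python
import torch

def scale(x):
    # False in eager mode, True inside TorchScript.
    if torch.jit.is_scripting():
        return x * 2.0
    return x * 3.0

scripted = torch.jit.script(scale)
x = torch.ones(2)
print(scale(x))     # eager path:    tensor([3., 3.])
print(scripted(x))  # scripted path: tensor([2., 2.])
```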

Mobile:

  • Add symmetric methods getDataAsIntArray, getDataAsByteArray to Tensor (25183).
  • Initial commit for android torchvision utils (25185).
  • Add libtorch android build with shared lib for 4 android abis (25192).
  • pytorch android circleci integration (25286).
  • turn off BUILD_BINARY for android CI jobs (25485).
  • Gradle tasks for publishing to bintray, jcenter, mavencentral etc. (25351).
  • remove protobuf usage from mobile build (25493).
  • Fix iOS simulator build (25633).
  • Fix OSS mobile CI (25755).
  • Add PR jobs for iOS builds (25840).
  • Clean up the iOS build script (25822).
  • Cocoapods for iOS OSS release (25847).
  • Introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile.
  • Add NO_EXPORT macro to unset visibility attribute (25816).
  • Update build_android.sh to not build host protoc for libtorch (25896).
  • Simplify build_android_gradle.sh (25897).
  • Change gradle build to use static libtorch + gc-sections (25984).
  • Use torch::from_blob instead of shareExternalPointer, nits (25973).
  • Change the source link in podspec (26089).
  • Tensor renaming to dtype, shape; support long, double (26183).
  • Fix circle CI (26225).
  • Remove armv7s build from iOS (26222).
  • CircleCI android nightly (snapshot) build publishing (26069).
  • Fix error messages; tensor creation method names with type (26219).
  • Integrate forked QNNPACK into mobile PyTorch builds. (25844).
  • Add iOS test app skeleton (26261).
  • Fix no tab check (26399).
  • Clean up the PR job script for iOS build (26353).
  • Exclude libfbjni.so from pytorch_android to avoid duplicating it (26382).
  • Add script to build mobile library with host toolchain (26440).
  • Fix JNI wrapper for IValue interface change (26448).
  • Use gradle 4.10.3 for build and publish (26473).
  • Disable bitcode for iOS CI jobs (26478).
  • Javadocs for Tensor, IValue, Module (26149).
  • Turn off autograd mode in android JNI wrapper (26477).
  • Expose USE_STATIC_DISPATCH macro to public headers.
  • Improve how pytorch_android cmake imports static lib (26525).
  • Add eigen blas for mobile build (26508).
  • Support IValue string type (26517).
  • Update android/iOS build library packing (26565).
  • Add testing script for iOS x86 build (26632).
  • Sync docker images (26651).
  • Nightly prefix for android nightly jobs (26652).
  • Refactor android torchvision: not hardcoded mean/std (26690).
  • Switch our Android CI to Clang (26656).
  • Prepare for Cocoapods 1.3 Release (26751).
  • QEngine::QNNPACK enabled, module.eval() (26855).
  • Add mobile-friendly at::parallel_for backend.
  • Remove backward functions from jit-op-registry for mobile build (26851).
  • Check if QNNPACK is supported before set (26935).
  • Fix mobile.sh build (26975).
  • Fix fbjni packaging, exclude for publishing, include by default (26995).
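
The Android/iOS entries above load TorchScript model files produced on the Python side. A minimal sketch of producing such a file follows; the model and file name are placeholders, not part of the mobile packaging flow itself.

```python
import torch

# a tiny placeholder model; in practice this would be the network being shipped
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()

# trace and save; the resulting .pt file is what the Java/Objective-C Module wrappers load
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
traced.save("mobile_model.pt")
```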

Named Tensors:

  • Fix named tensor build by enabling tensor.is_pinned and removing support for clone() (23597).
  • Add torch._C._BUILD_NAMEDTENSOR() (23623).
  • Add names to repr for named tensors (23316).
  • Add name propagation for at::alias, add tensor.set_names (23624).
  • Improve test_namedtensor.py with named tensor equality check (23801).
  • Add names argument to ones, rand, randn, zeros, full (23743).
  • Implement name inference rule for empty_like, clone (23746).
  • Named inference for contiguous(), bernoulli variants, and dropout. (23808).
  • Add name propagation for at::alias, add tensor.set_names (24105).
  • Improve test_namedtensor.py with named tensor equality check (24106).
  • Add name propagation for at::alias, add tensor.set_names (24202).
  • Add names argument to ones, rand, randn, zeros, full; fix empty (24107).
  • Implement name inference rule for empty_like, clone (24108).
  • Named inference for contiguous(), bernoulli variants, and dropout. (24109).
  • Implement tensor.align_to(names), torch.align_tensors(*tensors) (23804).
  • Rename set_names -> view_names, set_names_ -> names_ (23962).
  • Update tensor.view_names / tensor.names_ API (23973).
  • Fix out= function semantics for named tensors. (24028).
  • Name inference for softmax, log_softmax and Dimname overloads. (24087).
  • Implement name inference for t(), transpose(...) (24203).
  • Add thread-local-state NamesMode and NoNamesGuard (24367).
  • Fix named tensor build (24940).
  • Implement name inference for t(), transpose(...) (24941).
  • Add thread-local-state NamesMode and NoNamesGuard (24942).
  • Fix FIXME_default_names by storing static list of 64 none names (24885).
  • Rename Tensor::names() to Tensor::opt_names() (24907).
  • Add helper function Tensor::names() (24914).
  • Fix binary op name inference between unnamed and named tensors. (24921).
  • Implement name inference for mm, addmm (24306).
  • Implement name inference for expand (24469).
  • Implement name inference for addmv, addmv_, mv (24471).
  • Implement name inference for torch.dot (24474).
  • Fix named tensor test (25313).
  • Implement name inference for torch.bmm (25123).
  • Implement name inference for torch.matmul (25177).
  • Include the correct header for make_unique in named tensor headers (25178).
  • Fix dependency by moving Dimname.{h,cpp} NamedTensor.{h,cpp} to core/ (25280).
  • Add guard for named tensors in the JIT (25344).
  • Add guards for using named tensor with serialization and multiprocessing (25345).
  • Prepare to add some Dimname/DimnameList overloads (25405).
  • Name inference rule for mean, std, var, std_mean, var_mean (25431).
  • Name inference rule for masked select (25566).
  • Name inference for masked_fill_ / masked_fill (25567).
  • Name inference rule for torch.cat (25568).
  • Fix binary op name inference to happen before shape checks (25563).
  • Fix named tensor printing (25564).
  • Name inference rules for relu/relu_/threshold/threshold_ (25569).
  • Implement initial version of autograd with named tensors (25604).
  • Fix named tensor build (25673).
  • Move most BUILD_NAMEDTENSOR macros out of header areas (25721).
  • Rename tensor.view_names -> tensor.renamed (25711).
  • Move BUILD_NAMEDTENSOR in NamedTensorUtils.h (25781).
  • Add flatten for named tensors. (25672).
  • Quick fixes for named tensor for windows (25728).
  • Name inference for unbind (25585).
  • Fix assertion if NamedTensorMeta's num_names != tensor.dim (25778).
  • Add names= argument to torch.tensor ctor (25424).
  • Remove some more BUILD_NAMEDTENSOR flags (25919).
  • Delete tools/autograd/env.py (25920).
  • Remove unnecessary BUILD_NAMEDTENSOR from interned_strings.h (25938).
  • Add TEST_NAMEDTENSOR flag to namedtensor ci (25948).
  • Move NamedTensorMetaInterface definitions to TensorImpl.h (26030).
  • Experimental warning for named tensors (26050).
  • Implement tensor.refine_names (25842).
  • Implement tensor.align_as(other), change tensor.align_to(names) (25843).
  • Fix bug with named tensors and (no) tracer support (26106).
  • Fix namedtensor ci (26257).
  • Turn on BUILD_NAMEDTENSOR permanently (26060).
  • Implement named tensor unflatten(dim, namedshape). (25658).
  • Rename torch.namedtensor -> torch._namedtensor_internals (26349).
  • Change '*' to '...' and ... for named tensor API functions. (26350).
  • Change "named_guard" in native_functions to "supports_named_tensor" (26352).
  • ensure c10/macros included before using (26439).
  • Disable tagged names (26479).
  • Delete tagged names (26365).
  • Refactor Dimname.h API to be nicer (26366).
  • Implement resize_, resize_as_ for named tensors (26493).
  • Support torch.pow with named tensors (26541).
  • Name inference for min(Tensor, dim?) / max(Tensor, dim?) (25582).
  • Renames tensor.renamed -> rename, tensor.names_ -> rename_ (26548).
  • Fix ellipsis behavior for Tensor.align_to to glob all missing dims (26648).
  • Typo fix (26417).
  • Don't generate named tensor functions to RegistrationFunctions.h (26685).
  • Add a lot of dimname overloads (26636).
  • Wrap dimensions during named inference (26558).
  • Named tensor support for: atan2, output_nr, detach{}, requires_grad (26543).
  • Named tensor support for logsumexp, mode, kthvalue, median, min, max (26563).
  • Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (26815).
  • Fix CUDA named tensor copy_ (26829).
  • Make named tensor implementations more robust (26968).
  • Better named tensor error messages. (26974).
  • Enable named tensors for arithmetic, clone, and tensor conversion ops (23237).
  • Rename tensor.is_named to has_named, expose has_named to python. (23315).
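
The experimental named tensor API touched by the entries above can be sketched as follows. This is a minimal illustration assuming the 1.3-era surface (names= factory argument, rename, named flatten, align_to), not an exhaustive reference.

```python
import torch

# factory functions accept a names= argument (see the entries above)
imgs = torch.randn(2, 3, 32, 32, names=('N', 'C', 'H', 'W'))
print(imgs.names)  # ('N', 'C', 'H', 'W')

# tensor.rename (formerly view_names / renamed, per the renaming entries above)
x = imgs.rename(C='channels')

# named flatten: collapse several named dims into one
flat = imgs.flatten(['C', 'H', 'W'], 'features')

# align_to permutes dims by name
moved = flat.align_to('features', 'N')
```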

ONNX:

  • Fix unused imports in torch/onnx/symbolic_opset8.py (23678).
  • Support ONNX export Multinomial (23581).
  • added opset10 ORT tests (22993).
  • frobenius_norm onnx export added (23536).
  • Std opset export (22310).
  • weight_names bug fix (23848).
  • canonicalize_ops pass bugfix: copy metadata for new output (23809).
  • Provide argument in ONNX export to exclude initializers from graph inputs. (23284).
  • Fix validation of dynamic axes names (23974).
  • updated pixel_shuffle in opset 11 to use depthToSpace (23739).
  • Relax precision constraint on ONNXRuntime._gru_test (24340).
  • Add ONNX Export Support to empty and empty_like (24166).
  • Update docs for softmax in onnx supported operators (24832).
  • enable "keeps" from BoxWithNMSLimit and caffe2_fastrcnn_outputs_inference (24451).
  • cumsum (24476).
  • Fix some typos in documentation (23507).
  • Update onnxruntime CI version (24414).
  • Momentum setting in SyncBatchNorm forward (inference) pass. (24995).
  • Export Unique (25050).
  • Fix dead link and syntax in ONNX landing page (25126).
  • Fixed nondeterministic RNG for ORT RNN tests (25205).
  • Add ONNX Export Support to rsqrt (24153).
  • Add ONNX export support for torch.log1p. (25808).
  • remove "build_deps" arg from setup.py command in (26113).
  • Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (26137).
  • Export round (26126).
  • fix test_arange and bump ort ci version (26320).
  • Automatic update of fbcode/onnx to 1316afc9f972f81340faa05763e2898f38bcc3b0 (26309).
  • add pass for onnx scalar type conversion (24378).
  • Export clamp for opset 11 (25797).
  • Export gelu (24475).
  • Fix Exporting RNN/LSTM's Initial State (h0/c0) to ONNX (22813).
  • Update ONNX Export for Gather and Scatter for Opset 11 (24790).
  • Automatic update of fbcode/onnx to 23bb6ea1a71f08e200114a153f48bd7adb66d486 (26441).
  • Setting automatic default selection for ONNX IR v4 semantics in ONNX export API (26146).
  • Update ONNX Export for Interpolate in Opset 11 (24805).
  • Make ONNX_ATEN_FALLBACK also work for _export (26738).
  • Automatic update of fbcode/onnx to ab6b94203c595f74b1f126eb118eef22e4c05a57 (26736).
  • Update ONNX Export for Interpolate in Opset 11 (26778).
  • Support Negative Axis in Size in ONNX (26436).
  • Export baddbmm (25738).
  • Export index_fill and index_copy, fix caffe2 scatter (23052).
  • Add Support to Dicts and Strings in ONNX for Inputs and Outputs (25889).
  • export baddbmm (26901).
  • Updating producer_version in exported ONNX models to PyTorch 1.3. (26976).
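
A minimal export sketch touching the opset 11 and initializer-handling changes listed above. The model and file name are placeholders, and the keep_initializers_as_inputs argument is assumed from the entry on excluding initializers from graph inputs.

```python
import torch

model = torch.nn.Linear(3, 2)
dummy = torch.randn(1, 3)

# opset_version=11 exercises the opset-11 export paths listed above;
# keep_initializers_as_inputs=False keeps initializers out of the graph inputs
torch.onnx.export(model, dummy, "linear.onnx",
                  opset_version=11,
                  keep_initializers_as_inputs=False)
```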

Performance and Benchmarking:

  • Added torch.autograd.profiler.record_function() as context manager. (23428).
  • Fix regression in torch.qr (23591).
  • Fix pin_memory_thread not exiting quickly (23646).
  • Increase predefined_minimum_secs to reduce variation (23734).
  • Enhance Tensor indexSelect performance (23055).
  • Separate input shapes to reduce default execution time (24136).
  • Increase default warmup iter and iter (24272).
  • Fix perf bug with indexed assignment (index_put_) (24083).
  • Add wipe cache (24390).
  • Vectorize LowerCholeskyTransform (24131).
  • Change the location of wipe cache (24454).
  • Optimize performance for unboxed-only kernels (25055).
  • Fix iOS crash (FBCameraFramework backtrace) in caffe2::getClockTimeMilliseconds() in perf_observer.cc (24813).
  • Add speed benchmark binary for torch jit (25230).
  • Change shape for conv and unary ops (25477).
  • Add speed benchmark binary for torch jit (25486).
  • Fix operator level benchmark to have NHWC layout (26577).
  • Speed up an integer to the power of a positive integer on CPU (26020).
  • Use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
  • Use parallel_for in DepthwiseConvKernel (26879).
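
The record_function() context manager added above nests inside the existing autograd profiler; a minimal sketch:

```python
import torch

with torch.autograd.profiler.profile() as prof:
    with torch.autograd.profiler.record_function("matmul_block"):
        torch.mm(torch.randn(64, 64), torch.randn(64, 64))

# the labelled block shows up as its own row in the profiler output
print(prof.key_averages().table(sort_by="cpu_time_total"))
```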

Quantization:

  • Quantized Average Pool kernel (23143).
  • skip nn.Identity in add_observer (23500).
  • Change condition in swap module (23561).
  • make_module: First version (23288).
  • ConvBn2d/ConvBnReLU2d (23357).
  • fix conv2d (23690).
  • QAT modules take qconfig as argument and keep qconfig as member (23609).
  • Remove qconfig_dict from API (23465).
  • Fix LSTM int8 quantization model size issue (23577).
  • Expose the quantized inputs and output of dynamic quantized int8 FC operator for debugging (23566).
  • Support for non-zero zero_points for weight and activation (23541).
  • qconv operator level benchmark (22895).
  • Enable OSS quantization tests (23858).
  • Change fbgemm_linear_{int8,fp16}_weight to fbgemm_linear_{int8,fp16}_weight_fp32_activation (22955).
  • clang-format aten/src/ATen/native/quantized (23898).
  • save()/load() tests and fixes (23911).
  • Enabling inline in quantized relu (23704).
  • Fix qconv benchmark (24019).
  • Adding dequantize_val and requantize_val (23909).
  • Simplified nnq.Linear class (24046).
  • State dict serialization of nnq.Linear (24047).
  • is_quantized support in JIT (24099).
  • Re-work Conv2d (24115).
  • state_dict serialization for Conv2d + some bugfixes (24116).
  • JIT serialization for Conv2d (24117).
  • fix py2 imports in _intrinsic/modules (24206).
  • Fix incorrect type annotation on Linear setstate (24209).
  • Add out variant (23956).
  • Removing the make_module script. (23635).
  • Observer returns original tensor for post training quantization (24196).
  • test_nn_quantized -> test_quantized_nn_mods (24201).
  • Fix and test conv2d constructor and from_float (24277).
  • Add out variant (23971).
  • Add dynamic quantized Linear op in PyTorch (23464).
  • Dynamic Quantized Linear Module (23128).
  • Skip test_quantized_nn_mods tests if there's no FBGEMM (24302).
  • no_deadline on ModuleAPITests and skip on dynamic quantization test (24307).
  • Add the type matching rule for qconfig_dict (23212).
  • equal() for QuantizedCPU (24211).
  • Fix the dimension mismatch issues when running the BERT model (23330).
  • Make the default qconfig_dict (24232).
  • Remove the activation observer for default_qconfig (24299).
  • fix lint (24375).
  • test {init,from_float} on nnq{,d}.Linear (24364).
  • Fix more warnings (24291).
  • Run quantization tests first (24366).
  • Temporarily disable warnings in dynamic quantization ops (24376).
  • Fix Lint (24381).
  • Add intrinsic module mappings (23753).
  • Change return type of observer to two tensors (24339).
  • Add _pair for quantized conv module (24409).
  • Replacing axis with dim in quantized cat (24151).
  • Remove redundant assignment (24408).
  • Fix QConfig_dynamic typename (24431).
  • Baseline observer module, ensuring that (min,max) range includes zero. (24297).
  • Convert bias to float in quantized conv module (24424).
  • Fixes the adding of the observer to the FloatFunctional (24418).
  • Adds a placeholder for the 'mul' operator. (24421).
  • Increasing precision for avg pool (23906).
  • Enables inplace in the quantized relu (24374).
  • extra_repr for quantized modules (24443).
  • Change kernel_size to self.kernel_size to resolve error in quantized conv module (24499).
  • Add resnext 32x4d shapes to benchmark (24503).
  • Add the default_weight_observer for the dynamic quantization path (24231).
  • Clang formatting the code [1/2] (24867).
  • Support QScheme in script (24358).
  • Use absolute import of the parent folder without alias. (24792).
  • Added relu6 kernel (24799).
  • PrepareQuant step (24425).
  • reduce for QScheme (24969).
  • Remove Symmetric Quantizer in backend (24964).
  • gradient clipping by norm.
  • Make observer scriptable (24996).
  • Add qconv_test to benchmarking tests (24913).
  • Adding quantized mul kernel (24444).
  • Enable UBSAN test for FBGEMM in dynamic quant test (25099).
  • Per Channel quantization APIs (24935).
  • per channel quantization support (24936).
  • Add missing functions and methods for channelwise quantization (24934).
  • Support lowering of fp16 weights.
  • use avx2 for Add without broadcast and when inputs are uint8_t (25098).
  • per channel quantization support (25134).
  • insert_quant_dequant jit pass (24426).
  • quant_fusion jit pass (24427).
  • Work around for bias quantization for conv and linear operators (24789).
  • Handle empty qconfig for functional Modules (24803).
  • Update mapping dictionary to support functional modules and pooling operations (24804).
  • Support observer without any data calibration (24923).
  • Serialization for nn.quantized.functional modules (24924).
  • Move test QAT tests to double precision to ensure numerics match (25189).
  • Adding return for the observer in the functional_modules.py (25168).
  • Adding Scalar add/mul. (24447).
  • Fix scriptability for Observer (25197).
  • Integration tests for initial quantization graph mode (24428).
  • skip tests if fbgemm is not supported for test_quantizer.py (25209).
  • add import for test_quantizer.py (25222).
  • Remove deprecated graph mode quantization tests (24998).
  • Move test QAT tests to double precision to ensure numerics match (25211).
  • Update mapping dictionary to support functional modules and pooling operations (25216).
  • Fix scriptability for Observer (25219).
  • Disable flaky test_adaptive_avg_pool2d test. (25249).
  • Handle empty qconfig for functional Modules (25215).
  • Reducing the test size for adaptive avg pool (25195).
  • get rid of dynamic_cast in Quantizer (25001).
  • disable deadline checking on test_adaptive_avg_pool2d (25255).
  • Fixing the enforcement of the zero_point (25193).
  • Add new qnnpack_add and qnnpack_maxpool op to C10 registry (24103).
  • Serialization for nn.quantized.functional modules (25220).
  • int8 static quantization in the numerical debugger.
  • Work around for bias quantization for conv and linear operators (25212).
  • Refactor MinMax observer (23902).
  • Quantized comparators (24387).
  • Ensure quantized::add stride matches inputs (25265).
  • Make quantized relu ops inherit the memory format from input (25271).
  • insert_quant_dequant work with qconfig_dict (25127).
  • Integration tests for qconfig_dict (25217).
  • Removing future imports from the test fixtures. (25296).
  • Memory layout for pooling ops (25374).
  • making quant utilities inplace (25054).
  • Skip test_compare_tensor_scalar due to overflow error (25432).
  • Per Channel Quantization Support for Quantized Linear Operator (25276).
  • Skip inserting observers for Tensors inside fused op (25281).
  • Remove unnecessary checks in InsertQuantDeQuantImpl (25370).
  • Change exception to warning (25408).
  • Quantized vec256 + vectorized quantized::add (25202).
  • Minor fixes in per channel support for qconv kernel (25182).
  • Vectorized quantized relu/relu6 (25496).
  • Remove index calculation in quantized max_pool2d (25526).
  • Add the dynamic quantized LSTM module (25157).
  • Dynamic dispatch for optimized quantized op kernels (25545).
  • Rename fbgemm quantized operators to generic quantized ops (25338).
  • move no_deadline to hypothesis_utils.py (25598).
  • Rename FBGEMM quantized operators to generic quantized ops (25678).
  • Inserting observers for all methods called in forward (25503).
  • Vectorized specialization of max_pool2d for channels-last layout (25676).
  • Copy quantize routine to vec256 (25685).
  • Store bias in PackedLinearWeight struct in fbgemm (25428).
  • derandomize hypothesis tests (25513).
  • Relax scale to prevent saturation in conv/linear; add test to verify precision of numerics of quantized model with updated observer (25667).
  • Use more efficient specialized Quantize routine (25731).
  • Factor unnecessary work out of add inner loop (25751).
  • Fork QNNPACK into aten/src/ATen/native/quantized/cpu/qnnpack (25500).
  • Test scripting and tracing for dynamic linear modules (25870).
  • Store bias in PackedConvWeight in fbgemm (25626).
  • Add Dropout to blacklist (25881).
  • Add torch.nn.LSTM into the default dynamic quantize mappings (25954).
  • Change order of activation and weight in QConfig (25950).
  • indentation for hypothesis profile and proper inheritance for QuantizationTestCase (25934).
  • Improve error message when input is not in the right format (25928).
  • add the tensor_observer to record the runtime tensor for quantization … (25830).
  • Add new API for Fully Connected and Convolution Operators in QNNPACK (25862).
  • remove verbose in pytorch_ci hypothesis profile (26075).
  • Upgrade the naming for fbgemm quantized op (26064).
  • Use BytesIO instead of tempfile (25976).
  • Add Runtime flag for quantized backend. (25680).
  • TorchScript Serialization for dynamic LSTM (26084).
  • Skip inserting duplicate observers (25504).
  • Fix build warning in vec256_qint.h (26121).
  • Support quantizing any methods called (25505).
  • Add fusion for quantized linear (25624).
  • Fold quantize op into module (25625).
  • use whitelist for selecting observed values (25974).
  • Add histogram observer (23959).
  • Back out "[quant][observer] Add histogram observer" (26236).
  • fix hypothesis timeout (26280).
  • Whitelist and fusion support for quantized::linear - addmm (26208).
  • Whitelist and fusion support for quantized::linear - matmul (without bias) (26209).
  • Disable broken unit tests (26301).
  • Whitelist and fusion support for quantized::linear - matmul (with bias) (26204).
  • Dynamic quantization for bias. (26057).
  • Add missing argument for failing function call (26311).
  • Enable support for dilated convolutions (26205).
  • Adding quantized::linear function for pytorch mobile in c10 (26135).
  • Add l2 norm minimization (24022).
  • Disable QNNPACK tests if pytorch is not built with it. (26427).
  • Adding quantized::conv2d function for pytorch mobile in c10 (26152).
  • Add extra filtering for scale/zero_point/dtype in FoldQuantizeCallIntoBuffer (26224).
  • Remove quantizeBias (26388).
  • Add NoQEngine to QEngine and refactor the name of set/get qengine (26330).
  • Fix quantized::linear QuantFusion patterns (26414).
  • Add per channel observer (25887).
  • Add support to call unpack for pytorch mobile quantized FC and Conv (26211).
  • Remove quantization for bias in pattern (26415).
  • Implement more support for per-channel quantization (26240).
  • Fold weight permutation inside quantized conv operator (26241).
  • Fold activation permutation inside quantized conv operator (26242).
  • Add NoQEngine to QEngine and refactor the name of set/get qengine (26471).
  • Add the FP16 weight support for LSTM in dynamic_quantize (25975).
  • Fix quantized::conv2d patterns in QuantFusion (26515).
  • Changes to support int8 weight and fp32 bias in QNNPACK (26307).
  • Add the quantized average_pool2d support and adaptive_avg_pool2d support (25899).
  • Fix the API for record observer (26413).
  • Unify Quantization APIs for add, pool and relu (26335).
  • Compiler warnings cleanup for quantization.cpp. (26585).
  • quantize_linear -> quantize_per_tensor (26574).
  • Get scalar type from observer module (26425).
  • Add inplace argument to InsertObservers and InsertQuantDeQuant (26389).
  • Expose supportedQEngines to python (26474).
  • quantize_linear_per_channel -> quantize_per_channel (26575).
  • Skip some fragile tests (26599).
  • quantized average_pool2d and adaptive_avg_pool2d implementation (Revert d17437015) (26580).
  • _dequantize_linear -> _dequantize_per_tensor (26576).
  • Unify Quantization APIs for add, pool and relu (26586).
  • Simplify observers declaration with functools.partial (26492).
  • Import torch.quantization when one imports torch (26649).
  • NHWC specialization for quantized::cat (26524).
  • Fix the flaky test_qlinear test caused by hypothesis deadline (26663).
  • quantized torch.topk (26486).
  • remove unneeded code (26640).
  • Update qengine flag in python to string (26620).
  • _per_tensor_affine_qtensor -> _make_per_tensor_quantized_tensor (26678).
  • Skip observing bias across function call hierarchy (26642).
  • _per_channel_affine_qtensor -> _make_per_channel_quantized_tensor (26679).
  • Quantized Interpolate Kernel (upsample_nearest2d) (26617).
  • Fix _empty_per_channel_affine_quantized to be less hacky (26243).
  • Per-channel quantized tensor to have only a single axis (26675).
  • Allow per-channel QTensor accept any floating type for scales (26676).
  • Use noop observer to pass dtype for dynamic quantization (26709).
  • Remove duplicate calculation of output shape (26684).
  • Trivial quantized torch.mean implementation (26253).
  • Remove _dequantize_per_channel in the pattern (26680).
  • Un-hardcode epsilon constant in FoldConvBatchNorm2d. (26584).
  • Remove _dequantize_per_tensor (26681).
  • Add threadpool in qlinear and qconv for mobile (26728).
  • move more functions to InsertObserversHelper (26696).
  • quantized_tensor tests (25429).
  • Handle DeQuantStub() for QAT (26518).
  • Add include to resolve PRIu32 macro (26745).
  • Fake quantization enhancements for QAT/PTQ support (26420).
  • move more functions to InsertObserversHelper (26773).
  • quantized_tensor tests (26784).
  • Quantized Interpolate Kernel (upsample_bilinear2d) (26631).
  • Throw if someone tries to torch.save() quantized modules (26828).
  • Re-write of tensor-scalar quantized add (26766).
  • Try to disable annoying hypothesis warnings again (26853).
  • Remove unnecessary functions and cleanup code in quantization.cpp. (26852).
  • Add more inplace arguments to quantization top level API (26782).
  • batch size 0 support in ChannelShuffle DNNLOWP op (26858).
  • batch size 0 support in Conv DNNLOWP ops (26871).
  • batch size 0 tests for element-wise DNNLOWP ops (26870).
  • batch size 0 support in FC DNNLOWP operators (26872).
  • batch size 0 tests for Quantize/Dequantize DNNLOWP ops (26873).
  • batch size 0 support in norm operators (26894).
  • batch size 0 tests in BatchMatMul ops (26874).
  • Set quantized engine backend for mobile in speed_benchmark_torch (26911).
  • Support ceil_mode in quantized maxpool (26916).
  • Make quantized max_pool2d error message more specific and less silly (26918).
  • batch size 0 tests for etc DNNLOWP operators (26877).
  • Fake quantization enhancements for QAT/PTQ support- fix tests (26876).
  • Serialization and range reduction support for Fake Quant/Observer (26519).
  • Fix the QuantizedAVX2 build issue (26854).
  • Default histogram observer (26622).
  • Fix all factory invocations in quantized to correctly propagate options. (26966).
  • control of observer/fake-quant operations (26520).
  • Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (26840).
  • Fix misuses of TORCH_CHECK/TORCH_INTERNAL_ASSERT with string (26897).
  • Better error message for calculate_qparams (26985).
  • Add P99 method with configurable thresholds.
  • Xray image inference on multi-cpu and dumping dnnlowp tensors (22537).
  • Add int8 resize nearest 3d op in DNNLOWP (26063).
  • Re-write of tensor-scalar mul (26937).
  • Support qadd_relu on pytorch mobile (26982).
  • Add optimized quantize function for ARM (26867).
  • Add QuantFusion to graph_executor (26591).
  • Move patterns in QuantFusion to a separate file (26848).
  • PyTorch Graph Mode Quantization API (26390).
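
Two of the user-facing pieces referenced above, the quantize_per_tensor rename and dynamic quantization of Linear modules, in a minimal sketch assuming the 1.3-era torch.quantization API:

```python
import torch

# per-tensor affine quantization (quantize_linear -> quantize_per_tensor rename above)
x = torch.randn(4, 4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(qx.dequantize())

# post-training dynamic quantization of nn.Linear (dynamic quantized Linear module above)
float_model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8)
```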

Visualization:

  • Added mesh plugin (24039).
  • Update tensorboard.rst (22026).
  • Remove hard Caffe2 dependency for TensorBoard (24295).
  • Added test_tensorboard.py to TARGETS (24040).
  • Hyperparameter plugin (23134).
  • Removed external tensorboardX dependency (25259).
  • Fix empty graph problem (25599).
  • Delay external imports until we're ready to test tensorboard (25993).
  • Create TensorBoard test classes in all cases (26005).
  • Fix flaky SummaryWriter test (26395).
  • Fix the calling parameters of moviepy's write_gif function (21218).
  • Add Virtual Memory and CPU percentage computation to AIBench (23590).
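
The TensorBoard entries above (hyperparameter plugin, dropping the external tensorboardX dependency) are exercised through torch.utils.tensorboard; a minimal sketch with placeholder values:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/example")
writer.add_scalar("loss", 0.5, global_step=0)
# hyperparameter plugin (see entry 23134)
writer.add_hparams({"lr": 0.01, "batch_size": 32}, {"hparam/accuracy": 0.9})
writer.close()
```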