1.3 release
AMD / ROCM Changes:
- Improve hip-clang support in build_amd.py (23835).
- For int64_t atomicAdd, use the available compiler builtin on ROCm. (24854).
- Use correct WARP_SIZE for ROCm for EmbeddingBag (24868).
- Switch to rocThrust for thrust/cub APIs (25620).
- rocBLAS deprecated the last two parameters. (25726).
- Enable two tests that were skipped b/c of rocThrust bugs fixed in ROCm 2.7 (25724).
- Enable jit fusion on ROCm (22872).
- Remove NULL arguments that have been marked deprecated by rocBLAS (25866).
- Make sparse coalesce warp size aware (25918).
- Make spatial depthwise convolution warp size aware (25922).
- Make lookup table warp size aware (25926).
- Make persistent softmax WARP_SIZE aware. (25937).
- Enable unit tests (25963).
- Enable Unique operator tests on ROCm (26046).
- Enable more mGPU tests (26055).
- Make regular softmax warp size aware (25956).
- Disable test_cuda.test_stream_event_nogil on ROCm (26087).
- Use MIOpen for transpose convolutions (26172).
- Switch to the new profiler infrastructure (26174).
- Enable basic GPU profiling capability on ROCm. (26300).
- Fix compiler unwrapping step in jenkins build scripts for Caffe2/PyTorch on ROCm (25409).
- Split PyTorch ROCm tests as 2 CI jobs to run in parallel (26380).
- Puts ROCm tests on default stream (26394).
Bug Fixes:
- at::view: create an empty tensor and set storage instead of clone (23452).
- Fix set_grad for extension backends (23516).
- torch.is_pinned: pin_memory should not copy on already pinned tensors (23484).
- Fix gemm call for CUDABlas for THCUNN conv, #23545 (23552).
- Fix CTC loss for zero-length targets on GPU (23298).
- Adam implementation minor fix (23737).
- Add flag to temporarily disable MKL-DNN conv (23837).
- Fix test TestCuda.test_streams_multi_gpu_query (23912).
- Fix dataloader._shutdown_workers if not all workers are started (23761).
- Fix crash on torch.Tensor.repeat() for 0 repeats (23766).
- Fix master (24003).
- Remove numpy assert that fails on Windows (older numpy versions). (24012).
- Add missing include header in tensor_numpy.cpp (24042).
- Fix tensor construction from array (24283).
- Skip broken test (24453).
- Fix Typing Error for Padding with asymmetric signatures (24895).
- Avoid race condition in intrusive_ptr.reset_() (24464).
- Temporarily fix hub SSL cert issue (25042).
- Fixes test_equal (25275).
- CUDA_KERNEL_LOOP: prevent int overflow in loop increment. (24818).
- Issue #24962: Fix cuda method to support "None" arg for device and a … (25018).
- Multiple fixes to test_c10d.py. (25334).
- Attempt to fix windows build (25450).
- Fix bug in assertNotEqual for int tensors (25412).
- Fix pow precision (25476).
- Fix 'in' returning true incorrectly (24156).
- Fix Windows build (26246).
- Fix CI (26250).
- Fix no auto batching bugs: cannot bulk load; does not work with namedtuple (26065).
- Fix cdist gradient computation if first arg is 1xn (26254).
- Fixes big endian arch bugs. (26383).
- Fix CI (26593).
- Fix annotation regex for flake8 (26694).
- Fix to operate on cuda kernel with clang and libc++ (25553).
- Do not call cpuinfo_initialize() on other than x86 arch. (26265).
- Fix Vec256::abs() for floating point when applied on -0.0 (26422).
Build / CI:
- Refactor the pytorch_doc_push_script to take a branch (23556).
- Let user be able to change MKLDNN "-m" flags back and forth in subsequent builds (23608).
- Fix CPU-only binary testing by properly installing cpu-only first. (23611).
- Omit local version identifier for default configuration. (23654).
- add setup metadata to help PyPI flesh out content on pypi package page (22085).
- Reduce input sets for tests to speed them up. (23692).
- add appropriate install_requires (23722).
- cpu binary builds are built with cu100 docker image now instead of cu80 (23772).
- allow INSTALL_TEST to pass through from env to cmake (23793).
- Remove unnecessary fetch and reset on builder checkout. (23792).
- Add CUDA 10.1 to CI. (23791).
- Remove nightly suffix from nightlies; upload to pytorch-nightly. (23752).
- Delete Travis CI config (23788).
- Rename cpu-only to cpuonly, as dash features are not supported. (23879).
- Roll master to 1.3.0 (23895).
- No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (23806).
- Add python_requires to help pip (23863).
- Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. (23568).
- Fix build failure on OSX (23998).
- Don't add local version to Conda packages. (24014).
- print clang tidy output to stderr (24052).
- When matching a line in CMakeCache.txt, ensure A=B and "A"=B are matched (23745).
- Move dict_test.cpp to test folder and fix dict_test.cpp for Aten includes (24071).
- Build option USE_NUMA should only show up on Linux. (23673).
- Do not force USE_SYSTEM_EIGEN_INSTALL to be OFF in Python build scripts (23990).
- Suppress implicit-fallthrough warning on g++ >= 7 in caffe2/utils/math_cpu.cc (24053).
- Send flake8 to stderr (24100).
- Move iOS.cmake to the cmake folder (24029).
- Ignore bugprone-lambda-function-name in clang-tidy. (24190).
- Ignoring the test logs in case the tests are run from the parent directory (24212).
- Remove escape_path in our build system. (24044).
- Enable QNNPACK for iOS (24030).
- Fix Z7_MSVC_OVERRIDE for C source files (24389).
- Fix Caffe2 Windows build by switching to ninja. (24330).
- Configure pytorch-probot (24423).
- Fix CUDNN location related build issue on Antergos Linux (based on Arch) (24300).
- Set CUDA arch correctly when building with torch.utils.cpp_extension (23408).
- Move the search of cuDNN files to FindCUDNN.cmake. (24293).
- Ensure proper file executable permissions in CI. (24214).
- Respect pre-defined DOCKER_IMAGE value in binary_populate_env.sh (24787).
- Remove support for old architectures in cpp_extension and CMake (24442).
- Build libtorch binary with new ABI (23908).
- Fix cmake backslash syntax error on Windows. (24420).
- Move the detection of cuDNN to FindCUDNN.cmake (24784).
- Attempt to fix windows build. (24916).
- Move CPU-only jobs to xenial (24506).
- Skip setting CUDA_NVCC_EXECUTABLE if CACHE_WRAPPER_DIR is not set. (25006).
- disable custom class logic for mobile build to avoid rtti (24994).
- Turn off fbgemm for libtorch android build (25113).
- Fix clang-tidy failing all the time on random lines (25078).
- Fix clang-tidy failing on master (25121).
- Fix lint checker breakage caused by #25111 (25122).
- Update QNNPACK submodule to 901e9d4 (25044).
- Add a skip_override option to should_run_job.py (25118).
- Switch hub to use requests because of SSL (25083).
- Ensure tests get passed on Windows (25145).
- prevent generating caffe2::mkl for multiple times (25167).
- Add myself as a CODEOWNER for better discoverability (25231).
- Move the detection of cuDNN to FindCUDNN.cmake (24938).
- Specify width for st.floats in hypothesis_utils.tensor (25188).
- Add USE_CUDNN check to AT_CUDNN_ENABLED definition (25037).
- Disable cuda_distributions_test and converter_nomigraph_test on Windows. (25305).
- Re-enable libtorch tests on Windows (25377).
- Upgrade to circleci version 2.1 configs (25336).
- Fix binaries build for BUILD_CAFFE2_MOBILE=OFF (25229).
- Skip useless macros from Windows.h (25444).
- Add windows docs for the binaries (23150).
- Turn off warnings on Windows CI. (24331).
- Parameterize CircleCI config (25446).
- Remove BUILD_ATEN_ONLY build option (24441).
- Fix windows build error when TBB enabled and Windows SDK installed (25398).
- Remove PYTHON_VERSION (25494).
- Remove MULTI_GPU (25509).
- Add set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "-Wl,--no-as-needed") to CMakeLists.txt (25445).
- Clean up binaries/cmake for mobile (25651).
- Move USE_STATIC_DISPATCH from CI script to master cmake (25696).
- Do not pass down USE_GLOO_IBVERBS to CMake (25720).
- Correctly gate CUDA_ARCH with defined() (25729).
- Fix cudnn static linkage (25848).
- Fix invalid function cast warnings that show up with GCC 8/9 (25483).
- Upgrade NVIDIA driver on CI to 430.40 (24242).
- Remove tools/setup_helpers/dist_check.py (25879).
- Remove pthreadpool dependency in aten/CMake (25894).
- Remove protobuf from Dependencies.cmake for libtorch mobile build (25958).
- Fix typo in OpenBLAS cmake detection (25966).
- Simplify code generation - phase 1 (25961).
- Remove pthreadpool.a from install directory (25977).
- Remove trailing whitespace in CircleCI configuration files (25987).
- Change brew update logic to run much faster (25988).
- Refactor macOS build and test (25930).
- Run PyTorch macOS CPU-only build/test on all PRs (26096).
- Use CircleCI commands for brew update/install (26159).
- Turn should_run_job into command (26160).
- Turn setup_linux_system_environment into command (26162).
- Turn setup_ci_environment into command (26163).
- Nightly build for iOS (26074).
- Use expected_wrapper only if CMAKE_{C,CXX}_COMPILER and/or is not set by user (26306).
- Fix remaining invalid function cast warnings that show up with GCC 8/9 (26104).
- Rebase CircleCI to master if it is gcc5_4 (26321).
- Emergency Docker upgrade to version 347. (26466).
- Use github actions for flake8 (25824).
- Add a CI Job to Check BC Changes in Function Schemas (26329).
- prevent generating caffe2::mkldnn for multiple times (25257).
- Add namedtensor build & tests to default sets (26633).
- Fix github actions for forked PRs (26562).
- Remove tools/setup_helpers/cudnn.py (25876).
- Allow building docker without torchvision (26168).
- Validate Docker version in CI. (26496).
- Fix CI docker builds (26704).
- Cuda101 upgrade (26823).
- Fix building with PARALLEL_BACKEND=NATIVE_TBB (26742).
- Fix typo in job name: nigthly->nightly (26881).
- Get rid of -u (expansion of undefined variable) setting (26907).
- Switch internal CUDA build to C++14 (26757).
- No sccache (26059).
- Fix c10 registration binary size (26827).
- Improve binary size of function schema inference (26860).
- Fix shared_ptr binary size in op registration (26869).
- Fix binary size in schema inference (26878).
- Switch nightly jobs to trigger on 'nightly' branch rather than cron. (26830).
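Several of the items above touch extension building with torch.utils.cpp_extension (e.g. the CUDA-arch fix in 23408). For context, a minimal sketch of a setup.py using that module; the package and source file names (my_ext, my_ext.cpp, my_ext_kernel.cu) are hypothetical:

```python
# Minimal sketch of building a CUDA extension with torch.utils.cpp_extension.
# cpp_extension picks CUDA arch flags from the local GPU, or from the
# TORCH_CUDA_ARCH_LIST environment variable when it is set.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="my_ext",  # hypothetical package name
    ext_modules=[
        # hypothetical sources; requires a CUDA toolchain to build
        CUDAExtension("my_ext", ["my_ext.cpp", "my_ext_kernel.cu"]),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

Building with, for example, TORCH_CUDA_ARCH_LIST="7.0;7.5" python setup.py install pins the target architectures explicitly instead of relying on auto-detection.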
Caffe2:
- Add Cast Op (23548).
- Remove the confused CPU op (23575).
- Remove ONNX & Turn on NO_API for mobile build (23546).
- Include protobuf-defined outputs in the graph cutting algorithm (23557).
- Support Copy Op (23705).
- Format only change (23685).
- Add LambdaRank DCG Loss Option (23679).
- Fix the bug in regularizer matching (23485).
- Fix SliceGradientOp to handle properly empty batches (23784).
- Set caffe2_tvm_min_ops to 8 (23893).
- Support Gather different indices for different examples in one batch (23813).
- Add aligned option to RoIAlign (23706).
- Minor comment fix (22140).
- SumOp for int32 (23995).
- Fix typo "properlyh" (24067).
- OpenCV 4 compatibility fix for caffe2/video (24143).
- Implement virtual memory computation in caffe2_benchmark binary (24144).
- Add support for using caffe2::ThreadPool in pytorch mobile QNNPACK. (23658).
- Hypothesis tests: add ability to enforce shape inference (23935).
- Make hashing default for bucket-weighted pooling (24266).
- Fix rotated rect intersection error (24171).
- Format changes (24270).
- C2/glow: assign net_pos to a net before applying onnxifi_blacklist_ops (24262).
- Return list of AccessedFeatures from get_accessed_features (23983).
- Register FC/Conv DNNLowp separately for supporting both tensor type (24361).
- Refactor and expose metadata of tum_history layer for online prediction (24290).
- Put ParseBlackListOps() into caffe2::glow namespace (24384).
- Implement gradient operator for GatherByKeys. (24348).
- Add BPR loss to TTSN (24439).
- Remove gradient value as input from SparseNormalize op (24357).
- BlackBoxPredictor OSS part N + 1 : strip fb/predictor/Transforms.h dependency (#23350) (23350).
- Make EmbeddingLookup APIs take offsets rather than lengths to match PyTorch's EmbeddingBag (24944).
- Support focal loss in MTML.
- Implementation of cyclical learning rate (23914).
- register HeatmapMaxKeypoint with C10 (25191).
- Add the sparse feature information during logging in sparse lookup layer (24863).
- Add Int8Transpose operator (16382).
- Relax roi_width/roi_height check to non-negative (260).
- Disable Int8Transpose test.
- Format sparse_lengths_sum_benchmark (25529).
- Add options to flush cache in SLS benchmarks (25530).
- Change shape for some ops to reduce variance (25619).
- Fuse to individual operators to GatherFuse8BitRowwiseQuantFloatMulLengthElim (25519).
- Enable PiecewiseLinearTransform test on ROCm (25632).
- Add requests as a legit dependency (25596).
- Change shape for some ops to reduce variance (25686).
- Move GetDimFromOrderString to caffe2/core/types.h (25671).
- Make SparseNormalize backwards compatible (25660).
- Cyclical learning rate multiplier: use fabs(base_lr) (25628).
- Remove caffe2.pb.h dependency for embedding_lookup_idx.cc (25670).
- Enable loading int8 prepacked models in PredictorContainer.
- Get rid of protobuf dependencies (25650).
- Fix device_option propagation (25203).
- Increase input shape to reduce variance (25812).
- Fix cuDnn build error with CC3.0 platform (#25820) (25825).
- Remove cosh_ op test (25893).
- Enable variable size embedding (25782).
- Add assert to ensure the divisor is not 0 (25960).
- Increase failure threshold for timing based assert (25867).
- Better error messages in C2 ONNX backend (25809).
- Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (25959).
- Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (26080).
- Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (25970).
- Guard dyndep with a lock (26153).
- Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler (26260).
- Average Pooling 3D AVX2 Implementation (26111).
- Back out "Back out "[Caffe2] Fix device_option propagation"" (25908).
- Support unpickle py2 NetDef object in py3 (26147).
- Tvm operator dynolog (26295).
- Add support for real4bits quant (25426).
- Add DimType info in dumped debug nets (26589).
- BlobReference getattr can only throw AttributeError (26654).
- "fixing" gcc bug introduced with cuda 10.1 (26445).
- Whitelist ATen/core sources and headers for Caffe2 (26609).
- Adding OpProfile proto into ProfDAGProtos to support storing operation cost (26677).
- Use new fbgemm PackedDepthWiseConvMatrix without template parameter (26760).
- Rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool.
- Enable batch_size = 0 support in DNNLOWP Concat operator (26849).
- Use new depthwise conv fbgemm interface (26898).
- Fix the weird bug in control_flow_op_test.py (26931).
- Disable cudnn transpose for int types (26934).
- Expose PiecewiseLinearTransform to PyTorch (26903).
- Remove LOG(INFO) from math_cpu.cc (27001).
- Add fakefp16 transformation.
BC-Breaking:
- Improve handling of mixed-type tensor operations (22273).
- Migrate comparison ops from the TH to Aten. Added support for type promotion. (26981).
- Changed tensor comparison return type from uint8 to bool (21113).
- Add align_corners option to grid_sample and affine_grid, change default to False (24931).
- torch.pow: port operator from the TH to Aten (23492).
- torch.flatten returns a 1-dim tensor on a 0-dim tensor (25406).
- Change schedulers to chainable form (24352).
- Make options.name_ private, and change all callsites to use options.name() (26419).
- Remove deprecated torch.gels (26480).
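A small illustration of the first three items (type promotion for mixed-dtype operations and the bool return type of comparisons); a sketch of the new behavior, assuming default dtypes:

```python
import torch

a = torch.tensor([1, 2], dtype=torch.int32)
b = torch.tensor([1.0, 2.0], dtype=torch.float32)

# Mixed-type ops now promote to a common dtype: int32 + float32 -> float32.
print((a + b).dtype)  # torch.float32

# Comparison ops now return bool tensors instead of uint8 tensors.
print((a > 1).dtype)  # torch.bool
```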
C++ API Parity:
- Support custom autograd functions in C++ (23572).
- Allow empty Variables to be saved for backwards (23618).
- Tests for C++ custom autograd function API (23628).
- Hooks for C++ API (24393).
- C++ ModuleList (24317).
- Build libtorch binary with new ABI (23908).
- Templatize Tensor.data_ptr() (24847).
- bind autograd functions into C++ (24342).
- Deprecate tensor.data(), and codemod tensor.data() to tensor.data_ptr() (24886).
- Add Python/C++ torch.nn API parity test harness (23852).
- Add Python/C++ API parity tracker for torch.nn (25289).
- Use constructor in test_params for C++ API parity test (25749).
- Map module options between Python and C++ in API parity test (25784).
- Make various improvements to C++ API parity test harness (25828).
- C++ Fold nn module (24160).
- Fix LBFGS on GPU (25909).
- L1Loss module (25902).
- C++ MaxPool Module (24860).
- C++ Average Pool Module (25800).
- C++ unregister_module function for Module (26088).
- C++ API parity: at::Tensor::data (26008).
- Re-organize C++ API torch::nn folder structure (26262).
- C++ API parity: at::Tensor::grad (26150).
- C++ API parity: at::Tensor::is_leaf (26186).
- C++ API parity: at::Tensor::output_nr (26216).
- Support multidimensional inputs to torch::tensor (26210).
- Distance module (26424).
- Fix options usage in C++ module / optimizer constructors (26483).
- C++ API parity: at::Tensor::version (26217).
- Minor improvement to C++ nn::Distance tests (26539).
- C++ API parity: at::Tensor::version (26561).
- C++ API parity: at::Tensor::detach (26251).
- C++ API parity: at::Tensor::set_data (26647).
- Add tests for C++ functional cosine_similarity and pairwise_distance, and clean up functional test code (26559).
- Add C++ nn::Identity (26713).
- Add comments for multidim tensor factory limitations, and rename ListInitTensor for better clarity (26756).
- Improve C++ maxpool and avgpool (26521).
- C++ API parity: TensorTest.Data fix (26920).
- C++ API parity: AdaptiveMaxPool1d (26755).
- C++ API parity: AdaptiveMaxPool2d (26772).
- C++ API parity: AdaptiveMaxPool3d (26775).
Distributed:
- Extract common classes and functions from test_c10d to common_distributed (23660).
- Sync and async torch.distributed.rpc for builtin operators (23228).
- python udf over rpc (23569).
- Fix naming convention inconsistency and formats in test_rpc.py (24407).
- Use c10::ThreadPool to send and receive messages (23968).
- Use snake names for all files in distributed.rpc (24502).
- throw remote exception on client side (24138).
- Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (25012).
- Return a message instead of void from rpc udf (25283).
- Basic framework for Distributed Autograd context. (24875).
- Add missing call to DistAutogradContainer::init (25391).
- Remove a unused member var (stop_) in process_group_agent (25392).
- Assign each RpcAgent a unique ID, and use ID for sending RPC messages. (24195).
- Cuda devices should have same dtype (25470).
- Multiple fixes to test_c10d.py. (25441).
- Move worker name collection code from Python to C++ (24260).
- Attach 'send' autograd function to the autograd graph as part of RPC. (24876).
- Error phrasing in torch.distributed helper functions (25574).
- Make scatter/gather arguments optional (25575).
- Run clang-format on torch/csrc/distributed (25647).
- Build torch.distributed with Gloo backend on macOS (25260).
- Adding RRef as return value for builtin operators (25169).
- Only default USE_DISTRIBUTED=True on Linux (25725).
- Adds a -m flag to torch.distributed.launch (24910).
- Use whitelist instead of blacklist for USE_DISTRIBUTED (25759).
- Change worker name constraint (25780).
- Make Python RPC handler not hold module in global variable (25458).
- Retry connecting to TCP store on ECONNRESET (25707).
- Make python rpc handler to be singleton class (25742).
- Disable flaky test_invalid_names in test_rpc.py (25916).
- Remove global group name tracking for ProcessGroupNCCL (25905).
- Dynamic registration of RPC backends (25734).
- Add ProcessGroupGloo::createDefaultDevice (26166).
- Clarified ambiguous docstring in NegativeBinomial (25923).
- Make ProcessGroupAgent take num_send_recv_threads as constructor argument (26313).
- Remove extra get_worker_id call in distributed rpc init (26381).
- Make destructor virtual for class with virtual function (26504).
- Use timeout in connect function to prevent against (26364).
- Corrected variable name and added test (26503).
- Add timeout parameter to connect function in TCPStore (26554).
- Added test case for reinit (26506).
- Add function to get NCCL version for logging (26583).
- Add bitwise distributed reduction ops (26824).
- RPC Backend Registry (26919).
- Acquire GIL before creating py::object in RPC python handler (26988).
- Support re-creating/destroying process groups when some trainers recover after failures (26912).
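The torch.distributed.rpc work above was still experimental at this point; the sketch below uses the call pattern as it later stabilized (init_rpc / rpc_sync / shutdown) with a single self-contained worker, so treat the exact signatures as illustrative rather than as this release's API:

```python
import os
import torch
import torch.distributed.rpc as rpc

# Single-process setup purely for illustration; a real deployment runs one
# process per worker with matching rank/world_size.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

rpc.init_rpc("worker0", rank=0, world_size=1)

# Synchronously invoke a builtin operator on a named worker (here: ourselves).
result = rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), torch.ones(2)))
print(result)  # tensor([2., 2.])

rpc.shutdown()
```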
Distributions:
- Fix log_prob() in torch.distributions.Uniform, HalfCauchy and Gamma (23017).
- Implement bool_tensor.bernoulli_ (25076).
- Fix CUDA distributions test on Windows (25539).
- Fix the Bernoulli distribution sampler (26864).
Documentation:
- Fix typos in .circleci/README.md (23588).
- Documentation for Tensor.record_stream() (24078).
- Use prerendered KaTeX in docs. (23376).
- Documentation cleanup (23148).
- Fix a typo in Functions.cpp (23615).
- Slightly improve dataloader docs on when auto-batching is disabled (23671).
- Adjust maintainers list (23693).
- Fix align_corners doc (23707).
- Document empty_strided (23735).
- Updated docs and added deprecation warnings to acknowledge a bool tensor (22261).
- Document bool tensors for bitwise_not. (23800).
- Fix typos in op_registration.h (23770).
- fix torch.frac documentation (23830).
- Delete placeholder so top-level CONTRIBUTING.md is used (23869).
- Replace descriptions of args in doc with template (23439).
- Fix docstring for argmax (23775).
- Adds torch.random to docs/toc (23553).
- Migration doc fixes (24033).
- Updated SGD docs with subscripts (23985).
- Add interfaces in lr_scheduler.pyi (23934).
- Document benchmarking practice for CUDA (23910).
- Added .pyi file for flatten (24459).
- Test if descriptions of args are in the template (24161).
- Add docs to CI (24435).
- Add ASAN instructions to CONTRIBUTING.md (24848).
- Fixed Error in Transformer Example (24837).
- Fix the lint error in transformer doc. (25027).
- Typo correction in cuda_deterministic_backward.rst (25011).
- Fix typo (25238).
- Added documentation for nn.functional.bilinear (24951).
- Describe the relation between fold and unfold operations. (24840).
- logical_xor doc cleanup (25364).
- Fixed flatten docs (I think) (25544).
- Add copy logic for LibTorch to avoid issues on Windows (25556).
- Update index.rst (24245).
- Documentation for cdist (25221).
- Update Transformer.py comments to include a full example (25411).
- Alphabetize Package Reference section in Docs (25666).
- Add CosineAnnealingWarmRestarts to optim documentation (25421).
- add torch.nn.Identity to init.pyi.in (25777).
- Documentation change of torch.where (25554).
- Fix typo: toDense --> to_dense (25832).
- Argument 't', mis-referenced to 'torch.t()' (25885).
- Fix typo in dataloader.py docs. (26263).
- Clarify and correct the doc of atan2. (26180).
- Add warning to anomaly_mode doc (26615).
- Add instructions for building documentation (26553).
- Highlighting in the doc that square root comes before adding epsilon (26735).
- Add documentation for overload names (23844).
Improvements:
- Port resize_as_ and clone from TH to Aten (23027).
- support Gather different indices for different examples in one batch (23285).
- Remove old Type based backend extensions (22009).
- Update MKL to 2019.4 for Windows (23583).
- Remove AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF, which isn't used anymore. (22932).
- Rename AT_FORALL_SCALAR_TYPES_WITH_COMPLEX to AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_STUBS (23336).
- Allowing batching for det/logdet/slogdet operations (22909).
- Use dst dir for temp file (23629).
- Add overload names to native_functions.yaml (23532).
- Adam/AdamW implementation minor fix (22628).
- Remove useless code from shape info (23663).
- Move addcmul to Aten (22874).
- Migrate neg's CUDA implementation to ATen. (23617).
- Bump Gloo (23400).
- Zero sized tensor support for repeat_interleave (23717).
- Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (23701).
- Channels last stored in tensor (23391).
- Recommend ~ and bitwise_not() when user tries to apply neg (-) on a bool tensor. (23621).
- Negate halves on GPU using __hneg() when possible, instead of using float conversion. (23626).
- Rename previously THNN conv kernels to have naive_ prefix. (23790).
- Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (23833).
- Remove K and N function arguments for fbgemm_pack_quantized_matrix (22956).
- cleanup torch/nn/functional.py (23977).
- Move addcmul to Aten(CUDA) (23814).
- Enable Add, sub, mul, and div on CPU for bfloat16 type. (22851).
- Removing deprecated warning message from torch.h (24002).
- port atan2 from TH to ATen (23558).
- Port addcdiv operator from the TH code to Aten (23683).
- Add instruction on how to nest nn::Sequential (23939).
- Refactor randperm test (23526).
- Delete unnecessary file split_types.py (23754).
- Make all at::Tensor in-place methods const (23945).
- Fix scale and zero_point names (23991).
- Allow forward functions with single output to return Variable (23803).
- Fix regression in triangular_solve when number of batches = 1 for CUDA (23953).
- make more iterator attributes private (23744).
- Fixed Bool in IsIntegralType bug (plus review comments) (23942).
- Support torch::tensor and at::tensor with bool and BFloat16 dtypes. (23337).
- Don't redefine unnecessary type stub. (23338).
- Port addcdiv operator from the TH code to Aten (24086).
- Added type annotations to unpooling layers (24101).
- Unboxed kernels in c10 (23447).
- Allow kernels that don't have a boxed version (23665).
- c10 dispatcher stores autograd kernels (23666).
- Move TensorOptions to ATen/core (22020).
- Optimizing out the division in the fusion (23275).
- add function name to error messages generated by checked_tensor_unwrap (24187).
- Fix C412 lint from flake8-comprehensions update. (24184).
- Align AT_FORALL macros with AT_DISPATCH macros. (23339).
- Remove unused parameter from FORALL macros and rename STUBS to QINTS. (23340).
- Thread local debug info (22365).
- Cleanup warnings (24133).
- Enable FBGEMM tests under UBSAN as well (23570).
- toString(FunctionSchema) shows overload name (23694).
- Disambiguate tensor and string ops (23748).
- Simplify tests that should cover all possible devices (23824).
- reduce memory usage for centered rmsprop (24170).
- Enabled comparison ops for bfloat16 dtype on CPU (24182).
- Rename torchtest.test_all_device_types to torchtest.for_all_device_types (24337).
- Fix expansion of stride argument in max_pool2d (23954).
- Fix expansion of stride argument in max_pool3d (23960).
- Fix expansion of stride argument in avg_pool2d (23961).
- Fix expansion of stride argument in avg_pool3d (23963).
- Resolve unused variables in tests (24075).
- Sanity fixes for bitwise_not (24296).
- Fix issue with single memory location being written multiple times (23574).
- Add logical_not operator (23839).
- Add logical_xor operator (23847).
- Recommend logical_not() instead of bitwise_not() when applying sub and neg on bool tensors. (23860).
- Enabled masked methods for bfloat16 (24183).
- Make aten_to_numpy_dtype in tensor_numpy.h public. (23943).
- Let logical_not support non-bool tensors. (23916).
- Let logical_xor support non-bool tensors. (23978).
- Exposing the API for use with pytorch/tvm repo. (24430).
- Assert weight_observer has the correct dtype (24436).
- Enabled torch.mm and torch.mv for bfloat16 (24224).
- Don't require slow test reporting in run_tests.py --pytest (24448).
- Modify symmetric eigendecomposition derivative (23018).
- Remove unused files from THNN and THCUNN (24820).
- Allow SyncBatchNorm without DDP in inference mode (24815).
- Enable torch.eye for bool and half (24148).
- Allow torch.tril / triu to handle bool and half inputs (24163).
- TensorIterator::binary_op input-output overlap check (24058).
- Make SobolEngine use random seed if not specified (24884).
- Add static dispatch mode to reduce mobile code size (22335).
- Use a ptr to store autograd profiler rng (24889).
- Fix deprecation warnings (24841).
- Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (22907).
- Remove unused ATen headers for mobile (24850).
- Improve c10 dispatcher lookup perf (24882).
- Add epsilon argument to Adagrad optimizer (24980).
- Migrate erfinv and erfinv_ from the TH to Aten(CPU) (24908).
- Fix for cdist backward for non-batch tensors (22915).
- Remove deprecated TH(topk) code. #24778 (24857).
- Disable tsan for test_dataloader.py. (25005).
- Fixed test_numba_integration (25017).
- pin_memory thread now uses 1 thread only (25111).
- print padding_mode for Conv modules if not zeros (23996).
- Use the EmbeddingLookup API which takes the offsets instead of lengths (24945).
- torch.from_numpy fix for np.int (25139).
- Creates Torch-friendly Event class and adds Stream tracking to autograd (25130).
- generic overrideable convolution for backends (23562).
- Optimize LeftRight and either (25133).
- data -> data_ptr: upgrade the deprecated APIs (25223).
- Moving sign function to ATen (22861).
- Add TORCH_WARN_ONCE, and use it in Tensor.data() (25207).
- Upgrade the deprecated data to data_ptr APIs (25295).
- upgrade MKL-DNN to v0.20.3 (22910).
- Remove some unused plugins. (25201).
- Disable the copy constructor and = operator of DispatchStub (24932).
- Fix infer np scalar dtype mem leak (24267).
- Align AT_FORALL macros with DISPATCH macros wrt Half. (25268).
- Implementation of cpu_serial_kernel for TensorIterator (25125).
- Migrate digamma, digamma_, polygamma, polygamma_ from the TH to Aten (CPU) (25048).
- note location (25311).
- Migrate erfinv and erfinv_ from the TH to Aten (CUDA) (24943).
- Support all_reduce a list of same-device tensors #21640 (24949).
- Fix typo "takes takes" -> "takes" (24785).
- Remove unused THTensor_(add) and similar functions code. (24864).
- Move new_criterion_tests from test_nn.py to common_nn.py (25333).
- Use C10_DEPRECATED_MESSAGE instead of TORCH_WARN_ONCE for Tensor.data() (25319).
- Extend nn.Transformer to support BERT (gelu) (24181).
- Fix possible deadlock in SharedCache inside a forked child proc (25158).
- Fix double backward of inplace op on view (23502).
- change LBFGS's default tolerance_grad to 1e-7 (25240).
- Add OneCycleLR (25324).
- Invariant typevar matching on callsite checks (25136).
- Fix lint (25371).
- Fix bug in assertNotEqual for int tensors (25199).
- Kill THNN function auto generation. (25322).
- Move the CUDA implementation of ceil to ATen. (24866).
- Replace open registration TensorTypeId with closed enum. (25252).
- Remove THNN sparse autograd Functions. (25323).
- Kill ConvTransposeMixin.forward, which doesn't seem to be used. (25326).
- Kill backend-specific lookup in CrossMapLRN2d, as it never succeeds. (25331).
- Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (25339).
- Adding ModuleList to modules.h (25346).
- Fixed masking warnings in tests (25317).
- Add support for non-affine batch norm with float stats and half inputs (22750).
- Fix allreduce_coalesced tests in c10d (25419).
- Remove Module._backend as it's not used anymore. (25342).
- Delete toType(const DeprecatedTypeProperties&, ...) (25332).
- Update QNNPACK submodule to 7d2a4e9 (25400).
- Compare shapes of outputs and grad_outputs in autograd.grad (25349).
- Stop initializing THNN backend. (25352).
- Stop doing nn wrap. (25353).
- Fixes #25454 (25456).
- Migrate clamp and clamp_ from the TH to Aten (CPU) (25290).
- Get rid of torch._thnn (25354).
- Get rid of more unused plugins (25355).
- Get rid of extract_cwarp (25356).
- Update derivatives.yaml docs to refer to Declarations.yaml rather than Declarations.cwrap. (25357).
- Kill non-shared cwrap tools. (25358).
- Delete a few cases where we directly use Backend/TensorTypeId. (25467).
- Fix implicit fallthrough warnings in FeatureLPPooling.cu (25451).
- Update speed benchmark binary to work in USE_STATIC_DISPATCH mode (25449).
- Migrate CPU_tensor_apply to TensorIterator in TensorCompare.cpp (25402).
- Run clang-format on torch/lib/c10d (25382).
- Checks requiring GPU moved to their own test (25555).
- Test_allreduce_coalesced_stress message passed in as kwarg (25557).
- Delete torch/csrc/nn/type_checks, which aren't used anymore (25506).
- Create helpers for implementing unary ops whose CUDA implementation is ATen (24879).
- Implement indexing methods for sparse tensors (24937).
- Migrate multinomial from the TH to ATen (CPU) (25274).
- Enable broadcasting of batch dimensions RHS and LHS tensors for lu_solve (24333).
- Get rid of _th_reciprocal_. (25507).
- Enable torch.cholesky for batches > 262140 (24438).
- Don't save self in index backward (25594).
- Eliminate magic numbers in BatchLinearAlgebra.cu (25524).
- Allow TensorMethods.h to include Dispatcher.h (alternative) (23888).
- Fix clang-tidy script (25652).
- Kill unused enumerate_options_due_to_default. (25588).
- Kill discover_sparse_tensor_operations. (25589).
- Cpu-strided-complex support for binary-ops (25534).
- Port new_empty to ATen (25475).
- Port new_full to ATen (25583).
- Rename 'mobile' to 'static_dispatch' (25695).
- Bring back skipped bitwise dispatch (25689).
- Align AliasInfo's operator<< with FunctionSchema (23206).
- Migrate digamma and polygamma from the TH to Aten (CUDA) (25662).
- Remove tools/setup_helpers/cudnn.py (25482).
- Enable BLIS from the FLAME project as a BLAS choice. (23819).
- Expose parse_schema and eq function to python and add round trip tests (23208).
- Fix error message stack overflow (25146).
- Fix typing on nn.Parameter (25586).
- More accurately describe field invariants in OperatorEntry (25793).
- Enable log_softmax and CrossEntropyLoss for bfloat16 (24457).
- Fix missing str to int conversion in the commit f71ddd42 (25861).
- Fix test_det_logdet_slogdet_batched on PowerPC (25773).
- Unify treatment of warp size / wave size (25884).
- Make torch checks same for both CPU and CUDA multinomial (25595).
- In the CUDA implementation of erfinv, erfinv() should be used for double (25337).
- Fix cpp_extensions test failures with GCC 9.1 from ArrayRef(initializer_list) (25384).
- Rename packed tensor accessor (25654).
- Gate static aten registerer with USE_STATIC_DISPATCH (25815).
- Tensor type set (25308).
- Enable libflame as a LAPACK choice (25795).
- Fix scatter CPU kernel when (input size, src size) > index size (25839).
- Migrate pow from TH to Aten (CUDA) (25517).
- Fix int32 overflow in SummaryOps.cu getBin #25747 (25748).
- Simplify header inclusion in test/cpp/api/modules.cpp (25921).
- Compute common dtype based on inputs only (25593).
- Updates autograd engine to respect streams set in forward (8354).
- Make running Gloo tests conditional on availability (25913).
- Remove superfluous check for POLLIN in TCPStore (25911).
- The float version of calc_digamma should return float type. (25488).
- Add VariableTensorId, store it in TensorTypeSet (25597).
- Add torch.backends.mkldnn.enabled flag (25459).
- Skip TestAutograd.test_deep_reentrant on macOS (25942).
- Skip TestHub on macOS (26033).
- Refactor torch.*solve tests (25733).
- Enables _do_cuda_non_default_stream (25989).
- Skip test_triangular_solve_batched (26108).
- Stop re-ordering TH(C)Blas arguments. (25606).
- Kill TH(C)Blas kwarg_only declarations. (25607).
- Stop reordering TH random function arguments. (25608).
- Fix base_lr overridden in cyclic lr (26105).
- Kill kwarg_only declarations in Declarations.cwrap. (25609).
- Add device check before accessing data_ptr in PackLayer (26056).
- Add data field to Tensor pyi. (26093).
- Kill most defaults in Declarations.cwrap. (25610).
- Get rid of more defaults in Declarations.cwrap. (25611).
- Kill remaining defaults in Declarations.cwrap. (25612).
- Remove requests as dependency (26083).
- Make schema part of RegisterOperators::Options (26114).
- Allow overwriting catch-all kernels (25947).
- Creates generic device type testing framework (25967).
- Add sync to flaky test_events_multi_gpu_query (26231).
- Add possible out of shared memory error message (25730).
- Ports most of test_torch.py to generic device type framework (26232).
- Add type hint for cuda.set_rng_state (26200).
- Call aten ops through c10 dispatcher (23668).
- Remove unboxedAutogradKernel from c10 (26130).
- Refines test_torch.py generic device testing (26244).
- Fix binary size of OpsAlreadyMovedToC10.cpp (26237).
- Migrate away from using Variable( in test_nn.py (26077).
- Enabled conv methods for the bfloat16 (26167).
- Move the CUDA implementation of round to ATen. (25041).
- Kill defaults in nn.yaml. (26282).
- Add s390x compiler define for s390 builds. (26233).
- Add derivative of cholesky_solve (26185).
- Kill 'default_init', which isn't needed anymore. (26281).
- Adds generic device tests to test_autograd.py (26248).
- Ensure that n is non-negative in polygamma. (26294).
- Enable batching for pinverse (26095).
- Make TORCH_WARN_ONCE capture variables by reference (26289).
- Fix race in CUDA initialization (25788).
- Kill declared_type and ignore_check from THFormal. (26284).
- Replace simple if_true / if_false cases in Declarations.cwrap. (26285).
- Fix typo (26298).
- Enabled bfloat16 dtype on CUDA (26148).
- Move more ops to c10 (26255).
- Remove dead function (26259).
- fix ctc_loss argument check error message (26325).
- Skip testing triangular_solve_batched on non-default CUDA stream (26115).
- Kill if_true / if_false in Declarations.cwrap. (26346).
- enable xla cpp tests in CI (26347).
- Resolve #25605 cyclic reference in _LRScheduler (25776).
- use allgatherv for sparse all reduce (23917).
- Removes torchtest, expands generic device testing (26374).
- Add a float version of calc_erfinv (by templating) on CPU (26070).
- Fix type mismatches in the CUDA version of calc_digamma and calc_trigamma (25791).
- Adds dtypes decorators to and allows helper methods in device generic test classes (26375).
- Fix composite learning rate (26227).
- Move the CUDA implementation of rsqrt to ATen. (25285).
- Add a flat hashmap (26371).
- Preserves insertion and deletion order in flat hashmap (25675).
- Moves more tests to TestTorchDeviceType (26435).
- Tag files should not be deleted by "python setup.py clean". (26416).
- Implement multiple dispatch (25653).
- Enabled where for bool tensor on CUDA (26430).
- Implement multiple dispatch (26468).
- Port lgamma from TH to Aten (25138).
- Make c10::Scalar::to() const (26406).
- Allocate empty tensor instead of empty_like in binary ops, fix pow (26498).
- Implement multiple dispatch (#26468) (26501).
- Move the CUDA implementation of floor to ATen. (25372).
- Fix for Conv shape check prints overflowed ints (25827).
- c10::KernelFunction (26337).
- Add two levels to use_c10_dispatcher (26272).
- Correct the test of a big number (2 ^ 31) (26491).
- Enable creation of boxing wrappers for some aten operators (26273).
- ATen port of lgamma (cuda) (26600).
- Enabled bfloat16 dtype on CUDA (26407).
- Makes test_indexing.py device generic (26634).
- Allow batch size of 0 in Conv (26214).
- A few hub improvements (25980).
- Updates and extends TestNNDeviceType (26638).
- Enable registering stackbased kernels with lambdas (26658).
- Move the CUDA implementation of trunc to ATen. (25423).
- Add derivative for cholesky_inverse (26451).
- Vectorize unary operator erfinv (26629).
- Expands TestAutogradDeviceType (26708).
- Enable hub tests on MacOS (26697).
- Simplify operator sign using the helper. (25592).
- Address review comments in https://github.com/pytorch/pytorch/pull/26272 (26587).
- Add whitelist for backward compatible checks for function schemas (26740).
- Expose a torch.result_type and simplify tensor iterator (26012).
- Delete backwards compatibility Backend overload for registerOp (25914).
- Implement multiple dispatch in boxed c10 dispatcher (26118).
- Remove unnecessary include from TensorBody (26360).
- Add some missing constructors to IValue. (26718).
- Hub improvements (26723).
- Upgrade sleef to v3.4.0. (26749).
- Lets generic tests use multiple devices (26594).
- Refactor checked_tensor_unwrap to take DeviceType instead of Backend (26290).
- Port CUDA implementation of expm1 to ATen (26598).
- Remove one unnecessary copy of the output during the type promotion. (26816).
- Fix Future default constructor missing for ParallelNative (26739).
- Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (26592).
- torch.load default encoding change to 'utf-8' (26421).
- Move the CUDA implementation of log to ATen. (26494).
- enable double backward for non-cudnn LSTM and GRU (26660).
- Migrate multinomial from the TH to Aten (CUDA) (26481).
- Remove three unused declaration. (26699).
- Make resize_as_ generic, so XLA works. (26809).
- Add some missing constructors to IValue. (26806).
- Change calling convention of ATenDispatch from getOp to callUnboxed. (26857).
- Refactor dispatch structure so fallback code lives inline. (26367).
- Fix nuclear norm with requires_grad=True (26303).
- Choose num_threads in parallel_for based on GRAIN_SIZE (26886).
- Use intrinsics for trigonometric functions on CPU (26431).
- Remove an unused function propagate_names_if_namedtensor_enabled (26176).
- Migrate lt and lt_ from the TH to Aten (25998).
- Make TypeDefault, TypeDerived and VariableType anonymous namespaces (26882).
- Move Generator ops to c10 (26434).
- Add torch.can_cast(from, to) function (26805).
- Include iteration_ in SGD optimizer serialization (26906).
- Make repeat respect the current stream (26946).
- Fix issues in torch::tensor constructor (26890).
- Named tensor support for: index_fill_, index_fill, squeeze, median(Tensor) (26914).
- Add std::variant backport as torch::variant (26836).
- fix type annotation (26930).
- Bring back the optimization of integer.pow({2.0, 3.0}) on CPU (26938).
- Add torch.promote_types function (26655).
- Rewrite argmax and argmin as TensorIterator reductions (26181).
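Among the user-visible additions above are torch.result_type (26012), torch.promote_types (26655) and torch.can_cast (26805), which expose the type-promotion rules directly; a quick sketch of what they report:

```python
import torch

# The dtype an arithmetic op would produce for these operands
# (an int32 tensor combined with a Python float promotes to the default float dtype).
print(torch.result_type(torch.tensor([1, 2], dtype=torch.int32), 1.0))  # torch.float32

# The promoted dtype for a pair of dtypes.
print(torch.promote_types(torch.int64, torch.float32))  # torch.float32

# Whether a cast is allowed under the promotion ("safe casting") rules.
print(torch.can_cast(torch.float64, torch.int32))  # False
```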
JIT:
- Cleanup interface of inlineCallTo. (23539).
- Make ProfiledTensorType hashable (23116).
- add log stmts to peephole.cpp (23279).
- add docs for serialization (23456).
- Move overview to docs/ folder (23457).
- Include recursive class compilations in error call stack (23454).
- add a test for inline tracing (23543).
- format jit_type.h (23564).
- Add logging to Alias Analysis (23383).
- Update relative links in OVERVIEW.md (23627).
- prefix module qualified names with module (23630).
- allow forward hooks in tracing (23613).
- Add in-place check to AliasDb (23210).
- Support nn.GRU in script (23266).
- Remove more uses of DimensionedTensorType (23060).
- Compress debug symbols when serializing TorchScript models. (23659).
- Fix frontend error message (23576).
- Compress all non-Tensor components of a serialized TorchScript model. (23723).
- Initial torchbind prototype (21098).
- Perform string uniquing by value in pickle serialization. (23741).
- don't try to set training after ScriptModule has been initialized. (23680).
- Open up AliasAnalysisKind for any ops (23810).
- make nn.LSTM accept PackedSequence instead of Tuples (23643).
- fix some compiler warnings (23816).
- Properly mangle nn.Module.__construct (23779).
- Define toIValue conversion for dtype (23708).
- format init.cpp (23840).
- Recursive script migration guide (23892).
- Erase shape information from class types (23362).
- Make typing understand exceptions (23565).
- Disable optimizer for __setstate__ (23698).
- Make assertions refine types (23949).
- add NotIn support in script (23637).
- metacompile isinstance checks (23885).
- add support for overloading functions (23886).
- jit.script() testing and fixes (23891).
- support tensor as key type in script (23638).
- make _overloads importable in nn/functional (24049).
- [jit] make sure NamedTuples have unique qualified names (23798).
- serialize all c++ frontend modules to a single CU. (23645).
- fix py-compat fbcode lint warnings (23530).
- Add Pickler C++ API (23241).
- Moves clamp from autodiff cpp to symbolic script (23927).
- support dict augment assign in script (23639).
- Open up AliasAnalysisKind for any ops (23834).
- Fix builtin function reference (24056).
- add initial support for sparse tensors (23841).
- support grad and data attribute for tensor in script (23842).
- JIT Serialization of nnq.Linear (24048).
- Replace Module::copy_into with Module::clone. (24068).
- serialize modules as classes (23098).
- Delete WeakScriptModuleProxy (23398).
- Fix trace docs (24191).
- search class type for methods (23689).
- simplify NamedType interface (23691).
- make NamedType an interface (23696).
- make FunctionType a NamedType (23697).
- class_table_ to deps_table_ (23845).
- clean up import_source (23846).
- Use JIT function schema parser to parse builtin RPC ops (24207).
- Remove DimensionedTensorType (24077).
- Fix flake8 issues in ./torch/jit (24240).
- Fix missing version < 2 guard in import (24255).
- Add logging to autodiff (23664).
- fix test_jit.py so it can be run in parallel (24311).
- fix list comprehension type assumed to be the same as input type (24271).
- simplify NamedType interface (24278).
- make NamedType an interface (24279).
- make FunctionType a NamedType (24280).
- class_table_ to deps_table_ (24281).
- clean up import_source (24282).
- Add the ability to compile exports on traced modules (24298).
- Cleanup documentation around script and trace (24208).
- Add trace_module to docs (24258).
- JIT trace testing (23987).
- copy methods when creating a derived class type (24349).
- kill TK_NAMED_TUPLE_DEF (24350).
- Misc doc updates / fixes (24371).
- remove CompleteTensorType (24169).
- Remove type subclassing (24257).
- fix IR parsing bug (24294).
- pickler read guard (24433).
- Fix test_jit_cuda_archflags failure on py27 due to changing dict order. (24501).
- fix double copying of constants (24412).
- jit_log: Extract a function that prefixes all lines of a string with another string. (24355).
- Module: add dump function that recursively prints contents of the module. (24356).
- Clear recursive error stack on each compilation (23458).
- Add @ignore for script classes (23614).
- Record function name as an attribute of CallFunction nodes. (24446).
- big cpp test reorg (24801).
- Cache node operators to speed up optimization (24827).
- Fix VaryingShape::merge (24455).
- Make torch.jit.Attribute work with PYTORCH_ENABLED=0 (23851).
- Moves (most) ops to symbolic script (23794).
- Fix unicode in comments (24218).
- serializing function calls (23799).
- Removes SymbolicVariable from tests (24007).
- Merge ProfiledTensorType and TensorType (24284).
- Remove unused DynamicDAG class. (24890).
- extend torch.jit._overload to module methods (24259).
- Remove torch.contrib._graph_vis (24874).
- Fix missing super call error (24852).
- bind autograd.grad function into TorchScript (24871).
- restore default constructor of OutputArchive (24955).
- Misc doc updates #2 (24445).
- Fixing size implementation for struct slot_list_impl (24351).
- add support for multiple assignment statements (24477).
- Load tensors directly from pickle archive (23281).
- Fix fbcode weak ordering (25026).
- Fix bugs in assignment to optionals (24989).
- Fix a bug in creating a prefix string in jit log. (25051).
- cleanup tmp name generation (25065).
- jni-java wrapper for pytorchScript api (25084).
- Fix python lints for generate_test_torchscripts.py (25107).
- Clean up after running doc tests (25036).
- fix annotated assignment (25094).
- dictPop: dereference dict.find() iterator before calling dict.erase() (25056).
- add some sparse tensor ops support in TorchScript (24967).
- move some methods into function.cpp (25119).
- SubgraphMatcher: Factor out matchAttributes. (25073).
- SubgraphMatcher: add logging. (25074).
- SubgraphMatcher: matching modules support. (25075).
- Add logging to JIT CSE pass. (25141).
- bind autograd.backward and tensor.backward in TorchScript (23913).
- fix to logging in AA (25143).
- Fix bugs in assignment to optionals (25059).
- skip fstrings test if not py36 (25184).
- Simplify NamedType (25058).
- Add interface declarations to JIT (21972).
- Remove insert_observers pass (24999).
- Remove InsertQuantDeQuantNode (25000).
- Implement a bunch of pickle serialization features that optimize for size. (23759).
- fix closures which always throw. (25278).
- Add interface declarations to JIT (25258).
- Remove PythonPrint's is_method_ member (25226).
- add serialization of interface (25227).
- improve interface error messages (25228).
- don't throw in constant prop (25270).
- Add source location to class instantiation error (24990).
- fix inliner bug (25052).
- Pull instruction definitions out of interpreter.cpp. (25148).
- Add GET_ATTR instruction (25151).
- Fix old annotate() error (25261).
- insert_observers use qconfig_dict (25069).
- Implement FoldConvBatchnorm2d pass. (25282).
- Remove spurious print (25378).
- Fix AliasAnalysisKind::PURE on MSVC (25375).
- Fix item() call in docs (25404).
- Attempt to enable CrossMapLRN2d, as it no longer uses Module._backend. (25343).
- Some alias analysis fixes (25425).
- Emit script function calls during tracing. (25089).
- torch/jit/passes/quantization.{h,cpp} and torch/jit/init.cpp (25403).
- add tuple keyword (25474).
- Manually implement is_zipfile (25279).
- Removes SymbolicVariable (25077).
- Added invert bitwise operation to JIT (22324).
- Remove friend dependency on ClassType in InterfaceType (25617).
- Remove forward compat code for serialization format (25440).
- Make NoneType <: Optional[T] (25361).
- Remove accidentally re-added file (25677).
- move legacy deserialization code into jit/import_legacy.cpp (25649).
- preserve ignored function return value type (25262).
- Finish testing code examples in the docs (25668).
- add getitem to class types (25664).
- Make tensor key in Dict works in serialization (25442).
- Expose an API to iterate all the registered operators (23207).
- Fix missing newline in compiled from source range highlight (25802).
- SubgraphMatcher: add logging to a check missed previously. (25735).
- Fix c10 tracing (25869).
- add torch.jit.is_scripting() api (25263).
- Make arguments of Module::dump easier to remember. (25740).
- Only create a new clone of observer when we actually insert it. (25931).
- add set_grad_enabled to TorchScript and fix data attribute (25350).
- add torch.jit.is_scripting api (25955).
- add support for ModuleDict (25715).
- fix use-after-free bug (25965).
- Fix torch.arange traced as constant (25363).
- Preserve module names in recursive script (24505).
- Add in membership checks for lists (25796).
- TorchScript Serialization for dynamic LSTM module (25877).
- print source code when a function is executed (25868).
- make sure all out stringstreams start out empty in jit_log.hpp (25863).
- tracing with an opt-in by file name (25895).
- Port fuse_linear from pytorch/tvm (25623).
- Add documentation to logging (26175).
- Register ATen ops with c10 (26131).
- Add isBackwardCompatibleWith for Argument and FunctionSchema (23409).
- Add a wrapper for inspect in JIT to produce better error message (25415).
- Enable CPU fused kernel on Windows (25578).
- Fixed size arrays (23695).
- fix schema matching of tuples to vartype lists (25944).
- min(li) max(li) (26351).
- Remove torch.save-related logic from pickler (25502).
- Add support for lists for prim::min and prim::max (26155).
- Add ivalue::type(), part 1 (25439).
- Use static type information to restore type tags (25447).
- Add filter function to subgraph rewriter runGraph (26223).
- Implement more size-oriented opcodes in the depickler. (25786).
- Refactor emitIsInstance (26061).
- add setitem to class types (25750).
- Make jit dicts ordered (26465).
- Fixes test_wrapped_number (26523).
- Implement more size-oriented opcodes in the depickler. (26454).
- Make is_optional check more robust (26312).
- Resolve NamedTuple types in Python (26443).
- Fix jit/pass/peephole.cpp fuse addmm (26357).
- Move unpickler related codes from pickler.h/cpp to unpickler.h/cpp (26432).
- Whenever possible, use function pointers rather than std::function to represent Operation's. (26560).
- Serialization for per channel qtensor (26339).
- add CondValue to unify refinements and code emission (26145).
- Add ObserveHelper and remove some common function parameters (26641).
- Remove 'recurse' parameter from Inline. (26487).
- Use std::mutex instead of std::call_once in Function when we initialize GraphExecutor. (26571).
- Add 'optimized_graph' to Function. (26488).
- Use optimized graph in Inline (essentially, making Inline recursive now). (26489).
- resolve ignored module method type annotations (26683).
- Add traces to specialize_autograd and lower_grad_of (2nd try) (22752).
- Register values listed in constants as attributes of the Module. (26581).
- Fix builtin lookup for Python functions (26688).
- Improve error message in IR parser when accessing undefined variable. (26771).
- autodiff changes to enable profiling (25397).
- Typevar matching fix + implicit conversions from Scalar to int/float (26453).
- support iterables, rangevalue in list comprehensions (26768).
- Bytecode export flow (25187).
- Use optimized_graph in graph_executor. (26705).
- Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place. (26703).
- Fix broken failure messages for OverloadedMethodValue (26846).
- Improvements to GuardElimination and InsertBailouts (25430).
- Fix circular deps in loading (26758).
- add AutoNonVariableTypeMode guard on JIT->ATen boundary.
- Add logging in constant propagation pass (26653).
- fix range for non-int inputs and pow implementation (26926).
- Move some class/functions in test_jit.py to jit_utils.py (26839).
- Remove unimplemented passes (26978).
- Fix race condition in torch::jit::Function (27009).
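Several of the items above add user-facing TorchScript APIs, in particular torch.jit.is_scripting (25263, 25955). A hedged sketch of the usual pattern, pairing it with torch.jit.unused so a function can keep an eager-only code path while still being scriptable; the helper and function names are hypothetical:

```python
import torch

@torch.jit.unused
def eager_only_sum(x: torch.Tensor) -> torch.Tensor:
    # Eager-only helper (e.g. code relying on NumPy); @torch.jit.unused tells the
    # compiler to replace its body with a stub instead of compiling it.
    return torch.tensor(float(x.numpy().sum()))

def total(x: torch.Tensor) -> torch.Tensor:
    if torch.jit.is_scripting():
        return x.sum()            # path taken when running as TorchScript
    else:
        return eager_only_sum(x)  # path taken when running eagerly

scripted_total = torch.jit.script(total)

x = torch.ones(3)
print(total(x))           # eager:    tensor(3.)
print(scripted_total(x))  # scripted: tensor(3.)
```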
Mobile:
- Add to Tensor symmetric methods getDataAsIntArray, getDataAsByteArray (25183).
- Initial commit for android torchvision utils (25185).
- Add libtorch android build with shared lib for 4 android abis (25192).
- pytorch android circleci integration (25286).
- turn off BUILD_BINARY for android CI jobs (25485).
- Gradle tasks for publishing to bintray, jcenter, mavencentral etc. (25351).
- remove protobuf usage from mobile build (25493).
- Fix iOS simulator build (25633).
- Fix OSS mobile CI (25755).
- Add PR jobs for iOS builds (25840).
- Clean up the iOS build script (25822).
- Cocoapods for iOS OSS release (25847).
- Introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile.
- Add NO_EXPORT macro to unset visibility attribute (25816).
- Update build_android.sh to not build host protoc for libtorch (25896).
- Simplify build_android_gradle.sh (25897).
- Change gradle build to use static libtorch + gc-sections (25984).
- Use torch::from_blob instead of shareExternalPointer, nits (25973).
- Change the source link in podspec (26089).
- Tensor renaming to dtype, shape; support long, double (26183).
- Fix circle CI (26225).
- Remove armv7s build from iOS (26222).
- CircleCI android nightly (snapshot) build publishing (26069).
- Fix error messages; tensor creation method names with type (26219).
- Integrate forked QNNPACK into mobile PyTorch builds. (25844).
- Add iOS test app skeleton (26261).
- Fix no tab check (26399).
- Clean up the PR job script for iOS build (26353).
- Exclude libfbjni.so from pytorch_android to avoid duplicating it (26382).
- Add script to build mobile library with host toolchain (26440).
- Fix JNI wrapper for IValue interface change (26448).
- Use gradle 4.10.3 for build and publish (26473).
- Disable bitcode for iOS CI jobs (26478).
- Javadocs for Tensor, IValue, Module (26149).
- Turn off autograd mode in android JNI wrapper (26477).
- Expose USE_STATIC_DISPATCH macro to public headers.
- Improve how pytorch_android cmake imports static lib (26525).
- Add eigen blas for mobile build (26508).
- Support IValue string type (26517).
- Update android/iOS build library packing (26565).
- Add testing script for iOS x86 build (26632).
- Sync docker images (26651).
- Nightly prefix for android nightly jobs (26652).
- Refactor android torchvision: not hardcoded mean/std (26690).
- Switch our Android CI to Clang (26656).
- Prepare for Cocoapods 1.3 Release (26751).
- QEngine::QNNPACK enabled, module.eval() (26855).
- Add mobile friendly at:parallel_for backend.
- Remove backward functions from jit-op-registry for mobile build (26851).
- Check if QNNPACK is supported before setting it (26935).
- Fix mobile.sh build (26975).
- Fix fbjni packaging, exclude for publishing, include by default (26995).
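Most of the mobile items above concern packaging (libtorch Android builds, Gradle/Cocoapods/CircleCI) and the Java/ObjC wrappers for Tensor, IValue, and Module. On the Python side, the workflow they target is serializing a TorchScript model for the mobile runtime to load. A minimal sketch, assuming a standard torchvision model:

```python
import torch
import torchvision

# Trace a model to TorchScript and serialize it. The saved file is what the
# Android wrapper (org.pytorch.Module.load) or the iOS C++ API loads at run time.
model = torchvision.models.resnet18(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("resnet18.pt")
```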
Named Tensors:
- Fix named tensor build by enabling tensor.is_pinned and removing support for clone() (23597).
- Add torch._C._BUILD_NAMEDTENSOR() (23623).
- Add names to repr for named tensors (23316).
- Add name propagation for at::alias, add tensor.set_names (23624).
- Improve test_namedtensor.py with named tensor equality check (23801).
- Add `names` argument to ones, rand, randn, zeros, full (23743).
- Implement name inference rule for empty_like, clone (23746).
- Named inference for contiguous(), bernoulli variants, and dropout. (23808).
- Add name propagation for at::alias, add tensor.set_names (24105).
- Improve test_namedtensor.py with named tensor equality check (24106).
- Add name propagation for at::alias, add tensor.set_names (24202).
- Add `names` argument to ones, rand, randn, zeros, full; fix empty (24107).
- Implement name inference rule for empty_like, clone (24108).
- Named inference for contiguous(), bernoulli variants, and dropout. (24109).
- Implement tensor.align_to(names), torch.align_tensors(*tensors) (23804).
- Rename set_names -> view_names, set_names_ -> names_ (23962).
- Update tensor.view_names / tensor.names_ API (23973).
- Fix out= function semantics for named tensors. (24028).
- Name inference for softmax, log_softmax and Dimname overloads. (24087).
- Implement name inference for t(), transpose(...) (24203).
- Add thread-local-state NamesMode and NoNamesGuard (24367).
- Fix named tensor build (24940).
- Implement name inference for t(), transpose(...) (24941).
- Add thread-local-state NamesMode and NoNamesGuard (24942).
- Fix `FIXME_default_names` by storing static list of 64 none names (24885).
- Rename Tensor::names() to Tensor::opt_names() (24907).
- Add helper function Tensor::names() (24914).
- Fix binary op name inference between unnamed and named tensors. (24921).
- Implement name inference for mm, addmm (24306).
- Implement name inference for expand (24469).
- Implement name inference for addmv, addmv_, mv (24471).
- Implement name inference for torch.dot (24474).
- Fix named tensor test (25313).
- Implement name inference for torch.bmm (25123).
- Implement name inference for torch.matmul (25177).
- Include the correct header for make_unique in named tensor headers (25178).
- Fix dependency by moving Dimname.{h,cpp} NamedTensor.{h,cpp} to core/ (25280).
- Add guard for named tensors in the JIT (25344).
- Add guards for using named tensor with serialization and multiprocessing (25345).
- Prepare to add some Dimname/DimnameList overloads (25405).
- Name inference rule for mean, std, var, std_mean, var_mean (25431).
- Name inference rule for masked select (25566).
- Name inference for masked_fill_ / masked_fill (25567).
- Name inference rule for torch.cat (25568).
- Fix binary op name inference to happen before shape checks (25563).
- Fix named tensor printing (25564).
- Name inference rules for relu/relu_/threshold/threshold_ (25569).
- Implement initial version of autograd with named tensors (25604).
- Fix named tensor build (25673).
- Move most BUILD_NAMEDTENSOR macros out of header areas (25721).
- Rename tensor.view_names -> tensor.renamed (25711).
- Move BUILD_NAMEDTENSOR in NamedTensorUtils.h (25781).
- Add flatten for named tensors. (25672).
- Quick fixes for named tensor for windows (25728).
- Name inference for unbind (25585).
- Fix assertion if NamedTensorMeta's num_names != tensor.dim (25778).
- Add names= argument to torch.tensor ctor (25424).
- Remove some more BUILD_NAMEDTENSOR flags (25919).
- Delete tools/autograd/env.py (25920).
- Remove unnecessary BUILD_NAMEDTENSOR from interned_strings.h (25938).
- Add TEST_NAMEDTENSOR flag to namedtensor ci (25948).
- Move NamedTensorMetaInterface definitions to TensorImpl.h (26030).
- Experimental warning for named tensors (26050).
- Implement tensor.refine_names (25842).
- Implement tensor.align_as(other), change tensor.align_to(names) (25843).
- Fix bug with named tensors and (no) tracer support (26106).
- Fix namedtensor ci (26257).
- Turn on BUILD_NAMEDTENSOR permanently (26060).
- Implement named tensor `unflatten(dim, namedshape)` (25658).
- Rename torch.namedtensor -> torch._namedtensor_internals (26349).
- Change '*' to '...' and `...` for named tensor API functions (26350).
- Change "named_guard" in native_functions to "supports_named_tensor" (26352).
- ensure c10/macros included before using (26439).
- Disable tagged names (26479).
- Delete tagged names (26365).
- Refactor Dimname.h API to be nicer (26366).
- Implement resize_, resize_as_ for named tensors (26493).
- Support torch.pow with named tensors (26541).
- Name inference for min(Tensor, dim?) / max(Tensor, dim?) (25582).
- Renames `tensor.renamed` -> `rename`, `tensor.names_` -> `rename_` (26548).
- Fix ellipsis behavior for `Tensor.align_to` to glob all missing dims (26648).
- Typo fix (26417).
- Don't generate named tensor functions to RegistrationFunctions.h (26685).
- Add a lot of dimname overloads (26636).
- Wrap dimensions during named inference (26558).
- Named tensor support for: atan2, output_nr, detach{}, requires_grad (26543).
- Named tensor support for logsumexp, mode, kthvalue, median, min, max (26563).
- Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (26815).
- Fix CUDA named tensor `copy_` (26829).
- Make named tensor implementations more robust (26968).
- Better named tensor error messages. (26974).
- Enable named tensors for arithmetic, clone, and tensor conversion ops (23237).
- Rename tensor.is_named to has_named, expose has_named to python. (23315).
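Taken together, the items above make up the experimental named tensor API in this release (factory `names=` arguments, name inference through ops, `refine_names`, `align_to`, `flatten`, `rename`). A minimal sketch of the user-facing surface; the dimension names and shapes are arbitrary:

```python
import torch

# Factory functions accept a names= argument.
imgs = torch.randn(2, 3, 32, 32, names=('N', 'C', 'H', 'W'))
print(imgs.names)  # ('N', 'C', 'H', 'W')

# Name inference: reductions and flatten can address dimensions by name.
per_channel = imgs.sum('C')
flat = imgs.flatten(['H', 'W'], 'pixels')

# Refine unnamed dims, then align to a target dimension order.
x = torch.randn(3, 32, 32).refine_names('C', 'H', 'W')
aligned = x.align_to('N', 'C', 'H', 'W')  # inserts a size-1 'N' dim

# Renaming dimensions.
renamed = imgs.rename(C='channels')
```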
ONNX:
- Fix unused imports in torch/onnx/symbolic_opset8.py (23678).
- Support ONNX export Multinomial (23581).
- added opset10 ORT tests (22993).
- frobenius_norm onnx export added (23536).
- Std opset export (22310).
- weight_names bug fix (23848).
- canonicalize_ops pass bugfix: copy metadata for new output (23809).
- Provide argument in ONNX export to exclude initializers from graph inputs. (23284).
- Fix validation of dynamic axes names (23974).
- updated pixel_shuffle in opset 11 to use depthToSpace (23739).
- Relax precision constraint on ONNXRuntime._gru_test (24340).
- Add ONNX Export Support to empty and empty_like (24166).
- Update docs for softmax in onnx supported operators (24832).
- enable "keeps" from BoxWithNMSLimit and caffe2_fastrcnn_outputs_inference (24451).
- cumsum (24476).
- Fix some typos in documentation (23507).
- Update onnxruntime CI version (24414).
- Momentum setting in SyncBatchNorm forward (inference) pass. (24995).
- Export Unique (25050).
- Fix dead link and syntax in ONNX landing page (25126).
- Fixed nondeterministic RNG for ORT RNN tests (25205).
- Add ONNX Export Support to rsqrt (24153).
- Add ONNX export support for torch.log1p. (25808).
- remove "build_deps" arg from setup.py command in (26113).
- Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (26137).
- Export round (26126).
- fix test_arange and bump ort ci version (26320).
- Automatic update of fbcode/onnx to 1316afc9f972f81340faa05763e2898f38bcc3b0 (26309).
- add pass for onnx scalar type conversion (24378).
- Export clamp for opset 11 (25797).
- Export gelu (24475).
- Fix Exporting RNN/LSTM's Initial State (h0/c0) to ONNX (22813).
- Update ONNX Export for Gather and Scatter for Opset 11 (24790).
- Automatic update of fbcode/onnx to 23bb6ea1a71f08e200114a153f48bd7adb66d486 (26441).
- Setting automatic default selection for ONNX IR v4 semantics in ONNX export API (26146).
- Update ONNX Export for Interpolate in Opset 11 (24805).
- Make ONNX_ATEN_FALLBACK also work for _export (26738).
- Automatic update of fbcode/onnx to ab6b94203c595f74b1f126eb118eef22e4c05a57 (26736).
- Update ONNX Export for Interpolate in Opset 11 (26778).
- Support Negative Axis in Size in ONNX (26436).
- Export baddbmm (25738).
- Export index_fill and index_copy, fix caffe2 scatter (23052).
- Add Support to Dicts and Strings in ONNX for Inputs and Outputs (25889).
- export baddbmm (26901).
- Updating producer_version in exported ONNX models to PyTorch 1.3. (26976).
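Several of the items above add opset 11 export paths and the option to exclude initializers from graph inputs (ONNX IR v4 semantics). A minimal sketch of an export call that exercises those options; the toy model and file name are illustrative, and `keep_initializers_as_inputs` is assumed to be the flag introduced for excluding initializers:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
dummy = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model, dummy, "tiny.onnx",
    opset_version=11,                    # use the newer opset export paths
    keep_initializers_as_inputs=False,   # exclude initializers from graph inputs
    input_names=["input"], output_names=["output"],
)
```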
Performance and Benchmarking:
- Added torch.autograd.profiler.record_function() as context manager. (23428).
- Fix regression in torch.qr (23591).
- Fix pin_memory_thread not exiting quickly (23646).
- Increase predefined_minimum_secs to reduce variation (23734).
- Enhance Tensor indexSelect performance (23055).
- Separate input shapes to reduce default execution time (24136).
- Increase default warmup iter and iter (24272).
- Fix perf bug with indexed assignment (index_put_) (24083).
- Add wipe cache (24390).
- Vectorize LowerCholeskyTransform (24131).
- Change the location of wipe cache (24454).
- Optimize performance for unboxed-only kernels (25055).
- Fix iOS crash in caffe2::getClockTimeMilliseconds() in perf_observer.cc (24813).
- Add speed benchmark binary for torch jit (25230).
- Change shape for conv and unary ops (25477).
- Add speed benchmark binary for torch jit (25486).
- Fix operator level benchmark to have NHWC layout (26577).
- Speed up an integer to the power of a positive integer on CPU (26020).
- Use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
- Use parallel_for in DepthwiseConvKernel (26879).
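The first item in this list adds `torch.autograd.profiler.record_function()` as a context manager, which labels a region of code so it shows up as a single entry in profiler output. A minimal usage sketch; the label and workload are illustrative:

```python
import torch

x = torch.randn(64, 128)
w = torch.randn(128, 128)

with torch.autograd.profiler.profile() as prof:
    # Everything inside this block is attributed to the "fused_block" label.
    with torch.autograd.profiler.record_function("fused_block"):
        y = x.mm(w).relu()

print(prof.key_averages().table(sort_by="cpu_time_total"))
```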
Quantization:
- Quantized Average Pool kernel (23143).
- skip nn.Identity in add_observer (23500).
- Change condition in swap module (23561).
- make_module: First version (23288).
- ConvBn2d/ConvBnReLU2d (23357).
- fix conv2d (23690).
- QAT modules take qconfig as argument and keep qconfig as member (23609).
- Remove qconfig_dict from API (23465).
- Fix LSTM int8 quantization model size issue (23577).
- Expose the quantized inputs and output of dynamic quantized int8 FC operator for debugging (23566).
- Support for non-zero zero_points for weight and activation (23541).
- qconv operator level benchmark (22895).
- Enable OSS quantization tests (23858).
- Change fbgemm_linear_{int8,fp16}_weight to fbgemm_linear_{int8,fp16}_weight_fp32_activation (22955).
- clang-format aten/src/ATen/native/quantized (23898).
- save()/load() tests and fixes (23911).
- Enabling inline in quantized relu (23704).
- Fix qconv benchmark (24019).
- Adding dequantize_val and requantize_val (23909).
- Simplified nnq.Linear class (24046).
- State dict serialization of nnq.Linear (24047).
- is_quantized support in JIT (24099).
- Re-work Conv2d (24115).
- state_dict serialization for Conv2d + some bugfixes (24116).
- JIT serialization for Conv2d (24117).
- fix py2 imports in _intrinsic/modules (24206).
- Fix incorrect type annotation on Linear `__setstate__` (24209).
- Add out variant (23956).
- Removing the make_module script. (23635).
- Observer returns original tensor for post training quantization (24196).
- test_nn_quantized -> test_quantized_nn_mods (24201).
- Fix and test conv2d constructor and from_float (24277).
- Add out variant (23971).
- Add dynamic quantized Linear op in PyTorch (23464).
- Dynamic Quantized Linear Module (23128).
- Skip test_quantized_nn_mods tests if there's no FBGEMM (24302).
- no_deadline on ModuleAPITests and skip on dynamic quantization test (24307).
- Add the type matching rule for qconfig_dict (23212).
- equal() for QuantizedCPU (24211).
- Fix the dimension mismatch issues when running the BERT model (23330).
- Make the default qconfig_dict (24232).
- Remove the activation observer for default_qconfig (24299).
- fix lint (24375).
- test {init,from_float} on nnq{,d}.Linear (24364).
- Fix more warnings (24291).
- Run quantization tests first (24366).
- Temporarily disable warnings in dynamic quantization ops (24376).
- Fix Lint (24381).
- Add intrinsic module mappings (23753).
- Change return type of observer to two tensors (24339).
- Add _pair for quantized conv module (24409).
- Replacing axis with dim in quantized cat (24151).
- Remove redundant assignment (24408).
- Fix QConfig_dynamic typename (24431).
- Baseline observer module, ensuring that (min,max) range includes zero. (24297).
- Convert bias to float in quantized conv module (24424).
- Fixes the adding of the observer to the FloatFunctional (24418).
- Adds a placeholder for the 'mul' operator. (24421).
- Increasing precision for avg pool (23906).
- Enables `inplace` in the quantized relu (24374).
- extra_repr for quantized modules (24443).
- Change kernel_size to self.kernel_size to resolve error in quantized conv module (24499).
- Add resnext 32x4d shapes to benchmark (24503).
- Add the default_weight_observer for the dynamic quantization path (24231).
- Clang formatting the code [1/2] (24867).
- Support QScheme in script (24358).
- Use absolute import of the parent folder without alias. (24792).
- Added relu6 kernel (24799).
- PrepareQuant step (24425).
- reduce for QScheme (24969).
- Remove Symmetric Quantizer in backend (24964).
- gradient clipping by norm.
- Make observer scriptable (24996).
- Add qconv_test to benchmarking tests (24913).
- Adding quantized mul kernel (24444).
- Enable UBSAN test for FBGEMM in dynamic quant test (25099).
- Per Channel quantization APIs (24935).
- per channel quantization support (24936).
- Add missing functions and methods for channelwise quantization (24934).
- Support lowering of fp16 weights.
- use avx2 for Add without broadcast and when inputs are uint8_t (25098).
- per channel quantization support (25134).
- insert_quant_dequant jit pass (24426).
- quant_fusion jit pass (24427).
- Work around for bias quantization for conv and linear operators (24789).
- Handle empty qconfig for functional Modules (24803).
- Update mapping dictionary to support functionalmodules and pooling operations (24804).
- Support observer without any data calibration (24923).
- Serialization for nn.quantized.functional modules (24924).
- Move test QAT tests to double precision to ensure numerics match (25189).
- Adding return for the observer in the functional_modules.py (25168).
- Adding Scalar add/mul. (24447).
- Fix scriptability for Observer (25197).
- Integration tests for initial quantization graph mode (24428).
- skip tests if fbgemm is not supported for test_quantizer.py (25209).
- add import for test_quantizer.py (25222).
- Remove deprecated graph mode quantization tests (24998).
- Move test QAT tests to double precision to ensure numerics match (25211).
- Update mapping dictionary to support functionalmodules and pooling operations (25216).
- Fix scriptability for Observer (25219).
- Disable flaky test_adaptive_avg_pool2d test. (25249).
- Handle empty qconfig for functional Modules (25215).
- Reducing the test size for adaptive avg pool (25195).
- get rid of dynamic_cast in Quantizer (25001).
- disable deadline checking on test_adaptive_avg_pool2d (25255).
- Fixing the enforcement of the zero_point (25193).
- Add new qnnpack_add and qnnpack_maxpool op to C10 registry (24103).
- Serialization for nn.quantized.functional modules (25220).
- int8 static quantization in the numerical debugger.
- Work around for bias quantization for conv and linear operators (25212).
- Refactor MinMax observer (23902).
- Quantized comparators (24387).
- Ensure quantized::add stride matches inputs (25265).
- Make quantized relu ops inherit the memory format from input (25271).
- insert_quant_dequant work with qconfig_dict (25127).
- Integration tests for qconfig_dict (25217).
- Removing future imports from the test fixtures. (25296).
- Memory layout for pooling ops (25374).
- making quant utilities inplace (25054).
- Skip test_compare_tensor_scalar due to overflow error (25432).
- Per Channel Quantization Support for Quantized Linear Operator (25276).
- Skip inserting observers for Tensors inside fused op (25281).
- Remove unnecessary checks in InsertQuantDeQuantImpl (25370).
- Change exception to warning (25408).
- Quantized vec256 + vectorized quantized::add (25202).
- Minor fixes in per channel support for qconv kernel (25182).
- Vectorized quantized relu/relu6 (25496).
- Remove index calculation in quantized max_pool2d (25526).
- Add the dynamic quantized LSTM module (25157).
- Dynamic dispatch for optimized quantized op kernels (25545).
- Rename fbgemm quantized operators to generic `quantized` ops (25338).
- move no_deadline to hypothesis_utils.py (25598).
- Rename FBGEMM quantized operators to generic quantized ops (25678).
- Inserting observers for all methods called in forward (25503).
- Vectorized specialization of max_pool2d for channels-last layout (25676).
- Copy quantize routine to vec256 (25685).
- Store bias in PackedLinearWeight struct in fbgemm (25428).
- derandomize hypothesis tests (25513).
- Relax scale to prevent saturation in conv/linear; add test to verify precision of numerics of quantized model with updated observer (25667).
- Use more efficient specialized Quantize routine (25731).
- Factor unnecessary work out of add inner loop (25751).
- Fork QNNPACK into aten/src/ATen/native/quantized/cpu/qnnpack (25500).
- Test scripting and tracing for dynamic linear modules (25870).
- Store bias in PackedConvWeight in fbgemm (25626).
- Add Dropout to blacklist (25881).
- Add torch.nn.LSTM into the default dynamic quantize mappings (25954).
- Change order of activation and weight in QConfig (25950).
- indentation for hypothesis profile and proper inheritance for QuantizationTestCase (25934).
- Improve error message when input is not in the right format (25928).
- add the tensor_observer to record the runtime tensor for quantization … (25830).
- Add new API for Fully Connected and Convolution Operators in QNNPACK (25862).
- remove verbose in pytorch_ci hypothesis profile (26075).
- Upgrade the naming for fbgemm quantized op (26064).
- Use BytesIO instead of tempfile (25976).
- Add Runtime flag for quantized backend. (25680).
- TorchScript Serialization for dynamic LSTM (26084).
- Skip inserting duplicate observers (25504).
- Fix build warning in vec256_qint.h (26121).
- Support quantizing any methods called (25505).
- Add fusion for quantized linear (25624).
- Fold quantize op into module (25625).
- use whitelist for selecting observed values (25974).
- Add histogram observer (23959).
- Back out "[quant][observer] Add histogram observer" (26236).
- fix hypothesis timeout (26280).
- Whitelist and fusion support for quantized::linear - addmm (26208).
- Whitelist and fusion support for quantized::linear - matmul (without bias) (26209).
- Disable broken unit tests (26301).
- Whitelist and fusion support for quantized::linear - matmul (with bias) (26204).
- Dynamic quantization for bias. (26057).
- Add missing argument for failing function call (26311).
- Enable support for dilated convolutions (26205).
- Adding quantized::linear function for pytorch mobile in c10 (26135).
- Add l2 norm minimization (24022).
- Disable QNNPACK tests if pytorch is not built with it. (26427).
- Adding quantized::conv2d function for pytorch mobile in c10 (26152).
- Add extra filtering for scale/zero_point/dtype in FoldQuantizeCallIntoBuffer (26224).
- Remove quantizeBias (26388).
- Add NoQEngine to QEngine and refactor the name of set/get qengine (26330).
- Fix quantized::linear QuantFusion patterns (26414).
- Add per channel observer (25887).
- Add support to call unpack for pytorch mobile quantized FC and Conv (26211).
- Remove quantization for bias in pattern (26415).
- Implement more support for per-channel quantization (26240).
- Fold weight permutation inside quantized conv operator (26241).
- Fold activation permutation inside quantized conv operator (26242).
- Add NoQEngine to QEngine and refactor the name of set/get qengine (26471).
- Add the FP16 weight support for LSTM in dynamic_quantize (25975).
- Fix quantized::conv2d patterns in QuantFusion (26515).
- Changes to support int8 weight and fp32 bias in QNNPACK (26307).
- Add the quantized average_pool2d support and adaptive_avg_pool2d support (25899).
- Fix the API for record observer (26413).
- Unify Quantization APIs for add, pool and relu (26335).
- Compiler warnings cleanup for quantization.cpp. (26585).
- quantize_linear -> quantize_per_tensor (26574).
- Get scalar type from observer module (26425).
- Add inplace argument to InsertObservers and InsertQuantDeQuant (26389).
- Expose supportedQEngines to python (26474).
- quantize_linear_per_channel -> quantize_per_channel (26575).
- Skip some fragile tests (26599).
- quantized average_pool2d and adaptive_avg_pool2d implementation (Revert d17437015) (26580).
- _dequantize_linear -> _dequantize_per_tensor (26576).
- Unify Quantization APIs for add, pool and relu (26586).
- Simplify observers declaration with functools.partial (26492).
- Import torch.quantization when one imports torch (26649).
- NHWC specialization for quantized::cat (26524).
- Fix the flaky test_qlinear test caused by hypothesis deadline (26663).
- quantized torch.topk (26486).
- remove unneeded code (26640).
- Update qengine flag in python to string (26620).
- _per_tensor_affine_qtensor -> _make_per_tensor_quantized_tensor (26678).
- Skip observing bias across function call hierarchy (26642).
- _per_channel_affine_qtensor -> _make_per_channel_quantized_tensor (26679).
- Quantized Interpolate Kernel (upsample_nearest2d) (26617).
- Fix _empty_per_channel_affine_quantized to be less hacky (26243).
- Per-channel quantized tensor to have only a single axis (26675).
- Allow per-channel QTensor accept any floating type for scales (26676).
- Use noop observer to pass dtype for dynamic quantization (26709).
- Remove duplicate calculation of output shape (26684).
- Trivial quantized torch.mean implementation (26253).
- Remove _dequantize_per_channel in the pattern (26680).
- Un-hardcode epsilon constant in FoldConvBatchNorm2d. (26584).
- Remove _dequantize_per_tensor (26681).
- Add threadpool in qlinear and qconv for mobile (26728).
- move more functions to InsertObserversHelper (26696).
- quantized_tensor tests (25429).
- Handle DeQuantStub() for QAT (26518).
- Add include to resolve PRIu32 macro (26745).
- Fake quantization enhancements for QAT/PTQ support (26420).
- move more functions to InsertObserversHelper (26773).
- quantized_tensor tests (26784).
- Quantized Interpolate Kernel (upsample_bilinear2d) (26631).
- Throw if someone tries to torch.save() quantized modules (26828).
- Re-write of tensor-scalar quantized add (26766).
- Try to disable annoying hypothesis warnings again (26853).
- Remove unnecessary functions and cleanup code in quantization.cpp. (26852).
- Add more inplace arguments to quantization top level API (26782).
- batch size 0 support in ChannelShuffle DNNLOWP op (26858).
- batch size 0 support in Conv DNNLOWP ops (26871).
- batch size 0 tests for element-wise DNNLOWP ops (26870).
- batch size 0 support in FC DNNLOWP operators (26872).
- batch size 0 tests for Quantize/Dequantize DNNLOWP ops (26873).
- batch size 0 support in norm operators (26894).
- batch size 0 tests in BatchMatMul ops (26874).
- Set quantized engine backend for mobile in speed_benchmark_torch (26911).
- Support ceil_mode in quantized maxpool (26916).
- Make quantized max_pool2d error message more specific and less silly (26918).
- batch size 0 tests for etc DNNLOWP operators (26877).
- Fake quantization enhancements for QAT/PTQ support- fix tests (26876).
- Serialization and range reduction support for Fake Quant/Observer (26519).
- Fix the QuantizedAVX2 build issue (26854).
- Default histogram observer (26622).
- Fix all factory invocations in quantized to correctly propagate options. (26966).
- control of observer/fake-quant operations (26520).
- Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (26840).
- Fix misuses of TORCH_CHECK/TORCH_INTERNAL_ASSERT with string (26897).
- Better error message for calculate_qparams (26985).
- Add P99 method with configurable thresholds.
- Xray image inference on multi-cpu and dumping dnnlowp tensors (22537).
- Add int8 resize nearest 3d op in DNNLOWP (26063).
- Re-write of tensor-scalar mul (26937).
- Support qadd_relu on pytorch mobile (26982).
- Add optimized quantize function for ARM (26867).
- Add QuantFusion to graph_executor (26591).
- Move patterns in QuantFusion to a separate file (26848).
- PyTorch Graph Mode Quantization API (26390).
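Collectively, the items above land the eager-mode quantization stack: observers and QConfig, static post-training and QAT modules, dynamic quantization for Linear/LSTM, and the fbgemm/QNNPACK backends selectable through torch.backends.quantized. A minimal sketch of the dynamic-quantization entry point; the toy model is illustrative:

```python
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic quantization: Linear weights are converted to int8 ahead of time,
# activations are quantized on the fly at run time.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8)

out = quantized_model(torch.randn(1, 64))
print(torch.backends.quantized.engine)              # active backend, e.g. 'fbgemm'
print(torch.backends.quantized.supported_qengines)  # backends available in this build
```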
Visualization:
- Added mesh plugin (24039).
- Update tensorboard.rst (22026).
- Remove hard Caffe2 dependency for TensorBoard (24295).
- Added test_tensorboard.py to TARGETS (24040).
- Hyperparameter plugin (23134).
- Removed external tensorboardX dependency (25259).
- Fix empty graph problem (25599).
- Delay external imports until we're ready to test tensorboard (25993).
- Create TensorBoard test classes in all cases (26005).
- Fix flaky SummaryWriter test (26395).
- Fix the calling parameters of moviepy's write_gif function (21218).
- Add Virtual Memory and CPU percentage computation to AIBench (23590).
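Most of the visualization items above concern the built-in TensorBoard integration (no external tensorboardX dependency, plus the mesh and hyperparameter plugins). A minimal sketch of the SummaryWriter usage those changes support; the logged values are illustrative:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # built-in writer, no tensorboardX needed
for step in range(5):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)

# Hyperparameter plugin: log a set of hparams together with result metrics.
writer.add_hparams({"lr": 0.1, "batch_size": 32}, {"hparam/accuracy": 0.9})
writer.close()
```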