diff --git a/docsrc/tutorials/notebooks.rst b/docsrc/tutorials/notebooks.rst
index df903fc353..46a006c40e 100644
--- a/docsrc/tutorials/notebooks.rst
+++ b/docsrc/tutorials/notebooks.rst
@@ -23,13 +23,13 @@ and running it to test the speedup obtained.
 
 * `Torch-TensorRT Getting Started - CitriNet `_
 
-Compiling EfficentNet with Torch-TensorRT
+Compiling EfficientNet with Torch-TensorRT
 ********************************************
 
-EfficentNet is a feedforward CNN designed to achieve better performance and accuracy than alternative architectures
+EfficientNet is a feedforward CNN designed to achieve better performance and accuracy than alternative architectures
 by using a "scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient".
 
-This notebook demonstrates the steps for optimizing a pretrained EfficentNet model with Torch-TensorRT,
+This notebook demonstrates the steps for optimizing a pretrained EfficientNet model with Torch-TensorRT,
 and running it to test the speedup obtained.
 
 * `Torch-TensorRT Getting Started - EfficientNet-B0 `_
@@ -43,7 +43,7 @@ This way, the model learns an inner representation of the English language that
 features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
 classifier using the features produced by the BERT model as inputs." (https://huggingface.co/bert-base-uncased)
 
-This notebook demonstrates the steps for optimizing a pretrained EfficentNet model with Torch-TensorRT,
+This notebook demonstrates the steps for optimizing a pretrained BERT model with Torch-TensorRT,
 and running it to test the speedup obtained.
 
 * `Masked Language Modeling (MLM) with Hugging Face BERT Transformer `_
@@ -73,7 +73,7 @@ Using Dynamic Shapes with Torch-TensorRT
 Making use of Dynamic Shaped Tensors in Torch TensorRT is quite simple. Let's say you are using the
 ``torch_tensorrt.compile(...)`` function to compile a torchscript module. One
-of the args in this function in this function is ``input``: which defines an input to a
+of the args in this function is ``input``: which defines an input to a
 module in terms of expected shape, data type and tensor format: ``torch_tensorrt.Input.``
 
 For the purposes of this walkthrough we just need three kwargs: `min_shape`, `opt_shape`` and `max_shape`.
 
@@ -96,8 +96,8 @@ In this example, we are going to use a simple ResNet model to demonstrate the us
 
 Using the FX Frontend with Torch-TensorRT
 ********************************************
 
-The purpose of this example is to demostrate the overall flow of lowering a PyTorch model to TensorRT
-conveniently with using FX.
+The purpose of this example is to demonstrate the overall flow of lowering a PyTorch model to TensorRT
+conveniently using FX.
 
 * `Using the FX Frontend with Torch-TensorRT `_
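For reference, the notebooks touched above all follow the same basic compile-and-benchmark flow. The sketch below is illustrative only and is not part of the patch; it assumes ``torchvision`` is available and uses ResNet-50 as a stand-in for the models the notebooks actually cover (CitriNet, EfficientNet-B0, BERT).

.. code-block:: python

    # Rough sketch of the compile-and-benchmark flow the notebooks walk through.
    # ResNet-50 is a stand-in; the notebooks use CitriNet, EfficientNet-B0 and BERT.
    import time

    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet50(pretrained=True).eval().cuda()
    inputs = [torch.randn((1, 3, 224, 224), device="cuda")]

    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.half},  # allow FP16 kernels
    )

    def benchmark(mod, n=100):
        # Simple wall-clock timing loop, similar in spirit to the notebooks.
        torch.cuda.synchronize()
        start = time.time()
        with torch.no_grad():
            for _ in range(n):
                mod(*inputs)
        torch.cuda.synchronize()
        return (time.time() - start) / n * 1000.0

    print("PyTorch        : %.2f ms" % benchmark(model))
    print("Torch-TensorRT : %.2f ms" % benchmark(trt_model))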
diff --git a/docsrc/tutorials/serving_torch_tensorrt_with_triton.rst b/docsrc/tutorials/serving_torch_tensorrt_with_triton.rst
index a713d563c2..31cb6733cf 100644
--- a/docsrc/tutorials/serving_torch_tensorrt_with_triton.rst
+++ b/docsrc/tutorials/serving_torch_tensorrt_with_triton.rst
@@ -4,11 +4,11 @@ Serving a Torch-TensorRT model with Triton
 ==========================================
 
 Optimization and deployment go hand in hand in a discussion about Machine
-Learning infrastructure. Once network level optimzation are done
+Learning infrastructure. Once network level optimizations are done
 to get the maximum performance, the next step would be to deploy it.
 
-However, serving this optimized model comes with it's own set of considerations
-and challenges like: building an infrastructure to support concorrent model
+However, serving this optimized model comes with its own set of considerations
+and challenges like: building an infrastructure to support concurrent model
 executions, supporting clients over HTTP or gRPC and more.
 
 The `Triton Inference Server `__
@@ -67,7 +67,7 @@ highly recommend to checking our `Github Repository `__.
 
 To use Triton, we need to make a model repository. A model repository, as the
-name suggested, is a repository of the models the Inference server hosts. While
+name suggests, is a repository of the models the Inference server hosts. While
 Triton can serve models from multiple repositories, in this example, we will
 discuss the simplest possible form of the model repository.
 
@@ -204,7 +204,7 @@ Lastly, we send an inference request to the Triton Inference Server.
 
     inference_output = results.as_numpy('output__0')
    print(inference_output[:5])
 
-The output of the same should look like below:
+The output should look like the following:
 
 ::
diff --git a/docsrc/user_guide/dynamic_shapes.rst b/docsrc/user_guide/dynamic_shapes.rst
index 73ed9b594a..9d339e80a4 100644
--- a/docsrc/user_guide/dynamic_shapes.rst
+++ b/docsrc/user_guide/dynamic_shapes.rst
@@ -6,9 +6,9 @@ Dynamic shapes with Torch-TensorRT
 
 By default, you can run a pytorch model with varied input shapes and the output shapes are determined eagerly.
 However, Torch-TensorRT is an AOT compiler which requires some prior information about the input shapes to compile and optimize the model.
 In the case of dynamic input shapes, we must provide the (min_shape, opt_shape, max_shape) arguments so that the model can be optimized for
-these range of input shapes. An example usage of static and dynamic shapes is as follows.
+this range of input shapes. An example usage of static and dynamic shapes is as follows.
 
-NOTE: The following code uses Dynamo Frontend. Incase of Torchscript Frontend, please swap out ``ir=dynamo`` with ``ir=ts`` and the behavior is exactly the same.
+NOTE: The following code uses the Dynamo Frontend. In case of the Torchscript Frontend, please swap out ``ir=dynamo`` with ``ir=ts``; the behavior is exactly the same.
 
 .. code-block:: python
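The dynamic-shape arguments patched above are used roughly as follows. This is a minimal sketch (not part of the patch) assuming a torchvision ResNet and the Dynamo frontend mentioned in the note, with only the batch dimension varying:

.. code-block:: python

    # Minimal dynamic-shape example: give Torch-TensorRT a (min, opt, max)
    # range for the input so the compiled engine accepts a varying batch size.
    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval().cuda()

    dyn_input = torch_tensorrt.Input(
        min_shape=(1, 3, 224, 224),   # smallest shape expected at runtime
        opt_shape=(8, 3, 224, 224),   # shape TensorRT optimizes for
        max_shape=(16, 3, 224, 224),  # largest shape expected at runtime
        dtype=torch.float32,
    )

    # Swap ir="dynamo" for ir="ts" to use the TorchScript frontend instead.
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[dyn_input])

    # Any batch size in [1, 16] is now valid.
    trt_model(torch.randn(1, 3, 224, 224, device="cuda"))
    trt_model(torch.randn(16, 3, 224, 224, device="cuda"))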
diff --git a/docsrc/user_guide/ptq.rst b/docsrc/user_guide/ptq.rst
index b62457109f..f855c6679c 100644
--- a/docsrc/user_guide/ptq.rst
+++ b/docsrc/user_guide/ptq.rst
@@ -13,14 +13,14 @@ Users writing TensorRT applications are required to setup a calibrator class whi
 the TensorRT calibrator. With Torch-TensorRT we look to leverage existing infrastructure in PyTorch to make implementing
 calibrators easier.
 
-LibTorch provides a ``DataLoader`` and ``Dataset`` API which steamlines preprocessing and batching input data.
+LibTorch provides a ``DataLoader`` and ``Dataset`` API which streamlines preprocessing and batching input data.
 These APIs are exposed via both C++ and Python interface which makes it easier for the end user.
 For C++ interface, we use ``torch::Dataset`` and ``torch::data::make_data_loader`` objects to construct and perform
 pre-processing on datasets. The equivalent functionality in python interface uses ``torch.utils.data.Dataset`` and
 ``torch.utils.data.DataLoader``. This section of the PyTorch documentation has more information
 https://pytorch.org/tutorials/advanced/cpp_frontend.html#loading-data and https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html.
 
 Torch-TensorRT uses Dataloaders as the base of a generic calibrator implementation. So you will be able to reuse or quickly
-implement a ``torch::Dataset`` for your target domain, place it in a DataLoader and create a INT8 Calibrator
-which you can provide to Torch-TensorRT to run INT8 Calibration during compliation of your module.
+implement a ``torch::Dataset`` for your target domain, place it in a DataLoader and create an INT8 Calibrator
+which you can provide to Torch-TensorRT to run INT8 Calibration during compilation of your module.
 
 .. _writing_ptq_cpp:
@@ -108,7 +108,7 @@ Next we create a calibrator from the ``calibration_dataloader`` using the calibr
 
 Here we also define a location to write a calibration cache file to which we can use to reuse the calibration data without needing the dataset and whether or not
 we should use the cache file if it exists. There also exists a ``torch_tensorrt::ptq::make_int8_cache_calibrator`` factory which creates a calibrator that uses the cache
-only for cases where you may do engine building on a machine that has limited storage (i.e. no space for a full dataset) or to have a simpiler deployment application.
+only for cases where you may do engine building on a machine that has limited storage (i.e. no space for a full dataset) or to have a simpler deployment application.
 
 The calibrator factories create a calibrator that inherits from a ``nvinfer1::IInt8Calibrator`` virtual class (``nvinfer1::IInt8EntropyCalibrator2`` by default) which
 defines the calibration algorithm used when calibrating. You can explicitly make the selection of calibration algorithm like this:
@@ -118,7 +118,7 @@ defines the calibration algorithm used when calibrating. You can explicitly make
 
     // MinMax Calibrator is geared more towards NLP tasks
    auto calibrator = torch_tensorrt::ptq::make_int8_calibrator(std::move(calibration_dataloader), calibration_cache_file, true);
 
-Then all thats required to setup the module for INT8 calibration is to set the following compile settings in the `torch_tensorrt::CompileSpec` struct and compiling the module:
+Then all that's required to set up the module for INT8 calibration is to set the following compile settings in the `torch_tensorrt::CompileSpec` struct and compile the module:
 
 .. code-block:: c++
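The C++ calibrator workflow above has a Python counterpart. The sketch below is illustrative only and not part of the patch; it assumes the ``torch_tensorrt.ptq.DataLoaderCalibrator`` helper and the TorchScript frontend's ``calibrator`` argument described elsewhere in the PTQ documentation, and the dataset and cache paths are placeholders.

.. code-block:: python

    # Illustrative Python PTQ sketch (dataset choice and paths are placeholders).
    import torch
    import torch_tensorrt
    import torchvision

    # Any representative calibration set works; CIFAR-10 is just convenient here.
    calib_set = torchvision.datasets.CIFAR10(
        root="./data", train=False, download=True,
        transform=torchvision.transforms.ToTensor(),
    )
    calib_loader = torch.utils.data.DataLoader(calib_set, batch_size=32)

    calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
        calib_loader,
        cache_file="./calibration.cache",  # lets later builds reuse calibration data
        use_cache=False,
        algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
        device=torch.device("cuda:0"),
    )

    model = torchvision.models.resnet18(pretrained=True).eval().cuda()
    scripted = torch.jit.script(model)

    trt_mod = torch_tensorrt.compile(
        scripted,
        ir="ts",
        inputs=[torch_tensorrt.Input((32, 3, 32, 32))],
        enabled_precisions={torch.int8},   # enable INT8 kernels
        calibrator=calibrator,
    )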
diff --git a/docsrc/user_guide/runtime.rst b/docsrc/user_guide/runtime.rst
index 8264abdd32..4b9f3688a3 100644
--- a/docsrc/user_guide/runtime.rst
+++ b/docsrc/user_guide/runtime.rst
@@ -4,7 +4,7 @@ Deploying Torch-TensorRT Programs
 ====================================
 
 After compiling and saving Torch-TensorRT programs there is no longer a strict dependency on the full
-Torch-TensorRT library. All that is required to run a compiled program is the runtime. There are therfore a couple
+Torch-TensorRT library. All that is required to run a compiled program is the runtime. There are therefore a couple
 options to deploy your programs other than shipping the full Torch-TensorRT compiler with your applications.
 
 Torch-TensorRT package / libtorchtrt.so
@@ -24,7 +24,7 @@ programs just as you would otherwise via PyTorch API.
 
 .. note:: If you are using the standard distribution of PyTorch in Python on x86, likely you will need the pre-cxx11-abi variant of ``libtorchtrt_runtime.so``, check :ref:`Installation` documentation for more details.
 
-.. note:: If you are linking ``libtorchtrt_runtime.so``, likely using the following flags will help ``-Wl,--no-as-needed -ltorchtrt -Wl,--as-needed`` as theres no direct symbol dependency to anything in the Torch-TensorRT runtime for most Torch-TensorRT runtime applications
+.. note:: If you are linking ``libtorchtrt_runtime.so``, the flags ``-Wl,--no-as-needed -ltorchtrt -Wl,--as-needed`` will likely help, as there is no direct symbol dependency on anything in the Torch-TensorRT runtime for most Torch-TensorRT runtime applications.
 
 An example of how to use ``libtorchtrt_runtime.so`` can be found here: https://github.com/pytorch/TensorRT/tree/master/examples/torchtrt_runtime_example
 
@@ -33,7 +33,7 @@ Plugin Library
 
 In the case you use Torch-TensorRT as a converter to a TensorRT engine and your engine uses plugins provided by Torch-TensorRT,
 Torch-TensorRT ships the library ``libtorchtrt_plugins.so`` which contains the implementation of the TensorRT plugins used by Torch-TensorRT during
-compilation. This library can be ``DL_OPEN`` or ``LD_PRELOAD`` similar to other TensorRT plugin libraries.
+compilation. This library can be loaded via ``DL_OPEN`` or ``LD_PRELOAD`` similarly to other TensorRT plugin libraries.
 
 Multi Device Safe Mode
 ---------------
@@ -60,7 +60,7 @@ doubles as a context manager.
 TensorRT requires that each engine be associated with the CUDA context in the active thread from which it is invoked.
 Therefore, if the device were to change in the active thread, which may be the case when invoking engines on multiple GPUs
 from the same Python process, safe mode will cause Torch-TensorRT to display
-an alert and switch GPUs accordingly. If safe mode were not enabled, there could be a mismatch in the engine
+an alert and switch GPUs accordingly. If safe mode is not enabled, there could be a mismatch in the engine
 device and CUDA context device, which could lead the program to crash.
 
 One technique for managing multiple TRT engines on different GPUs while not sacrificing performance for
diff --git a/docsrc/user_guide/using_dla.rst b/docsrc/user_guide/using_dla.rst
index 9b062f7053..d3f5f7d735 100644
--- a/docsrc/user_guide/using_dla.rst
+++ b/docsrc/user_guide/using_dla.rst
@@ -7,7 +7,7 @@ DLA
 
 NOTE: DLA supports fp16 and int8 precision only.
 
-Using DLA with torchtrtc
+Using DLA with ``torchtrtc``
 
 .. code-block:: shell
@@ -41,7 +41,7 @@ Using DLA in a python application
 
     compile_spec = {
         "inputs": [torch_tensorrt.Input(self.input.shape)],
         "device": torch_tensorrt.Device("dla:0", allow_gpu_fallback=True),
-        "enalbed_precisions": {torch.half},
+        "enabled_precisions": {torch.half},
     }
 
     trt_mod = torch_tensorrt.compile(self.scripted_model, compile_spec)
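To make the runtime-only deployment story above concrete, here is a hypothetical deployment-side sketch, not part of the patch. The library and module paths are placeholders, and loading the runtime via ``torch.ops.load_library`` is an assumption; the ``torchtrt_runtime_example`` linked above is the authoritative (C++) reference.

.. code-block:: python

    # Hypothetical deployment script: run a previously compiled-and-saved
    # Torch-TensorRT program with only the slim runtime, no compiler installed.
    import torch

    # Register the TensorRT engine ops with PyTorch (library path is a placeholder).
    torch.ops.load_library("libtorchtrt_runtime.so")

    # Load and run the saved TorchScript program containing embedded TRT engines.
    trt_mod = torch.jit.load("trt_model.ts")
    out = trt_mod(torch.randn(1, 3, 224, 224, device="cuda"))
    print(out.shape)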