Commit a51c7b6

refactor: Upgrading to LibTorch 1.5.0 (CUDA 10.2, cuDNN 7.6.5, TensorRT 7.0.0)

- Closes #42
- Issue #1 is back, unknown root cause, will follow up with the PyTorch Team
- Closes #14: The default build now requires users to grab the tarballs from the NVIDIA website to support hermetic builds; may look at some methods to smooth this out later. The old method is still available
- New operators need to be implemented to support MobileNet in 1.5.0 (blocks merge into master)

Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>
1 parent 36d27da commit a51c7b6


48 files changed: +459 -233 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -24,3 +24,4 @@ cpp/ptq/datasets/data/
 tests/accuracy/datasets/data/*
 ._.DS_Store
 *.tar.gz
+*.tgz
```

README.md

Lines changed: 71 additions & 10 deletions
````diff
@@ -2,7 +2,7 @@
 
 > Ahead of Time (AOT) compiling for PyTorch JIT
 
-TRTorch is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, TRTorch is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. TRTorch operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation, using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/F16) and other settings for your module.
+TRTorch is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, TRTorch is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. TRTorch operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation, using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/F16/INT8) and other settings for your module.
 
 More Information / System Architecture:
 
@@ -35,28 +35,89 @@ auto results = trt_mod.forward({in_tensor});
 | Platform | Support |
 | -------- | ------- |
 | Linux AMD64 / GPU | **Supported** |
-| Linux aarch64 / GPU | **Planned/Possible with Native Compilation and small modifications to the build system** |
+| Linux aarch64 / GPU | **Planned/Possible with Native Compilation but untested** |
 | Linux aarch64 / DLA | **Planned/Possible with Native Compilation but untested** |
 | Windows / GPU | - |
 | Linux ppc64le / GPU | - |
 
 ### Dependencies
 
-- Libtorch 1.4.0
-- CUDA 10.1
-- cuDNN 7.6
-- TensorRT 6.0.1
+- Libtorch 1.5.0
+- CUDA 10.2
+- cuDNN 7.6.5
+- TensorRT 7.0.0
 
 ## Prebuilt Binaries
 
 Releases: https://github.com/NVIDIA/TRTorch/releases
 
 ## Compiling TRTorch
 
-Install TensorRT, CUDA and cuDNN on the system before starting to compile.
+### Installing Dependencies
 
+You need to start by having CUDA installed on the system; Libtorch will automatically be pulled for you by bazel.
+Then you have two options.
+
+#### 1. Building using cuDNN & TensorRT tarball distributions
+
+> This is recommended so as to build TRTorch hermetically and ensures any bugs are not caused by version issues
+
+> Make sure when running TRTorch that these versions of the libraries are prioritized in your `$LD_LIBRARY_PATH`
+
+1. You need to download the tarball distributions of TensorRT and cuDNN from the NVIDIA website.
+    - https://developer.nvidia.com/cudnn
+    - https://developer.nvidia.com/tensorrt
+2. Place these files in a directory (the directories `third_party/distdir/[x86_64-linux-gnu | aarch64-linux-gnu]` exist for this purpose)
+3. Compile using:
+``` shell
+bazel build //:libtrtorch --compilation_mode opt --distdir third_party/distdir/[x86_64-linux-gnu | aarch64-linux-gnu]
+```
+
+#### 2. Building using locally installed cuDNN & TensorRT
+
+> If you find bugs and you compiled using this method please disclose it in the issue
+> (an `ldd` dump would be nice too)
+
+1. Install TensorRT, CUDA and cuDNN on the system before starting to compile.
+2. In `WORKSPACE` comment out
+```py
+# Downloaded distributions to use with --distdir
+http_archive(
+    name = "cudnn",
+    urls = ["<URL>",],
+
+    build_file = "@//third_party/cudnn/archive:BUILD",
+    sha256 = "<TAR SHA256>",
+    strip_prefix = "cuda"
+)
+
+http_archive(
+    name = "tensorrt",
+    urls = ["<URL>",],
+
+    build_file = "@//third_party/tensorrt/archive:BUILD",
+    sha256 = "<TAR SHA256>",
+    strip_prefix = "TensorRT-<VERSION>"
+)
+```
+and uncomment
+```py
+# Locally installed dependencies
+new_local_repository(
+    name = "cudnn",
+    path = "/usr/",
+    build_file = "@//third_party/cudnn/local:BUILD"
+)
+
+new_local_repository(
+    name = "tensorrt",
+    path = "/usr/",
+    build_file = "@//third_party/tensorrt/local:BUILD"
+)
+```
+3. Compile using:
 ``` shell
-bazel build //:libtrtorch --compilation_mode=opt
+bazel build //:libtrtorch --compilation_mode opt
 ```
 
 ### Debug build
````
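The `$LD_LIBRARY_PATH` note in the tarball instructions above can be made concrete. The extraction directories below are assumptions for illustration; point them at wherever you unpacked the cuDNN and TensorRT tarballs:

```shell
# Hypothetical extraction locations for the NVIDIA tarballs.
export LD_LIBRARY_PATH="/opt/TensorRT-7.0.0.11/lib:/opt/cudnn/cuda/lib64:${LD_LIBRARY_PATH}"

# The tarball directories should now come first in the loader's search order.
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | head -n 2
```

Because the dynamic loader searches `LD_LIBRARY_PATH` left to right, prepending (rather than appending) is what guarantees these copies win over any system-installed cuDNN/TensorRT.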
```diff
@@ -84,9 +145,9 @@ Thanks for wanting to contribute! There are two main ways to handle supporting a
 
 ### In my application?
 
-> The Node Converter Registry is not exposed in the top level API but you can try using the internal headers shipped with the tarball.
+> The Node Converter Registry is not exposed in the top level API but in the internal headers shipped with the tarball.
 
-You can register a converter for your op using the NodeConverterRegistry inside your application.
+You can register a converter for your op using the `NodeConverterRegistry` inside your application.
 
 ## Structure of the repo
 
```
WORKSPACE

Lines changed: 37 additions & 17 deletions
```diff
@@ -16,15 +16,6 @@ py_repositories()
 load("@rules_python//python:pip.bzl", "pip_repositories", "pip_import")
 pip_repositories()
 
-http_archive(
-    name = "libtorch",
-    build_file = "@//third_party/libtorch:BUILD",
-    strip_prefix = "libtorch",
-    urls = ["https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.4.0.zip"],
-    sha256 = "f214bfde532877aa5d4e0803e51a28fa8edd97b6a44b6615f75a70352b6b542e"
-)
-
-load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
 http_archive(
     name = "rules_pkg",
     url = "https://github.com/bazelbuild/rules_pkg/releases/download/0.2.4/rules_pkg-0.2.4.tar.gz",
@@ -34,24 +25,53 @@ http_archive(
 load("@rules_pkg//:deps.bzl", "rules_pkg_dependencies")
 rules_pkg_dependencies()
 
+# CUDA should be installed on the system locally
 new_local_repository(
     name = "cuda",
-    path = "/usr/local/cuda-10.1/targets/x86_64-linux/",
+    path = "/usr/local/cuda-10.2/targets/x86_64-linux/",
     build_file = "@//third_party/cuda:BUILD",
 )
 
-new_local_repository(
+http_archive(
+    name = "libtorch",
+    build_file = "@//third_party/libtorch:BUILD",
+    strip_prefix = "libtorch",
+    urls = ["https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.5.0.zip"],
+    sha256 = "0efdd4e709ab11088fa75f0501c19b0e294404231442bab1d1fb953924feb6b5"
+)
+
+# Downloaded distributions to use with --distdir
+http_archive(
     name = "cudnn",
-    path = "/usr/",
-    build_file = "@//third_party/cudnn:BUILD"
+    urls = ["https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.2_20191118/cudnn-10.2-linux-x64-v7.6.5.32.tgz",],
+
+    build_file = "@//third_party/cudnn/archive:BUILD",
+    sha256 = "600267f2caaed2fd58eb214ba669d8ea35f396a7d19b94822e6b36f9f7088c20",
+    strip_prefix = "cuda"
 )
 
-new_local_repository(
-    name = "tensorrt",
-    path = "/usr/",
-    build_file = "@//third_party/tensorrt:BUILD"
+http_archive(
+    name = "tensorrt",
+    urls = ["https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/7.0/7.0.0.11/tars/TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz",],
+
+    build_file = "@//third_party/tensorrt/archive:BUILD",
+    sha256 = "c7d73b2585b18aae68b740249efa8c8ba5ae852abe9a023720595432a8eb4efd",
+    strip_prefix = "TensorRT-7.0.0.11"
 )
 
+## Locally installed dependencies
+# new_local_repository(
+#    name = "cudnn",
+#    path = "/usr/",
+#    build_file = "@//third_party/cudnn/local:BUILD"
+#)
+
+# new_local_repository(
+#    name = "tensorrt",
+#    path = "/usr/",
+#    build_file = "@//third_party/tensorrt/local:BUILD"
+#)
+
 git_repository(
     name = "googletest",
     remote = "https://github.com/google/googletest",
```
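If you mirror these tarballs yourself or bump their versions, the `sha256` fields in the `http_archive` rules above must match the archive bytes exactly, or bazel will reject the download. The checksum can be regenerated with `sha256sum`; the file below is a stand-in, and in practice you would point it at the downloaded `cudnn-*.tgz` or `TensorRT-*.tar.gz`:

```shell
# Stand-in archive; substitute the real cuDNN/TensorRT tarball path.
printf 'stand-in tarball bytes' > /tmp/example.tgz

# The first field is the value to paste into the http_archive sha256 attribute.
sha256sum /tmp/example.tgz | cut -d' ' -f1
```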

core/compiler.cpp

Lines changed: 13 additions & 8 deletions
```diff
@@ -7,12 +7,11 @@
 
 #include "ATen/core/function_schema.h"
 
-#include "torch/csrc/jit/ir.h"
-#include "torch/csrc/jit/pass_manager.h"
+#include "torch/csrc/jit/frontend/function_schema_parser.h"
+#include "torch/csrc/jit/ir/ir.h"
+#include "torch/csrc/jit/passes/pass_manager.h"
 #include "torch/csrc/jit/passes/lower_graph.h"
 #include "torch/csrc/jit/passes/graph_fuser.h"
-#include "torch/csrc/jit/script/module.h"
-#include "torch/csrc/jit/script/function_schema_parser.h"
 
 #include "core/util/prelude.h"
 #include "core/compiler.h"
@@ -42,25 +41,31 @@ c10::FunctionSchema GenerateGraphSchema(torch::jit::script::Module mod, std::str
 
 void AddEngineToGraph(torch::jit::script::Module mod, std::shared_ptr<torch::jit::Graph>& g, std::string& serialized_engine) {
   execution::EngineID uid = execution::RegisterEngineFromSerializedEngine(serialized_engine);
-  auto schema = execution::GetEngineFunctionSchema(uid);
   auto num_io = execution::GetEngineIO(uid);
 
   auto self = g->addInput("self.1");
   self->setType(mod.type());
-  std::vector<torch::jit::Value*> graph_inputs;
+
+  auto id_val = g->insertConstant(uid);
+
+  std::vector<torch::jit::Value*> engine_inputs;
+  engine_inputs.push_back(id_val);
+
   for (uint64_t i = 0; i < num_io.first; i++) {
     auto in_val = g->addInput("");
     in_val->setType(c10::TensorType::get());
-    graph_inputs.push_back(in_val);
+    engine_inputs.push_back(in_val);
   }
 
-  auto engine_node = g->create(c10::Symbol::fromQualString(schema.name()), torch::jit::ArrayRef<torch::jit::Value*>(graph_inputs), num_io.second);
+  auto engine_node = g->create(c10::Symbol::fromQualString("trt::execute_engine"), torch::jit::ArrayRef<torch::jit::Value*>(engine_inputs), num_io.second);
   g->block()->appendNode(engine_node);
 
   for (auto o : engine_node->outputs()) {
     g->registerOutput(o);
   }
 
+  LOG_DEBUG(*g << "(AddEngineToGraph)\n");
+
   return;
 }
 
```
core/compiler.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 #pragma once
 
 #include <vector>
-#include "torch/csrc/jit/script/module.h"
+#include "torch/csrc/jit/api/module.h"
 #include "core/conversion/conversion.h"
 
 namespace trtorch {
```

core/conversion/conversion.cpp

Lines changed: 1 addition & 3 deletions
```diff
@@ -14,9 +14,7 @@ namespace conversion {
 bool isNodeConversionBlacklisted(const torch::jit::Node* n);
 
 bool OpSupported(const torch::jit::Node* n) {
-  bool evalable = evaluators::shouldEvalAtConversionTime(n);
-  bool convertable = converters::node_is_convertable(n);
-  return evalable || convertable;
+  return evaluators::shouldEvalAtConversionTime(n) || converters::node_is_convertable(n);
 }
 
 c10::optional<torch::jit::IValue> EvaluateNode(ConversionCtx* ctx, const torch::jit::Node* n, int level=0, int limit=10) {
```

core/conversion/conversion.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 #include <map>
 
 #include "NvInfer.h"
-#include "torch/csrc/jit/ir.h"
+#include "torch/csrc/jit/ir/ir.h"
 #include "core/conversion/conversionctx/ConversionCtx.h"
 
 namespace torch {
```

core/conversion/conversion_blacklist.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,12 +1,12 @@
 #include <string>
 #include <unordered_set>
 
-#include "torch/csrc/jit/ir.h"
+#include "torch/csrc/jit/ir/ir.h"
 
 namespace trtorch {
 namespace core {
 namespace conversion {
-
+
 const std::unordered_set<std::string>& get_non_convertable_nodes() {
   // Set of nodes that should not invoke a converter or evaluator
   static std::unordered_set<std::string> nonconvertable_nodes = {
```

core/conversion/conversionctx/ConversionCtx.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@
 #include <memory>
 
 //#include "ATen/ATen.h"
-#include "torch/csrc/jit/ir.h"
+#include "torch/csrc/jit/ir/ir.h"
 #include "NvInfer.h"
 
 #include "core/util/prelude.h"
```

core/conversion/converters/NodeConverterRegistry.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 #include "core/util/prelude.h"
 #include "core/conversion/converters/converters.h"
-#include "torch/csrc/jit/script/function_schema_parser.h"
+#include "torch/csrc/jit/frontend/function_schema_parser.h"
 
 namespace trtorch {
 namespace core {
```

core/conversion/converters/converters.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 #include <string>
 #include <map>
 
-#include "torch/csrc/jit/custom_operator.h"
+#include "torch/csrc/jit/runtime/custom_operator.h"
 #include "ATen/core/function_schema.h"
 
 #include "core/conversion/conversionctx/ConversionCtx.h"
```

core/conversion/evaluators/NodeEvaluatorRegistry.cpp

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,7 +1,7 @@
 #include <unordered_map>
 
-#include "torch/csrc/jit/ir.h"
-#include "torch/csrc/jit/constants.h"
+#include "torch/csrc/jit/ir/ir.h"
+#include "torch/csrc/jit/ir/constants.h"
 #include "ATen/core/functional.h"
 #include "ATen/core/ivalue.h"
 #include "ATen/core/List.h"
@@ -41,7 +41,7 @@ class NodeEvaluatorRegistry {
       return true;
     }
   }
-
+
 private:
   EvaluatorLUT evaluator_lut_;
 };
```

core/conversion/evaluators/evaluators.h

Lines changed: 2 additions & 2 deletions
```diff
@@ -3,7 +3,7 @@
 #include <string>
 #include <map>
 
-#include "torch/csrc/jit/ir.h"
+#include "torch/csrc/jit/ir/ir.h"
 
 namespace trtorch {
 namespace core {
@@ -19,7 +19,7 @@ typedef std::map<const torch::jit::Value*, const torch::jit::IValue*> kwargs;
 // when writing evaluators
 typedef std::function<c10::optional<torch::jit::IValue>(const torch::jit::Node*, const kwargs&)> NodeEvaluator;
 
-struct EvalRegistration { 
+struct EvalRegistration {
   torch::jit::NodeKind kind;
   NodeEvaluator evaluator;
 };
```

core/conversion/evaluators/prim.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,5 +1,5 @@
-#include "torch/csrc/jit/ir.h"
-#include "torch/csrc/jit/constants.h"
+#include "torch/csrc/jit/ir/ir.h"
+#include "torch/csrc/jit/ir/constants.h"
 #include "ATen/core/functional.h"
 #include "ATen/core/ivalue.h"
 #include "ATen/core/List.h"
```

core/execution/BUILD

Lines changed: 2 additions & 1 deletion
```diff
@@ -14,7 +14,8 @@ cc_library(
         "@tensorrt//:nvinfer",
         "@libtorch//:libtorch",
         "//core/util:prelude"
-    ]
+    ],
+    alwayslink = True,
 )
 
 load("@rules_pkg//:pkg.bzl", "pkg_tar")
```
