
Commit 355b7dc: updates for ipex example page (#3486)

1 parent: 05c058d

1 file changed: recipes_source/intel_extension_for_pytorch.rst (18 additions, 216 deletions)
@@ -41,11 +41,11 @@ Intel® Extension for PyTorch* shares most of features for CPU and GPU.
   these optimizations will be landed in PyTorch master through PRs that are
   being submitted and reviewed. Auto Mixed Precision (AMP) with both BFloat16
   and Float16 have been enabled for Intel discrete GPUs.
-- **Graph Optimization:** To optimize performance further with torchscript,
-  Intel® Extension for PyTorch* supports fusion of frequently used operator
-  patterns, like Conv2D+ReLU, Linear+ReLU, etc. The benefit of the fusions are
-  delivered to users in a transparent fashion. Detailed fusion patterns
-  supported can be found `here <https://github.com/intel/intel-extension-for-pytorch>`_.
+- **Graph Optimization:** To optimize performance further, Intel® Extension for
+  PyTorch* supports fusion of frequently used operator patterns, like Conv2D+ReLU,
+  Linear+ReLU, etc. The benefit of the fusions are delivered to users in a transparent
+  fashion. Detailed fusion patterns supported can be found
+  `here <https://github.com/intel/intel-extension-for-pytorch>`_.
   The graph optimization will be up-streamed to PyTorch with the introduction
   of oneDNN Graph API.
 - **Operator Optimization:** Intel® Extension for PyTorch* also optimizes
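For reference, a minimal sketch (not part of this commit) of how the fusion described in the hunk above reaches a user on CPU: ``ipex.optimize`` is applied to an eval-mode model, and the fused operator patterns take effect once the model is traced and frozen with TorchScript. The model choice and input shape below are illustrative.

.. code:: python3

   import torch
   import torchvision.models as models
   import intel_extension_for_pytorch as ipex

   # illustrative model and input; any eval-mode CPU model follows the same pattern
   model = models.resnet50(pretrained=True)
   model.eval()
   data = torch.rand(1, 3, 224, 224)

   # single code change: let IPEX optimize the model
   model = ipex.optimize(model)

   with torch.no_grad():
       # tracing + freezing is where fused patterns such as Conv2D+ReLU are applied
       model = torch.jit.trace(model, data)
       model = torch.jit.freeze(model)
       model(data)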
@@ -186,8 +186,8 @@ BFloat16
         'optimizer_state_dict': optimizer.state_dict(),
         }, 'checkpoint.pth')
 
-Inference - Imperative Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Inference
+~~~~~~~~~
 
 Float32
 ^^^^^^^
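For orientation, the section renamed above keeps the imperative examples only; a minimal sketch, assuming the same ResNet-50 setup used elsewhere in the tutorial, of what the Float32 CPU inference example boils down to:

.. code:: python3

   import torch
   import torchvision.models as models
   import intel_extension_for_pytorch as ipex

   model = models.resnet50(pretrained=True)
   model.eval()
   data = torch.rand(1, 3, 224, 224)

   # the only IPEX-specific step for Float32 imperative inference
   model = ipex.optimize(model)

   with torch.no_grad():
       model(data)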
@@ -234,67 +234,6 @@ BFloat16
        with torch.cpu.amp.autocast():
            model(data)
 
-Inference - TorchScript Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-TorchScript mode makes graph optimization possible, hence improves
-performance for some topologies. Intel® Extension for PyTorch* enables most
-commonly used operator pattern fusion, and users can get the performance
-benefit without additional code changes.
-
-Float32
-^^^^^^^
-
-.. code:: python3
-
-   import torch
-   import torchvision.models as models
-
-   model = models.resnet50(pretrained=True)
-   model.eval()
-   data = torch.rand(1, 3, 224, 224)
-
-   #################### code changes ####################
-   import intel_extension_for_pytorch as ipex
-   model = ipex.optimize(model)
-   ######################################################
-
-   with torch.no_grad():
-       d = torch.rand(1, 3, 224, 224)
-       model = torch.jit.trace(model, d)
-       model = torch.jit.freeze(model)
-
-       model(data)
-
-BFloat16
-^^^^^^^^
-
-.. code:: python3
-
-   import torch
-   from transformers import BertModel
-
-   model = BertModel.from_pretrained(args.model_name)
-   model.eval()
-
-   vocab_size = model.config.vocab_size
-   batch_size = 1
-   seq_length = 512
-   data = torch.randint(vocab_size, size=[batch_size, seq_length])
-
-   #################### code changes ####################
-   import intel_extension_for_pytorch as ipex
-   model = ipex.optimize(model, dtype=torch.bfloat16)
-   ######################################################
-
-   with torch.no_grad():
-       with torch.cpu.amp.autocast():
-           d = torch.randint(vocab_size, size=[batch_size, seq_length])
-           model = torch.jit.trace(model, (d,), check_trace=False, strict=False)
-           model = torch.jit.freeze(model)
-
-           model(data)
-
 Examples -- GPU
 ---------------
 
@@ -420,8 +359,8 @@ BFloat16
         'optimizer_state_dict': optimizer.state_dict(),
         }, 'checkpoint.pth')
 
-Inference - Imperative Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Inference
+~~~~~~~~~
 
 Float32
 ^^^^^^^
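On the GPU side the renamed Inference section follows the same imperative pattern, with the additional step of moving the model and data to the ``"xpu"`` device before calling ``ipex.optimize``; a minimal Float32 sketch under that assumption (model choice and input shape are illustrative):

.. code:: python3

   import torch
   import torchvision.models as models
   ############# code changes ###############
   import intel_extension_for_pytorch as ipex
   ############# code changes ###############

   model = models.resnet50(pretrained=True)
   model.eval()
   data = torch.rand(1, 3, 224, 224)

   #################### code changes ################
   model = model.to("xpu")
   data = data.to("xpu")
   model = ipex.optimize(model, dtype=torch.float32)
   #################### code changes ################

   with torch.no_grad():
       model(data)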
@@ -510,121 +449,6 @@ Float16
        ################################# code changes ######################################
            model(data)
 
-Inference - TorchScript Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-TorchScript mode makes graph optimization possible, hence improves
-performance for some topologies. Intel® Extension for PyTorch* enables most
-commonly used operator pattern fusion, and users can get the performance
-benefit without additional code changes.
-
-Float32
-^^^^^^^
-
-.. code:: python3
-
-   import torch
-   from transformers import BertModel
-   ############# code changes ###############
-   import intel_extension_for_pytorch as ipex
-   ############# code changes ###############
-
-   model = BertModel.from_pretrained(args.model_name)
-   model.eval()
-
-   vocab_size = model.config.vocab_size
-   batch_size = 1
-   seq_length = 512
-   data = torch.randint(vocab_size, size=[batch_size, seq_length])
-
-   #################### code changes ################
-   model = model.to("xpu")
-   data = data.to("xpu")
-   model = ipex.optimize(model, dtype=torch.float32)
-   #################### code changes ################
-
-   with torch.no_grad():
-       d = torch.randint(vocab_size, size=[batch_size, seq_length])
-       ##### code changes #####
-       d = d.to("xpu")
-       ##### code changes #####
-       model = torch.jit.trace(model, (d,), check_trace=False, strict=False)
-       model = torch.jit.freeze(model)
-
-       model(data)
-
-BFloat16
-^^^^^^^^
-
-.. code:: python3
-
-   import torch
-   from transformers import BertModel
-   ############# code changes ###############
-   import intel_extension_for_pytorch as ipex
-   ############# code changes ###############
-
-   model = BertModel.from_pretrained(args.model_name)
-   model.eval()
-
-   vocab_size = model.config.vocab_size
-   batch_size = 1
-   seq_length = 512
-   data = torch.randint(vocab_size, size=[batch_size, seq_length])
-
-   #################### code changes #################
-   model = model.to("xpu")
-   data = data.to("xpu")
-   model = ipex.optimize(model, dtype=torch.bfloat16)
-   #################### code changes #################
-
-   with torch.no_grad():
-       d = torch.randint(vocab_size, size=[batch_size, seq_length])
-       ################################# code changes ######################################
-       d = d.to("xpu")
-       with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16, cache_enabled=False):
-       ################################# code changes ######################################
-           model = torch.jit.trace(model, (d,), check_trace=False, strict=False)
-           model = torch.jit.freeze(model)
-
-           model(data)
-
-Float16
-^^^^^^^
-
-.. code:: python3
-
-   import torch
-   from transformers import BertModel
-   ############# code changes ###############
-   import intel_extension_for_pytorch as ipex
-   ############# code changes ###############
-
-   model = BertModel.from_pretrained(args.model_name)
-   model.eval()
-
-   vocab_size = model.config.vocab_size
-   batch_size = 1
-   seq_length = 512
-   data = torch.randint(vocab_size, size=[batch_size, seq_length])
-
-   #################### code changes ################
-   model = model.to("xpu")
-   data = data.to("xpu")
-   model = ipex.optimize(model, dtype=torch.float16)
-   #################### code changes ################
-
-   with torch.no_grad():
-       d = torch.randint(vocab_size, size=[batch_size, seq_length])
-       ################################# code changes ######################################
-       d = d.to("xpu")
-       with torch.xpu.amp.autocast(enabled=True, dtype=torch.float16, cache_enabled=False):
-       ################################# code changes ######################################
-           model = torch.jit.trace(model, (d,), check_trace=False, strict=False)
-           model = torch.jit.freeze(model)
-
-           model(data)
-
 C++ (CPU only)
 ~~~~~~~~~~~~~~
 
@@ -657,10 +481,11 @@ once C++ dynamic library of Intel® Extension for PyTorch* is linked.
   }
   std::vector<torch::jit::IValue> inputs;
   // make sure input data are converted to channels last format
-  inputs.push_back(torch::ones({1, 3, 224, 224}).to(c10::MemoryFormat::ChannelsLast));
+  inputs.push_back(torch::rand({1, 3, 224, 224}));
 
   at::Tensor output = module.forward(inputs).toTensor();
-
+  std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << std::endl;
+  std::cout << "Execution finished" << std::endl;
   return 0;
 }
 
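The C++ example above runs inference with a serialized TorchScript module; as a rough sketch of how such a file could be produced on the Python side (the file name, model, and input shape here are illustrative assumptions, not part of this commit):

.. code:: python3

   import torch
   import torchvision.models as models
   import intel_extension_for_pytorch as ipex

   model = models.resnet50(pretrained=True)
   model.eval()
   model = ipex.optimize(model)

   with torch.no_grad():
       data = torch.rand(1, 3, 224, 224)
       traced = torch.jit.trace(model, data)
       traced = torch.jit.freeze(traced)
       # illustrative file name; point the C++ example-app at this path
       traced.save("resnet50_ipex.pt")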
@@ -676,7 +501,7 @@ once C++ dynamic library of Intel® Extension for PyTorch* is linked.
 add_executable(example-app example-app.cpp)
 target_link_libraries(example-app "${TORCH_LIBRARIES}")
 
-set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
+set_property(TARGET example-app PROPERTY CXX_STANDARD 17)
 
 **Command for compilation**
 
@@ -691,31 +516,20 @@ into the binary. This can be verified with the Linux command `ldd`.
 ::
 
   $ cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
-  -- The C compiler identification is GNU 9.3.0
-  -- The CXX compiler identification is GNU 9.3.0
-  -- Check for working C compiler: /usr/bin/cc
-  -- Check for working C compiler: /usr/bin/cc -- works
+  -- The C compiler identification is GNU XX.X.X
+  -- The CXX compiler identification is GNU XX.X.X
   -- Detecting C compiler ABI info
   -- Detecting C compiler ABI info - done
+  -- Check for working C compiler: /usr/bin/cc - skipped
   -- Detecting C compile features
   -- Detecting C compile features - done
-  -- Check for working CXX compiler: /usr/bin/c++
-  -- Check for working CXX compiler: /usr/bin/c++ -- works
   -- Detecting CXX compiler ABI info
   -- Detecting CXX compiler ABI info - done
+  -- Check for working CXX compiler: /usr/bin/c++ - skipped
   -- Detecting CXX compile features
   -- Detecting CXX compile features - done
-  -- Looking for pthread.h
-  -- Looking for pthread.h - found
-  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-  -- Looking for pthread_create in pthreads
-  -- Looking for pthread_create in pthreads - not found
-  -- Looking for pthread_create in pthread
-  -- Looking for pthread_create in pthread - found
-  -- Found Threads: TRUE
   -- Found Torch: /workspace/libtorch/lib/libtorch.so
-  -- Found INTEL_EXT_PT_CPU: TRUE
+  -- Found IPEX: /workspace/libtorch/lib/libintel-ext-pt-cpu.so
   -- Configuring done
   -- Generating done
   -- Build files have been written to: /workspace/build
@@ -726,18 +540,6 @@ into the binary. This can be verified with the Linux command `ldd`.
   libc10.so => /workspace/libtorch/lib/libc10.so (0x00007f3cf985a000)
   libintel-ext-pt-cpu.so => /workspace/libtorch/lib/libintel-ext-pt-cpu.so (0x00007f3cf70fc000)
   libtorch_cpu.so => /workspace/libtorch/lib/libtorch_cpu.so (0x00007f3ce16ac000)
-  ...
-  libdnnl_graph.so.0 => /workspace/libtorch/lib/libdnnl_graph.so.0 (0x00007f3cde954000)
-  ...
-
-Model Zoo (CPU only)
---------------------
-
-Use cases that had already been optimized by Intel engineers are available at
-`Model Zoo for Intel® Architecture <https://github.com/IntelAI/models/>`_ (with
-the branch name in format of `pytorch-r<version>-models`). Many PyTorch use
-cases for benchmarking are also available on the GitHub page. You can get
-performance benefits out-of-the-box by simply running scripts in the Model Zoo.
 
 Tutorials
 ---------
