[Bug]: Neuron + vLLM inference broken by backward-incompatible change

### Your current environment

<details>
<summary>The output of `python collect_env.py`</summary>

```text
Model name:                      Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
CPU family:                      6
Model:                           106
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       2
Stepping:                        6
BogoMIPS:                        5799.98
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd ida arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear flush_l1d arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       3 MiB (64 instances)
L1i cache:                       2 MiB (64 instances)
L2 cache:                        80 MiB (64 instances)
L3 cache:                        108 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0-31,64-95
NUMA node1 CPU(s):               32-63,96-127
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==8.9.2.26
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-nccl-cu12==2.18.1
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.2.0
[pip3] torch==2.1.2
[pip3] torch-neuronx==2.1.2.2.3.0
[pip3] torch-xla==2.1.4
[pip3] torchvision==0.16.2
[pip3] transformers==4.44.2
[pip3] transformers-neuronx==0.12.313
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: (0, 'instance-type: trn1.32xlarge\ninstance-id: i-0046fea840efb721c\n+--------+--------+--------+---------------+---------+\n| NEURON | NEURON | NEURON |   CONNECTED   |   PCI   |\n| DEVICE | CORES  | MEMORY |    DEVICES    |   BDF   |\n+--------+--------+--------+---------------+---------+\n| 0      | 2      | 32 GB  | 12, 3, 4, 1   | 10:1c.0 |\n| 1      | 2      | 32 GB  | 13, 0, 5, 2   | 10:1d.0 |\n| 2      | 2      | 32 GB  | 14, 1, 6, 3   | a0:1c.0 |\n| 3      | 2      | 32 GB  | 15, 2, 7, 0   | a0:1d.0 |\n| 4      | 2      | 32 GB  | 0, 7, 8, 5    | 20:1b.0 |\n| 5      | 2      | 32 GB  | 1, 4, 9, 6    | 20:1c.0 |\n| 6      | 2      | 32 GB  | 2, 5, 10, 7   | 90:1b.0 |\n| 7      | 2      | 32 GB  | 3, 6, 11, 4   | 90:1c.0 |\n| 8      | 2      | 32 GB  | 4, 11, 12, 9  | 20:1d.0 |\n| 9      | 2      | 32 GB  | 5, 8, 13, 10  | 20:1e.0 |\n| 10     | 2      | 32 GB  | 6, 9, 14, 11  | 90:1d.0 |\n| 11     | 2      | 32 GB  | 7, 10, 15, 8  | 90:1e.0 |\n| 12     | 2      | 32 GB  | 8, 15, 0, 13  | 10:1e.0 |\n| 13     | 2      | 32 GB  | 9, 12, 1, 14  | 10:1b.0 |\n| 14     | 2      | 32 GB  | 10, 13, 2, 15 | a0:1e.0 |\n| 15     | 2      | 32 GB  | 11, 14, 3, 12 | a0:1b.0 |\n+--------+--------+--------+---------------+---------+', '')
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

```

</details>


### Model Input Dumps

_No response_

### 🐛 Describe the bug

A recent change in this [commit](https://github.com/vllm-project/vllm/commit/99aa4eddaf929f57dac405b00db3f5286624ee8b) uses the new high-level Python custom op API (`torch.library.custom_op`), which is only available in PyTorch 2.4 and later ([Ref1](https://pytorch.org/tutorials/advanced/python_custom_ops.html)).
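Because the failure is a missing attribute on `torch.library`, one way to guard against it is feature detection rather than a hard dependency on the new API. The sketch below is illustrative only: `parse_torch_version` and `has_custom_op_api` are hypothetical helper names, not functions in vLLM or PyTorch.

```python
import re
from types import ModuleType
from typing import Optional, Tuple


def parse_torch_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '2.1.2', '2.1.2+cu121' or '2.4.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_custom_op_api(torch_module: Optional[ModuleType] = None) -> bool:
    """Feature-detect torch.library.custom_op (added in PyTorch 2.4)."""
    if torch_module is None:
        try:
            # Imported lazily so the check itself works even without torch installed.
            import torch as torch_module
        except ImportError:
            return False
    library = getattr(torch_module, "library", None)
    return library is not None and hasattr(library, "custom_op")
```

With the `torch==2.1.2` from the environment above, `parse_torch_version("2.1.2") >= (2, 4)` is `False`, and `has_custom_op_api()` would also return `False`. Checking for the attribute is arguably more robust than comparing versions, since build suffixes such as `+cu121` differ between distributions.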

Neuron currently supports PyTorch only up to 2.1, so following the installation steps at https://docs.vllm.ai/en/latest/getting_started/neuron-installation.html and then running the offline inference script https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_neuron.py fails at import time with the following error:

```
    Traceback (most recent call last):
      File "/home/Vllm_Upstream/bug_report/vllm/examples/offline_inference_neuron.py", line 3, in <module>
        from vllm import LLM, SamplingParams
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/__init__.py", line 3, in <module>
        from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/engine/arg_utils.py", line 11, in <module>
        from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/config.py", line 12, in <module>
        from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/model_executor/__init__.py", line 1, in <module>
        from vllm.model_executor.parameter import (BasevLLMParameter,
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/model_executor/parameter.py", line 7, in <module>
        from vllm.distributed import get_tensor_model_parallel_rank
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/distributed/__init__.py", line 1, in <module>
        from .communication_op import *
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/distributed/communication_op.py", line 6, in <module>
        from .parallel_state import get_tp_group
      File "/home/Vllm_Upstream/bug_report/vllm/vllm/distributed/parallel_state.py", line 98, in <module>
        @torch.library.custom_op("vllm::inplace_all_reduce", mutates_args=["tensor"])
    AttributeError: module 'torch.library' has no attribute 'custom_op'
```
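One possible way to keep older PyTorch releases importable is to register the op only when the API exists and otherwise leave the function as a plain Python callable. This is a hypothetical sketch: `optional_custom_op` is not a vLLM or PyTorch function, and it assumes that on older PyTorch the call sites can invoke the function directly, because the fallback does not create a `torch.ops.vllm.*` entry.

```python
from typing import Any, Callable, Iterable, Optional


def optional_custom_op(
    name: str,
    mutates_args: Iterable[str],
    library: Optional[Any] = None,
) -> Callable[[Callable], Any]:
    """Apply torch.library.custom_op when available; otherwise return fn unchanged.

    `library` defaults to torch.library and can be injected for testing.
    """
    if library is None:
        try:
            import torch
            library = torch.library
        except ImportError:
            library = None  # no torch at all: behave like the old-PyTorch fallback

    def decorator(fn: Callable) -> Any:
        custom_op = getattr(library, "custom_op", None)
        if custom_op is None:
            # PyTorch < 2.4 (e.g. 2.1.x on Neuron): skip registration instead of
            # raising AttributeError at import time.
            return fn
        # PyTorch >= 2.4: register fn as a custom op, like the original decorator.
        return custom_op(name, mutates_args=mutates_args)(fn)

    return decorator
```

It would replace the failing `@torch.library.custom_op("vllm::inplace_all_reduce", mutates_args=["tensor"])` line in `parallel_state.py` with the same arguments. Whether the surrounding code also needs a direct-call path when the op is not registered would have to be checked against the actual source.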

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: Neuron + Vllm inference broken with backward incompatible change #8677

Your current environment

Model Input Dumps

🐛 Describe the bug

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Neuron + Vllm inference broken with backward incompatible change #8677

Description

Your current environment

Model Input Dumps

🐛 Describe the bug

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions