Description
Is there an existing issue for this bug?
- I have searched the existing issues
The bug has not been fixed in the latest main branch
- I have checked the latest main branch
Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)
Yes, I will share a minimal reproducible script.
🐛 Describe the bug
I installed the environment following "https://github.com/hpcaitech/ColossalAI/tree/main/applications/ColossalChat#install-the-environment"
with the latest main branch (colossalai 0.4.8, as of 28.02.2025).
First, I encountered the same bug as
"#5458"
As a workaround, I ran `ln -s ../../extensions .`
inside the colossalai.kernel folder.
That led to a new problem:
"ImportError: cannot import name 'CpuAdamArmExtension' from 'colossalai.kernel.extensions' (unknown location)"
My launch command is:
colossalai run --hostfile path-to-host-file --nproc_per_node 8 lora_finetune.py --pretrained path-to-DeepSeek-R1-bf16 --dataset path-to-dataset.jsonl --plugin moe --lr 2e-5 --max_length 256 -g --ep 8 --pp 3 --batch_size 24 --lora_rank 8 --lora_alpha 16 --num_epochs 2 --warmup_steps 8 --tensorboard_dir logs --save_dir DeepSeek-R1-bf16-lora
Full log is:
Traceback (most recent call last):
  File "/mnt/code/ColossalAI/applications/ColossalChat/examples/training_scripts/lora_finetune.py", line 23, in <module>
    from colossalai.booster import Booster
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/booster/__init__.py", line 2, in <module>
    from .booster import Booster
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/booster/booster.py", line 27, in <module>
    from .plugin import Plugin
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/booster/plugin/__init__.py", line 1, in <module>
    from .gemini_plugin import GeminiPlugin
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/booster/plugin/gemini_plugin.py", line 31, in <module>
    from colossalai.shardformer import ShardConfig, ShardFormer
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/__init__.py", line 1, in <module>
    from .shard import GradientCheckpointConfig, ModelSharder, PipelineGradientCheckpointConfig, ShardConfig, ShardFormer
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/shard/__init__.py", line 3, in <module>
    from .sharder import ModelSharder
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 10, in <module>
    from ..policies.auto_policy import get_autopolicy
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/policies/auto_policy.py", line 6, in <module>
    from .base_policy import Policy
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/policies/base_policy.py", line 13, in <module>
    from ..layer.normalization import BaseLayerNorm
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/layer/__init__.py", line 2, in <module>
    from .attn import AttnMaskType, ColoAttention, RingAttention, get_pad_info
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/shardformer/layer/attn.py", line 11, in <module>
    from colossalai.kernel.kernel_loader import (
  File "/opt/conda/envs/colossal-chat/lib/python3.10/site-packages/colossalai/kernel/kernel_loader.py", line 4, in <module>
    from .extensions import (
ImportError: cannot import name 'CpuAdamArmExtension' from 'colossalai.kernel.extensions' (unknown location)
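A note for triage: the "(unknown location)" suffix usually means Python resolved `colossalai.kernel.extensions` to something with no `__file__`, typically a directory that lacks an `__init__.py` (a namespace package). Whether that is the actual cause here is an assumption. The sketch below is one way to check what the symlinked directory looks like on disk; `inspect_package_dir` is a hypothetical helper, not part of ColossalAI, and the path in the usage comment is inferred from the traceback plus the symlink created above.

```python
from pathlib import Path


def inspect_package_dir(path):
    """Summarize how a package directory (possibly a symlink) looks on disk.

    Hypothetical diagnostic helper; it only reads the filesystem and does
    not import anything from the package being inspected.
    """
    p = Path(path)
    return {
        # False when the path is missing or is a symlink to a missing target
        "exists": p.exists(),
        # True when the path itself is a symbolic link
        "is_symlink": p.is_symlink(),
        # Where the path ends up after following links (non-strict)
        "resolved": str(p.resolve()),
        # Without __init__.py the directory is treated as a namespace package
        "has_init": (p / "__init__.py").is_file(),
    }


if __name__ == "__main__":
    # Assumed location: the kernel folder from the traceback plus "extensions".
    target = (
        "/opt/conda/envs/colossal-chat/lib/python3.10/"
        "site-packages/colossalai/kernel/extensions"
    )
    for key, value in inspect_package_dir(target).items():
        print(f"{key}: {value}")
```

If `has_init` comes back False, or `resolved` points somewhere other than the ColossalAI `extensions` source directory, the symlink workaround is probably targeting the wrong folder.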
Environment
GPU: A100
CUDA: 12.3
Docker image: docker.xuanyuan.me/hpcaitech/colossalai:latest