I have a question regarding fine-tuning of the quantized internlm/internlm-xcomposer2-4khd-7b model. I quantized the 4khd model with lmdeploy and am now trying to fine-tune the quantized model. However, I am getting the following issue during this process. Do you have any suggestions on how I can solve it?
Env:
(intern_clean) ubuntu@ip-172-31-18-91:~/InternLM-XComposer/finetune$ lmdeploy check_env
Matplotlib is building the font cache; this may take a moment.
sys.platform: linux
Python: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA A10G
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.9.7 (built against CUDA 12.2)
- Built with CuDNN 8.9.2
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.17.2+cu121
LMDeploy: 0.5.1+5840351
transformers: 4.33.2
gradio: 4.13.0
fastapi: 0.111.1
pydantic: 2.8.2
triton: 2.2.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PHB PHB PHB 0-47 0 N/A
GPU1 PHB X PHB PHB 0-47 0 N/A
GPU2 PHB PHB X PHB 0-47 0 N/A
GPU3 PHB PHB PHB X 0-47 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
(intern_clean) ubuntu@ip-172-31-18-91:~/InternLM-XComposer/finetune$ sh finetune_lora.sh
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING]
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] *****************************************
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] *****************************************
[2024-07-25 16:59:32,011] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,019] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,047] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,050] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
[the same pytree deprecation warning is repeated by the other three ranks]
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-07-25 16:59:34,054] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-25 16:59:34,099] [INFO] [comm.py:637:init_distributed] cdb=None
[the same deepspeed deprecation warning is repeated by the other three ranks]
[2024-07-25 16:59:34,127] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-25 16:59:34,127] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-07-25 16:59:34,131] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[the same resume_download warning is repeated by the other three ranks]
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
[the same line is printed by each of the 4 ranks]
[the pytree deprecation warning is printed again by each of the 4 ranks]
Set max length to 16384
[the same line is printed by each of the 4 ranks]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.55it/s]
Some weights of the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit were not used when initializing InternLMXComposer2ForCausalLM: ['model.layers.12.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w1.qweight', 'model.layers.19.attention.wqkv.scales', 'model.layers.0.feed_forward.w1.scales', 'model.layers.4.attention.wqkv.qweight', ... (list trimmed: the full output covers the `qweight`, `qzeros`, and `scales` tensors of every `attention.wqkv`, `attention.wo`, and `feed_forward.w1`/`w2`/`w3` module across all 32 layers)]
- This IS expected if you are initializing InternLMXComposer2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing InternLMXComposer2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit and are newly initialized: ['model.layers.23.attention.wo.weight', 'model.layers.31.attention.wqkv.weight', 'model.layers.24.feed_forward.w3.weight', ... (list truncated: every attention.wqkv/attention.wo and feed_forward.w1/w2/w3 weight in layers 0-31) ..., 'model.layers.0.feed_forward.w2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.53it/s]
Some weights of the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit were not used when initializing InternLMXComposer2ForCausalLM: ['model.layers.3.feed_forward.w1.scales', 'model.layers.21.attention.wqkv.scales', 'model.layers.14.feed_forward.w2.qzeros', ... (list truncated: the qweight/qzeros/scales tensors of every quantized layer) ...]
(The same "not used" / "newly initialized" warning pair is printed again for the other ranks; the repeated lists are truncated here.)
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
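An aside on why both warnings appear: an AWQ/GPTQ-style 4-bit checkpoint stores each quantized linear layer as `qweight`/`qzeros`/`scales` tensors instead of the plain `weight` tensor that the fp16 model class expects. A vanilla `from_pretrained` therefore sees every `*.weight` key as missing (reported as "newly initialized") and every quantized key as unused. A minimal sketch of that key mismatch, using one layer's key names taken from the log above as illustration:

```python
# One fp16-style parameter key vs. its quantized counterparts (names taken
# from the warning messages above, for illustration only).
fp16_keys = {"model.layers.0.attention.wqkv.weight"}
quant_keys = {
    "model.layers.0.attention.wqkv.qweight",
    "model.layers.0.attention.wqkv.qzeros",
    "model.layers.0.attention.wqkv.scales",
}

missing = fp16_keys - quant_keys  # what transformers reports as "newly initialized"
unused = quant_keys - fp16_keys   # what transformers reports as "not used"

print(sorted(missing))
print(sorted(unused))
```

So the two warnings are two views of the same mismatch: the loader never matched any of the quantized tensors to the model's expected parameter names.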
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
init mix data at rank 3
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
warnings.warn(
Traceback (most recent call last):
File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
train()
File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
trainer = Trainer(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`.
[The same trainable-params line, data-loading messages, FutureWarning, and ValueError are then printed by ranks 0, 2, and 1; the three near-identical repeats are omitted here.]
[2024-07-25 17:00:45,016] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 10096 closing signal SIGTERM
[2024-07-25 17:00:45,331] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 10095) of binary: /home/ubuntu/miniconda3/envs/intern_clean/bin/python
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/intern_clean/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-07-25_17:00:45
host : ip-172-31-18-91.ec2.internal
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 10097)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-07-25_17:00:45
host : ip-172-31-18-91.ec2.internal
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 10098)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-07-25_17:00:45
host : ip-172-31-18-91.ec2.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 10095)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I am also using finetune.py without any changes.
@yuhangzang I have tried running the code with the 2.0 version, but I am still getting the same error:
Traceback (most recent call last):
File "/home/ubuntu/InternLM-XComposer/InternLM-XComposer-2.0/finetune/finetune.py", line 318, in <module>
train()
File "/home/ubuntu/InternLM-XComposer/InternLM-XComposer-2.0/finetune/finetune.py", line 305, in train
trainer = Trainer(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`.
Let me know what additional information is needed. Thank you!
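For context on the `ValueError`: in this transformers version, `Trainer.__init__` refuses any model whose 8-bit/4-bit loading flag is set unless it has also been marked as prepared for quantized training (PEFT's `prepare_model_for_kbit_training` is what normally sets that mark). The sketch below mimics the guard with a dummy model; the attribute names are my assumption from reading transformers sources of roughly this era, not verified against this exact install:

```python
class DummyQuantModel:
    """Stand-in for a HF model loaded via a load_in_8bit/load_in_4bit path."""
    is_loaded_in_8bit = True                 # set by the quantized loading path (assumed name)
    _is_quantized_training_enabled = False   # set by k-bit preparation (assumed name)


def trainer_guard(model):
    # Simplified rendering of the check that raises in transformers/trainer.py:409
    if getattr(model, "is_loaded_in_8bit", False) and not getattr(
        model, "_is_quantized_training_enabled", False
    ):
        raise ValueError(
            "The model you want to train is loaded in 8-bit precision. ..."
        )


model = DummyQuantModel()
try:
    trainer_guard(model)
    print("accepted")
except ValueError:
    print("rejected")   # the situation in the traceback above

model._is_quantized_training_enabled = True  # what k-bit preparation would flip
trainer_guard(model)
print("accepted")
```

In practice the commonly reported fixes for this error are installing a recent `bitsandbytes` and running the model through PEFT's k-bit preparation before constructing the Trainer; whether either applies cleanly to an lmdeploy-quantized checkpoint is exactly the open question of this issue.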
Attachments: finetune_lora.sh, ds_config_zero2.json, data.txt