
Fine-tuning of quantized internlm/internlm-xcomposer2-4khd-7b model? #402

Open
zhuraromdev opened this issue Jul 25, 2024 · 3 comments

zhuraromdev commented Jul 25, 2024

Hello,

I have a question regarding fine-tuning a quantized internlm/internlm-xcomposer2-4khd-7b model. I quantized the 4khd model with lmdeploy and am now trying to fine-tune it. However, I am getting the following issue during this process. Do you have any suggestions on how to solve it?

Env:

(intern_clean) ubuntu@ip-172-31-18-91:~/InternLM-XComposer/finetune$ lmdeploy check_env
Matplotlib is building the font cache; this may take a moment.
sys.platform: linux
Python: 3.9.19 (main, May  6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA A10G
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.7  (built against CUDA 12.2)
    - Built with CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.17.2+cu121
LMDeploy: 0.5.1+5840351
transformers: 4.33.2
gradio: 4.13.0
fastapi: 0.111.1
pydantic: 2.8.2
triton: 2.2.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     PHB     PHB     0-47    0               N/A
GPU1    PHB      X      PHB     PHB     0-47    0               N/A
GPU2    PHB     PHB      X      PHB     0-47    0               N/A
GPU3    PHB     PHB     PHB      X      0-47    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

finetune_lora.sh

#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`

export TOKEN="TOKEN"

export MODEL="agentsea/internlm-xcomposer2-4khd-7b-4bit"
# export DATA="path of data"
export DATA="data.txt"

GPUS_PER_NODE=4
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --given_num True \
    --bf16 True \
    --fix_vit True \
    --fix_sampler True \
    --use_lora True \
    --hd_num 16 \
    --output_dir output/finetune_lora \
    --num_train_epochs 5 \
    --batch_size 2 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --save_total_limit 1 \
    --learning_rate 5e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --max_length 16384 \
    --deepspeed ds_config_zero2.json \
    --gradient_checkpointing True

ds_config_zero2.json

{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "none",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 100,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

data.txt

data/single_turn_single_image_example.json 0.01
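For reference, each line of `data.txt` names a JSON sample file plus a sampling ratio. A minimal sanity check of the referenced JSON might look like the sketch below; the `id`/`image`/`conversations` schema is an assumption based on the single-turn example format in the InternLM-XComposer finetune docs, and the exact keys may differ in your repo version.

```python
# Hypothetical single-turn, single-image sample; the keys below are an
# assumption about the expected format, not taken from the actual file.
sample = [
    {
        "id": "0",
        "image": ["demo/example.png"],
        "conversations": [
            {"from": "user", "value": "<ImageHere>Describe this screenshot."},
            {"from": "assistant", "value": "It shows a login form."},
        ],
    }
]

def validate(samples):
    """Check each entry has the conversational structure finetune.py presumably reads."""
    for entry in samples:
        assert "conversations" in entry, "missing conversations"
        turns = entry["conversations"]
        assert turns and turns[0]["from"] == "user", "first turn must be user"
        assert turns[-1]["from"] == "assistant", "last turn must be assistant"
    return len(samples)

print(validate(sample))  # number of samples that passed validation
```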

Traceback:

(intern_clean) ubuntu@ip-172-31-18-91:~/InternLM-XComposer/finetune$ sh finetune_lora.sh 
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] 
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] *****************************************
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] *****************************************
[2024-07-25 16:59:32,011] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,019] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,047] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,050] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-07-25 16:59:34,054] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-07-25 16:59:34,099] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-07-25 16:59:34,127] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-25 16:59:34,127] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-07-25 16:59:34,131] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Set max length to 16384
Set max length to 16384
Set max length to 16384
Set max length to 16384
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.55it/s]
Some weights of the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit were not used when initializing InternLMXComposer2ForCausalLM: ['model.layers.12.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w1.qweight', 'model.layers.19.attention.wqkv.scales', 'model.layers.0.feed_forward.w1.scales', 'model.layers.4.attention.wqkv.qweight', 'model.layers.1.attention.wqkv.scales', 'model.layers.30.feed_forward.w3.qzeros', 'model.layers.29.attention.wqkv.qzeros', 'model.layers.15.attention.wo.qweight', 'model.layers.29.feed_forward.w2.scales', 'model.layers.3.feed_forward.w1.qzeros', 'model.layers.10.feed_forward.w2.qzeros', 'model.layers.19.feed_forward.w1.scales', 'model.layers.18.feed_forward.w2.scales', 'model.layers.14.feed_forward.w2.qzeros', 'model.layers.30.feed_forward.w2.qzeros', 'model.layers.16.feed_forward.w3.scales', 'model.layers.8.feed_forward.w2.qweight', 'model.layers.31.feed_forward.w3.qzeros', 'model.layers.29.feed_forward.w3.qweight', 'model.layers.0.feed_forward.w1.qzeros', 'model.layers.23.attention.wqkv.scales', 'model.layers.28.attention.wqkv.qweight', 'model.layers.1.attention.wqkv.qzeros', 'model.layers.10.attention.wo.qweight', 'model.layers.7.attention.wqkv.qweight', 'model.layers.24.feed_forward.w3.qzeros', 'model.layers.25.attention.wo.qweight', 'model.layers.27.feed_forward.w2.scales', 'model.layers.18.feed_forward.w2.qzeros', 'model.layers.18.feed_forward.w3.scales', 'model.layers.13.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w3.qzeros', 'model.layers.3.feed_forward.w2.qzeros', 'model.layers.24.feed_forward.w2.qweight', 'model.layers.20.feed_forward.w3.qweight', 'model.layers.24.attention.wqkv.qzeros', 'model.layers.3.feed_forward.w1.qweight', 'model.layers.22.attention.wqkv.scales', 'model.layers.10.feed_forward.w3.scales', 'model.layers.20.feed_forward.w1.qweight', 'model.layers.5.attention.wqkv.qzeros', 'model.layers.11.attention.wo.qweight', 'model.layers.11.attention.wqkv.qweight', 
'model.layers.16.feed_forward.w1.qweight', 'model.layers.24.attention.wqkv.scales', 'model.layers.10.feed_forward.w1.qzeros', 'model.layers.23.attention.wqkv.qzeros', 'model.layers.21.attention.wqkv.qzeros', 'model.layers.7.feed_forward.w2.qweight', 'model.layers.7.attention.wo.qzeros', 'model.layers.9.attention.wo.qzeros', 'model.layers.29.feed_forward.w2.qweight', 'model.layers.24.feed_forward.w1.scales', 'model.layers.20.feed_forward.w1.scales', 'model.layers.15.feed_forward.w3.scales', 'model.layers.5.attention.wo.scales', 'model.layers.27.attention.wo.scales', 'model.layers.23.feed_forward.w1.qzeros', 'model.layers.25.feed_forward.w1.qzeros', 'model.layers.2.feed_forward.w2.qweight', 'model.layers.20.feed_forward.w2.scales', 'model.layers.12.attention.wqkv.qzeros', 'model.layers.14.attention.wo.qzeros', 'model.layers.22.feed_forward.w2.qzeros', 'model.layers.25.feed_forward.w3.qzeros', 'model.layers.6.feed_forward.w3.scales', 'model.layers.29.feed_forward.w3.qzeros', 'model.layers.19.attention.wo.qweight', 'model.layers.2.attention.wqkv.scales', 'model.layers.1.feed_forward.w2.scales', 'model.layers.19.feed_forward.w1.qweight', 'model.layers.13.attention.wo.qweight', 'model.layers.3.attention.wqkv.qweight', 'model.layers.23.feed_forward.w2.qweight', 'model.layers.22.attention.wo.qzeros', 'model.layers.3.feed_forward.w3.qweight', 'model.layers.16.feed_forward.w1.scales', 'model.layers.27.feed_forward.w1.qweight', 'model.layers.6.attention.wo.qzeros', 'model.layers.5.feed_forward.w1.scales', 'model.layers.1.feed_forward.w1.qzeros', 'model.layers.6.attention.wo.scales', 'model.layers.7.feed_forward.w1.qweight', 'model.layers.22.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w2.scales', 'model.layers.11.feed_forward.w3.qzeros', 'model.layers.13.feed_forward.w1.qzeros', 'model.layers.22.feed_forward.w3.scales', 'model.layers.5.feed_forward.w1.qweight', 'model.layers.28.attention.wqkv.scales', 'model.layers.26.feed_forward.w2.qweight', 
'model.layers.4.attention.wo.scales', 'model.layers.19.feed_forward.w3.qzeros', 'model.layers.16.attention.wo.qzeros', 'model.layers.21.feed_forward.w3.qzeros', 'model.layers.16.feed_forward.w2.qzeros', 'model.layers.12.attention.wo.qzeros', 'model.layers.19.attention.wo.scales', 'model.layers.11.feed_forward.w3.qweight', 'model.layers.1.feed_forward.w1.scales', 'model.layers.26.feed_forward.w3.qweight', 'model.layers.31.feed_forward.w1.scales', 'model.layers.17.feed_forward.w2.scales', 'model.layers.13.attention.wo.scales', 'model.layers.21.attention.wo.qweight', 'model.layers.21.attention.wo.scales', 'model.layers.12.feed_forward.w1.scales', 'model.layers.31.attention.wo.qweight', 'model.layers.8.feed_forward.w2.qzeros', 'model.layers.31.feed_forward.w3.scales', 'model.layers.19.attention.wo.qzeros', 'model.layers.11.attention.wo.scales', 'model.layers.28.attention.wo.qweight', 'model.layers.6.feed_forward.w2.qzeros', 'model.layers.18.feed_forward.w1.qweight', 'model.layers.10.feed_forward.w1.scales', 'model.layers.8.feed_forward.w3.qzeros', 'model.layers.31.feed_forward.w2.qweight', 'model.layers.8.attention.wqkv.scales', 'model.layers.3.attention.wo.qweight', 'model.layers.6.feed_forward.w2.qweight', 'model.layers.0.attention.wo.qweight', 'model.layers.23.feed_forward.w2.qzeros', 'model.layers.21.feed_forward.w1.scales', 'model.layers.29.attention.wqkv.qweight', 'model.layers.3.attention.wo.scales', 'model.layers.25.feed_forward.w3.qweight', 'model.layers.30.feed_forward.w3.qweight', 'model.layers.31.attention.wo.scales', 'model.layers.17.attention.wqkv.scales', 'model.layers.29.attention.wo.qweight', 'model.layers.5.attention.wo.qweight', 'model.layers.10.attention.wqkv.qweight', 'model.layers.15.feed_forward.w2.scales', 'model.layers.29.attention.wqkv.scales', 'model.layers.14.feed_forward.w3.qweight', 'model.layers.7.feed_forward.w3.qweight', 'model.layers.16.feed_forward.w3.qweight', 'model.layers.26.attention.wqkv.qzeros', 
'model.layers.28.feed_forward.w3.scales', 'model.layers.20.attention.wo.qweight', 'model.layers.25.feed_forward.w1.scales', 'model.layers.26.attention.wo.qweight', 'model.layers.21.feed_forward.w2.qweight', 'model.layers.17.feed_forward.w3.scales', 'model.layers.27.attention.wo.qzeros', 'model.layers.15.attention.wo.scales', 'model.layers.17.feed_forward.w2.qzeros', 'model.layers.12.attention.wqkv.scales', 'model.layers.11.attention.wqkv.scales', 'model.layers.28.feed_forward.w2.qzeros', 'model.layers.1.feed_forward.w3.scales', 'model.layers.0.attention.wqkv.qzeros', 'model.layers.30.attention.wo.qweight', 'model.layers.28.feed_forward.w1.scales', 'model.layers.18.attention.wo.scales', 'model.layers.17.feed_forward.w3.qweight', 'model.layers.3.feed_forward.w2.qweight', 'model.layers.1.feed_forward.w1.qweight', 'model.layers.23.feed_forward.w3.qweight', 'model.layers.9.feed_forward.w2.qzeros', 'model.layers.9.feed_forward.w1.qzeros', 'model.layers.28.feed_forward.w1.qzeros', 'model.layers.21.feed_forward.w1.qweight', 'model.layers.25.attention.wqkv.qzeros', 'model.layers.29.feed_forward.w1.qweight', 'model.layers.4.attention.wo.qzeros', 'model.layers.13.feed_forward.w1.qweight', 'model.layers.15.attention.wqkv.qzeros', 'model.layers.13.attention.wo.qzeros', 'model.layers.26.feed_forward.w1.qzeros', 'model.layers.8.attention.wo.qzeros', 'model.layers.14.feed_forward.w1.qzeros', 'model.layers.4.feed_forward.w3.scales', 'model.layers.22.feed_forward.w2.qweight', 'model.layers.7.feed_forward.w2.qzeros', 'model.layers.29.attention.wo.qzeros', 'model.layers.3.attention.wo.qzeros', 'model.layers.8.feed_forward.w2.scales', 'model.layers.5.feed_forward.w2.scales', 'model.layers.17.attention.wo.qzeros', 'model.layers.11.attention.wqkv.qzeros', 'model.layers.2.feed_forward.w1.scales', 'model.layers.24.attention.wo.qweight', 'model.layers.24.feed_forward.w1.qweight', 'model.layers.4.feed_forward.w3.qweight', 'model.layers.6.feed_forward.w1.qzeros', 
'model.layers.26.attention.wo.scales', 'model.layers.20.feed_forward.w1.qzeros', 'model.layers.30.feed_forward.w1.scales', 'model.layers.22.feed_forward.w1.qzeros', 'model.layers.15.feed_forward.w2.qzeros', 'model.layers.5.feed_forward.w1.qzeros', 'model.layers.7.feed_forward.w1.scales', 'model.layers.0.feed_forward.w2.qweight', 'model.layers.17.attention.wo.scales', 'model.layers.20.feed_forward.w2.qzeros', 'model.layers.20.attention.wqkv.scales', 'model.layers.18.feed_forward.w3.qzeros', 'model.layers.26.attention.wo.qzeros', 'model.layers.14.feed_forward.w2.qweight', 'model.layers.14.feed_forward.w1.qweight', 'model.layers.10.feed_forward.w2.scales', 'model.layers.10.attention.wo.scales', 'model.layers.9.attention.wo.scales', 'model.layers.0.attention.wqkv.qweight', 'model.layers.12.feed_forward.w3.qweight', 'model.layers.22.feed_forward.w3.qzeros', 'model.layers.9.feed_forward.w1.qweight', 'model.layers.8.feed_forward.w3.scales', 'model.layers.23.attention.wo.qweight', 'model.layers.22.attention.wqkv.qweight', 'model.layers.23.attention.wo.qzeros', 'model.layers.31.feed_forward.w2.scales', 'model.layers.5.feed_forward.w3.scales', 'model.layers.10.attention.wqkv.scales', 'model.layers.14.attention.wo.scales', 'model.layers.14.attention.wqkv.qzeros', 'model.layers.19.feed_forward.w3.qweight', 'model.layers.4.attention.wo.qweight', 'model.layers.4.feed_forward.w2.qzeros', 'model.layers.12.feed_forward.w1.qweight', 'model.layers.11.feed_forward.w1.scales', 'model.layers.4.feed_forward.w3.qzeros', 'model.layers.29.attention.wo.scales', 'model.layers.24.attention.wqkv.qweight', 'model.layers.4.feed_forward.w1.qzeros', 'model.layers.18.attention.wqkv.qzeros', 'model.layers.24.feed_forward.w3.scales', 'model.layers.18.attention.wqkv.qweight', 'model.layers.21.feed_forward.w3.qweight', 'model.layers.6.feed_forward.w3.qweight', 'model.layers.22.feed_forward.w1.qweight', 'model.layers.22.attention.wo.qweight', 'model.layers.21.attention.wqkv.qweight', 
'model.layers.6.feed_forward.w2.scales', 'model.layers.24.feed_forward.w1.qzeros', 'model.layers.22.attention.wo.scales', 'model.layers.30.attention.wqkv.scales', 'model.layers.8.feed_forward.w1.qweight', 'model.layers.7.attention.wqkv.scales', 'model.layers.25.feed_forward.w2.scales', 'model.layers.24.attention.wo.qzeros', 'model.layers.15.feed_forward.w3.qweight', 'model.layers.24.feed_forward.w2.scales', 'model.layers.25.attention.wqkv.qweight', 'model.layers.4.attention.wqkv.qzeros', 'model.layers.1.feed_forward.w2.qzeros', 'model.layers.23.feed_forward.w3.scales', 'model.layers.13.feed_forward.w3.qzeros', 'model.layers.8.attention.wo.qweight', 'model.layers.0.feed_forward.w3.qweight', 'model.layers.29.feed_forward.w1.scales', 'model.layers.30.feed_forward.w3.scales', 'model.layers.4.attention.wqkv.scales', 'model.layers.27.feed_forward.w2.qzeros', 'model.layers.17.attention.wqkv.qzeros', 'model.layers.15.feed_forward.w2.qweight', 'model.layers.17.attention.wqkv.qweight', 'model.layers.11.attention.wo.qzeros', 'model.layers.20.feed_forward.w2.qweight', 'model.layers.28.attention.wo.qzeros', 'model.layers.10.attention.wqkv.qzeros', 'model.layers.18.attention.wqkv.scales', 'model.layers.7.attention.wqkv.qzeros', 'model.layers.2.attention.wqkv.qweight', 'model.layers.1.feed_forward.w3.qzeros', 'model.layers.31.attention.wqkv.qzeros', 'model.layers.0.feed_forward.w3.scales', 'model.layers.9.attention.wqkv.qzeros', 'model.layers.2.attention.wo.qzeros', 'model.layers.1.feed_forward.w2.qweight', 'model.layers.26.feed_forward.w2.scales', 'model.layers.18.feed_forward.w2.qweight', 'model.layers.1.attention.wo.qweight', 'model.layers.27.feed_forward.w1.qzeros', 'model.layers.30.attention.wqkv.qweight', 'model.layers.19.attention.wqkv.qzeros', 'model.layers.14.attention.wo.qweight', 'model.layers.13.feed_forward.w3.scales', 'model.layers.9.feed_forward.w3.scales', 'model.layers.22.feed_forward.w1.scales', 'model.layers.24.attention.wo.scales', 
'model.layers.5.attention.wqkv.qweight', 'model.layers.6.attention.wo.qweight', 'model.layers.10.attention.wo.qzeros', 'model.layers.15.feed_forward.w1.scales', 'model.layers.0.feed_forward.w3.qzeros', 'model.layers.14.feed_forward.w3.scales', 'model.layers.12.feed_forward.w3.scales', 'model.layers.22.feed_forward.w2.scales', 'model.layers.8.feed_forward.w3.qweight', 'model.layers.31.feed_forward.w3.qweight', 'model.layers.28.feed_forward.w3.qzeros', 'model.layers.19.feed_forward.w1.qzeros', 'model.layers.31.feed_forward.w1.qweight', 'model.layers.14.attention.wqkv.qweight', 'model.layers.11.feed_forward.w3.scales', 'model.layers.12.attention.wo.qweight', 'model.layers.25.feed_forward.w1.qweight', 'model.layers.3.feed_forward.w1.scales', 'model.layers.4.feed_forward.w2.qweight', 'model.layers.27.feed_forward.w3.scales', 'model.layers.13.feed_forward.w1.scales', 'model.layers.2.feed_forward.w3.qweight', 'model.layers.6.attention.wqkv.qweight', 'model.layers.23.attention.wo.scales', 'model.layers.13.feed_forward.w2.qzeros', 'model.layers.31.attention.wqkv.qweight', 'model.layers.9.feed_forward.w3.qzeros', 'model.layers.6.feed_forward.w1.qweight', 'model.layers.5.feed_forward.w2.qzeros', 'model.layers.3.feed_forward.w3.scales', 'model.layers.13.feed_forward.w2.qweight', 'model.layers.1.attention.wo.scales', 'model.layers.3.feed_forward.w3.qzeros', 'model.layers.7.feed_forward.w3.scales', 'model.layers.30.feed_forward.w1.qzeros', 'model.layers.27.feed_forward.w3.qweight', 'model.layers.21.feed_forward.w3.scales', 'model.layers.6.feed_forward.w1.scales', 'model.layers.16.feed_forward.w3.qzeros', 'model.layers.4.feed_forward.w2.scales', 'model.layers.28.attention.wqkv.qzeros', 'model.layers.25.attention.wo.scales', 'model.layers.11.feed_forward.w2.scales', 'model.layers.29.feed_forward.w1.qzeros', 'model.layers.11.feed_forward.w2.qzeros', 'model.layers.21.attention.wo.qzeros', 'model.layers.0.attention.wo.scales', 'model.layers.30.feed_forward.w2.qweight', 
'model.layers.27.attention.wqkv.qweight', 'model.layers.9.feed_forward.w1.scales', 'model.layers.2.feed_forward.w3.scales', 'model.layers.24.feed_forward.w2.qzeros', 'model.layers.20.attention.wqkv.qweight', 'model.layers.27.feed_forward.w1.scales', 'model.layers.18.attention.wo.qweight', 'model.layers.10.feed_forward.w3.qzeros', 'model.layers.17.attention.wo.qweight', 'model.layers.3.attention.wqkv.qzeros', 'model.layers.14.feed_forward.w1.scales', 'model.layers.12.feed_forward.w2.qweight', 'model.layers.11.feed_forward.w1.qzeros', 'model.layers.18.feed_forward.w1.qzeros', 'model.layers.8.attention.wo.scales', 'model.layers.17.feed_forward.w2.qweight', 'model.layers.27.attention.wqkv.scales', 'model.layers.12.feed_forward.w2.scales', 'model.layers.0.feed_forward.w1.qweight', 'model.layers.9.attention.wo.qweight', 'model.layers.3.feed_forward.w2.scales', 'model.layers.0.feed_forward.w2.scales', 'model.layers.2.attention.wo.qweight', 'model.layers.8.feed_forward.w1.scales', 'model.layers.21.feed_forward.w1.qzeros', 'model.layers.14.attention.wqkv.scales', 'model.layers.4.feed_forward.w1.qweight', 'model.layers.30.attention.wo.qzeros', 'model.layers.1.feed_forward.w3.qweight', 'model.layers.18.feed_forward.w3.qweight', 'model.layers.9.feed_forward.w2.scales', 'model.layers.31.feed_forward.w2.qzeros', 'model.layers.16.attention.wqkv.scales', 'model.layers.25.attention.wqkv.scales', 'model.layers.9.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w2.qzeros', 'model.layers.23.feed_forward.w2.scales', 'model.layers.12.attention.wqkv.qweight', 'model.layers.16.attention.wo.scales', 'model.layers.7.feed_forward.w1.qzeros', 'model.layers.27.attention.wqkv.qzeros', 'model.layers.19.feed_forward.w2.qweight', 'model.layers.20.attention.wo.qzeros', 'model.layers.26.feed_forward.w3.qzeros', 'model.layers.3.attention.wqkv.scales', 'model.layers.23.feed_forward.w3.qzeros', 'model.layers.7.attention.wo.qweight', 'model.layers.14.feed_forward.w2.scales', 
'model.layers.1.attention.wo.qzeros', 'model.layers.20.attention.wqkv.qzeros', 'model.layers.7.feed_forward.w2.scales', 'model.layers.0.attention.wo.qzeros', 'model.layers.28.feed_forward.w3.qweight', 'model.layers.7.attention.wo.scales', 'model.layers.23.feed_forward.w1.qweight', 'model.layers.21.attention.wqkv.scales', 'model.layers.5.attention.wo.qzeros', 'model.layers.12.feed_forward.w1.qzeros', 'model.layers.24.feed_forward.w3.qweight', 'model.layers.17.feed_forward.w1.qzeros', 'model.layers.26.attention.wqkv.scales', 'model.layers.8.attention.wqkv.qzeros', 'model.layers.8.attention.wqkv.qweight', 'model.layers.6.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w3.scales', 'model.layers.14.feed_forward.w3.qzeros', 'model.layers.28.attention.wo.scales', 'model.layers.15.attention.wqkv.scales', 'model.layers.19.feed_forward.w3.scales', 'model.layers.17.feed_forward.w1.qweight', 'model.layers.25.attention.wo.qzeros', 'model.layers.16.attention.wqkv.qzeros', 'model.layers.16.feed_forward.w1.qzeros', 'model.layers.30.feed_forward.w1.qweight', 'model.layers.16.attention.wo.qweight', 'model.layers.2.attention.wqkv.qzeros', 'model.layers.23.feed_forward.w1.scales', 'model.layers.10.feed_forward.w1.qweight', 'model.layers.29.feed_forward.w2.qzeros', 'model.layers.6.attention.wqkv.qzeros', 'model.layers.15.feed_forward.w1.qweight', 'model.layers.0.feed_forward.w2.qzeros', 'model.layers.27.feed_forward.w3.qzeros', 'model.layers.31.attention.wo.qzeros', 'model.layers.2.attention.wo.scales', 'model.layers.17.feed_forward.w3.qzeros', 'model.layers.19.feed_forward.w2.scales', 'model.layers.18.attention.wo.qzeros', 'model.layers.12.attention.wo.scales', 'model.layers.27.attention.wo.qweight', 'model.layers.30.attention.wo.scales', 'model.layers.20.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w1.scales', 'model.layers.30.attention.wqkv.qzeros', 'model.layers.30.feed_forward.w2.scales', 'model.layers.29.feed_forward.w3.scales', 
'model.layers.8.feed_forward.w1.qzeros', 'model.layers.0.attention.wqkv.scales', 'model.layers.27.feed_forward.w2.qweight', 'model.layers.21.feed_forward.w2.qzeros', 'model.layers.23.attention.wqkv.qweight', 'model.layers.7.feed_forward.w3.qzeros', 'model.layers.2.feed_forward.w1.qweight', 'model.layers.20.feed_forward.w3.scales', 'model.layers.15.attention.wo.qzeros', 'model.layers.5.attention.wqkv.scales', 'model.layers.10.feed_forward.w2.qweight', 'model.layers.22.attention.wqkv.qzeros', 'model.layers.13.feed_forward.w2.scales', 'model.layers.15.feed_forward.w3.qzeros', 'model.layers.5.feed_forward.w3.qzeros', 'model.layers.28.feed_forward.w1.qweight', 'model.layers.11.feed_forward.w1.qweight', 'model.layers.11.feed_forward.w2.qweight', 'model.layers.21.feed_forward.w2.scales', 'model.layers.1.attention.wqkv.qweight', 'model.layers.5.feed_forward.w3.qweight', 'model.layers.25.feed_forward.w3.scales', 'model.layers.12.feed_forward.w2.qzeros', 'model.layers.28.feed_forward.w2.qweight', 'model.layers.20.attention.wo.scales', 'model.layers.13.attention.wqkv.scales', 'model.layers.13.attention.wqkv.qweight', 'model.layers.18.feed_forward.w1.scales', 'model.layers.28.feed_forward.w2.scales', 'model.layers.19.feed_forward.w2.qzeros', 'model.layers.31.attention.wqkv.scales', 'model.layers.5.feed_forward.w2.qweight', 'model.layers.15.attention.wqkv.qweight', 'model.layers.13.attention.wqkv.qzeros', 'model.layers.16.feed_forward.w2.scales', 'model.layers.25.feed_forward.w2.qweight', 'model.layers.15.feed_forward.w1.qzeros', 'model.layers.16.feed_forward.w2.qweight', 'model.layers.9.feed_forward.w2.qweight', 'model.layers.17.feed_forward.w1.scales', 'model.layers.25.feed_forward.w2.qzeros', 'model.layers.26.attention.wqkv.qweight', 'model.layers.9.attention.wqkv.qweight', 'model.layers.9.attention.wqkv.scales', 'model.layers.26.feed_forward.w2.qzeros', 'model.layers.31.feed_forward.w1.qzeros', 'model.layers.16.attention.wqkv.qweight', 
'model.layers.4.feed_forward.w1.scales', 'model.layers.6.attention.wqkv.scales', 'model.layers.19.attention.wqkv.qweight', 'model.layers.10.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w1.qzeros']
- This IS expected if you are initializing InternLMXComposer2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing InternLMXComposer2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit and are newly initialized: ['model.layers.23.attention.wo.weight', 'model.layers.31.attention.wqkv.weight', 'model.layers.24.feed_forward.w3.weight', 'model.layers.7.feed_forward.w3.weight', 'model.layers.18.attention.wqkv.weight', 'model.layers.19.feed_forward.w2.weight', 'model.layers.30.attention.wo.weight', 'model.layers.11.attention.wo.weight', 'model.layers.3.attention.wqkv.weight', 'model.layers.25.feed_forward.w2.weight', 'model.layers.9.attention.wo.weight', 'model.layers.10.feed_forward.w3.weight', 'model.layers.0.feed_forward.w3.weight', 'model.layers.28.feed_forward.w2.weight', 'model.layers.19.feed_forward.w1.weight', 'model.layers.12.feed_forward.w2.weight', 'model.layers.31.feed_forward.w2.weight', 'model.layers.0.attention.wqkv.weight', 'model.layers.31.feed_forward.w3.weight', 'model.layers.22.feed_forward.w1.weight', 'model.layers.4.feed_forward.w3.weight', 'model.layers.2.feed_forward.w3.weight', 'model.layers.3.attention.wo.weight', 'model.layers.29.feed_forward.w1.weight', 'model.layers.28.attention.wo.weight', 'model.layers.31.attention.wo.weight', 'model.layers.23.feed_forward.w2.weight', 'model.layers.2.feed_forward.w1.weight', 'model.layers.14.feed_forward.w2.weight', 'model.layers.20.feed_forward.w2.weight', 'model.layers.12.attention.wo.weight', 'model.layers.10.attention.wo.weight', 'model.layers.1.feed_forward.w2.weight', 'model.layers.25.attention.wqkv.weight', 'model.layers.17.attention.wqkv.weight', 'model.layers.0.feed_forward.w1.weight', 'model.layers.10.feed_forward.w1.weight', 'model.layers.14.feed_forward.w3.weight', 'model.layers.30.feed_forward.w3.weight', 'model.layers.19.attention.wqkv.weight', 'model.layers.16.attention.wo.weight', 'model.layers.6.feed_forward.w3.weight', 'model.layers.26.feed_forward.w1.weight', 'model.layers.15.attention.wqkv.weight', 
'model.layers.21.feed_forward.w1.weight', 'model.layers.13.attention.wqkv.weight', 'model.layers.22.attention.wo.weight', 'model.layers.11.feed_forward.w2.weight', 'model.layers.11.feed_forward.w3.weight', 'model.layers.9.attention.wqkv.weight', 'model.layers.0.attention.wo.weight', 'model.layers.16.feed_forward.w3.weight', 'model.layers.18.feed_forward.w1.weight', 'model.layers.11.feed_forward.w1.weight', 'model.layers.12.feed_forward.w3.weight', 'model.layers.17.feed_forward.w2.weight', 'model.layers.1.attention.wqkv.weight', 'model.layers.6.attention.wqkv.weight', 'model.layers.22.attention.wqkv.weight', 'model.layers.16.feed_forward.w2.weight', 'model.layers.20.attention.wo.weight', 'model.layers.5.attention.wo.weight', 'model.layers.18.feed_forward.w2.weight', 'model.layers.17.feed_forward.w3.weight', 'model.layers.29.feed_forward.w2.weight', 'model.layers.26.feed_forward.w2.weight', 'model.layers.28.feed_forward.w3.weight', 'model.layers.21.attention.wqkv.weight', 'model.layers.14.attention.wqkv.weight', 'model.layers.5.attention.wqkv.weight', 'model.layers.6.feed_forward.w2.weight', 'model.layers.22.feed_forward.w2.weight', 'model.layers.2.feed_forward.w2.weight', 'model.layers.3.feed_forward.w1.weight', 'model.layers.31.feed_forward.w1.weight', 'model.layers.16.feed_forward.w1.weight', 'model.layers.21.attention.wo.weight', 'model.layers.4.attention.wo.weight', 'model.layers.26.feed_forward.w3.weight', 'model.layers.9.feed_forward.w1.weight', 'model.layers.29.attention.wo.weight', 'model.layers.29.attention.wqkv.weight', 'model.layers.9.feed_forward.w2.weight', 'model.layers.27.feed_forward.w2.weight', 'model.layers.28.feed_forward.w1.weight', 'model.layers.26.attention.wo.weight', 'model.layers.8.attention.wo.weight', 'model.layers.12.attention.wqkv.weight', 'model.layers.8.feed_forward.w3.weight', 'model.layers.10.feed_forward.w2.weight', 'model.layers.2.attention.wqkv.weight', 'model.layers.4.feed_forward.w2.weight', 
'model.layers.24.feed_forward.w2.weight', 'model.layers.15.feed_forward.w1.weight', 'model.layers.27.attention.wo.weight', 'model.layers.3.feed_forward.w2.weight', 'model.layers.16.attention.wqkv.weight', 'model.layers.1.feed_forward.w1.weight', 'model.layers.8.feed_forward.w1.weight', 'model.layers.20.feed_forward.w1.weight', 'model.layers.25.feed_forward.w1.weight', 'model.layers.7.attention.wqkv.weight', 'model.layers.20.feed_forward.w3.weight', 'model.layers.13.feed_forward.w2.weight', 'model.layers.7.attention.wo.weight', 'model.layers.23.feed_forward.w3.weight', 'model.layers.18.attention.wo.weight', 'model.layers.4.feed_forward.w1.weight', 'model.layers.23.attention.wqkv.weight', 'model.layers.5.feed_forward.w1.weight', 'model.layers.19.attention.wo.weight', 'model.layers.4.attention.wqkv.weight', 'model.layers.13.attention.wo.weight', 'model.layers.13.feed_forward.w1.weight', 'model.layers.24.attention.wqkv.weight', 'model.layers.27.feed_forward.w1.weight', 'model.layers.29.feed_forward.w3.weight', 'model.layers.9.feed_forward.w3.weight', 'model.layers.14.feed_forward.w1.weight', 'model.layers.6.feed_forward.w1.weight', 'model.layers.2.attention.wo.weight', 'model.layers.30.feed_forward.w2.weight', 'model.layers.21.feed_forward.w3.weight', 'model.layers.1.attention.wo.weight', 'model.layers.14.attention.wo.weight', 'model.layers.7.feed_forward.w2.weight', 'model.layers.17.attention.wo.weight', 'model.layers.8.feed_forward.w2.weight', 'model.layers.11.attention.wqkv.weight', 'model.layers.3.feed_forward.w3.weight', 'model.layers.5.feed_forward.w2.weight', 'model.layers.10.attention.wqkv.weight', 'model.layers.21.feed_forward.w2.weight', 'model.layers.19.feed_forward.w3.weight', 'model.layers.30.feed_forward.w1.weight', 'model.layers.25.attention.wo.weight', 'model.layers.26.attention.wqkv.weight', 'model.layers.15.attention.wo.weight', 'model.layers.27.attention.wqkv.weight', 'model.layers.18.feed_forward.w3.weight', 'model.layers.28.attention.wqkv.weight', 
'model.layers.7.feed_forward.w1.weight', 'model.layers.5.feed_forward.w3.weight', 'model.layers.23.feed_forward.w1.weight', 'model.layers.1.feed_forward.w3.weight', 'model.layers.24.attention.wo.weight', 'model.layers.8.attention.wqkv.weight', 'model.layers.15.feed_forward.w3.weight', 'model.layers.20.attention.wqkv.weight', 'model.layers.27.feed_forward.w3.weight', 'model.layers.17.feed_forward.w1.weight', 'model.layers.12.feed_forward.w1.weight', 'model.layers.24.feed_forward.w1.weight', 'model.layers.15.feed_forward.w2.weight', 'model.layers.6.attention.wo.weight', 'model.layers.30.attention.wqkv.weight', 'model.layers.25.feed_forward.w3.weight', 'model.layers.22.feed_forward.w3.weight', 'model.layers.13.feed_forward.w3.weight', 'model.layers.0.feed_forward.w2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.53it/s]
Some weights of the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit were not used when initializing InternLMXComposer2ForCausalLM: ['model.layers.3.feed_forward.w1.scales', 'model.layers.21.attention.wqkv.scales', 'model.layers.14.feed_forward.w2.qzeros', 'model.layers.30.feed_forward.w1.scales', 'model.layers.0.feed_forward.w1.qzeros', 'model.layers.2.attention.wqkv.scales', 'model.layers.17.attention.wo.qweight', 'model.layers.15.attention.wo.qzeros', 'model.layers.15.attention.wqkv.scales', 'model.layers.21.feed_forward.w1.qzeros', 'model.layers.29.attention.wqkv.qzeros', 
[... output truncated ...]
ers.23.attention.wqkv.weight', 'model.layers.27.feed_forward.w3.weight', 'model.layers.8.attention.wqkv.weight', 'model.layers.15.attention.wo.weight', 'model.layers.25.feed_forward.w2.weight', 'model.layers.18.feed_forward.w1.weight', 'model.layers.19.attention.wqkv.weight', 'model.layers.21.feed_forward.w3.weight', 'model.layers.22.attention.wqkv.weight', 'model.layers.30.attention.wqkv.weight', 'model.layers.27.attention.wqkv.weight', 'model.layers.24.attention.wo.weight', 'model.layers.29.attention.wo.weight', 'model.layers.20.feed_forward.w3.weight', 'model.layers.6.feed_forward.w3.weight', 'model.layers.1.feed_forward.w3.weight', 'model.layers.29.feed_forward.w3.weight', 'model.layers.19.attention.wo.weight', 'model.layers.25.feed_forward.w1.weight', 'model.layers.17.feed_forward.w1.weight', 'model.layers.21.attention.wo.weight', 'model.layers.27.feed_forward.w2.weight', 'model.layers.30.feed_forward.w3.weight', 'model.layers.27.attention.wo.weight', 'model.layers.6.feed_forward.w1.weight', 'model.layers.13.feed_forward.w3.weight', 'model.layers.16.feed_forward.w1.weight', 'model.layers.4.attention.wo.weight', 'model.layers.29.attention.wqkv.weight', 'model.layers.17.attention.wo.weight', 'model.layers.13.attention.wo.weight', 'model.layers.10.feed_forward.w2.weight', 'model.layers.9.feed_forward.w3.weight', 'model.layers.24.feed_forward.w1.weight', 'model.layers.8.feed_forward.w2.weight', 'model.layers.4.feed_forward.w2.weight', 'model.layers.23.feed_forward.w3.weight', 'model.layers.2.feed_forward.w1.weight', 'model.layers.0.feed_forward.w2.weight', 'model.layers.3.attention.wqkv.weight', 'model.layers.7.attention.wo.weight', 'model.layers.18.feed_forward.w3.weight', 'model.layers.4.feed_forward.w1.weight', 'model.layers.15.feed_forward.w2.weight', 'model.layers.17.feed_forward.w2.weight', 'model.layers.3.feed_forward.w1.weight', 'model.layers.25.feed_forward.w3.weight', 'model.layers.26.feed_forward.w3.weight', 'model.layers.5.attention.wo.weight', 
'model.layers.14.feed_forward.w1.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
init mix data at rank 3
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
Loading data...
Load 10 samples from ['data/single_turn_single_image_example.json', '0.01']
init mix data at rank 0
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 
[The same Accelerate deprecation warning and `ValueError` are printed again by ranks 1 and 2.]
[2024-07-25 17:00:45,016] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 10096 closing signal SIGTERM
[2024-07-25 17:00:45,331] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 10095) of binary: /home/ubuntu/miniconda3/envs/intern_clean/bin/python
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/intern_clean/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-07-25_17:00:45
  host      : ip-172-31-18-91.ec2.internal
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 10097)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-07-25_17:00:45
  host      : ip-172-31-18-91.ec2.internal
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 10098)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-07-25_17:00:45
  host      : ip-172-31-18-91.ec2.internal
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 10095)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Also, I am using finetune.py without any changes.
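For context, the `ValueError` raised in `transformers/trainer.py` can fire even when bitsandbytes is installed: the Trainer refuses a quantized base model unless PEFT adapters are attached on top of it. Below is a rough, hypothetical mock of that kind of guard (the attribute names are illustrative, not the actual transformers source):

```python
# Hypothetical mock of the guard transformers.Trainer applies to quantized
# models; attribute names here are illustrative, not the real source code.

class FakeModel:
    def __init__(self, quantized=False, peft_adapters=False):
        self.is_loaded_in_8bit = quantized           # set when loaded via bitsandbytes
        self._hf_peft_config_loaded = peft_adapters  # set when PEFT wraps the model

def trainable(model) -> bool:
    """A quantized base model is only trainable when adapters sit on top of it."""
    quantized = getattr(model, "is_loaded_in_8bit", False)
    adapters = getattr(model, "_hf_peft_config_loaded", False)
    return (not quantized) or adapters

print(trainable(FakeModel(quantized=True)))                      # quantized, no adapters
print(trainable(FakeModel(quantized=True, peft_adapters=True)))  # quantized + adapters
```

Under this reading, the fix is less about the bitsandbytes version and more about making sure the LoRA/PEFT wrapping actually happens before the `Trainer` is constructed.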

Screenshot 2024-07-25 at 19 03 49

@yuhangzang (Collaborator)
Do you use the fine-tune code for IXC 2.0? It is different from the IXC 2.5 finetune code.

@zhuraromdev (Author) commented Jul 28, 2024

Hey, yep, I am using the code from here: https://github.com/InternLM/InternLM-XComposer/blob/main/finetune/finetune.py

@zhuraromdev (Author) commented Jul 28, 2024

@yuhangzang I have tried to run the code with the 2.0 version; however, I am still getting the same error:

Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/InternLM-XComposer-2.0/finetune/finetune.py", line 318, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/InternLM-XComposer-2.0/finetune/finetune.py", line 305, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 

Let me know which additional information is needed. Thank you!
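Since the error message mentions `bitsandbytes>=0.37.0`, it may also be worth confirming that the package is actually importable, and recent enough, in the training environment (the same `ValueError` fires when it is simply missing). A minimal stdlib-only check, with deliberately simplified version parsing (numeric parts only), might look like:

```python
# Sanity check that bitsandbytes is installed and recent enough in the
# *training* environment. Version parsing is simplified: non-numeric
# components such as '.post1' or release candidates are ignored.
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    """'0.43.1' -> (0, 43, 1); drops non-numeric parts like 'post1'."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def bitsandbytes_ok(minimum=(0, 37, 0)) -> bool:
    try:
        return parse_version(version("bitsandbytes")) >= minimum
    except PackageNotFoundError:
        return False

if __name__ == "__main__":
    print("bitsandbytes >= 0.37.0:", bitsandbytes_ok())
```

Running this inside the same conda env that launches `finetune.py` (rather than the shell default) would rule out a plain missing-dependency cause before digging into the quantized-model path.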
