
Tesla P40 does not support flash_attn: pipeline inference reports an error #432

Open

love94me opened this issue Aug 26, 2024 · 3 comments

@love94me
```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline(
    '/home/workspace/model/internlm-xcomposer2d5-7b',
    backend_config=TurbomindEngineConfig(tp=4),
    model_name='yinhe-vl-chat',
)
image = load_image('/home/workspace/123.png')

response = pipe(('describe this image', image))
print(response.text)
```

How can I turn this off so that flash_attn acceleration is not used?

```
(.venv) root@e9c6a6e513d9:/home/workspace#  cd /home/workspace ; /usr/bin/env /home/workspace/.venv/bin/python /config/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 44773 -- /home/workspace/test.py
/home/workspace/.venv/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-08-26 18:06:01,470 - modelscope - INFO - PyTorch version 2.2.1+cu118 Found.
2024-08-26 18:06:01,475 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-08-26 18:06:01,553 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 678b20b607a9b7a970b699aea2b84212 and a total number of 980 components indexed
Set max length to 16384
Dummy Resized
Device does not support bfloat16. Set float16 forcefully
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR]  Assertion fail: /lmdeploy/src/turbomind/kernels/attention/attention.cu:35
```
@yuhangzang
Collaborator

You can check the definition in config.json and configuration_internlm_xcomposer2.py.

Note that you may run into out-of-memory problems if you do not use flash-attention.
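
For reference, a minimal sketch of what that setting looks like at the transformers level (paths taken from this issue; that InternLM-XComposer's remote code honours the `attn_implementation` kwarg is an assumption, not something verified here):

```python
# Sketch only: assumes transformers >= 4.36, where from_pretrained accepts
# attn_implementation; 'eager' selects the plain PyTorch attention path.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    '/home/workspace/model/internlm-xcomposer2d5-7b',
    torch_dtype=torch.float16,    # the P40 has no bfloat16 support (see log above)
    attn_implementation='eager',  # do not use flash_attn / SDPA kernels
    trust_remote_code=True,
)
```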

@love94me
Author

> You can check the definition in config.json and configuration_internlm_xcomposer2.py.
>
> Note that you may run into out-of-memory problems if you do not use flash-attention.

I have changed both config.json and configuration_internlm_xcomposer2.py to the following, but the problem persists. The error only appears when I run pipeline inference.

`"attn_implementation": "eager",`

@yuhangzang
Collaborator

You may set a breakpoint here and check the value of config.attn_implementation.
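
One way to run that check outside a debugger (a sketch assuming the standard transformers config API; the attribute name follows this thread):

```python
# Sketch: confirm the value that actually gets loaded from the edited config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    '/home/workspace/model/internlm-xcomposer2d5-7b',
    trust_remote_code=True,
)
print(getattr(config, 'attn_implementation', None))  # expect 'eager' after the edit
```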
