
4bit AWQ/GPTQ quantized model? #4

Open
kenvix opened this issue May 23, 2024 · 2 comments

Comments


kenvix commented May 23, 2024

Thank you for your work.

I would like to know whether there are any plans to release a 4-bit AWQ/GPTQ quantized version of the 70B model, as I don't have enough local resources to run the quantization procedure myself.

yuxie11 (Contributor) commented May 23, 2024

Thank you for your question. We plan to do this in the future.

aoji0606 (Collaborator) commented:
There is a simpler workaround: you can load the model with transformers' built-in quantization by modifying the inference code, for example:

tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, args.model_base, model_name, load_4bit=True)

If you set load_4bit=True, please also set nproc_per_node=1 in the scripts.
Note that this will cause some loss of accuracy.
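
For reference, below is a minimal sketch of roughly what load_4bit=True does under the hood, loading the checkpoint directly through transformers with a BitsAndBytesConfig. The model path and the specific 4-bit settings are illustrative assumptions, not the project's exact defaults:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "path/to/the-70b-checkpoint"  # hypothetical local path

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_compute_dtype=torch.float16,   # run matmuls in fp16
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across available GPUs
)

This quantizes on the fly at load time rather than producing a pre-quantized AWQ/GPTQ checkpoint, so it saves memory but does not avoid downloading the full-precision weights.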
