How much memory is "enough"? #330

Open
pattang56892 opened this issue May 3, 2024 · 5 comments

Comments

@pattang56892

No description provided.

@bhaswata08

bhaswata08 commented May 16, 2024

Calculate the amount of VRAM you need for inference: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

@bhaswata08

https://rahulschand.github.io/gpu_poor/

@pattang56892
Author

My question is: how much GPU memory is needed to test Grok-1?

@bhaswata08

bhaswata08 commented May 22, 2024

I am assuming you are not running any quantization algorithms.
For inference: Grok-1 activates 2 of its 8 experts per token, so roughly 314/4 ≈ 78B parameters are active. Let us assume a standard context length of 4096 tokens and a generation of 4096 tokens for simplicity. If you do not care about tokens/s, you will need roughly 250 GB of VRAM. In practice, however, to run it with a stable runtime, usable speed, and proper generation, you may need around 8x NVIDIA H100 80 GB.

The above is an estimated ballpark; I can't give exact numbers because the config info for Grok is missing from Hugging Face. If you want to work out the actual memory requirement, I would suggest https://huggingface.co/blog/Andyrasika/memory-consumption-estimation. (A rough sketch of this arithmetic follows below.)
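
For a concrete feel for the ballpark above, here is a minimal back-of-envelope sketch in Python. It assumes the publicly stated figures for Grok-1 (314B total parameters, 8 experts, 2 active per token) and fills in illustrative layer/head/dimension values, since the official config is not available; treat the output as a rough estimate, not an exact requirement.

```python
# Rough back-of-envelope VRAM estimate for Grok-1 inference (no quantization).
# Assumed figures (the official config is not on Hugging Face, so these are
# the publicly stated numbers plus illustrative guesses, not exact values):
#   - 314B total parameters, 8 experts, 2 active per token
#   - all expert weights must stay resident, since any expert can be routed to

GiB = 1024 ** 3

total_params = 314e9        # total parameters, all experts included
bytes_per_param = 2         # bf16/fp16 weights; 1 for int8, 0.5 for 4-bit

weights_gib = total_params * bytes_per_param / GiB

# Simple KV-cache estimate for batch size 1:
#   2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value
# The layer/head/dim values here are illustrative assumptions only.
n_layers, n_kv_heads, head_dim = 64, 8, 128
seq_len = 4096 + 4096       # 4096-token prompt + 4096 generated tokens
kv_cache_gib = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param / GiB

print(f"weights   ~{weights_gib:.0f} GiB")
print(f"KV cache  ~{kv_cache_gib:.1f} GiB (batch size 1)")
print(f"total     ~{weights_gib + kv_cache_gib:.0f} GiB + activation/runtime overhead")
# ~585 GiB of weights alone already spans multiple 80 GB GPUs, which is why
# an 8x H100 80 GB node (640 GB total) is the usual recommendation.
```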

@pattang56892
Author

Thank you so much!
Very useful information.
Do you know why the recommended setup uses 8 GPUs?
