How much memory is "enough"? #330

Open
pattang56892 opened this issue May 3, 2024 · 5 comments

Comments

@pattang56892

No description provided.

@bhaswata08

bhaswata08 commented May 16, 2024

Calculate the amount of VRAM you need for inference: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

@bhaswata08

https://rahulschand.github.io/gpu_poor/

@pattang56892
Author

My question is: how much GPU memory is needed to test Grok-1?

@bhaswata08

bhaswata08 commented May 22, 2024

I am assuming you are not running any quantization algorithms.
For inference: Grok-1 activates 2 of its 8 experts per token, so roughly 314/4 ≈ 78B parameters are active. Let us assume a standard context length of 4096 tokens and a generation of 4096 tokens for simplicity. If you do not care about tokens/s, you will need roughly 250 GB of VRAM. In practice, however, to run it with a stable runtime, usable speed, and proper generation, you may need around 8x NVIDIA H100 80 GB.

The above is an estimated ballpark; I can't give exact numbers because the config info for Grok is missing from Hugging Face. If you want to work out the actual memory requirement, I would suggest https://huggingface.co/blog/Andyrasika/memory-consumption-estimation. (A rough sketch of this arithmetic follows below.)
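
For a concrete feel for the ballpark above, here is a minimal back-of-envelope sketch in Python. It assumes the publicly stated figures for Grok-1 (314B total parameters, 8 experts, 2 active per token) and fills in illustrative layer/head/dimension values, since the official config is not available; treat the output as a rough estimate, not an exact requirement.

```python
# Rough back-of-envelope VRAM estimate for Grok-1 inference (no quantization).
# Assumed figures (the official config is not on Hugging Face, so these are
# the publicly stated numbers plus illustrative guesses, not exact values):
#   - 314B total parameters, 8 experts, 2 active per token
#   - all expert weights must stay resident, since any expert can be routed to

GiB = 1024 ** 3

total_params = 314e9        # total parameters, all experts included
bytes_per_param = 2         # bf16/fp16 weights; 1 for int8, 0.5 for 4-bit

weights_gib = total_params * bytes_per_param / GiB

# Simple KV-cache estimate for batch size 1:
#   2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value
# The layer/head/dim values here are illustrative assumptions only.
n_layers, n_kv_heads, head_dim = 64, 8, 128
seq_len = 4096 + 4096       # 4096-token prompt + 4096 generated tokens
kv_cache_gib = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param / GiB

print(f"weights   ~{weights_gib:.0f} GiB")
print(f"KV cache  ~{kv_cache_gib:.1f} GiB (batch size 1)")
print(f"total     ~{weights_gib + kv_cache_gib:.0f} GiB + activation/runtime overhead")
# ~585 GiB of weights alone already spans multiple 80 GB GPUs, which is why
# an 8x H100 80 GB node (640 GB total) is the usual recommendation.
```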

@pattang56892
Author

Thank you so much!
Very useful information.
Do you know why the recommended setup uses 8 GPUs?
