Different perplexity when fine-tuning with parallel_mode vs 1 GPU #86
Hey Randy,
probably unrelated to your issue, but bear in mind that Llama 3 uses RoPE scaling, which is not implemented in Eole yet.
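One quick way to see whether a given base checkpoint declares a RoPE scaling scheme is to inspect its Hugging Face config; a minimal sketch (the model id here is illustrative) could look like this:

```python
# Minimal sketch (illustrative model id): inspect the Hugging Face config of
# the base checkpoint to see whether it declares a rope_scaling scheme that
# the fine-tuning framework would need to replicate.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print("rope_theta:", getattr(cfg, "rope_theta", None))
print("rope_scaling:", getattr(cfg, "rope_scaling", None))  # None if no scaling is declared
```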
Hello both, thanks for your replies. I will check your suggestions as soon as possible and will keep you posted.
Hello everyone,

Tensorboard logs

Output values
Different values in decoder forward:

1 GPU
- Layer_in norm: 389.75; norm_layer_in Euclidean distance to zero: 691.5
- Layer nr 4: Layer_in norm: 391.25; norm_layer_in Euclidean distance to zero: 620.0

Tensor parallel
- Layer_in norm: 389.75; norm_layer_in Euclidean distance to zero: 691.5
- Layer nr 4: Layer_in norm: 391.5; norm_layer_in Euclidean distance to zero: 620.0

Checkpoint size
The sizes (KB) of the 400th checkpoint for the 1 gpu model are:

Could you please advise? Thanks!
output.csv
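For localizing where the two runs diverge, a minimal sketch like the following might help (the paths are hypothetical, and it assumes the checkpoints are plain torch-loadable state dicts; tensor-parallel shards may need to be merged first):

```python
# Minimal sketch (hypothetical paths): compare per-tensor L2 norms of the
# step-400 checkpoints from both runs to localize where the weights diverge.
import torch

sd_single = torch.load("run_1gpu/step_400.pt", map_location="cpu")
sd_parallel = torch.load("run_parallel/step_400.pt", map_location="cpu")

for name in sorted(set(sd_single) | set(sd_parallel)):
    a, b = sd_single.get(name), sd_parallel.get(name)
    if a is None or b is None:
        print(f"{name}: present in only one checkpoint")
    elif a.shape != b.shape:
        # Expected for sharded tensor-parallel weights that were not merged.
        print(f"{name}: shape mismatch {tuple(a.shape)} vs {tuple(b.shape)}")
    else:
        diff = (a.float() - b.float()).norm().item()
        if diff > 0:
            print(f"{name}: L2 diff {diff:.4f} "
                  f"(norms {a.float().norm().item():.2f} / {b.float().norm().item():.2f})")
```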
Hello,
we have noticed some unexpected behavior when fine-tuning a Llama 3 model on 1 GPU versus fine-tuning the same model on the same dataset with 2 GPUs in parallel mode. See the attached tensorboard graphs (red = run with parallel mode). The minimal validation perplexity differs between the two runs.
As you can see from the configs I am pasting below, the only parameters that differ between the runs are: world_size, gpu_rank and parallel_mode.
Could you please advise?
Configs for run with 1 GPU
Configs for run with parallel_mode
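To rule out any other accidental difference between the two configs above, a small diff script can confirm that only the distributed-training keys change; a minimal sketch (file names are hypothetical, and it compares top-level keys only):

```python
# Minimal sketch (hypothetical file names): diff the two YAML run configs to
# verify that only world_size, gpu_rank, and parallel_mode differ.
import yaml  # requires PyYAML

with open("finetune_llama3_1gpu.yaml") as f:
    cfg_single = yaml.safe_load(f)
with open("finetune_llama3_parallel.yaml") as f:
    cfg_parallel = yaml.safe_load(f)

# Report every top-level key whose value differs between the two runs.
for key in sorted(set(cfg_single) | set(cfg_parallel)):
    if cfg_single.get(key) != cfg_parallel.get(key):
        print(f"{key}: {cfg_single.get(key)!r} -> {cfg_parallel.get(key)!r}")
```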