This repo contains the solution for the Feedback Prize - English Language Learning competition. This approach achieved a silver medal (25th out of 2654 participants).
The goal of this competition was to assess the language proficiency of 8th-12th grade English Language Learners (ELLs). Utilizing a dataset of essays written by ELLs helps to develop proficiency models that better support all students.
Submissions were scored using MCRMSE, the mean columnwise root mean squared error: the RMSE is computed separately for each of the target score columns, then averaged across columns.
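The metric can be sketched in a few lines of NumPy (an illustrative helper, not code from this repo):

```python
import numpy as np

def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE: RMSE per target column, averaged over columns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    col_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))  # one RMSE per column
    return float(np.mean(col_rmse))

# Perfect predictions score 0; per-column errors of 3 and 4 average to 3.5
print(mcrmse([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # 0.0
print(mcrmse([[0, 0]], [[3, 4]]))                  # 3.5
```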
This idea is motivated by the paper The Power of Scale for Parameter-Efficient Prompt Tuning. We prepended 40 continuous (soft) tokens to the actual sequence while keeping the model parameters frozen, so only the prompt embeddings are trained. The backbone used for this was DeBERTa-v3-large.
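A minimal sketch of this prompt-tuning setup in PyTorch, using tiny toy stand-ins for the frozen DeBERTa embedding layer and encoder (the module names and sizes here are illustrative, not from this repo):

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Prepends 40 trainable 'continuous token' embeddings to the input
    embeddings while the backbone (embedding layer + encoder) stays frozen."""
    def __init__(self, backbone_embed, backbone_encoder, n_prompt_tokens=40, hidden=16):
        super().__init__()
        self.embed = backbone_embed      # frozen embedding layer
        self.encoder = backbone_encoder  # frozen transformer body (toy stand-in)
        for p in list(self.embed.parameters()) + list(self.encoder.parameters()):
            p.requires_grad = False      # freeze the backbone
        # The soft prompt is the only trainable tensor
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)

    def forward(self, input_ids):
        tok = self.embed(input_ids)                                   # (B, L, H)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.encoder(torch.cat([prompt, tok], dim=1))          # (B, 40+L, H)

# Toy backbone: an embedding table and a linear layer in place of the encoder
embed = nn.Embedding(100, 16)
encoder = nn.Linear(16, 16)
model = SoftPromptModel(embed, encoder)
out = model(torch.randint(0, 100, (2, 5)))
print(out.shape)  # torch.Size([2, 45, 16]) -- 40 prompt tokens + 5 input tokens
```

With the backbone frozen, an optimizer built over `model.parameters()` only updates the 40 x hidden prompt matrix.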
This idea is motivated by the ULMFiT paper. Using different learning rates for different layers helps the model generalize better on the downstream task: earlier layers learn more general features, while later layers learn more task-specific features, so earlier layers should receive smaller updates.
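One common way to implement this is layer-wise learning-rate decay via optimizer parameter groups. A sketch with a toy stack of layers standing in for the transformer blocks (the base learning rate and decay factor are illustrative):

```python
import torch
import torch.nn as nn

def layerwise_lr_groups(layers, base_lr=2e-5, decay=0.9):
    """Discriminative learning rates: the last (most task-specific) layer gets
    base_lr; each earlier layer's lr is multiplied by `decay` once more."""
    groups = []
    for depth, layer in enumerate(reversed(layers)):
        groups.append({"params": layer.parameters(), "lr": base_lr * (decay ** depth)})
    return groups

# Toy 3-layer stack in place of the transformer blocks
layers = [nn.Linear(8, 8) for _ in range(3)]
opt = torch.optim.AdamW(layerwise_lr_groups(layers))
print([g["lr"] for g in opt.param_groups])  # [2e-05, 1.8e-05, 1.62e-05...]
```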
Techniques like AWP (Adversarial Weight Perturbation) and FGM (Fast Gradient Method) were used to prevent overfitting and improve the generalization capability of the model.
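FGM can be sketched as follows: after the normal backward pass, perturb the embedding weights along the normalized gradient, do a second forward/backward on the perturbed weights, then restore them (AWP perturbs all weights in a similar spirit). This is a minimal illustrative implementation on a toy model, not the repo's code:

```python
import torch
import torch.nn as nn

class FGM:
    """Fast Gradient Method: perturb embedding weights along the gradient
    direction, accumulate adversarial gradients, then restore the weights."""
    def __init__(self, model, emb_name="embed", eps=1.0):
        self.model, self.emb_name, self.eps = model, emb_name, eps
        self.backup = {}

    def attack(self):
        for name, p in self.model.named_parameters():
            if p.requires_grad and self.emb_name in name and p.grad is not None:
                self.backup[name] = p.data.clone()   # save for restore()
                norm = torch.norm(p.grad)
                if norm != 0:
                    p.data.add_(self.eps * p.grad / norm)

    def restore(self):
        for name, p in self.model.named_parameters():
            if name in self.backup:
                p.data = self.backup[name]
        self.backup = {}

# Usage on a toy model whose embedding layer is named "embed"
model = nn.Sequential()
model.add_module("embed", nn.Embedding(10, 4))
model.add_module("head", nn.Linear(4, 1))
loss = model(torch.tensor([[1, 2, 3]])).mean()
loss.backward()                                  # ordinary gradients
fgm = FGM(model)
fgm.attack()                                     # perturb embeddings
adv_loss = model(torch.tensor([[1, 2, 3]])).mean()
adv_loss.backward()                              # adversarial gradients accumulate
fgm.restore()                                    # weights back to original values
```

After `restore()`, the optimizer step uses the summed clean and adversarial gradients.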
Mixed-precision training, layer freezing, gradient accumulation, and gradient checkpointing were implemented to speed up training and prevent CUDA out-of-memory errors.
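Gradient accumulation, for example, steps the optimizer only every few micro-batches, simulating a larger batch than fits in GPU memory. A sketch on a toy model (in a real setup, mixed precision would wrap the forward pass in `torch.autocast`, and checkpointing would wrap blocks with `torch.utils.checkpoint.checkpoint`):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps, n_steps = 4, 0
for i, batch in enumerate(torch.randn(8, 2, 8)):     # 8 micro-batches of size 2
    loss = model(batch).pow(2).mean() / accum_steps  # scale so grads average
    loss.backward()                                  # grads accumulate in .grad
    if (i + 1) % accum_steps == 0:
        opt.step()                                   # one step per 4 micro-batches
        opt.zero_grad()
        n_steps += 1
print(n_steps)  # 2 optimizer steps for 8 micro-batches
```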