
Fine-tuning forgetfulness #163

Open
davidress-ILW opened this issue Jul 28, 2024 · 6 comments

@davidress-ILW

I am working on fine-tuning a model and running into a "forgetful" situation I wanted to bring to your attention.

The two changes we made to the fine-tuning Jupyter notebook are:

  1. Created a PyCharm Python script from it
  2. Changed the output to provide prediction scores

model: urchade/gliner_small
json: sample_data.json
num_steps = 500
batch_size = 8
data_size = 57
num_batches = 7
num_epochs = 7

Before training results:
Cristiano Ronaldo > Person > 0.9846
Ballon d'Or > Award > 0.9413
UEFA Men's Player of the Year Awards > Award > 0.8620
European Golden Shoes > Award > 0.9594

After training, using final model:
Cristiano Ronaldo dos Santos Aveiro > Person > 0.9472
Ballon d'Or awards > Award > 0.8051
UEFA Men's Player of the Year Awards > Award > 0.9852
European Golden Shoes > Award > 0.9863
outfield player > Person > 0.8722

The model retained the original entities (although the scores changed) and even predicted a new entity, so I think the fine-tuning Jupyter notebook works just fine on your sample data.

Our dataset is composed of 72 records; after the 90% split, there are 64 records in the training set and 8 in the test set. All records are for a single label, EntC.

num_steps = 500
batch_size = 8
data_size = 64
num_batches = 8
num_epochs = 62

Before training, results are:
EntA > OurLabel > 0.8799
EntA > OurLabel > 0.8288
EntB > OurLabel > 0.7210
EntA > OurLabel > 0.8052
EntA > OurLabel > 0.7026
EntC > OurLabel > 0.5243
EntA > OurLabel > 0.7475

After training, results are:
EntC > OurLabel > 1.0000

The model now finds EntC with a score of 1.0000, but it is as if the final model completely forgot all the other entities except EntC.
Any thoughts as to why this forgetfulness could be happening?

While I cannot disclose the entity names or the label, I can say that all entities are three characters long.

Any suggestions are appreciated, thank you.

@urchade (Owner) commented Aug 3, 2024

Hi @davidress-ILW

It seems like your model is experiencing catastrophic forgetting, where it heavily overfits to the new data (EntC) and forgets the previous entities. This is a common issue in continual learning and fine-tuning scenarios.

To mitigate this problem, you can use experience replay. This involves maintaining a buffer of original data (in this case the Pile-NER dataset) and periodically using these samples during training. By doing this, you can ensure that the model retains knowledge of the previously learned entities while learning new ones.
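
A minimal sketch of what that replay mixing could look like, assuming the data follows the GLiNER JSON training format and that a local copy of Pile-NER is available; the file names, the helper name `make_training_set`, and the 2x ratio are illustrative, not part of the GLiNER API:

```python
import json
import random

# Load the new fine-tuning data and a local copy of the original Pile-NER data
# (paths are placeholders -- adjust them to your own files).
with open("sample_data.json") as f:
    new_data = json.load(f)
with open("pile_ner.json") as f:
    replay_buffer = json.load(f)  # data the model was originally trained on

def make_training_set(new_data, replay_buffer, replay_ratio=2.0, seed=42):
    """Mix the new dataset with a random sample from the replay buffer.

    replay_ratio controls how many replay records are drawn per new record
    (e.g. 2.0 -> twice as many Pile-NER records as new records).
    """
    rng = random.Random(seed)
    n_replay = int(len(new_data) * replay_ratio)
    replay_sample = rng.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    mixed = list(new_data) + replay_sample
    rng.shuffle(mixed)
    return mixed

train_data = make_training_set(new_data, replay_buffer, replay_ratio=2.0)
```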

@KUMBLE commented Aug 6, 2024

Adding Pile-NER data to my training data fixed this issue.

@urchade What is the best ratio for mixing the Pile-NER dataset with our training dataset?

Pile-NER has 45K+ entries; my training data has only 200+ entries.

@davidress-ILW (Author) commented Aug 7, 2024

@urchade Thank you. I really appreciate you sharing your knowledge with me and the broader community by answering these questions. I found the Pile-NER data, so as @KUMBLE mentioned, is there a preferred way of mixing the Pile-NER data with our custom datasets?

The software you have developed, GLiNER, GraphER, etc., is simply fabulous.

@urchade (Owner) commented Aug 7, 2024

Hi @davidress-ILW. You can try this. Let:

  1. Sample A: your new dataset
  2. Sample B: a dataset sampled from Pile-NER (e.g., 2x the size of Sample A)

Then mix Sample A and Sample B to create the new training data. Optionally, draw a new Sample B after each epoch.
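
A small sketch of that recipe, reusing the `new_data` and `replay_buffer` lists from the earlier sketch; the per-epoch redraw of Sample B is the optional step mentioned above, and the training call itself is left as a comment because it depends on the notebook version:

```python
import random

rng = random.Random(0)
sample_a = new_data          # Sample A: the new dataset
num_epochs = 10              # illustrative

for epoch in range(num_epochs):
    # Sample B: a fresh draw from Pile-NER, roughly 2x the size of Sample A.
    n_b = min(2 * len(sample_a), len(replay_buffer))
    sample_b = rng.sample(replay_buffer, n_b)

    # Mix and shuffle to build this epoch's training data.
    epoch_data = sample_a + sample_b
    rng.shuffle(epoch_data)

    # ... run one epoch of fine-tuning on epoch_data with the notebook's
    # training loop / Trainer (omitted here).
```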

@davidress-ILW (Author)

Hello @urchade

Thank you for the reply on mixing training data with the Pile-NER data.

For my testing, I found Sample B needed to be 5x the size of Sample A.

I then mixed Sample A and Sample B and shuffled them to randomize the data.

I say 5x because that ratio enabled the "best" model found during fine-tuning to predict everything found before fine-tuning at high scores, as well as entities that were previously missed. So the fine-tuning appeared to work.

However, I noticed that the eval_loss metric was always between 220 and 270 (regardless of the mix, i.e., 2x, 3x, 4x, or 5x), which I do not understand. Is there a way to extract all the training metrics from a fine-tuning run? Should I be concerned about the high eval_loss values?
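
(One way to pull those numbers out, assuming the fine-tuning is driven by a Hugging Face-compatible Trainer as in recent versions of the GLiNER fine-tuning notebook, is to read `trainer.state.log_history` after `trainer.train()` finishes; `trainer` below stands for the notebook's Trainer instance, so treat this as a sketch rather than a confirmed recipe:)

```python
import json

# trainer is the (Hugging Face-style) Trainer instance used for fine-tuning.
history = trainer.state.log_history              # list of dicts, one per logging event
eval_losses = [h["eval_loss"] for h in history if "eval_loss" in h]
print(eval_losses)

# Persist the full metric history for later inspection.
with open("training_metrics.json", "w") as f:
    json.dump(history, f, indent=2)
```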

Thank you again for the efforts you and your team have put into GLiNER. It is so much easier to fine-tune than other NER models. I also appreciate the support.

@urchade (Owner) commented Aug 28, 2024

Hi @davidress-ILW,
I recommend focusing more on metrics like the F1-score rather than relying heavily on the loss metric. The loss value is influenced by several factors, and a value of 200 might be close to the lower bound, especially since the loss reduction is set to sum by default. Additionally, the number of spans in an input is L*K, where L represents the length and K is the maximum span size.
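
To illustrate why a summed loss in the low hundreds is plausible, here is a rough back-of-the-envelope calculation; the sequence length and per-span loss below are illustrative numbers, not values taken from the notebook:

```python
# With reduction="sum", the loss is added over every candidate span rather
# than averaged, and each input contributes roughly L * K candidate spans.
seq_len = 128           # L: tokens in the input (illustrative)
max_span_width = 12     # K: maximum span width (GLiNER's default is 12)
spans_per_input = seq_len * max_span_width       # ~1,500 candidate spans
avg_loss_per_span = 0.02                         # illustrative per-span value
batch_size = 8                                   # as used in this thread

total = batch_size * spans_per_input * avg_loss_per_span
print(total)  # ~245 -- in the same ballpark as the 220-270 reported above
```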
