What should I do if I want to train a Japanese model? #219

ymzlygw · 2021-08-23T07:47:31Z

Hi. If I understand correctly, for English the model outputs character indices directly, so indices can be mapped to and from characters. For Japanese, what does the model output, and how do I create the mapping between indices and Japanese kanji?

ymzlygw · 2021-08-24T07:22:34Z

I see the english_characters, but what about Japanese? And to get the japanese_characters, should the token type be 'char' or 'bpe'?
ENGLISH_CHARACTERS = [a-z],

nglehuy · 2021-10-10T09:06:35Z

@ymzlygw I think that for Japanese, Korean, and Chinese we should use subwords instead of characters. If you can define a vocabulary that contains all the characters of the language, as in English, then you can use character mode. As far as I know, those languages have characters that are combinations of "some characters in the alphabet", so I think defining a character vocabulary file would be quite a lot of work.
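
For the character-mode option mentioned above, one way to get the vocabulary is to collect every distinct character that appears in the training transcripts. The following is a minimal sketch under stated assumptions, not this project's API: the function names, the `<blank>` token, and reserving index 0 for it are all hypothetical choices made for illustration, and the project's own vocabulary file format may differ.

```python
def build_char_vocab(transcripts, blank_token="<blank>"):
    """Collect every distinct non-space character seen in the transcripts."""
    chars = sorted({ch for line in transcripts for ch in line if not ch.isspace()})
    # Assumption: index 0 is reserved for a blank token; adjust to your setup.
    index_to_char = [blank_token] + chars
    char_to_index = {ch: i for i, ch in enumerate(index_to_char)}
    return char_to_index, index_to_char


def encode(text, char_to_index):
    """Map text to indices, silently skipping characters not in the vocabulary."""
    return [char_to_index[ch] for ch in text if ch in char_to_index]


def decode(indices, index_to_char):
    """Map indices back to text, dropping the blank index (assumed to be 0)."""
    return "".join(index_to_char[i] for i in indices if i != 0)


# Toy transcripts; a real corpus would yield thousands of distinct kanji.
transcripts = ["今日は晴れです", "明日は雨です"]
char_to_index, index_to_char = build_char_vocab(transcripts)

ids = encode("今日は雨です", char_to_index)
text = decode(ids, index_to_char)
```

With real Japanese data the vocabulary built this way can grow to several thousand entries, and any character missing from the training transcripts cannot be produced at all, which is part of why subwords are suggested instead.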

psyma · 2022-02-16T13:59:48Z

Hi, I tried to train a Chinese model and the results are not good. I followed the same Conformer steps as for English. Could you suggest how to train a Chinese model properly? Thanks!
