Releases: ikawaha/kagome

Refactoring

05 Jan 08:15
  • Fix the serialization of the index table and the unk index table so that its output is stable
  • Rebuild dictionaries
  • Drop support for Go 1.6 and 1.7

Add simple dict mode

07 Sep 23:08
  • Add simple dict mode to reduce memory usage
  • Remove unsafe
  • Fix golint warnings

In simple dict mode the analysis result does not change, but some output fields are omitted:
conjugation type (活用型), conjugation form (活用形), base form (基本形), reading (読み), and pronunciation (発音).

Memory usage:

dict    old      full     simple
IPA     193MB    175MB    132MB
UNI     872MB    795MB    590MB

Full Dict.

BOS
寿司    名詞,一般,*,*,*,*,寿司,スシ,スシ
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
食べ    動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい    助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
。      記号,句点,*,*,*,*,。,。,。
EOS

Simple Dict.

BOS
寿司    名詞,一般,*,*,*,*
が      助詞,格助詞,一般,*,*,*
食べ    動詞,自立,*,*,一段,連用形
たい    助動詞,*,*,*,特殊・タイ,基本形
。      記号,句点,*,*,*,*
EOS
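
Judging only from the two sample outputs above, the simple form prints the first six feature columns of each token and leaves out the remaining three. The sketch below reproduces that difference on a feature string. `toSimple` and `simpleFeatureCount` are hypothetical names used for illustration, not part of kagome.

```go
package main

import (
	"fmt"
	"strings"
)

// simpleFeatureCount is read off the sample output above: the simple
// dictionary prints six feature columns where the full one prints nine.
const simpleFeatureCount = 6

// toSimple is a hypothetical helper that trims a comma-separated feature
// string to the columns shown in the simple dict output.
func toSimple(features string) string {
	cols := strings.Split(features, ",")
	if len(cols) > simpleFeatureCount {
		cols = cols[:simpleFeatureCount]
	}
	return strings.Join(cols, ",")
}

func main() {
	full := "動詞,自立,*,*,一段,連用形,食べる,タベ,タベ"
	fmt.Println(toSimple(full)) // 動詞,自立,*,*,一段,連用形
}
```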

Bugfix

17 May 02:15

Fix an infinite loop when tokenizing invalid (non-UTF-8) input.
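
The note does not describe the fix itself. As a general illustration of how a scanning loop stays safe on invalid input (an assumption about the class of bug, not kagome's actual code), the sketch below relies on `utf8.DecodeRuneInString` returning `utf8.RuneError` with a width of 1 for an invalid byte, so the position always advances. `countRunes` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes walks the input one rune at a time. For an invalid byte,
// utf8.DecodeRuneInString returns (utf8.RuneError, 1), so pos always moves
// forward and the loop terminates even on non-UTF-8 input.
func countRunes(input string) (valid, invalid int) {
	for pos := 0; pos < len(input); {
		r, size := utf8.DecodeRuneInString(input[pos:])
		if r == utf8.RuneError && size == 1 {
			invalid++
		} else {
			valid++
		}
		pos += size
	}
	return valid, invalid
}

func main() {
	valid, invalid := countRunes("寿司\xff\xfeが")
	fmt.Println(valid, invalid) // 3 2
}
```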

Performance tweak

14 Sep 02:52
Merge pull request #91 from ikawaha/develop

Performance tweak

Add the appengine build tag

13 Sep 15:03
Merge pull request #89 from ikawaha/develop

Add the appengine build tag and some cosmetic changes

Bugfix

28 Jul 03:45
Fix a bug (#84)

* Fix a bug in closing the file handle (#82)

* Update (#83)

Reduce space_alloc

30 Mar 13:48
v1.5.1

Update

Add user dictionary builder

20 Mar 05:51

Example:

from io.Reader

        s := `
日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
# 関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム地名
朝青龍,朝青龍,アサショウリュウ,カスタム人名
`
        r := strings.NewReader(s)
        rec, err := NewUserDicRecords(r)
        if err != nil {
                t.Fatalf("user dic build error, %v", err)
        }
        udic, err := rec.NewUserDic()
        if err != nil {
                t.Fatalf("user dic build error, %v", err)
        }

from Go struct

        r := UserDicRecords{
                {
                        Text:   "日本経済新聞",
                        Tokens: []string{"日本", "経済", "新聞"},
                        Yomi:   []string{"ニホン", "ケイザイ", "シンブン"},
                        Pos:    "カスタム名詞",
                },
                {
                        Text:   "朝青龍",
                        Tokens: []string{"朝青龍"},
                        Yomi:   []string{"アサショウリュウ"},
                        Pos:    "カスタム人名",
                },
        }
        udic, err := r.NewUserDic()
        if err != nil {
                t.Fatalf("user dic build error, %v", err)
        }

from JSON

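        var rec UserDicRecords
        // Check the decode error instead of discarding it.
        if err := json.Unmarshal([]byte(`[
        {
            "text":"日本経済新聞",
            "tokens":["日本","経済","新聞"],
            "yomi":["ニホン","ケイザイ","シンブン"],
            "pos":"カスタム名詞"
        },
        {
            "text":"朝青龍",
            "tokens":["朝青龍"],
            "yomi":["アサショウリュウ"],
            "pos":"カスタム人名"
        }]`), &rec); err != nil {
                t.Fatalf("user dic unmarshal error, %v", err)
        }
        udic, err := rec.NewUserDic()
        if err != nil {
                t.Fatalf("user dic build error, %v", err)
        }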

Add a function to get a part-of-speech tag

17 Mar 12:44
Merge pull request #71 from ikawaha/feature/token_features_20160317

Add a function to get a part-of-speech tag
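
The note does not show the new function. As a rough sketch of the idea, assuming the part-of-speech tag is the first feature column as in the sample output earlier on this page, the hypothetical helper `posOf` below extracts it from a feature string. It is not the function added in this release.

```go
package main

import (
	"fmt"
	"strings"
)

// posOf is a hypothetical helper that returns the first comma-separated
// feature column, or "" when there are no features.
func posOf(features string) string {
	if features == "" {
		return ""
	}
	return strings.SplitN(features, ",", 2)[0]
}

func main() {
	fmt.Println(posOf("助詞,格助詞,一般,*,*,*,が,ガ,ガ")) // 助詞
}
```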

UniDic support

12 Jan 02:21
Merge pull request #49 from ikawaha/develop

Support UniDic