detect nom refs better #44

Open
mdoering opened this issue Aug 6, 2019 · 0 comments

mdoering commented Aug 6, 2019

We detect nomenclatural references inside a name, or more precisely at its end, by looking either for common keywords such as "Journal" or for a numbering block of volume/issue/pages such as 8(5): 563.
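
A minimal Python sketch of this two-pronged check. The keyword list and the numbering-block regex are assumptions for illustration only (the issue names just "Journal" and the `8(5): 563` shape); real references vary far more.

```python
import re

# Hypothetical numbering-block pattern: volume digits, an optional issue in
# parentheses, a colon, then a page or page range, e.g. "8(5): 563" or "12: 45-67".
NUMBERING_BLOCK = re.compile(r"\b\d+\s*(?:\(\s*[\w\-/]+\s*\))?\s*:\s*\d+(?:\s*-\s*\d+)?")

# Hypothetical keyword list; only "Journal" comes from the issue itself.
KEYWORDS = re.compile(r"\b(?:Journal|Bulletin|Proceedings)\b")


def looks_like_nom_ref(text: str) -> bool:
    """Return True if the text contains a reference keyword or a numbering block."""
    return bool(KEYWORDS.search(text) or NUMBERING_BLOCK.search(text))


print(looks_like_nom_ref("Abies alba Mill., Gard. Dict. 8(5): 563"))  # True
print(looks_like_nom_ref("Abies alba Mill."))                         # False
```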

We can improve this by looking for actual known journal names, which would also let us strip arbitrary titles sitting in the middle of a reference, using a procedure Guido has used for years:

Regarding a list of journal names: maybe it's possible to extract one from all the DwC-As? The journal names should mostly sit right before what I've come to call the numbering block, and given a sufficiently high number of references, that might yield a pretty extensive list. You might end up cutting a few names short or including a trailing chunk of the title, but it's a good starting point. From there, extracting common phrases from the raw journal names should help get rid of the title chunks. With that list, you can then revisit the references and see if you find more.
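
A minimal Python sketch of this bootstrapping idea, assuming the references are already available as plain strings (reading them out of DwC-A files is not shown). The numbering-block regex, the `max_words` window and the `min_support` threshold are hypothetical choices. "Common phrases" is interpreted here as trailing word sequences that recur across several references, on the reasoning that journal names repeat while article titles mostly do not.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical numbering-block pattern, e.g. "77(3): 120" or "47: 33".
NUMBERING_BLOCK = re.compile(r"\b\d+\s*(?:\(\s*[\w\-/]+\s*\))?\s*:\s*\d+")


def raw_candidate(reference: str, max_words: int = 8) -> Optional[List[str]]:
    """Up to max_words words immediately preceding the first numbering block."""
    m = NUMBERING_BLOCK.search(reference)
    if m is None:
        return None
    words = reference[:m.start()].rstrip(" ,;").split()
    return words[-max_words:] or None


def journal_names(references: Iterable[str], min_support: int = 2,
                  max_words: int = 8) -> Counter:
    """For each raw candidate keep its longest trailing word sequence that
    recurs in at least min_support references; title chunks tend to drop off."""
    candidates = [c for c in (raw_candidate(r, max_words) for r in references) if c]
    suffix_counts = Counter()
    for words in candidates:
        for i in range(len(words)):
            suffix_counts[tuple(words[i:])] += 1
    names = Counter()
    for words in candidates:
        for i in range(len(words)):  # longest suffix first
            suffix = tuple(words[i:])
            if suffix_counts[suffix] >= min_support:
                names[" ".join(suffix)] += 1
                break
    return names


refs = [
    "Smith, J. 1990. A new Abies from Spain. Amer. J. Bot. 77(3): 120-125.",
    "Jones, K. 1991. Notes on Pinus. Amer. J. Bot. 78: 10.",
    "Lee, A. 1992. Picea revisited. Kew Bull. 47: 33.",
    "Wu, B. 1993. Larix in China. Kew Bull. 48(2): 5-9.",
]
print(journal_names(refs))  # Counter({'Amer. J. Bot.': 2, 'Kew Bull.': 2})
```

With this rule a journal cited only once is missed, and a name can still come out cut short when only its tail recurs, which matches the caveats above.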

Abbreviations can be handled nicely if you extract the sequence of initial capital letters of the name parts (e.g. "Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen" would become "PKNAW") and index all the journal names by that key, in buckets. Then you can go through the matching bucket and check whether your input phrase matches the starts of all words of some journal name in there, in ordered sequence, of course. That would match "Proc. Koninkl. Nederl. Akad. Wetensch.", for instance.
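
The bucketing and matching could look roughly like this in Python. Assumptions: "name parts" are the capitalised words after replacing dots with spaces (so lowercase particles such as "of", "the", "van" are ignored), each abbreviated part must be a case-insensitive prefix of the journal word at the same position, and `name_parts`, `acronym` and `JournalIndex` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def name_parts(name: str) -> List[str]:
    """Capitalised words of a (possibly abbreviated) name, with dots removed."""
    return [w for w in name.replace(".", " ").split() if w[0].isupper()]


def acronym(name: str) -> str:
    """Initial capitals of the name parts, e.g. 'PKNAW'."""
    return "".join(w[0] for w in name_parts(name))


class JournalIndex:
    """Journal names bucketed by acronym."""

    def __init__(self, journals: Iterable[str]):
        self.buckets: Dict[str, List[str]] = defaultdict(list)
        for journal in journals:
            self.buckets[acronym(journal)].append(journal)

    def match(self, phrase: str) -> List[str]:
        """Journals in the phrase's bucket whose name parts, in order,
        each start with the corresponding abbreviated part of the phrase."""
        parts = name_parts(phrase)
        key = "".join(p[0] for p in parts)
        hits = []
        for journal in self.buckets.get(key, []):
            words = name_parts(journal)  # same length as parts: same acronym
            if all(w.lower().startswith(p.lower()) for p, w in zip(parts, words)):
                hits.append(journal)
        return hits


idx = JournalIndex(["Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen"])
print(idx.match("Proc. Koninkl. Nederl. Akad. Wetensch."))
# ['Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen']
```

Contractions that are not prefixes (e.g. "Natl." for "National") would not match under this rule.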
