Textmining literature resources, section patterns
Patterns are as generic as possible without detecting non-target text. There is considerable variation in the details of each pattern among resources and among documents within some resources.
newline newline [species binomial detected in a line of six words or less*] newline newline [target text]
*excluding common addenda: "new status", "nov. sp.", "new combination", "Figures", "Figs.", and alphanumeric IDs (3 digits or fewer, including a numeral, separated by commas or -)
Stop patterns may mark items that appear near the end of a species section but contain potentially misinterpretable text (e.g., a Remarks section), or items that are not part of a species section.
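As a rough illustration of the start pattern above, the following Python sketch flags a single-line block, set off by blank lines, that contains a binomial and has at most six words once the listed addenda are discounted. The `BINOMIAL` regex is a naive, hypothetical stand-in for real species-name detection, and the ID rule is one reading of the note above (tokens of up to three alphanumeric characters that include a digit); the function names are illustrative only.

```python
import re

# Hypothetical approximation of a binomial: capitalized genus + lowercase epithet.
BINOMIAL = re.compile(r"\b[A-Z][a-z]+ [a-z]{2,}\b")

# Addenda that do not count toward the six-word limit.
ADDENDA = re.compile(r"new status|nov\. sp\.|new combination|Figures|Figs\.")

# Assumed ID shape: 1-3 alphanumeric characters, at least one of them a digit.
ID_TOKEN = re.compile(r"^(?=[A-Za-z0-9]{1,3}$)[A-Za-z]*\d[A-Za-z0-9]*$")


def is_start_line(line: str) -> bool:
    """Return True if the line looks like a species section heading."""
    stripped = ADDENDA.sub(" ", line)
    words = []
    for token in stripped.split():
        # IDs may be chained with commas or hyphens, e.g. "12-14" or "3,".
        parts = [p for p in re.split(r"[,\-]", token) if p]
        if not parts or all(ID_TOKEN.match(p) for p in parts):
            continue  # punctuation-only token or ID: not counted as a word
        words.append(token)
    return len(words) <= 6 and BINOMIAL.search(" ".join(words)) is not None


def find_section_starts(text: str) -> list[str]:
    """Return heading lines that sit between blank lines and precede more text."""
    blocks = re.split(r"\n[ \t]*\n", text)
    hits = []
    # The last block is skipped: a heading must be followed by target text.
    for i, block in enumerate(blocks[:-1]):
        line = block.strip()
        if i > 0 and "\n" not in line and is_start_line(line):
            hits.append(line)
    return hits
```

A capitalized ordinary word followed by a lowercase word (for example "Body is") also satisfies this naive regex, so a real implementation would need a taxon-name recognizer in place of `BINOMIAL`.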
newline AFFINITIES.—
newline NOTE.—
newline DISCUSSION.—
newline REMARKS.—
newline DISTRIBUTION.—
newline VARIATION.—
newline DISTRIBUTION AND GEOGRAPHIC VARIATION.—
newline MATERIAL EXAMINED.—
newline MATERIAL.—
newline ADULT.—
newline newline Acknowledgments newline
newline newline Intermediates newline
newline newline Literature Cited newline
newline newline Bibliography newline
newline newline Dubious Binomials newline
newline newline Excluded Species newline
newline newline Descriptive Biogeography newline
newline newline Appendix
newline newline Figures newline
newline newline Table
newline newline General Conclusions newline
newline newline Nomenclatorial Considerations.
newline newline [any combination of rank name and/or taxon name w/rank above species, and 0-3 additional words and/or 0-3 numerals] newline
newline newline [12 words or less beginning with "Key"] newline
species section append pattern
These are species description subsections that often follow a stop pattern but should still be included in the species section.
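A minimal sketch of how the stop patterns listed earlier (the inline labels ending in a period and em dash, the standalone headings, and the "Key" line) might be compiled into one regular expression and used to trim a species section. It omits the rank-name and taxon-name pattern, which would need a taxon dictionary, and the names `STOP` and `trim_at_stop` are hypothetical.

```python
import re

# Inline labels: preceded by one newline, followed by "." and an em dash (U+2014).
INLINE_LABELS = [
    "AFFINITIES", "NOTE", "DISCUSSION", "REMARKS", "DISTRIBUTION", "VARIATION",
    "DISTRIBUTION AND GEOGRAPHIC VARIATION", "MATERIAL EXAMINED", "MATERIAL", "ADULT",
]
# Standalone headings: preceded by a blank line and alone on their line.
HEADINGS = [
    "Acknowledgments", "Intermediates", "Literature Cited", "Bibliography",
    "Dubious Binomials", "Excluded Species", "Descriptive Biogeography",
    "Figures", "General Conclusions",
]
# Headings that only need to begin a line after a blank line.
PREFIXES = ["Appendix", "Table", "Nomenclatorial Considerations."]


def _alt(items):
    """Join literal strings into a regex alternation."""
    return "|".join(re.escape(s) for s in items)


STOP = re.compile(
    r"\n(?:" + _alt(INLINE_LABELS) + r")\.\u2014"
    + r"|\n\n(?:" + _alt(HEADINGS) + r")\n"
    + r"|\n\n(?:" + _alt(PREFIXES) + r")"
    # "Key" plus up to 11 further words, i.e. 12 words or fewer on the line.
    + r"|\n\nKey(?:[ \t]+\S+){0,11}[ \t]*\n"
)


def trim_at_stop(section: str) -> str:
    """Cut the section at the first stop pattern; return it unchanged if none."""
    match = STOP.search(section)
    return section if match is None else section[:match.start()]
```

Append patterns are not handled here; text following a stop pattern is simply dropped.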
newline newline Header [12 words or less, including List, and at least one of taxon name, vernacular name, habitat term and/or geographic term] newline [lots of non-target text] newline
newline newline Header [12 words or less, including species, and at least one of taxon name, vernacular name, habitat term and/or geographic term] newline
[NO HEADER] (List patterns detected without a header may still be supported. The trick will be distinguishing them from synonymies.)
Up to 15 lines of non-target text may separate the list header from the list body
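The header rules above could be sketched in Python as follows. `CONTEXT_TERMS` is a tiny hypothetical vocabulary standing in for real lists of taxon names, vernacular names, habitat terms, and geographic terms, and matching "List" or "species" case-insensitively is an assumption rather than something the patterns state.

```python
# Hypothetical stand-in for taxon, vernacular, habitat, and geographic vocabularies.
CONTEXT_TERMS = {"birds", "fishes", "forest", "reef", "panama", "africa"}

MAX_HEADER_WORDS = 12  # header length limit from the patterns above
MAX_GAP_LINES = 15     # non-target lines allowed between header and list body


def is_list_header(line: str) -> bool:
    """Return True if the line has a list keyword plus a context term."""
    tokens = line.split()
    if not tokens or len(tokens) > MAX_HEADER_WORDS:
        return False
    words = {t.strip(".,:;()").lower() for t in tokens}
    has_keyword = "list" in words or "species" in words
    has_context = not words.isdisjoint(CONTEXT_TERMS)
    return has_keyword and has_context


def body_search_window(lines: list[str], header_index: int) -> list[str]:
    """Return the lines in which the list body may begin.

    Up to 15 non-target lines may follow the header, so the body must start
    within the next 16 lines.
    """
    start = header_index + 1
    return lines[start:start + MAX_GAP_LINES + 1]
```

For example, `is_list_header("List of the fishes of Panama")` is true under this toy vocabulary, while `is_list_header("List of abbreviations")` is not.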
Line [including species name] newline Line [including species name] newline
Line [including species name and geographic and/or habitat terms] newline Line [including species name and geographic and/or habitat terms] newline
Line [including genus name] newline Line [including species epithet and geographic term] newline Line [including species epithet and geographic term] newline
Line [including species name] newline Line [including geographic and/or habitat term] newline [additional text, up to 4 newlines] Line [including species name] newline Line [including geographic and/or habitat term] newline [additional text, up to 4 newlines]
Line [including species name] newline 1 or 2 Lines [including geographic and habitat terms] newline Line [including species name] newline 1 or 2 Lines [including geographic and habitat terms] newline
Line [including genus name] newline Line [including full species name] newline Line [including species epithet and geographic term] newline Line [including species epithet and geographic term] newline
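The simplest list body pattern above (consecutive lines that each include a species name) might be detected with a sketch like this one. The `BINOMIAL` regex is again a naive, hypothetical stand-in for species-name detection, and the `min_run` threshold of two lines is an assumption; the geographic, habitat, and genus-plus-epithet variants would need additional vocabularies and are not covered.

```python
import re

# Hypothetical approximation of a binomial: capitalized genus + lowercase epithet.
BINOMIAL = re.compile(r"\b[A-Z][a-z]+ [a-z]{2,}\b")


def find_list_runs(lines: list[str], min_run: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of runs of consecutive
    lines that each contain a binomial."""
    runs = []
    start = None
    # The empty sentinel line closes a run that reaches the end of the input.
    for i, line in enumerate(lines + [""]):
        if BINOMIAL.search(line):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs
```

Runs found this way would still have to be checked against synonymies, which can look the same line by line.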
Single or back-to-back pairs of subheaders may interrupt a list pattern if the same list pattern appears before and after each subheader or pair of subheaders.
newline Subheader [up to 8 words including higher taxon name] newline