Point of View: Syst. Biol. 69 (6) :1231-1253, 2020
            © The authors 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
            (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
            For commercial re-use, please contact journals.permissions@oup.com
            DOI:10.1093/sysbio/syaa026
            Advance Access publication April 16, 2020
             Helmholtz Center for Polar- and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany; 11 Department of Systematic Botany, Justus Liebig
                 University Gießen, Heinrich-Buff Ring 38, 35392 Giessen, Germany; 12 Department of Scientific Infrastructure, Centrum für Naturkunde (CeNak),
             Universität Hamburg, Martin-Luther-King-Platz 3, 20146 Hamburg, Germany; 13 GFBio - Gesellschaft für Biologische Daten e.V., c/o Research II, Campus
                Ring 1, 28759 Bremen, Germany; 14 Biodata Mining Group, Center of Biotechnology (CeBiTec), Bielefeld University, PO Box 100131, 33501 Bielefeld,
              Germany; 15 Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Senckenberganlage
                25, 60325 Frankfurt/Main, Germany; 16 Zooplankton Research Group, DZMB – Senckenberg am Meer, Martin-Luther-King Platz 3, 20146 Hamburg,
               Germany; 17 Department of Experimental Phycology and Culture Collection of Algae, University Göttingen, Nikolausberger-Weg 18, 37073 Göttingen,
             Germany; 18 Department Microbial Drugs, Helmholtz Centre for Infection Research (HZI), and German Centre for Infection Research (DZIF), Partner Site
              Hannover-Braunschweig, Inhoffenstrasse 7, 38124 Braunschweig, Germany; 19 Department of Animal Ecology and Systematics, Justus Liebig University
                  Gießen, Heinrich-Buff Ring 26, 35392 Giessen, Germany; and 20 Department of Evolutionary Biology, Zoological Institute, Technische Universität
                                                         Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany
                   ∗ Correspondence to be sent to: Systematic Botany and Mycology, University of Munich (LMU), Menzingerstraße 67, 80638 Munich, Germany;
                                                                              E-mail: renner@lmu.de
                                                   Received 13 November 2019; reviews returned 20 February 2020; accepted 24 March 2020
                                                                             Associate Editor: Matt Friedman
                         Abstract.—Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata,
                         DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards
                         safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of
                         naming 15,000–20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties
                         and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We
                         surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet
                         comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published
                         in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%),
                         but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with
                         photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or
                         3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome
                         assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-
                         taxonomy. Because long-term—ideally perpetual—data storage is of particular importance for taxonomy, energy footprint
                         reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic
                         studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining
                         to evolutionary relatedness of individuals for alpha-taxonomy. For this reason, an improved reuse of taxonomic data,
                         including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach—
                         linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for
                         taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-
                         centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by
                         alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000–40,000 taxonomists
                         globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires
                         low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to
                         identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections
                         of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic
                         data.]
Taxonomy, the science of documenting, naming, classifying, and understanding the diversity of life on Earth (Simpson 1961; Small 1989; Stuessy et al. 2014), is deeply embedded in evolutionary biology. It is also of direct relevance for documenting and understanding biodiversity dynamics in the face of global change. Since the current system of binomial scientific names was introduced by Linnaeus (1753, 1758), taxonomists have named about 1.8 million species (Roskov et al. 2019), and an unknown but undoubtedly vast number of species remain unnamed (Wheeler 2007; Mora et al. 2011; Fontaine et al. 2012; Costello et al. 2013a,b; Locey and Lennon 2016; Larsen et al. 2017). With an estimated global holding of 3 billion biological specimens in collections (Brooke 2000) and some 15,000–20,000 species descriptions per year (IISE 2011: numbers for 2006 and 2007 are 16,969 and 18,516, respectively; this study: Fig. 1), taxonomy clearly qualifies as big data science by fulfilling the main criteria of volume, variety, and velocity (De Mauro et al. 2016). Still, initiatives to implement cybertaxonomic approaches in taxonomic publishing (Smith et al. 2013; Penev et al. 2018) have not been widely adopted, and, most importantly, the rate of new species naming has failed to increase, despite the rise of ever more efficient computational and DNA sequencing tools. One reason is that the basic species diagnosis and description procedure has remained unchanged (Fig. 1, original data in Supplementary Appendix S1 available on Dryad at http://dx.doi.org/10.5061/dryad.fj6q573qd).

Naming a new species involves not only gathering images, measurements, and molecular sequences for a few reference specimens but also a comprehensive comparative study to distinguish the new from the already known. Little effort has been directed toward harvesting the massive amount of original data that is being generated in the species naming process, and it is therefore often not safeguarded in repositories. As in other fields of evolutionary biology, nonmolecular archived data are often incomplete or insufficiently standardized, and therefore not available for reuse (Roche et al. 2015). Furthermore, many taxonomic journals lack mechanisms (and funds) for the maintenance of online supplementary documents with original specimen-based data, and specialized taxonomic data repositories are largely lacking, as we will show below.

The importance of the availability, connectivity, and management of data in taxonomy is obvious (Gemeinholzer et al. 2020) and is reflected in concepts of cybertaxonomy (Pyle et al. 2008; Winterton 2009; LaSalle et al. 2009; Padial et al. 2010; Balke et al. 2013; Favret 2014; Rosenberg 2014; Stackebrandt and Smith 2019). As claimed by Bik (2017), if we play our cards right, taxonomy could be on the brink of another golden age. Driven by the need to comprehensively explore and document Earth’s species (Wheeler et al. 2012a), big advances are being made in building cybertaxonomic infrastructures, especially by digitally mobilizing metadata and images of voucher specimens in biological collections as well as literature, by increasingly registering nomenclatural acts online (Krell 2015), and by building curated databases of species names, diagnoses, and descriptions (Crous et al. 2004; Patterson et al. 2010; Webster 2017). At the moment, for instance, 172 taxonomic databases are contributing to the Catalogue of Life (Roskov et al. 2019).

Here, we review the data repositories currently available for taxonomic data and describe how improved data management could contribute to improving the inventory of life on Earth. We focus on alpha-taxonomy, the purpose of which is to establish an inventory of the past and present species on Earth, combining (i) a fundamental component, grounded in evolutionary biology, which consists of specimen-based species delimitation and (ii) an applied component, which consists of providing a universal communication system to unambiguously communicate about biodiversity. This is achieved via the assignment of a two-part name in Latin governed by taxon-specific codes of nomenclature, all of which require (i) designating type material from a collection and (ii) a diagnosis that sets the new taxon apart from the most similar already named taxa. Descriptions are not mandatory in any of the five codes for the simple reason that Linnaeus did not use them, relying instead on concise diagnoses (Renner 2016).

Data for fundamental research in alpha-taxonomy of eukaryotes necessarily are specimen-based. They are therefore not covered in species-based taxonomic databases that store information on diagnostic features, synonymy, distribution, phylogeny, traits, or natural history of species. The original analyses carried out for this study show that many established data repositories do not meet the requirements of taxonomists for data submission, retrieval, searchability, and reuse.
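
The difference between species-based and specimen-based storage can be illustrated with a minimal Python sketch. It is not taken from any existing repository: the specimen identifiers, species names, and measurements are invented, and `species_view` is a hypothetical helper. Data are stored per specimen together with the current identification, and any species-level summary is derived from them, so that a revised species hypothesis changes the summary without altering the underlying specimen data.

```python
from collections import defaultdict

# Hypothetical specimen-level records, keyed by an invented specimen identifier.
# Each record holds a measurement made on that specimen plus its *current*
# identification, which is treated as a revisable hypothesis.
specimens = {
    "COLL:0001": {"identified_as": "Genus alpha", "body_length_mm": 12.1},
    "COLL:0002": {"identified_as": "Genus alpha", "body_length_mm": 12.4},
    "COLL:0003": {"identified_as": "Genus alpha", "body_length_mm": 17.9},
}


def species_view(records):
    """Derive a species-keyed summary (mean body length) from specimen records."""
    lengths = defaultdict(list)
    for record in records.values():
        lengths[record["identified_as"]].append(record["body_length_mm"])
    return {name: round(sum(vals) / len(vals), 2) for name, vals in lengths.items()}


before = species_view(specimens)

# A taxonomic revision reassigns one specimen to a newly delimited species.
# Only the identification changes; the measurement itself is untouched.
specimens["COLL:0003"]["identified_as"] = "Genus beta"
after = species_view(specimens)
```

In this toy example the mean body length reported for "Genus alpha" changes once specimen COLL:0003 is reassigned, whereas a database that stored only the species-level value could not be recomputed in this way.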
FIGURE 1. Trends over time in taxonomic output (new species named per year) compared to number of academic publications, computing power, and DNA sequencing capacity. Numbers of new species were compiled from the Index to Organism Names (organismnames.com), the International Plant Name Index (ipni.org), and MycoBank (mycobank.org); scientific knowledge is represented as number of academic publications compiled from Scopus (scopus.com); computing power is the number of transistors on silicon chips (Moore’s law; data from Rupp 2018); DNA sequencing capacity is the number of Mbp that can be sequenced per 1000 US$. All data presented as 2-year averages (Original data in Supplementary Appendix S1 available on Dryad).

PROPERTIES AND DIVERSITY OF ALPHA-TAXONOMIC DATA

Historically, taxonomy was based on an essentialist concept, with members of a species assumed to share an essence setting them apart from other species. Today, taxonomy is embedded in evolutionary biology, and species are seen as inferred population-level evolutionary lineages (Mayden 1997; de Queiroz 1998, 2007; Padial et al. 2010). This change of paradigm, however, did not change how other biological disciplines, and most end users of taxonomies, tend to conceive and utilize taxonomic species hypotheses: individual organisms are examined and their traits are considered as representative of the nominal species to which they were assigned by the most recent taxonomist to label or otherwise “identify” the organism in question (Supplementary Appendix S2 available on Dryad). This implies that databases for end users of taxonomy, in science and society, will be centered on species names: traits, geographic ranges, taxonomy, phylogeny, diagnoses, images, or DNA sequences will primarily be labeled with and retrieved via scientific names and conceptualized as representing the respective species in other research, identification tools, laws, and conservation assessments.

The alpha-taxonomic workflow itself, that is, the elaboration of species hypotheses, follows a different approach. Ideally, multiple individuals are studied to infer “sufficiently” divergent, evolutionarily independent population-level lineages, and based on this evaluation, they are assigned species rank. The species is thus not the basic unit of research, but instead the endpoint and result of a study (Supplementary Appendix S2 available on Dryad). Independent of the species concept and species criteria used, alpha-taxonomic research is centered on individual organisms in order to assess variation, and so are the data produced during this research activity.

The unit studied by alpha-taxonomists typically is a specimen: either an individual organism or, in the case of paleontology, part thereof, or a cultured isolate composed of multiple, often clonal individuals. Of particular importance are name-bearing type specimens, which constitute anchors for assigning a scientific name to a species. Almost universally, these are physical objects (preserved organisms or their parts, metabolically inactive strains, or living, viable cultures) as recommended by all five codes of nomenclature (Amorim et al. 2016; Santos et al. 2016; Renner 2016), although where type specimens are declared lost, images can be used. Fierce disputes revolve around the option of basing new scientific nomina on photographs, videos, or DNA sequences alone (Ceriaco et al. 2016; Thorpe 2017; Krell and Marshall 2017; Garraffoni and Freitas 2017). In mycology, proposals have been put forward to allow DNA sequences alone, even environmental DNA sequences, as a basis for naming new species (Hawksworth et al. 2016), but the majority of mycologists are presently reluctant to allow voucherless species-level taxa to be validly erected (May et al. 2018), also because many comparative DNA sequences available from repositories are insufficiently linked to permanently preserved specimens (Hongsanan et al. 2018; Zamora et al. 2018).

Because physical specimens in collections are not always accessible and deteriorate as they age or as they are destructively sampled for carbon-14 dating, scanning electron microscopy, or DNA isolation, some authors have pushed for the introduction of digital type specimens or cybertypes (e.g., Godfray 2007). Such cybertypes would be a complement (not a substitute) to physical types deposited in collections. Representing visual type information online is becoming more widespread (Bosselaers et al. 2010; Wheeler et al. 2012b; Faulwetter et al. 2013; Akkari et al. 2015; Scherz et al. 2016a,b). Wheeler et al. (2012b) suggested that a cybertype should minimally comprise a photo of the holotype and ideally additional photos of the organism in life, as well as detailed photos of important diagnostic characters. The cybertypes of Faulwetter et al. (2013) and Akkari et al. (2015), for example, include microCT scans with iodine, also known as diffusible iodine-based contrast-enhanced computed tomography (diceCT), which were used to create 3D digital models of the external and internal morphology of specimens without permanently damaging them (Gignac et al. 2016). Such cyberspecimens (Favret 2014) could be expanded by nonvisual characteristics (e.g., DNA sequences or sound recordings, Fig. 2). Standards for digital representations of specimens are so far lacking, but it is obvious that the cyberspecimen concept, also referred to as extended specimens (Cicero et al. 2017; Lendemer et al. 2020), implies digital publication of extensive and diverse data packages connected via unique specimen identifiers.

The data that are generated in taxonomic research, and that would make up a cyberspecimen, are extremely diverse, depending on the organisms studied and the methods used (Table 1). They comprise both metadata and taxonomic data, and in a data management context it is crucial to conceptually distinguish these two categories (Fig. 3). Metadata come in different categories (Riley 2004): in alpha-taxonomy, specimen metadata characterize a specimen as a collection item, and contextualize it (Leonelli 2014) by providing information on taxonomic assignment (species name, supraspecific ranks), type status, spatial and temporal origin (collection date and location), and other technical and historical characteristics (collector name, preservation modality or storage coordinates including institution, collection, individual identifier). In contrast, taxonomic data are those that characterize the specimen as a biological entity. They represent different kinds of raw or encoded data intended to capture or to describe biological characteristics, such as morphological, anatomical, molecular, or behavioral traits. They are most often generated a posteriori in the framework of research that includes the specimen, but they can also be generated in situ during the collection of the specimen, anticipating future investigations (for instance, pictures taken in the field to document coloration in life).

Data on a specimen comprise both raw data and processed, selected, and encoded data (Fig. 4; Table 2). Specimen metadata can also become important raw material for taxonomic research, for example, when
geographical coordinates are used to model and distinguish environmental niches (e.g., Rissler and Apodaca 2007; Cicero et al. 2017), when time of collection is used to characterize phenology, migration behavior, or invasiveness (Chauvel et al. 2006; Miller-Rushing et al. 2006; Grass et al. 2014; Lorieul et al. 2019), or to determine the applicable regulations for access and benefit sharing, which depend on the time when a specimen was acquired by a collection.

The heterogeneous taxonomic data themselves also need to be described with contextual or methodological information. This might include the device, methods, and conditions used for photographic, tomographic, or sound recording (Roch et al. 2016; Köhler et al. 2017), laboratory methods for histological staining or molecular sequencing, or even the sociological context of the data collection (McClellan 2019); these constitute what one might consider “metadata of taxonomic data” (not the same as specimen metadata). Ideally, these data and metadata should all be accommodated in the archiving process. On the one hand, these intricate requirements suggest that a distributed system of specialized repositories for specific kinds of taxonomic data would be the best approach. On the other hand, it is preferable to adjust the existing infrastructure of established repositories rather than create new ones and to streamline the submission process of diverse data via user-friendly submission portals. The key lies in linking the data to a single specimen for which a specimen identifier will be required (Güntsch et al. 2018).

The specimen identifier approach still has to overcome multiple practical problems due to ambiguities in defining what a specimen is (Supplementary Appendix S3 available on Dryad). For instance, in most insect collections, specimens (individual insects in the collection) have no identifying number, and the collections usually also lack a catalog that could provide an inventory of specimens. Even type specimens may lack individual specimen identifiers (e.g., Zompro 2005). This is a massive impediment considering an overall estimated 500 million preserved insect specimens in collections (Short et al. 2018). For most of these specimens, the associated metadata are pinned on small labels underneath the specimen and therefore cannot be scanned without labor-intensive unpinning of every specimen. If several specimens have been collected at the same location and time, their metadata will be identical, and distinguishing among these specimens is impossible from the metadata. While it is possible to consider these and other bulk samples as one specimen, problems arise if data (DNA sequences, images, measurements) refer to only one of the individuals included in the bulk, and problems are exacerbated if the bulk is found to contain individuals of different characteristics or even species (see also Nelson et al. 2018).

Many natural history collections are currently digitizing their specimens. For instance, 91% of the 5.5 million plant specimens deposited in the world’s largest herbarium (MNHN in Paris) have been photographed at high resolution and made available online in less than a

FIGURE 2. Schematic representation of a cyberspecimen: a virtual representation of a physical specimen to which the cyberspecimen is linked by a unique identifier. Primarily, the cyberspecimen consists of a high-resolution digital representation, ideally a 3D image obtained, for example, via microCT-scanning or photogrammetry. The cyberspecimen additionally contains all other digital data obtained specifically from the specimen, including photographs of the specimen “in life,” morphometric data, genetic (genomic) sequences, sound files, or chemical profiles, all linked by the specimen identifier to the physical/true specimen.
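
The cyberspecimen of Figure 2, a bundle of heterogeneous data linked to one physical specimen by a unique identifier, can be sketched as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names (`SpecimenMetadata`, `DataItem`, `Cyberspecimen`), the field names, the identifier format, and the file names are all hypothetical and do not correspond to an existing standard or repository schema. It keeps specimen metadata (the specimen as a collection item) separate from taxonomic data (the specimen as a biological entity), and lets each data item carry its own methodological metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpecimenMetadata:
    """Specimen metadata: describes the specimen as a collection item."""
    institution: str
    collection: str
    identified_as: str                   # current taxonomic assignment
    type_status: Optional[str] = None    # e.g. "holotype"
    collection_date: Optional[str] = None
    locality: Optional[str] = None


@dataclass
class DataItem:
    """Taxonomic data: describes the specimen as a biological entity."""
    kind: str                            # e.g. "microCT", "photo", "DNA sequence"
    location: str                        # file name or accession (placeholder values)
    # "Metadata of taxonomic data": device, method, recording conditions.
    method: Dict[str, str] = field(default_factory=dict)


@dataclass
class Cyberspecimen:
    """All digital data linked to one physical specimen via its identifier."""
    specimen_id: str
    metadata: SpecimenMetadata
    items: List[DataItem] = field(default_factory=list)

    def attach(self, item: DataItem) -> None:
        """Link a further data item to this specimen."""
        self.items.append(item)

    def of_kind(self, kind: str) -> List[DataItem]:
        """Return all linked data items of the given kind."""
        return [item for item in self.items if item.kind == kind]


holotype = Cyberspecimen(
    specimen_id="EXAMPLE-INST:ENT:000123",  # invented identifier
    metadata=SpecimenMetadata(
        "EXAMPLE-INST", "ENT", "Genus alpha", type_status="holotype"
    ),
)
holotype.attach(DataItem("microCT", "scan_000123.tiff", {"voxel_size_um": "5"}))
holotype.attach(DataItem("DNA sequence", "ACCESSION-PLACEHOLDER"))
```

Because every data item hangs off `specimen_id` rather than off a species name, a later re-identification would only change `metadata.identified_as`; the scans and sequences stay attached to the same physical specimen.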
                                                TABLE 1. Data types used and/or produced in the context of taxonomy, currently or potentially in the future, their predicted storage requirements and main issues to be solved to allow
                                             their efficient storage and reuse
                                                                           Current use in               Potential and                        Storage                        Established                      Issues and gaps
                                                                            alpha-                        prospective use in                   requirements                   specialized
                                              stack of .tiff,               Increasing use               morphometrics, key                   mesh vs. raw                                                    commentary by
                                              polygon mesh                  in invertebrates.            method in                            data) and level of                                              Hipsley and Sherratt
                                              such as .ply,                                              cyberspecimen                        resolution                                                      (2019).
                                              .bend, .obj)                                               approaches
                                             DNA sequences                 Regularly used               DNA barcodes will                    Very low to low                Yes, several very well           Sequences deposited in
                                              (Sanger) (e.g.,               for most                     continue to drive                     depending on the               established ones                 databases are not
                                                                                                                                                                                                                                          MIRALLES ET AL. — REPOSITORIES FOR TAXONOMIC DATA
                                                                                                                                                                                                                                                                                              Systematic Biology
Page: 1235
1231–1253
                                                                                                                                                                                                 1236
                                                                                                                                                                                                                      Copyedited by: YS
                                               TABLE 1.      (Continued)
                                                                           Current use in       Potential and             Storage            Established              Issues and gaps
                                                                                                                                                                       DNA-based
                                                                                                                                                                       assessments of
                                                                                                                                                                       distribution of taxa;
                                                                                                                                                                       targeted and
                                                                                                                                                                       searchable repositories
                                                                                                                                                                       do not exist (GenBank
                                                                                                                                                                       does not accept
                                                                                                                                                                       sequences <200 bp)
                                             Bait capture—raw              Not used             Only usable after         High               Yes (e.g., Sequence      No issues
                                             2D geometric                   Very rarely used              Increasingly used for                 Very low to low                 Yes                               No issues
                                              morphometric                                                  resolving species
                                                                                                            microCT scans
                                             Note: The second column specifically focuses on the current use of data types in alpha-taxonomic studies (mostly based on our survey reported below), not other taxonomy-related
                                             activities such as species identification or phylogenetics. Storage capacity required per specimen: very low (<0.1 MB), low (0.1–1 MB), moderate (1–10 MB), high (10–100 MB), and very high (>100 MB).
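
The storage classes defined in the table note can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `storage_class` is hypothetical, and the handling of boundary values (exactly 0.1, 1, 10, or 100 MB) is an assumption, because the note does not say to which class they belong.

```python
# Storage classes per specimen, taken from the note to Table 1 (sizes in MB).
# Assumption: each class includes its lower bound and excludes its upper bound.
STORAGE_CLASSES = [
    (0.1, "very low"),   # < 0.1 MB
    (1.0, "low"),        # 0.1-1 MB
    (10.0, "moderate"),  # 1-10 MB
    (100.0, "high"),     # 10-100 MB
]


def storage_class(size_mb: float) -> str:
    """Return the Table 1 storage class for a per-specimen data size in MB."""
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    for upper_bound, label in STORAGE_CLASSES:
        if size_mb < upper_bound:
            return label
    return "very high"  # everything from 100 MB upward (assumed boundary)


if __name__ == "__main__":
    # Example sizes are arbitrary and chosen only to hit each class once.
    for size in (0.05, 0.5, 5, 50, 500):
        print(f"{size} MB -> {storage_class(size)}")
```

Under these assumptions, a data set of 5 MB per specimen would be classed as "moderate".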
               FIGURE 3. Two categories of data linked to a specimen: metadata and taxonomic data. While specimen metadata from museum catalogs are
             increasingly made digitally available, the scarceness of specialized specimen-based data repositories adapted to the wide range of taxonomic
             data types is a limitation for the development of digital taxonomy. Additionally, “metadata of taxonomic data” (not shown) are associated with
             the taxonomic data (e.g., device used, methodology, author name, and date of the measurement).
                FIGURE 4. Overview of data types, transformations, and specification of information in the process of specimen-based alpha-taxonomic
             research. Paleontological samples can be considered to be already “fixed” for the purposes of this graphic, by the process of fossilization.
                                             Description
                                               Unique specimen data: Data or set of data concerning a single specimen. Most often it consists of raw taxonomic data (see above).
                                               Multiple-specimens data sets: Set of data concerning particular trait(s) measured for several specimens. Most often it consists of encoded data.
                                             Strengths
                                               Unique specimen data: Specific (individual) searches are easy.
                                               Multiple-specimens data sets: Submission of large data set at once.
                                             Weaknesses
                                               Unique specimen data: Case-by-case treatment unrealistic where specimen numbers are >> 10^2.
                                               Multiple-specimens data sets: Stringent search and data extraction might be compromised by an inadequate data archiving process.
                                             Examples
                                               Unique specimen data: Picture(s) of a specimen, complete mitogenome sequence of a given individual.
                                               Multiple-specimens data sets: Tabular data (.csv), DNA alignment (.fasta, .nex).
             decade (Le Bras et al. 2017, constantly updated online at https://edition-humboldt.de), although so far only 16%
             have field-collecting information (label data) associated with them. Important efforts are also being made on
             several entomology collections (specimen images and metadata; e.g., Dietrich et al. 2012). So far, however, only
             an estimated 2% have been digitized (Short et al. 2018). To allow taxonomists to efficiently access, use, and reuse
             these data, individual specimen identifiers are essential (Page 2016; Güntsch et al. 2018), and consequently,
             priority efforts are usually directed towards providing specimen identifiers to type specimens and accordingly
             adding labels to the physical types in the collection. Surprisingly, the International Code of Zoological
             Nomenclature (Anonymous 1999) does not require individual identifiers for type specimens.

             QUANTIFYING THE KINDS OF DATA USED AND PRODUCED IN ALPHA-TAXONOMY

                To understand which repositories and storage capacities are needed for taxonomic data, we quantitatively
             assessed the number of alpha-taxonomic studies and the kinds of data produced in them. An updated summary
             of numbers of studies naming new insects, plants, mollusks, fungi, and vertebrates from 1950 to 2016 (Fig. 5)
             illustrated a noticeable increase after 1966 for insects, with >8000 new species named per year, while in plants,
             a peak was apparent in the 1980s. Species discovery and naming in fungi has been undergoing a striking increase
             since 2010 (see also Cannon et al. 2018), whereas for vertebrates numbers have risen more continuously.

                Molecular data are at the core of a modern, integrative taxonomy (Padial et al. 2010). To assess their impact,
             we undertook a systematic search in Web of Science using a combination of search terms to detect
             alpha-taxonomic studies referring to molecular data during the years 1990–2018 (search terms: molecular, DNA,
             gene; details in Supplementary Appendix S4 available on Dryad). The results confirm a rise in the explicit
             use of molecular evidence across all groups (Fig. 6). Mycologists and protistologists mention molecular data
             in >75% of their taxonomic studies in 2018, whereas this was the case for only 33% of insect and 26% of
             plant studies. Such an increasing use of DNA sequences in taxonomy likely reflects a growing tendency to take
             evolutionary concepts into account during the species delimitation process, even if only implicitly.

                We next undertook a survey of 4178 alpha-taxonomic studies (published in 2002, 2010, and 2018) that involved
             scientific naming of species. Each of these was manually screened, and kinds of data used in the respective
             study were tabulated, along with a series of metadata for each paper. We surveyed the taxonomic journals
             Phytotaxa, Zootaxa, Systematic Botany, and Mycological Progress, and six generalist journals with higher impact
             factors (Nature, Science, PNAS, PLoS One, Scientific Reports, and the Biological, Botanical and Zoological
             Journal of the Linnean Society), for alpha-taxonomic studies (Supplementary Appendix S5 available on Dryad). The
             average publication named 1–2 (fungi, plants, protists, vertebrates) or 3–4 (insects and other invertebrates) new
             species (Fig. 7; original data in Supplementary Appendix S6 available on Dryad).

                In this survey, we more restrictively considered the use of a certain kind of data only if it was explicitly part of
             the arguments supporting a taxonomic change (usually the description of a new species). The use of molecular
             evidence, newly generated or from other sources, was similar to our Web of Science survey (Fig. 6). In the
             specialized taxonomic journals (4113 studies), molecular data were widely used in mycology, but much less so
             in botany and zoology (Supplementary Appendix S7 available on Dryad): in 2018 papers, DNA sequence
             analysis was used in 94% of taxonomic studies of fungi, 53% of vertebrates, 15% of plants, and 10% and 14%
             of insects and other invertebrates, respectively (Fig. 7). Surprisingly, even in works on protists, which are
             difficult to identify morphologically, genetic evidence was used in only 29% of the 66 surveyed papers, although
             our Web of Science survey suggested otherwise (Fig. 6). Even the frequent DNA use in mycology suggested by our
             survey may be an overestimate because many fungi are described in other specialized journals not surveyed here,
             mostly without molecular data. Comparing papers from 2002, 2010, and 2018, an increase in the use of molecular
             evidence is apparent for all organismal groups (Fig. 7).

                Photographic images were used in >80% of the papers in all categories in 2018, whereas other sets of
             data (extensive morphometric data sets or 3D-imagery) were only rarely used and almost entirely restricted to
             studies on vertebrates. Specifically, in the entire set of 4113 papers, only 17 studies used microCT-scanning,
             2 used synchrotron-based visualization, 6 used other kinds of 3D-visualization, 14 used X-ray images, and 1 used
             videos. Besides macroscopic photos, microscopy and microscopy-produced images were used frequently: 670
             (16%) studies used electron microscopy (SEM or TEM) and 709 (17%) used light microscopy. Classical drawings
             were part of 2371 (58%) of the 4113 studies.

                Genome-scale data sets (e.g., RADseq, Sequence capture, RNAseq, full genomes) in 2018 were only used in
             one paper in mycology (a draft genome), and not at all in zoology or in botany. Similarly, metabolomics data were
             rare in the surveyed papers in 2018: one publication using NIR spectra in entomology, one using NMR spectra
             in mycology, and one using peptide fingerprints in vertebrate zoology.

                Several other kinds of molecular data were used in a moderate proportion of the 4113 papers: cytological
             techniques from cell descriptions to flow-cytometric determination of ploidy and genome size (n = 329),
             karyotypes (n = 34), fragment analysis (microsatellites, AFLP, RFLP, n = 10), allozymes (n = 3), and
             chemotaxonomic approaches including analysis of cuticular hormones or metabolites (n = 17) and GC-MS or HPLC
             metabolite profiles (n = 4).
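
The percentages quoted for the 4113 studies from specialized journals follow directly from the reported counts. As a minimal sketch, the Python snippet below recomputes them; the dictionary keys and the helper name `percent` are illustrative choices, and only the counts and the total are taken from the text.

```python
# Total number of surveyed studies in specialized taxonomic journals.
TOTAL_SPECIALIZED = 4113

# Counts of studies using each data type, as reported in the survey text.
COUNTS = {
    "electron microscopy (SEM or TEM)": 670,
    "light microscopy": 709,
    "classical drawings": 2371,
    "cytological techniques": 329,
    "microCT scanning": 17,
    "X-ray images": 14,
}


def percent(count: int, total: int = TOTAL_SPECIALIZED) -> float:
    """Share of studies, in percent, that used a given data type."""
    return 100.0 * count / total


if __name__ == "__main__":
    for data_type, count in COUNTS.items():
        print(f"{data_type}: {count} of {TOTAL_SPECIALIZED} = {percent(count):.1f}%")
```

Rounded to whole numbers, the first three entries reproduce the 16%, 17%, and 58% given in the text.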
                 FIGURE 5.     Species named per year for the study period. Insects and mollusks (ION—organismnames.com), fungi (MycoBank—
            mycobank.org), plants (IPNI—ipni.org), vertebrates (compiled from Eschmeyer’s Catalog of Fishes, Amphibian Species of the World:
            Frost 2019, Reptile Database, Howard and Moore Bird Checklist: Christidis 2018, Mammal Diversity Database; all accessed in March
            2019: calacademy.org/scientists/projects/eschmeyers-catalog-of-fishes, research.amnh.org/vz/herpetology/amphibia/, reptile-database.org,
            mammaldiversity.org). The gray-shaded windows indicate the time frames for which our surveys of data types were carried out. Vertebrate
            numbers refer to currently accepted species, whereas for the other taxa, also species currently considered as synonyms are included. Furthermore,
            the ION data (insects) also include subspecies. Note that paleontological studies were excluded from our survey.
              FIGURE 6. Comparison of the frequency of use of molecular data in taxonomic studies naming new species, in various groups of organisms,
            given as proportion of taxonomic papers retrieved from a semantic search on Web of Science, measured every 2 years. Molecular data were
            considered as contributing to every study based on a combination of search terms (cf. details in the Supplementary Appendix S4 available on
            Dryad). Data do not necessarily reflect absolute numbers due to inaccuracy involved with keyword searches but primarily serve as a comparison
            among organism groups.
                FIGURE 7. (a1) Histograms indicating the proportion of alpha-taxonomic studies that used different categories of data in specialized
              taxonomic journals in 2002, 2010, and 2018. Each series of bars in (a) and (b) corresponds to values (from left to right) of plants, fungi, vertebrates,
              invertebrates, and protists. (a2) Same statistics for generalist journals (only 2018 is represented for these journals, the number of papers dealing
              with alpha-taxonomic issues being negligible for 2002 and 2010, with only 3 and 5 new species during these 2 years). (b) Mean number of species
              named per article in 2002, 2010, and 2018 in specialized taxonomic journals and generalist journals. (c) Proportion of articles with a taxonomic
              component involving molecular data as a function of the number of authors. Specialized taxonomic journals are represented by a selection
              of four journals with a strong taxonomic component: Mycological Progress (Mycology), Phytotaxa and Systematic Botany (Botany), and Zootaxa
              (Zoology). Taxonomic works dealing with protists are shared among these journals. These journals belong to the top journals with taxonomic
              orientation and were selected according to our subjective opinion. The generalist journal category includes PLoS ONE, Scientific Reports, Nature,
              Science, Biological, Zoological and Botanical Journals of the Linnean Society, and PNAS. “DNA” refers to mitochondrial or nuclear sequence data sets,
             “Photography” to classical photography plus pictures generated by light and electron microscopy, “Morphometry” to all sets of measurements
             realized and reported with a comparative perspective on a set of several specimens, and “3D imagery” to every study that generated data using
             tomographic methods (mostly 3D X-ray CT, plus one paper using synchrotron radiation CT). See details in Supplementary Appendices S6–S8
             available on Dryad.
               Of other kinds of data, measurement-based morphometric analysis was used relatively frequently (348 studies),
            whereas landmark-based 2D- or 3D-morphometry was rarely applied (7 studies only); 9 studies used
            geographical models; 13 reported or analyzed extensive ecological data sets, including variables ranging from
            climate to culture media; 78 studies used analysis of sounds (of vertebrates and insects); and 5 used analyses
            of electric waves, vibrations, and similar signals.

               Our survey may be biased against innovative and groundbreaking taxonomic discoveries because those
            are often published in generalist journals with higher impact factors. The data we obtained from the
            generalist journals surveyed (Supplementary Appendix S5 available on Dryad) confirmed this suspicion, with
            69% of papers on all organismal categories discussing DNA data. Overall, molecular data were rare in papers
            published by single authors, whereas papers published by larger author teams mentioned such data more
            frequently (Fig. 7c, Supplementary Appendix S8 available on Dryad). Taxonomists from each of five global regions
            use similar proportions of the data types (2D imagery > DNA > morphometrics > 3D data; Supplementary
            Appendices S9 and S10 available on Dryad).

               The journals Zookeys and Phytokeys, established in 2008 and 2014 respectively, and hence not included in
            our main survey, encourage data sharing and automatic linking of metadata, and the aims of Zookeys
            (zookeys.pensoft.net, accessed 22 August 2019) include the “preservation of digital materials to meet the highest
            possible standards of the cybertaxonomy era.” Yet, the general pattern of data use in these two journals so far
            does not differ from that in other outlets. In 2018, for all 83 alpha-taxonomic papers published in Phytokeys, and 100
            randomly chosen ones published in Zookeys, molecular data were implicated in 29% (botany), 22% (entomology),
            and 50% (vertebrates). Despite innovations such as semantic markup or tagging, a method that assigns
            markers, or tags, to taxonomic names, gene sequences, localities, designations of nomenclatural novelties, and
            so on (Penev et al. 2018), standardization and sharing of raw data are far from being widely implemented
            in taxonomy. For instance, only 2.5% of all the GBIF-mediated occurrences for the 24 classes of organisms
            surveyed by Troudet et al. (2018) were linked to digital data and 1.5% to DNA sequences, and outlets such
            as the Biodiversity Data Journal (Smith et al. 2013) that try to redefine taxonomic papers as sources of data
            rather than narratives remain an exception, probably not only because of technological limitations but also
            because of motivational factors (Hipsley and Sherratt 2019).

               How many DNA sequences are produced in the context of taxonomic research? We used Zootaxa as
            a benchmark, representative of a large amount of contemporary taxonomic work. For 2015–2018, numbers
            of sequences deposited in NCBI-GenBank (accessed August 22, 2019) with a Zootaxa reference varied between
            8662 and 14,073 per year (Supplementary Appendix S11 available on Dryad). With 2321 papers published
            in the journal in 2018, this corresponds to an average of six DNA sequences per taxonomic study. While
            this may be an underestimate because taxonomists often report the results of their molecular phylogenetic
            studies separately in higher-impact journals, the overall picture is that taxonomy is not yet fully embracing
            the opportunities offered by the analysis of genetic data.

               Our analysis indicates that images are the most universal data type produced in alpha-taxonomic work.
            This is true of all regions of the world (Supplementary Appendix S9 available on Dryad). As a conservative
            estimate, 10 images may typically be produced of the holotype and paratypes of a new species and
            published as part of the taxonomic study. Mostly, these are photographs and drawings, sometimes scanning
            electron microscopy (SEM) images. We may assume that in comprehensive revisionary studies, up to 100 images
            (of comparative voucher specimens, or of different morphological characters) will be produced per newly
            named species. Most are probably neither published nor submitted to repositories. Assuming again 20,000
            new species named per year (Fig. 1), and an upper bound of 100 images per new species, this leads to an
            estimated ≤2 million images produced per year in the context of alpha-taxonomic studies. Considering that
            Instagram alone hosts more than 50 billion images and accepts more than 100 million new images per
            day (www.omnicoreagency.com, accessed January 19, 2020), the yearly storage capacity required for
            taxonomy-specific images produced in alpha-taxonomic research appears manageable and in the short term is
            smaller than that needed for intensive digitization campaigns of natural history museums and herbaria
            (e.g., Le Bras et al. 2017).

            USEFUL DATA FOR NEXT-GENERATION TAXONOMY

               Our survey revealed that taxonomists in their routine alpha-taxonomic work do not make systematic use
            of large omics data sets or 3D imagery. A rise in the use of such advanced molecular and imagery
            data sets, however, is likely, especially as these methods become more affordable and as images
            of the type specimens of new names may become required by the codes of nomenclature. Taxonomists’
            requirements for data and metadata formats, however, go beyond DNA sequences and images. Verifiability
            of taxonomic work may sometimes require the archiving of storage-intensive raw data
            of genomic and transcriptomic studies, for example, in the NCBI-SRA Sequence Read Archive, but assemblies,
            especially if findable via a specimen identifier and accompanied by specimen metadata, may be more
            important. So far, however, assemblies, especially of RNAseq experiments, are often not submitted to the
            Transcriptome Shotgun Assembly Sequence Database (https://www.ncbi.nlm.nih.gov/genbank/tsa/) or
            other specialized repositories in a searchable format (Moreton et al. 2015).
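
The order-of-magnitude estimates for images and DNA sequences can be retraced with a few lines of arithmetic. The Python sketch below uses only figures quoted in the text; the variable names are illustrative. Dividing the yearly sequence counts for 2015–2018 by the 2018 paper number is a simplification made here, because paper numbers for the other years are not given.

```python
# Figures quoted in the text.
NEW_SPECIES_PER_YEAR = 20_000
IMAGES_PER_SPECIES_PUBLISHED = 10     # conservative estimate, published images
IMAGES_PER_SPECIES_UPPER = 100        # upper bound, revisionary studies
INSTAGRAM_UPLOADS_PER_DAY = 100_000_000  # "more than 100 million" per day

ZOOTAXA_SEQUENCES_MIN = 8_662         # lowest yearly GenBank count, 2015-2018
ZOOTAXA_SEQUENCES_MAX = 14_073        # highest yearly GenBank count, 2015-2018
ZOOTAXA_PAPERS_2018 = 2_321


def images_per_year(species: int, images_per_species: int) -> int:
    """Number of images produced per year for newly named species."""
    return species * images_per_species


published_images = images_per_year(NEW_SPECIES_PER_YEAR, IMAGES_PER_SPECIES_PUBLISHED)
upper_images = images_per_year(NEW_SPECIES_PER_YEAR, IMAGES_PER_SPECIES_UPPER)

# How many minutes of Instagram uploads equal one year of taxonomic images?
instagram_minutes = upper_images / INSTAGRAM_UPLOADS_PER_DAY * 24 * 60

# Sequences per paper, using the 2018 paper count for both bounds (simplification).
sequences_per_paper_low = ZOOTAXA_SEQUENCES_MIN / ZOOTAXA_PAPERS_2018
sequences_per_paper_high = ZOOTAXA_SEQUENCES_MAX / ZOOTAXA_PAPERS_2018

if __name__ == "__main__":
    print(f"Published images per year: {published_images:,}")
    print(f"Upper bound of images per year: {upper_images:,}")
    print(f"Equivalent Instagram upload time: {instagram_minutes:.1f} minutes")
    print(f"Sequences per paper: {sequences_per_paper_low:.1f} to {sequences_per_paper_high:.1f}")
```

The upper bound reproduces the 2 million images per year stated in the text, and the higher sequence count gives about six sequences per paper.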
               Geographical occurrence data, also extremely import-        of them highly specialized (Louis et al. 2002; Pampel
            ant for taxonomic work, are available from GBIF                et al. 2013). Of the few generalist repositories, some are
            (https://www.gbif.org/; 1.3 billion records as of              not free of charge, and many do not provide curated
            September 2019) or Map of Life (https://mol.org/) and          metadata that would allow informed searches (Assante
            furthered also by citizen science portals (e.g., iNatural-     et al. 2016). Many scientific journals in the life sciences
            ist, https://www.inaturalist.org/), but metabarcoding          now recommend data repositories for archiving
            data, which include occurrence records of morpho-              the data that accompany a paper (e.g., the journal
            logically cryptic or microscopic taxa including fungi,         Scientific Data on behalf of Springer Nature journals:
            protists, or small invertebrates, are so far not stored        https://www.nature.com/sdata/policies/repositories,
            in a retrievable way. This is because the focus has            or PLoS: Public Library of Science Recommended
            been on archiving the raw sequence reads rather than
                                                                           Data Repositories; DOI: 10.25504/FAIRsharing.t2exm).
            the consensus OTU sequences that could be reused by
            taxonomists. Standards for metabarcoding data should           Dedicated registries have been developed to search
            therefore include the archiving of quality-filtered con-        repositories for specific kinds of data (e.g., re3data.org/
            sensus reads in a searchable format, preferably as species     and fairsharing.org/), with the FAIR data principles—
            hypotheses linked to DOI numbers (Tedersoo et al. 2015).       data should be Findable, Accessible, Interoperable, and
               Lastly, chemotaxonomy is routine in the taxonomy            Reusable—as a framework (Wilkinson et al. 2016) and
            of prokaryotes (Stackebrandt and Smith 2019), is often         measurable metric (Wilkinson et al. 2018). Taxonomic
            used in fungi (Frisvad et al. 2008), has proven useful         data repositories should be (i) free of charge for data
            in several classification approaches in plants (Wink            contributors, (ii) user-friendly, with a low-complexity
            et al. 2010), and may be useful for some insects (Kather       submission workflow, not requiring affiliation to
            and Martin 2012) and vertebrates (Poth et al. 2012;            academic institutions and not requiring cumbersome
            Starnberger et al. 2013). According to our survey, it is       registration or login procedures, and (iii) including
            rarely used in alpha-taxonomic studies of nonfungal            careful and prompt quality-checks of submissions by
eukaryotes today, but metabolomic or proteomic profiles (Steinmann et al. 2013; Rossel and Martínez 2019) and NIR spectra (Rodríguez-Fernández et al. 2011; Kinzner et al. 2015) have proven useful in large-scale species identification and discrimination. Chemotaxonomic data traditionally play an important role in lichenized fungi (Lumbsch 2002), and mycologists distinguish species by HPLC profiling (Kuhnert et al. 2017; Helaly et al. 2018) and sometimes higher taxa based on secondary metabolites (Wendt et al. 2018). The retention factors of known chemotaxonomic markers in standard thin-layer chromatography systems are stored in the LIAS database (http://www.lias.net/). For spectroscopic data including GC-MS, the NIST database (https://www.nist.gov/pml/atomic-spectra-database) provides reference spectra for many plant metabolites but does not act as a repository. Chemotaxonomy can be aided by commercial databases like DNP (http://dnp.chemnetbase.com/), which contains comprehensive information about the occurrence and distribution of secondary metabolites across all organism kingdoms, but these databases are not open access and incur considerable license fees. Metabolomic and chemotaxonomic repositories do exist (e.g., Tsugawa et al. 2019) but the underlying raw data may vary in quality and quantity depending on the applied technological sensitivity, and thus may not be readily searchable or comparable across platforms.

CRITERIA FOR TAXONOMIC DATA REPOSITORIES

The importance of data repositories becoming part of the routine taxonomic research workflow was recognized almost 20 years ago (Louis et al. 2002; Lynch 2008). Today, there is a plethora of repositories, many

dedicated data curators. This is particularly important because a substantial proportion of the estimated 30,000–40,000 taxonomists worldwide (Haas and Häuser 2007) lack data management expertise and support as they often work as single authors or small teams (Knapp 2008; Joppa et al. 2011) and in many cases are nonprofessional researchers (Hopkins and Freckleton 2002; Fontaine et al. 2012).

Ideally, taxonomic repositories should be able to handle universally unique identifiers to refer to specimens (Guralnick et al. 2015; Güntsch et al. 2018; Nelson et al. 2018; Triebel et al. 2018). At present, however, a mandatory use of such identifiers for submission of taxonomic data is unrealistic because, as we have explained above, (i) they do not yet exist for many collections and (ii) the best way of numbering bulk collections is still unclear. For data reuse to be encouraged and facilitated in taxonomy and by its end users, emphasis should be on making data and metadata available in highly standardized formats, enhancing comparability across taxonomic studies. Metadata should thus include a specimen identifier in best-practice format for the respective group of organisms, in addition to a species-level name (accepted or candidate species) and information on geographic location, if possible including geographical coordinates. Usage of standards defined in the Darwin Core or ABCD (Holetschek et al. 2012; Wieczorek et al. 2012) would be highly advisable. In general, however, the submission procedure should keep mandatory metadata to a minimum but provide an extensive, standardized list of optional metadata, as in the minimum checklist concept of the Minimum Information about any (x) Sequence (MIxS) for DNA data (Yilmaz et al. 2011).

Taxonomy is firmly grounded in history. Studies published 100 or 200 years ago are regularly consulted
by taxonomists today and so are voucher specimens collected over centuries (see also Venu and Sanjappa 2011). The principal task of natural history museums and herbaria is to preserve biological materials in perpetuity. The rapid technological turnover of the digital era therefore elicits concerns in the taxonomic community (e.g. Dubois 2003; Padial and De la Riva 2007): can data storage be ensured for “perpetuity”? This concern may be alleviated by data repositories acquiring a certificate, like the CoreTrustSeal (https://www.coretrustseal.org/), which certifies that they are sustainable and trustworthy. Because museums and herbaria already provide long-term storage and careful curation of specimens, their data centers are also the ideal location for long-term repositories of specimen-associated data, certified under even stricter rules such as requiring a well-defined exit strategy defining where the data will be archived if the repository ceases to exist (Table 3).

Taxonomic data repositories should include (i) the option of complex advanced searches with elaborate combinations of inclusion and exclusion of search terms (and/or an API), (ii) semantic (contextual) searches for finding species under synonymous names, (iii) fuzzy searches allowing for different spelling variations e.g. of specimen identifiers, and (iv) the option to search a repository through other, general portals like GBIF (gbif.org) or GFBio (gfbio.org). Searches that include taxon names could be facilitated by the possibility to access established taxonomic backbones, such as the NCBI taxonomy (Federhen 2012), GBIF, or the many databases underlying the Catalogue of Life (http://www.catalogueoflife.org/), or ideally to a dynamic database providing a Global Names Architecture (Pyle 2016).

Large-scale taxonomic studies are often impeded by the sheer amount of data that need to be compared. The problem is compounded by an inherent conflict between the two main interests of taxonomy: quality and speed of delimitation (Sangster and Luksenburg 2015). Probabilistic tools for (semi-)automated species delimitation relying on high-quality data repositories might help. A few such tools have been developed, including Structure (Pritchard et al. 2000), GMYC (Pons et al. 2006), Haploweb (Flot et al. 2010), ABC (Camargo et al. 2012), ABGD (Puillandre et al. 2012), RESL (Ratnasingham and Hebert 2013), and PTP (Zhang et al. 2013), but they all rely on DNA data and do not integrate other taxonomic evidence (Edwards and Knowles 2014). Examples of programs for automated integrative species delimitation (including information from geography or morphology) are Geneland (Guillot et al. 2005) and iBPP (Solís-Lemus et al. 2015). In the future, initial species delimitation hypotheses could be elaborated by probabilistic (machine-learning) algorithms that make full use of data from different repositories. For this to work, data in repositories need to be machine-accessible, standardized, reviewed, georeferenced, and current.

A final criterion for taxonomic data repositories is flexibility in format because of the diversity of taxonomic data (above and Figs. 3 and 4). To reflect this diversity, data submission should allow for user-defined metadata formats, but enforce the use of Darwin Core or ABCD standards (Holetschek et al. 2012; Wieczorek et al. 2012; Cicero et al. 2017) where applicable and not impose restrictions on the number of data files to be submitted. None of the 15 taxonomic repositories reviewed for this article meet all 12 of the needs and criteria assessed (Tables 3 and 4, Supplementary Appendix S12 available on Dryad). Some criteria, especially free and open access, are fulfilled by most repositories, but taxonomy-specific options for submission or search are not. As examples, the leading repositories in the field of molecular data (GenBank, http://www.ncbi.nlm.nih.gov/genbank; DDBJ, https://www.ddbj.nig.ac.jp; ENA, https://www.ebi.ac.uk/ena) seem to be compliant with most of the criteria in Table 3. In contrast, taxonomy-specific repositories, for instance those for bioacoustic recordings in amphibian taxonomy (Köhler et al. 2017), do not make data openly available for reuse.

RECOMMENDATIONS AND CONCLUSIONS

The last decades have seen a massive increase of taxonomic cyber-infrastructure, delivering crucial services to many end users. Only a minor fraction of this infrastructure has, however, been specifically conceived to support the alpha-taxonomic workflow itself. Taxonomists themselves need to become more involved with the development of tools to integrate the existing resources into their operational pipelines. Perhaps most important are data portals to retrieve and submit specimen-based data. Via customized searches, a taxonomic portal fully dedicated to aggregating data based on specimen identifiers would retrieve all data in real time (DNA sequences, images, current species attribution) available for a specimen across distributed repositories and databases, thus coming close to the cyberspecimen concept. Distributed collection catalog portals, in particular VertNet (http://vertnet.org/), already have implemented many of the search options needed by taxonomists and could be successively expanded (Cicero et al. 2017). Connecting such a catalog to molecular data repositories, especially GenBank (https://www.ncbi.nlm.nih.gov/genbank/) or the Barcode of Life (http://www.boldsystems.org/), whose structure fits our criteria for taxonomic data repositories quite well (Table 4), seems to be a logical first step. Repositories should also be linked with taxonomic databases in a flexible way, allowing data to be retrieved not only under the current taxonomic name but also in nomenclatural and perhaps taxonomic synonym searches. A closer collaboration of taxonomists with the data scientists working on large cybertaxonomy projects in the same institutions may create unexpected synergies
because often, small modifications to existing data-aggregating portals could substantially improve their utility for taxonomists.

Images are among the most widely produced and used types of data in alpha-taxonomy (Fig. 7). Establishing portals that allow image repositories to be searched by specimen identifiers should become a priority. Images are semistructured data, and successful managing or searching of such data requires metadata, including species identifiers, annotations, scale information, authorship, and geographical location. New software solutions are needed to collect and safeguard this information and the diverse image data. Recently, image annotation software tools have been proposed to support, for instance, environmental monitoring (Schlining and Stout 2006; Kloster et al. 2014; Althaus et al. 2015; Beijbom et al. 2015; Langenkämper et al. 2017). These tools are easy to use and have low requirements of computational power (Zurowietz et al. 2019). Most of them are already equipped with machine-learning functions to automate some steps in the annotation process. Toolboxes to be included in taxonomic repositories, or in cyberspecimen data portals, could include automatic detection of rulers or scale bars, dynamic continuous zoom, and measurement tools both for 2D and 3D images.

Versatile data portals connected to rich taxonomic data repositories would benefit taxonomists as well as end users of taxonomy. For instance, the progress in computational power and imaging technology on smartphones allows the collection of visual data and the instant availability of taxonomic knowledge on a new scale. There is a boom of cellphone apps that identify species of plants and mushrooms (e.g., Pl@ntNet, https://identify.plantnet.org/; PlantSnap, https://www.plantsnap.com/; Naturblick, http://www.naturblick.naturkundemuseum.berlin) or
Digimorph     ++  +   −   −   ++  +   ++  −   −   −   −   −   +   +
Morphomuseum  ++  ++  −   −   ++  ++  ++  −   −   ++  +   −   ++  +
Morphosource  ++  ++  ++  ++  ++  ++  ++  +   −   +   +   −   ++  ++
IDR           +   +   +   ++  ++  ++  ++  ++  −   −   −   +   ++  +
Metabolights  ++  ++  +   ++  ++  ++  ++  +   −   +   −   −   ++  ++
GenBank       ++  ++  ++  ++  ++  ++  ++  +   ++  ++  ++  +   ++  ++
DDBJ          ++  ++  ++  ++  ++  ++  ++  +   ++  ++  ++  +   ++  ++
ENA           ++  ++  ++  ++  ++  ++  ++  +   ++  ++  ++  ++  ++  ++
Movebank      ++  −   +   ++  ++  ++  ++  ++  −   +   −   ++  ++  +
animals (e.g., https://fieldguide.ai/) or all of the above (https://www.inaturalist.org) by automated comparison of photos with large image collections. Similar apps also exist for sound-based species identification of birds (e.g., SongSleuth, https://www.songsleuth.com/; BirdNet, https://birdnet.cornell.edu/; BirdGenie, https://press.princeton.edu/apps/birdgenie.html; BirdSongID, http://isoperla.co.uk/; ChirpOMatic, http://www.chirpomatic.com/), bats (e.g. iBatsID, https://sites.google.com/site/ibatsresources/iBatsID), and increasingly also insects (e.g. CicadaHunt, http://newforestcicada.info/app/). These apps impressively demonstrate the potential of computer-based approaches to species identification and provide a glimpse into what may be possible in a future in which large virtual collections of cyberspecimens become available to train artificial intelligence pipelines.

Having reviewed numerous data repositories for this study, we propose a pilot submission template in Supplementary Appendices S13 and S14 available on Dryad, building upon models established by the NCBI Sequence Read Archive and (re-)using ABCD terms. This template is currently being tested for the submission of data to the GFBio data centers (Diepenbroek et al. 2014). Because taxonomy is intrinsically dependent on long-term availability of data, taxonomists will have a high motivation to meet the “taxonomic data repository” challenge and to develop concepts of truly sustainable, potentially perpetual data storage. The electricity usage and the carbon footprint associated with data storage (Andrae and Edler 2015; Jones 2018) may require standards allowing submitters to identify which data truly merit long-term storage (e.g., to prevent submission of redundant or blurred pictures, or to optimize their resolution level when it is excessively high). A stringent archiving strategy of original taxonomic data could become an integral part of a renewed procedure to name new species, accelerated but without compromising quality of species hypotheses, mobilizing species information through images, DNA sequences, sounds, or tabulated trait information, while relieving taxonomists from manually compiling lengthy descriptions. Although words will necessarily remain the means to justify taxonomic decisions, evaluate species criteria and (briefly) list diagnostic features of new species, taxonomists should consider moving towards publishing alpha-taxonomic results as interlinked, standardized, and openly accessible data sets rather than traditional descriptive papers.

SUPPLEMENTARY MATERIAL

Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.fj6q573qd.

FUNDING

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, grant number DFG RE 603/29-1) and benefited from the sharing of expertise within the DFG priority program SPP 1991 Taxon-Omics.

ACKNOWLEDGMENTS

We are grateful to William N. Eschmeyer, Jon D. Fong, Ronald Fricke, Darrel R. Frost, Rafaël Govaerts, Vincent Robert, Peter Uetz, and Richard van der Laan for useful advice and data on rates of species discovery and naming. We thank Christy Hipsley and one anonymous reviewer for constructive feedback on our manuscript. We also thank Steve A. Marshall, Neal Evenhuis, and Sébastien Soubzmaigne for allowing the use of original photographs.

REFERENCES

Akkari N., Enghoff H., Metscher B.D. 2015. A new dimension in documenting new species: high-detail imaging for myriapod taxonomy and first 3D cybertype of a new millipede species (Diplopoda, Julida, Julidae). PLoS One 10:e0135243.
Althaus F., Hill N., Ferrari R., Edwards L., Przeslawski R., Schönberg C.H., Stuart-Smith R., Barrett N., Edgar G., Colquhoun J., Tran M., Jordan A., Rees T., Gowlett-Holmes K. 2015. A standardised vocabulary for identifying benthic biota and substrata from underwater imagery: the catami classification scheme. PLoS One 10:e0141039.
Amorim D.S., Santos C.M.D., Krell F.T., Dubois A. 2016. Timeless standards for species delimitation. Zootaxa 4137:121–128.
Andrae A.S.G., Edler T. 2015. On global electricity usage of communication technology: trends to 2030. Challenges 6:117–157.
Anonymous [International Commission on Zoological Nomenclature]. 1999. International code of zoological nomenclature. 4th ed. London: International Trust for Zoological Nomenclature, p. i–xxix + 1–306.
Assante M., Candela L., Castelli D., Tani A. 2016. Are scientific data repositories coping with research data publishing? Data Sci. J. 15:6.
Balke M., Schmidt S., Hausmann A., Toussaint E.F.A., Bergsten J., Buffington M., Häuser C.L., Kroupa A., Hagedorn G., Riedel A., Polaszek A., Ubaidillah R., Krogmann L., Zwick A., Fikáček M., Hájek J., Michat J.C., Dietrich C., La Salle J., Mantle B.K.L., Ng P., Hobern D. 2013. Biodiversity into your hands – a call for a virtual global natural history ‘metacollection’. Front. Zool. 10:55.
Beijbom O., Edmunds P., Roelfsema C., Smith J., Kline D., Neal B., Dunlap M.J., Moriarty V., Fan T.Y., Tan C.J., Chan S., Treibitz T., Gamst A., Mitchell B.G., Kriegman D. 2015. Towards automated annotation of benthic survey images: variability of human experts and operational modes of automation. PLoS One 10:e0130312.
Bik H.M. 2017. Let’s rise up to unite taxonomy and technology. PLoS Biol. 15:e2002231.
Bosselaers J., Dierick M., Cnudde V., Masschaele B., van Hoorebeke L., Jacobs P. 2010. High-resolution X-ray computed tomography of an extant new Donuea (Araneae: Liocranidae) species in Madagascan copal. Zootaxa 2427:25–35.
Brooke M. de L. 2000. Why museums matter. Trends Ecol. Evol. 15:136–137.
Camargo A., Morando M., Avila L.J., Sites J.W. 2012. Species delimitations with ABC and other coalescent-based methods: a test of accuracy with simulations and an empirical example with lizards of the Liolaemus darwinii complex (Squamata: Liolaemidae). Evolution 66:2834–2849.
Cannon P., Aguirre-Hudson B., Aime M.C., Ainsworth A.M., Bidartondo M.I., Gaya E., Hawksworth D., Kirk P., Leitch I.J., Lücking R. 2018. Definition and diversity. In: Willis K.J., editor. State of the world’s fungi. Report. Kew: Royal Botanic Gardens, p. 4–11.
Ceriaco L.M.P., Gutiérrez E.E., Dubois A. 2016. Photography-based taxonomy is inadequate, unnecessary, and potentially harmful for biological sciences. Zootaxa 4196(3):435–445.
            Chauvel B., Dessaint F., Cardinal-Legrand C., Bretagnolle, F. 2006.             amnh.org/herpetology/amphibia/index.html. American Museum
               The historical spread of Ambrosia artemisiifolia L. in France from           of Natural History, New York, USA (March 15, 2019).
               herbarium records. J. Biogeogr. 33:665–673.                               Garraffoni A.R.S., Freitas A.V.L. 2017. Photos belong in the taxonomic
            Christidis L. (Ed.) 2018. The Howard and Moore complete checklist               code. Science 355(6327):805.
               of the birds of the world, version 4.1 (Downloadable checklist).          Gemeinholzer B., Vences M., Beszteri B., Bruy T., Felden J., Kostadinov
               Available from: https://www.howardandmoore.org (March 15,                    I., Miralles A., Nattkemper T.W., Printzen C., Renz J., Rybalka N.,
               2019).                                                                       Schuster T., Weibulat T., Wilke T., Renner S.S. 2020. Data storage
            Cicero C., Spencer C.L. Bloom D.A., Guralnick R.P., Koo M.S., Otegui            and data re-use in taxonomy—the need for improved storage and
               J.. Russell L.A, Wieczorek J.R. 2017. Biodiversity informatics and           accessibility of heterogeneous data. Org. Divers. Evol. 20:1–8.
               data quality on a global scale. In: Webster M.S., editors. Emerging       Gignac P.M., Kley N.J., Clarke J.A., Colbert M.W., Morhardt A.C.,
               frontiers in collections-based ornithological research: the extended         Cerio D., Cost I.N., Cox P.G., Daza J.D., Early C.M., Echols M.S.,
               specimen. Studies in avian biology. Boca Raton, FL: CRC Press, p.            Henkelman R.M., Herdina A.N., Holliday C.M., Li Z., Mahlow
               201–218.                                                                     K., Merchant S., Müller J., Orsbon C.P., Paluh D.J., Thies M.L.,
            Costello M.J., May R.M., Stork N.E. 2013a. Can we name Earth’s species          Tsai H.P., Witmer L.M. 2016. Diffusible iodine-based contrast-
               before they go extinct? Science 339(6118):413–416.                           enhanced computed tomography (diceCT): an emerging tool for
            Costello M.J., Wilson S., Houlding B. 2013b. More taxonomists                   rapid, high-resolution, 3-D imaging of metazoan soft tissues. J. Anat.
               describing significantly fewer species per unit effort may indicate           228(6):889–909.
               that most species have been discovered. Syst. Biol. 62:616–624.           Godfray H.C.J.Jr. 2007. Linnaeus in the information age. Nature
            Crous P.W., Gams W., Stalpers J.A., Robert V., Stegehuis G. 2004.               446:259–260.
               MycoBank: an online initiative to launch mycology into the 21st           Grass A., Tremetsberger K., Hössinger R., Bernhardt K-G. 2014. Change
               century. Stud. Mycol. 50(1):19–22.                                           of species and habitat diversity in the Pannonian Region of Eastern
            De Mauro A., Greco M., Grimaldi M. 2016. A formal definition of big              Lower Austria over 170 years: using herbarium records as a witness.
               data based on its essential features. Library Rev. 65:122–135                Nat. Resour. 5:583–596.
            de Queiroz K. 1998. The general lineage concept of species, species          Guillot G., Estoup A., Mourtier F., Cosson, J.F. 2005. A spatial statistical
               criteria, and the process of speciation. In: Howard D.J., Berlocher          model for landscape genetics. Genetics 170:1261–1280.
               S.H., editors. Endless forms: species and speciation. New York:           Güntsch A., Groom Q., Hyam R., Chagnoux S., Röpert D., Berendsohn
               Oxford University Press., p. 57–75.
                                                                                            W., Casino A., Droege G., Gerritsen W., Holetschek J., Marhold
            de Queiroz K. 2007. Species concepts and species delimitation. Syst.
                                                                                            K., Mergen P., Rainer H., Smith V., Triebel D. 2018. Standardised
               Biol. 56:879–886.
                                                                                            globally unique specimen identifiers. Biodivers. Inf. Sci. Stand.
            Diepenbroek M., Glöckner F., Grobe P., Güntsch A., Huber R., König-
                                                                                            2:e26658.
               Ries B., Kostadinov I., Nieschulze J., Seeger B., Tolksdorf R., Triebel
                                                                                         Guralnick R.P., Cellinese N., Deck J., Pyle R.L., Kunze J., Penev L., Walls
               D. 2014. Towards an integrated biodiversity and ecological research
                                                                                            R., Hagedorn G., Agosti D. Wieczorek J., Catapano T., Page R. 2015.
               data management and archiving platform: the German Federation
               for the Curation of Biological Data (GFBio) In: Plödereder E.,               Community next steps for making globally unique identifiers work
               Grunske L., Schneider E., Ull D., editors. Informatik 2014—big               for biocollections data. ZooKeys 494:133–154.
               data komplexität meistern. GI-Edition: Lecture Notes in Informatics       Haas F., Häuser C.L. 2007. How many taxonomists are there?
               (LNI)—Proceedings. GI edn., vol. 232. Bonn: Köllen, p. 1711–1724.            Available from: http://www.senckenberg.uni-frankfurt.de/odes/
            Dietrich C., Hart J., Raila, D., Ravaioli U., Sobh N., Sobh O., Taylor C.       Haas_Haeuser.pdf.
               2012. InvertNet: a new paradigm for digital access to invertebrate        Hawksworth D.L., Hibbett D.S., Kirk P.M., Lücking R. 2016. Proposals
               collections. Zookeys 209:165–181.                                            to permit DNA sequence data to serve as types of names of fungi.
            Dubois A. 2003. Should internet sites be mentioned in the bibliograph-          Taxon 65:899–900.
               ies of scientific publications? Alytes 21:1–2.                             Helaly S.E., Thongbai B., Stadler M. 2018. Diversity of biologically
            Edwards D.L., Knowles L.L. 2014. Species detection and individual               active secondary metabolites from endophytic and saprotrophic
               assignment in species delimitation: can integrative data increase            fungi of the ascomycete order Xylariales. Nat. Prod. Rep. 35:992–
               efficacy? Proc. R. Soc. Lond. [Biol]. 281:20132765.                           1014.
            Faulwetter S., Vasileiadou A., Kouratoras M., Dailianis T., Arvanitidis      Hipsley C.A., Sherratt E. 2019. Psychology, not technology, is our
               C. 2013. Micro-computed tomography: introducing new dimensions               biggest challenge to open digital morphology data. Sci. Data. 6:41.
               to taxonomy. Zookeys 263:1–45.                                            Holetschek J., Dröge G., Güntsch A., Berendsohn W.G. 2012. The ABCD
            Favret C. 2014. Cybertaxonomy to accomplish big things in aphid                 of primary biodiversity data access. Plant Biosyst. 146:771–779.
               systematics. Insect Sci. 21:392–399.                                      Hongsanan S., Xie N., Liu J.K., Dissanayake A., Ekanayaka A.H., Raspé
            Federhen S. 2012. The NCBI taxonomy database. Nucleic Acids Res. 40             O., Jayawardena R.S., Hyde K.D., Jeewon R., Purahong W., Stadler
               (Database issue):D136–D143.                                                  M., Peršoh D. 2018. Can we use environmental DNA as holotypes?
            Flot J.-F., Couloux A., Tillier S. 2010. Haplowebs as a graphical tool for      Fungal Divers. 92:1–30.
               delimiting species: a revival of Doyle’s “field for recombination”         Hopkins G.W., Freckleton R.P. 2002. Declines in the numbers of ama-
               approach and its application to the coral genus Pocillopora in               teur and professional taxonomists: implications for conservation.
               Clipperton. BMC Evol. Biol. 10:1–14.                                         Anim. Conserv. 5:245–249.
            Fontaine B., van Achterberg K., Alonso-Zarazaga M.A., Araujo R.,             IISE 2011. State of observed species. Tempe, AZ: International Institute
               Asche M., Aspöck H., Aspöck U., Audisio P., Aukema B., Bailly N.,            for Species Exploration. Available from: http://species.asu.edu/SOS
               Balsamo M., Bank R.A., Belfiore C., Bogdanowicz W., Boxshall G.,              (March 15, 2019).
               Burckhardt D., Chylarecki P., Deharveng L., Dubois A., Enghoff H.,        Jones N. 2018. How to stop data centres from gobbling up the world’s
               Fochetti R., Fontaine C., Gargominy O., Gomez Lopez M.S., Goujet             electricity. Nature 561:163–166.
               D., Harvey M.S., Heller K.G., van Helsdingen P., Hoch H., De Jong         Joppa L.N., Roberts D.L., Pimm S.L. 2011. The population ecology and
               Y., Karsholt O., Los W., Magowski W., Massard J.A., McInnes S.J.,            social behaviour of taxonomists. Trends Ecol. Evol. 26:551–553.
               Mendes L.F., Mey E., Michelsen V., Minelli A., Nieto Nafría J.M.,         Kather R., Martin S.J. 2012. Cuticular hydrocarbon profiles as a
               van Nieukerken E.J., Pape T., De Prins W., Ramos M., Ricci C.,               taxonomic tool: advantages, limitations and technical aspects.
               Roselaar C., Rota E., Segers H., Timm T., van Tol J., Bouchet P. 2012.       Physiol. Entomol. 37:25–32.
               New species in the old world: Europe as a frontier in biodiversity        Kinzner M.C., Wagner H.C., Peskoller A., Moder K., Dowell F.E.,
               exploration, a test bed for 21st century taxonomy. PLoS One 7:e36881.        Arthofer W., Schlick-Steiner B.C., Steiner F.M. 2015. A near-infrared
            Frisvad J.C., Andersen B., Thrane U. 2008. The use of secondary                 spectroscopy routine for unambiguous identification of cryptic ant
               metabolite profiling in chemotaxonomy of filamentous fungi.                    species. PeerJ. 3:e991.
               Mycol. Res. 112(2):231–240.                                               Kloster M., Kauer G., Beszteri B. 2014. SHERPA: an image segmentation
            Frost D.R. 2019. Amphibian species of the world: an online ref-                 and outline feature extraction tool for diatoms and other objects.
               erence. Version 6.0. Website. Available from: http://research.               BMC Bioinformatics 15:218.
            Knapp S. 2008. Taxonomy as a team sport. In: Wheeler Q., editor. The           Wilson M.R., editors. Species: the units of diversity. London, NY:
               new taxonomy. Systematics Association Special Volume 76. London:            Chapman & Hall. p. 381–423.
               CRC Press. p. 33–53.                                                     McClellan P.H. 2019. Taxonomic punchlines: metadata in biology. Hist.
            Köhler J., Jansen M., Rodríguez A., Kok P.J.R., Toledo L.F., Emmrich           Biol. https://doi.org/10.1080/08912963.2019.1618293.
               M., Glaw F., Haddad C.F.B., Rödel M.O., Vences M. 2017. The use          Miller-Rushing A.J., Primack R.B., Primack D., Mukunda S. 2006.
               of bioacoustics in anuran taxonomy: theory, terminology, methods            Photographs and herbarium specimens as tools to document
               and recommendations for best practice. Zootaxa 4251:1–124.                  phenological changes in response to global warming. Am. J. Bot.
            Krell F.-T. 2015. ZooBank progress report. Bull. Zool. Nomenclat. 72:          93:1667–1674.
               181.                                                                     Mora C., Tittensor D.P., Adl S., Simpson A.G.B., Worm B. 2011. How
            Krell F.-T., Marshall S.A. 2017. New species described from photo-             many species are there on Earth and in the Ocean? PLoS Biol.
               graphs: Yes? No? Sometimes? A fierce debate and a new Declaration            9:e1001127.
               of the ICZN. Insect Syst. Divers. 1(1):3–19.                             Moreton J., Izquierdo A., Emes R.D. 2015. Assembly, assessment, and
            Kuhnert E., Sir E.B., Lambert C., Hyde K.D., Hladki A.I., Romero               availability of de novo generated eukaryotic transcriptomes. Front.
               A.I., Rohde M., Stadler M. 2017. Phylogenetic and chemotaxonomic            Genet. 6:361.
               resolution of the genus Annulohypoxylon (Xylariaceae) including          Nelson G., Sweeney P., Gilbert E. 2018. Use of globally unique
               four new species. Fungal Divers. 85:1–43.                                   identifiers (GUIDs) to link herbarium specimen records to physical
            Langenkämper D., Zurowietz M., Schoening T., Nattkemper T.W.                   specimens. Appl. Plant Sci. 6:e1027.
               2017. BIIGLE 2.0—browsing and annotating large marine image              Padial J.M., De la Riva I. 2007. Taxonomy, the Cinderella of science,
               collections. Front. Mar. Sci. 4:83.                                         hidden by its evolutionary stepsister. Zootaxa 1577:1–2.
            Larsen B.B., Miller E.C., Rhodes M.K., Wiens J.J. 2017. Inordinate          Padial J.M., Miralles A., De la Riva I., Vences M. 2010. The integrative
               fondness multiplied and redistributed: the number of species on             future of taxonomy. Front. Zool. 7:16.
               Earth and the new pie of life. Q. Rev. Biol. 92:229–265.                 Page R.D.M. 2016. DNA barcoding and taxonomy: dark taxa and dark
            LaSalle J., Wheeler Q., Jackway P., Winterton S., Hobern D., Lovell            texts. Philos. Trans. R. Soc. B. 371:20150334.
               D. 2009. Accelerating taxonomic discovery through automated              Pampel H., Vierkant P., Scholze F., Bertelmann R., Kindling M., Klump
               character extraction. Zootaxa 2217:43–55.                                   J., Goebelbecker H.J., Gundlach J., Schirmbacher P., Dierolf U. 2013.
            Le Bras G., Pignal M., Jeanson M.L., Muller S., Aupic C., Carré B.,            Making research data repositories visible: the re3data.org registry.
               Flament G., Gaudeul M., Gonçalves C., Invernón V.R., Jabbour F.,            PLoS One 8:e78080.
               Lerat E., Lowry P.P., Offroy B., Pimparé Pérez E., Poncy O., Rouhan      Patterson D.J., Cooper J., Kirk P.M., Pyle R.L., Remsen D.P. 2010. Names
               G., Haevermans T. 2017. The French Muséum national d’Histoire               are key to the big new biology. Trends Ecol. Evol. 25:686–691.
               naturelle vascular plant herbarium collection dataset. Sci. Data         Penev L., Agosti D., Georgiev T., Senderov V., Sautter G., Catapano T.,
               4:170016.                                                                   Stoev P. 2018. The open biodiversity knowledge management (eco-)system:
            Lendemer J., Thiers B., Monfils A.K., Zaspel J., Ellwood E.R., Bentley         tools and services for extraction, mobilization, handling and
               A., LeVan K., Bates J., Jennings D., Contreras D., Lagomarsino L.,          re-use of data from the published literature. Biodiver. Inf. Sci. Stand.
               Mabee P., Ford L.S., Guralnick R., Gropp R.E., Revelez M., Cobb N.,         2:e25748.
               Seltmann K., Aime M.C. 2020. The extended specimen network: a            Pons J., Barraclough T.G., Gomez-Zurita J., Cardoso A., Duran D.P.,
               strategy to enhance US biodiversity collections, promote research           Hazell S., Kamoun S., Sumlin W.D., Vogler A.P. 2006. Sequence-based
               and education. BioScience 70(1):23–30.
            Leonelli S. 2014. What difference does quantity make? On the                   species delimitation for the DNA taxonomy of undescribed
               epistemology of big data in biology. Big Data Soc. 2014:1–11.               insects. Syst. Biol. 55:595–609.
            Linnaeus C. 1753. Species plantarum exhibentes plantas rite cognitas        Poth D., Wollenberg K.C., Vences M., Schulz S. 2012. Volatile amphibian
               ad genera relatas, cum differentiis specificis, nominibus trivialibus,       pheromones: macrolides of mantellid frogs from Madagascar.
               synonymis selectis, locis natalibus, secundum systema sexuale               Angew. Chem. Int. Ed. 51:1–5.
               digestas. Holmiæ [Stockholm]: Impensis Laurentii Salvi. 132 p.           Pritchard J.K., Stephens M., Donnelly P. 2000. Inference of population
            Linnaeus C. 1758. Systema naturæ per regna tria naturæ, secundum               structure using multilocus genotype data. Genetics 155:945–959.
               classes, ordines, genera, species, cum characteribus, differentiis,      Puillandre N., Lambert A., Brouillet S., Achaz G. 2012. ABGD,
               synonymis, locis. Tomus I. Editio decima, reformata. Holmiæ                 Automatic barcode gap discovery for primary species delimitation.
               [Stockholm]: Impensis Laurentii Salvi. 824 p.                               Mol. Ecol. 21:1864–1877.
            Locey K.J., Lennon J.T. 2016. Scaling laws predict global microbial         Pyle R.L. 2016. Towards a global names architecture: the future of
               diversity. Proc. Natl. Acad. Sci. USA 113(21):5970–5975.                    indexing scientific names. ZooKeys 550:261–281.
            Lorieul T., Pearson K.D., Ellwood E.R., Goëau H., Molino J.F.,              Pyle R.L., Earle J.L., Greene B.D. 2008. Five new species of the
               Sweeney P.W., Yost J.M., Sachs J., Mata-Montero E., Nelson G.,              damselfish genus Chromis (Perciformes: Labroidei: Pomacentridae)
               Soltis P.S., Bonnet P., Joly A. 2019. Toward a large-scale and deep         from deep coral reefs in the tropical western Pacific. Zootaxa
               phenological stage annotation of herbarium specimens: case studies          1671:3–31.
               from temperate, tropical, and equatorial floras. Appl. Plant Sci.         Ratnasingham S., Hebert P.D.N. 2013. A DNA-based registry for all
               7(3):e01233.                                                                animal species: the Barcode Index Number (BIN) system. PLoS One
            Louis K.S., Jones L.M., Campbell E.G. 2002. Macroscope: Sharing in             8:e66213.
               Science. Am. Sci. 90:304–307.                                            Renner S.S. 2016. A return to Linnaeus’s focus on diagnosis, not
            Lumbsch H.T. 2002. Analysis of phenolic products in lichens for                description: the use of DNA characters in the formal naming of
               identification and taxonomy. In: Kranner I.C., Beckett R.P., Varma           species. Syst. Biol. 65:1085–1095.
               A.K., editors. Protocols in lichenology. Springer Lab Manuals.           Riley J. 2004. Understanding metadata. Bethesda, MD: NISO Press,
               Berlin, Heidelberg: Springer. p. 281–295.                                   National Information Standards Organization.
            Lynch C. 2008. Big data: How do your data grow? Nature 455:28–29.           Rissler L.J., Apodaca J.J. 2007. Adding more ecology into species delimitation:
            Marcial L.H., Hemminger B.M. 2010. Scientific data repositories on the          ecological niche models and phylogeography help define
               web: an initial survey. J. Assoc. Inf. Sci. Technol. 61(10):2029–2048.      cryptic species in the black salamander (Aneides flavipunctatus). Syst.
            Marshall S.A., Evenhuis N.L. 2015. New species without dead bodies:            Biol. 56(6):924–942.
               a case for photo-based descriptions, illustrated by a striking new       Roch M.A., Batchelor H., Baumann-Pickering S., Berchok C.L.,
               species of Marleyimyia Hesse (Diptera, Bombyliidae) from South              Cholewiak D., Fujioka E., Garland E.C., Herbert S., Hildebrand J.A.,
               Africa. ZooKeys 525:117–127.                                                Oleson E.M., Van Parijs S., Risch D., Širović A., Soldevilla M.S. 2016.
            May T.W., Redhead S.A., Lombard L., Rossman A.Y. 2018. XI                      Management of acoustic metadata for bioacoustics. Ecol. Inform.
               International Mycological Congress: report of Congress action on            31:122–136.
               nomenclature proposals relating to fungi. IMA Fungus 9(2):xxii.          Roche D.G., Kruuk L.E., Lanfear R., Binning S.A. 2015. Public data
            Mayden R.L. 1997. A hierarchy of species concepts: the denouement              archiving in ecology and evolution: how well are we doing? PLoS
               in the saga of the species problem. In: Claridge M.F., Dawah H.A.,          Biol. 13:e1002295.
            Rodríguez-Fernández J.I., De Carvalho C.J.B., Pasquini C., Gomes              Thorpe S.E. 2017. Is photography-based taxonomy really inadequate,
               de Lima K.M., Moura M.O., Carbajal Arizaga G.G. 2011. Barcoding               unnecessary, and potentially harmful for biological sciences? A
               without DNA? Species identification using near infrared                       reply to Ceríaco et al. (2016). Zootaxa 4226:449–450.
               spectroscopy. Zootaxa 2933:46–54.                                          Triebel D., Reichert W., Bosert S., Feulner M., Osieko Okach D.,
            Rosenberg M.S. 2014. Contextual cross-referencing of species names for           Slimani A., Rambold G. 2018. A generic workflow for effective
               fiddler crabs (genus Uca): an experiment in cyber-taxonomy. PLoS               sampling of environmental vouchers with UUID assignment and
               One. 9:e101704.                                                               image processing. Database 2018:bax096.
            Roskov Y., Ower G., Orrell T., Nicolson D., Bailly N., Kirk P.M.,             Troudet J., Vignes-Lebbe R., Grandcolas P., Legendre F. 2018. The
               Bourgoin T., DeWalt R.E., Decock W., Nieukerken E. van, Zarucchi              increasing disconnection of primary biodiversity data from spe-
               J., Penev L., eds. 2019. Species 2000 & ITIS Catalogue of Life, 26th          cimens: how does it happen and how to handle it? Syst. Biol.
               February 2019. Digital resource at www.catalogueoflife.org/col.                67:1110–1119.
               Species 2000. Naturalis, Leiden, the Netherlands.                          Tsugawa H., Satoh A., Uchino H., Cajka T., Arita M., Arita M. 2019. Mass
            Rossel S., Martínez Arbizu P. 2019. Revealing higher than expected               spectrometry data repository enhances novel metabolite discoveries
               diversity of Harpacticoida (Crustacea:Copepoda) in the North Sea              with advances in computational metabolomics. Metabolites 9(6): pii:
               using MALDI-TOF MS and molecular barcoding. Sci. Rep. 9:9182.                 E119.
            Rupp K. 2018. 42 Years of microprocessor trend data. Web-                     Venu P., Sanjappa M. 2011. The impact factor and taxonomy. Curr. Sci.
               site. Available from: https://www.karlrupp.net/2018/02/42-years               101(11):1397.
               -of-microprocessor-trend-data/ (March 13, 2019).                           Webster M.S. 2017. Emerging frontiers in collections-based ornithological
            Sangster G., Luksenburg J.A. 2015. Declining rates of species described         research: the extended specimen. Studies in avian biology. Boca
               per taxonomist: Slowdown of progress or a side-effect of improved             Raton, FL: CRC Press. 240 p.
               quality in taxonomy? Syst. Biol. 64:144–151.                               Wendt L., Sir E.B., Kuhnert E., Heitkämper S., Lambert C., Hladki A.I.,
            Santos C.M.D., Amorim D.S., Klassa B., Fachin D.A., Nihei S.S.,                  Romero A.I., Luangsaard J.J., Srikitikulchai P., Peršoh D., Stadler M.
               Carvalho C.J., Falaschi R.L., Mello-Patiu C.A., Couri M.S., Oliveira         2018. Resurrection and emendation of the Hypoxylaceae, recognised
               S.S., Silva V.C., Ribeiro G.C., Capellari R.S., Lamas C.J. 2016. On          from a multi-gene genealogy of the Xylariales. Mycol. Prog. 17:115–
               typeless species and the perils of fast taxonomy. Syst. Entomol.              154.
               41:511–515.                                                                Wheeler Q.D. 2007. Invertebrate systematics or spineless taxonomy?
            Scherz M.D., Glaw F., Vences M., Andreone F., Crottini A. 2016a.                 Zootaxa 1668:11–18.
               Two new species of terrestrial microhylid frogs (Microhylidae:             Wheeler Q.D., Knapp S., Stevenson D.W., Stevenson J., Blum S.D.,
               Cophylinae: Rhombophryne) from northeastern Madagascar. Sala-                 Boom B.M., Borisy G.G., Buizer J.L., De Carvalho M.R., Cibrian
               mandra 52:91–106.                                                             A., Donoghue M.J., Doyle V., Gerson E.M., Graham C.H., Graves
            Scherz M.D., Ruthensteiner B., Vences M., Glaw F. 2014. A new                    P., Graves S.J., Guralnick R.P., Hamilton A.L., Hanken J., Law
               microhylid frog, genus Rhombophryne, from northeastern Madagascar,            W., Lipscomb D.L., Lovejoy T.E., Miller H., Miller J.S., Naeem S.,
               and a re-description of R. serratopalpebrosa using micro-computed             Novacek M.J., Page L.M., Platnick N.I., Porter-Morgan H., Raven
               tomography. Zootaxa 3860:547–560.                                             P.H., Solis M.A., Valdecasas A.G., Van Der Leeuw S., Vasco A.,
            Scherz M.D., Vences M., Rakotoarison A., Andreone F., Köhler J., Glaw            Vermeulen N., Vogel J., Walls R.L., Wilson E.O., Woolley J.B. 2012a.
               F., Crottini A. 2016b. Reconciling molecular phylogeny,                       Mapping the biosphere: exploring species to understand the origin,
               morphological divergence and classification of Madagascan narrow-mouthed      organization and sustainability of biodiversity. Syst. Biodivers.
               frogs (Amphibia: Microhylidae). Mol. Phylogenet. Evol. 100:372–381.           10:1–20.
            Schlining B.M., Stout N.J. 2006. "MBARI’s Video Annotation and
               Reference System," OCEANS 2006. Boston, MA: IEEE. p. 1–5.                  Wheeler Q.D., Bourgoin T., Coddington J., Gostony T., Hamilton A.,
            Short A.E.Z., Dikow T., Moreau C.S. 2018. Entomological collections in           Larimer R., Plaszek A., Schauff M., Solis M.A. 2012b. Nomenclatural
               the age of big data. Annu. Rev. Entomol. 63:513–530.                          benchmarking: the roles of digital typification and telemicroscopy.
            Simpson G.G. 1961. Principles of animal taxonomy. New York:                      ZooKeys 209:193–202.
               Columbia University Press. p. xii + 247.                                   Wieczorek J., Bloom D., Guralnick R., Blum S., Döring M., Giovanni
            Small E. 1989. Systematics of biological Systematics (or, Taxonomy of            R., Robertson T., Vieglais D. 2012. Darwin Core: an evolving
               Taxonomy). Taxon 38(3):335–356.                                               community-developed biodiversity data standard. PLoS One
            Smith V., Georgiev T., Stoev P., Biserkov J., Miller J., Livermore L., Baker      7:e29715.
               E., Mietchen D., Couvreur T.L., Mueller G., Dikow T., Helgen K.M.,         Wilkinson M.D., Dumontier M., Aalbersberg I.J.J., Appleton G., Axton
               Frank J., Agosti D., Roberts D., Penev L. 2013. Beyond dead trees:            M., Baak A., Blomberg N., Boiten J.-W., Silva Santos L.B. da, Bourne
               integrating the scientific process in the Biodiversity Data Journal.           P.E., Bouwman J., Brookes A.J., Clark T., Crosas M., Dillo I., Dumon
               Biodivers. Data J. 1:e995.                                                    O., Edmunds S., Evelo C.T., Finkers R., Gonzalez-Beltran A., Gray
            Solís-Lemus C., Knowles L.L., Ané C. 2015. Bayesian species delimit-             A.J.G., Groth P., Goble C., Grethe J.S., Heringa J., Hoen P.A.C. ‘t,
               ation combining multiple genes and traits in a unified framework.              Hooft R., Kuhn T., Kok R., Kok J.N., Lusher S.J., Martone M.E., Mons
               Evolution 69:492–507.                                                         A., Packer A.L., Persson B., Rocca-Serra P., Roos M., Schaik R. van,
            Stackebrandt E., Smith D. 2019. Paradigm shift in species description:           Sansone S.-A., Schultes E., Sengstag T., Slater T., Strawn G., Swertz
               the need to move towards a tabular format. Arch. Microbiol. 201:143–          M.A., Thompson M., Lei J. van der, Mulligen E. van, Velterop J.,
               145.                                                                          Waagmeester A., Wittenburg P., Wolstencroft K.J., Zhao J., Mons B.
            Starnberger I., Poth D., Peram P.S., Schulz S., Vences M., Knudsen               2016. The FAIR Guiding Principles for scientific data management
               J., Barej M.F., Rödel M.-O., Walzl M., Hödl W. 2013. Take time to             and stewardship. Sci. Data. 3:160018.
               smell the frogs: vocal sac glands of reed frogs (Anura: Hyperoliidae)      Wilkinson M.D., Sansone S.A., Schultes E., Doorn P., Bonino da Silva
               contain species-specific chemical cocktails. Biol. J. Linn. Soc.               Santos L.O., Dumontier M. 2018. A design framework and exemplar
               110:828–838.                                                                  metrics for FAIRness. Sci. Data 5:180118.
            Steinmann I.C., Pflüger V., Schaffner F., Mathis A., Kaufmann C. 2013.         Wink M., Botschen F., Gosmann C., Schäfer H., Waterman G.
               Evaluation of matrix-assisted laser desorption/ionization time of             2010. Chemotaxonomy seen from a phylogenetic perspective and
               flight mass spectrometry for the identification of ceratopogonid and            evolution of secondary metabolism. Annu. Plant Rev. 40:364–433.
               culicid larvae. Parasitology 140:318–327.                                  Winterton S.L. 2009. Revision of the stiletto fly genus Neodialineura
            Stuessy T.F., Crawford D.J., Soltis D.E., Soltis P.S. 2014. Plant                Mann (Diptera: Therevidae): an empirical example of cybertax-
               systematics—the origin, interpretation, and ordering of plant                 onomy. Zootaxa 2157:1–33.
               biodiversity. In: Regnum Vegetabile, vol. 156. Königstein (Taunus):        Yilmaz P., Kottmann R., Field D., Knight R., Cole J.R., Amaral-
               Koeltz Scientific Books. 425 p.                                                Zettler L., Gilbert J.A., Karsch-Mizrachi I., Johnston A., Cochrane
            Tedersoo L., Ramirez K.S., Nilsson R.H., Kaljuvee A., Kõljalg U.,                G., Vaughan R., Hunter C., Park J., Morrison N., Rocca-Serra
               Abarenkov K. 2015. Standardizing metadata and taxonomic iden-                 P., Sterk P., Arumugam M., Bailey M., Baumgartner L., Birren
               tification in metabarcoding studies. GigaScience 4:34.                         B.W., Blaser M.J., Bonazzi V., Booth T., Bork P., Bushman F.D.,
                Buttigieg P.L., Chain P.S., Charlson E., Costello E.K., Huot-Creasy      any (x) sequence (MIxS) specifications. Nat. Biotechnol. 29:415–
                H., Dawyndt P., DeSantis T., Fierer N., Fuhrman J.A., Gallery            420.
                R.E., Gevers D., Gibbs R.A., San Gil I., Gonzalez A., Gordon           Zamora J.C., and 412 coauthors. 2018. Considerations and con-
                J.I., Guralnick R., Hankeln W., Highlander S., Hugenholtz P.,            sequences of allowing DNA sequence data as types of fungal taxa.
                Jansson J., Kau A.L., Kelley S.T., Kennedy J., Knights D., Koren         IMA Fungus 9:167–175.
                O., Kuczynski J., Kyrpides N., Larsen R., Lauber C.L., Legg T., Ley    Zhang J., Kapli P., Pavlidis P., Stamatakis A. 2013. A general species
                R.E., Lozupone C.A., Ludwig W., Lyons D., Maguire E., Methé              delimitation method with applications to phylogenetic placements.
                B.A., Meyer F., Muegge B., Nakielny S., Nelson K.E., Nemergut            Bioinformatics 29:2869–2876.
                D., Neufeld J.D., Newbold L.K., Oliver A.E., Pace N.R., Palanisamy     Zompro O. 2005. Catalogue of type material of the insect order Phas-
                G., Peplies J., Petrosino J., Proctor L., Pruesse E., Quast C., Raes     matodea, housed in the Museum für Naturkunde der Humboldt
                J., Ratnasingham S., Ravel J., Relman D.A., Assunta-Sansone S.,          Universität zu Berlin, Germany and in the Institut für Zoologie
                Schloss P.D., Schriml L., Sinha R., Smith M.I., Sodergren E., Spo        der Martin Luther Universität in Halle (Saale), Germany. Dtsch.
                A., Stombaugh J., Tiedje J.M., Ward D.V., Weinstock G.M., Wendel         Entomol. Z. 52:251–290.
                D., White O., Whiteley A., Wilke A., Wortman J.R., Yatsunenko          Zurowietz M., Langenkämper D., Nattkemper T.W. 2019. BIIGLE2Go—
                T., Glöckner F.O. 2011. Minimum information about a marker               a scalable image annotation system for easy deployment on cruises.
                gene sequence (MIMARKS) and minimum information about                    OCEANS 2019-Marseille. Marseille, France: IEEE, p. 1–6.