UNIVERSIDADE FEDERAL DE MINAS GERAIS
Instituto de Ciências Biológicas
Pós-Graduação em Zoologia

Bárbara Regina Neves Chaves

iDNA Metabarcodes for Characterizing Mammal Diversity in Brazil
(Metabarcodes de iDNA para Caracterização da Diversidade de Mamíferos no Brasil)

Thesis presented to the Pós-Graduação em Zoologia program of the Universidade Federal de Minas Gerais in partial fulfillment of the requirements for the degree of Doctor in Zoology.

Advisor: Prof. Dr. Fabrício Rodrigues dos Santos

Belo Horizonte
2024

Acknowledgments
To Fabrício, who kept the door open for my return, trusting my work and setting me this challenge. To Patrícia, Greg, and the Centro de Microscopia, who kindly made it possible for me to dedicate myself exclusively to this work over the last two and a half years. To José Eustáquio, who contributed to the idea and to carrying out all the field collections. To my colleagues at LBEM, who over these four and a half years, or even earlier, helped with the collections, at the bench, and in front of the black terminal screens, especially Pedro, Mateus, Julia, Ana Cristina, Jean, Davidson, Henry, Joana, and Maria Eugênia (in memoriam). To Lica and Fred, who made space available in the Laboratório de Sistemática de Insetos and helped me sort the samples. To Rennan, who came to my rescue and sequenced the samples at the Laboratório de Genômica, and to Renato Oliveira, who patiently helped me with the analyses. To Tulaci, who helped above all in the final moments of despair and writing, and by drawing the maps in this work. To CAPES, for two years of scholarship, and to CNPq and FAPEMIG, for funding the work. Finally, to UFMG, my alma mater and my workplace, for the physical infrastructure and financial support.
Resumo
Systematics and taxonomy are essential for biological classification but face limitations in the use of morphological characters, especially in Neotropical ecosystems, which harbor rich biodiversity and are highly threatened and under-sampled, so that many species go extinct before they are described. Molecular techniques such as metabarcoding offer rapid and accurate identifications without requiring prior knowledge of morphology. Metabarcoding identifies species from short DNA fragments released by organisms into the environment. Its efficiency is similar to or greater than that of camera traps, which, although useful in mammal surveys, fail to identify small or arboreal species. Hematophagous and saprophagous dipterans have a high dispersal capacity and can locate elusive mammals in tropical environments, and they are collected most efficiently with flight interception traps. The iDNA metabarcoding approach makes it possible to identify mammal DNA molecules in the gut contents of these insects. Because metabarcoding has no standard genetic marker, the use of multiple markers is recommended to improve taxonomic resolution. This thesis tested the effectiveness of iDNA metabarcoding in three Brazilian biomes (Amazon, Atlantic Forest, and Cerrado), adapting primers to Neotropical diversity to improve species detection and using flies of the families Calliphoridae, Sarcophagidae, and Tabanidae. It highlights the importance of integrating metabarcoding into conservation and biomonitoring strategies, optimizing biodiversity analysis and supporting public policies for the preservation of Brazilian biomes.
Keywords: metabarcoding; iDNA; Neotropical biomes; mammal conservation.
Abstract
Systematics and taxonomy are essential for biological classification, but they face limitations when using only morphological characters, especially in Neotropical ecosystems, which host rich biodiversity, are highly threatened, and are under-sampled, leading to the extinction of many species before they are described. Molecular techniques, such as metabarcoding, provide rapid and accurate identifications without the need for prior morphological knowledge. Metabarcoding identifies species from short DNA fragments released by organisms into the environment. Its efficiency is similar to or greater than that of camera traps, which, although useful in mammal surveys, fail to identify small or arboreal species. Hematophagous and saprophagous dipterans have a high dispersal capacity and can locate elusive mammals in tropical environments, and they are collected most efficiently with flight interception traps. The iDNA metabarcoding approach allows the identification of mammal DNA molecules from the intestinal contents of these insects. Since metabarcoding does not have a standard genetic marker, the use of multiple markers is recommended to improve taxonomic resolution. This thesis tested the effectiveness of iDNA metabarcoding in three Brazilian biomes (Amazon, Atlantic Forest, and Cerrado), adapting primers to Neotropical diversity to improve species detection and using flies from the families Calliphoridae, Sarcophagidae, and Tabanidae. It highlights the importance of integrating metabarcoding into conservation and biomonitoring strategies, optimizing biodiversity analysis and supporting public policies for the preservation of Brazilian biomes.
Keywords: metabarcoding; iDNA; Neotropical biomes; mammal conservation.

Contents
1. Introduction
2. Chapter I. Selection of genetic markers and assembly of a reference database for metabarcoding of Neotropical mammals (article submitted to Conservation Genetics Resources)
3. Chapter II. Analysis of vertebrate biodiversity detected through iDNA metabarcoding in Brazilian biomes (article not submitted)
4. Final Considerations
General References

1. Introduction
The number of formally described species is estimated at between 1.5 and 2 million (Larsen et al., 2017), but about 86% of species are thought to remain unknown, especially in tropical regions (Dirzo & Raven, 2003). The actual number of species may be much higher, reaching around 8.7 million (Mora et al., 2011). However, most indicators of the state of nature are in decline, including the number of species and the size of their populations, and projections indicate that the situation may worsen in the coming decades unless rapid and integrated action is taken (Diaz et al., 2019). The problem is even more acute in regions recognized as biodiversity hotspots, which harbor exceptional species richness while facing severe threats from human activities (Myers et al., 2000; Jenkins et al., 2013). This is the case for Neotropical biomes, considered among the most degraded and rapidly disappearing in the world.
Despite their importance, these ecosystems are disproportionately under-sampled (Hughes et al., 2021), and countless Neotropical species suffer local extinctions or disappear before being formally described. This underscores the importance of expanding taxonomic knowledge in order to understand and conserve biodiversity. Systematics has been the foundation of biological classification for centuries, providing a reference system for all of biology. The main goals of systematics are to describe, classify, name, and determine the relationships of Earth's biodiversity (Teletchea, 2016), as well as to provide surveys and inventories through its subdiscipline, taxonomy (Wilson, 2004; Crisci, 2006). Traditionally, systematics and taxonomy rely on examining observable characters of organisms, such as body structure, coloration, and anatomical features. Detecting certain morphological characters often requires specific equipment and assays, such as light and electron microscopy or even computed tomography (Tessler et al., 2022). However, traditional systematics frequently faces limitations related to the availability of morphological characters, which can vary with environmental factors, developmental stage, or convergent evolution (Teletchea, 2016). Given the complexities of biological diversity, which are not reflected in morphology alone, modern techniques have emerged not to replace morphological studies but to offer a more comprehensive understanding of that diversity (Hillis, 1987). In this context, molecular techniques have proven to be powerful tools for taxonomic classification, using DNA sequences as a source of characters in phylogenetic analyses and making it possible to identify homologies, apomorphies, and plesiomorphies among organisms (Wägele, 1995).
These techniques also reveal information frequently overlooked by traditional methods, such as intraspecific diversity and cryptic species. Building on these molecular characters, DNA barcoding was proposed in 2003 as an alternative to overcome limitations of traditional morphology-based classification, allowing rapid and accurate species identification, especially by non-taxonomists (Hebert et al., 2003). DNA barcoding has proven a valuable tool for large-scale species identification without prior knowledge of morphology, and it has significantly transformed biodiversity inventories over recent decades (Miller et al., 2016). Species are identified by analyzing nucleotide sequence diversity in a single region of DNA (the genetic marker) that varies rapidly enough to differentiate lineages with recent reproductive isolation (Felsenstein, 2004). A 648 base-pair region of the mitochondrial gene Cytochrome Oxidase Subunit I (COI) was defined as the standard genetic marker for DNA barcoding of metazoans (Hebert et al., 2003) and has been described for a wide diversity of organisms. DNA can be effectively extracted from tissues obtained directly from the specimens of interest (e.g., Mota et al., 2018; Pereira et al., 2011; Ilunga et al., 2020), but this is not the only way to sample genetic material. Because living organisms continuously release DNA molecules into the environment, these molecules can be used as evidence of their presence at a specific site. More recently, metabarcoding emerged as an evolution of DNA barcoding, allowing species identification from so-called environmental DNA (eDNA), a complex mixture of genetic material released by living organisms into the environment (Haile et al., 2009; Taberlet et al., 2012; Andújar et al., 2018).
Because eDNA is more fragmented, metabarcoding generally amplifies and sequences even shorter DNA fragments using high-throughput sequencing (HTS) techniques (Taberlet et al., 2012). Unlike DNA barcoding, vertebrate metabarcoding has no standard genetic marker. Although the COI region used in DNA barcoding could be suitable, it is considered too long for fragmented eDNA, and its high evolutionary rate results in a lack of conserved sites for primers (Kent, 2009). In the absence of a standard genetic marker, different markers and primer pairs can be chosen for metabarcoding studies. For example, short fragments of ribosomal genes, such as RNA12S and RNA16S, are frequently used because of their highly conserved primer sites, which allow amplification of a wide variety of taxa (Green et al., 2015). To increase resolving power, since very short sequences may lack species-specific nucleotides (Freeland, 2017), multiple markers have been used together, improving the robustness of taxonomic identification in metabarcoding studies (Axtner et al., 2019; Lynggaard et al., 2019; Hajibabaei et al., 2019). Metabarcoding of environmental samples is now widely used in surveys and monitoring around the world, covering marine, freshwater, and terrestrial environments and enabling the detection of species that are difficult to monitor by conventional methods (Belle et al., 2019; Deiner et al., 2017). Despite its wide adoption, the technique faces challenges, such as the need for more comprehensive reference databases, since many species are not yet represented (Tzafesta et al., 2022).
This challenge is particularly significant in the Neotropical region, where immense complexity and species richness, combined with a lack of logistical infrastructure and the difficulty of travel, hinder fieldwork and compromise the expansion of reference databases (Jackman et al., 2021; Taberlet et al., 2012). Despite their relatively large body size compared with other groups, mammals in the Neotropical region are challenging to study because of low population densities and the animals' ability to hide in dense, complex vegetation (Schipper et al., 2008; Schnell et al., 2012). Although they receive significant attention in conservation studies, 13% of mammal species are still classified as Data Deficient (DD) by the IUCN, and 35% of these occur in the Neotropics (IUCN, 2024). Traditionally, mammal surveys involve collecting and/or observing specimens, which demands exhaustive fieldwork. Camera traps have reduced sampling effort and contributed to these studies, especially in dense habitats such as forests (Glover-Kapfer et al., 2019). However, camera traps cannot distinguish closely related species and are most effective for medium-sized and large terrestrial mammals, being of little use for arboreal species, small species (under 100 g), and bats (Bernard et al., 2013). Metabarcoding of environmental samples performs as well as or better than camera traps, allowing the detection of terrestrial, semiaquatic, and arboreal mammals of small, medium, or large size, including threatened species (Ushio et al., 2017; Allen et al., 2022; Keck et al., 2023). Besides dispersing into the environment through faeces, urine, epidermal cells, and other sources, DNA can move between trophic levels of a food web, passing, for example, from prey to predator or from host to parasite.
Thus, DNA molecules from mammals and other vertebrates can be found in the gut contents of hematophagous and saprophagous invertebrates, allowing species identification through metabarcoding of iDNA (invertebrate-derived DNA) (Calvignac-Spencer et al., 2013; Martínez-de la Puente et al., 2015). Although most iDNA metabarcoding studies have been conducted in temperate regions, reflecting limited research funding in tropical countries (Carvalho et al., 2022), a growing number of studies using various invertebrate groups as sources of vertebrate DNA have been carried out successfully in the Neotropical region (e.g., Kocher et al., 2017; Rodgers et al., 2017; Lynggaard et al., 2019; Massey et al., 2022; Saranholi et al., 2024). However, the primers often used in these studies were originally developed for organisms from other regions, which can inhibit amplification because of divergence between the primer sequences and those of the target species, resulting in the non-detection of species (false negatives) (Primmer et al., 1996). It is therefore necessary to adapt genetic markers to the vast Neotropical diversity to increase efficiency and the species detection rate (Ficetola et al., 2021; Teixeira et al., 2023). Different sources of iDNA can introduce different taxonomic biases owing to variation in the feeding ecology and life cycle of the organisms (Massey et al., 2022). Carrion flies of the families Calliphoridae and Sarcophagidae have frequently been used as sources of iDNA, allowing the detection of a wide diversity of vertebrates, including terrestrial, flying, and arboreal mammals (Calvignac-Spencer et al., 2013; Rodgers et al., 2017; Gogarten et al., 2020; Lee et al., 2023).
Hematophagous flies of the family Tabanidae (horse flies) also have potential as sources of iDNA, as they feed opportunistically on a variety of hosts, including mammals and birds (Kniepert, 1980; Vaduva, 2015). A recent study successfully identified mammal and bird species from mosquitoes and flies of several families (Saranholi et al., 2023), but horse flies have not yet been used as "samplers" of vertebrates. Hematophagous and saprophagous dipterans have complex, highly specialized sensory systems for locating potential hosts, which serve as food sources essential to their survival and reproduction. With a high capacity for dispersal in the environment (Brown, 2020), owing to their size and flight ability, these insects can reach elusive mammals in tropical environments. Unlike mammals, dipterans can be collected simply and consistently. Flight interception traps, such as Malaise traps, require low sampling effort and capture large numbers and a great diversity of dipterans (Blahó et al., 2013; Lynggaard, 2019; Skvarla et al., 2021). The aim of this thesis was to test the effectiveness of iDNA metabarcoding for surveying mammal species in the Neotropical region, with a focus on Brazilian biomes, so that the information obtained through this approach can be used in biomonitoring programs and biogeographic studies, contributing to conservation. In the first chapter, we evaluated the genetic markers and primer pairs most used in the literature, selecting those best suited to the taxonomic identification of Brazilian mammals. We also customized a reference database, gathering sequences of Brazilian mammals available in public online databases.
In the second chapter, we validated the performance of these primers and of the reference database on real iDNA samples, extending the application of metabarcoding to three Brazilian biomes (Amazon, Atlantic Forest, and Cerrado) and using flies from three distinct families (Calliphoridae, Sarcophagidae, and Tabanidae).

2. Chapter I. Selection of genetic markers and assembly of a reference database for metabarcoding of Neotropical mammals (article submitted to Conservation Genetics Resources)

RESUMO
Neotropical biomes are characterized by high biodiversity and by significant threats. To meet the urgent need for biodiversity assessment in these threatened and largely underexplored ecosystems, such as those of Brazil, modern techniques such as metabarcoding have been employed to generate detailed taxonomic inventories. However, selecting appropriate genetic markers remains challenging, because different markers and primer pairs can introduce biases by preferentially amplifying certain taxonomic groups, depending on their nucleotide sequences and on the reference database used. This study advances biodiversity research by developing a customized reference database and optimizing primers for the metabarcoding of Brazilian mammals. Our reference database, covering 72% of native mammal species, including threatened species, improves the accuracy of taxonomic identification across all Brazilian biomes. In silico and in vitro tests were carried out to evaluate and optimize the most suitable genetic markers and primer pairs, comparing their taxonomic coverage and resolution. Customized modifications to the COI primers reduced mismatches with Brazilian mammal sequences, significantly increasing taxonomic coverage and outperforming the primers commonly used in the literature.
We demonstrate that combining COI with RNA12S and RNA16S provides important, complementary information, improving the robustness of taxonomic assignments and addressing the challenges posed by Neotropical mammal diversity. This multi-marker approach offers a reliable strategy for improving metabarcoding identification in the Neotropical region, supporting more effective conservation efforts.

Title
Metabarcoding Markers and a Reference Database for Neotropical Mammals

Author information
Barbara R N Chaves (barbarachaves@ufmg.br, https://orcid.org/0009-0008-0977-8974)
Jose Eustaquio Santos-Junior (jrsantos140782@yahoo.com.br, https://orcid.org/0000-0002-7150-3751)
Fabricio R Santos (fsantos@icb.ufmg.br, https://orcid.org/0000-0001-9088-1750)
Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil

Abstract
Neotropical biomes are characterized by both high biodiversity and significant threats. To address the urgent need for biodiversity assessment in these threatened and largely understudied ecosystems, such as those in Brazil, modern techniques such as metabarcoding have been employed to generate detailed taxonomic inventories. However, selecting appropriate genetic markers remains challenging, as different markers and primer pairs may introduce biases by preferentially amplifying certain taxonomic groups, depending on their nucleotide sequences and the reference database used. This study advances biodiversity research by developing a customized reference database and optimizing primers for metabarcoding Brazilian mammals. Our database, covering 72% of native species, including endangered ones, enhances taxonomic identification accuracy across all Brazilian biomes. Both in-silico and in-vitro tests were conducted to evaluate and optimize the most suitable genetic markers and primer pairs, comparing their taxonomic coverage and resolution.
Customized modifications made to COI primers reduced mismatches against Brazilian mammal sequences, significantly enhancing taxonomic coverage and outperforming primers commonly used in the literature. We demonstrate that combining COI with RNA12S and RNA16S provides important and complementary information, improving the robustness of taxonomic assignments and addressing challenges posed by Neotropical mammalian diversity. This multi-marker approach offers a reliable strategy for enhancing metabarcoding identification in the Neotropical region, supporting more effective conservation efforts.

Keywords
Metabarcodes, Neotropics, Brazilian Mammals, COI, Reference Database, Molecular Taxonomy

Acknowledgments
We thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Instituto Chico Mendes de Conservação da Biodiversidade (ICMBio), and Universidade Federal de Minas Gerais (UFMG) for financial support.

Text (total word count: 5305+208)

INTRODUCTION (word count: 1038)
Neotropical biomes are among the most degraded and rapidly disappearing natural environments worldwide, and many of them are considered biodiversity hotspots, that is, some of the richest and most threatened biomes (Myers et al. 2000; Jenkins et al. 2013). Many Neotropical species are becoming locally extinct or disappearing before being described by science, which underscores the urgent need for detailed taxonomic inventories and for collecting essential species information (Thomsen et al. 2012). Modern techniques have facilitated and accelerated species identification and research. Over two decades ago, DNA barcoding was proposed, revolutionizing taxonomic research by sequencing short molecular-marker regions from individual specimens for species identification (Hebert et al. 2003).
More recently, metabarcoding has evolved as a tool for biodiversity monitoring (Taberlet et al. 2012), sequencing even shorter DNA fragments from complex mixtures of multiple specimens, usually with high-throughput next-generation sequencing (NGS) techniques. Currently, metabarcoding is widely employed for biodiversity surveys from environmental samples (eDNA) such as soil and water, or from faeces and stomach contents (e.g., iDNA) (Valentini et al. 2009; Alberdi et al. 2018). The taxonomic classification of genetic sequences and the resulting species list provide a foundation for subsequent ecological analyses and biodiversity monitoring (Keck et al. 2023). Various metabarcoding protocols are available, generally comprising: i) DNA extraction, ii) amplification of one or more genetic markers using specific primers, iii) DNA sequencing using NGS, and iv) bioinformatic analysis for taxonomic assignment. NGS techniques offer the advantage of identifying virtually all species present in a sample, even at low abundances (Kircher & Kelso 2010). The standard molecular marker for animal DNA barcoding is a 648 base-pair (bp) region of the mitochondrial gene Cytochrome Oxidase Subunit I (COI) (Hebert et al. 2003); however, there is no standard molecular marker for metabarcoding. Owing to its high taxonomic resolution and extensive database (Hajibabaei 2012; Clarke et al. 2017; Andújar et al. 2018), the 648-bp COI region could be considered suitable for metabarcoding. However, it is deemed too long for the degraded and fragmented DNA typically found in eDNA samples (Kent & Norris 2005; Alcaide et al. 2009; Kent 2009). Additionally, its high evolutionary rate results in a lack of conserved sites for primer design, which can lead to amplification biases, whereby primers amplify only part of the sequences present in the eDNA (Kent 2009; Deagle et al. 2014; Piper et al. 2019), although degenerate primers can be used to expand taxonomic coverage (Kwok et al. 1990; Wei et al. 2003).
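As a rough illustration of how degenerate positions can widen coverage, the following Python sketch counts primer-template mismatches using the standard IUPAC ambiguity codes. The primer and binding-site sequences are invented for illustration and are not the primers evaluated in this study; the function names are likewise hypothetical. The comparison is deliberately simplified: it assumes primer and site are given in the same orientation and gives every position equal weight, whereas in practice mismatches near the 3' end matter more.

```python
# IUPAC nucleotide codes mapped to the bases each one can pair as.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(base not in IUPAC[code]
               for code, base in zip(primer.upper(), site.upper()))


def degeneracy(primer: str) -> int:
    """Number of distinct concrete oligos that a degenerate primer represents."""
    total = 1
    for code in primer.upper():
        total *= len(IUPAC[code])
    return total


# Hypothetical 10-nt primers and binding sites (assumptions, not real data).
exact_primer = "ACGTTGCAAC"
degenerate_primer = "ACGTYGCARC"  # Y = C or T, R = A or G
sites = {"taxon_1": "ACGTTGCAAC", "taxon_2": "ACGTCGCAGC"}

for name, site in sites.items():
    print(name,
          count_mismatches(exact_primer, site),
          count_mismatches(degenerate_primer, site))
print("degeneracy:", degeneracy(degenerate_primer))
```

In this toy case the exact primer has two mismatches against the second taxon, while the degenerate version matches both taxa at the cost of being a pool of four oligos, which is the trade-off that degenerate designs accept.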
Small fragments of other mitochondrial genes, such as the ribosomal genes (12S and 16S), are often used to amplify a wide range of taxa, including undescribed species (Green et al. 2015). These genes have already provided accurate species-level identifications in some studies worldwide (e.g. Ficetola et al. 2010; Riaz et al. 2011; Piper et al. 2019). However, ribosomal genes generally exhibit significant overlap between intra- and interspecific genetic distances and have a smaller reference database than COI, which reduces their resolution (Alberdi et al. 2017). Furthermore, single genetic markers can produce biased results, as shorter sequences may lack species-specific mutations (Freeland 2017), and different primer pairs may favour certain species groups over others (Alberdi et al. 2017; Piper et al. 2019). In contrast, the use of multiple genetic markers can provide complementary information and enhance taxonomic coverage (Corse et al. 2019; Hajibabaei et al. 2019). Consequently, multiple markers have been used to improve the robustness of taxonomic identification in metabarcoding studies (Pompanon et al. 2012; Alberdi et al. 2017; Axtner et al. 2019; Lynggaard et al. 2019). A survey of the literature shows that many different molecular markers and primer pairs are in use across metabarcoding studies. Choosing appropriate markers and primer pairs is crucial and should consider: i) taxonomic coverage, i.e., which taxa need to be amplified, ii) taxonomic resolution, i.e., the level of taxonomic discrimination, and iii) the availability of reference databases for taxonomic identification (Townzen et al. 2008; Pompanon et al. 2012; Reeves et al. 2018). The taxonomic coverage of a molecular marker can be limited when the primers were designed for distantly related species, due to increased sequence divergence, including mutations at priming sites.
Mismatches can accumulate, inhibiting amplification and leading to the non-detection of species that are present, i.e., false negatives (Primmer et al. 1996). Consequently, the analysis of amplified DNA can provide a distorted community composition, with some species potentially missed due to systematic biases in amplification efficiency (Green et al. 2015). This issue is particularly relevant in the Neotropical region, known for its vast genetic and species diversity (Raven et al. 2020), and can be exacerbated when primers are designed for organisms from other regions, usually from biomes in the Global North. The quality of the reference database is also crucial for reliable taxonomic assignments, as it links genetic sequences to established taxonomic classifications. Incomplete databases can lead to false negatives and, when combined with sequence and taxonomic errors, result in misidentification, one of the most critical issues to avoid (Keck et al. 2023). To minimize misidentification, geographic filtering of sequences can improve the database by excluding species that are not known to occur in the study area. Additionally, harmonizing taxonomy and resolving synonyms are important to prevent taxonomic conflicts that may arise when using large public databases (Grenié et al. 2023). This study aimed to establish technical guidelines for the effective use of metabarcodes for identifying Brazilian mammal species, including recommendations for specific primers and reference databases. To achieve this, we constructed a reliable reference database by collecting sequence records of Brazilian mammalian species from public databases. Additionally, the most frequently used genetic markers and primer pairs in the literature were evaluated to select the most suitable ones for the taxonomic identification of Brazilian mammalian species.
Both in-silico and in-vitro tests were conducted to compare the performance of each genetic marker and primer pair with respect to taxonomic coverage and resolution, ensuring their effectiveness for identifying Brazilian mammal species. Given the high level of ongoing degradation due to increasing anthropogenic impact and the urgent need for conservation in tropical ecosystems, particularly in Brazil's rich and threatened biomes, it is crucial to optimize biodiversity assessment methods. By refining metabarcoding practices, we hope to contribute to more accurate taxonomic inventories and better-informed conservation strategies for these critical habitats of the Global South.

METHODS (word count: 1261)

Reference Database Construction
A customized reference database for reliable taxonomic assignments of Brazilian mammals was assembled based on the list of native species from the Brazilian Society of Mammalogy (SBMz) (Abreu et al. 2023), supplemented with species listed in Mammal Species of the World (Wilson & Reeder 2005) and records of exotic species in Brazil (Da Rosa et al. 2017). This list contains essential species information, including IUCN threat status (2024), ecological niche, and biome of occurrence (Abreu et al. 2023; Paglia et al. 2012). To address taxonomic inconsistencies among databases (Keck et al. 2023), synonyms were included to cross-reference taxonomic systems (SBMz, IUCN, NCBI, BOLD, MSW). Sequence records for COI, RNA12S, RNA16S, and complete mitogenomes for each species and its synonyms were downloaded from GenBank (Clark et al. 2016), using the traits package in R (LeBauer et al. 2024), and from BOLD (Ratnasingham & Hebert 2007), using the bold package in R (Mudalige 2021). Taxonomy for GenBank sequences was assigned using the taxonomizr package (Sherrill-Mix 2019) and for BOLD sequences using the bold package. To capture intraspecific diversity, up to three sequences per marker were retained for each species.
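The two curation rules just described, cross-referencing synonyms to one accepted name and retaining up to three sequences per marker for each species, can be sketched as follows. This is a minimal Python illustration and not the R workflow used in the study: the species names, accession labels, and the `curate` function are hypothetical, and because the text does not state how the retained sequences were chosen, the sketch simply keeps the first three encountered. Records whose names are absent from the synonym table are skipped here for simplicity.

```python
from collections import defaultdict

MAX_PER_MARKER = 3  # cap stated in the text

# Hypothetical synonym table: each name used by a source database is mapped
# to the accepted name on the species list (names invented for illustration).
SYNONYMS = {
    "Exemplus novus": "Exemplus novus",
    "Exemplus antiquus": "Exemplus novus",  # older synonym of the same species
}

# Hypothetical sequence records; accessions are placeholders, not real entries.
RECORDS = [
    {"accession": "ACC001", "name": "Exemplus novus", "marker": "COI"},
    {"accession": "ACC002", "name": "Exemplus antiquus", "marker": "COI"},
    {"accession": "ACC003", "name": "Exemplus novus", "marker": "COI"},
    {"accession": "ACC004", "name": "Exemplus novus", "marker": "COI"},
    {"accession": "ACC005", "name": "Exemplus antiquus", "marker": "RNA12S"},
    {"accession": "ACC006", "name": "Alienus ignotus", "marker": "COI"},
]


def curate(records, synonyms, max_per_marker=MAX_PER_MARKER):
    """Group records by (accepted species, marker), keeping at most N per group."""
    kept = defaultdict(list)
    for rec in records:
        accepted = synonyms.get(rec["name"])
        if accepted is None:
            continue  # name not on the species list: skipped in this sketch
        bucket = kept[(accepted, rec["marker"])]
        if len(bucket) < max_per_marker:  # assumption: first-come selection
            bucket.append(rec["accession"])
    return dict(kept)


database = curate(RECORDS, SYNONYMS)
for (species, marker), accessions in sorted(database.items()):
    print(species, marker, accessions)
```

Keying on the accepted name rather than the submitted name is what prevents a species from exceeding the cap, or from appearing twice, merely because source databases label it differently.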
To minimize missing taxa, additional sequences from closely related species from other regions were included, selecting one sequence per species within each native genus. To enable broader taxonomic identification, one complete mitogenome per genus was added for selected animal classes (Aves, Amphibia, Lepidosauria, Actinopteri, and Insecta). All mitogenomes were divided into three parts (0–2,000 bp, 1,550–3,600 bp, and 5,000–9,500 bp) and included as RNA12S, RNA16S, and COI sequences, respectively.

Genetic Markers and Primers Selection

To identify suitable genetic markers for Brazilian mammals, the most commonly reported primer pairs in the literature were compiled. Chemical properties of the primers, including length, GC content, melting temperature, 3' end specificity, and dimerization (self or cross annealing), were analysed using PCR Primer Stats (Stothard 2000) and Multiple Primer Analyzer (ThermoFisher 2024) to evaluate quality and PCR efficiency (Banaganapalli et al. 2019). The lengths of candidate genetic markers (amplicon sizes) were checked against the source references or predicted by Primer-BLAST (Ye et al. 2012). As metabarcoding is often applied to degraded samples with short DNA fragments (e.g., soil, water, and faeces/stomach contents) (Freeland 2017), shorter genetic markers (max 300 bp) are preferred.

An ideal marker should cover as many mammal species as possible, prioritizing those in Brazil, and preferably avoid amplifying invertebrate DNA so that it can be applied effectively to iDNA samples. Specificity at the species level is crucial, especially for ecological and conservation studies, and a comprehensive public database is needed to reduce false negatives or misidentifications. To address these requirements, in-silico and in-vitro tests were conducted to assess the performance of different markers and primers.

In-silico tests were conducted on the compiled primer pairs using the primerTree package in R (Cannon et al.
2016), retrieving sequences from GenBank that each primer could amplify. Retrieved sequences were categorized as mammalian, Brazilian mammalian, or dipteran taxa. Primer pairs were evaluated based on their ability to: i) amplify the most mammal species, ii) cover the highest number of mammal families in Brazil, and iii) target the most mammal families overall. Primers amplifying more dipteran than mammal sequences were excluded.

Given the more extensive COI database, further optimization of the best COI primer pair was undertaken to improve mammalian identification accuracy. COI sequences from GenBank, including 130 native Brazilian mammal species across 36 families and 11 orders, were aligned to guide modifications and adjust primers to reduce mismatches against the target sequences. Modified versions of forward (F) and reverse (R) primers were created and retested in-silico using primerTree to compare their performance with the original primers.

The most effective primers identified in-silico were synthesized and tested in-vitro to verify amplification efficiency and taxonomic resolution at the species level. Samples morphologically identified and preserved in the DNA/tissue collection of Centro de Coleções Taxonômicas (CCT-UFMG) were used as controls, representing Brazilian mammal diversity (Table 1). These samples were incidentally collected over decades and provisionally identified; some are accompanied by voucher specimens in CCT-UFMG's mastozoological collection, and all benefit from DNA-based identification to confirm or correct their taxonomic classification. All molecular assays performed for in-vitro tests were conducted at the Laboratory of Biodiversity and Molecular Evolution of UFMG (LBEM-UFMG).

Table 1.
Mammal samples used in-vitro to test genetic markers and primers, including provisional taxonomic identifications based on morphology, numbers of sequences available for COI, RNA12S, and RNA16S in the customized reference database, and molecular taxonomy diagnostics using Sanger and Illumina sequencing, according to Fig 1.

| Order | Family | Species (provisional taxonomy, morphology) | Reference database (a): COI | RNA12S | RNA16S | In-vitro tests, recorded entries (Sanger (b): COI; Illumina (c): COI (d), RNA12S (e), RNA16S (f)) |
|---|---|---|---|---|---|---|
| Artiodactyla | Delphinidae | Sotalia fluviatilis | 6 | 5 | 5 | no seq |
| | | Sotalia guianensis | 5 | 4 | 2 | no seq; no seq; no seq; no seq |
| | Iniidae | Inia geoffrensis | 9 | 8 | 8 | no seq; NA; NA; NA |
| Carnivora | Canidae | Cerdocyon thous | 3 | 0 | 0 | Canis lupus; no seq; no seq; no seq |
| | Felidae | Puma yagouaroundi | 5 | 5 | 5 | no seq; no seq; pident<95 |
| | Mustelidae | Lontra longicaudis | 6 | 6 | 5 | no seq |
| | Procyonidae | Nasua nasua | 5 | 5 | 5 | no seq; no seq |
| Chiroptera | Emballonuridae | Rhynchonycteris naso | 3 | 2 | 0 | no seq |
| | Molossidae | Molossops temminckii | 3 | 0 | 0 | Noctilio leporinus; no seq |
| | Mormoopidae | Pteronotus sp. | 3 | 0 | 0 | P. gymnonotus; no seq; no seq; no seq |
| | Noctilionidae | Noctilio albiventris | 3 | 3 | 0 | N. leporinus; Anura |
| | Phyllostomidae | Artibeus lituratus | 6 | 6 | 6 | pident<95; no seq |
| | | Desmodus rotundus | 6 | 6 | 3 | no seq; no seq; pident<95 |
| | | Glossophaga soricina | 6 | 6 | 3 | NA; NA; NA |
| | | Phyllostomus discolor | 6 | 6 | 3 | NA; NA; NA |
| | | Phyllostomus hastatus | 3 | 3 | 0 | no seq; NA; NA; NA |
| | Vespertilionidae | Eptesicus brasiliensis | 3 | 0 | 0 | no seq; no seq |
| Cingulata | Dasypodidae | Dasypus novemcinctus | 6 | 6 | 6 | D. septemcinctus; no seq; no seq; no seq |
| Didelphimorphia | Didelphidae | Didelphis marsupialis | 5 | 5 | 4 | D. albiventris |
| Lagomorpha | Leporidae | Sylvilagus brasiliensis | 3 | 3 | 0 | no seq; NA; NA; NA |
| Pilosa | Bradypodidae | Bradypus torquatus | 6 | 5 | 6 | pident<95; pident<95 |
| | Cyclopedidae | Cyclopes didactylus | 6 | 6 | 6 | pident<95; no seq; no seq; no seq |
| | Myrmecophagidae | Myrmecophaga tridactyla | 4 | 6 | 6 | pident<95; no seq |
| | | Tamandua tetradactyla | 6 | 6 | 6 | no seq; no seq; no seq |
| Primates | Atelidae | Alouatta fusca | 4 | 3 | 3 | Sapajus nigritus |
| | Cebidae | Brachyteles arachnoides | 3 | 5 | 4 | no seq; NA; NA; NA |
| | | Callithrix penicillata | 4 | 3 | 3 | no seq; pident<95; no seq |
| | | Mico rondoni | 0 | 0 | 0 | M. argentatus; no seq; no seq |
| | Pitheciidae | Callicebus dubius | 3 | 3 | 3 | no seq; no seq; Plecturocebus caligatus; P. moloch |
| Rodentia | Cricetidae | Necromys lasiurus | 3 | 0 | 0 | Oligoryzomys fornesi; no seq; Akodon montensis; Akodon montensis |
| | Cuniculidae | Cuniculus paca | 5 | 5 | 2 | no seq |
| Sirenia | Trichechidae | Trichechus inunguis | 1 | 1 | 1 | T. manatus; Artibeus lituratus |
| | | Trichechus manatus | 6 | 6 | 6 | no seq; NA; NA; NA |

a: Number of sequences available for each genetic marker, supplemented with mitogenome sequences.
b: Sanger sequencing using different combinations of F and R modified primers from Lee et al. (2015), followed by blastn (Altschul et al. 1990) to identify mammalian samples.
c: Two-step PCR protocol for Illumina sequencing (Ushio et al. 2017, Chen et al. 2021), followed by the PIMBA pipeline (Oliveira et al. 2021) to identify mammalian samples.
d: Different combinations of F and R modified primers from Lee et al. (2015).
e: Original primers from Ushio et al. (2017).
f: Primers modified from Haile et al. (2009).
NA: samples not used for Illumina sequencing.

The efficiency of the modified COI primers was evaluated through in-vitro amplification and Sanger sequencing of 33 mammal species across 24 families and 10 orders, following standard protocols (Chaves et al. 2015). Consensus sequences were assembled in UGENE (Okonechnikov et al. 2012) and matched against our customized database using blastn (Altschul et al.
1990) to verify correct COI gene region amplification and accurate species identification. For unidentified samples, an additional blastn run was conducted against the full GenBank nucleotide collection, retaining only the best match per sequence. Taxonomic resolution of the genetic markers was evaluated using a two-step PCR protocol for Illumina library preparation, adapted from Ushio et al. (2017) and Chen et al. (2021). In the first PCR, MiSeq sequencing primers were added as 5' tails to each marker primer for in-vitro amplification of 26 mammal species across 22 families and 9 orders. In the second PCR, 7-base tag identifiers (Hamady et al. 2008) and Illumina adapters (P5 or P7) were added to first-step amplicons. Primer modifications, PCR conditions and recipes are detailed in Online Resource 1. After each PCR, amplified products were visualized in 2% agarose gel electrophoresis and purified with magnetic beads (AMPure XP). Second-step products were normalized, pooled, and sequenced on the Illumina MiSeq platform at LG-UFMG using MiSeq Reagent Kit v2 (Illumina) with 25% PhiX. Illumina reads were processed using the PIMBA pipeline (Oliveira et al. 2021) for demultiplexing, quality filtering, OTU clustering, error correction, and taxonomic assignment. OTUs under 100 bp, over 260 bp, or deviating by more than 19 bp from the expected marker size were excluded, as well as OTUs with fewer than 10 reads or representing less than 1% of total sample reads. Our customized database was used for taxonomic assignment, verifying the expected marker and species. OTUs without identification, unexpected matches, or with less than 95% identity were reanalysed via blastn against the full GenBank database, retaining only the best match. Each taxonomic assignment was reviewed with a diagnostic tree of yes/no questions, comparing genetic and morphological identifications of control samples (Fig 1). 
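The OTU exclusion criteria given above for the Illumina reads (length bounds, deviation from the expected marker size, minimum read count, and minimum share of sample reads) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PIMBA implementation: the `Otu` record, the function `filter_otus`, and the choice to compute the 1% share over all OTU reads of the sample before filtering are hypothetical, and the example OTUs are invented.

```python
from dataclasses import dataclass


@dataclass
class Otu:
    otu_id: str
    length_bp: int
    reads: int


def filter_otus(otus, expected_bp, min_bp=100, max_bp=260,
                max_dev_bp=19, min_reads=10, min_fraction=0.01):
    """Apply the exclusion thresholds quoted in the text to one sample.

    Assumption: 'total sample reads' is the sum of reads over all OTUs
    of the sample before any filtering.
    """
    total = sum(o.reads for o in otus)
    kept = []
    for o in otus:
        if o.length_bp < min_bp or o.length_bp > max_bp:
            continue  # under 100 bp or over 260 bp
        if abs(o.length_bp - expected_bp) > max_dev_bp:
            continue  # more than 19 bp away from the expected marker size
        if o.reads < min_reads:
            continue  # fewer than 10 reads
        if total and o.reads / total < min_fraction:
            continue  # less than 1% of the sample's reads
        kept.append(o)
    return kept


# Invented OTUs for a COI sample with an expected marker size of 205 bp.
sample = [
    Otu("otu1", 205, 900),
    Otu("otu2", 230, 50),   # 25 bp longer than expected
    Otu("otu3", 204, 8),    # too few reads
    Otu("otu4", 95, 40),    # shorter than 100 bp
    Otu("otu5", 200, 10),   # 10 of 1068 reads, below 1%
    Otu("otu6", 190, 60),
]
kept_ids = [o.otu_id for o in filter_otus(sample, expected_bp=205)]
print(kept_ids)
```

Keeping each criterion as a separate check makes it easy to report which rule removed a given OTU.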
Possible outcomes included: species OK, wrong species, genus OK, unidentified, or contamination. Primer pairs and genetic markers were evaluated for their amplification efficiency, taxonomic resolution, and database coverage based on: i) the number of amplified control samples, ii) accurate species-level identifications, and iii) database completeness.

Fig 1 Diagnostic tree with seven yes/no questions, labelled a to g, used to assess DNA-based species identification against morphological identification. See Online Resource 1 for a detailed description of the steps.

RESULTS AND DISCUSSION (word count: 2819)

Reference Databases Construction

A total of 830 mammal species, belonging to 270 genera, 55 families, and 11 orders and potentially present in Brazil, were compiled from the SBMz list (Abreu et al. 2023) and other sources (Wilson & Reeder 2005; Da Rosa et al. 2017). This list includes 145 species classified as threatened or near threatened (IUCN 2024), 21 exotic species, and 312 synonyms, covering endemic and non-endemic species across all Brazilian biomes (Amazon, Cerrado, Atlantic Forest, Caatinga, Pampa, Pantanal, and Marine) and a range of ecological niches (aquatic, terrestrial, arboreal, scansorial, fossorial, and flying mammals). The complete species list is available in Online Resource 2.

Fig 2 Composition of the customized database for Brazilian mammals, assembled with native species (Abreu et al. 2023; Wilson & Reeder 2005), exotic species in Brazil (Da Rosa et al. 2017), and related foreign species.
The figure displays: a) sequence counts for COI, RNA12S, RNA16S, and mitogenomes from GenBank and BOLD; b) counts of native, foreign, and exotic mammal species with (solid colours) and without (transparent colours) sequences for each marker; c) species count with/without sequences per order; d) species count with/without sequences by IUCN threat categories (NT: Near Threatened, VU: Vulnerable, EN: Endangered, CR: Critically Endangered, DD: Data Deficient).

Built from this list, our customized database included 7,885 sequences: 6,949 from GenBank and 936 from BOLD. Of these, 3,109 sequences correspond to 841 native Brazilian species, with 744 from mitogenomes, 1,228 from COI, 753 from RNA12S, and 384 from RNA16S. After splitting mitogenome sequences into three segments as RNA12S, RNA16S, and COI, the database contained 1,972 COI sequences for 562 species (67% of Brazilian species), 1,497 RNA12S sequences for 458 species (54%), and 1,128 RNA16S sequences for 370 species (44%) (Fig 2a). No sequences were available for 237 Brazilian species (28%), while 342 species (41%) had sequences for all three markers (Fig 2b), many with multiple sequences per species, representing intraspecific diversity. Additionally, the database includes 222 sequences from 21 exotic species (including Homo sapiens), all with sequences for all three markers.

COI sequences were more widely available across mammalian orders than RNA12S and RNA16S, making COI a desirable marker for accurate metabarcoding (Fig 2c). The few, conspicuous species of Cingulata, Perissodactyla, and Sirenia all had sequences for all markers, while Rodentia, Didelphimorphia, and Chiroptera were less complete, reflecting their high diversity and sampling challenges. Most threatened species belong to Primates, Rodentia, and Artiodactyla, with 57% having sequences for all markers (Fig 2d), indicating a historical focus on sequencing endangered species.
In contrast, only 22% of the species that are data deficient or of unknown threat status, mainly from Rodentia, Chiroptera, and Didelphimorphia, were represented in all markers, underscoring the challenges of studying small mammals. Across all threat levels, COI was the most represented marker, supporting its use in identifying threatened species.

To broaden taxonomic scope, 983 sequences from foreign species within the same genus as native species were added (Fig 2a), reducing the number of native genera without sequences from 101 (40%) to 31 (12%). While these sequences cannot fully prevent misidentification, they may offer insights into the taxonomy of native species that are not represented in the database and would otherwise result in false negatives or failed identifications. Additionally, 3,571 sequences from other animal classes (one per genus) were included: 797 from Aves, 261 from Amphibia, 237 from Lepidosauria, 2,029 from Actinopteri, and 247 from Insecta. While non-mammalian sequences do not guarantee precise identification, they help detect contaminants or sequences from other animal classes. The full database is available in Online Resource 3.

Genetic Markers and Primers Selection

A total of 24 primers commonly used for mammalian identification were compiled, including three pairs for COI, two for Cyt-b, five for RNA12S, and three for RNA16S (Table 2). After chemical evaluation, only four primers (COI_short f, 12S-V5-R1, 12S-V5-F2, and Riaz16S1 R) had ideal properties, while the others exhibited features that potentially reduce PCR efficiency: excessive or insufficient primer length (three primers), high or low GC content (15 primers), unsuitable melting temperatures (three primers), ineffective GC clamps (three primers), and self-dimerization (two primers) (details in Online Resource 4).
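Some of the properties screened above (length, GC content, and the GC clamp at the 3' end) are simple to compute, as the following Python sketch shows for primers whose sequences are given in Table 3. It is an illustration only and does not reproduce the output of PCR Primer Stats or Multiple Primer Analyzer. The acceptance ranges are common rules of thumb used here as placeholder assumptions, degenerate bases are scored by the average GC fraction of the bases they stand for (also an assumption), and melting temperature and dimerization are left out.

```python
# Fraction of G/C among the bases each IUPAC code can stand for.
GC_WEIGHT = {
    "A": 0.0, "T": 0.0, "G": 1.0, "C": 1.0,
    "R": 0.5, "Y": 0.5, "S": 1.0, "W": 0.0, "K": 0.5, "M": 0.5,
    "B": 2 / 3, "D": 1 / 3, "H": 1 / 3, "V": 2 / 3, "N": 0.5,
}


def primer_stats(seq):
    """Return length, mean GC percentage and G/C count in the last five bases."""
    seq = seq.upper()
    gc = sum(GC_WEIGHT[b] for b in seq) / len(seq)
    clamp = sum(1 for b in seq[-5:] if b in "GCS")
    return {"length": len(seq), "gc_percent": round(100 * gc, 1), "gc_clamp": clamp}


def flag_issues(stats, length_range=(18, 30), gc_range=(40.0, 60.0), clamp_range=(1, 3)):
    """List the properties that fall outside the placeholder ranges."""
    issues = []
    if not length_range[0] <= stats["length"] <= length_range[1]:
        issues.append("length")
    if not gc_range[0] <= stats["gc_percent"] <= gc_range[1]:
        issues.append("GC content")
    if not clamp_range[0] <= stats["gc_clamp"] <= clamp_range[1]:
        issues.append("GC clamp")
    return issues


# Sequences as listed in Table 3.
primers = {
    "RonPing R": "TATCAGGGGCTCCGATTAT",
    "coiMam-F1": "TCAACAAACCAYAAAGAYATTGGTAC",
}
for name, seq in primers.items():
    stats = primer_stats(seq)
    print(name, stats, flag_issues(stats))
```

Any flags printed depend entirely on the placeholder ranges and should not be read as the study's own evaluation of these primers.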
These evaluations suggest that many of the primers currently used for mammalian identification are suboptimal, lacking critical characteristics for optimal PCR performance (Banaganapalli et al. 2019). Additionally, three primer pairs (TOWNZEN COI, KOCHER, and TOWNZEN Cytb) generated amplicons longer than 300 bp, unsuitable for metabarcoding (Banaganapalli et al. 2019).

Table 2. Compilation of primers from literature used for amplification of different genetic markers of mammalian species, with respective chemical evaluations (Banaganapalli et al. 2019), predicted amplicon size (Ye et al. 2012), and in-silico amplifications of mammalian and dipteran species (Cannon et al. 2016).

| Marker | Primer pair | Primer reference | F primer | R primer | Amplicon size (bp) | In-silico PCR: Mam spp | Mam Fam (overall) | Mam Fam (Brazil) (a) | Diptera spp |
|---|---|---|---|---|---|---|---|---|---|
| COI | LEE (4th) | Lee et al. 2015 | Uni-Mini-bar F | RonPing R | 205 | 224 | 37 | 11 (22%) | 0 |
| COI | MEUSNIER (b) | Meusnier et al. 2008 | Uni-Mini-bar F | Uni-Mini-bar R | 130 | 43 | 13 | 7 (14%) | 158 |
| COI | TOWNZEN COI | Townzen et al. 2008 | COI_short f | COI_short r | 324 | 27 | 12 | 7 (14%) | 0 |
| COI | CHAVES (c) | This study | coiMam (F1+F2+F3+F4) | coiMam (R1+R2+R3) | 205 | 657 | 118 | 24 (49%) | 4 |
| Cyt-b | KOCHER | Kocher et al. 1989 | Kocher CytB-fw | Kocher CytB-rv | 306 | 4 | 1 | 1 (2%) | 0 |
| Cyt-b | TOWNZEN Cytb | Townzen et al. 2008 | Townzen Cytb F | Townzen Cytb R | 450 | 21 | 7 | 3 (6%) | 0 |
| RNA12S | JI | Ji et al. 2020 | 12S-V5-F1 | Ji2020-R | 82 - 150 | 110 | 19 | 8 (16%) | 0 |
| RNA12S | RIAZ 12S 1 | Riaz et al. 2011 | 12S-V5-F1 | 12S-V5-R1 | 105 | 103 | 26 | 13 (27%) | 0 |
| RNA12S | RIAZ 12S 2 | Riaz et al. 2011 | 12S-V5-F2 | 12S-V5-R2 | 98 | 95 | 24 | 12 (25%) | 0 |
| RNA12S | TABERLET (3rd) | Taberlet et al. 2018 | Mamm01 F | Mamm01 R | 60 | 234 | 38 | 13 (27%) | 0 |
| RNA12S | USHIO (5th) | Ushio et al. 2017 | MiMammal-U F | MiMammal-U R | 171 | 132 | 31 | 14 (29%) | 0 |
| RNA16S | HAILE (2nd) | Haile et al. 2009 | 16Smam3 | 16Smam4 | 78 | 406 | 38 | 14 (29%) | 0 |
| RNA16S | HAILE 5 (1st) | Modified from Haile et al. 2009 | 16Smam3c | 16Smam4 | 78 | 522 | 48 | 17 (35%) | 0 |
| RNA16S | RIAZ 16S | Riaz et al. 2011 | Riaz16S1 R | Riaz16S1 F | 58 | 88 | 24 | 8 (16%) | 0 |
| RNA16S | TAYLOR | Taylor 1996 | 16Smam1 | 16Smam2 | 300 | 82 | 25 | 10 (20%) | 0 |

1st to 5th: best primer pairs selected due to in-silico taxonomic coverage and specificity.
a: Number of families of Brazilian mammals according to Brazilian Society of Mammalogy (Abreu et al. 2022).
b: Primer pair discarded for amplifying Diptera species.
c: Modified primers from Lee et al. (2015), based on an alignment of mammalian species native to Brazil (see Table 3).

Despite potential limitations, in-silico tests were performed on all 24 primers; the numerous sequences retrieved were taxonomically identified, quantified, and categorized by taxonomic group. The numbers of species and families obtained with each primer pair are summarized in Table 2. The Cyt-b primers (KOCHER and TOWNZEN-Cytb) were the least effective, amplifying the fewest mammalian species. The HAILE primer pair, originally developed to amplify a 78 bp region of RNA16S, showed the best performance after a small modification in the forward primer (changing the IUPAC Y nucleotide at the 3’ end to C), retrieving sequences for 522 mammalian species from 48 families, 17 of which occur in Brazil (35% of Brazilian mammalian families). Approximately 100 previous studies have used the 12S-V5 primers (Riaz et al. 2011), but these did not perform as effectively in the context of this study. Instead, two other RNA12S primer pairs, TABERLET and USHIO, retrieved more species and families, aligning better with metabarcoding needs (Taberlet et al. 2018; Ushio et al. 2017).

The MEUSNIER primer pair was discarded because it amplified more dipteran than mammalian species. Originally developed for a short region of the COI gene, these primers can amplify a wide range of taxa, including mammals, fish, birds, insects, plants, fungi, and macroalgae (Meusnier et al. 2008).
Although their universality makes them valuable for identifying diverse environmental samples, we aimed to avoid amplifying insect species to ensure the primers' suitability for iDNA samples. Among the COI primers, the best-performing pair was LEE, which was originally adapted from Meusnier’s universal primers for more specific amplification of mammals (Lee et al. 2015). In our in-silico tests, this primer pair retrieved 224 mammalian species from 37 families, 11 of which occur in Brazil (22% of all Brazilian mammalian families).

To further optimize the performance of the LEE primer pair, 18 different versions of primer F (Uni-Mini-bar) and 16 versions of primer R (RonPing) were made based on an alignment of mammalian species native to Brazil, with manual modifications in their nucleotide sequences to reduce mismatches between primers and target sequences (see more in Online Resource 4). All modified primers were tested in-silico, and the seven best (four F primers and three R primers) showed marked improvement compared to the original primers (Table 2 and Table 3), achieving the best performance among all primers analysed.

Table 3. Original and the best modified versions of COI primers from Lee et al. (2015), with respective sequences, chemical evaluation (Banaganapalli et al. 2019), in-silico amplifications of mammalian and dipteran species (Cannon et al. 2016), and in-vitro amplifications and sequencing of mammalian species.

| Primer type | Primer name | Primer reference | 5’-3’ sequence | In-silico tests (a): Mam spp | Mam Fam (overall) | Mam Fam (Brazil) (b) | Diptera spp | In-vitro tests: Amplified spp | Sequenced spp (c) |
|---|---|---|---|---|---|---|---|---|---|
| Forward | Uni-Mini-bar F | Lee et al. 2015 | TCCACTAATCACAARGATATTGGTAC | 224 | 37 | 11 | 0 | - | - |
| Forward | coiMam-F1 | This study | TCAACAAACCAYAAAGAYATTGGTAC | 421 | 71 | 23 | 4 | 24 | 22 |
| Forward | coiMam-F2 | This study | TCCACAAAYCAYAAGGACATTGGCAC | 340 | 61 | 22 | 2 | 17 | 15 |
| Forward | coiMam-F3 | This study | TCAACAAAYCAYAAAGACATTGGTAC | 421 | 64 | 22 | 2 | 25 | 20 |
| Forward | coiMam-F4 | This study | TCAACYAACCACAAAGACATYGGAAC | 399 | 53 | 19 | 0 | 22 | 18 |
| Reverse | RonPing R | Lee et al. 2015 | TATCAGGGGCTCCGATTAT | 224 | 37 | 11 | 0 | - | - |
| Reverse | coiMam-R1 | This study | ATRTCRGGGGCTCCAATTAT | 657 | 118 | 24 | 3 | 34 | 29 |
| Reverse | coiMam-R2 | This study | ATRTCTGGRGCACCAATTAT | 312 | 51 | 20 | 1 | 21 | 19 |
| Reverse | coiMam-R3 | This study | ATRTCRGGTGCTCCAATTAT | 363 | 62 | 19 | 2 | 23 | 20 |

a: Maximum values across combinations of the primer with the corresponding partner primers (F or R).
b: Families of Brazilian mammals according to Brazilian Society of Mammalogy (Abreu et al. 2022).
c: Species sequenced using Sanger technique.

All seven modified COI primers amplified mammalian species in-vitro (Table 3). Although no single pair amplified all samples, different F-R combinations amplified all 33 tested mammals, with 25 generating consensus sequences. The best combination, coiMam-F1 and coiMam-R1, sequenced most samples, with additional support from four other combinations. The 205 bp COI-based identifications confirmed taxonomic assignments of 15 samples, corrected seven, and accurately identified 22 species from 19 families across eight orders. Three of the sequenced samples remained unidentified, though one (M01543, Mico rondoni) reached the correct genus despite lacking COI sequences in the reference database.

Using a two-step PCR protocol for Illumina sequencing (Ushio et al. 2017; Chen et al. 2021) and the PIMBA pipeline (Oliveira et al. 2021), mammalian species were amplified, sequenced, and identified in-vitro across three markers: i) COI (205 bp), ii) RNA12S (171 bp), and iii) RNA16S (78 bp). COI performed best, identifying most mammalian samples with different F-R combinations (Table 1, Fig 3).
Although no single primer pair amplified and identified all samples, combined they correctly identified 17 of 26 samples, spanning 11 families and eight orders. Notably, one sample not sequenced by Sanger for COI (Cuniculus paca, M01538) was successfully sequenced via NGS with the same marker. However, 11 other samples sequenced by Sanger were not sequenced via NGS, possibly due to decreased efficiency in the two-step PCR protocol.

Some misidentifications occurred with RNA12S and RNA16S (Table 1, Fig 3). For instance, Noctilio albiventris was misidentified as N. leporinus and Alouatta fusca as Sapajus nigritus, likely due to close genetic distances in ribosomal genes. RNA12S and RNA16S also lacked reference sequences for many control samples, leading to unidentified or misidentified sequences when applying a 95% identity cut-off, except for one sample reaching the correct genus (M01543, supposedly Mico rondoni). Nevertheless, COI performance was improved when combined with RNA16S and RNA12S, demonstrating that multiple genetic markers can enhance taxonomic accuracy.

Fig 3 Summary of in-vitro tests of selected primers using two-step PCR protocol for Illumina sequencing (Ushio et al. 2017, Chen et al. 2021) followed by PIMBA pipeline (Oliveira et al. 2021) to identify mammalian samples through three different genetic markers: COI (different combinations of F and R modified primers from Lee et al. [2015]); RNA12S (original primers from Ushio et al. [2017]); and RNA16S (primers modified from Haile et al. [2009]).

Metabarcoding Performance

Our customized database encompasses 72% of native mammal species and 87% of genera from all orders in Brazil, alongside all exotic species, thereby facilitating the identification of mammals across aquatic, terrestrial, arboreal, scansorial, fossorial, and flying niches. By linking genetic sequences to a recognized taxonomic classification (i.e. Abreu et al.
2023), this database enhanced the accuracy of species identification, which represents a significant advancement in biodiversity research and conservation. Although further expansion is needed, this represents a positive step toward generating taxonomic inventories and acquiring species information for the diverse and threatened Brazilian biomes. The use of our database in Brazilian inventory surveys may reduce misidentification, as the geographic filtering excludes species known to occur only outside the study area (Grenié et al. 2023). If necessary, the database also allows for filtering sequences according to the specific biome under study, ensuring more accurate identifications. Despite being fairly comprehensive, our customized database has some gaps, particularly among orders Rodentia, Didelphimorphia, and Chiroptera—the same groups that contain most species classified as data deficient or lacking threat information according to the IUCN (2024). This reflects the lack of available sequences in public databases, largely due to high species diversity, low population densities, limitations in trapping methods, and reduced research and conservation efforts for small mammals (Pacheco et al. 2013; Santos-Filho et al. 2015; Stephenson 2017). This incompleteness can hinder species identification in these orders, potentially leading to false negatives or misidentification (Keck et al. 2023). Efforts to increase the sequencing of genetic markers such as COI, RNA12S, and RNA16S in these groups would greatly benefit the surveying of hard-to-sample species. Our careful analysis of available primers for metabarcoding revealed that many of the primers are suboptimal, lacking characteristics necessary for adequate performance in PCR reactions (Banaganapalli et al. 2019) and/or generating too long amplicons, likely due to the challenge of finding stable and ideal regions for primer annealing within genetic marker sequences. 
The in-silico tests identified the best-performing primer pairs among the compiled options, all of which, coincidentally, were originally developed or specifically adapted for mammals: i) a 78 bp region of RNA16S (modified from Haile et al. 2009), ii) a 60 bp region of RNA12S (Taberlet et al. 2018), iii) a 205 bp region of the COI gene (Lee et al. 2015), and iv) a 171 bp region of RNA12S (Ushio et al. 2017). These primers amplified the greatest number of mammalian species, particularly those found in Brazil, while avoiding the amplification of insect species, ensuring their potential use in iDNA samples.

Despite the high evolutionary rate and resulting variability in primer-binding sites, COI sequences remain valuable for biodiversity assessments due to their species-level taxonomic resolution and the availability of well-developed databases (Hebert et al. 2003; Hajibabaei 2012; Clarke et al. 2017; Andújar et al. 2018). In fact, our customized database contained a greater abundance of COI sequences across all mammalian orders and threat categories compared to RNA12S and RNA16S. However, some of the primer pairs analysed for RNA12S and RNA16S exhibited better in-silico taxonomic coverage than the best COI pair from the literature.

Our taxon-oriented modifications of the COI primers reduced mismatches against sequences of mammalian species native to Brazil, resulting in a marked improvement in their in-silico taxonomic coverage. Although the high variability in primer-binding sites prevented a single primer pair from amplifying COI sequences of all Brazilian mammals, even with degenerate nucleotides, combining different versions of forward (F) and reverse (R) primers ultimately outperformed all other primers analysed in-silico.
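The notion of primer-template mismatches with degenerate nucleotides can be made concrete with a short sketch. The Python function below counts positions at which a binding site carries a base that the primer's IUPAC code does not allow. The two primer sequences are those listed in Table 3, but the binding site is a hypothetical example (one of the sequences that coiMam-F1 encodes, not a sequence taken from the alignment), and the function name is an assumption made for illustration. It is not the procedure implemented in primerTree.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where `site` has a base the primer code does not allow.

    Both strings are read 5'-3' on the same strand and must have equal length.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p]
    )


uni_minibar_f = "TCCACTAATCACAARGATATTGGTAC"  # original forward primer (Table 3)
coimam_f1 = "TCAACAAACCAYAAAGAYATTGGTAC"      # modified forward primer (Table 3)
site = "TCAACAAACCACAAAGACATTGGTAC"           # hypothetical binding site

print(count_mismatches(uni_minibar_f, site))  # 4
print(count_mismatches(coimam_f1, site))      # 0
```

In this constructed case the degenerate positions of the modified primer absorb variation that the original primer would meet as mismatches.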
Using coiMam-F1 and coiMam-R1 along with other F-R combinations of modified primers, the expected 205 bp of COI was also successfully amplified from mammals in-vitro, accurately identifying two-thirds of the 33 species tested, representing 19 families and 8 orders found in Brazil. This supports the view that designing or adapting primers specifically for species from the Neotropical region can enhance their performance. In fact, the results suggest that different combinations of our newly designed COI primers could also be used to identify mammals from other regions of the world with greater efficiency.

Among the best genetic markers identified by in-silico tests (Online Resource 5), three were also tested in-vitro (Online Resource 6) using a two-step PCR protocol for Illumina sequencing: i) 205 bp of COI, using improved primers modified from Lee et al. (2015), ii) 171 bp of RNA12S, using original primers from Ushio et al. (2017), and iii) 78 bp of RNA16S, using a modified primer pair from Haile et al. (2009). Different F-R combinations of COI primers amplified, sequenced, and allowed the accurate identification of more samples than RNA12S and RNA16S, making COI the marker with the best taxonomic resolution. Additionally, some samples were either unidentified or misidentified by the ribosomal markers (RNA12S and RNA16S), possibly because their reference databases are less complete and because intra- and interspecific genetic distances commonly overlap (Alberdi et al. 2017; Kannan et al. 2020): variability among closely related species can be minimal, making them hard to tell apart. These results point to COI as the most accurate marker for metabarcoding identification of mammals, although for certain species accurate identification was achieved only when RNA16S and/or RNA12S were also used.
While it was not possible to amplify all samples with a single marker, two-thirds of the 26 species tested, representing 11 families and eight orders, were accurately identified using a combination of all three markers. Thus, we recommend combining COI with ribosomal genes to acquire complementary information and improve the robustness of taxonomic identification in metabarcoding studies of mammals, especially in the Neotropical region, with its high genetic and species diversity. Our findings are consistent with previous studies that have also suggested a multi-marker approach to enhance taxonomic coverage in metabarcoding studies (Pompanon et al. 2012; Alberdi et al. 2017; Axtner et al. 2019; Lynggaard et al. 2019).

When comparing the two sequencing approaches used for mammalian samples in this study, a notable discrepancy in success was observed. Despite using the same COI primers, half of the samples successfully sequenced and identified through the Sanger technique were not successfully sequenced using the two-step PCR protocol for Illumina. This protocol was chosen because, by introducing indexes only in the second step, it offers primer flexibility and reduces template-specific biases, enabling more uniform amplification and a more concentrated library for sequencing (O’Donnell et al. 2016; Chen et al. 2021). However, it also presents certain disadvantages, such as an increased risk of contamination (Seitz et al. 2015) and a potential impact on amplification efficiency. This occurs because the addition of tail sequences to primers introduces extra nucleotides that can create mismatches with the target sequence, significantly reducing both amplification efficiency and yield (Elbrecht et al. 2018; Marquina et al. 2018). This issue can be exacerbated when primers are highly degenerate, as the added complexity can further decrease amplification success (Kumar et al. 2022).
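As a rough illustration of what degeneracy means for the primers discussed above, the following Python sketch counts how many distinct oligonucleotides each primer encodes and how many F-R combinations the seven modified primers allow. The sequences are those listed in Table 3; the function names are assumptions made for the example, and the sketch says nothing about how degeneracy affects amplification in practice.

```python
from itertools import product

# Bases allowed by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate sequences a primer encodes."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


# Sequences as listed in Table 3.
forward = {
    "coiMam-F1": "TCAACAAACCAYAAAGAYATTGGTAC",
    "coiMam-F2": "TCCACAAAYCAYAAGGACATTGGCAC",
    "coiMam-F3": "TCAACAAAYCAYAAAGACATTGGTAC",
    "coiMam-F4": "TCAACYAACCACAAAGACATYGGAAC",
}
reverse = {
    "coiMam-R1": "ATRTCRGGGGCTCCAATTAT",
    "coiMam-R2": "ATRTCTGGRGCACCAATTAT",
    "coiMam-R3": "ATRTCRGGTGCTCCAATTAT",
}

for name, seq in {**forward, **reverse}.items():
    print(name, degeneracy(seq))

pairs = list(product(forward, reverse))  # every F-R combination
print(len(pairs))
```

Each of the seven modified primers carries two two-fold degenerate positions, so each encodes four sequences, and four forward by three reverse primers give twelve possible F-R combinations.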
CONCLUSION (word count: 187)

In this study, we selected and optimized specific primers and customized a reference database for the effective metabarcoding of Brazilian mammals. The development of this database for mammalian identification across Brazilian biomes represents a significant advancement in biodiversity research and conservation. With 72% of known native species covered, including endangered species, our customized database holds great potential to reduce misidentification in metabarcoding studies in Brazil. Our optimized COI primers improved the accuracy and efficiency of species identification, particularly for mammals in the Neotropical region, by enhancing amplification success and expanding taxonomic coverage for metabarcoding applications. Notably, these newly designed primers could also be applied to mammalian identification in other regions of the world with greater efficiency. In both in-silico and in-vitro comparisons, COI emerged as the preferred genetic marker due to its extensive reference database and superior taxonomic resolution. However, the combined use with ribosomal sequences was essential for providing complementary information and improving the robustness of taxonomic identification. These findings highlight the effectiveness of a multi-marker approach in addressing the challenges of genetic variability in the Neotropical region and strengthening the accuracy of mammalian metabarcoding studies.

REFERENCES

Abreu, E. F., Casali, D., Costa-Araújo, R., Garbino, G. S. T., Libardi, G. S., Loretto, D., Loss, A. C., Marmontel, M., Moras, L. M., Nascimento, M. C., Oliveira, M. L., Pavan, S. E., & Tirelli, F. P. (2023). Lista de Mamíferos do Brasil (2023-1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10428436
Alberdi, A., Aizpurua, O., Gilbert, M. T. P., & Bohmann, K. (2017). Scrutinizing key steps for reliable metabarcoding of environmental samples. Methods in Ecology and Evolution, 9(1), 134-147.
https://doi.org/10.1111/2041-210X.12849 Alcaide, M., Rico, C., Ruiz, S., Soriguer, R., Munoz, J., & Figuerola, J. (2009). Disentangling vector-borne transmission networks: a universal DNA barcoding method to identify vertebrate hosts from arthropod bloodmeals. PloS one, 4(9), e7092. https://doi.org/10.1371/journal.pone.0007092 Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410. https://doi.org/10.1016/S0022-2836(05)80360-2 Andújar, C., Arribas, P., Gray, C., Bruce, C., Woodward, G., Yu, D. W., & Vogler, A. P. (2018). Metabarcoding of freshwater invertebrates to detect the effects of a pesticide spill. Molecular Ecology, 27(1), 146-166. https://doi.org/10.1111/mec.14410 Axtner, J., Crampton-Platt, A., Hörig, L. A., Mohamed, A., Xu, C. C., Yu, D. W., & Wilting, A. (2019). An efficient and robust laboratory workflow and tetrapod database for larger scale environmental DNA studies. GigaScience, 8(4), giz029. https://doi.org/10.1093/gigascience/giz029 Banaganapalli, B., Shaik, N. A., Rashidi, O. M., Jamalalail, B., Bahattab, R., Bokhari, H. A., ... & Elango, R. (2019). In- silico PCR. In Essentials of Bioinformatics, Volume I (pp. 355-371). Springer, Cham. Cannon, M. V., Hester, J., Shalkhauser, A., Chan, E. R., Logue, K., Small, S. T., & Serre, D. (2016). In-silico assessment of primers for eDNA studies using PrimerTree and application to characterize the biodiversity surrounding the Cuyahoga River. Scientific reports, 6(1), 1-11. https://doi.org/10.1038/srep22908 Chaves, B. R., Chaves, A. V., Nascimento, A. C., Chevitarese, J., Vasconcelos, M. F., Santos, F. R. (2015). Barcoding Neotropical birds: assessing the impact of nonmonophyly in a highly diverse group. Molecular Ecology Resources, 15, 921-931 Chen, K. H., Longley, R., Bonito, G., & Liao, H. L. (2021). 
A two-step PCR protocol enabling flexible primer choice and high sequencing yield for Illumina MiSeq meta-barcoding. Agronomy, 11(7), 1274. https://doi.org/10.3390/agronomy11071274 Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., & Sayers, E. W. (2016). GenBank. Nucleic acids research, 44(D1), D67-D72. https://doi.org/10.1093/nar/gkv1276 Clarke, L. J., Beard, J. M., Swadling, K. M., & Deagle, B. E. (2017). Effect of marker choice and thermal cycling protocol on zooplankton DNA metabarcoding studies. Ecology and evolution, 7(3), 873-883. https://doi.org/10.1002/ece3.2667 Corse, E., Tougard, C., Archambaud‐Suard, G., Agnèse, J. F., Messu Mandeng, F. D., Bilong Bilong, C. F., ... & Dubut, V. (2019). One‐locus‐several‐primers: A strategy to improve the taxonomic and haplotypic coverage in diet metabarcoding studies. Ecology and Evolution, 9(8), 4603-4620. https://doi.org/10.1002/ece3.5063 Da Rosa, C. A., de Almeida Curi, N. H., Puertas, F., & Passamani, M. (2017). Alien terrestrial mammals in Brazil: current status and management. Biological Invasions, 19(7), 2101-2123. https://doi.org/10.1007/s10530-017-1423-3 Deagle, B. E., Jarman, S. N., Coissac, E., Pompanon, F., & Taberlet, P. (2014). DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match. Biology letters, 10(9), 20140562. https://doi.org/10.1098/rsbl.2014.0562 Elbrecht, V., Hebert, P. D. N., & Steinke, D. (2018). Slippage of degenerate primers can cause variation in amplicon length. Scientific Reports, 8(1). https://doi.org/10.1038/s41598-018-29364-z Ficetola, G. F., Coissac, E., Zundel, S., Riaz, T., Shehzad, W., Bessière, J., ... & Pompanon, F. (2010). An in-silico approach for the evaluation of DNA barcodes. BMC genomics, 11, 1-10. https://doi.org/10.1186/1471-2164-11-434 Freeland, J. R. (2017). The importance of molecular markers and primer design when characterizing biodiversity from environmental DNA. Genome, 60(4), 358-374.
https://doi.org/10.1139/gen-2016-0100 Green, S. J., Venkatramanan, R., & Naqib, A. (2015). Deconstructing the polymerase chain reaction: understanding and correcting bias associated with primer degeneracies and primer-template mismatches. PloS one, 10(5), e0128122. https://doi.org/10.1371/journal.pone.0128122 Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G. M. L., Sagouis, A., & Winter, M. (2023). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution, 14(1), 12-25. https://doi.org/10.1111/2041-210X.13802 Haile, J., Froese, D. G., MacPhee, R. D., Roberts, R. G., Arnold, L. J., Reyes, A. V., ... & Willerslev, E. (2009). Ancient DNA reveals late survival of mammoth and horse in interior Alaska. Proceedings of the National Academy of Sciences, 106(52), 22352-22357. https://doi.org/10.1073/pnas.0912510106 Hajibabaei, M. (2012). The golden age of DNA metasystematics. Trends in genetics, 28(11), 535-537. https://doi.org/10.1016/j.tig.2012.08.001 Hajibabaei, M., Porter, T. M., Wright, M., & Rudar, J. (2019). COI metabarcoding primer choice affects richness and recovery of indicator taxa in freshwater systems. PLoS One, 14(9), e0220953. https://doi.org/10.1371/journal.pone.0220953 Hamady, M., Walker, J., Harris, J. et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods 5, 235–237 (2008). https://doi.org/10.1038/nmeth.1184 Hebert, P. D., Ratnasingham, S., & De Waard, J. R. (2003). Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(suppl_1), S96-S99. https://doi.org/10.1098/rsbl.2003.0025 IUCN. 2024. The IUCN Red List of Threatened Species. Version 2024-1. https://www.iucnredlist.org. Accessed on 18/09/2024. Jenkins, C. N., Pimm, S. L., & Joppa, L. N. (2013). 
Global patterns of terrestrial vertebrate diversity and conservation. Proceedings of the National Academy of Sciences, 110(28), E2602-E2610. https://doi.org/10.1073/pnas.1302251110 Ji, Y., Baker, C. C., Popescu, V. D., Wang, J., Wu, C., Wang, Z., ... & Yu, D. W. (2020). Measuring protected-area outcomes with leech iDNA: large-scale quantification of vertebrate biodiversity in Ailaoshan reserve. BioRxiv, 2020-02. https://doi.org/10.1101/2020.02.10.941336 Kannan, A., Rao, S. R., Ratnayeke, S., & Yow, Y. (2020). The efficiency of universal mitochondrial DNA barcodes for species discrimination of Pomacea canaliculata and Pomacea maculata. PeerJ, 8, e8755. https://doi.org/10.7717/peerj.8755 Keck, F., Couton, M., & Altermatt, F. (2023). Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses. Molecular Ecology Resources, 23(4), 742-755. https://doi.org/10.1111/1755-0998.13746 Kent, R. J. (2009). Molecular methods for arthropod bloodmeal identification and applications to ecological and vector‐borne disease studies. Molecular ecology resources, 9(1), 4-18. https://doi.org/10.1111/j.1755-0998.2008.02469.x Kent, R. J., & Norris, D. E. (2005). Identification of mammalian blood meals in mosquitoes by a multiplexed polymerase chain reaction targeting cytochrome B. The American journal of tropical medicine and hygiene, 73(2), 336-342. Kircher, M., & Kelso, J. (2010). High‐throughput DNA sequencing–concepts and limitations. Bioessays, 32(6), 524-536. https://doi.org/10.1002/bies.200900181 Kocher, T. D., Thomas, W. K., Meyer, A., Edwards, S. V., Pääbo, S., Villablanca, F. X., & Wilson, A. C. (1989). Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proceedings of the National Academy of Sciences, 86(16), 6196-6200. https://doi.org/10.1073/pnas.86.16.6196 Kumar, G., Reaume, A. M., Farrell, E., & Gaither, M. R. (2022). 
Comparing eDNA metabarcoding primers for assessing fish communities in a biodiverse estuary. PLoS ONE, 17(6), e0266720. https://doi.org/10.1371/journal.pone.0266720 Kwok, S., Kellogg, D. E., McKinney, N., Spasic, D., Goda, L., Levenson, C., & Sninsky, J. J. (1990). Effects of primer-template mismatches on the polymerase chain reaction: human immunodeficiency virus type 1 model studies. Nucleic acids research, 18(4), 999-1005. https://doi.org/10.1093/nar/18.4.999 LeBauer, D., Chamberlain, S., Foster, Z., Bartomeus, I., Black, C., Harris, D., Collins, R. (2024). traits: Species Trait Data from Around the Web. https://doi.org/10.32614/CRAN.package.traits Lee, P. S., Sing, K. W., & Wilson, J. J. (2015). Reading mammal diversity from flies: the persistence period of amplifiable mammal mtDNA in blowfly guts (Chrysomya megacephala) and a new DNA mini-barcode target. PLoS ONE, 10(4), e0123871. https://doi.org/10.1371/journal.pone.0123871 Lynggaard, C., Nielsen, M., Santos‐Bay, L., Gastauer, M., Oliveira, G., & Bohmann, K. (2019). Vertebrate diversity revealed by metabarcoding of bulk arthropod samples from tropical forests. Environmental DNA, 1(4), 329-341. https://doi.org/10.1002/edn3.34 Marquina, D., Andersson, A. F., & Ronquist, F. (2018). New mitochondrial primers for metabarcoding of insects, designed and evaluated using in-silico methods. Molecular Ecology Resources, 19(1), 90-104. https://doi.org/10.1111/1755-0998.12942 Meusnier, I., Singer, G. A., Landry, J. F., Hickey, D. A., Hebert, P. D., & Hajibabaei, M. (2008). A universal DNA mini-barcode for biodiversity analysis. BMC genomics, 9, 1-4. https://doi.org/10.1186/1471-2164-9-214 Mudalige, N. (2021). BOLD. R: A Software Package to Interface with BOLD Through R. In Recent Developments in Mathematical, Statistical and Computational Sciences: The V AMMCS International Conference, Waterloo, Canada, August 18–23, 2019 (pp. 187-196). Springer International Publishing.
https://doi.org/10.1007/978-3-030-63591-6_18 Myers, N., Mittermeier, R. A., Mittermeier, C. G., Da Fonseca, G. A., & Kent, J. (2000). Biodiversity hotspots for conservation priorities. Nature, 403(6772), 853-858. https://doi.org/10.1038/35002501 O’Donnell, J. L., Kelly, R. P., Lowell, N., & Port, J. A. (2016). Indexed PCR primers induce template-specific bias in large-scale DNA sequencing studies. PLoS ONE, 11(3), e0148698. https://doi.org/10.1371/journal.pone.0148698 Okonechnikov, K., Golosova, O., Fursov, M., & Ugene Team. (2012). Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics, 28(8), 1166-1167. https://doi.org/10.1093/bioinformatics/bts091 Oliveira, R. R., Silva, R., Nunes, G. L., & Oliveira, G. (2021). PIMBA: A pipeline for Meta Barcoding Analysis. In Advances in Bioinformatics and Computational Biology: 14th Brazilian Symposium on Bioinformatics, BSB 2021, Virtual Event, November 22–26, 2021, Proceedings 14 (pp. 106-116). Springer International Publishing. https://doi.org/10.1007/978-3-030-91814-9_10 Pacheco, M., Kajin, M., Gentile, R., Zangrandi, P. L., Vieira, M. V., & Cerqueira, R. (2013). A comparison of abundance estimators for small mammal populations. Zoologia (Curitiba), 30, 182-190. https://doi.org/10.1590/S1984-46702013000200008 Paglia, A. P., da Fonseca, G. A., Rylands, A. B., Herrmann, G., Aguiar, L., Chiarello, A. G., ... & Patton, J. L. (2012). Lista anotada dos mamíferos do Brasil. Occasional papers in conservation biology. Piper, A. M., Batovska, J., Cogan, N. O., Weiss, J., Cunningham, J. P., Rodoni, B. C., & Blacket, M. J. (2019). Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. GigaScience, 8(8), giz092. https://doi.org/10.1093/gigascience/giz092 Pompanon, F., Deagle, B. E., Symondson, W. O., Brown, D. S., Jarman, S. N., & Taberlet, P. (2012). Who is eating what: diet assessment using next generation sequencing. Molecular ecology, 21(8), 1931-1950.
https://doi.org/10.1111/j.1365-294X.2011.05403.x Primmer, C. R., Møller, A. P., & Ellegren, H. (1996). A wide‐range survey of cross‐species microsatellite amplification in birds. Molecular ecology, 5(3), 365-378. https://doi.org/10.1046/j.1365-294X.1996.00092.x Ratnasingham, S., & Hebert, P. D. (2007). BOLD: The barcode of life data system (www.barcodinglife.org). Molecular Ecology Notes, 7: 355-364. https://doi.org/10.1111/j.1471-8286.2007.01678.x Raven, P. H., Gereau, R. E., Phillipson, P. B., Chatelain, C., Jenkins, C. N., & Ulloa, C. U. (2020). The distribution of biodiversity richness in the tropics. Science Advances, 6(37). https://doi.org/10.1126/sciadv.abc6228 Reeves, L. E., Gillett-Kaufman, J. L., Kawahara, A. Y., & Kaufman, P. E. (2018). Barcoding blood meals: new vertebrate-specific primer sets for assigning taxonomic identities to host DNA from mosquito blood meals. PLoS neglected tropical diseases, 12(8), e0006767. https://doi.org/10.1371/journal.pntd.0006767 Riaz, T., Shehzad, W., Viari, A., Pompanon, F., Taberlet, P., & Coissac, E. (2011). ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic acids research, 39(21), e145-e145. https://doi.org/10.1093/nar/gkr732 Santos-Filho, M. D., Lázari, P. R. D., Sousa, C. P. F. D., & Canale, G. R. (2015). Trap efficiency evaluation for small mammals in the southern Amazon. Acta Amazonica, 45, 187-194. https://doi.org/10.1590/1809-4392201401953 Seitz, V., Schaper, S., Dröge, A., Lenze, D., Hummel, M., & Hennig, S. (2015). A new method to prevent carry-over contaminations in two-step PCR NGS library preparations. Nucleic Acids Research, gkv694. https://doi.org/10.1093/nar/gkv694 Sherrill-Mix, S. (2019). taxonomizr: Functions to Work with NCBI Accessions and Taxonomy. See https://CRAN.R-project.org/package=taxonomizr. Stephenson, P. J. (2017). Small mammal monitoring: why we need more data on the Afrotheria. Oceania, 70(632), 11-1.
Stothard P (2000) The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28:1102-1104. https://doi.org/10.2144/00286ir01 Taberlet, P., Bonin, A., Zinger, L., & Coissac, E. (2018). Environmental DNA: For biodiversity research and monitoring. Oxford University Press. Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C., & Willerslev, E. (2012). Towards next‐generation biodiversity assessment using DNA metabarcoding. Molecular ecology, 21(8), 2045-2050. https://doi.org/10.1111/j.1365-294X.2012.05470.x Taylor, P. G. (1996). Reproducibility of ancient DNA sequences from extinct Pleistocene fauna. Molecular biology and evolution, 13(1), 283-285. https://doi.org/10.1093/oxfordjournals.molbev.a025566 Thermo Fisher Scientific Inc. (2024). Multiple Primer Analyzer. (Available at https://www.thermofisher.com/br/en/home/brands/thermo-scientific/molecular-biology/molecular-biology-learning-center/molecular-biology-resource-library/thermo-scientific-web-tools) Thomsen, P. F., Kielgast, J. O. S., Iversen, L. L., Wiuf, C., Rasmussen, M., Gilbert, M. T. P., ... & Willerslev, E. (2012). Monitoring endangered freshwater biodiversity using environmental DNA. Molecular ecology, 21(11), 2565-2573. https://doi.org/10.1111/j.1365-294X.2011.05418.x Townzen, J. S., Brower, A. V. Z., & Judd, D. D. (2008). Identification of mosquito bloodmeals using mitochondrial cytochrome oxidase subunit I and cytochrome b gene sequences. Medical and veterinary entomology, 22(4), 386-393. https://doi.org/10.1111/j.1365-2915.2008.00760.x Ushio, M., Fukuda, H., Inoue, T., Makoto, K., Kishida, O., Sato, K., ... & Miya, M. (2017). Environmental DNA enables detection of terrestrial mammals from forest pond water. Molecular Ecology Resources, 17(6), e63-e75. https://doi.org/10.1111/1755-0998.12690 Valentini, A., Pompanon, F., & Taberlet, P. (2009). DNA barcoding for ecologists. Trends in ecology & evolution, 24(2), 110-117.
https://doi.org/10.1016/j.tree.2008.09.011 Wei, X., Kuhn, D. N., & Narasimhan, G. (2003). Degenerate primer design via clustering. In Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003 (pp. 75-83). IEEE. doi: 10.1109/CSB.2003.1227306 Wilson, D. E., & Reeder, D. M. (Eds.). (2005). Mammal species of the world: a taxonomic and geographic reference (Vol. 1). JHU press. (Available at http://www.press.jhu.edu). Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden T (2012). Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics. 13:134. https://doi.org/10.1186/1471-2105-13-134 Statements and Declarations This work was supported by CNPq, ICMBio and FAPEMIG (Grant CNPq/ICMBio/FAPs nº18/2017). B.R.N.C. received scholarship from Fundação Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). J.E.R.S. received a postdoctoral CAPES fellowship, and F.R.S. has a research fellowship from CNPq. The authors have no relevant financial or non-financial interests to disclose. All authors contributed to the study conception and design. FRS and JESJ have written the approved grant proposal of the metabarcoding project (CNPq 421303/2017-4). Material preparation, data collection and analyses were performed by BRNC. The first draft of the manuscript was written by BRNC and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. Data availability The datasets generated during and/or analysed during the current study are available in the Github [https://github.com/babinecha/BrazilianMammalsDB], and NCBI/GenBank [Accession numbers from PQ529421 to PQ529445] or from the corresponding author on reasonable request. Supplementary Information Online Resource 1. Primer modifications, PCR conditions and recipes for Illumina Library construction and assembly, and Diagnostic tree detailed steps (Fig 1). 
Online Resource 2. Spreadsheet containing the Brazilian mammal species list compiled from the list of the Brazilian Society of Mastozoology (SBMz) (Abreu et al. 2023), complemented with Brazilian species according to Mammal Species of the World (Wilson & Reeder 2005), as well as exotic mammals present in Brazil (Da Rosa et al. 2017). It presents the taxonomic information of 830 mammal species from 270 genera, 55 families, and 11 orders, along with their name source and synonyms where available. The columns indicate whether the species is native or exotic to Brazil, its conservation status according to the IUCN Red List Version 2024-1 (IUCN 2024), its locomotion type (terrestrial "Te", aquatic "Aq", scansorial "Sc", semi-aquatic "SA", arboreal "Ar", flying "Vo") according to Abreu et al. (2023), and its occurrence across the six Brazilian biomes, Amazon (AMAZ), Atlantic Forest (MATL), Cerrado (CERR), Caatinga (CAAT), Pantanal (PANT), and Pampa (PAMP), plus the marine environment (MARI), according to Abreu et al. (2023) and Paglia et al. (2012).
Online Resource 3. Spreadsheet containing the customized reference database for barcoding and metabarcoding Brazilian mammals. It includes sequence IDs obtained from GenBank and BOLD for COI, RNA12S, RNA16S, and mitogenomic sequences, along with the respective taxonomic information. The columns provide the following data: phylum, class, order, family, genus, species, source of name, accession number, sequence type, sequence source, and Brazilian mammal status (whether the species is native, exotic, or foreign to Brazil, or whether the sequence belongs to another animal class). It also presents the distribution of species across Brazilian biomes in the columns AMAZ (Amazon), MATL (Atlantic Forest), CERR (Cerrado), CAAT (Caatinga), PANT (Pantanal), PAMP (Pampa), and MARI (Marine). The values in the biome columns indicate the presence (1) or absence (0) of the species in each biome.
Online Resource 4. Table containing all compiled and modified primers analysed in this study, along with their respective sequences and characteristics, for mitochondrial markers used in metabarcoding of mammal species. It specifies the genetic marker amplified by each primer, the primer type (forward or reverse), name, reference, nucleotide sequence (5'-3'), and length. The chemical characteristics of the primers, including the minimum and maximum percentage of guanine-cytosine content (%GC) and the minimum and maximum melting temperatures (Tm), were obtained using Primer Stats (Stothard, 2000).
Online Resource 5. Spreadsheet containing the results of in-silico PCR performed using the primerTree package in R (Cannon et al. 2016). It details the number of sequences retrieved and taxonomically identified from GenBank using each primer pair listed in Online Resource 4. The data are organized by genetic marker, primer pair (forward and reverse), taxonomic classification (class, order, family), number of species, number of sequences, product lengths, and number of mismatches between the primer and target sequences.
Online Resource 6. Spreadsheet containing the results of in-vitro PCR performed with three genetic markers: 1) 205 bp of the COI gene (modified primers from Lee et al. 2015), 2) 171 bp of RNA12S (original primers from Ushio et al. 2017), and 3) 78 bp of RNA16S (modified primers from Haile et al. 2009). Two sequencing approaches were applied to mammalian samples: Sanger sequencing and Illumina sequencing (using a two-step PCR protocol adapted from Ushio et al. 2017 and Chen et al. 2021, followed by the PIMBA pipeline from Oliveira et al. 2021). The spreadsheet includes Sample ID, DNA mix ID, collection information, provisional taxonomic classifications based on morphology (order, family, genus, species), applicable synonyms, number of sequences present in the customized Brazilian mammalian database (for COI, 12S, and 16S), and sequencing results.
For each sample, species identification, percent identity, number of reads (Illumina only), reference database used, and diagnostic information are provided.

3. Chapter II. Analysis of vertebrate biodiversity detected through iDNA metabarcoding in Brazilian biomes (article not yet submitted)

SUMMARY
Neotropical biomes, including the Atlantic Forest and the Cerrado, are biodiversity hotspots facing severe threats, which indicates the need for effective monitoring methods. iDNA metabarcoding offers a powerful alternative to traditional surveys by analyzing DNA extracted from invertebrates. Here, we applied this methodology for the first time in the Atlantic Forest and the Cerrado, as well as at sites in the Amazon, using horseflies (Tabanidae) and carrion flies (Calliphoridae and Sarcophagidae) as samplers. Taxonomic identifications were based on three genetic markers (COI, RNA12S, RNA16S) and a reference database, both customized for Brazilian mammals. We identified 106 operational taxonomic units (OTUs) spanning mammals, birds, amphibians, reptiles, and fish, including elusive species, one threatened species (Sapajus nigritus), one exotic species (Lepus sp.), and one domestic species (Bos taurus). RNA16S was the marker that contributed most to taxonomic diversity, whereas COI and RNA12S were more effective for mammals and birds. Sampling efficiency varied with seasonality, and pooling flies detected a wider range of taxa, although individual flies also contributed to the recorded biodiversity. Challenges included incomplete reference databases, cross-contamination, and reduced amplification efficiency. This study highlights the potential of iDNA metabarcoding to complement traditional surveys, offering scalable insights into vertebrate biodiversity in Neotropical biomes. By expanding reference databases and improving methodologies, this approach can guide conservation strategies and support ecosystem health assessments in biodiversity hotspots.

Title
iDNA Metabarcoding for Biodiversity Monitoring in the Neotropics: A Case Study in Brazilian Biomes
Author information
Barbara R N Chaves (barbarachaves@ufmg.br, https://orcid.org/0009-0008-0977-8974)
Jose Eustaquio Santos-Junior (jrsantos140782@yahoo.com.br, https://orcid.org/0000-0002-7150-3751)
Fabricio R Santos (fsantos@icb.ufmg.br, https://orcid.org/0000-0001-9088-1750)
Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
ABSTRACT
Neotropical biomes, including the Atlantic Forest and Cerrado, are biodiversity hotspots facing severe threats, highlighting the need for effective monitoring. iDNA metabarcoding offers a powerful alternative to traditional surveys by analyzing DNA from invertebrates. Here, we applied this method for the first time in the Atlantic Forest and Cerrado, along with Amazonian sites, using horseflies (Tabanidae) and carrion flies (Calliphoridae and Sarcophagidae) as samplers. Taxonomic assignments relied on three genetic markers (COI, RNA12S, RNA16S) and a customized reference database for Brazilian mammals. We identified 106 operational taxonomic units (OTUs) spanning mammals, birds, amphibians, reptiles, and fish, including a threatened (Sapajus nigritus), an exotic (Lepus sp.), and a domestic (Bos taurus) species. RNA16S contributed the most to taxonomic diversity, while COI and RNA12S were effective for mammals and birds. Sampling efficiency varied with seasonality, and pooling flies detected a broader range of taxa, although individually processed flies also contributed additional taxa. Challenges included incomplete reference databases, cross-contamination, and reduced amplification efficiency.
This study highlights the potential of iDNA metabarcoding to complement traditional surveys, offering scalable insights into vertebrate biodiversity in Neotropical biomes. By expanding reference databases and improving methodologies, this approach can guide conservation strategies and support ecosystem health assessments in biodiversity hotspots.
INTRODUCTION
Neotropical biomes, such as the Amazon, Atlantic Forest, and Cerrado, are among the most degraded and rapidly disappearing ecosystems globally. Recognized as biodiversity hotspots, they harbor exceptional species richness while facing severe threats (Myers et al. 2000; Jenkins et al. 2013). Despite their importance, these ecosystems are disproportionately under-sampled (Hughes et al. 2021), and as a result numerous Neotropical species undergo local extinction or vanish before being formally described. Despite their relatively large body size compared to other groups, mammals are challenging to study in the Neotropical region because of limited accessibility and logistical constraints, low population densities, and the elusive nature of many species, which can conceal themselves within the complex tropical vegetation (Schipper et al. 2008; Schnell et al. 2012). Although mammals receive significant attention in conservation research, 13% of mammal species are still classified as Data Deficient (DD) by the IUCN, 35% of which occur in the Neotropics (IUCN 2024). This highlights the urgent need to expand taxonomic inventories and knowledge of natural history (Thomsen et al. 2012). Traditional taxonomic approaches, however, demand extensive fieldwork and specialized expertise in morphology and taxonomy. Metabarcoding of invertebrate-derived DNA (iDNA) has emerged as a promising tool for biodiversity surveys, enabling the detection of elusive and rare vertebrate species through the analysis of DNA extracted from invertebrates that feed on vertebrates (Calvignac-Spencer et al. 2013).
iDNA metabarcoding detects vertebrates with efficiency similar to or higher than that of conventional methods (e.g. camera trapping) and can generate species lists that support ecological analyses and biodiversity monitoring (Keck et al. 2023). Unlike mammals, invertebrates are easily collected, for instance with flight interception traps (e.g. Malaise traps), which capture large numbers of diverse fly species with minimal sampling effort (Blahó et al. 2013; Lynggaard et al. 2019; Skvarla et al. 2021). Although most iDNA metabarcoding studies have been conducted in temperate zones, reflecting limited research funding in many tropical countries (Carvalho et al. 2022), the method has been successfully applied in the Neotropics, with a growing number of studies using diverse invertebrate groups as sources of vertebrate DNA (e.g., Kocher et al. 2017; Rodgers et al. 2017; Lynggaard et al. 2019; Massey et al. 2022; Saranholi et al. 2024). The Atlantic Forest, however, remains understudied despite its importance for conservation: it is the second richest biome in terms of mammalian diversity, harboring about 300 mammal species, 30% of which are endemic (Paglia et al. 2012), and it faces severe anthropogenic pressures such as habitat loss and defaunation (Bogoni et al. 2018; Galetti et al. 2017). No iDNA surveys have been published for this biome to date, with the exception of a study at a zoo in a transition area between Cerrado and Atlantic Forest (Saranholi et al. 2023).
Each iDNA source can introduce taxonomic biases due to differences in feeding ecology and life history (Massey et al. 2022).
However, flies of the families Calliphoridae and Sarcophagidae (carrion flies), which feed on dead animals, open wounds, and feces, have been frequently used as iDNA samplers and show no noticeable taxonomic bias: they enable the detection of a wide diversity of vertebrate species, including terrestrial, volant, and arboreal mammals (Calvignac-Spencer et al. 2013; Rodgers et al. 2017; Gogarten et al. 2020; Lee et al. 2023). Hematophagous flies of the family Tabanidae (horseflies) also have potential as iDNA sources, as they are highly adapted for long-range dispersal, using their flight ability to locate hosts and reach even the most elusive mammals of tropical environments (Brown 2020). A recent study successfully identified mammal and bird species using mosquitoes and flies from several families (Saranholi et al. 2023), but horseflies have not yet been used as vertebrate samplers. The selection of appropriate genetic markers is crucial for metabarcoding efficiency, particularly in hyperdiverse regions like the Neotropics, where high genetic variability can hinder amplification and taxonomic identification (Carvalho et al. 2022). While ribosomal genes (e.g., RNA12S and RNA16S) are commonly used, they can lack species-level resolution for some groups and are supported by limited reference databases (Alberdi et al. 2018). The COI gene, on the other hand, offers higher taxonomic resolution and a broader reference database, but it requires careful design of primers that amplify short sequences and remain compatible with local biodiversity (Hajibabaei 2012; Clarke et al. 2017). Combining multiple markers can mitigate individual limitations, improving both detection and taxonomic resolution (Teixeira et al. 2023). In this study, we aimed to survey mammal diversity through iDNA across three Neotropical biomes (Amazon, Atlantic Forest, and Cerrado), maximizing detection by sampling flies from three families (Calliphoridae, Sarcophagidae, and Tabanidae).
We employed a combination of primers targeting three genetic markers (COI, RNA12S, and RNA16S), alongside a customized reference database of known Brazilian mammal species, which demonstrated potential for species-level identification in in silico and in vitro analyses (Chaves et al., submitted). This study not only serves as a pilot to validate the performance of these primers under real-world conditions but also represents a critical step toward expanding the application of metabarcoding in biodiversity monitoring and conservation in the Neotropics.
Fig 1 Maps of the localities sampled for iDNA metabarcoding of horseflies and carrion flies across three Neotropical biomes (Amazon, Atlantic Forest, and Cerrado): a) number of Malaise traps/points per sampled locality; b) proportion of sampled flies per locality and fly family (Tabanidae, Sarcophagidae, and Calliphoridae); c) proportion of vertebrate OTUs detected per locality and vertebrate class (Mammalia, Aves, Actinopteri, Amphibia, and Lepidosauria).
METHODS
Survey localities and insect sampling
Fieldwork was conducted by a team of nine people at six points in three localities in the Amazon biome, all in Acre State (Juruá Valley, RESEX Cazumbá-Iracema, and Alto Acre); six points in the Cerrado biome, all in Minas Gerais State (region of Indaiá River); and 29 points in the Atlantic Forest biome, across five Brazilian states and six conservation units (PARNA Monte Pascoal, FLONA Rio Preto, PE Rio Doce, PARNA Caparaó, PARNA Itatiaia, and PARNA Campos Gerais). Conservation units were used as collection localities because of their crucial role in preserving biodiversity, helping to maintain ecological processes and protect species from anthropogenic pressures. In total, five expeditions were conducted across 41 sampling points within 10 localities (Fig 1a), resulting in 369 trap days of sampling and encompassing diverse environmental conditions and geographic regions.
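Sampling effort expressed in trap days is the sum, over all traps, of the number of days each trap remained in the field. The sketch below illustrates that bookkeeping and checks that the per-biome point and locality counts given above add up to the reported totals. The deployment records and point labels in the example are hypothetical placeholders, not the field data of this study.

```python
from collections import defaultdict

# Sampling points and localities per biome, as reported in the text.
POINTS_PER_BIOME = {"Amazon": 6, "Cerrado": 6, "Atlantic Forest": 29}
LOCALITIES_PER_BIOME = {"Amazon": 3, "Cerrado": 1, "Atlantic Forest": 6}


def trap_days(deployments):
    """Sum exposure days over all (biome, point, days) deployment records."""
    return sum(days for _biome, _point, days in deployments)


def trap_days_by_biome(deployments):
    """Return the total exposure days accumulated in each biome."""
    totals = defaultdict(int)
    for biome, _point, days in deployments:
        totals[biome] += days
    return dict(totals)


if __name__ == "__main__":
    # Consistency check against the totals stated in the text.
    assert sum(POINTS_PER_BIOME.values()) == 41
    assert sum(LOCALITIES_PER_BIOME.values()) == 10

    # Hypothetical deployment records, for illustration only.
    example = [("Amazon", "P01", 5), ("Amazon", "P02", 13), ("Cerrado", "P03", 1)]
    print(trap_days(example))
    print(trap_days_by_biome(example))
```

Keeping one record per trap deployment, rather than a single total, makes it straightforward to report effort per biome, locality, or sampling period when comparing detection rates.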
Insects were collected passively using unbaited Malaise-type flight interception traps fitted with bottles filled with absolute ethanol. Traps were placed at each collection locality and left for an average of five days (minimum one day, maximum 13 days) during five sampling periods: July 2017, June and September 2018, November 2019, and December 2022. Trap exposure and sampling periods varied because of logistical constraints; since environmental conditions in different seasons are known to influence fly abundance and diversity, this variation may have affected sampling efficiency. At PE Rio Doce, black balloons were positioned near the traps to enhance the collection of horseflies, which are visually attracted to large, dark objects (Brown 2020; Skvarla et al. 2021). All collected samples were preserved in absolute ethanol at -20°C to ensure the integrity of DNA for subsequent analyses, and stored in the Laboratory of Biodiversity and Molecular Evolution of UFMG (LBEM-UFMG). Fly specimens of the families Calliphoridae, Sarcophagidae, and Tabanidae were sorted and individualized from each sample under stereomicroscopes at the Laboratory of Insect Systematics of UFMG (LSI-UFMG). The intestines of larger specimens and the entire abdomens of smaller flies, considered the most likely sources of vertebrate iDNA due to their contact with ingested material, were carefully dissected using sterilized razor blades and tweezers. The tissues were individually digested in a lysis mix (Wilson 2012) and subjected to DNA extraction using DNeasy Blood & Tissue Kits (QIAGEN), fo