Comparative Analysis of the Complete Chloroplast Genome Sequences of Four Origin Plants of Lonicerae Flos (Lonicera; Caprifoliaceae)

Lonicerae Flos (LF), derived from the dried flower buds or opening flowers of four Lonicera plants (Lonicera macranthoides, L. hypoglauca, L. confusa, and L. fulvotomentosa), is a popular traditional Chinese medicine. Because the four origin plants are very similar in morphology, it is difficult to control the quality of LF in actual production. Over the past decade, many reports have described differences among them, including botanical characteristics and active ingredients. However, rapid methods for identifying the four origins are still lacking. In this study, a comparative analysis of the four chloroplast genomes was performed. The genomes showed low diversity (Pi = 0.00267), and three variation hotspot regions (rbcL-accD, rps12-ndhF and rps12-trnN-trnG) were identified as potentially highly informative molecular markers. The most obvious difference in the SSR comparative analysis was that reverse and complement repeats were identified only in L. confusa and L. hypoglauca, respectively. Finally, the phylogenetic tree showed that L. confusa is more closely related to L. fulvotomentosa, while L. macranthoides is closer to L. hypoglauca. This study systematically reveals the differences among the four chloroplast genomes and provides valuable genetic information for identifying the origin of LF.


Introduction
Lonicerae Flos (LF, Shanyinhua in Chinese) is one of the most commonly used traditional Chinese medicines [1]. It is officially listed in the Chinese Pharmacopoeia (2020 Edition), where it is described as "Used for carbuncle boils, throat arthralgia, erysipelas, toxic blood dysentery, wind-heat cold, febrile fever". LF consists of the dried flower buds or opening flowers of four plants (Lonicera macranthoides, L. hypoglauca, L. confusa, and L. fulvotomentosa) [2], which are widely cultivated in Southern China. In terms of botanical traits, the main differences among the four origin plants lie in their inflorescences and bracts [3]. At present, L. macranthoides is the most cultivated species on the market, and L. confusa is the least cultivated and used [3,4]. Because the plants are so similar, distinguishing among them remains one of the difficulties of current research [5,6]. Studies on genetic diversity and relationships may help to solve this problem.
The use of plant DNA analysis to identify plant species, genotypes, and relationships has gradually replaced earlier techniques based on other biochemical markers [7]. With its rapid development, DNA analysis technology is becoming more accurate, routine and low-cost, and is commonly used in plant research [8,9]. At present, phylogenetic analysis of plants is mainly based on the structure and variation of the chloroplast and nuclear genomes. However, it is difficult to screen low-copy genes in plants because of the complexity of nuclear genomes. The chloroplast (cp) genome is of great significance in revealing the origin and evolution of species, genetic diversity, genetic relationships and biodiversity because of its small size (115 to 165 kb), lack of recombination, high conservation, and uniparental inheritance [10][11][12][13]. It has been widely accepted as a powerful tool for distinguishing closely related, similar species [14,15]. With the accumulation of Lonicera cp genomes, a comparative analysis of the complete cp genomes of the four origin plants of LF helps to deepen and expand our systematic understanding of them.
Most previous studies focused on inferring the phylogeny of Lonicera or Caprifoliaceae based on complete chloroplast genomes [16][17][18][19]. In this study, we report newly sequenced complete cp genomes from three origin plants of LF (L. macranthoides, L. confusa, and L. fulvotomentosa) and compare them with the published cp genome of the fourth (L. hypoglauca) [20]. The comparative analysis focuses on genome features, structure, nucleotide diversity, simple sequence repeats (SSRs) and phylogeny. The aims of our study are: (1) to gain a comprehensive understanding of the complete cp genome features of the four origin plants of LF, (2) to systematically analyze the similarities and differences among the four origin plants, (3) to infer the phylogenetic relationships among the four and between the four and Lonicerae Japonicae Flos (LJF, Jinyinhua in Chinese; L. japonica), and (4) to provide genetic resources for developing chloroplast markers to identify LF species and for future research on Lonicera.

Plant Material and Genome Sequencing
Fresh leaf samples were collected from three Lonicera species (Lonicera macranthoides, L. confusa, and L. fulvotomentosa; Table 1). Voucher specimens were deposited at the Hunan Academy of Forestry. Genomic DNA extraction and high-throughput sequencing on an Illumina NovaSeq 6000 were performed by Suzhou GENEWIZ Biotech. Co., Ltd. (Suzhou, China).

Assembly and Annotation of Chloroplast Genome
The NGS QC Toolkit software package was used for data quality assessment and filtering, removing low-quality sequences, adapter sequences and sequences containing uncertain bases to obtain high-quality sequences (clean reads). The clean reads were then assembled using Velvet 1.2.10 [21], SSPACE v3.0 [22] and GapFiller v2.1.2 [23], with the cp genome of L. japonica (GenBank: KJ170923) as the reference [24]. The assembled sequences were annotated using Plann [25]; transfer RNAs (tRNAs) were detected using tRNAscan-SE [26] with default parameter settings, and rRNAs were identified using RNAmmer [27]. All gene annotations were verified with Geneious 11.1.4 [28], and the circular genome maps were drawn with OGDRAW (Organellar Genome DRAW) [29]. Finally, the annotated chloroplast genomes of the three Lonicera species were submitted to GenBank (Table 2).
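The read-filtering step described above can be illustrated with a minimal Python sketch. It is not the algorithm used by the NGS QC Toolkit; it only shows the three kinds of filter named in the text (low quality, adapter contamination, uncertain bases). The function names, the mean-quality cutoff of 20, the Phred+33 encoding and the adapter prefix are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple

# A read is (header, sequence, quality string); this tuple layout is an assumption.
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(
    reads: Iterable[Read],
    min_mean_q: float = 20.0,        # hypothetical cutoff, not from the paper
    adapter: str = "AGATCGGAAGAGC",  # assumed adapter prefix for illustration
) -> Iterator[Read]:
    """Yield only reads that pass all three illustrative filters."""
    for header, seq, qual in reads:
        seq_upper = seq.upper()
        if not seq_upper or len(seq_upper) != len(qual):
            continue  # skip empty or malformed records
        if "N" in seq_upper:
            continue  # read contains an uncertain base
        if adapter in seq_upper:
            continue  # read contains adapter sequence
        if mean_phred(qual) < min_mean_q:
            continue  # read is low quality on average
        yield header, seq, qual
```

Production tools usually trim adapters and low-quality tails rather than discarding whole reads; discarding keeps the sketch short.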

Comparative Analysis of Chloroplast Genomes
In addition to the complete chloroplast genomes of L. macranthoides, L. confusa and L. fulvotomentosa sequenced here, the L. hypoglauca chloroplast genome (NC_054350) [20] was used for comparative genomic analysis with mVISTA using the LAGAN alignment program [30]. The major variations in gene content or features among the four Lonicera chloroplast genomes were identified manually with Geneious [28]. For accurate comparison, the gene annotations of NC_054350 were checked again with Plann, RNAmmer, and tRNAscan-SE [26]. The four chloroplast genome sequences were aligned using Geneious. DNA polymorphism analysis, including highly variable sites and nucleotide diversity (Pi), was performed using DnaSP (DNA Sequence Polymorphism) v6 [31], with the window length set to 800 bp and the step size set to 200 bp.
Among the chloroplast genomes of the four origin plants of LF, that of L. macranthoides was the largest, whereas that of L. hypoglauca was the smallest. Interestingly, L. macranthoides and L. hypoglauca shared exactly the same number of genes, as did L. confusa and L. fulvotomentosa. This is largely due to the simple and relatively conserved structure of the chloroplast genome; the differences in genome size may be caused by differing lengths of homologous genes among plants of the same genus.
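To make the sliding-window calculation concrete, the following Python sketch computes nucleotide diversity (Pi) as the average number of pairwise differences per site within each window of an alignment, using the 800 bp window and 200 bp step stated above as defaults. It is a simplified illustration and not DnaSP's implementation: the function names are hypothetical, and the choice to skip any column containing a gap or ambiguous base is an assumption.

```python
from itertools import combinations
from typing import List, Tuple

VALID_BASES = set("ACGT")


def window_pi(seqs: List[str], start: int, end: int) -> float:
    """Average pairwise differences per site over alignment columns [start, end)."""
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = 0
    sites = 0
    for col in range(start, end):
        column = [s[col].upper() for s in seqs]
        # Assumption: columns with gaps or ambiguous bases are excluded.
        if any(base not in VALID_BASES for base in column):
            continue
        sites += 1
        diffs += sum(a != b for a, b in combinations(column, 2))
    if sites == 0 or n_pairs == 0:
        return 0.0
    return diffs / (n_pairs * sites)


def sliding_pi(
    seqs: List[str], window: int = 800, step: int = 200
) -> List[Tuple[int, int, float]]:
    """Return (start, end, Pi) for each window along an aligned set of sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start, end, window_pi(seqs, start, end)))
    return results
```

With four aligned sequences there are six pairwise comparisons per column, so one column in which two genomes carry one base and two carry another contributes four differences to its window.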

Comparative Analyses of Lonicerae Flos Species
Overall, there were no large differences among the four origin plants of LF, and the chloroplast genomes of L. confusa and L. fulvotomentosa showed the fewest differences. Notably, compared with L. macranthoides, the genomes of L. confusa, L. fulvotomentosa and L. hypoglauca have a large gap between the rbcL and accD genes, with five gaps located in conserved non-coding sequences (CNS) and two gaps located in accD (Fig. 2). In addition, L. confusa and L. fulvotomentosa have more variable sites than L. macranthoides and L. hypoglauca, and L. hypoglauca has more variable sites than L. macranthoides.

Chloroplast genome map legend: Genes transcribed clockwise or counter-clockwise are shown inside or outside the circle, respectively. LSC, large single-copy region; SSC, small single-copy region; IR, inverted repeat.

Using DnaSP, nucleotide diversity (Pi) was calculated to estimate the genetic distance among the four LF chloroplast genomes. The Pi value across the four genomes (L. macranthoides, L. confusa, L. fulvotomentosa and L. hypoglauca) was 0.00267. Comparison of the four genomes revealed several variation hotspots (Fig. 3). Three hotspots showed higher Pi values than other regions (Pi > 0.02); among them, the rbcL-accD region showed the highest Pi (0.06875), followed by rps12-ndhF and rps12-trnN-trnG.
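The hotspot criterion (Pi > 0.02) can be expressed as a small post-processing step on per-window Pi values, for example those exported from DnaSP. The Python sketch below flags windows above the threshold and merges overlapping or touching windows into one region, keeping the peak Pi. The function name and the (start, end, Pi) tuple layout are assumptions, and the merging rule is one reasonable choice rather than the procedure used in this study.

```python
from typing import List, Tuple

# (start, end, pi) for one sliding window; this layout is an assumption.
Window = Tuple[int, int, float]


def merge_hotspots(windows: List[Window], threshold: float = 0.02) -> List[Window]:
    """Merge overlapping windows with Pi above the threshold into hotspot regions."""
    hotspots: List[Window] = []
    for start, end, pi in sorted(windows):
        if pi <= threshold:
            continue  # window is not a hotspot
        if hotspots and start <= hotspots[-1][1]:
            # Overlaps or touches the previous hotspot: extend it, keep the peak Pi.
            prev_start, prev_end, prev_pi = hotspots[-1]
            hotspots[-1] = (prev_start, max(prev_end, end), max(prev_pi, pi))
        else:
            hotspots.append((start, end, pi))
    return hotspots
```

Each merged region can then be mapped to gene annotations to name the interval it falls in, as with rbcL-accD in the text.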

Figure 2:
Comparison of the chloroplast genomes of four Lonicerae Flos species. The L. macranthoides chloroplast genome was used as the reference sequence; the x-axis represents the base position in the aligned sequence and the y-axis represents the pairwise percent identity (50%-100%). Gray arrows, purple bars, sky-blue bars, red bars and gray bars represent genes, exons, UTRs, CNS and mRNA, respectively.

Phylogenetic Analysis
In this study, the phylogeny of Lonicera was reconstructed using the four complete chloroplast genomes of LF species (L. macranthoides, L. confusa, L. fulvotomentosa and L. hypoglauca) and those of thirteen other species from the genus Lonicera. In the phylogenetic tree (Fig. 5), two main clades can be identified: one includes four Lonaria species (96% bootstrap), and the remaining species, including the four species of interest, are located in the other clade (100% bootstrap). All four species of this study fall within the same clade (100% bootstrap), which is divided in two, with L. confusa, L. fulvotomentosa and L. japonica together in one subclade, while L. macranthoides and L. hypoglauca are grouped with L. maximowiczii in the other. In other words, L. confusa is closer to L. fulvotomentosa, and L. macranthoides is closer to L. hypoglauca. Compared with other species of the genus Lonicera, the four origin plants of LF appear to be more closely related to one another.

Conclusions
Although the chloroplast genomes of the four origin plants of LF (L. macranthoides, L. confusa, L. fulvotomentosa and L. hypoglauca) have been reported [17,36,37], this study presents the first comparative analysis of the four. In addition, because chloroplast genome assembly and annotation results differ with plant variety, sequencing platform and assembly method, the results of this study complement earlier reports. There are small differences in chloroplast genome features among the four Lonicera plants, such as the sizes of the LSC, SSC and IR regions, the numbers of PCGs, tRNAs and rRNAs, and GC content. The four chloroplast genomes showed low diversity (Pi = 0.00267), and three variation hotspot regions were found: rbcL-accD, rps12-ndhF and rps12-trnN-trnG. To find more differences, SSR analysis was performed. The results showed two obvious differences among the four chloroplast genomes. One is the percentage of microsatellites located in intron, exon and intergenic regions (Fig. 4b), and the other is the proportion of different SSR types. Reverse and complement repeats were identified only in L. confusa and L. hypoglauca, respectively (Fig. 4c). Additionally, the chloroplast genome sequences of 17 Lonicera species were used for phylogenetic analysis based on the maximum likelihood method. According to the phylogenetic tree, the four origin plants of LF (L. macranthoides, L. confusa, L. fulvotomentosa and L. hypoglauca), L. japonica and L. maximowiczii are closely related.
The differences among the four origin plants of LF were evident in genetic structure and repetitive sequences. Although L. confusa and L. fulvotomentosa, and likewise L. macranthoides and L. hypoglauca, have the same number of microsatellites and coding regions without rearrangement, L. confusa, L. fulvotomentosa and L. hypoglauca have a large gap between the rbcL and accD genes, with seven small gaps, compared with L. macranthoides. Analysis of SSR types showed that reverse and complement repeats were found only in L. confusa and L. hypoglauca, respectively. To better control the quality of traditional Chinese medicine, the origin plants of Chinese medicinal materials need to be identified quickly and accurately. The differences found in this study will benefit the development of molecular markers to identify the origin of LF. These results will also provide more genetic information for molecular-assisted breeding of LF.
The traditional Chinese medicine Lonicerae Japonicae Flos (LJF, Jinyinhua in Chinese) is the dried flower buds or opening flowers of L. japonica. Because LJF and LF have the same pharmacological effects and extremely similar appearances, they are easily confused and misused [38][39][40]. As can be seen from Fig. 5, the LF chloroplast genomes were classified into two branches: L. japonica clustered with L. confusa and L. fulvotomentosa, whereas L. macranthoides and L. hypoglauca clustered with L. maximowiczii. The phylogenetic analysis showed that L. japonica has a closer relationship with L. confusa and L. fulvotomentosa than with L. macranthoides and L. hypoglauca. Most current studies show that it is difficult to characterize their similarities and differences in depth [41][42][43]. Therefore, studies on genetic diversity, relationships, bioactive compounds and modern pharmacological effects should be highlighted.