International Journal of
Villin Family Members Associated with Multiple Stress Responses in Cotton
1State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China
2Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, 210014, China
3Institute of Food Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
*Corresponding Author: Kang Liu. Email: firstname.lastname@example.org
#These authors have contributed equally to this work
Received: 13 April 2021; Accepted: 31 May 2021
Abstract: Villin (VLN) is considered one of the most important actin-binding proteins. It modulates actin cytoskeleton dynamics and plays essential roles in plant development and in resistance to adverse environments. However, systematic studies of the VLN gene family have not been reported in cotton (Gossypium). In this study, 14 GhVLNs were identified in G. hirsutum. These GhVLN genes were distributed on 6 A-subgenome chromosomes and 6 D-subgenome chromosomes of the allotetraploid upland cotton and were classified into three phylogenetic groups based on the classification model of AtVLNs. In addition, the 14 GhVLN genes have highly conserved gene structures and motif architectures. The number of introns ranged from 18 to 22, and the length of the protein sequences varied from 901 to 1077 amino acids. Six gelsolin homology domains (G1–G6) and a villin headpiece domain (VHP) were present in all GhVLNs except GhVLN6 and GhVLN13, which lacked VHP. Cis-element analysis revealed that the promoter regions of GhVLNs contained various light-related elements as well as elements responsive to phytohormones and stresses, suggesting that cotton plants exposed to such adverse conditions may activate their response systems by targeting VLN genes. Heatmaps showed that the GhVLN genes exhibited various expression patterns: some accumulated in particular tissues (root, petal, stamen or elongating fibers), and some were clearly induced by environmental changes. In particular, GhVLN3 and GhVLN10 were highly and preferentially expressed in elongating fibers and distinctly upregulated by abiotic (salt, PEG, cold and heat) and biotic (Verticillium dahliae V991) stresses. This study may provide useful information for the functional characterization of GhVLN genes and gene resources for creating high-quality cotton germplasm with resistance to multiple stresses.
Keywords: Gossypium hirsutum; villin proteins; cis-elements; abiotic and biotic stresses
1 Introduction
During growth and development, plants are inevitably subjected to various abiotic and biotic stresses, whether long- or short-term. Biotic stresses, including fungi, bacteria, oomycetes, nematodes, insects, and weeds, cause disease or physical damage to plants. Abiotic stresses include drought, salinity, flooding, and extreme temperatures. Plants have developed various mechanisms to overcome these threats, and the actin cytoskeleton plays a vital role in this series of response mechanisms.
The actin cytoskeleton is a highly organized and dynamic system that is involved not only in directing cell growth and maintaining cell morphogenesis but also in responses to numerous environmental stresses through the assembly, depolymerization, and bundling of actin filaments [1–6]. In the apical and subapical regions of extending pollen tubes, actin filaments are persistently polymerized and bundled, ultimately generating two structures: the shank-localized bundles and the “apical actin structure” [6–8]. Low temperature induces depolymerization of actin filaments; moreover, the Ca2+ influx and the expression level of the cold acclimatization-specific gene (cas) are decreased by stabilization of the actin cytoskeleton but increased by depolymerization of actin filaments. Additionally, under salt and osmotic stresses, promoting actin filament polymerization rescued plants from death, whereas blocking polymerization or depolymerizing the actin filaments clearly reduced the plant survival rate [4,10]. Moreover, actin filament arrays and actin bundles increased around pathogen infection sites, and these changes were related to various plant immune defense responses [11–13].
The dynamic rearrangement of the actin cytoskeleton is directly modulated by actin-binding proteins (ABPs). Villin (VLN), which modulates actin cytoskeleton dynamics by nucleating, capping, severing, and bundling actin filaments, is considered one of the most important ABPs [14–23]. VLN proteins contain six gelsolin homology domains (G1–G6) and an additional villin headpiece domain (VHP) at the C-terminus, and they belong to the villin/gelsolin/fragmin superfamily. Previous studies showed that VLNs play important roles in many cellular processes [14–23]. In the Arabidopsis thaliana genome, five villins were identified and classified into three phylogenetic groups. AtVLN1 alone was classified into group I, AtVLN2 and AtVLN3 belonged to group II, and group III contained AtVLN4 and AtVLN5. AtVLN1 bundled and stabilized actin filaments but had no actin-nucleating, -severing or -capping activity [21,24]. The other four AtVLNs (AtVLN2, AtVLN3, AtVLN4 and AtVLN5) have been reported to participate in the nucleation, bundling, severing, and capping of actin filaments; in other words, they are all-rounders [18–21,25]. Additionally, functional analyses showed that AtVLN1-5 are related to polar or diffuse growth of plant cells [18–21,25]. OsVLN2 was also found to affect plant architecture by regulating actin cytoskeleton dynamics. Moreover, in a previous study, we isolated a cotton villin gene (Gh_D08G0231 in this study) and demonstrated that it is involved in cell elongation and in responses to multiple biotic and abiotic stresses via regulation of actin organization [14,15].
To learn more about the villins in cotton (Gossypium), an important economic crop that produces natural fibers, we carried out a genome-wide analysis of this family using the whole-genome sequence of the tetraploid cotton G. hirsutum and publicly released sequencing data [26,27]. We identified 14 GhVLN genes and analyzed their physicochemical properties, chromosomal distribution, phylogenetic relationships, gene structures, and conserved motifs. Additionally, cis-element analysis of the GhVLN promoters, expression profiling of GhVLNs in various cotton tissues and organs, and the expression patterns of GhVLNs under biotic (Verticillium dahliae V991) and abiotic (salt, PEG, cold and heat) stresses revealed that GhVLN transcripts accumulate differentially among cotton tissues and that GhVLNs may participate in responses to different environmental changes in cotton.
2 Materials and Methods
2.1 Identification and Sequence Retrieval of VLN Family Members
The HMM files (PF00626 and PF02209) of the conserved gelsolin domain and villin headpiece domain were downloaded from the Pfam database (https://pfam.xfam.org/). Putative VLNs identified from G. hirsutum (http://mascotton.njau.edu.cn/), G. arboreum (http://cgp.genomics.org.cn), G. herbaceum (https://www.cottongen.org/) and G. raimondii (http://www.phytozome.net/) were verified with the HMMER software using default parameters. AtVLNs were likewise searched genome-wide against the Arabidopsis thaliana genome sequence data (https://www.arabidopsis.org/), and OsVLNs were obtained from the published Oryza sativa genome (https://phytozome.jgi.doe.gov/pz/portal.html). Furthermore, the gene prediction programs SMART (http://smart.embl-heidelberg.de/) and FGENESH (http://www.softberry.com/berry.phtml) were used to correct the candidate sequences and to detect the G1–G6 and VHP conserved motifs. Finally, the VLN genes were confirmed by BLASTp. Candidates that did not contain the gelsolin domain were removed, whereas proteins that contained the gelsolin domain but not the VHP domain were retained. The VLN genes were named according to their positions on the chromosomes. The amino acid length, isoelectric point (pI), theoretical molecular weight (Mw), and subcellular localization of the identified VLNs were predicted by submitting the protein sequences to the online programs CELLO v2.5 (http://cello.life.nctu.edu.tw/) and Pepstats (https://www.ebi.ac.uk/Tools/seqstats/emboss_pepstats/).
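The retention rule described above (keep any candidate with a gelsolin domain, whether or not a VHP domain is present) can be illustrated with a minimal Python sketch. It assumes the domain hits have already been parsed into a mapping from protein ID to a list of Pfam accessions; that data structure, the function name, and the candidate IDs are hypothetical and are not part of the original pipeline.

```python
GELSOLIN = "PF00626"  # gelsolin domain accession, as cited in the text
VHP = "PF02209"       # villin headpiece accession, as cited in the text


def filter_vln_candidates(domain_hits):
    """Apply the retention rule to a mapping of protein ID -> list of Pfam hits.

    `domain_hits` is a hypothetical, pre-parsed structure (one accession per
    detected domain); producing it from real HMMER output is not shown here.
    """
    kept = {}
    for protein_id, accessions in domain_hits.items():
        if GELSOLIN not in accessions:
            continue  # no gelsolin domain: candidate is removed
        kept[protein_id] = {
            "gelsolin_domains": accessions.count(GELSOLIN),
            "has_vhp": VHP in accessions,  # may be False; protein is still kept
        }
    return kept


# Made-up candidates for illustration only (not real cotton gene IDs).
example_hits = {
    "candidate_A": [GELSOLIN] * 6 + [VHP],
    "candidate_B": [GELSOLIN] * 6,
    "candidate_C": [VHP],
}
retained = filter_vln_candidates(example_hits)
for name, info in retained.items():
    print(name, info)
```

In this toy input, the candidate with only a VHP hit is dropped, while the candidate with six gelsolin domains and no VHP is kept, which mirrors how a VHP-less member can remain in the family.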
2.2 Phylogenetic Analysis of VLN Proteins
The ClustalW program was used with default parameters to perform multiple sequence alignment of the VLNs from G. hirsutum, G. arboreum, G. herbaceum, G. raimondii, A. thaliana, and Oryza sativa. An unrooted phylogenetic tree was constructed from the aligned sequences using the neighbor-joining (NJ) method in MEGA 7.0 with 1000 bootstrap replicates.
2.3 Gene Structure and Conserved Motif Analysis
The online program GSDS 2.0 (http://gsds.cbi.pku.edu.cn/) was employed to draw the exon/intron structures of the GhVLN genes. The conserved motifs of the GhVLNs were identified using another online program, MEME 5.3.3 (http://meme-suite.org/tools/meme).
2.4 Cis-Elements Analysis
The promoter regions of the 14 GhVLN genes, each defined as the 2000 bp upstream of the “ATG” start codon of the GhVLN open reading frame, were extracted from the G. hirsutum genome database. The online program PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/search_CARE.html) was used to predict the cis-acting elements in the 14 GhVLN promoter regions.
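As an illustration of the promoter definition used here, the following Python sketch extracts the region upstream of a start codon from a chromosome sequence, taking strand into account. The coordinate convention (0-based, half-open, forward-strand coordinates of the start codon) and all names are assumptions made for this example; the paper does not describe how the extraction was implemented.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, codon_start, codon_end, strand, length=2000):
    """Return up to `length` bp upstream of the start codon.

    `codon_start` and `codon_end` are assumed to be 0-based, half-open
    forward-strand coordinates of the start codon. The result is reported
    in the gene's own 5'->3' orientation and is truncated at chromosome ends.
    """
    if strand == "+":
        return chrom_seq[max(0, codon_start - length):codon_start]
    if strand == "-":
        # On the minus strand, "upstream" lies to the right on the forward strand.
        return reverse_complement(chrom_seq[codon_end:codon_end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences (not real cotton DNA), using a 4 bp window for readability.
plus_example = upstream_region("AAACCCATGGGG", 6, 9, "+", length=4)
minus_example = upstream_region("GGGCATTTTACC", 3, 6, "-", length=4)
print(plus_example, minus_example)
```

With the default `length=2000`, the same function would return the 2000 bp window described in the text, or a shorter sequence when the gene lies near the end of a chromosome or scaffold.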
2.5 Expression Pattern Analysis
To analyze the expression profiles of the GhVLN genes, we downloaded cotton RNA-seq data [26,37] covering vegetative and reproductive tissues (stamen, root, stem, leaf, petal, −3, 0 and 3 days post anthesis (DPA) ovules, and 5, 10, 20 and 25 DPA fibers) as well as responses to abiotic stresses (salt, PEG, heat and cold) and a biotic stress (V. dahliae V991). Expression patterns were displayed using TBtools.
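Expression heatmaps are often drawn from log-scaled values so that highly and lowly expressed genes remain comparable. The Python sketch below applies a log2(FPKM + 1) transform and reports the tissue with the highest expression for one gene. The transform, the pseudocount, and the FPKM values are assumptions for illustration; the text does not state which scaling was applied in TBtools.

```python
import math


def log2_fpkm(fpkm_by_tissue, pseudocount=1.0):
    """Return log2(FPKM + pseudocount) for each tissue (an assumed transform)."""
    return {tissue: math.log2(value + pseudocount)
            for tissue, value in fpkm_by_tissue.items()}


def peak_tissue(fpkm_by_tissue):
    """Return the tissue with the highest FPKM value."""
    return max(fpkm_by_tissue, key=fpkm_by_tissue.get)


# Made-up FPKM values for a hypothetical gene; not taken from the RNA-seq data.
example_fpkm = {"root": 3.0, "stem": 1.0, "petal": 0.0, "stamen": 15.0}
scaled = log2_fpkm(example_fpkm)
print(scaled)
print(peak_tissue(example_fpkm))
```

The pseudocount keeps tissues with zero FPKM defined on the log scale, which is the usual reason for adding it.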
3 Results
3.1 Identification of VLN Genes and Chromosomal Distribution Analysis
In this study, we identified 14 VLN genes from upland cotton G. hirsutum, 7 from G. arboreum, 7 from G. herbaceum, 7 from G. raimondii, 5 from Arabidopsis, and 5 from rice. Of the 14 genes identified from upland cotton, 7 were located on the A subgenome and the other 7 on the D subgenome. These 14 VLNs in G. hirsutum were named GhVLN1 to GhVLN14 in order, according to their distribution on the chromosomes. The VLNs in G. arboreum, G. herbaceum and G. raimondii were named GaVLN1 to GaVLN7, GheVLN1 to GheVLN7, and GrVLN8 to GrVLN14, respectively, according to their homologous VLNs in G. hirsutum. The gene names and their corresponding gene ID numbers are listed in Tab. 1 and Tab. S1.
The length of the GhVLN protein sequences varied from 901 to 1077 aa, with an average length of 963 aa. The other physicochemical parameters are also shown in Tab. 1: the molecular weight (MW) of the GhVLNs ranged from 100.73 to 120.56 kDa with an average of 106.60 kDa, and the isoelectric point (pI) ranged from 4.99 to 6.61 with an average of 5.74. Additionally, subcellular localization prediction indicated that the great majority of the GhVLN proteins were located in the outer membrane (Tab. 1). Chromosome localization analysis showed that the 14 GhVLNs were unevenly distributed on 12 chromosomes of G. hirsutum acc. TM-1. The homologous chromosomes A01/D01 contained two GhVLN genes, whereas the other chromosomes contained only one GhVLN gene (Fig. 1). Apart from one pair of homoeologous genes on chromosomes A2 and D3, the gene pairs of the GhVLN family were located on A and D chromosomes with the same number. Notably, the GhVLN genes were scattered along the chromosomes: three were located at the top ends, five at the opposite ends, and four in the middle of the chromosomes, and the other two were detected on scaffold 2268 and scaffold 3706 (Fig. 1).
3.2 Phylogenetic Relationship, Gene Structure and Conserved Motif Analysis of VLNs
To investigate the evolutionary relationships of the VLNs identified from G. hirsutum, G. arboreum, G. herbaceum, G. raimondii, Arabidopsis and rice, the protein sequences of 14 GhVLNs, 7 GaVLNs, 7 GheVLNs, 7 GrVLNs, 5 AtVLNs and 5 OsVLNs were used to construct an unrooted phylogenetic tree (Fig. 2). Based on the subfamilies established in Arabidopsis [19,21,23,24], the 45 VLNs were classified into 3 subfamilies (Groups I–III). Group II, containing 8 GhVLNs, 4 GaVLNs, 4 GheVLNs, 4 GrVLNs, 2 AtVLNs and 1 OsVLN, was the largest subfamily. Group I held the fewest members, containing only 2 GhVLNs, 1 GaVLN, 1 GheVLN, 1 GrVLN, 1 AtVLN and 1 OsVLN (Fig. 2). Notably, Group III contained 4 GhVLNs, 2 GaVLNs, 2 GheVLNs, 2 GrVLNs and 2 AtVLNs but 3 OsVLNs. Overall, each group in this phylogenetic tree contained VLN genes from all six tested species, suggesting that VLN genes were relatively conserved among species during evolution while also revealing a difference between monocotyledonous and dicotyledonous plants. Additionally, all Ka/Ks values were less than 1 (ranging from 0.09 to 0.83) (Tab. S1), indicating that the VLN gene family has been subject to purifying selection in cotton species during long-term evolution.
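The inference from Ka/Ks values follows the standard convention: a ratio below 1 is read as purifying selection, a ratio of about 1 as neutral evolution, and a ratio above 1 as positive selection. A minimal Python sketch of this rule is given here; the function name and tolerance are hypothetical, and only the two endpoints of the reported range (0.09 and 0.83) are used, because the per-pair values are in Tab. S1 and are not reproduced here.

```python
def selection_mode(ka_ks, tolerance=1e-9):
    """Classify a Ka/Ks ratio using the conventional thresholds."""
    if ka_ks < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if abs(ka_ks - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ka_ks < 1.0 else "positive"


# Endpoints of the range reported in the text; per-pair values are in Tab. S1.
reported_range = (0.09, 0.83)
modes = [selection_mode(value) for value in reported_range]
print(modes)
```

Because both endpoints fall below 1, every value within the reported range is classified as purifying selection under this convention.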
The exon/intron structures and conserved motifs of the GhVLN genes were also analyzed. As shown in Fig. 3, all of the GhVLNs contained a large number of exons and introns, reflecting their long genomic sequences. The number of introns in the 14 GhVLNs ranged from 18 to 22. In addition, to investigate whether the gene structure of the GhVLNs is consistent with their evolutionary relationships, an unrooted phylogenetic tree was constructed from the 14 GhVLN protein sequences. Combining the phylogenetic and gene structure analyses, we found that evolutionarily closely related genes have similar exon/intron structures.
To discover the conserved motifs of the GhVLNs, the 14 identified GhVLN protein sequences were submitted to the online program MEME. We detected 20 motifs and named them motifs 1–20 (Fig. 4). The corresponding sequences and domain analysis of the 20 identified motifs are shown in Tab. 2, and their motif logos are illustrated in Fig. S1. The length of the conserved motifs ranged from 21 to 50 amino acids, and the number of conserved motifs in each GhVLN protein varied from 17 to 20. All of the GhVLNs contained motifs 1–16 and 18, whereas GhVLN4 and GhVLN11 lacked motifs 17, 19 and 20; GhVLN6 and GhVLN13 lacked motif 13; GhVLN5 lacked motifs 9 and 20; and GhVLN7 and GhVLN14 lacked motifs 9, 19 and 20. In other words, all GhVLN proteins contain six gelsolin homology domains (G1–G6) and a villin headpiece domain (VHP), with the exception of GhVLN6 and GhVLN13, which lacked VHP. The sequence alignment of the conserved domains in the GhVLNs and AtVLNs is shown in Fig. S2. Therefore, the gelsolin homology domains are highly conserved in cotton VLNs; moreover, closely related GhVLN orthologs share similar motif types and numbers and similar exon/intron distribution patterns.
3.3 The Cis-Acting Elements Analysis of GhVLN Gene Promoters
What are the regulatory functions of VLN genes in cotton under adverse environments or hormone signals? To address this question, we first extracted the 2000 bp upstream sequences of the 14 GhVLN genes and submitted them to the PlantCARE database for cis-acting element prediction. Besides 22 types of light response-related regulatory elements, the promoter regions of the GhVLNs contained several other cis-elements involved in stress and phytohormone responses. The stress response-related cis-elements included defense and stress (TC-rich repeats), drought (MBS), WRKY binding site (W box), wound (WUN-motif), and low-temperature (LTR) elements. The phytohormone response-related cis-elements included abscisic acid (ABRE), ethylene (ERE), MeJA (CGTCA-motif and TGACG-motif), gibberellin (P-box, TATC-box and GARE-motif), salicylic acid (TCA-element), and auxin (TGA-element) elements. As shown in Fig. 5, several GhVLN promoter regions contained a series of stress-responsive and hormone-responsive elements, such as GhVLN6 (2 LTR, 3 WUN-motif, 1 MBS, 1 W box, 2 TC-rich repeats, 1 ERE, 1 P-box, 1 TCA-element, 2 TGACG-motif) and GhVLN10 (2 LTR, 1 MBS, 2 P-box, 3 TGACG-motif, 1 TC-rich repeats, 1 TCA-element, 3 CGTCA-motif). In addition, among the hormone-related elements, MeJA-responsive and SA-responsive elements appeared most frequently. These results indicate that GhVLNs are possibly related to the MeJA and SA signaling pathways and suggest important roles for GhVLN genes in responses to environmental stresses and phytohormone signals in cotton.
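The element counts listed for GhVLN6 and GhVLN10 can be grouped into the two categories named in this section. The Python sketch below tallies them; the category sets and counts are copied from the text, while the data structure and function name are hypothetical and the full element list in Fig. 5 is not reproduced.

```python
STRESS_ELEMENTS = {"TC-rich repeats", "MBS", "W box", "WUN-motif", "LTR"}
HORMONE_ELEMENTS = {
    "ABRE", "ERE", "CGTCA-motif", "TGACG-motif", "P-box",
    "TATC-box", "GARE-motif", "TCA-element", "TGA-element",
}

# Counts as listed in the text for two promoters.
PROMOTER_COUNTS = {
    "GhVLN6": {"LTR": 2, "WUN-motif": 3, "MBS": 1, "W box": 1,
               "TC-rich repeats": 2, "ERE": 1, "P-box": 1,
               "TCA-element": 1, "TGACG-motif": 2},
    "GhVLN10": {"LTR": 2, "MBS": 1, "P-box": 2, "TGACG-motif": 3,
                "TC-rich repeats": 1, "TCA-element": 1, "CGTCA-motif": 3},
}


def summarize(counts):
    """Sum element counts within the stress and hormone categories."""
    stress = sum(n for name, n in counts.items() if name in STRESS_ELEMENTS)
    hormone = sum(n for name, n in counts.items() if name in HORMONE_ELEMENTS)
    return {"stress": stress, "hormone": hormone}


summary = {gene: summarize(counts) for gene, counts in PROMOTER_COUNTS.items()}
for gene, totals in summary.items():
    print(gene, totals)
```

On these listed counts, GhVLN6 carries more stress-related elements (9) than hormone-related ones (5), whereas GhVLN10 shows the reverse (4 and 9), with MeJA-responsive motifs accounting for most of its hormone-related total.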
3.4 Expression Patterns Analysis of GhVLN Genes in Various Tissues and under Different Biotic and Abiotic Stresses
To explore the potential functions of the GhVLNs, the transcription patterns of all 14 GhVLNs in various cotton tissues and organs were investigated using transcriptome data [26,37]. The GhVLNs showed different expression patterns (Fig. 6). GhVLN4, GhVLN11, GhVLN3, GhVLN10, GhVLN5, GhVLN12, GhVLN2 and GhVLN9 were detected at high transcript levels in all tested tissues. Among them, GhVLN4 and GhVLN11 were most highly transcribed in petal, whereas GhVLN3, GhVLN10, GhVLN2 and GhVLN9 were most highly transcribed in elongating fibers. In particular, one pair of homologous genes (GhVLN5 and GhVLN12) exhibited similar expression patterns with relatively constant expression levels across the tested tissues. Furthermore, the transcripts of some GhVLNs accumulated mainly in one specific tissue. GhVLN6, GhVLN8 and GhVLN13 were preferentially expressed in stamen, implying that these genes may be necessary for the growth and development of cotton stamens. Therefore, the expression profiles demonstrate that members of the VLN family in cotton are widely expressed in both vegetative and reproductive tissues, which might endow them with multiple biological functions.
The analysis of cis-elements in the promoter regions suggests that GhVLNs possibly respond to biotic and abiotic stresses. To test this hypothesis, we further examined the expression patterns of the 14 VLNs under salt, PEG, cold and heat stresses, as well as under V. dahliae V991 biotic stress. As shown in Figs. 7 and 8, the expression of most GhVLN genes was induced to varying degrees under biotic and abiotic stresses. GhVLN3, GhVLN10, GhVLN11, GhVLN5 and GhVLN12 were highly expressed before and after all the abiotic and biotic stress treatments and had similar expression patterns under these stresses. Moreover, GhVLN2 and GhVLN9 were significantly upregulated under salt, PEG and heat stresses, whereas their upregulation under cold stress was not conspicuous. On the other hand, some genes (GhVLN8, GhVLN13 and GhVLN14) exhibited marked downregulation in response to the abiotic stresses. In addition, some genes responded to one or several specific abiotic stresses. For instance, the transcripts of GhVLN1 were significantly decreased under salt, PEG and heat, but not cold, and those of GhVLN6 were significantly decreased only under salt and heat. Importantly, under biotic stress, some VLN genes exhibited opposite expression patterns in G. barbadense and G. hirsutum. GhVLN14 and GhVLN6 were such genes: under V991 biotic stress, the expression of GhVLN14 was upregulated in G. hirsutum but downregulated in G. barbadense, whereas the transcripts of GhVLN6 were decreased in G. hirsutum but increased in G. barbadense. To verify the transcriptome results, we analyzed the expression pattern of GhVLN12 in different cotton tissues and under abiotic and biotic stresses by comparing qRT-PCR and FPKM data. As shown in Fig. S3, GhVLN12 exhibited the same expression trend in the transcriptome and qRT-PCR analyses.
4 Discussion
Plant villins are widely distributed, from algae to land plants. Many VLN genes have been extensively analyzed in plants [14,15,17,23,40–43]. These studies demonstrated that VLNs play very important roles in plant growth and development and in resistance to adverse environments. However, studies on the cotton VLN gene family have so far been scarce.
In this study, we performed the first genome-wide identification of VLN family genes in cotton and systematically analyzed their chromosomal distribution, phylogenetic relationships, gene structure and function. A total of 14 VLN genes were identified in G. hirsutum (Tab. 1). According to the classification model of AtVLN genes, the 14 GhVLN genes were classified into three groups (Fig. 2), consistent with the previous classification in Arabidopsis [21,44]. The structure of a gene may determine its biological function. Cotton VLN orthologs shared highly similar motif architectures (Fig. 4). The conserved villin domains (G1–G6 and VHP) were present in all GhVLN proteins, with the exception of GhVLN6 and GhVLN13, which lacked VHP. Moreover, because the GhVLN motifs are highly similar to those of AtVLN1-5, we speculate that the VLN isoforms of cotton may have functions similar to those in Arabidopsis.
Each of the G1–G6 domains has a different biochemical function, including binding, capping, severing, and nucleation of microfilaments (MFs). The VHP domain at the C-terminus of villin proteins provides an additional microfilament-binding site, which confers microfilament-bundling activity. Plant villins were first identified from lily (Lilium longiflorum) as P-115-ABP and P-135-ABP, beginning in 1998 [47,48]. More and more studies of VLN genes have been reported in recent years, including studies of AtVLNs [18–21,24,25] and OsVLN2. These studies have shown that plant villins have typical biochemical activities that regulate the rearrangement of microfilaments and the formation of microfilament networks, with a significant impact on cell growth and cell morphology as well as on responses to various abiotic and biotic stresses.
The expression pattern of a gene is determined to some degree by the cis-elements in its promoter region. In the promoter sequences of the GhVLNs, we found multiple types of cis-elements (Fig. 5), suggesting that the expression of GhVLNs might be regulated by various stresses and phytohormones. Further expression pattern analysis supported this inference. The identified GhVLN genes showed different expression patterns (Fig. 6). The transcripts of some VLN orthologs were clearly detected in all tested tissues, some VLN genes accumulated mainly in specific tissues, and others were detected at low levels. This result is consistent with other studies: the five Arabidopsis villin-like genes (AtVLN1-5) were also reported to have different expression patterns, with high expression levels and preferential expression in certain tissues. In addition, when subjected to biotic and abiotic stresses, the VLNs exhibited various expression patterns in response (Figs. 7 and 8). These findings suggest that the VLN gene family is essential to cotton vegetative growth and fiber elongation; furthermore, functional differentiation, gain of new functions, or loss of function has probably occurred among these cotton genes.
Additionally, one interesting point caught our attention: the promoter regions of GhVLN5 and GhVLN12 contain only two types of cis-elements, related to SA and wound responses, yet the two genes were highly induced under all the biotic and abiotic stress treatments. In contrast, GhVLN6 and GhVLN13 have relatively more cis-elements but were expressed at low levels under the treatments. We therefore further analyzed the promoter regions and found that, in addition to the cis-elements shown in Fig. 5, MYB and MYC elements are present in all the VLN promoter regions. The MYB and MYC transcription factors have been reported to be associated with drought, salt, heat and chilling stress responses and with hormonal signaling pathways [50–55]. A transcription factor can activate or inhibit the expression of its target genes by binding to the cis-elements in their promoter regions. This may explain why the number of stress-related elements is inconsistent with the gene expression level under stress treatment. In our previous studies, GhVLN12 exhibited actin-bundling activity, which induced cell extension by increasing the abundance of thick actin bundles; furthermore, this gene was reported to participate in salt and drought stress responses and to be required for tolerance to Verticillium wilt in cotton. To date, these are the only two functional studies of VLN in cotton, and they are consistent with the results of this study. Taken together, we speculate that VLN genes may be involved in root growth, petal and stamen development, fiber elongation, responses to stresses (salt, drought, cold, heat, V. dahliae) and yield in cotton. Even so, the detailed function of each VLN gene in cotton remains to be investigated.
5 Conclusions
In this study, we performed a genome-wide identification of the villin family genes in cotton. A total of 14 GhVLN genes were identified. Systematic analyses of the physicochemical properties, chromosomal distribution, phylogenetic relationships, gene structures, conserved motifs, promoter cis-elements, and expression patterns of the GhVLNs revealed that some of them are involved in cotton fiber elongation and that some VLN orthologs are associated with all tested stresses or with specific stress responses. Collectively, the data presented here may provide useful information for the functional verification of GhVLN genes in cotton and gene resources for creating high-quality cotton germplasm with resistance to multiple stresses in the future.
Acknowledgement: We especially thank Dr. Dayong Zhang (Nanjing Agricultural University, China) for providing cotton samples and for producing the figure of the chromosomal distribution of VLN genes.
Funding Statement: This work was financially supported by the National Natural Science Foundation of China (No. 31801408) and the Natural Science Foundation of Jiangsu Province, China (No. BK20180517).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.