TY  - EJOU
AU  - Sicilia, Miguel-Angel
AU  - García-Barriocanal, Elena
AU  - Mora-Cantallops, Marçal
AU  - Sánchez-Alonso, Salvador
AU  - González, Lino

TI  - Modeling Bacterial Species: Using Sequence Similarity with Clustering Techniques
T2  - Computers, Materials & Continua

PY  - 2021
VL  - 68
IS  - 2
SN  - 1546-2226

AB  - Existing studies have challenged the current definition of named bacterial species, especially for highly recombinogenic bacteria. This has motivated the use of computational procedures to examine potential bacterial clusters that species naming does not identify. This paper describes the use of sequence data obtained from MLST databases as input for a k-means algorithm extended to use housekeeping gene sequence similarity as the similarity metric for the clustering process. The k-means implementation was adapted from existing source code and evaluated against MLST data. Results point to potential bacterial clusters that are close to more than one named species and thus may become candidates for alternative classifications that account for genotypic information. The use of hierarchical clustering with sequence comparison as the similarity metric has the potential to find clusters that differ from named species by using a more informed cluster formation strategy than a conventional nominal variant of the algorithm.
KW  - Clustering
KW  - bacterial species
KW  - k-means
KW  - sequence alignment

DO  - 10.32604/cmc.2021.015874