
Open Access



Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data

Yu Jiang1, 2, Dengwen Yu1, Mingzhao Zhao1, 2, Hongtao Bai1, 2, Chong Wang1, 2, 3, Lili He1, 2, *

1 College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
2 A Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, 130012, China.
3 Department of Engineering Mechanics, State Marine Technical University of St. Petersburg, St. Petersburg, 190008, Russia.

* Corresponding Author: Lili He.

Computers, Materials & Continua 2020, 64(1), 207-216.


Semi-supervised clustering improves learning performance by using a small number of labeled samples to guide the learning of unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analysis of BOA-Argo ocean text data. K-Means and Affinity Propagation (AP) are two classical unsupervised clustering algorithms. Because the final number of clusters in AP clustering has proved difficult to keep within a suitable range, the Election-AP algorithm is proposed to control it. For the semi-supervised analysis, thermocline data in the BOA-Argo dataset are labeled according to the standard definition of the thermocline, and these labeled samples are used for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen for comparison of learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To accommodate data with a single label, this paper adapts the above algorithms into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagation) to perform semi-supervised clustering analysis on the data. A semi-supervised clustering algorithm based on compact seeds, DSAP (Double Seed AP), is also proposed, and the experimental data show that DSAP achieves a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze the potential patterns of marine data.
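As a rough illustration of how a few labeled samples can steer a clustering algorithm, the following is a minimal sketch of the generic Seeded-K-Means idea named in the abstract: the labeled seeds are used only to initialize the centroids, after which ordinary K-Means iterations run on all samples. It is not the paper's SSKM variant. The function name `seeded_kmeans`, the `seeds` dictionary format, and the toy data are assumptions made for this example only.

```python
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def seeded_kmeans(X, seeds, k, max_iter=100):
    """Generic Seeded-K-Means sketch (hypothetical helper, not the paper's SSKM).

    X     : list of feature vectors (lists of floats)
    seeds : dict mapping a sample index to its known cluster label in range(k)
    Assumes every cluster 0..k-1 has at least one labeled seed.
    """
    # Initialize each centroid as the mean of the seeds carrying that label.
    centroids = [
        mean([X[i] for i, lab in seeds.items() if lab == j]) for j in range(k)
    ]
    labels = []
    for _ in range(max_iter):
        # Assignment step: every sample (seeds included) goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(x, centroids[j])) for x in X]
        # Update step: recompute centroids; keep the old one if a cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated groups, with one labeled seed in each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = seeded_kmeans(X, seeds={0: 0, 3: 1}, k=2)
print(labels)
```

In the generic Constrained-K-Means variant, the seed samples would additionally keep their given labels during every assignment step instead of being reassigned to the nearest centroid.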


Cite This Article

APA Style
Jiang, Y., Yu, D., Zhao, M., Bai, H., Wang, C. et al. (2020). Analysis of semi-supervised text clustering algorithm on marine data. Computers, Materials & Continua, 64(1), 207-216.
Vancouver Style
Jiang Y, Yu D, Zhao M, Bai H, Wang C, He L. Analysis of semi-supervised text clustering algorithm on marine data. Comput Mater Contin. 2020;64(1):207-216.
IEEE Style
Y. Jiang, D. Yu, M. Zhao, H. Bai, C. Wang, and L. He, "Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data," Comput. Mater. Contin., vol. 64, no. 1, pp. 207-216, 2020.


This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.