
Open Access

ARTICLE


Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data

Yu Jiang1, 2, Dengwen Yu1, Mingzhao Zhao1, 2, Hongtao Bai1, 2, Chong Wang1, 2, 3, Lili He1, 2, *

1 College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
2 A Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, 130012, China.
3 Department of Engineering Mechanics, State Marine Technical University of St. Petersburg, St. Petersburg, 190008, Russia.

* Corresponding Author: Lili He. Email: email.

Computers, Materials & Continua 2020, 64(1), 207-216. https://doi.org/10.32604/cmc.2020.09861

Abstract

Semi-supervised clustering can improve learning performance by using a small number of labeled samples to guide learning on unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analysis of BOA-Argo ocean text data. K-Means and Affinity Propagation (AP) are two classical unsupervised clustering algorithms. Because the final number of clusters produced by AP has proved difficult to keep within a suitable range, the Election-AP algorithm is proposed to control it. For the semi-supervised analysis, thermocline samples in the BOA-Argo dataset are labeled according to the standard definition of the thermocline, and these labeled data are used for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen to compare learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To adapt these algorithms to the single-label setting, this paper improves them into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means) and SSAP (improved Semi-supervised Affinity Propagation), which are then used to perform semi-supervised clustering analysis on the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is also proposed, and the experimental results show that DSAP achieves a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze potential patterns in the marine data.
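To make the difference between the two K-Means baselines named above concrete, the following is a minimal sketch of the classic Seeded-K-Means and Constrained-K-Means formulations. It is not the authors' SCKM or SSKM, whose modifications are not described here. In Seeded-K-Means the labeled seeds only initialize the centroids; in Constrained-K-Means the seeds additionally keep their given label in every iteration. The function name `seeded_kmeans`, the use of squared Euclidean distance, the feature matrix `X`, and the choice to start clusters that have no seeds (as in a single-label setting) at random non-seed points are all assumptions made for illustration.

```python
import numpy as np


def seeded_kmeans(X, k, seeds, constrained=False, n_iter=100, rng=None):
    """Hypothetical sketch of classic Seeded-K-Means / Constrained-K-Means.

    X: (n, d) array of samples (feature representation is assumed).
    k: number of clusters.
    seeds: dict {cluster_id: list of row indices labeled with that cluster}.
    constrained: False -> seeds only initialize centroids (Seeded-K-Means);
                 True  -> seeds also keep their label in every iteration
                          (Constrained-K-Means).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Keep only clusters that actually have labeled seeds.
    seeds = {c: np.asarray(idx, dtype=int) for c, idx in seeds.items() if len(idx) > 0}

    # Seeded clusters start at the mean of their labeled samples.
    centers = np.empty((k, X.shape[1]))
    for c, idx in seeds.items():
        centers[c] = X[idx].mean(axis=0)

    # Assumption for the single-label case: clusters without seeds start at
    # distinct, randomly chosen samples that are not seeds.
    seeded_rows = np.concatenate(list(seeds.values())) if seeds else np.empty(0, dtype=int)
    pool = np.setdiff1d(np.arange(n), seeded_rows)
    unseeded = [c for c in range(k) if c not in seeds]
    for c, row in zip(unseeded, rng.choice(pool, size=len(unseeded), replace=False)):
        centers[c] = X[row]

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if constrained:
            # Constrained-K-Means: labeled seeds never change cluster.
            for c, idx in seeds.items():
                new_labels[idx] = c
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels, centers
```

For example, with two well-separated groups of points and seeds given only for cluster 0, `seeded_kmeans(X, 2, {0: [0, 1]})` labels the seeded group 0 and the other group 1. Passing `constrained=True` guarantees that every seed index keeps its given label even if it lies closer to another centroid, which is the only behavioral difference between the two variants in this sketch.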



Cite This Article

APA Style
Jiang, Y., Yu, D., Zhao, M., Bai, H., Wang, C. et al. (2020). Analysis of semi-supervised text clustering algorithm on marine data. Computers, Materials & Continua, 64(1), 207-216. https://doi.org/10.32604/cmc.2020.09861
Vancouver Style
Jiang Y, Yu D, Zhao M, Bai H, Wang C, He L. Analysis of semi-supervised text clustering algorithm on marine data. Comput Mater Contin. 2020;64(1):207-216. https://doi.org/10.32604/cmc.2020.09861
IEEE Style
Y. Jiang, D. Yu, M. Zhao, H. Bai, C. Wang, and L. He, "Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data," Comput. Mater. Contin., vol. 64, no. 1, pp. 207-216, 2020. https://doi.org/10.32604/cmc.2020.09861





Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.