Open Access
ARTICLE
Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data
Yu Jiang1, 2, Dengwen Yu1, Mingzhao Zhao1, 2, Hongtao Bai1, 2, Chong Wang1, 2, 3, Lili He1, 2, *
1 College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
2 A Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, 130012, China.
3 Department of Engineering Mechanics, State Marine Technical University of St. Petersburg, St. Petersburg, 190008, Russia.
* Corresponding Author: Lili He. Email: .
Computers, Materials & Continua 2020, 64(1), 207-216. https://doi.org/10.32604/cmc.2020.09861
Received 22 January 2020; Accepted 12 February 2020; Issue published 20 May 2020
Abstract
Semi-supervised clustering improves learning performance by using a small
number of labeled samples to guide the learning of unlabeled samples. This paper
implements and compares unsupervised and semi-supervised clustering analysis of
BOA-Argo ocean text data. K-Means and Affinity Propagation (AP) are two classical
unsupervised clustering algorithms. Because the final number of clusters in AP
clustering has proved difficult to keep within a suitable range, the Election-AP
algorithm is proposed to control it. For semi-supervised learning, thermocline data
are sampled from the BOA-Argo dataset according to the standard definition of the
thermocline, and these data are used for semi-supervised cluster analysis.
Several semi-supervised clustering algorithms were chosen for comparison of learning
performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity
Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To adapt to
single-label data, this paper modifies the first three of these algorithms into
SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP
(improved Semi-supervised Affinity Propagation) to perform semi-supervised
clustering analysis on the data. A DSAP (Double Seed AP) semi-supervised clustering
algorithm based on compact seeds is proposed, and the experimental data show that
DSAP has a better clustering effect. The unsupervised and semi-supervised clustering
results are used to analyze the potential patterns of marine data.
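As a rough illustration of how a few labeled samples can guide clustering, the
sketch below contrasts Seeded-K-Means and Constrained-K-Means in their generic
form. It is not the SCKM or SSKM variant proposed in this paper: the function name
`seeded_kmeans`, its parameters, and the choice of NumPy are assumptions made for
illustration, and the sketch assumes every cluster has at least one labeled seed.

```python
import numpy as np


def seeded_kmeans(X, seed_idx, seed_labels, k, constrained=False, n_iter=100):
    """Illustrative seeded K-Means (hypothetical helper, not the paper's code).

    X            : (n, d) array-like of samples
    seed_idx     : indices of the labeled samples (the "seeds")
    seed_labels  : cluster label in 0..k-1 for each seed
    constrained  : if True, seeds keep their given label in every iteration
                   (Constrained-K-Means); if False, seeds are only used to
                   initialise the centroids (Seeded-K-Means)
    """
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)

    # Initialise each centroid as the mean of the seeds carrying that label.
    # Assumption: every label 0..k-1 has at least one seed.
    centers = np.stack(
        [X[seed_idx[seed_labels == c]].mean(axis=0) for c in range(k)]
    )

    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if constrained:
            # Constrained variant: seed assignments are never changed.
            new_labels[seed_idx] = seed_labels
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Recompute each centroid from its current members.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers
```

Calling `seeded_kmeans(X, seed_idx, seed_labels, k)` returns one cluster label per
sample together with the final centroids; passing `constrained=True` switches to
the variant in which the labeled seeds can never leave their given cluster.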
Cite This Article
Y. Jiang, D. Yu, M. Zhao, H. Bai, C. Wang et al., "Analysis of semi-supervised text clustering algorithm on marine data," Computers, Materials & Continua, vol. 64, no. 1, pp. 207–216, 2020. https://doi.org/10.32604/cmc.2020.09861