Impact of Distance Measures on the Performance of AIS Data Clustering

Marta Mieczyńska1,*, Ireneusz Czarnowski2
1 Department of Maritime Telecommunications, Gdynia Maritime University, Morska 81-87, 81-225, Gdynia, Poland
2 Department of Information Systems, Gdynia Maritime University, Morska 81-87, 81-225, Gdynia, Poland
Computer Systems Science and Engineering 2021, 36(1), 69-82.

Received 13 September 2020; Accepted 01 November 2020; Issue published 23 December 2020


Automatic Identification System (AIS) data stream analysis relies on AIS data describing the behaviour of different vessels, including their routes. When the AIS data contain outliers or noise, or are incomplete, analysis of a vessel's behaviour is limited or not possible at all, and the affected data cannot be automatically assigned to a particular vessel. In this paper, a clustering method is proposed to support AIS data analysis, to qualify noise and outliers with respect to their suitability, and finally to aid the reconstruction of a vessel's trajectory. Clustering results have been obtained using selected algorithms, including k-means, k-medoids, and fuzzy c-means. Based on these results, it is possible to decide how to qualify data with outliers and whether they are useful for reconstructing the vessel trajectory. The main aim of this paper is to determine how the choice of distance measure used during clustering influences AIS data clustering quality, and whether it affects the reconstruction of vessel trajectories when the data are damaged. The computational experiments have been carried out using original AIS data. In general, the results confirm the usefulness of cluster-based analysis when the data include outliers that are derived from the natural environment, and show that AIS data can be monitored and analysed using clustering even when outliers are present. The computational experiment results confirm that k-means with Euclidean distance has the best performance.
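To illustrate what varying the distance measure means in practice, the following is a minimal sketch of k-medoids with a pluggable distance function. It is not the authors' implementation. The function names, the naive initialisation, and the toy positions are hypothetical and chosen only for illustration. Treating longitude and latitude as planar coordinates is a simplifying assumption, and the features actually used in the paper may differ.

```python
import math

# Three common distance measures between equal-length numeric vectors.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def k_medoids(points, k, dist, max_iter=100):
    """Simple alternating k-medoids; `dist` is any distance function."""
    medoids = list(range(k))  # naive initialisation (assumption): first k points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [
            min(range(k), key=lambda c: dist(p, points[medoids[c]]))
            for p in points
        ]
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return labels, medoids

# Hypothetical (longitude, latitude) positions forming two loose groups.
positions = [
    (18.50, 54.50), (18.52, 54.51), (18.51, 54.49),
    (19.00, 54.90), (19.02, 54.91), (18.99, 54.89),
]

for name, dist in [("euclidean", euclidean),
                   ("manhattan", manhattan),
                   ("chebyshev", chebyshev)]:
    labels, medoids = k_medoids(positions, k=2, dist=dist)
    print(name, labels, medoids)
```

On data this cleanly separated, all three measures yield the same partition. Differences between measures would be expected to show up mainly on noisy or damaged data, which is the setting the paper investigates.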


AIS; SAT-AIS; AIS data stream; clustering; maritime data analysis

