Open Access
ARTICLE
PointNMSA: An Improved PointNeXt Network with Non-Local Multi-Scale Aggregation for 3D Point Cloud Semantic Segmentation
College of Information Engineering, Shanghai Maritime University, Shanghai, China
* Corresponding Author: Chenlu Huang. Email:
Computers, Materials & Continua 2026, 88(2), 68 https://doi.org/10.32604/cmc.2026.078692
Received 06 January 2026; Accepted 20 April 2026; Issue published 15 June 2026
Abstract
Three-dimensional (3D) point cloud semantic segmentation is a core task in indoor scene understanding, providing detailed semantic information about the spatial structures and object categories in indoor environments. Although deep-learning-based methods have made steady progress in recent years, accurately segmenting complex indoor scenes remains challenging because point clouds are unordered and objects vary widely in scale. Most existing networks have limited capacity for multi-scale feature aggregation and struggle to balance local geometric detail with global semantic context. Hierarchical downsampling exacerbates these issues, often discarding fine-grained structural information. Moreover, restricting feature interaction to local neighborhoods can limit the capture of non-local semantic dependencies in complex indoor scenes. To address these limitations, we propose PointNMSA (PointNeXt with Non-local Multi-Scale Aggregation), an improved semantic segmentation network built on the PointNeXt backbone. A Multi-Scale Feature Enhancement (MSFE) module is introduced in the decoding stage to fuse features from different encoding levels and refine the fused features into more stable multi-scale representations that preserve geometric details across scales. In addition, a Convolution-Attention Mixing (CA-Mix) module is designed to jointly integrate local spatial structures and non-local contextual dependencies through dual-stream aggregation and multi-dimensional attention fusion, yielding more discriminative feature representations. Experiments on the Stanford Large-Scale 3D Indoor Spaces (S3DIS) benchmark demonstrate the effectiveness of PointNMSA.
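The abstract does not describe the internals of MSFE or CA-Mix, so the following is only a rough, pure-Python illustration of the general ideas they name, not the authors' implementation. It assumes: (1) coarse encoder levels are brought to the finest resolution by inverse-distance k-nearest-neighbor interpolation, in the style of PointNet++ feature propagation, and concatenated per point, with the learned refinement step omitted; (2) the local stream is a channel-wise max over each point's k nearest neighbors; (3) the non-local stream is scaled dot-product self-attention over all points with identity projections; and (4) the two streams are blended by a per-channel sigmoid gate. All function names and the gating rule are hypothetical.

```python
import math


def knn(points, query, k):
    """Indices of the k points nearest to `query` (Euclidean), closest first."""
    order = sorted(range(len(points)), key=lambda j: math.dist(points[j], query))
    return order[:k]


def interpolate(fine_xyz, coarse_xyz, coarse_feats, k=3, eps=1e-8):
    """Upsample coarse-level features onto fine points by inverse-distance k-NN weighting."""
    dim = len(coarse_feats[0])
    out = []
    for p in fine_xyz:
        idx = knn(coarse_xyz, p, k)
        w = [1.0 / (math.dist(p, coarse_xyz[j]) + eps) for j in idx]
        total = sum(w)
        out.append([sum(w[n] * coarse_feats[j][c] for n, j in enumerate(idx)) / total
                    for c in range(dim)])
    return out


def fuse_levels(fine_xyz, levels):
    """Hypothetical MSFE-style fusion: bring every (xyz, feats) level to the finest
    resolution and concatenate per point. A learned refinement MLP would follow."""
    per_level = [interpolate(fine_xyz, xyz, feats) for xyz, feats in levels]
    return [sum((lvl[i] for lvl in per_level), []) for i in range(len(fine_xyz))]


def local_stream(xyz, feats, k=4):
    """Local branch: channel-wise max over each point's k nearest neighbors (itself included)."""
    dim = len(feats[0])
    return [[max(feats[j][c] for j in knn(xyz, p, k)) for c in range(dim)] for p in xyz]


def nonlocal_stream(feats):
    """Non-local branch: scaled dot-product self-attention over all points,
    using identity query/key/value projections to keep the sketch small."""
    d = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, kv)) / math.sqrt(d) for kv in feats]
        m = max(scores)  # subtract the max for a numerically stable softmax
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        out.append([sum(e[j] * feats[j][c] for j in range(len(feats))) / z for c in range(d)])
    return out


def mix(local, glob):
    """Hypothetical fusion rule: a per-channel sigmoid gate driven by the mean
    difference between the two streams decides how much of each to keep."""
    n, d = len(local), len(local[0])
    gate = [1.0 / (1.0 + math.exp(-sum(local[i][c] - glob[i][c] for i in range(n)) / n))
            for c in range(d)]
    return [[gate[c] * local[i][c] + (1.0 - gate[c]) * glob[i][c] for c in range(d)]
            for i in range(n)]


# Toy example: four fine points with 1-channel features, one coarse point with 2 channels.
fine_xyz = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
fine_feats = [[1.0], [2.0], [3.0], [4.0]]
coarse_xyz = [(0.5, 0.5, 0)]
coarse_feats = [[10.0, -1.0]]

fused = fuse_levels(fine_xyz, [(fine_xyz, fine_feats), (coarse_xyz, coarse_feats)])
out = mix(local_stream(fine_xyz, fused), nonlocal_stream(fused))
print(len(out), len(out[0]))  # prints: 4 3
```

Concatenating the interpolated levels keeps both fine geometric detail and coarse context available to every point, and the gate lets channels where the neighborhood-max response dominates lean toward the local stream. A real network would replace the identity projections, the plain concatenation, and the fixed gate with learned layers.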
On the Area 5 test split, PointNMSA achieves a mean intersection over union (mIoU) of 65.10%, outperforming the PointNeXt baseline by 1.59 percentage points. This gain comes at a moderate computational cost: inference latency rises from 42.24 to 45.18 ms (about 7%), while the parameter count grows from 3.16M to 8.67M (about 2.7 times). Although the model is substantially larger, the limited increase in inference latency indicates a favorable trade-off between segmentation accuracy and computational efficiency. Additional cross-dataset experiments on ScanNet further show that PointNMSA maintains stable gains under different indoor scene distributions. These results suggest that PointNMSA provides a more robust and generalizable solution for semantic segmentation of large-scale indoor environments with complex structural layouts.
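The accuracy and efficiency trade-off described above can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in the abstract; the helper name `change` is hypothetical, and the baseline mIoU it derives assumes the 1.59 gain is an absolute difference in percentage points.

```python
def change(before, after):
    """Return the absolute change and the relative change in percent."""
    delta = after - before
    return delta, 100.0 * delta / before


# Figures quoted in the abstract (S3DIS Area 5, PointNeXt baseline -> PointNMSA).
latency_delta, latency_pct = change(42.24, 45.18)  # milliseconds
params_delta, params_pct = change(3.16, 8.67)      # millions of parameters

# Assumption: the 1.59 mIoU gain is an absolute difference (percentage points).
baseline_miou = 65.10 - 1.59

print(f"latency: +{latency_delta:.2f} ms ({latency_pct:.1f}%)")
print(f"params:  +{params_delta:.2f} M ({params_pct:.1f}%, {8.67 / 3.16:.2f}x)")
print(f"implied baseline mIoU: {baseline_miou:.2f}%")
```

Running it reports a latency overhead of +2.94 ms (7.0%) against a parameter increase of +5.51M (174.4%, a 2.74x ratio), which is why the added latency, rather than the parameter count, is the figure that supports the efficiency claim.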
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

