Critical Patient Image Data Acquisition Strategy by Exploiting Edge Intelligence and Dynamic-Static Synergy in Smart Healthcare

Kiran Singh; Prabh Singh; Narinder Kaur; Jawad Khan; Dildar Hussain; Yeong Gu

doi:10.32604/cmes.2026.080915

Open Access

ARTICLE


Kiran Deep Singh¹, Prabh Deep Singh², Narinder Kaur³, Jawad Khan⁴,*, Dildar Hussain⁵, Yeong Hyeon Gu⁵,*

1 Department of Computer Science & Engineering, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
2 Department of Computer Science & Engineering, Graphic Era (Deemed to be University), Dehradun, Uttarakhand, India
3 Department of Computer Science & Engineering, Chandigarh University, Mohali, Punjab, India
4 School of Computing, Gachon University, Seongnam, Republic of Korea
5 Department of AI and Data Science, Sejong University, Seoul, Republic of Korea

* Corresponding Authors: Jawad Khan. Email: email ; Yeong Hyeon Gu. Email: email

(This article belongs to the Special Issue: Artificial Intelligence Models in Healthcare: Challenges, Methods, and Applications)

Computer Modeling in Engineering & Sciences 2026, 147(2), 45 https://doi.org/10.32604/cmes.2026.080915

Received 09 March 2026; Accepted 15 April 2026; Issue published 27 May 2026

Abstract

In smart healthcare systems, image data of critical patients is essential for monitoring and diagnosing disease progression. Traditional methods of acquiring medical images struggle to generate data cost-effectively. This work introduces an approach for collecting high-quality image data from individuals with atypical clinical presentations. First, a new Internet of Medical Things (IoMT) image collection architecture is introduced; it uses edge intelligence and motion-static synergy to record both coarse-grained and fine-grained patient images. Building on this architecture, the study proposes an image acquisition technique that leverages edge intelligence and collaborative static-dynamic monitoring, exemplified in intensive care units, to improve the efficiency and data value of image acquisition in healthcare IoMT settings. The approach comprises three steps. First, an advanced YOLO-based clinical abnormality detection model is run on the edge server to identify patients with abnormal physiological conditions in images captured by static monitoring nodes. Second, coordinate calculation methods for localizing abnormal patients and quantification techniques for severity assessment are introduced. Third, a path optimization algorithm for mobile medical assistive robots is applied, based on severity metrics and the principles of ant colony optimization. Performance evaluations at every phase indicate that acquisition efficiency and image data value surpass those of traditional methodologies.

Keywords

Edge intelligence; image acquisition; motion collaboration; path optimization; medical IoT; patient monitoring

1 Introduction

Healthcare systems are vital infrastructure that supports the well-being of populations and economies worldwide. To accelerate healthcare informatization and modernization, scholars have proposed smart healthcare and automated hospital paradigms built on Internet of Medical Things (IoMT) and artificial intelligence technologies [1]. In particular, patient image data are an important source of information for these new healthcare models [2]. Such image data can be used to monitor patient conditions and obtain detailed clinical information, such as skin lesion abnormalities, postural defects, vital sign indicators, disease manifestations, gait patterns, and wound sizes. Effective methods of acquiring image data are therefore a central research challenge [3].

Recently, many researchers have used cameras in hospital wards or laboratory environments to capture images of patients, focusing mostly on the feasibility of automatic detection and diagnostic algorithms for medical emergencies and disease conditions. Other methods use handheld portable devices to capture patient images. These methodologies, however, cannot operate in real time and are not applicable when patients are in hard-to-reach areas or during emergencies [4]. To remedy this shortcoming, scholars have deployed network cameras around hospital premises on top of IoMT infrastructure. However, because the cameras are fixed, such systems record images only within a particular monitoring zone. Researchers have also mounted image capture nodes on mobile devices, such as autonomous medical robots or drone inspection systems [5].

A synergistic solution combining motion-based and static image acquisition offers more comprehensive, accurate, and real-time healthcare monitoring and management. Furthermore, edge servers relieve the data processing load on the cloud by integrating and processing the data sensed by endpoint devices. Surveillance cameras deployed across hospital facilities provide comprehensive monitoring of patients, while mobile medical robots allow individual assessments and interventions [6]. By integrating both image acquisition modalities, an early warning and real-time monitoring system based on edge intelligence can be implemented, which would enable accurate healthcare management, improve efficiency and patient outcomes, and minimize the waste of resources [7].

An intensive care unit (ICU) serves as the case study because it concerns patients with abnormal clinical conditions (e.g., changes in appearance caused by disease progression or emergencies). Edge computing is used to reduce image transmission costs and improve data efficiency, and mobile medical robots are used to obtain finer-grained images of patients. On this basis, an approach for obtaining high-quality image data of critical patients through edge intelligence and motion-static synergy is proposed.

Clarification of Novelty and Clinical Contribution

Although lightweight detectors and heuristic optimizers have each been described in earlier literature, the originality of this work lies in their severity-aware combination into an edge-enabled ICU imaging system. YOLO-CAD is not developed as a compressed variant of YOLOv7-tiny; it is tailored to the high-occlusion ICU setting through ShuffleNet-based backbone reduction, GSConv feature fusion, Mish activation stabilization, and imbalance-sensitive localization with WIoUv3.

Similarly, the proposed IHO algorithm redefines the optimization objective from shortest-path traversal to clinically weighted prioritization using PRSI-guided pheromone modulation. Furthermore, the introduction of Clinical Data Value Density (CDVD) shifts evaluation from geometric efficiency to clinically meaningful data acquisition density.

Thus, the contribution of this work lies in healthcare-specific objective redefinition and system-level integration rather than isolated algorithmic modification. Unlike works focusing on isolated model improvements, this study presents a system-level integration framework that combines edge intelligence, vision-based severity estimation, and robotic path planning for ICU environments. The novelty lies in the coordinated design and interaction of these components to enable real-time, non-invasive, and resource-efficient patient monitoring.

The primary contribution of this research work is enumerated as follows:

1. Robotic systems and surveillance cameras that capture images at fixed intervals are the prevalent imaging approaches. They generate many images but have little capacity to interpret them. The proposed method gathers image data using edge intelligence and dynamic-static collaboration, increasing the detail of clinically pertinent patient information while reducing the capture of clinically irrelevant images. It thereby enhances the capabilities of healthcare IoMT image capture devices and raises the value of the gathered data.

2. A rapid detection methodology for patients exhibiting abnormal clinical conditions based on YOLOv7 (termed YOLO-CAD) is proposed to filter images collected by static acquisition nodes (fixed cameras) and retain image data of critical patients. This approach not only reduces data transmission overhead but also identifies targets for mobile robots to execute fine-grained image acquisition tasks.

3. A patient localization method based on monocular vision is proposed to avoid the problems associated with binocular stereo vision, depth cameras, laser ranging, and other positioning methods. These problems include increased cost, the need for specialized lighting, and complex computation. RGB images are used to locate patients in hospitals quickly, which aids the robots in their work.

4. This manuscript introduces a heuristic path optimization algorithm grounded in patient severity assessment to facilitate mobile image acquisition nodes (medical robotics) in efficiently and accurately navigating to critical patient locations. This will enable the robots to obtain high-resolution images of critical patients more efficiently by minimizing the path length and acquisition time, while also decreasing the collection of clinically irrelevant data. The robot path planning approach determines optimal data acquisition routes by prioritizing patient considerations and maximizing path efficiency.

The remainder of this manuscript is organized as follows. Section 2 reviews state-of-the-art methods based on image data in healthcare IoMT environments. Section 3 describes the overall framework of the proposed image acquisition methodology and the metrics used for system performance evaluation. Section 4 proposes rapid methods for detecting and localizing critical patients, alongside intelligent scheduling methods for medical robots. Section 5 presents experimental validation and analytical assessment of the proposed methodology. Finally, Section 6 concludes this manuscript.

2 Related Works

This section provides a concise overview of representative image acquisition techniques and their applications within healthcare IoMT environments.

2.1 Task-Oriented Medical Image Acquisition Methods

Literature describes modern hospital prototypes based on IoMT and image processing technologies employing multiple cameras to capture images and assess patient conditions. Research describes high-quality image acquisition systems for clinical environments based on DSLR cameras for remote camera access, capture control, and image transmission. Studies develop automated image acquisition systems for patient monitoring to provide solutions for automated health assessment prediction [8]. Image acquisition methods for medical assistance robots are discussed, proposing look-up capture methodologies to acquire patient images. Research utilizes RGB images acquired by UAV imaging systems within hospital complexes, dividing images into distinct blocks for anomaly detection [9]. Literature describes image acquisition systems for patient wards and provides image analysis processes. Reviews examine the application of small unmanned aerial systems in healthcare precision medicine, emphasizing UAV imagery as a low-cost alternative to high-resolution monitoring systems [10].

2.2 Patient Abnormality Detection Methods

Methods for detecting and preventing different disease manifestations in patients are discussed in the literature, where threshold-based algorithms are proposed to detect the presence of a disease state [11]. Studies present disease and emergency detection systems based on deep convolutional neural networks and transfer learning, and compare numerous candidate models [12]. Research suggests the use of deep learning models for critical condition detection. Research also describes disease detection and classification schemes using local binary patterns and support vector machines, which classify several pathological conditions successfully. Other work integrates deep learning models to classify patients and detect disease from multispectral and RGB images, and uses class activation maps to interpret the predictions.

2.3 Data Processing Methods for Edge Intelligence in Healthcare

Literature proposes hierarchical data processing architectures reducing communication bottlenecks and energy consumption, applying edge computing to sensor data processing and analysis in precision healthcare, aggregating and reconstructing data through fog computing nodes [13]. Research explores the application of intelligent edge computing in medical IoT image target detection, performing data processing and analysis at network edges to achieve real-time target detection, reducing data transmission latency and bandwidth consumption while improving processing efficiency [14]. Studies utilize lightweight edge mining algorithms to compress medical data within wireless sensor networks, addressing challenges of poor Internet connectivity and memory constraints of IoMT devices in hospital environments [15]. Literature proposes edge computing frameworks for collaborative video processing in multimedia IoMT systems, leveraging computational and communication capabilities of resource-rich mobile devices to extract features from videos and avoid bandwidth constraints. Research proposes edge computing-based target detection architectures for distributed and efficient target detection over wireless communication. Literature proposes data filters based on edge intelligence to resolve problems of large volumes of irrelevant data causing communication link congestion [16,17]. Research combines edge computing and IoMT to construct lightweight patient lifecycle data sensing frameworks for multi-parameter and mobile sensing, proposing data-driven algorithms to optimize sensing parameters, reduce redundancy, and improve correlation between sensed data and patient health stages [18].

The comparison in Table 1 highlights that existing works primarily focus on either edge intelligence or robotic systems independently. In contrast, the proposed framework integrates vision-based severity estimation with edge-aware robotic path planning, enabling a unified and task-driven solution for ICU monitoring.

3 System Model

This section describes the methodology for acquiring high-quality image data of critical patients, subsequently explaining methodologies employed to assess patient criticality severity and image value acquisition density.

3.1 Dataset Description

The experiments were conducted using the publicly available Hospital Scene Dataset, which consists of 1680 images collected from real hospital environments. The dataset is accessible via the official GitHub repository [23]. The images are categorized into multiple hospital scene classes and are annotated following a structured labeling protocol provided by the dataset creators. We performed a detailed analysis of class distribution to ensure balanced training and evaluation. All images were resized and preprocessed before model training. To enhance robustness and reduce overfitting, data augmentation techniques such as rotation, flipping, and scaling were applied.

3.2 Data Annotation and Partitioning

The dataset used in this study was annotated to identify clinically relevant abnormal regions in ICU patient images. Annotations were performed using bounding boxes to capture areas indicative of patient risk, such as abnormal posture, presence of critical medical devices, or visible distress patterns. The labeling process was conducted manually following predefined visual criteria to ensure consistency across samples. To improve annotation reliability, cross-verification was performed, and ambiguous cases were reviewed to minimize subjectivity. The abnormality labels are therefore derived from structured visual assessment rather than automated labeling, ensuring alignment with real-world clinical observation practices. To prevent data leakage and ensure fair evaluation, a patient-level data separation strategy was enforced. Specifically, all images corresponding to a single patient were assigned exclusively to either the training set or the testing set, but not both. This approach ensures that the model is evaluated on unseen patient data, thereby improving generalization and avoiding overestimation of performance.
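The patient-level separation described above can be sketched as a group-wise split. This is a minimal illustration only: it assumes each image carries a patient identifier, and the `patient_level_split` helper, the 20% test fraction, and the file names are hypothetical rather than taken from the study.

```python
import random
from collections import defaultdict


def patient_level_split(samples, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) pairs so that no patient spans both sets."""
    by_patient = defaultdict(list)
    for patient_id, image_path in samples:
        by_patient[patient_id].append(image_path)

    # Shuffle patients, not images, so all images of a patient stay together.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [(p, img) for p in patient_ids if p not in test_ids for img in by_patient[p]]
    test = [(p, img) for p in patient_ids if p in test_ids for img in by_patient[p]]
    return train, test


# Hypothetical data: 5 patients with 4 images each.
samples = [(f"patient_{k % 5}", f"img_{k:03d}.jpg") for k in range(20)]
train_set, test_set = patient_level_split(samples, test_fraction=0.2, seed=42)
print(len(train_set), len(test_set))
```

Splitting on the patient identifier rather than on individual images is what prevents near-duplicate frames of the same patient from appearing on both sides of the evaluation.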

3.3 Design of Image Data Acquisition Framework

The structure of the proposed image acquisition system is illustrated in Fig. 1, taking an intensive care unit (ICU) as an example. The methodology proceeds as follows. First, stationary monitoring nodes capture wide-area images of the hospital ward environment. Edge servers then use the YOLO-CAD target detection model to quickly detect critical patients who require intensive clinical attention and calculate their locations from image information and camera parameters [24]. Additionally, the severity of each critical patient is quantitatively assessed from the outputs of the target detection model, including confidence levels and abnormal region metrics. Finally, edge servers guide mobile medical robots to the critical patients to collect close-up, fine-grained images containing detailed clinical information. These images are transmitted to physicians or cloud infrastructure for further analysis and timely adjustment of treatment strategies for the patient’s specific conditions (such as disease identification).


Figure 1: High-quality image data acquisition model for critical patients based on edge intelligence and motion-static synergy.

3.4 Acquisition Methods for Fine-Grained Clinical Images

This work describes how mobile medical robots with cameras can quickly take high-resolution images, which provide doctors with accurate information about their patients. In traditional robotic imaging approaches, robots typically capture images of all patients along set paths and provide the information to doctors. This is called a “monitoring approach”. Such approaches exhibit extremely inefficient acquisition processes while generating substantial data traffic, imposing heavy burdens on edge devices and network infrastructure [25]. Additionally, only subsets of images captured by robots provide clinical utility, such as images of patients exhibiting abnormal conditions.

This manuscript proposes an alternative solution. When critical patients are detected in images collected by static monitoring nodes, edge servers calculate their coordinates and plan robot movement paths, and robots collect images of critical patients along the optimized trajectories. For path planning, let $\mathcal{X} = \{X_1, X_2, \ldots, X_i, \ldots, X_n\}$ denote the set of patients requiring robotic observation [26]. To enable robots to observe patients $X_i$ with relatively high severity levels as rapidly as possible, a heuristic path optimization algorithm grounded in patient severity assessment is proposed to obtain optimal acquisition paths and improve image acquisition efficiency.

3.5 Methods for Quantifying Patient Severity

Since the severity of critical patients directly influences destination selection order by the path planning methodology proposed in this manuscript, ultimately affecting image acquisition efficiency, the Patient Risk Severity Index (PRSI) is introduced.

Definition 1: PRSI represents the result of combining the spatial area occupied by each critical patient in the image with corresponding confidence levels, with results proportional to area and confidence level, reflecting patient condition severity [27]. Specifically, if an image contains n critical patients, and i denotes the identifier of one patient, the PRSI expression is formulated as:

$$\Psi_i = \mathcal{P} \times \mathcal{A}_i \times \omega_i, \quad \omega_i \in [\omega_{\min}, 1] \tag{1}$$

where $\omega_i$ denotes the confidence level of the target detection result for patient $i$ and $\omega_{\min}$ is the confidence threshold, with a default of $\omega_{\min} = 0.3$; patients with confidence below this threshold are considered stable. $\mathcal{P}$ denotes the accuracy of the target detection model, specifically the average precision at an intersection-over-union threshold of 0.5 (AP@0.5). $\mathcal{A}$ represents the spatial area occupied by the critical patient in the image, calculated from the pixel dimensions $(\Delta x, \Delta y)$ of the image and the number of pixels $\kappa$ occupied by the patient:

$$\mathcal{A} = \kappa \times (\Delta x \times \Delta y) \tag{2}$$

The PRSI formulation adopts a multiplicative combination of abnormal area Ai and detection confidence ωi to ensure joint dependency between severity extent and detection reliability. This design enforces that high severity scores are assigned only when both spatial abnormality and detection confidence are simultaneously significant.

In contrast to additive formulations, the multiplicative form naturally suppresses noisy detections with low confidence, even if the detected area is large, and prevents overestimation from small but highly confident detections. This property is particularly important in ICU environments, where occlusions and sensor noise can affect detection quality.
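As an illustration of Eqs. (1) and (2), the following minimal sketch computes PRSI from a detector's outputs. Returning a score of 0 for detections below the confidence threshold is an assumption made here to represent "stable" patients; the AP@0.5 value, pixel counts, and bed labels are hypothetical.

```python
def patient_area(pixel_count, dx, dy):
    """Eq. (2): area = kappa * (dx * dy)."""
    return pixel_count * (dx * dy)


def prsi(ap50, area, confidence, conf_min=0.3):
    """Eq. (1): Psi_i = P * A_i * omega_i for omega_i in [conf_min, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < conf_min:
        # Assumption: below-threshold detections are treated as stable (score 0).
        return 0.0
    return ap50 * area * confidence


AP50 = 0.9  # hypothetical detector accuracy (AP@0.5)
detections = [
    {"patient": "bed_1", "pixels": 12000, "confidence": 0.85},
    {"patient": "bed_2", "pixels": 30000, "confidence": 0.25},  # below threshold
    {"patient": "bed_3", "pixels": 8000, "confidence": 0.95},
]
scores = {
    d["patient"]: prsi(AP50, patient_area(d["pixels"], 1.0, 1.0), d["confidence"])
    for d in detections
}
for patient, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(patient, round(score, 1))
```

In this example the largest detected region (bed_2) receives no priority because its confidence falls below the threshold, which is the suppression behavior the multiplicative form is meant to provide.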

3.5.1 Alternative Fusion Strategies

Alternative formulations for severity estimation may include additive or weighted combinations, such as:

$$\Psi_i^{\mathrm{add}} = \alpha \mathcal{A}_i + \beta \omega_i \tag{3}$$

or nonlinear fusion models:

$$\Psi_i^{\mathrm{non}} = f(\mathcal{A}_i, \omega_i) \tag{4}$$

where α and β are weighting coefficients and f(⋅) represents a learnable mapping function. While additive models provide flexibility in weighting contributions, they may overestimate severity when one factor dominates. Nonlinear fusion approaches, including neural-based weighting, can further improve adaptability but introduce additional computational complexity.
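The difference between the multiplicative and additive forms can be seen in a small numerical sketch. It assumes the area is normalized to a fraction of the image so that both factors lie in [0, 1], omits the common factor 𝒫 because it does not affect the ranking, and uses hypothetical weights α = β = 0.5.

```python
def multiplicative_score(area_fraction, confidence):
    """Multiplicative fusion as in Eq. (1), without the common factor P."""
    return area_fraction * confidence


def additive_score(area_fraction, confidence, alpha=0.5, beta=0.5):
    """Additive fusion as in Eq. (3); alpha and beta are hypothetical weights."""
    return alpha * area_fraction + beta * confidence


# Hypothetical detections: (normalized area, detection confidence).
large_but_uncertain = (0.90, 0.30)
moderate_and_reliable = (0.55, 0.60)

mult = {
    "large_but_uncertain": multiplicative_score(*large_but_uncertain),
    "moderate_and_reliable": multiplicative_score(*moderate_and_reliable),
}
add = {
    "large_but_uncertain": additive_score(*large_but_uncertain),
    "moderate_and_reliable": additive_score(*moderate_and_reliable),
}
print("multiplicative ranks first:", max(mult, key=mult.get))
print("additive ranks first:", max(add, key=add.get))
```

With these numbers the additive form ranks the large, low-confidence region first because the area term dominates, while the multiplicative form ranks the moderately sized, more reliable detection first.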

3.5.2 Clinical Interpretation of PRSI

PRSI is not a diagnostic substitute but a computational proxy for visually apparent clinical urgency. The abnormal spatial region measures the visible extent of deterioration, whereas detection confidence measures model reliability in the presence of environmental noise. This structure is consistent with ICU triage principles, where visibly abnormal progression frequently needs to be prioritized. PRSI thus allows automated severity-aware robotic prioritization in its current form, and it can be extended with multimodal physiological integration in future applications. The current formulation prioritizes deployability in resource-constrained edge environments, while multimodal fusion is considered a natural extension for future clinical-grade systems.

Limitation and Multimodal Extension: It is important to note that PRSI is derived solely from visual features, including spatial abnormality and detection confidence, and does not incorporate physiological signals such as heart rate, oxygen saturation, or blood pressure. While this design enables non-invasive, real-time severity estimation using edge-deployable vision systems, it may not fully capture the complete clinical state of the patient.

To enhance reliability, future extensions of this work will explore multimodal integration by incorporating Internet of Medical Things (IoMT) sensor data. A hybrid severity index combining visual cues with physiological parameters could provide a more comprehensive and clinically robust assessment of patient condition.

3.5.3 Refined Interpretation and Clinical Scope

It should be stressed that the proposed Patient Risk Severity Index (PRSI) is not intended to be a clinically validated severity score or to replace well-established medical assessment protocols. Rather, PRSI is a visual prioritization measure based on visible characteristics of the image, e.g., the extent of spatial abnormality and the model's detection confidence. Although these aspects can be associated with observable signs of patient deterioration, they do not directly reflect physiological or clinical outcomes. As such, PRSI is to be understood as a computationally constructed proxy for automatic prioritization in image acquisition and robotic navigation tasks. Future research will validate this metric through clinician-reviewed studies and multimodal physiological parameters to increase its clinical reliability and interpretability.

3.6 Clinical Image Data Value Analysis

Among image data acquired by medical robots, images of critical patients constitute primary targets of clinical interest. To evaluate the acquisition efficiency of the image acquisition methodology proposed in this manuscript for critical patient images, the Clinical Data Value Density (CDVD) metric is introduced.

Definition 2: CDVD is the ratio of the image data value acquired by the robot to the data volume, per unit time; it depends on the total data value, the total data quantity, and the execution time of the image acquisition tasks performed by robots.

Specifically, suppose a robot performs an image acquisition task in which images of $n$ patients must be captured, with $i$ denoting the patient identifier. The value of the images of patient $i$ is denoted $\mathcal{G}(\Psi_i)$, the time for image transmission to clinicians is $T_s(i)$, the data volume of the images is $\mathcal{Q}_i$, and the total time taken by the robot to perform one image acquisition task is $T_m$. The CDVD expression is formulated as

$$\mathcal{M} = T_m^{-1} \sum_{i=1}^{n} \left( T_i \times \mathcal{Q}_i^{-1} \times \mathcal{G}(\Psi_i) \right) \tag{5}$$

where $\mathcal{G}(\Psi_i) \propto \Psi_i$, indicating that higher PRSI scores correspond to higher image value.

Furthermore, let $T_p(i)$ be the image capture time for patient $i$, $T_s(i)$ the elapsed time for image transmission to cloud infrastructure, and $T_w(i)$ the movement time from patient $i$ to patient $i+1$. After capturing a set of images, the robot can transmit them while moving to the next target location, and image transmission time is substantially smaller than robot movement time, so that transmission completes within the movement interval, $T_s(i) \subseteq T_w(i)$. In this scenario, $T_s(i)$ need not be counted in the elapsed time, and $T_m$ can be further expressed as:

$$T_m = \sum_{i=1}^{n} T_w(i) + \sum_{i=1}^{n} T_p(i) \tag{6}$$

The final CDVD expression becomes:

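$$\mathcal{M}\big(T_w(i), T_p(i), \mathcal{Q}_i, \mathcal{G}(\Psi_i)\big) = \left( \sum_{i=1}^{n} T_w(i) + \sum_{i=1}^{n} T_p(i) \right)^{-1} \sum_{i=1}^{n} \left( \mathcal{Q}_i^{-1} \times \mathcal{G}(\Psi_i) \right) \tag{7}$$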
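A minimal numerical sketch of Eqs. (6) and (7) follows. It assumes $\mathcal{G}(\Psi_i)$ is taken equal to $\Psi_i$ (proportionality constant 1), and all times, data volumes, and values are hypothetical. It compares a robot that visits only the two critical patients with a fixed patrol that also images two stable patients whose images carry no value.

```python
def task_time(move_times, capture_times):
    """Eq. (6): T_m = sum of movement times + sum of capture times."""
    return sum(move_times) + sum(capture_times)


def cdvd(move_times, capture_times, data_volumes, image_values):
    """Eq. (7): value per unit data volume, summed over patients, divided by T_m."""
    n = len(image_values)
    if not (len(move_times) == len(capture_times) == len(data_volumes) == n):
        raise ValueError("all inputs must describe the same n patients")
    t_m = task_time(move_times, capture_times)
    if t_m <= 0:
        raise ValueError("total task time must be positive")
    return sum(g / q for g, q in zip(image_values, data_volumes)) / t_m


# Hypothetical units: seconds for times, megabytes for data volumes.
targeted = cdvd([10, 10], [5, 5], [2.0, 2.0], [4.0, 2.0])
patrol = cdvd([10] * 4, [5] * 4, [2.0] * 4, [4.0, 0.0, 2.0, 0.0])
print(f"targeted: {targeted:.3f}, patrol: {patrol:.3f}")
```

Both routes collect the same total value, but the patrol spends twice the task time, so its value density is half that of the targeted route.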

Considering the methodology defined in this manuscript for acquiring high-quality image data of critical patients, the ultimate objective is to maximize CDVD, expressed as:

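$$\max \mathcal{M}' = \left( \sum_{i=1}^{n} T_w(i) + \sum_{i=1}^{n} T_p(i) \right)^{-1} \sum_{i=1}^{n} \left( \frac{\mathcal{G}(\Psi_i)}{\mathcal{Q}_i} \cdot e^{-\lambda d_i} \right) \tag{8}$$

where

• $d_i$ denotes the travel distance between consecutive patient locations,

• $\lambda$ is a distance penalty coefficient controlling the trade-off between severity and proximity.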

Distance-Aware Optimization: To prevent excessive robot movement toward distant high-severity patients, a distance penalty term $e^{-\lambda d_i}$ is introduced. This term reduces the contribution of patients located farther from the robot, encouraging selection of spatially efficient paths while still prioritizing clinically significant cases. The parameter $\lambda$ controls the sensitivity to distance, enabling flexible adjustment based on operational constraints such as battery capacity and time limitations.
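The effect of the distance penalty can be sketched by evaluating the objective in Eq. (8) for several values of λ. The function name, the sample route, and the λ values are hypothetical, and $\mathcal{G}(\Psi_i)$ is again assumed equal to $\Psi_i$.

```python
import math


def distance_aware_cdvd(move_times, capture_times, data_volumes,
                        image_values, distances, lam):
    """Objective of Eq. (8): each patient's term is damped by exp(-lam * d_i)."""
    t_m = sum(move_times) + sum(capture_times)
    if t_m <= 0:
        raise ValueError("total task time must be positive")
    weighted = sum(
        (g / q) * math.exp(-lam * d)
        for g, q, d in zip(image_values, data_volumes, distances)
    )
    return weighted / t_m


# Hypothetical route: two patients, 3 m and 8 m from the previous stop.
route = dict(
    move_times=[10, 10],
    capture_times=[5, 5],
    data_volumes=[2.0, 2.0],
    image_values=[4.0, 2.0],
    distances=[3.0, 8.0],
)
no_penalty = distance_aware_cdvd(**route, lam=0.0)
mild_penalty = distance_aware_cdvd(**route, lam=0.1)
strong_penalty = distance_aware_cdvd(**route, lam=0.5)
print(no_penalty, mild_penalty, strong_penalty)
```

Setting λ = 0 recovers the unpenalized expression in Eq. (7), and larger values of λ progressively discount distant patients.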

Clinical Significance of CDVD

Conventional robotic assessment measures focus on geometric efficiency or energy minimization. ICU monitoring systems, however, must maximize the information that clinicians can act on under limited time and energy budgets. CDVD recasts the optimization target as value density rather than raw acquisition volume, reflecting bandwidth constraints, battery constraints, and the physician's limited capacity to attend to cases. This measure aligns robotic optimization with healthcare priorities. Although CDVD is framed in a clinically motivated context, it does not directly quantify clinical outcomes or diagnostic effectiveness. Instead, it measures the efficiency of acquiring visually informative data under operational constraints, such as time, bandwidth, and robotic mobility. In this sense, CDVD should be understood as a data-centric utility metric that reflects the density of visually prioritized information rather than a clinically validated value. Its primary role is to align system-level optimization with the practical needs of healthcare workflows, particularly in resource-constrained environments. Clinical validation of the relationship between CDVD and improved patient outcomes remains an important direction for future investigation.

4 High-Quality Image Data Acquisition Method for Critical Patients Based on Edge Intelligence and Kinetic Collaboration

Maximizing CDVD requires two things. First, critical patients must be correctly identified and their positions obtained. Second, the paths along which mobile medical robots acquire images must be optimized according to image value [28]. This section focuses on the implementation strategies: a fast detection model built on lightweight neural networks is introduced first, followed by a patient position mapping strategy based on image and coordinate system transformation, and finally a heuristic path optimization algorithm based on patient severity levels. The overall process is depicted in Fig. 2.


Figure 2: Process for maximizing the clinical data value density (CDVD).

4.1 Construction of YOLO-CAD Target Detection Model

YOLOv7-tiny is a lightweight variant of the YOLOv7 architecture, consisting of three main components: backbone, neck, and head. The backbone employs the ELAN structure instead of the more complex E-ELAN module to reduce computational complexity. In MPConv, convolution operations are removed, and downsampling is performed using pooling operations only [29]. An optimized SPP structure is retained to pass enriched feature maps to the neck. In the neck, PANet structures aggregate features. In the head, channel numbers are adjusted with standard convolution (SConv) instead of REPConv. YOLOv7-tiny offers high speed and low computational cost, but at the expense of reduced detection accuracy, and it has three further limitations. First, excessive use of ELAN modules in the backbone increases parameter count and computational overhead. Second, the LeakyReLU activation function becomes less effective in deeper network layers, limiting feature representation capability. Third, feature fusion in the neck introduces redundancy due to repeated ELAN-based operations. To address these limitations, we propose YOLO-CAD, an enhanced lightweight model based on YOLOv7-tiny, designed to reduce computational complexity while improving detection accuracy.

The network structure of the enhanced YOLO-CAD model is shown in Fig. 3, and the architectural improvements of YOLO-CAD over YOLOv7-tiny are listed in Table 2. In the backbone, ShuffleNet v1 units (stride 1 and stride 2) are used to reduce computational cost while maintaining feature representation quality. In the neck, GSConv and ELAN-GS modules are introduced to improve feature fusion efficiency and reduce redundancy. The WIoUv3 loss function replaces the conventional IoU-based loss to improve localization accuracy under occlusion conditions. Additionally, the Mish activation function replaces LeakyReLU to enhance nonlinear feature learning and gradient flow.


Figure 3: Network structure of the YOLO-CAD detection model.


ShuffleNet v1 Module: The fundamental module of ShuffleNet v1 consists of Unit_a with stride 1 and Unit_b with stride 2 stacked hierarchically. Unit_a consists of two branches: one identity path and one convolutional path, enabling efficient feature reuse. Unit_b performs downsampling using parallel branches with pooling and convolution operations.

The parameter complexity of different convolution operations is summarized as follows. Assume the input feature map has width Wi, height Hi, and channel count Ci, the output channel count is Co, and the convolution kernel size is K×K. The parameter count of standard convolution is then

$$\mathcal{P}_{sc} = C_i \times C_o \times K \times K \tag{9}$$

Group convolution (GC) is based on standard convolution, with the convolution kernels and input channels divided into g groups [30]. Each group of convolution kernels is convolved only with its own group of feature maps; thus, the parameter count is

$$\mathcal{P}_{gc} = \frac{C_i}{g} \times C_o \times K \times K \tag{10}$$

Depth-wise (DW) separable convolution constitutes channel-by-channel convolution operations on input feature maps with a convolutional count equal to the input channel count; thus, the parameter count for depth-separable convolution is

$$\mathcal{P}_{dw} = C_i \times K \times K \tag{11}$$

Therefore, among the three convolution operations, standard convolution possesses the largest parameter count, while the parameter count of group convolution is 1/g thereof. Depth-separable convolution possesses the smallest parameter count, only 1/Co of that of standard convolution; thus, combining group convolution and depth-separable convolution can substantially reduce network parameter count and computational requirements.
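The ratios stated above can be checked numerically. The following is a minimal sketch that transcribes Eqs. (9)–(11) directly; the function name and the example layer sizes (64 input channels, 128 output channels, 3×3 kernel, 4 groups) are illustrative assumptions and are not taken from the YOLO-CAD configuration.

```python
def conv_param_counts(c_in: int, c_out: int, k: int, groups: int) -> dict:
    """Weight counts (bias ignored) for the three convolution types in Eqs. (9)-(11)."""
    if c_in % groups != 0:
        raise ValueError("c_in must be divisible by groups")
    standard = c_in * c_out * k * k              # Eq. (9)
    grouped = (c_in // groups) * c_out * k * k   # Eq. (10)
    depthwise = c_in * k * k                     # Eq. (11)
    return {"standard": standard, "group": grouped, "depthwise": depthwise}


# Hypothetical layer sizes chosen only for illustration.
counts = conv_param_counts(c_in=64, c_out=128, k=3, groups=4)
print(counts)
print(counts["standard"] / counts["group"])      # equals the group count g
print(counts["standard"] / counts["depthwise"])  # equals the output channel count C_o
```

For these sizes the standard-to-group ratio equals g and the standard-to-depth-wise ratio equals Co, as the text states.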

GSConv and ELAN-GS Modules: Let the input channel count be C1 and the output channel count be C2. GSConv combines standard and depth-wise convolutions, followed by channel concatenation, to achieve efficient feature extraction with reduced parameters. The input first undergoes standard convolution, after which the channel count becomes C2/2; the result then undergoes depth-separable convolution, which leaves the channel count unchanged; finally, the results of both convolutions are concatenated and mixed, ultimately yielding C2 channels.

GSConv is also introduced into the ELAN modules to form ELAN-GS. The two convolutions before the concatenation layer are replaced with GSConv, which reduces the model parameter count while maintaining detection accuracy.
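To make the channel bookkeeping of GSConv concrete, the sketch below traces channel counts and weight counts through the three steps described above (standard convolution to C2/2 channels, depth-wise convolution, concatenation and mixing). It is an illustration only: the kernel sizes, the interleaving pattern used for "mixing", and the function name are assumptions, since the text does not specify them.

```python
def gsconv_summary(c1: int, c2: int, k: int = 3, k_dw: int = 3) -> dict:
    """Trace channels and weight counts (bias ignored) through a GSConv-style block.

    k and k_dw are assumed kernel sizes for the standard and depth-wise steps.
    """
    if c2 % 2 != 0:
        raise ValueError("c2 must be even")
    half = c2 // 2
    dense_params = c1 * half * k * k      # standard convolution: C1 -> C2/2
    dw_params = half * k_dw * k_dw        # depth-wise convolution: C2/2 -> C2/2
    # Concatenate both outputs, then interleave the two halves so they mix.
    # This interleaving is one simple shuffle; the paper's exact pattern is not given.
    dense_channels = list(range(half))
    dw_channels = list(range(half, c2))
    shuffled = [ch for pair in zip(dense_channels, dw_channels) for ch in pair]
    return {
        "gsconv_params": dense_params + dw_params,
        "standard_params": c1 * c2 * k * k,   # plain C1 -> C2 convolution, Eq. (9)
        "channel_order": shuffled,
    }


summary = gsconv_summary(c1=64, c2=8)
print(summary)
```

Under these assumptions the block needs roughly half the weights of a plain C1-to-C2 convolution, which is the saving the ELAN-GS module relies on.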

WIoUv3 Loss Function: The loss function is a critical component of a target detection model, and a well-designed bounding box loss brings significant performance improvements. In YOLOv7-tiny, the bounding box loss is calculated by CIoU_Loss, while the classification loss and confidence loss are calculated by BCE_Loss. Imbalance between positive and negative samples is unavoidable in training datasets and inevitably produces low-quality samples; previous loss functions penalize these samples heavily, thereby reducing model generalization ability. Because occlusions are also frequent in hospital environments, WIoUv3 is selected to replace CIoU_Loss as the bounding box loss of the improved model in this manuscript. The dynamic non-monotonic focusing mechanism in WIoUv3 balances the proportions of high- and low-quality samples during training, limits the negative impact of low-quality samples, focuses bounding box regression on target objects, and alleviates detection difficulties arising from patient occlusions. WIoUv3 is obtained by adding the dynamic non-monotonic focusing mechanism to WIoUv1, which is computed as:

$$\mathcal{L}_{WIoUv1} = \mathcal{R}_{WIoU}\,\mathcal{L}_{IoU} \tag{12}$$

$$\mathcal{R}_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{*}}\right) \tag{13}$$

where (x,y) and (xgt,ygt) are the center coordinates of the predicted and ground truth boxes, Wg and Hg represent the width and height of the smallest box enclosing both, and the superscript ∗ denotes that Wg and Hg are detached from the computational graph to prevent gradients that might hinder convergence. WIoUv3 is then defined as:

$$\mathcal{L}_{WIoUv3} = r\,\mathcal{L}_{WIoUv1} \tag{14}$$

$$r = \frac{\beta}{\delta\,\alpha^{\beta - \delta}} \tag{15}$$

where the mapping between the non-monotonic focusing coefficient r and the outlier degree β is controlled by the hyperparameters α and δ.

Mish Activation Function: To preserve the accuracy of the feature extraction network, the Mish activation function is selected as an alternative to LeakyReLU. Mish is smooth and bounded below, which buffers weights and helps maintain network stability. Compared to LeakyReLU, Mish provides smoother gradients and better feature representation, improving model convergence. Additionally, Mish adds more nonlinear expression, improving model generalization. The formulas for the LeakyReLU and Mish activation functions are

$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \tag{16}$$

$$\mathrm{Mish}(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{17}$$
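The two activation functions in Eqs. (16) and (17) can be compared directly with a few lines of scalar code. This is a minimal sketch: the negative slope of 0.1 for LeakyReLU is an assumed default, and the softplus helper is written in a numerically stable form that is mathematically equal to ln(1 + e^x).

```python
import math


def leaky_relu(x: float, alpha: float = 0.1) -> float:
    """Eq. (16): identity for x > 0, slope alpha (assumed 0.1) otherwise."""
    return x if x > 0 else alpha * x


def softplus(x: float) -> float:
    """ln(1 + e^x), rearranged so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Eq. (17): x * tanh(ln(1 + e^x))."""
    return x * math.tanh(softplus(x))


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(value, leaky_relu(value), round(mish(value), 4))
```

Printing both functions side by side shows that Mish follows the identity for large positive inputs, like LeakyReLU, but bends smoothly through zero instead of having a kink there.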

4.2 Critical Patient Location Mapping Method

Accurately identifying the location of critical patients is crucial for planning robot paths and directly affects whether the robots succeed in acquiring patient images. Vision sensing systems perceive the environment primarily through cameras, and localizing critical patients requires 3D perception. There are currently two major types of stereo vision systems: binocular vision systems, which use optical geometry together with traditional optical concepts and optimization techniques to determine the 3D position of targets, and RGB-D cameras, such as time-of-flight cameras, which use infrared sensors to determine depth information of targets. However, binocular vision systems involve complex calibration procedures and computationally expensive stereo matching, while RGB-D cameras cannot be used under strong illumination without introducing errors into the results.

Based on this, this manuscript proposes a critical patient location mapping method grounded in camera imaging principles and coordinate system transformation principles. First, coordinate systems for hospital wards are established, then references are set within wards, and finally, camera imaging principles and coordinate system transformation principles are combined to locate critical patients. Since the positions of references and cameras are known, only the direction and distance of critical patients relative to references in images need to be calculated, subsequently mapping their positions.

Camera models describe processes of mapping objects from 3D space to 2D image planes, with camera intrinsic parameters describing key parameters in camera models. Relationships between actual 3D spatial points and 2D image planes are usually calculated using pinhole imaging principles. In practice, image data are calculated in terms of pixel points, with pixel coordinates (ui,vi) typically read in terms of the uov plane under pixel coordinate systems, directly corresponding to positions in images. To facilitate calculations, image coordinate systems xoy are introduced, where the image center o constitutes the center of the image coordinate systems, with x- and y-axes parallel to u- and v-axes, respectively.

Each camera has a fixed physical pixel size, with the length and width of each pixel being Δx and Δy. Assuming the pixel corresponding to the origin o of the image coordinate system is (u0,v0), the coordinates in the image coordinate system can be computed from given pixel coordinates (u,v):

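$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \Delta x & 0 \\ 0 & \Delta y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} + \begin{bmatrix} -u_0 \Delta x \\ -v_0 \Delta y \end{bmatrix} \tag{18}$$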

Mapping from 2D pixel coordinate systems to 2D image coordinate systems can be realized by the above formula, further enabling conversion from 2D images to 3D coordinates in space through spatial depth information.
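The pixel-to-image conversion of Eq. (18) amounts to shifting by the principal point and scaling by the pixel size. The sketch below shows this for a single pixel; the function name and the numeric values (a principal point at (960, 540) and a 3 µm pixel pitch) are hypothetical and only serve as an example.

```python
def pixel_to_image(u: float, v: float, u0: float, v0: float, dx: float, dy: float):
    """Eq. (18): map pixel coordinates (u, v) to image-plane coordinates (x, y).

    dx * u - u0 * dx is written in the equivalent form dx * (u - u0).
    """
    x = dx * (u - u0)
    y = dy * (v - v0)
    return x, y


# Hypothetical camera: principal point (960, 540), 3 micrometre square pixels.
x_img, y_img = pixel_to_image(u=1060, v=540, u0=960, v0=540, dx=3e-6, dy=3e-6)
print(x_img, y_img)
```

A pixel located 100 columns to the right of the principal point therefore maps to a point 100·Δx from the image center along the x-axis, with y unchanged at zero.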

The patient localization calculation model is illustrated in Fig. 4. Assuming point P and point I represent the positions of reference and the critical patient in 3D space, respectively, point p and point i are positions of both mapped in images. p(xp,yp) and i(xi,yi) can be calculated by Eq. (18), P(Xp,Yp,Zp) are known coordinates of reference relative to the camera, and f is the camera focal length. According to the properties of similar triangles:

$$\frac{Z_P}{f} = \frac{Y_P - Y_I}{y_p - y_i} = \frac{X_P - X_I}{x_p - x_i} \tag{19}$$

$$\frac{X_I}{x_i} = \frac{Z_I}{f} \tag{20}$$


Figure 4: Patient positioning calculation model.

The realistic 3D coordinates of critical patient I are solved as:

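$$\begin{cases} X_I = X_P - \dfrac{Z_P\,(x_p - x_i)}{f} \\[4pt] Y_I = Y_P - \dfrac{Z_P\,(y_p - y_i)}{f} \\[4pt] Z_I = \dfrac{X_I\, f}{x_i} \end{cases} \tag{21}$$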
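The closed-form solution in Eq. (21) can be exercised with a small synthetic example. In the sketch below, the function name, argument layout, and numeric values are illustrative assumptions; the synthetic image coordinates are generated with the pinhole relation x = f·X/Z for a reference and a patient at the same depth, so that the recovered position can be compared with the known one.

```python
def locate_patient(ref_xyz, ref_img, patient_img, f):
    """Solve Eq. (21) for the patient position (X_I, Y_I, Z_I) in camera coordinates.

    ref_xyz: known reference position (X_P, Y_P, Z_P) relative to the camera.
    ref_img, patient_img: image-plane coordinates (x, y) from Eq. (18).
    f: camera focal length in the same units as the image-plane coordinates.
    """
    X_P, Y_P, Z_P = ref_xyz
    x_p, y_p = ref_img
    x_i, y_i = patient_img
    if x_i == 0:
        raise ValueError("x_i = 0 leaves Z_I undefined in Eq. (21)")
    X_I = X_P - Z_P * (x_p - x_i) / f
    Y_I = Y_P - Z_P * (y_p - y_i) / f
    Z_I = X_I * f / x_i
    return X_I, Y_I, Z_I


# Synthetic check: reference at (1.0, 0.5, 4.0) m, patient at (2.0, 1.0, 4.0) m, f = 4 mm.
f = 0.004
reference = (1.0, 0.5, 4.0)
ref_img = (f * 1.0 / 4.0, f * 0.5 / 4.0)
patient_img = (f * 2.0 / 4.0, f * 1.0 / 4.0)
X_I, Y_I, Z_I = locate_patient(reference, ref_img, patient_img, f)
print(X_I, Y_I, Z_I)
```

The example recovers the patient position used to generate the synthetic image points. It also makes one practical limitation visible: a patient imaged exactly on the vertical image axis (x_i = 0) cannot be assigned a depth by the third line of Eq. (21).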

The pseudocode shown in Algorithm 1 summarizes the detection and localization process of the proposed critical patient identification methodology.


4.3 Improved Heuristic Optimization (IHO) Based on Patient Severity

Path planning, as one of the most fundamental and critical steps in the execution of image acquisition tasks by mobile medical robots, determines the efficiency of robots in performing image acquisition tasks. Ant colony optimization (ACO) has been widely utilized in multi-objective path planning research due to its strong robustness and adaptability, but the algorithm exhibits poor convergence and tends to fall into local optima when searching for optimal paths. Additionally, its path planning approach based on the traveling salesman problem cannot satisfy the requirements of efficient image acquisition in this study. Therefore, this manuscript proposes an Improved Heuristic Optimization (IHO) algorithm grounded in patient severity and ACO principles, solving optimal traversal orders by combining PRSI to obtain optimal working paths for medical robots.

Environment Modeling: This manuscript investigates robot path planning problems in known environments. To simplify computational processes, grid methods are employed to abstract and discretize robot working environments. First, environments are divided into numerous identical grids according to scale and requirements; then, conditions of obstacles within grids are set according to actual environments, corresponding matrices G are constructed, and finally, they are transformed into grid maps. The 0 and 1 in matrices denote passable nodes and obstacle nodes, respectively, represented as white grids and black grids in grid maps.

Each grid in grid maps possesses corresponding labels and corresponding position coordinates, with relationships between grid labels and position coordinates expressed as

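$$\begin{cases} x = \operatorname{mod}(N_i, N) + c \\ y = \operatorname{ceil}\left(\dfrac{N_i}{N}\right) - c \end{cases} \tag{22}$$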

where x and y are the horizontal and vertical coordinates of the grid position, mod(⋅) is the remainder operation, ceil(⋅) is the ceiling operation, N is the number of grids in each column, Ni is the label of the i-th grid node, and c=0.5 represents the offset of the grid center with respect to the grid boundary.
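A direct transcription of Eq. (22) is given below as a minimal sketch. The function name and the 10-grid example are assumptions. The text does not state whether grid labels start at 0 or 1, so labels that are exact multiples of N are an edge case whose handling depends on that convention and is not addressed here.

```python
import math


def grid_label_to_xy(label: int, n: int, c: float = 0.5):
    """Eq. (22) as written: x = mod(N_i, N) + c, y = ceil(N_i / N) - c."""
    x = label % n + c
    y = math.ceil(label / n) - c
    return x, y


# Hypothetical map with N = 10 grids per column.
center = grid_label_to_xy(label=13, n=10)
print(center)
```

The offset c places the returned coordinates at the grid center rather than on a grid boundary, which is what the path planner needs when computing Euclidean distances between grids.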

Base Transfer Probability: The direction in which ant k (k=1,2,…,m) transfers at the moment t is determined by pheromone concentration on each candidate path. The probability that an ant selects the next position j from the position i at the moment t is determined as

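$$P_{ij}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha} \times [\eta_{ij}(t)]^{\beta}}{\sum_{j \in \mathrm{allowed}_k} [\tau_{ij}(t)]^{\alpha} \times [\eta_{ij}(t)]^{\beta}}, & \text{if } j \in \mathrm{allowed}_k \\[6pt] 0, & \text{otherwise} \end{cases} \tag{23}$$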

where τij(t) is the pheromone on the path between position i and position j at moment t; ηij(t) is the heuristic function factor, which indicates the expected degree of ant transfer from position i to position j at moment t and is usually taken as ηij(t)=1/dij, where dij is the Euclidean distance between position i and position j; α is the information heuristic factor and β is the expectation heuristic factor (both constants); and allowedk denotes the set of unvisited target points.
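The base transfer rule of Eq. (23) can be sketched as follows. The dictionary layout keyed by (i, j) pairs, the function names, and the default values α = 1 and β = 2 are assumptions made for illustration; the example uses the standard heuristic ηij = 1/dij with equal pheromone on both candidate paths.

```python
import random


def transition_probabilities(current, allowed, tau, eta, alpha=1.0, beta=2.0):
    """Eq. (23): probability of moving from `current` to each target in `allowed`."""
    weights = {
        j: (tau[(current, j)] ** alpha) * (eta[(current, j)] ** beta)
        for j in allowed
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def choose_next(probabilities, rng):
    """Roulette-wheel selection of the next target according to the probabilities."""
    targets = list(probabilities)
    return rng.choices(targets, weights=[probabilities[j] for j in targets], k=1)[0]


# Two unvisited targets at distances 1 and 2 from position 0, equal pheromone.
tau = {(0, 1): 1.0, (0, 2): 1.0}
eta = {(0, 1): 1.0 / 1.0, (0, 2): 1.0 / 2.0}
probs = transition_probabilities(0, [1, 2], tau, eta)
next_target = choose_next(probs, random.Random(0))
print(probs, next_target)
```

With equal pheromone, the nearer target dominates purely through the distance heuristic, which is the behavior the severity-aware modifications in the following paragraphs are designed to change.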

In this study, the path planning behavior of ants is adjusted according to patient severity, such that ants are more likely to be attracted to critical patients and can select optimal image acquisition paths in a more targeted manner.

PRSI-Based Pheromone Initialization: Under initial conditions, pheromone content on each path is equal; at this time, ant colonies are in blind search stages with poor optimization effects and low search efficiency. In this manuscript, combining ACO and patient severity, PRSI is introduced as a weight in the initial pheromone, guiding ants to prioritize patients with higher severity as destinations. The improved initial pheromone distribution is

$$\tau_{ij}(0) = \Psi_j + \phi(j) \tag{24}$$

$$\phi(j) = \frac{\Psi_j}{d_{ij} + 1} \tag{25}$$

where Ψj is the PRSI score of the target patient j; higher severity patients possess larger Ψ values and higher attraction for ants, and vice versa. dij is the distance between the current grid and the target grid; closer proximity results in a larger initial pheromone on routes, and vice versa. According to such positional relationships, initial pheromones with uneven distribution are set to avoid blind searching of ants in the initial stages and improve the searching efficiency of ant colonies on critical patients during the initial stages.
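The effect of the severity-weighted initialization can be illustrated with a short sketch. It assumes that Eq. (25) is read as ϕ(j) = Ψj/(dij + 1), which keeps the value finite when the current grid coincides with the target, and that dij is the Euclidean distance between grid centers. The function name, coordinates, and PRSI score are hypothetical.

```python
import math


def initial_pheromone(grid_xy, target_xy, prsi: float) -> float:
    """Eqs. (24)-(25): initial pheromone toward a target patient with score `prsi`."""
    distance = math.dist(grid_xy, target_xy)
    phi = prsi / (distance + 1.0)   # Eq. (25), assumed reading of the fraction
    return prsi + phi               # Eq. (24)


# Same hypothetical patient (PRSI score 6.0) seen from a far grid and a near grid.
far = initial_pheromone((0.0, 0.0), (3.0, 4.0), prsi=6.0)
near = initial_pheromone((3.0, 3.0), (3.0, 4.0), prsi=6.0)
print(far, near)
```

Both a higher PRSI score and a shorter distance raise the initial pheromone, so the colony starts with a bias toward severe, nearby patients instead of a uniform distribution.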

Oriented Heuristic Function: Heuristic functions of ACO generally take the inverse of distances of neighboring grids; thus, ants tend to choose optional grids closer to current grids, but this situation causes ants to have circuitous paths or get stuck when choosing, resulting in inefficient and ineffective search. In this manuscript, we design heuristic functions oriented to critical patients, forcing ants to be more inclined to choose grids closer to critical patients each time, such that the final paths obtained will be closer to the shortest paths. The improved distance heuristic function formula is

$$\eta_{ij}(t) = \left(\frac{d_{ie}}{d_{ej}}\right) \Psi_j \cos\theta_j \tag{26}$$

where die is the Euclidean distance from the current grid to the intermediate grid, dej is the Euclidean distance from the intermediate grid to the target grid, Ψj is the PRSI score of patient j, and θj denotes the angle between the vector from i to e and the vector from i to j.

Improved Pheromone Update Method: Pheromone update is a crucial component of ant colony algorithms, simulating ant behavior and guiding search processes toward better solutions while maintaining search diversity, enabling algorithms to effectively search in solution spaces and find high-quality solutions. As iteration numbers increase, pheromone differences on several optimal paths become less obvious, and ant colonies pay less attention to critical patients. To make ant colonies continue paying attention to critical patients, combined with PRSI, this manuscript adopts global pheromone update methods to improve the pheromone update rules of ant colonies. After all ants complete one iteration, pheromones on all paths are updated. The formula is

$$\tau_{ij}(t+1) = (1 - \rho) \times \tau_{ij}(t) + \Delta\tau_{ij}, \quad 0 < \rho \le 1 \tag{27}$$

$$\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k} \tag{28}$$

$$\Delta\tau_{ij}^{k} = \frac{\Psi_j}{L_k} \tag{29}$$

where ρ represents pheromone evaporation rate, Δτij represents the sum of pheromone concentrations released by all ants along the path connecting the location i to target j, Δτijk represents the pheromone concentration released by the ant k on the path connecting the location i to target j, and Lk is the distance traversed by the ant k from the location i to target j.
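The global update of Eqs. (27)–(29) can be sketched as a single function applied after every ant has finished an iteration. The data layout (a dictionary of pheromone values keyed by (i, j), and each ant represented by the paths it used together with its traversed distance Lk), the function name, and the evaporation rate are assumptions for illustration.

```python
def update_pheromone(tau, ant_tours, prsi, rho=0.3):
    """Eqs. (27)-(29): evaporate, then deposit severity-weighted pheromone.

    tau: {(i, j): pheromone}; ant_tours: list of (edges, length) per ant, where
    every edge must be a key of tau; prsi: {j: PRSI score}; rho: evaporation rate.
    """
    deposit = {edge: 0.0 for edge in tau}
    for edges, length in ant_tours:
        for i, j in edges:
            deposit[(i, j)] += prsi[j] / length   # Eq. (29), accumulated as in Eq. (28)
    return {edge: (1.0 - rho) * tau[edge] + deposit[edge] for edge in tau}   # Eq. (27)


# Hypothetical iteration: one ant reaches patient 1, two ants reach patient 2.
tau = {(0, 1): 1.0, (0, 2): 1.0}
prsi = {1: 2.0, 2: 5.0}
tours = [([(0, 1)], 4.0), ([(0, 2)], 10.0), ([(0, 2)], 5.0)]
new_tau = update_pheromone(tau, tours, prsi, rho=0.5)
print(new_tau)
```

Because the deposit is proportional to Ψj, paths leading to high-severity patients keep a visible pheromone advantage even after many iterations, which is the purpose of the modified rule.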

The pseudo code of the proposed path planning methodology is summarized in Algorithm 2.


5 Experiments and Analysis

This section focuses on prerequisites and results of experiments. First, the effectiveness of each improvement of YOLO-CAD is evaluated, and its performance is compared with other state-of-the-art models. Subsequently, the effectiveness of the position mapping method based on RGB images is verified. Then, the effectiveness of IHO in image-efficient acquisition strategies is evaluated. Finally, the image-efficient acquisition strategy proposed in this manuscript and the conventional image acquisition strategy are compared to evaluate the performance of the image-efficient acquisition strategy.

5.1 Experimental Conditions and Parameter Settings

The model training platform of YOLO-CAD is built on the Ubuntu 20.04 system with one Intel Xeon Gold 5218 CPU and two NVIDIA TITAN RTX GPUs. The parameters of its training process are shown in Table 3.


Performance analysis of YOLO-CAD is conducted on an edge server equipped with one Intel Core i9-10920X CPU and one NVIDIA GeForce RTX3090 GPU to simulate real-world application performance. The dataset used in this manuscript is a hospital patient image dataset collected within clinical facilities, comprising more than 7000 images captured under different lighting conditions, brightness levels, backlighting, various angles, and other conditions. After data cleaning, the final dataset contains 1680 images. It is divided in a ratio of 7:2:1, with 1176 images in the training set, 336 in the test set, and 168 in the validation set. A paired t-test was conducted between the proposed method and baseline models. The results show statistically significant improvement (p<0.05).

Based on the image acquisition framework proposed in this manuscript, an experiment was designed to evaluate the image acquisition performance of an IHO-based medical robot. In the experiment, robots are assumed to move along paths at a speed of 0.5 m/s, and each image of a critical patient takes 2 min to capture. To facilitate calculations, this paper categorizes severity into six levels, I–VI, based on PRSI scores, and defines image values according to their severity levels, as shown in Table 4. Among them, class I is normal, and the highest severity is class VI.


5.2 Statistical Evaluation Protocol

All experiments were repeated across 10 independent trials using randomized initialization seeds. For each metric, mean, standard deviation, and 95% confidence intervals were computed:

$$CI = \mu \pm t_{\alpha/2,\,n-1} \cdot \frac{\sigma}{\sqrt{n}}$$

where μ and σ are the mean and standard deviation over the n = 10 trials, and the critical value is taken from the Student’s t-distribution. Performance metrics are reported as mean ± standard deviation (SD). The dataset consists of 1680 images, partitioned into training (70%), testing (20%), and validation (10%) subsets at the patient level to prevent data leakage. Prior to statistical testing, the distribution of performance metrics was assessed for approximate normality based on repeated sampling behavior. Given the consistent variance across trials, parametric tests were deemed appropriate. To evaluate statistical significance across multiple methods, one-way ANOVA was performed. When significant differences were observed (p<0.05), post-hoc pairwise comparisons using Tukey’s Honest Significant Difference (HSD) test were conducted to identify specific group differences.
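The confidence interval formula above can be sketched with the Python standard library. The trial scores below are hypothetical placeholders, not results from this study, and the critical value 2.262 is the tabulated two-sided 95% Student's t value for 9 degrees of freedom, hard-coded on the assumption of n = 10 trials.

```python
import math
import statistics

T_CRIT_95_DF9 = 2.262  # two-sided 95% Student's t critical value, 9 degrees of freedom


def confidence_interval(values, t_crit=T_CRIT_95_DF9):
    """Return (mean, sample SD, (lower, upper)) using mean +/- t * SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)              # sample standard deviation (n - 1)
    half_width = t_crit * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical mAP@0.5 scores (%) from 10 independently seeded trials.
scores = [90.7, 91.0, 91.3, 91.6, 90.9, 91.2, 91.5, 90.8, 91.1, 91.4]
mean, sd, (lower, upper) = confidence_interval(scores)
print(f"{mean:.2f} +/- {sd:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

For a different number of trials the critical value must be replaced by the entry for n − 1 degrees of freedom.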

5.3 Results and Analysis

5.3.1 Comparative Analysis with Lightweight Object Detectors

To further confirm the effectiveness of the proposed YOLO-CAD model under healthcare resource limitations, a comparative analysis was carried out against commonly used lightweight object detectors, namely YOLOv5s and YOLOX-s. All models were tested under the same experimental conditions to ensure a fair comparison of computational complexity and detection performance. The comparison is based on the key performance indicators: the number of parameters, the number of floating point operations (FLOPs), detection accuracy in terms of mAP@0.5 and mAP@0.5:0.95, and inference speed (FPS). The findings in Table 5 show that although YOLOv5s and YOLOX-s have competitive detection capabilities, the proposed YOLO-CAD achieves a better balance between accuracy and efficiency. In particular, YOLO-CAD has higher detection accuracy at lower computing cost, which makes it better suited to real-time use. Moreover, YOLO-CAD is more robust in complex ICU cases, including partial occlusions, changing levels of illumination, and cluttered backgrounds, which are typical of patient monitoring settings. This robustness is attributed to the architectural refinements of the model, which allow a more comprehensive feature representation and more accurate localization of important objects. In terms of deployment, the compact architecture of YOLO-CAD leads to fewer FLOPs and higher inference rates than the baseline models, which reduces latency and supports real-time decision-making on edge devices. These properties are especially significant in an intensive care unit, where quick diagnosis and treatment are essential to protect patients.
Overall, the comparative analysis confirms that, in addition to competitive detection accuracy, YOLO-CAD offers clear benefits in computational efficiency and practical applicability, making it a better option for intelligent healthcare monitoring systems.


5.3.2 Validation of the Improved YOLO-CAD

To verify that each improvement proposed in this manuscript is effective, a series of ablation experiments was designed for comparative analysis, using the same parameters in the training process to ensure a fair comparison. In experiment A, the ShuffleNetv1 network is used as the new backbone network. In experiment B, the GSConv module is introduced into the neck of the model for optimization. Experiment C replaces the LeakyReLU activation function in the neck of the model with the Mish activation function. Experiment D uses WIoUv3 as the loss function. To evaluate the effectiveness of ShuffleNetv1, GSConv, the Mish function, and WIoUv3, the number of parameters, computational complexity (FLOPs), and AP@0.5 are used as metrics to measure model performance, where AP@0.5 is a widely used evaluation criterion in object detection; higher values indicate better model performance.

Results of ablation experiments are shown in Table 6. Experiment A verifies that ShuffleNetv1 can significantly reduce the number of parameters and computation amount of YOLOv7-tiny, but its accuracy also slightly decreases. Experiment B adopts an improvement of GSConv, which significantly reduces the computation amount of the model, and its accuracy is also basically the same as the original model, proving the validity of GSConv. In Experiments C and D, which do not increase the computation amount, the accuracy of the model is slightly improved, and the validity of the Mish function and WIoUv3 is verified. Finally, the combined improvement of YOLOv7-tiny with experiments A, B, C, and D reduces the amount of model parameters by 14.7%, the computation amount is reduced by 22.6%, and the accuracy of the model is improved by 1.87%. Overall, the improved YOLO-CAD network improves model accuracy while reducing model weight, effectively balancing accuracy and weight, and providing feasibility for deployment in edge terminals.


5.3.3 Statistical Validation of Ablation Results

Across 10 runs, YOLO-CAD achieved an mAP@0.5 of 91.21% with a standard deviation of 0.38 (95% CI [90.95, 91.47]), significantly outperforming YOLOv7-tiny (89.34%, CI [88.97, 89.71], p<0.01). Component-wise ablation confirms statistically significant contributions from Mish activation and WIoUv3 loss, while GSConv maintains computational efficiency without accuracy degradation.

To verify the effectiveness of the proposed model, this section compares the YOLO-CAD model with other common lightweight target detection models. All algorithms use the same hardware devices, training parameters, and datasets to ensure the reliability and fairness of experimental results. The parameter count (Params), FLOPs, mAP@0.5, mAP@0.5:0.95, and execution speed are taken as evaluation indices, with specific experimental results shown in Table 7.


As can be seen from Table 7, compared to YOLOv4-tiny, the smallest of the compared models, YOLO-CAD reduces the number of parameters by 8.7% and computation by 37.2%, improves detection speed by 114.8%, and improves mAP@0.5 and mAP@0.5:0.95 by 40% and 59.3%, respectively. Compared to YOLOX-s, YOLO-CAD reduces the number of parameters by 35.8% and computation by 55.2%, while mAP@0.5 and mAP@0.5:0.95 are slightly improved by 1.67% and 4.1%, respectively, and detection speed is improved by 17.3%. Overall, YOLO-CAD is not weaker than the other models on any of the five indicators, indicating that the lightweight model proposed in this manuscript achieves a certain degree of accuracy improvement with fewer parameters, less computation, and a smaller model size, which demonstrates the validity of the proposed algorithm.

5.3.4 Calculation of Patient Location Based on RGB Images

To evaluate the reliability of the patient position solving method proposed in this manuscript, we manually placed and recorded the position of a simulated patient mannequin and a reference for modeling the position of a critical patient in the hospital ward. One image was taken at each location of 5, 10, 15, 20, 25, and 30 m from the simulated patient at an angle of 60° from the top view angle, and its coordinates were solved by the patient position mapping method presented in this manuscript and compared with recorded position information.

Results are shown in Table 8. The patient position can be solved if the distance between the patient and the camera is between 5 and 25 m. Because the patient area is large, a clustering method is used to calculate its position, and the center point of the cluster is taken as the patient position; a shift of this center point leads to error. The systematic error of the camera is also partly responsible for the error. However, the robot needs to keep a certain distance from the target to capture a complete image of the target, so a reasonable error is considered acceptable. If the distance between the patient and the camera is greater than 25 m, the patient lies in the infinite focus range, and the patient location cannot be calculated from the image and camera intrinsic parameters.


5.3.5 Impact of Localization Error on Image Acquisition

Localization error that grows with range is a known drawback of monocular vision-based estimation. It should be noted, though, that the proposed system follows a coarse-to-fine acquisition strategy. The initial localization obtained from the static monitoring nodes is not used to place the mobile robot at the exact position of the target patient, but merely to guide it towards the approximate area of the target patient. In real-world ICU conditions, fine-grained image capture is conducted at very near range (usually 1–3 m), where localization errors are small and within acceptable limits. According to the experimental outcomes, localization errors of about 0.5–10.7 m can be regarded as acceptable, since they do not prevent the robot from entering the usual imaging field of the patient. After reaching close range, the onboard vision system can make local adjustments to fine-tune the positioning before taking high-resolution images. The acceptable localization error for robotic image acquisition depends on the field-of-view (FoV) of the onboard camera and the desired imaging resolution. With the camera FoV of typical medical robotic systems, a positional error of as much as 0.5–1.0 m still leaves the patient within the area of observation. Thus, the localization accuracy obtained is adequate for the path planning stage.

Although the path length planned by IHO is not the shortest, it considers the urgency of medical intervention in healthcare and prioritizes the acquisition of images of critical patients. For this reason, this section proposes a method of scoring the value of collected data per unit of time, where the higher the value of image data collected by the robot within a certain period of time, the higher the relevance of its target, which is more consistent with the high-quality image collection method proposed in this manuscript.

We made three robots, R1, R2, and R3, use ACO, PSO, and IHO for path planning, respectively, and made robot R4 use patrolling for image acquisition tasks. Timing was done from the start of the robots’ operation, and statistics were performed every 10 min to calculate the image value score, with results shown in Fig. 5.


Figure 5: Comparison of path planning algorithms. (a) Distance and time required for robots to complete tasks using different algorithms. (b) Total value of images captured by robots using different algorithms.

With IHO, the robot completed the image acquisition task with a 1.1% shorter path and 1.9% less elapsed time than with unimproved ACO. Because IHO first selects distant targets and high-severity patients rather than closer patients, its average path length increases by only 0.5% and its elapsed time by only 0.04% compared to PSO. For the same reason, robot R3 captured no critical patient images during the first 10 min, giving it a score of 0. However, at the 20, 30, and 40 min statistics, robot R3 obtains image value scores of 120, 220, and 280, respectively, which are much higher than those of R1 and R2. In summary, IHO is able to find a better path solution for multi-objective problems, obtain a better traversal order, and thus plan a path that is more consistent with the strategy of high-quality image acquisition.

Impact of Distance-Aware Optimization:

Fig. 5 illustrates the comparative performance of different path planning algorithms, including ACO, PSO, the proposed IHO, and the enhanced distance-aware IHO. As shown in Fig. 5a, the original IHO achieves improved convergence speed and smoother paths compared to traditional methods; however, it may occasionally prioritize high-severity patients located at greater distances, leading to marginal increases in traversal length.

To address this limitation, a distance-aware penalty mechanism is incorporated into the optimization process. The enhanced IHO demonstrates a more balanced behavior by jointly considering patient severity and spatial proximity. As a result, unnecessary long-distance movements are reduced, leading to improved path efficiency without compromising the prioritization of critical patients.

Fig. 5b further shows that the total image value acquired by the robot remains consistently high under the distance-aware strategy. Although a slight reduction in early-stage value acquisition may occur due to proximity constraints, the overall Clinical Data Value Density (CDVD) is improved due to reduced traversal time and more efficient path planning.

These results confirm that integrating distance penalties enables the proposed method to achieve a better trade-off between clinical urgency and operational efficiency, making it more suitable for real-world deployment in resource-constrained healthcare environments.

5.3.6 Systematic Ablation of IHO Components

Stepwise ablation confirms progressive CDVD improvement:

• ACO baseline: 10.8 ± 1.4

• +PRSI initialization: 13.5 ± 1.1

• +Oriented heuristic: 16.9 ± 0.9

• Full IHO: 19.4 ± 0.7

ANOVA indicates statistically significant differences (p<0.001), demonstrating that each severity-aware enhancement contributes to clinically weighted optimization.

Across n = 10 independent runs, the proposed YOLO-CAD model achieved an mAP@0.5 of 91.21% ± 0.38 (SD), with a 95% confidence interval of [90.95, 91.47]. In comparison, YOLOv7-tiny achieved 89.34% ± 0.52, indicating a statistically significant improvement (p<0.01). Post-hoc Tukey HSD analysis confirmed that the performance gain of YOLO-CAD over baseline models (YOLOv5s, YOLOX-s, and YOLOv7-tiny) is statistically significant, with adjusted p-values below 0.05 for all pairwise comparisons.

5.3.7 Comparison of Image-Efficient Acquisition Strategy with Traditional Methods

To evaluate the impact of the proposed image acquisition method on acquisition efficiency and data volume, we compare it with a traditional patrol-type image acquisition robot. A patrol-type robot cannot determine the location of a critical patient, so it acquires images of every patient regardless of whether the patient's condition is normal or abnormal. The experiment was conducted in a simulated environment using four evaluation metrics: total distance traveled to perform the acquisition task, total time spent, amount of image data, and CDVD score, where the CDVD score was computed using Eq. (7).

The results are shown in Table 9. Although the total data value score obtained by the traditional image acquisition method was as high as 670, a large number of images of normal patients were acquired, resulting in data redundancy; consequently, the CDVD score was only 2.5. The larger data volume also places considerable pressure on network transmission. Compared with the traditional acquisition method, the proposed method reduces the distance traveled and the time consumed in performing the acquisition task by 34.9% and 62.1%, respectively, which greatly improves acquisition efficiency, while CDVD improves by a factor of 7.76. In summary, the proposed method effectively reduces data redundancy, improves data value density, relieves network pressure, and improves image acquisition efficiency.
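As a consistency check on the reported figures, and assuming the 7.76-fold CDVD improvement is the ratio of the proposed method's CDVD to the traditional method's CDVD of 2.5 (Table 9 remains the authoritative source), the implied CDVD of the proposed method is:

```latex
\mathrm{CDVD}_{\text{proposed}} \approx 7.76 \times \mathrm{CDVD}_{\text{traditional}} = 7.76 \times 2.5 = 19.4
```

This agrees with the Full IHO value of 19.4 reported in the ablation of Section 5.3.6.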


5.3.8 Hardware-Oriented Performance and Edge Feasibility

Although the experimental assessment is performed in simulation, the proposed framework is developed with real-world deployment constraints in mind. The YOLO-CAD model has a lightweight design (5.14M parameters and lower FLOPs) and is suited to real-time inference on embedded edge devices such as the NVIDIA Jetson family. Given the attained inference rate (up to 95 FPS on GPU), the system is expected to remain near real-time under resource-constrained edge conditions, with moderate degradation due to hardware limits. In a system context, end-to-end latency comprises image capture, edge inference, decision-making, and robotic actuation. Because the model is lightweight and processing is edge-local, the inference component adds little to the total latency, which helps identify and prioritize urgent ICU patients in time.
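The latency argument can be made concrete with a small budget calculation. Only the 95 FPS inference rate comes from the text; the capture, decision-making, and actuation latencies below are placeholder values chosen for illustration, not measurements.

```python
def frame_time_ms(fps):
    """Per-frame inference time implied by a throughput in frames per second."""
    return 1000.0 / fps

def end_to_end_latency_ms(stages):
    """Sum the per-stage latencies (ms) of a sequential pipeline."""
    return sum(stages.values())

# 95 FPS is the GPU inference rate reported above; the other stage latencies are
# placeholder values for illustration only, not measurements from the paper.
stages = {
    "image_capture": 33.0,
    "edge_inference": frame_time_ms(95),
    "decision_making": 5.0,
    "robot_actuation": 200.0,
}
total = end_to_end_latency_ms(stages)
share = stages["edge_inference"] / total
print(f"inference: {stages['edge_inference']:.1f} ms, total: {total:.1f} ms, share: {share:.1%}")
```

At 95 FPS a single frame takes about 10.5 ms, so under these assumed stage times inference is a small fraction of the end-to-end budget. On an embedded device the frame rate would be lower, and the same calculation can be repeated with a measured value.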

6 Translational Deployment, Scalability, and Privacy Considerations

The current validation is conducted in a simulation environment; however, the architecture is designed for real-world ICU deployment using edge-enabled mobile robotic platforms. This section outlines how the system could be deployed clinically in the real world. Future IRB-approved in situ clinical trials will evaluate system latency, edge inference delay, robotic power consumption during navigation, and diagnostic utility as assessed by ICU clinicians. The lightweight YOLO-CAD model (5.14 million parameters) enables real-time inference on embedded GPU platforms (e.g., NVIDIA Jetson), meeting hospital infrastructure constraints.

6.1 Real-World Deployment Considerations and Constraints

Robotic navigation in a real hospital setting is subject to various constraints, such as dynamic obstacles (e.g., movement of medical staff), limited space, safety requirements, and non-homogeneous layouts. The proposed path planning strategy is compatible with grid-based environments and can be extended to include real-time obstacle avoidance and dynamic replanning schemes. Moreover, the ICU setting is generally characterized by short-range navigation, frequent interruptions, and prioritization that depends on patient condition. The severity-aware prioritization mechanism presented in this paper is especially well suited to such conditions, as it allows tasks to be scheduled adaptively in response to dynamic workloads. It should be noted that the present research does not cover the physical implementation of robots or dynamic trials in hospitals. As such, actuation delay, sensor noise, communication overhead, and energy consumption are not modeled directly. These factors will be addressed in future work through hardware-in-the-loop testing and real-world validation.
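One simple way to picture dynamic replanning on a grid is to re-run a shortest-path search whenever a cell on the remaining route becomes blocked. The sketch below uses plain breadth-first search on a 4-connected occupancy grid as a stand-in; it is not the IHO planner, and the grid, function names, and obstacle event are illustrative assumptions.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid; cells equal to 1 are blocked."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

def replan_if_blocked(grid, path, position, goal):
    """Keep the current path unless a remaining cell is now blocked; otherwise replan."""
    remaining = path[path.index(position):]
    if all(grid[r][c] == 0 for r, c in remaining):
        return remaining
    return shortest_path(grid, position, goal)

grid = [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]
path = shortest_path(grid, (0, 0), (0, 2))
grid[0][1] = 1  # hypothetical event: a staff member steps into a corridor cell
new_path = replan_if_blocked(grid, path, (0, 0), (0, 2))
print(path, new_path)
```

In this toy case the direct route along the top row is replaced by a detour through the middle row once the corridor cell is occupied.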

6.2 Regulatory, Ethical, and Clinical Adoption Considerations

For successful real-world deployment, several regulatory, ethical, and physician acceptance challenges must be addressed. First, decision explainability is critical for building clinician trust. To this end, the proposed framework can be extended with explainable AI mechanisms such as attention maps and feature attribution techniques, enabling physicians to interpret detection outputs and validate them against clinical observations. Importantly, the system is designed as a decision-support tool, ensuring collaborative interaction where clinicians retain ultimate authority over patient care decisions.

Second, system reliability and failure handling are essential in high-risk ICU environments. Potential failure modes, including model uncertainty, hardware malfunction, or communication disruptions, must be mitigated through robust design. Confidence-based thresholding can be employed to flag uncertain predictions and trigger human intervention. Additionally, a manual override mechanism should be incorporated, allowing healthcare professionals to bypass automated decisions during emergency scenarios. Redundancy strategies and fallback operational modes further enhance system reliability.
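Confidence-based thresholding and manual override can be combined in a small routing rule. The following is a minimal sketch under assumed names: the `Detection` fields, the 0.6 threshold, and the three routing labels are hypothetical and would need to be set from clinical requirements.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    patient_id: str
    label: str
    confidence: float

def route_detection(det, threshold=0.6, manual_override=False):
    """Decide how a detection is handled.

    Returns "manual" when a clinician has taken control, "review" when the
    model is not confident enough to act on, and "auto" otherwise.
    """
    if manual_override:
        return "manual"
    if det.confidence < threshold:
        return "review"
    return "auto"

# Illustrative detections; the label and scores are made up for the example.
confident = Detection("bed_7", "abnormal_posture", 0.92)
uncertain = Detection("bed_3", "abnormal_posture", 0.41)
print(route_detection(confident))
print(route_detection(uncertain))
print(route_detection(confident, manual_override=True))
```

Checking the override first keeps clinician authority unconditional: no confidence value can route a case back to automated handling once the override is set.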

6.3 Scalability

The proposed distributed edge-computing framework supports horizontal scalability through the integration of multiple edge servers and coordinated robotic agents. Since detection and prioritization are performed at the edge layer, network bandwidth requirements remain low even as the number of patients increases. Under high-load conditions, severity-aware prioritization ensures that clinically critical cases are processed first, maintaining system responsiveness and reliability.
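Severity-aware prioritization under load can be sketched as a priority queue keyed on severity. The class name, patient identifiers, and severity values below are illustrative assumptions; the paper's own scheduling logic is not reproduced.

```python
import heapq
import itertools

class SeverityQueue:
    """Max-priority queue on severity; ties are served in arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, patient_id, severity):
        # heapq is a min-heap, so severity is negated to pop the largest first.
        heapq.heappush(self._heap, (-severity, next(self._counter), patient_id))

    def pop(self):
        _, _, patient_id = heapq.heappop(self._heap)
        return patient_id

    def __len__(self):
        return len(self._heap)

queue = SeverityQueue()
queue.push("bed_3", 0.4)
queue.push("bed_7", 0.9)
queue.push("bed_1", 0.9)
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The arrival counter breaks ties deterministically, so two equally severe patients are handled in the order their detections arrived.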

6.4 Privacy-Preserving Federated Learning

To enable cross-institutional collaboration while preserving patient privacy, federated learning can be integrated into the YOLO-CAD framework. In this paradigm, hospitals retain raw medical images locally and share encrypted model updates with a centralized aggregation server. This approach enables collaborative model improvement across institutions while preserving data sovereignty and ensuring compliance with healthcare data protection regulations.
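The aggregation step at the central server can be sketched as a dataset-size-weighted average of client parameters, in the style of federated averaging. This is an assumption about how aggregation might be done, since the text does not specify the rule; encryption of updates is not modeled, and the weight vectors and dataset sizes are toy values.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by local dataset size (FedAvg-style)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Toy two-parameter models from two hypothetical hospitals.
hospital_a = [0.2, 1.0]
hospital_b = [0.6, 3.0]
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)
```

Only the parameter vectors and the sample counts leave each site in this scheme; the raw images stay local, which is the property the paragraph above relies on.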

6.5 Environmental Robustness

Robustness to real-world ICU conditions is achieved through architectural and training-level improvements. The wIoUv3 loss function improves robustness to partial occlusions caused by medical equipment and staff movement. Illumination-aware data augmentation improves adaptability to varying lighting conditions. Edge-local processing ensures continued operation during network instability, while severity-based navigation adapts to dynamic spatial layouts and multi-patient scenarios. Together, these enhancements transition the system from a simulation prototype to a clinically viable healthcare solution.

7 Conclusion and Future Direction

This manuscript addresses the challenge of efficient patient management in healthcare environments, with a particular focus on high-quality image acquisition for timely detection of patient abnormalities. A novel patient image data acquisition method based on edge intelligence and motion-static cooperation is proposed, leveraging the complementary strengths of static and mobile acquisition nodes to optimize the imaging process. In addition, a heuristic path optimization algorithm guided by patient severity is introduced to enhance the operational efficiency of medical robots. Compared with conventional approaches, the proposed framework demonstrates superior performance in terms of task completion time, image data transmission cost, and overall data value. These improvements contribute to more precise, real-time healthcare monitoring and management, ultimately supporting intelligent decision-making, reducing labor costs, and advancing the development of smart healthcare systems.

Despite these contributions, several directions remain for future exploration. First, integrating multimodal physiological data—such as heart rate, oxygen saturation, and blood pressure—with vision-based PRSI can enable the development of a hybrid severity assessment model with enhanced clinical reliability. Second, real-time adaptive path planning algorithms that respond dynamically to changing patient conditions should be investigated to further improve system responsiveness. Third, federated learning techniques offer a promising avenue for privacy-preserving model training across multiple healthcare institutions.

Additionally, the current dataset is limited to specific acquisition settings and lacks comprehensive demographic diversity, which may affect generalizability. Therefore, multicenter validation across diverse clinical environments and geographic regions is necessary. Future research should also focus on real-world robotic deployment in clinical testbeds, along with rigorous statistical validation through hypothesis testing, confidence intervals, and p-value analysis. Furthermore, stress-testing the system under challenging ICU scenarios will help evaluate robustness and reliability.

Finally, future work will explore adaptive and learnable fusion strategies for integrating visual and physiological data, enabling more accurate and data-driven severity estimation. Hardware-level validation using embedded edge platforms and autonomous medical robots is also essential, with emphasis on evaluating real-time inference latency, system throughput, energy efficiency, and navigation robustness in real clinical settings. These efforts will pave the way toward practical, scalable, and clinically validated intelligent healthcare solutions.

Acknowledgement: Not applicable.

Funding Statement: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Centre) support program (IITP-2026-RS-2024-00437191) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Author Contributions: Kiran Deep Singh: Conceptualization, Methodology, Writing; Prabh Deep Singh: Supervision, Review & Editing; Narinder Kaur: Software Implementation, Writing, and Review; Jawad Khan: Data Acquisition, Review & Editing; Dildar Hussain: Validation, Final Manuscript Review & Approval; Yeong Hyeon Gu: Visualization, Validation & Manuscript Revision; Narinder Kaur: Manuscript Review and Supervision. All authors reviewed and approved the final version of the manuscript.

Availability of Data and Materials: The dataset used in this study is publicly available from the GitHub repository Hospital Scene Data (https://github.com/Wangmmstar/Hospital_Scene_Data/tree/main). The dataset contains 1680 labeled images and can be accessed without restriction for research purposes. All experiments in this study were conducted using this publicly available dataset. The source code is available at https://github.com/narinder1984/Critical.git.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Padmavathi U, Harshitha R, Jayashre K, Gummaraju N. Integrating cyber-physical systems for enhanced efficiency in healthcare solutions. In: Technologies for sustainable healthcare development. Hershey, PA, USA: IGI Global; 2024. p. 300–24.

2. Farag HO, Gaber MM, Awad MI, Elhady NE. Myoelectric prosthetic hands: a review of muscle synergy, machine learning and edge computing. ACM Comput Surv. 2025;57:1–33.

3. Padmapriya S, Parthasarathy S. Ethical data collection for medical image analysis: a structured approach. Asian Bioeth Rev. 2024;16(1):95–108. doi:10.1007/s41649-023-00250-9.

4. Sitaraman SR. AI-driven healthcare systems enhanced by advanced data analytics and mobile computing. Int J Inform Technol Comput Eng. 2021;9(2):175–87.

5. Gala D, Behl H, Shah M, Makaryus AN. The role of artificial intelligence in improving patient outcomes and future of healthcare delivery in cardiology: a narrative review of the literature. Healthcare. 2024;12(4):481. doi:10.3390/healthcare12040481.

6. Eteng AA. Sensors, actuators, wireless sensor networks, and the internet of things. In: Internet of things A to Z: technologies and applications. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2025. p. 89–117.

7. Islam MM, Hasan MK, Islam S, Balfaqih M, Alzahrani AI, Alalwan N, et al. Enabling pandemic-resilient healthcare: Narrowband Internet of Things and edge intelligence for real-time monitoring. CAAI Trans Intell Technol. 2024:1–18. doi:10.1049/cit2.12314.

8. Wang S, Zhang W, Zhao L, Yang G-Z, Xie L. Task-oriented interventionists’ operating skills recognition based on multimodal data hierarchical fusion. IEEE Sens J. 2025;25(11):20732–42. doi:10.1109/jsen.2025.3558981.

9. Wang W, Gao H, Xu L. Opportunistic sensing in task-oriented wireless sensor network based on graph compressed sensing. IEEE Trans Netw Sci Eng. 2024;11(5):4481–92. doi:10.1109/tnse.2024.3427129.

10. Liu Y, Xia H, Obuchowski NA, Laforest R, Rahmim A, Siegel BA, et al. Objective task-based evaluation of quantitative medical imaging methods: emerging frameworks and future directions. PET Clin. 2025;20(4):475–88. doi:10.1016/j.cpet.2025.07.006.

11. Said AM, Yahyaoui A, Abdellatif T. Efficient anomaly detection for smart hospital IoT systems. Sensors. 2021;21(4):1026. doi:10.3390/s21041026.

12. Liu P, Sun X, Han Y, He Z, Zhang W, Wu C. Arrhythmia classification of LSTM autoencoder based on time series anomaly detection. Biomed Signal Process Control. 2022;71(Pt B):103228.

13. Hayyolalam V, Aloqaily M, Özkasap Ö, Guizani M. Edge intelligence for empowering IoT-based healthcare systems. IEEE Wirel Commun. 2021;28(3):6–14. doi:10.1109/mwc.001.2000345.

14. Hartmann M, Hashmi US, Imran A. Edge computing in smart health care systems: review, challenges, and research directions. Trans Emerg Telecommun Technol. 2022;33:e3710.

15. Velichko A. A method for medical data analysis using the LogNNet for clinical decision support systems and edge computing in healthcare. Sensors. 2021;21(18):6209. doi:10.3390/s21186209.

16. Manocha A, Sood SK, Bhatia M. Edge intelligence-assisted smart healthcare solution for health pandemic: a federated environment approach. Cluster Comput. 2024;27(5):5611–30. doi:10.1007/s10586-023-04245-x.

17. Putra KT, Arrayyan AZ, Hayati N, Firdaus, Damarjati C, Bakar A, et al. A review on the application of internet of medical things in wearable personal health monitoring: a cloud-edge artificial intelligence approach. IEEE Access. 2024;12:21437–52. doi:10.1109/access.2024.3489992.

18. Akter M, Moustafa N, Lynar T, Razzak I. Edge intelligence: federated learning-based privacy protection framework for smart healthcare systems. IEEE J Biomed Health Inform. 2022;26(12):5805–16.

19. Khan UH, Qamar A, Khan R, Alturise F, Alshaabani AR, Alkhalaf S. Secure edge-based IoMT framework for ICU monitoring with TinyML and post-quantum cryptography. Sci Rep. 2025;15(1):36195. doi:10.1038/s41598-025-20017-6.

20. Zhao Y, Cao Y, Shen Z, Du J, Xu Y, Cui L, et al. An efficient interaction human-AI synergy system bridging visual awareness and large language model for intensive care units. arXiv:2512.09473. 2025.

21. Vagvolgyi BP, Khrenov M, Cope J, Deguet A, Kazanzides P, Manzoor S, et al. Telerobotic operation of intensive care unit ventilators. Front Robot AI. 2021;8:612964. doi:10.3389/frobt.2021.612964.

22. Krejčí J, Babiuch M, Suder J, Krys V, Bobovský Z. Internet of robotic things: current technologies, challenges, applications, and future research topics. Sensors. 2025;25(3):765. doi:10.3390/s25030765.

23. Hu D, Li S, Wang M. Object detection in hospital facilities: a comprehensive dataset and performance evaluation. Eng Appl Artif Intell. 2023;123:106223. doi:10.1016/j.engappai.2023.106223.

24. Yeung S, Rinaldo F, Jopling J, Liu B, Mehra R, Downing NL, et al. A computer vision system for deep learning-based detection of patient mobilization activities in the ICU. NPJ Digital Med. 2019;2(1):11. doi:10.1038/s41746-019-0087-z.

25. Liang X, Li X, Li F, Jiang J, Dong Q, Wang W, et al. MedFILIP: medical fine-grained language-image pre-training. IEEE J Biomed Health Inform. 2025;29(5):3587–97.

26. Wang X, Lan R, Wang H, Liu Z, Luo X. Fine-grained correlation analysis for medical image retrieval. Comput Elect Eng. 2021;90:106992. doi:10.1016/j.compeleceng.2021.106992.

27. Lu M, Zhao Q, Poston KL, Sullivan EV, Pfefferbaum A, Shahid M, et al. Quantifying Parkinson’s disease motor severity under uncertainty using MDS-UPDRS videos. Med Image Anal. 2021;73(4):102179. doi:10.1016/j.media.2021.102179.

28. Wang M, Ma J, Zhao X, Xing X. Automated physiological status detection and disease evaluation of critically ill patients via image processing technologies. Traitement Signal. 2024;41(1):153–63. doi:10.18280/ts.410112.

29. Abatal A, Korchi A. Transforming healthcare systems with artificial intelligence: revolutionizing efficiency, quality, and patient care. Research Square. 2023. doi:10.21203/rs.3.rs-3175341/v1.

30. Aamir M, Rahman Z, Ahmed Abro W, Aslam Bhatti U, Ahmed Dayo Z, Ishfaq M. Brain tumor classification utilizing deep features derived from high-quality regions in MRI images. Biomed Signal Process Control. 2023;85(72):104988. doi:10.1016/j.bspc.2023.104988.


Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
