TY  - EJOU
AU  - Zhang, Zhi 
AU  - Sun, Bingyu 

TI  - Robust Facial Landmark Detection via Transformer-Conv Attention
T2  - Computers, Materials & Continua

PY  - 2026
VL  - 87
IS  - 3
SN  - 1546-2226

AB  - In facial landmark detection, shape deviations induced by large poses and exaggerated expressions often prevent existing algorithms from simultaneously achieving fine-grained local accuracy and holistic global shape constraints. To address this, we propose a Transformer-Conv Attention-based Method (TCAM). Built upon a hybrid coordinate-heatmap regression backbone, TCAM integrates the long-range dependency modeling of Transformers with the local feature extraction advantages of Depthwise Convolution (DWConv). Specifically, by partitioning feature maps into sub-regions and applying Transformer modeling, the module enforces sparse linear constraints on global information, effectively mitigating the issues caused by discontinuous landmark distributions. Experimental results on the WFLW, COFW, and 300W datasets demonstrate that TCAM significantly outperforms current state-of-the-art methods. Notably, the Normalized Mean Error (NME) is reduced by 0.24% and 0.21% on the large pose and exaggerated expression subsets, respectively, validating the superior robustness of the proposed model.
KW  - Face alignment; transformer-conv attention; DWConv; coordinate heatmap hybrid model

DO  - 10.32604/cmc.2026.076236