TY  - EJOU
AU  - Octaviano, Maurice Kyla
AU  - Seong, Jin-Taek

TI  - Feature-Wise Linear Modulation for Heterogeneous-Frequency Multimodal Fusion in Temporal Sequence Encoders
T2  - Computers, Materials & Continua

PY  - 
VL  - 
IS  - 
SN  - 1546-2226

AB  - Integrating high-frequency sequential signals with low-frequency contextual descriptors into a unified deep encoder is a recurring challenge in computational modelling, exemplified by cross-sectional stock ranking, where price dynamics must be modelled jointly with quarterly accounting fundamentals. Existing approaches use late concatenation, in which the contextual signal influences only the final prediction head and cannot shape upstream feature extraction. We propose Feature-wise Linear Modulation (FiLM) as an intermediate conditioning mechanism: fundamentals generate per-channel scaling (gamma) and shifting (beta) parameters that affinely transform the encoder’s intermediate representations before aggregation. The same price sequence thus yields different temporal features depending on the firm’s fundamental profile, which we hypothesise reduces signal variability across heterogeneous market regimes by allowing the encoder to amplify or suppress patterns based on contextual quality. We instantiate FiLM in recurrent (LSTM), convolutional (TCN), and attention-based (iTransformer) encoders and evaluate them on China A-share equities (2010–2024). Across the convolutional and recurrent encoder families, the primary benefit of FiLM conditioning is improved signal stability (formally measured as the standard deviation of RankIC across rebalancing dates) and risk-adjusted performance, rather than mean predictive accuracy. The gain depends critically on where modulation is applied: pre-aggregation conditioning on temporally rich representations produces the largest variance reduction. FiLM-TCN, which modulates the convolutional feature map before pooling, achieves a RankIC of 0.1415, an annualised Sharpe ratio of 1.633, and an IC hit rate of 80.4% net of transaction costs. The insight that intermediate conditioning improves signal stability rather than raw accuracy may inform analogous fusion problems in other sequential modelling domains.
KW  - Feature-wise linear modulation; temporal convolutional network; iTransformer; multimodal fusion; heterogeneous-frequency fusion; quantitative finance; stock ranking; China A-shares

DO  - 10.32604/cmc.2026.082842