Johnston, M. (1998). "Unification-based Multimodal Parsing". Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL '98), August 10–14, Université de Montréal, Montreal, Quebec, Canada. pp. 624-630.
Sun, Y.; Chen, F.; Shi, Y.D.; Chung, V. (2006). "A novel method for multi-sensory data fusion in multimodal human computer interaction". In Proceedings of the 20th conference of the computer-human interaction special interest group (CHISIG) of Australia on Computer-human interaction: design: activities, artefacts and environments, Sydney, Australia, pp. 401-404
Nguyen, Quy Hoang; Nguyen, Minh-Van Truong; Van Nguyen, Kiet (2025). "New benchmark dataset and fine-grained cross-modal fusion framework for Vietnamese multimodal aspect-category sentiment analysis". Multimedia Systems. 31 4. arXiv:2405.00543. doi:10.1007/s00530-024-01558-8.
Pereira, Moisés H. R.; Pádua, Flávio L. C.; Pereira, Adriano C. M.; Benevenuto, Fabrício; Dalip, Daniel H. (9 April 2016). "Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos". arXiv:1604.02612 [cs.CL].
Zucco, Chiara; Calabrese, Barbara; Cannataro, Mario (November 2017). "Sentiment analysis and affective computing for depression monitoring". 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE. pp. 1988–1995. doi:10.1109/bibm.2017.8217966. ISBN 978-1-5090-3050-7. S2CID 24408937.
Sun, Shiliang; Luo, Chen; Chen, Junyu (July 2017). "A review of natural language processing techniques for opinion mining systems". Information Fusion. 36: 10–25. doi:10.1016/j.inffus.2016.10.004.
Sarter, N.B. (2006). "Multimodal information presentation: Design guidance and research challenges". International Journal of Industrial Ergonomics. 36 (5): 439–445. doi:10.1016/j.ergon.2006.01.007.
Geldard, F.A. (1957). "Adventures in tactile literacy". American Psychologist. 12 (3): 115–124. doi:10.1037/h0040416.
Caschera, M.C.; Ferri, F.; Grifoni, P. (2013). "InteSe: An Integrated Model for Resolving Ambiguities in Multimodal Sentences". IEEE Transactions on Systems, Man, and Cybernetics: Systems. 43 (4): 911–931.
Spilker, J.; Klarner, M.; Görz, G. (2000). "Processing Self Corrections in a speech to speech system". COLING 2000. pp. 1116-1120.
Caschera, M.C.; Ferri, F.; Grifoni, P. (2007). "The Management of ambiguities". In Visual Languages for Interactive Computing: Definitions and Formalizations. IGI Publishing. pp. 129-140.
Kettebekov, Sanshzar; Sharma, Rajeev (2001). "Toward Natural Gesture/Speech Control of a Large Display". EHCI '01: Proceedings of the 8th IFIP International Conference on Engineering for Human-Computer Interaction. pp. 221-234.
Pérez, G.; Amores, G.; Manchón, P. (2005). "Two strategies for multimodal fusion". In Proceedings of Multimodal Interaction for the Visualization and Exploration of Scientific Data, Trento, Italy, pp. 26–32.
D'Ulizia, A. (2009). "Exploring Multimodal Input Fusion Strategies". In: Grifoni P (ed) Handbook of Research on Multimodal Human Computer Interaction and Pervasive Services: Evolutionary Techniques for Improving Accessibility. IGI Publishing, pp. 34-57.