Valle, Rafael (2020). "Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens". arXiv:1910.11997 [eess].
Rubin, P.; Baer, T.; Mermelstein, P. (1981). "An articulatory synthesizer for perceptual research". Journal of the Acoustical Society of America. 70 (2): 321–328. Bibcode:1981ASAJ...70..321R. doi:10.1121/1.386780.
Van Santen, J. (April 1994). "Assignment of segmental duration in text-to-speech synthesis". Computer Speech & Language. 8 (2): 95–128. doi:10.1006/csla.1994.1005.
Billi, Roberto; Canavesio, Franco; Ciaramella, Alberto; Nebbia, Luciano (1 November 1995). "Interactive voice technology at work: The CSELT experience". Speech Communication. 17 (3): 263–271. doi:10.1016/0167-6393(95)00030-R.
Muralishankar, R.; Ramakrishnan, A. G.; Prathibha, P. (February 2004). "Modification of Pitch using DCT in the Source Domain". Speech Communication. 42 (2): 143–154. doi:10.1016/j.specom.2003.05.001.
Prathosh, A. P.; Ramakrishnan, A. G.; Ananthapadmanabha, T. V. (December 2013). "Epoch extraction based on integrated linear prediction residual using plosion index". IEEE Trans. Audio Speech Language Processing. 21 (12): 2471–2480. doi:10.1109/TASL.2013.2273717. S2CID10491251.
Rubin, P.; Baer, T.; Mermelstein, P. (1981). "An articulatory synthesizer for perceptual research". Journal of the Acoustical Society of America. 70 (2): 321–328. Bibcode:1981ASAJ...70..321R. doi:10.1121/1.386780.
Prathosh, A. P.; Ramakrishnan, A. G.; Ananthapadmanabha, T. V. (December 2013). "Epoch extraction based on integrated linear prediction residual using plosion index". IEEE Trans. Audio Speech Language Processing. 21 (12): 2471–2480. doi:10.1109/TASL.2013.2273717. S2CID10491251.