Jia, Ye; Zhang, Yu; Weiss, Ron J. (12 June 2018). "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis". Advances in Neural Information Processing Systems 31: 4485–4495. arXiv:1806.04558.
Rubin, P.; Baer, T.; Mermelstein, P. (1981). "An articulatory synthesizer for perceptual research". Journal of the Acoustical Society of America 70: 321–328. Bibcode:1981ASAJ...70..321R. doi:10.1121/1.386780.
Van Santen, J. (April 1994). "Assignment of segmental duration in text-to-speech synthesis". Computer Speech & Language 8: 95–128. doi:10.1006/csla.1994.1005.
Billi, Roberto; Canavesio, Franco; Ciaramella, Alberto; Nebbia, Luciano (1 November 1995). "Interactive voice technology at work: The CSELT experience". Speech Communication 17: 263–271. doi:10.1016/0167-6393(95)00030-R.
Muralishankar, R.; Ramakrishnan, A. G.; Prathibha, P. (February 2004). "Modification of Pitch using DCT in the Source Domain". Speech Communication 42: 143–154. doi:10.1016/j.specom.2003.05.001.
Prathosh, A. P.; Ramakrishnan, A. G.; Ananthapadmanabha, T. V. (December 2013). "Epoch extraction based on integrated linear prediction residual using plosion index". IEEE Trans. Audio Speech Language Processing 21: 2471–2480. doi:10.1109/TASL.2013.2273717. S2CID 10491251.