Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (6 December 2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356 [eess.AS].
Paaß, Gerhard; Giesselbach, Sven (16 February 2023). Foundation Models for Speech, Images, Videos, and Control. Foundation Models for Natural Language Processing. Artificial Intelligence: Foundations, Theory, and Algorithms (in English). pp. 313–382. arXiv:2302.08575. doi:10.1007/978-3-031-23190-2_7. ISBN 978-3-031-23189-6.
Gong, Yuan; Khurana, Sameer; Karlinsky, Leonid; Glass, James (2023). Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers. Interspeech 2023. pp. 2798–2802. arXiv:2307.03183. doi:10.21437/Interspeech.2023-2193.