Multimodal learning (English Wikipedia)

Analysis of information sources in references of the Wikipedia article "Multimodal learning" in English language version.

refsWebsite
Global rank English rank
69th place
59th place
1st place
1st place
low place
low place
383rd place
320th place
2nd place
2nd place
5th place
5th place
4th place
4th place
low place
low place
9,352nd place
5,696th place
low place
low place
low place
low place
low place
low place
low place
low place
low place
low place
1,559th place
1,155th place
9th place
13th place
187th place
146th place
low place
7,637th place
low place
low place
3,700th place
2,360th place
11th place
8th place
8,255th place
low place

aaai.org

ojs.aaai.org

analyticsindiamag.com

arxiv.org

  • Hendriksen, Mariya; Bleeker, Maurits; Vakulenko, Svitlana; van Noord, Nanne; Kuiper, Ernst; de Rijke, Maarten (2021). "Extending CLIP for Category-to-image Retrieval in E-commerce". arXiv:2112.11294 [cs.CV].
  • Mokady, Ron; Hertz, Amir; Bermano, Amit H. (2021). "ClipCap: CLIP Prefix for Image Captioning". arXiv:2111.09734 [cs.CV].
  • Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob (2021-06-03). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". arXiv:2010.11929 [cs.CV].
  • Gulati, Anmol; Qin, James; Chiu, Chung-Cheng; Parmar, Niki; Zhang, Yu; Yu, Jiahui; Han, Wei; Wang, Shibo; Zhang, Zhengdong; Wu, Yonghui; Pang, Ruoming (2020). "Conformer: Convolution-augmented Transformer for Speech Recognition". arXiv:2005.08100 [eess.AS].
  • Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS].
  • Jaegle, Andrew; Gimeno, Felix; Brock, Andrew; Zisserman, Andrew; Vinyals, Oriol; Carreira, Joao (2021-06-22). "Perceiver: General Perception with Iterative Attention". arXiv:2103.03206 [cs.CV].
  • Jaegle, Andrew; Borgeaud, Sebastian; Alayrac, Jean-Baptiste; Doersch, Carl; Ionescu, Catalin; Ding, David; Koppula, Skanda; Zoran, Daniel; Brock, Andrew; Shelhamer, Evan; Hénaff, Olivier (2021-08-02). "Perceiver IO: A General Architecture for Structured Inputs & Outputs". arXiv:2107.14795 [cs.LG].
  • Chang, Huiwen; Zhang, Han; Barber, Jarred; Maschinot, A. J.; Lezama, Jose; Jiang, Lu; Yang, Ming-Hsuan; Murphy, Kevin; Freeman, William T. (2023-01-02). "Muse: Text-To-Image Generation via Masked Generative Transformers". arXiv:2301.00704 [cs.CV].
  • Ramesh, Aditya; Pavlov, Mikhail; Goh, Gabriel; Gray, Scott; Voss, Chelsea; Radford, Alec; Chen, Mark; Sutskever, Ilya (2021-02-26), Zero-Shot Text-to-Image Generation, arXiv:2102.12092
  • Yu, Jiahui; Xu, Yuanzhong; Koh, Jing Yu; Luong, Thang; Baid, Gunjan; Wang, Zirui; Vasudevan, Vijay; Ku, Alexander; Yang, Yinfei (2022-06-21), Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, arXiv:2206.10789
  • Li, Junnan; Li, Dongxu; Savarese, Silvio; Hoi, Steven (2023-01-01). "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models". arXiv:2301.12597 [cs.CV].
  • Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao (2022-12-06). "Flamingo: a Visual Language Model for Few-Shot Learning". Advances in Neural Information Processing Systems. 35: 23716–23736. arXiv:2204.14198. Archived from the original on 2023-07-02. Retrieved 2023-07-02.
  • Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; Lynch, Corey; Chowdhery, Aakanksha; Ichter, Brian; Wahid, Ayzaan; Tompson, Jonathan; Vuong, Quan; Yu, Tianhe; Huang, Wenlong; Chebotar, Yevgen; Sermanet, Pierre; Duckworth, Daniel; Levine, Sergey (2023-03-01). "PaLM-E: An Embodied Multimodal Language Model". arXiv:2303.03378 [cs.LG].
  • Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae (2023-04-01). "Visual Instruction Tuning". arXiv:2304.08485 [cs.CV].
  • Zhang, Hang; Li, Xin; Bing, Lidong (2023-06-01). "Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding". arXiv:2306.02858 [cs.CL].
  • OpenAI (2023-03-27). "GPT-4 Technical Report". arXiv:2303.08774 [cs.CL].
  • Hendriksen, Mariya; Vakulenko, Svitlana; Kuiper, Ernst; de Rijke, Maarten (2023). "Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study". arXiv:2301.05174 [cs.CV].
  • Shi, Yuge; Siddharth, N.; Paige, Brooks; Torr, Philip HS (2019). "Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models". arXiv:1911.03393 [cs.LG].
  • Shi, Yuge; Siddharth, N.; Paige, Brooks; Torr, Philip HS (2019). "Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models". arXiv:1911.03393 [cs.LG].

doi.org

github.com

jmlr.org

lmsys.org

medicalxpress.com

mlr.press

proceedings.mlr.press

  • Kiros, Ryan; Salakhutdinov, Ruslan; Zemel, Rich (2014-06-18). "Multimodal Neural Language Models". Proceedings of the 31st International Conference on Machine Learning. PMLR: 595–603. Archived from the original on 2023-07-02. Retrieved 2023-07-02.

neurips.cc

proceedings.neurips.cc

nih.gov

ncbi.nlm.nih.gov

openai.com

cdn.openai.com

openreview.net

research.google

sites.research.google

semanticscholar.org

api.semanticscholar.org

techcrunch.com

thecvf.com

openaccess.thecvf.com

  • Antol, Stanislaw; Agrawal, Aishwarya; Lu, Jiasen; Mitchell, Margaret; Batra, Dhruv; Zitnick, C. Lawrence; Parikh, Devi (2015). "VQA: Visual Question Answering". ICCV: 2425–2433. Archived from the original on 2023-07-02. Retrieved 2023-07-02.

theregister.com

unite.ai

web.archive.org

worldcat.org

search.worldcat.org

youtube.com