{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T16:43:15Z","timestamp":1772901795125,"version":"3.50.1"},"reference-count":9,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>\n            Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive\n            <jats:bold>M<\/jats:bold>\n            ulti-modal\n            <jats:bold>Q<\/jats:bold>\n            uery\n            <jats:bold>A<\/jats:bold>\n            nswering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index, integrated with cutting-edge LLMs. It comprises five core components: Data Preprocessing, Vector Representation, Index Construction, Query Execution, and Answer Generation, all orchestrated by a dedicated coordinator to ensure smooth data flow from input to answer generation. One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities, facilitating precise measurement of multimodal information similarity. Furthermore, the system achieves efficient retrieval through our advanced navigation graph index, refined using computational pruning techniques. Another highlight of our system is its pluggable processing framework, allowing seamless integration of embedding models, graph indexes, and LLMs. This flexibility provides users diverse options for gaining insights from their multi-modal knowledge base. A preliminary video introduction of MQA is available at https:\/\/youtu.be\/xvUuo2ZIqWk.\n          <\/jats:p>","DOI":"10.14778\/3685800.3685868","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4333-4336","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["An Interactive Multi-Modal Query Answering System with Retrieval-Augmented Large Language Models"],"prefix":"10.14778","volume":"17","author":[{"given":"Mengzhao","family":"Wang","sequence":"first","affiliation":[{"name":"Zhejiang University"}]},{"given":"Haotian","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]},{"given":"Xiangyu","family":"Ke","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]},{"given":"Yunjun","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]},{"given":"Xiaoliang","family":"Xu","sequence":"additional","affiliation":[{"name":"Hangzhou Dianzi University"}]},{"given":"Lu","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"GPT-4 is OpenAI's most advanced system, producing safer and more useful responses. https:\/\/openai.com\/gpt-4. [Online","year":"2024","unstructured":"2024. GPT-4 is OpenAI's most advanced system, producing safer and more useful responses. https:\/\/openai.com\/gpt-4. [Online; accessed 07-April-2024]."},{"key":"e_1_2_1_2_1","volume-title":"ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity. In International Conference on Learning Representations (ICLR).","author":"Delmas Ginger","year":"2022","unstructured":"Ginger Delmas, Rafael Sampaio de Rezende, Gabriela Csurka, and Diane Larlus. 2022. ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Lei Huang Weijiang Yu Weitao Ma Weihong Zhong Zhangyin Feng Haotian Wang Qianglong Chen Weihua Peng Xiaocheng Feng Bing Qin et al. 2023. A survey on hallucination in large language models: Principles taxonomy challenges and open questions. arXiv:2311.05232 (2023).","DOI":"10.1145\/3703155"},{"key":"e_1_2_1_4_1","volume-title":"When Large Language Models Meet Vector Databases: A Survey. arXiv:2402","author":"Jing Zhi","year":"2024","unstructured":"Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Chunjiang Liu, Haiyun Xu, and Kehai Chen. 2024. When Large Language Models Meet Vector Databases: A Survey. arXiv:2402.01763 (2024)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457550"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-022-2041-5"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-024-40231-1"},{"key":"e_1_2_1_8_1","volume-title":"MUST: An Effective and Scalable Framework for Multimodal Search of Target Modality. In IEEE International Conference on Data Engineering (ICDE).","author":"Wang Mengzhao","year":"2024","unstructured":"Mengzhao Wang, Xiangyu Ke, Xiaoliang Xu, Lu Chen, Yunjun Gao, Pinpin Huang, and Runkai Zhu. 2024. MUST: An Effective and Scalable Framework for Multimodal Search of Target Modality. In IEEE International Conference on Data Engineering (ICDE)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639269"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685868","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:27:38Z","timestamp":1735622858000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685868"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":9,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685868"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685868","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}