LGM3A 2024: the 2nd Workshop on Large Generative Models Meet Multimodal Applications

Shihao Xu, Yiyang Luo, Justin Dauwels, Andy Khong, Zheng Wang, Qianqian Chen, Chen Cai, Wei Shi, Tat Seng Chua

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This workshop aims to explore the potential of large generative models to revolutionize how we interact with multimodal information. A Large Language Model (LLM) is a sophisticated form of artificial intelligence engineered to comprehend and produce natural language text, exemplified by technologies such as GPT, LLaMA, Flan-T5, ChatGLM, and Qwen. These models are trained on extensive text datasets and exhibit commendable attributes, including robust language generation, zero-shot transfer capabilities, and In-Context Learning (ICL). With the recent surge in multimodal content encompassing images, videos, audio, and 3D models, Large MultiModal Models (LMMs) have seen significant enhancements. These improvements enable conventional LLMs to be augmented to accommodate multimodal inputs or outputs, as seen in BLIP, Flamingo, KOSMOS, LLaVA, Gemini, and GPT-4. Concurrently, certain research initiatives have focused on specific modalities, with Kosmos2 and MiniGPT-5 targeting image generation and SpeechGPT targeting speech production. There are also endeavors to integrate LLMs with external tools to achieve near "any-to-any" multimodal comprehension and generation, illustrated by projects such as Visual-ChatGPT, ViperGPT, MMREACT, HuggingGPT, and AudioGPT. Collectively, these models, spanning not only text and image generation but also other modalities, are referred to as large generative models. This workshop will allow researchers, practitioners, and industry professionals to explore the latest trends and best practices in multimodal applications of large generative models.

Original language: English
Title of host publication: LGM3A 2024 - Proceedings of the 2nd Workshop on Large Generative Models Meet Multimodal Applications
Publisher: Association for Computing Machinery, Inc
Pages: 1-3
Number of pages: 3
ISBN (Electronic): 9798400711930
DOIs
Publication status: Published - Oct 28 2024
Externally published: Yes
Event: 2nd Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2024 - Melbourne, Australia
Duration: Oct 28 2024 - Nov 1 2024

Publication series

Name: LGM3A 2024 - Proceedings of the 2nd Workshop on Large Generative Models Meet Multimodal Applications

Conference

Conference: 2nd Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2024
Country/Territory: Australia
City: Melbourne
Period: 10/28/24 - 11/1/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

ASJC Scopus Subject Areas

  • Computer Science Applications

Keywords

  • generative models
  • large language models
  • multimodal applications
