LGM3A 2024: the 2nd Workshop on Large Generative Models Meet Multimodal Applications

Shihao Xu, Yiyang Luo, Justin Dauwels, Andy Khong, Zheng Wang, Qianqian Chen, Chen Cai, Wei Shi, Tat Seng Chua

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This workshop aims to explore the potential of large generative models to revolutionize how we interact with multimodal information. A Large Language Model (LLM) is a sophisticated form of artificial intelligence engineered to comprehend and produce natural language text, exemplified by technologies such as GPT, LLaMA, Flan-T5, ChatGLM, and Qwen. These models are trained on extensive text datasets and exhibit commendable attributes, including robust language generation, zero-shot transfer capabilities, and In-Context Learning (ICL). With the recent surge in multimodal content encompassing images, videos, audio, and 3D models, Large MultiModal Models (LMMs) have seen significant enhancements. These improvements enable conventional LLMs to be augmented to accommodate multimodal inputs or outputs, as seen in BLIP, Flamingo, KOSMOS, LLaVA, Gemini, and GPT-4. Concurrently, certain research initiatives have focused on specific modalities, with Kosmos2 and MiniGPT-5 targeting image generation and SpeechGPT targeting speech production. There are also endeavors to integrate LLMs with external tools to achieve near "any-to-any" multimodal comprehension and generation, illustrated by projects such as Visual-ChatGPT, ViperGPT, MMREACT, HuggingGPT, and AudioGPT. Collectively, these models, spanning not only text and image generation but also other modalities, are referred to as large generative models. This workshop will allow researchers, practitioners, and industry professionals to explore the latest trends and best practices in multimodal applications of large generative models.

Original language: English
Title of host publication: LGM3A 2024 - Proceedings of the 2nd Workshop on Large Generative Models Meet Multimodal Applications
Publisher: Association for Computing Machinery, Inc
Pages: 1-3
Number of pages: 3
ISBN (Electronic): 9798400711930
DOIs
Publication status: Published - Oct 28 2024
Externally published: Yes
Event: 2nd Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2024 - Melbourne, Australia
Duration: Oct 28 2024 - Nov 1 2024

Publication series

Name: LGM3A 2024 - Proceedings of the 2nd Workshop on Large Generative Models Meet Multimodal Applications

Conference

Conference: 2nd Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2024
Country/Territory: Australia
City: Melbourne
Period: 10/28/24 - 11/1/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

ASJC Scopus Subject Areas

  • Computer Science Applications

Keywords

  • generative models
  • large language models
  • multimodal applications
