About Me

Hi, my name is Minh Duc Bui, but you can call me Duc. I am a PhD student in Natural Language Processing (NLP) at Johannes Gutenberg University (JGU) Mainz (Germany), supervised by Katharina von der Wense (née Kann). My research focuses on socially aware NLP, examining how LLM-based systems represent and respond to human diversity as expressed through cultural, linguistic, and socio-demographic variation.

Cultural Variation: How cultural contexts, conventions, and norms influence model behavior
- Multi³Hate: Multicultural Hate Speech Detection @ NAACL 2025
- On Generalization across Measurement Systems @ ACL 2025
- Upcoming: Korean Honorific Translations
Linguistic Variation: How social implications embedded in language variation (e.g., dialects) affect model behavior
- Large Language Models Discriminate Against Speakers of German Dialects @ EMNLP 2025
- Upcoming: Meenzerisch (the dialect of Mainz, Germany)
Socio-Demographic Variation: How social attributes such as gender, identity, and ethnicity shape model behavior
- Upcoming: Demographic Bias in AudioLLMs

Together, these studies deepen our understanding of how LLMs handle human diversity and inform the development of AI systems that promote equity and inclusion across varied populations.

Latest News

Nov, 2025	📢 Our paper, Large Language Models Discriminate Against Speakers of German Dialects, was featured across major German news outlets, including Tagesschau (Online), Frankfurter Allgemeine (Newspaper), SWR3 (Radio), and BR (TV)! 🎉
Sep, 2025	📢 Two of our papers, Large Language Models Discriminate Against Speakers of German Dialects and A Closer Look at Tokenization for Multiple-Choice Question Answering, have been accepted to EMNLP 2025 (Main)! See you in Suzhou, China 🇨🇳! 🎉
Jun, 2025	📢 Our paper, On Generalization across Measurement Systems, has been accepted to ACL 2025 (Main)! See you in Vienna, Austria 🇦🇹! 🎉
May, 2025	🏆 We're thrilled to share that our paper, Multi³Hate, has received the Outstanding Paper Award at NAACL 2025!
Jan, 2025	📢 Our paper, Multi³Hate, has been accepted to NAACL 2025 (Main)! See you in Albuquerque, New Mexico 🇺🇸! 🎉

Recent Publications

Large Language Models Discriminate Against Speakers of German Dialects

Minh Duc Bui*, Carolin Holtermann*, Valentin Hofmann, Anne Lauscher, Katharina von der Wense

EMNLP (Main) 2025

Paper

Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs

Mario Sanz-Guerrero, Minh Duc Bui, Katharina von der Wense

EMNLP (Main) 2025

Paper

On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures

Minh Duc Bui, Kyung Eun Park, Goran Glavaš, Fabian David Schmidt, Katharina von der Wense

ACL (Main) 2025

Paper

Multi³Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models

Minh Duc Bui, Katharina von der Wense, Anne Lauscher

NAACL (Main) 2025

Paper | Outstanding Paper Award

Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget

Minh Duc Bui, Fabian David Schmidt, Goran Glavaš, Katharina von der Wense

5th Workshop on Insights from Negative Results in NLP @ NAACL 2024

Paper

The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification

Minh Duc Bui, Katharina von der Wense

4th Workshop on Trustworthy Natural Language Processing (TrustNLP) @ NAACL 2024

Paper

JGU Mainz's Submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages

Minh Duc Bui, Katharina von der Wense

4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024) @ NAACL 2024

Paper