About Me
Hi, my name is Minh Duc Bui, but you can call me Duc. I am a PhD student in Natural Language Processing (NLP) at Johannes Gutenberg University (JGU) Mainz (Germany), supervised by Katharina von der Wense (née Kann). My research focuses on socially aware NLP, examining how LLM-based systems represent and respond to human diversity as expressed through cultural, linguistic, and socio-demographic variation.
- Cultural Variation: How cultural contexts, conventions, and norms influence model behavior
  - Multi³Hate: Multicultural Hate Speech Detection @ NAACL 2025
  - On Generalization across Measurement Systems @ ACL 2025
  - Upcoming: Korean Honorific Translations
- Linguistic Variation: How social implications embedded in language variation (e.g., dialects) affect model behavior
  - Large Language Models Discriminate Against Speakers of German Dialects @ EMNLP 2025
  - Upcoming: Meenzerisch (the dialect of Mainz, Germany)
- Socio-Demographic Variation: How social attributes such as gender, identity, and ethnicity shape model behavior
  - Upcoming: Demographic Bias in AudioLLMs
Together, these studies deepen our understanding of how LLMs capture human diversity and inform the development of AI systems that promote equity and inclusion across varied populations.
Latest News
| Date | News |
|---|---|
| Sep 2025 | 📢 Two of our papers, Large Language Models Discriminate Against Speakers of German Dialects and A Closer Look at Tokenization for Multiple-Choice Question Answering, have been accepted to EMNLP 2025 (Main)! See you in Suzhou, China 🇨🇳! 🎉 |
| Jun 2025 | 📢 Our paper, On Generalization across Measurement Systems, has been accepted to ACL 2025 (Main)! See you in Vienna, Austria 🇦🇹! 🎉 |
| May 2025 | 🏆 We're thrilled to share that our paper, Multi³Hate, has received the Outstanding Paper Award at NAACL 2025! |
| Jan 2025 | 📢 Our paper, Multi³Hate, has been accepted to NAACL 2025 (Main)! See you in Albuquerque, New Mexico 🇺🇸! 🎉 |
| Oct 2024 | ✅ Finished my research visit! Check out our work on multicultural hate speech: Multi³Hate. |
Recent Publications
Large Language Models Discriminate Against Speakers of German Dialects
Minh Duc Bui*, Carolin Holtermann*, Valentin Hofmann, Anne Lauscher, Katharina von der Wense
EMNLP (Main) 2025
Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
Mario Sanz-Guerrero, Minh Duc Bui, Katharina von der Wense
EMNLP (Main) 2025
On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures
Minh Duc Bui, Kyung Eun Park, Goran Glavaš, Fabian David Schmidt, Katharina von der Wense
ACL (Main) 2025
Multi³Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
Minh Duc Bui, Katharina von der Wense, Anne Lauscher
NAACL (Main) 2025
Outstanding Paper Award
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Minh Duc Bui, Fabian David Schmidt, Goran Glavaš, Katharina von der Wense
5th Workshop on Insights from Negative Results in NLP @ NAACL 2024
The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification
Minh Duc Bui, Katharina von der Wense
4th Workshop on Trustworthy Natural Language Processing (TrustNLP) @ NAACL 2024
JGU Mainz's Submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages
Minh Duc Bui, Katharina von der Wense
4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024) @ NAACL 2024
