Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
Posts
🚧 Under Construction!! 🚧
Published:
🚧 Under Construction!! 🚧
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
publications
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Published in 5th Workshop on Insights from Negative Results in NLP, 2024
Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down the throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens), and under a fixed computation budget, smaller models are able to process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget. […]
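As a rough back-of-the-envelope illustration of this hypothesis (a sketch, not taken from the paper), one can compare how many tokens a student can process from scratch versus with KD under the same FLOP budget, using the common approximations of about 6ND FLOPs for training and 2ND FLOPs for a forward pass; the budget and model sizes below are illustrative assumptions.

```python
# Illustrative sketch (assumed approximations, not from the paper):
# training a model of N parameters on T tokens costs ~6*N*T FLOPs,
# and the teacher forward pass in KD adds ~2*N_teacher FLOPs per token.

def tokens_from_scratch(budget_flops: float, n_student: float) -> float:
    """Tokens the student can process when pretraining from scratch."""
    return budget_flops / (6 * n_student)

def tokens_with_kd(budget_flops: float, n_student: float, n_teacher: float) -> float:
    """Tokens the student can process when each step also runs the teacher forward pass."""
    return budget_flops / (6 * n_student + 2 * n_teacher)

if __name__ == "__main__":
    budget = 1e21        # hypothetical fixed compute budget (FLOPs)
    n_student = 125e6    # hypothetical student size (parameters)
    n_teacher = 1.3e9    # hypothetical teacher size (parameters)

    scratch = tokens_from_scratch(budget, n_student)
    kd = tokens_with_kd(budget, n_student, n_teacher)
    print(f"from scratch: {scratch:.3e} tokens")
    print(f"with KD:      {kd:.3e} tokens ({kd / scratch:.1%} of from-scratch)")
```

Under these (assumed) constants, the teacher's extra forward pass cuts the number of tokens the student sees within the same budget, which is the trade-off the paper examines.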
The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification
Published in 4th Workshop on Trustworthy Natural Language Processing (TrustNLP) at NAACL, 2024
Current natural language processing (NLP) research tends to focus on only one or, less frequently, two dimensions - e.g., performance, privacy, fairness, or efficiency - at a time, which may lead to suboptimal conclusions and often overlooks the broader goal of achieving trustworthy NLP. Work on adapter modules focuses on improving performance and efficiency, with no investigation of unintended consequences on other aspects such as fairness. To address this gap, we conduct experiments on three text classification datasets by either (1) finetuning all parameters or (2) using adapter modules. […]
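For readers unfamiliar with adapter modules, a minimal bottleneck-adapter sketch in PyTorch is shown below (illustrative only; the paper's exact adapter configuration is not specified here): small down- and up-projection layers with a residual connection are added to a frozen pretrained model, so only the adapter parameters are trained.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Illustrative usage: hidden states from a (frozen) encoder pass through the adapter;
# only the adapter (and a task head) would be updated during training.
hidden = torch.randn(8, 128, 768)            # (batch, seq_len, hidden_size)
adapter = BottleneckAdapter(hidden_size=768)
out = adapter(hidden)
print(out.shape)                              # torch.Size([8, 128, 768])
```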
Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
Published in Preprint, 2024
[β¦], we create the first multimodal and multilingual parallel hate speech dataset, annotated by a multicultural set of annotators, called Multi3Hate. [β¦] conduct experiments with 5 large VLMs in a zero-shot setting, finding that these models align more closely with annotations from the US than with those from other cultures, even when the memes and prompts are presented in the dominant language of the other culture.
talks
🚧 Under Construction!! 🚧
Published:
🚧 Under Construction!! 🚧