Dongwon Lee

DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects

DIA-HARM evaluates 16 harmful content detection models across 50 English dialects using 195K+ samples, revealing 1.4–3.6% F1 drops for fine-tuned models and up to 27% for zero-shot …

Jason S. Lucas, Ph.D., MPH, M.Sc.

• Apr 8, 2026 • 1 min read

Multilingual NLP

BLUFF: Benchmarking in Low-resoUrce Languages for detecting Falsehoods and Fake news

BLUFF is the largest multilingual fake news detection benchmark, spanning 79 languages with 202K+ samples. It introduces AXL-CoI for adversarial generation and mPURIFY for quality …

Jason S. Lucas, Ph.D., MPH, M.Sc.

• Feb 1, 2026 • 1 min read

Large Language Models

Beyond speculation: Measuring the Growing Presence of LLM-generated texts in Multilingual Disinformation

This IEEE article provides empirical measurements of LLM-generated texts in multilingual disinformation, moving beyond speculation to analyze the growing presence and …

dominik-macko

• Jan 1, 2026 • 1 min read

Generative AI

Generative AI Disproportionately Harms Long Tail Users

This Computer article examines how Generative AI disproportionately harms longtail users, focusing on the structural inequalities that emerge when AI systems are deployed without …

barani-maung-maung

• Nov 1, 2024 • 1 min read

Generative AI

The Longtail Impact of Generative AI on Disinformation: Harmonizing Dichotomous Perspectives

This IEEE Intelligent Systems article examines the "longtail" impact of Generative AI on disinformation in high-impact events and resource-limited settings. We analyze four …

Jason S. Lucas, Ph.D., MPH, M.Sc.

• Sep 1, 2024 • 1 min read

Authorship Obfuscation in Multilingual Machine-Generated Text Detection

This research from Penn State and KiNiT benchmarks the effectiveness of 10 authorship obfuscation (AO) techniques against 37 machine-generated text (MGT) detection methods across …

dominik-macko

• Jan 5, 2024 • 1 min read

NLP

Fighting Fire with Fire - EMNLP 2023

The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation

Jason S. Lucas, Ph.D., MPH, M.Sc.

• Dec 6, 2023 • 1 min read

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

This research from Penn State and KiNiT introduces MULTITuDE, a novel multilingual dataset for detecting machine-generated text. Comprising over 74,000 authentic and …

dominik-macko

• Dec 5, 2023 • 1 min read

Adversarial ML

Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation

This research project is a collaboration with Penn State and MIT Lincoln Lab. Our study demonstrates the dual capacity of LLMs for offensive misuse and defense detection against …

Jason S. Lucas, Ph.D., MPH, M.Sc.

• Dec 5, 2023 • 1 min read

Detecting False Claims in Low-Resource Regions: A Case Study of Caribbean Islands

This paper is the first attempt to detect COVID-19 misinformation (in English, Spanish, and Haitian French) circulating in the Caribbean region, using the fact-checked claims in the …

Jason S. Lucas, Ph.D., MPH, M.Sc.

• May 1, 2022 • 1 min read
