BLUFF Accepted at KDD 2026 (Datasets and Benchmarks Track)

May 15, 2026 · 2 min read

Excited to share that our paper BLUFF: Benchmarking in Low-resoUrce Languages for detecting Falsehoods and Fake news has been accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026) in the Datasets and Benchmarks Track!

BLUFF is the largest multilingual fake news detection benchmark to date, spanning:

79 languages — 20 high-resource “big-head” + 59 low-resource “long-tail”
202K+ samples combining fact-checked content from 130 IFCN-certified organizations and LLM-generated content from 19 diverse models
AXL-CoI (Adversarial Cross-Lingual Agentic Chain-of-Interactions) — a multi-agentic framework for controlled multilingual content generation
mPURIFY — a 4-stage quality filtering pipeline ensuring dataset integrity
Bidirectional translation coverage across 70+ languages with 4 prompt variants

Our experiments reveal state-of-the-art detectors suffer up to 25.3% Macro-F1 degradation on low-resource versus high-resource languages — a systematic, not marginal, gap. This is the digital language divide manifest in detection systems: the communities least represented in training data are also the least protected from AI-generated disinformation.
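For readers who want to see how a gap like this is measured, here is a minimal sketch. It computes Macro-F1 separately for each language group and takes the difference. Everything in it is an assumption for illustration: the function names, the group labels reused as dictionary keys, and the toy predictions are hypothetical and are not BLUFF data or the paper's evaluation code. The sketch reports an absolute difference in Macro-F1; this post does not specify whether the 25.3% figure is absolute or relative.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def macro_f1_gap(records):
    """records: iterable of (group, true_label, predicted_label) tuples.

    Returns per-group Macro-F1 and the absolute gap
    (big-head score minus long-tail score).
    """
    by_group = defaultdict(lambda: ([], []))
    for group, true_label, predicted_label in records:
        by_group[group][0].append(true_label)
        by_group[group][1].append(predicted_label)
    scores = {g: macro_f1(t, p) for g, (t, p) in by_group.items()}
    return scores, scores["big-head"] - scores["long-tail"]


# Toy, made-up predictions (NOT drawn from BLUFF).
toy = [
    ("big-head", "real", "real"),
    ("big-head", "real", "real"),
    ("big-head", "fake", "fake"),
    ("big-head", "fake", "fake"),
    ("long-tail", "real", "real"),
    ("long-tail", "real", "fake"),  # a real article misclassified as fake
    ("long-tail", "fake", "fake"),
    ("long-tail", "fake", "fake"),
]

scores, gap = macro_f1_gap(toy)
print(scores, gap)
```

Macro-F1 averages the per-class scores without weighting by class frequency, so a detector cannot hide poor performance on the minority class behind a large majority class.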

Deeply grateful to my co-authors: Matt Murtagh-White, Adaku Uchendu, Ali Al-Lawati, Michiharu Yamashita, Dominik Macko, Ivan Srba, Robert Moro, and my advisor Dr. Dongwon Lee. Special thanks to our collaborators at Kempelen Institute of Intelligent Technologies (KInIT) and MIT Lincoln Laboratory.

See you in Toronto for KDD 2026! 🍁

Last updated on May 19, 2026

Authors

Jason S. Lucas, Ph.D., MPH, M.Sc.

Tenure-Track Assistant Professor & Director, Secure and Ethical AI Lab (SEAL) — CU Boulder

I completed my Ph.D. in Informatics at Penn State University (defended May 2026; formal conferral August 2026), where I conducted research at the PIKE Research Lab under Dr. Dongwon Lee and the College of IST. Starting August 2026, I will join the Department of Information Science at the College of Media, Communication and Information (CMDI), University of Colorado Boulder, as a Tenure-Track Assistant Professor and founding Director of the Secure and Ethical AI Lab (SEAL). My research advances trustworthy and equitable AI for the world’s languages and communities — spanning multilingual NLP, low-resource and dialectal language technology, AI safety, and information integrity, with work extending across 70+ languages. I have authored 14+ peer-reviewed papers with 315+ citations in premier venues including ACL, EMNLP, NAACL, ICML, KDD, and IEEE.

My doctoral research focuses on bridging the digital language divide through transfer learning, classification (NLU), generation (NLG), adversarial attacks, and developing end-to-end AI pipelines using RAG and Agentic AI workflows for combating multilingual threats. Drawing from my Grenadian background and knowledge of local Creole languages, I bring a global perspective to AI challenges, working to democratize state-of-the-art AI capabilities for underserved linguistic communities worldwide. My mission is to develop robust multilingual multimodal systems and mitigate evolving security vulnerabilities while enhancing access to human language technology through cutting-edge solutions.

As an NSF LinDiv Fellow, I conduct transdisciplinary research advancing human-AI language interaction for social good. I actively mentor 5+ research interns and teach Applied Generative AI courses. Through industry experience at Lawrence Livermore National Lab, Interaction LLC, and Coalfire, I bridge academic research with practical applications in combating evolving security threats and enhancing global AI accessibility. I see multilingual advances and interdisciplinary collaboration as a competitive advantage, not a communication challenge. Beyond research, I stay active through dance, fitness, martial arts, and community service.
