Among three problem areas identified as critical for the future development of the Internet, WebFAQ aimed to address the analysis and representation of information content. More specifically, the project concentrated on access to information contained in very large, unstructured, heterogeneous repositories, on the multimodal presentation of information, and on assessing the quality of information.
Four years ago, NLLB set a milestone with MT for 200 languages. Today we present OMT: a family of models that extend support to 1600 languages while delivering competitive results in high- and mid-resource languages, with our 1B-8B models matching frontier and open 70B LLMs.
🧵(1/n)
📢 I'm organizing a BoF session at #EACL2026 called Tokenization & Beyond, aiming to gather researchers exploring tokenization and alternatives such as byte-level and pixel-based approaches. Sign up using the form if you're interested! #NLProc @eaclmeeting
New benchmark evaluates 🔎 #AI detection tools across languages, 🌍 finding performance gaps 📉 in low-resource languages and challenges ⚠️ with distinguishing AI-translated and hybrid human–AI text.
@jasonslucas1 @adaku_uchendu @penn_state @Visa