Blaise Cruz

Samsung Research Philippines


Mabuhay! 👋

I’m a researcher at Samsung Research Philippines where I specialize in problems at the intersection of Multilinguality and Low-resource Languages.

In particular, I am interested in understanding how models behave when constrained to low-resource multilingual domains. I’ve collaborated with many talented colleagues on various topics under this umbrella.

Previously, I’ve also been affiliated with UP Diliman, DLSU CeLT, and Senti AI.

If you’re interested in collaborating or want to chat about low-resource languages, feel free to get in touch! You can reach me by email at me (at) blaisecruz (dot) com.


Jun 17, 2024 The preprint for our paper SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages is out!
Jun 12, 2024 The preprint for our paper CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark is out!
May 15, 2024 I’ll be joining the Mohammed bin Zayed University of Artificial Intelligence as a PhD student this Fall 2024!
Mar 06, 2024 The SEACrowd Data Catalogue – the main consolidated repository for all datasets collected by the SEACrowd Project – is now live!

Latest Posts

Jun 12, 2024 Welcome!

Selected Publications

  1. EMNLP
    Multilingual Large Language Models Are Not (Yet) Code-Switchers
    Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Indra Winata, and Alham Fikri Aji
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
  2. LREC
    Improving Large-scale Language Models and Resources for Filipino
    Jan Christian Blaise Cruz and Charibeth Cheng
    In Proceedings of the 13th Language Resources and Evaluation Conference, 2022
  3. WMT
    Data Processing Matters: SRPH-Konvergen AI’s Machine Translation System for WMT’21
    Lintang Sutawika and Jan Christian Blaise Cruz
    In Proceedings of the Sixth Conference on Machine Translation, 2021