Mario Sanz-Guerrero


Hi, I’m Mario 👋! I am a PhD student in Natural Language Processing at Johannes Gutenberg University Mainz, supervised by Katharina von der Wense in the NALA lab. Previously, I completed my BSc in Computer Science and MSc in Artificial Intelligence. I have also worked as an AI engineer in the healthcare industry.

📚 Research Interests

I’m continually impressed by how large language models, trained on the seemingly “simple” task of next-word prediction, exhibit surprising emergent capabilities far beyond their original design. Yet this power raises pressing questions about trustworthiness: can we actually trust what these models say, and can we trace why they say it?

  • LLM Calibration 📊

    How can we make a model’s confidence a reliable signal of its actual correctness?

  • Training Data Attribution in LLMs 🔍

    Which training examples shape a model’s predictions and behaviors, and how can we trace their influence?

  • Biomedical NLP 💊

    How can we leverage LLMs to accelerate drug discovery, clinical note analysis, and literature mining?

News

Feb. 2026

📄 Our paper, “Peak Attention U-Net: Enhancing ECG delineation with attention” was accepted to the journal Biomedical Signal Processing and Control!

Nov. 2025

📄 Our paper, “Mitigating Label Length Bias in Large Language Models” was accepted to AACL 2025 (Main) in Mumbai, India 🇮🇳!

Sep. 2025

📄 Two of our papers, “Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs” and “Molecular String Representation Preferences in Pretrained LLMs”, were accepted to EMNLP 2025 (Main)! See you in Suzhou, China 🇨🇳!

Aug. 2025

📄 Our paper, “Reducing leads, enhancing wearable practicality: A comparative study of 3-lead vs. 12-lead ECG classification” was accepted to the journal Medical Engineering & Physics!

Jul. 2025

I’ll be attending ACL 2025. See you in Vienna, Austria 🇦🇹!

Selected Publications

  1. BSPC
    Peak Attention U-Net: Enhancing ECG delineation with attention
    Mario Sanz-Guerrero, Sergio González-Cabeza, Luis Piñuel, and 4 more authors
    Biomedical Signal Processing and Control, 2026
  2. AACL’25
    Mitigating Label Length Bias in Large Language Models
    Mario Sanz-Guerrero and Katharina von der Wense
    Nov 2025
  3. EMNLP’25
    Molecular String Representation Preferences in Pretrained LLMs: A Comparative Study in Zero- & Few-Shot Molecular Property Prediction
    George Arthur Baker, Mario Sanz-Guerrero, and Katharina von der Wense
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Nov 2025
  4. EMNLP’25
    Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
    Mario Sanz-Guerrero, Minh Duc Bui, and Katharina von der Wense
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Nov 2025
  5. NAACL’25 Workshop
    Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models
    Mario Sanz-Guerrero and Katharina von der Wense
    In The Sixth Workshop on Insights from Negative Results in NLP, May 2025
  6. Inteligencia Artificial
    Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending
    Mario Sanz-Guerrero and Javier Arroyo
    Inteligencia Artificial, Mar 2025