Publications
More details about my publications can be found on my Google Scholar profile.
2025
- Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending. Mario Sanz-Guerrero and Javier Arroyo. Inteligencia Artificial, Mar 2025.
Peer-to-peer (P2P) lending connects borrowers and lenders through online platforms but suffers from significant information asymmetry, as lenders often lack sufficient data to assess borrowers’ creditworthiness. This paper addresses this challenge by leveraging BERT, a Large Language Model (LLM) known for its ability to capture contextual nuances in text, to generate a risk score based on borrowers’ loan descriptions using a dataset from the Lending Club platform. We fine-tune BERT to distinguish between defaulted and non-defaulted loans using the loan descriptions provided by the borrowers. The resulting BERT-generated risk score is then integrated as an additional feature into an XGBoost classifier used at the loan granting stage, where decision-makers have limited information available to guide their decisions. This integration enhances predictive performance, with improvements in balanced accuracy and AUC, highlighting the value of textual features in complementing traditional inputs. Moreover, we find that the incorporation of the BERT score alters how classification models utilize traditional input variables, with these changes varying by loan purpose. These findings suggest that BERT discerns meaningful patterns in loan descriptions, encompassing borrower-specific features, specific purposes, and linguistic characteristics. However, the inherent opacity of LLMs and their potential biases underscore the need for transparent frameworks to ensure regulatory compliance and foster trust. Overall, this study demonstrates how LLM-derived insights interact with traditional features in credit risk modeling, opening new avenues to enhance the explainability and fairness of these models.
@article{sanz-guerrero2025credit,
  title   = {Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending},
  author  = {Sanz-Guerrero, Mario and Arroyo, Javier},
  year    = {2025},
  month   = mar,
  journal = {Inteligencia Artificial},
  volume  = {28},
  number  = {75},
  pages   = {220--247},
  url     = {https://journal.iberamia.org/index.php/intartif/article/view/1890},
  doi     = {10.4114/intartif.vol28iss75pp220-247},
}
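As an illustration of the pipeline described in the abstract above, here is a minimal sketch of how a BERT model fine-tuned on loan descriptions can supply a default-probability "risk score" that is then appended as a feature for an XGBoost classifier. It assumes Hugging Face transformers/datasets and xgboost; the checkpoint (bert-base-uncased), column names (desc, default), and hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Sketch: BERT-derived risk score as an extra feature for an XGBoost classifier.
# Column names (desc, default) and all hyperparameters are illustrative.
import numpy as np
import pandas as pd
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from xgboost import XGBClassifier

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint

def fine_tune_bert(train_df: pd.DataFrame):
    """Fine-tune BERT to distinguish defaulted vs. non-defaulted loans from their descriptions."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    ds = Dataset.from_pandas(train_df[["desc", "default"]].rename(columns={"default": "label"}))
    ds = ds.map(lambda x: tokenizer(x["desc"], truncation=True,
                                    padding="max_length", max_length=256), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="bert-risk", num_train_epochs=2,
                               per_device_train_batch_size=16),
        train_dataset=ds,
    )
    trainer.train()
    return tokenizer, model

def bert_risk_score(texts, tokenizer, model) -> np.ndarray:
    """Predicted probability of default, used as the text-based risk indicator."""
    enc = tokenizer(list(texts), truncation=True, padding=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[:, 1].numpy()

def train_xgb(train_df: pd.DataFrame, tokenizer, model, tabular_cols):
    """Combine the traditional (tabular) features with the BERT score and fit XGBoost."""
    X = train_df[tabular_cols].copy()
    X["bert_risk_score"] = bert_risk_score(train_df["desc"], tokenizer, model)
    clf = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="auc")
    clf.fit(X, train_df["default"])
    return clf
```

At scoring time the same bert_risk_score function would be applied to the description of each new loan application so that the classifier sees the identical feature set it was trained on.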
- Asking Again and Again: Exploring LLM Robustness to Repeated Questions. Sagi Shaier, Mario Sanz-Guerrero, and Katharina von der Wense. arXiv preprint 2412.07923, Mar 2025.
This study investigates whether repeating questions within prompts influences the performance of large language models (LLMs). We hypothesize that reiterating a question within a single prompt might enhance the model’s focus on key elements of the query. We evaluate five recent LLMs – including GPT-4o-mini, DeepSeek-V3, and smaller open-source models – on three reading comprehension datasets under different prompt settings, varying question repetition levels (1, 3, or 5 times per prompt). Our results demonstrate that question repetition can increase models’ accuracy by up to 6%. However, across all models, settings, and datasets, we do not find the result statistically significant. These findings provide insights into prompt design and LLM behavior, suggesting that repetition alone does not significantly impact output quality.
@misc{shaier2025asking,
  title         = {Asking Again and Again: Exploring LLM Robustness to Repeated Questions},
  author        = {Shaier, Sagi and Sanz-Guerrero, Mario and {von der Wense}, Katharina},
  year          = {2025},
  month         = mar,
  eprint        = {2412.07923},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CL},
  url           = {https://arxiv.org/abs/2412.07923},
  doi           = {10.48550/arXiv.2412.07923},
}
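To make the prompt manipulation studied above concrete, here is a minimal sketch in which the same question is inserted 1, 3, or 5 times into a single reading-comprehension prompt before querying a chat model. The template wording and the use of the OpenAI Python client with gpt-4o-mini are assumptions for illustration; the paper's exact prompts and evaluation harness may differ.

```python
# Sketch: repeating the question k times inside one prompt (k = 1, 3, or 5 in the paper).
# The prompt template is illustrative, not the paper's exact wording.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_prompt(context: str, question: str, repetitions: int) -> str:
    """Compose one prompt in which the question appears `repetitions` times."""
    repeated = "\n".join([f"Question: {question}"] * repetitions)
    return (
        "Answer the question using only the passage below.\n\n"
        f"Passage: {context}\n\n"
        f"{repeated}\n\n"
        "Answer:"
    )

def ask(context: str, question: str, repetitions: int) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # one of the models evaluated in the paper
        messages=[{"role": "user", "content": build_prompt(context, question, repetitions)}],
        temperature=0,
    )
    return response.choices[0].message.content

# Compare the three repetition settings on the same item.
for k in (1, 3, 5):
    print(k, ask("The Eiffel Tower was completed in 1889.",
                 "When was the Eiffel Tower completed?", k))
```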
- Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models. Mario Sanz-Guerrero and Katharina von der Wense. In The Sixth Workshop on Insights from Negative Results in NLP (NAACL 2025), May 2025.
In-context learning (ICL) has transformed the use of large language models (LLMs) for NLP tasks, enabling few-shot learning by conditioning on labeled examples without finetuning. Despite its effectiveness, ICL is prone to errors, especially for challenging examples. With the goal of improving the performance of ICL, we propose *corrective in-context learning* (CICL), an approach that incorporates a model’s incorrect predictions alongside ground truth corrections into the prompt, aiming to enhance classification accuracy through self-correction. However, contrary to our hypothesis, extensive experiments on text classification tasks demonstrate that CICL consistently underperforms standard ICL, with performance degrading as the proportion of corrections in the prompt increases. Our findings indicate that CICL introduces confusion by disrupting the model’s task understanding, rather than refining its predictions. Additionally, we observe that presenting harder examples in standard ICL does not improve performance, suggesting that example difficulty alone may not be a reliable criterion for effective selection. By presenting these negative results, we provide important insights into the limitations of self-corrective mechanisms in LLMs and offer directions for future research.
@inproceedings{cicl_2025,
  title     = {Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models},
  author    = {Sanz-Guerrero, Mario and {von der Wense}, Katharina},
  booktitle = {The Sixth Workshop on Insights from Negative Results in NLP},
  month     = may,
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.insights-1.4/},
  doi       = {10.18653/v1/2025.insights-1.4},
  pages     = {24--33},
  isbn      = {979-8-89176-240-4},
}
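The following minimal sketch illustrates how a corrective demonstration (CICL) differs from a standard ICL demonstration: for some in-context examples, the model's earlier incorrect prediction is shown together with the ground-truth correction. The demonstration template and the example sentences are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch: building a corrective in-context learning (CICL) prompt for text classification.
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    gold_label: str
    model_prediction: str | None = None  # a previous (possibly wrong) prediction, if available

def format_demo(ex: Example, corrective: bool) -> str:
    """Standard ICL shows text + gold label; CICL additionally shows the model's earlier
    incorrect prediction followed by the ground-truth correction."""
    if corrective and ex.model_prediction is not None and ex.model_prediction != ex.gold_label:
        return (f"Text: {ex.text}\n"
                f"Incorrect prediction: {ex.model_prediction}\n"
                f"Correct label: {ex.gold_label}")
    return f"Text: {ex.text}\nLabel: {ex.gold_label}"

def build_prompt(demos: list[Example], query: str, corrective: bool) -> str:
    shots = "\n\n".join(format_demo(d, corrective) for d in demos)
    return f"{shots}\n\nText: {query}\nLabel:"

demos = [
    Example("The plot was dull and the acting worse.", "negative", model_prediction="positive"),
    Example("A warm, funny and moving film.", "positive"),
]
print(build_prompt(demos, "I would happily watch it again.", corrective=True))
```

Varying the share of demonstrations that carry a correction corresponds to the "proportion of corrections in the prompt" that the experiments above sweep over.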
- Reducing leads, enhancing wearable practicality: A comparative study of 3-lead vs. 12-lead ECG classification. Sergio González-Cabeza, Mario Sanz-Guerrero, Luis Piñuel, and 4 more authors. Medical Engineering & Physics, Nov 2025.
Inspired by recent advances in clinical research and the growing adoption of wearable ECG devices, this study explores the feasibility of using reduced-lead ECGs for automated detection of heart anomalies using deep learning, providing a more accessible and cost-effective alternative to traditional 12-lead ECGs. This research adapts and evaluates a state-of-the-art 12-lead deep learning model (from Ribeiro et al. [1]) for 3-lead configurations. The 12-lead ECG model architecture was trained from scratch on the public database PTB-XL. It was then modified to use 3 leads by only changing the input layer. Despite a 75% reduction in input data, the 3-lead model showed only a subtle 3% performance drop. To address this gap, the 3-lead model was further optimized using a novel strategy that combines transfer learning and a One-vs-All classification approach. Using PTB-XL’s five-class setup (normal vs. four pathologies: myocardial infarction, ST/T change, conduction disturbance, and hypertrophy), we report the micro-averaged F1-score across all test samples. The new optimized 3-lead model achieves a global (micro-averaged) F1-score of 77% (vs. 78% for the 12-lead model). These findings highlight the potential of simplified and cost-effective reduced-lead classification models to deliver near-equivalent diagnostic accuracy. This advancement could democratize access to early cardiac diagnostics, particularly in resource-limited settings.
@article{3_leads_2025,
  title    = {Reducing leads, enhancing wearable practicality: A comparative study of 3-lead vs. 12-lead ECG classification},
  author   = {González-Cabeza, Sergio and Sanz-Guerrero, Mario and Piñuel, Luis and {Buelga Suárez}, Mauro Luis and {Alonso Salinas}, Gonzalo Luis and Diaz-Vicente, Marian and Recas, Joaquín},
  journal  = {Medical Engineering \& Physics},
  volume   = {145},
  pages    = {104419},
  year     = {2025},
  month    = nov,
  issn     = {1350-4533},
  doi      = {10.1016/j.medengphy.2025.104419},
  url      = {https://www.sciencedirect.com/science/article/pii/S1350453325001389},
  keywords = {Deep learning, Electrocardiography, One-vs-All classification, Reduced-lead ECG, Transfer learning},
}
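A minimal PyTorch sketch of the adaptation strategy described above: only the input convolution is changed from 12 to 3 leads, the remaining pretrained weights are reused (transfer learning), and each pathology gets its own binary One-vs-All head. The backbone below is a generic 1-D CNN stand-in, not the residual network of Ribeiro et al. used in the paper; layer sizes and names are assumptions.

```python
# Sketch: 12-lead -> 3-lead adaptation by swapping only the input layer, plus One-vs-All heads.
import torch
import torch.nn as nn

class ECGBackbone(nn.Module):
    def __init__(self, in_leads: int, hidden: int = 64):
        super().__init__()
        self.input_conv = nn.Conv1d(in_leads, hidden, kernel_size=15, padding=7)
        self.body = nn.Sequential(  # stand-in for the pretrained residual blocks
            nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=15, padding=7),
            nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )

    def forward(self, x):  # x: (batch, leads, samples)
        return self.body(self.input_conv(x))

def adapt_to_3_leads(pretrained_12_lead: ECGBackbone) -> ECGBackbone:
    """Keep every pretrained weight except the input convolution, which is re-initialised for 3 leads."""
    model3 = ECGBackbone(in_leads=3)
    state = {k: v for k, v in pretrained_12_lead.state_dict().items()
             if not k.startswith("input_conv")}
    model3.load_state_dict(state, strict=False)
    return model3

# One-vs-All: one independent binary head per pathology
# (myocardial infarction, ST/T change, conduction disturbance, hypertrophy).
pathologies = ["MI", "STTC", "CD", "HYP"]
backbone = adapt_to_3_leads(ECGBackbone(in_leads=12))  # the 12-lead model would be pretrained in practice
heads = nn.ModuleDict({p: nn.Linear(64, 1) for p in pathologies})

ecg = torch.randn(8, 3, 4096)                            # batch of 3-lead recordings
features = backbone(ecg)
logits = {p: heads[p](features) for p in pathologies}    # each head trained with BCEWithLogitsLoss
```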