Eylon Caplan




Hello! I am a third-year Ph.D. student in the Department of Computer Science at Purdue University, advised by Prof. Dan Goldwasser. My work centers on using natural language processing to understand and draw conclusions from large amounts of unstructured data. In particular, I've built scalable and interpretable NLP systems that reason about human behavior, beliefs, emotions, and values as expressed in noisy real-world corpora—especially social media.

Recently, I developed ConceptCarve, a framework for identifying how abstract social concepts are expressed across communities by combining language model reasoning with scalable retrieval. I introduced Splits!, a large Reddit-based dataset with demographic and topical annotations that allows investigation of how different demographic groups communicate about shared topics. I also investigated social concepts in a multimodal setting, developing VIBE, a benchmark for evaluating how well vision-language models (VLMs) interpret visual cues in videos. From a technical standpoint, my work has extensively involved large-scale text and video data processing, retrieval, reranking, text clustering, and dataset design, collection, annotation, and validation across multiple modalities.

Before coming to Purdue, I earned my B.Sc. in Computer Science and Mathematics from the University of Nebraska-Lincoln. There, I worked with Prof. Stephen Scott on continuous-layered neural architectures guided by integral equations, and with Prof. M. R. Hasan on improving classification performance for rare classes.




News

  • Aug 2025 — Our paper, VIBE: Can a VLM Read the Room? has been accepted to Findings of EMNLP 2025! (arXiv).
  • Jul 2025 — Submitted Splits! A Flexible Dataset and Evaluation Framework for Sociocultural Linguistic Investigation (under review) (arXiv).
  • May 2025 — Submitted VIBE: Can a VLM Read the Room? (under review) (arXiv).
  • May 2025 — Our paper, ConceptCarve: Dynamic Realization of Evidence, has been accepted to the ACL 2025 Main Conference! (link).
  • Jan 2025 — Released ACL Searcher, an open-source semantic search tool for ACL paper abstracts using ColBERT (GitHub).
  • Dec 2024 — Submitted ConceptCarve: Dynamic Realization of Evidence to ACL 2025 (under review) (arXiv).
  • Aug 2023 — Started my Ph.D. in Computer Science at Purdue University (advisor: Dan Goldwasser).
  • Aug 2020 — Completed a second UCARE undergraduate research fellowship at UNL with Prof. Stephen Scott. Presented Continuous-Layered Dense Artificial Neural Networks at the 2020 Virtual UCARE Symposium.
  • Aug 2019 — Completed a UCARE undergraduate research fellowship at UNL with Prof. M. R. Hasan. Presented Improving Accuracy of Rare Classes in Machine Learning Classifiers at the 2019 Virtual UCARE Symposium, and was featured in a UNL article.