cv/resume

Basics

Name Mezbaur Rahman
Languages English (Fluent), Bengali (Native), Hindi (Conversational)
Email mezbaur00797@gmail.com

Work

  • 2023.08 - Present
    Research Assistant
    University of Illinois Chicago
    Working in the dl4nlpspace Lab under Professor Cornelia Caragea. Research focuses on NLP problems, semi-supervised learning, and learning from noisy labels using large language models.
    • Developing novel methods for semi-supervised text classification with LLMs
    • Implementing and evaluating methods for learning with noisy labels using LLMs
  • 2020.01 - 2023.07
    Lecturer
    Islamic University of Technology
    Taught undergraduate courses in Computer Science and conducted research in natural language processing and machine learning.
    • Taught courses in Data Structures, Algorithms, and Programming
    • Supervised undergraduate thesis projects
    • Conducted research and published papers

Education

  • 2023.08 - Present

    Chicago, Illinois

    Doctor of Philosophy
    University of Illinois Chicago
    Computer Science
    CGPA (Current): 4.00/4.00
    • Advisor: Cornelia Caragea
    • Research Area: Natural Language Processing, Machine Learning
  • 2016.01 - 2019.10

    Dhaka, Bangladesh

    Bachelor of Science
    Islamic University of Technology
    Computer Science and Engineering
    CGPA: 3.86/4.00

Awards

Skills

NLP & ML Frameworks
PyTorch
Pandas
Hugging Face
vLLM
Programming, Tools & Systems
Python
C++
SQL
Git
Docker
Kubernetes

Interests

Research Interests
Natural Language Processing
Semi-supervised Learning
Noisy Label Learning
Large Language Models
Hobbies
Photography
Traveling
Reading
Tennis

Publications

  • 2025.11
    LLM-Guided Co-Training for Text Classification
    EMNLP 2025 (Main Conference)
    This paper introduces a novel weighted co-training framework guided by Large Language Models (LLMs), where two encoder-only networks iteratively train each other using dynamically assigned sample weights based on confidence in LLM-generated pseudo-labels. The proposed method achieves state-of-the-art results on 4 out of 5 benchmark datasets and ranks first among 14 semi-supervised learning methods via Friedman test, demonstrating that LLMs can serve as effective knowledge amplifiers in semi-supervised text classification.
  • 2025.10
    Semantic Label Drift in Cross-Cultural Translation
    arXiv preprint (submitted to LREC 2026)
    This work investigates semantic label drift that occurs during translation across culturally diverse languages. The study highlights how linguistic and cultural nuances can shift label meanings and proposes an alignment-aware evaluation framework to mitigate cross-cultural label inconsistencies.
  • 2023.06
    Multihop Factual Claim Verification Using Natural Language Prompts
    CanAI 2023
    This research aims to develop a strategy for claim verification using evidence sentences by employing prompt-based fine-tuning of state-of-the-art pre-trained language models. The study also focuses on designing effective language prompts for this task and investigates the increased complexity of claim validation when multiple evidence sentences are involved.
  • 2022.12
    Explainable artificial intelligence model for stroke prediction using EEG signal
    Sensors
    This study employs an explainable machine learning approach to predict stroke in patients using biomarker data derived from EEG signals.
  • 2022.12
    An efficient approach to automatic tag prediction from movie plot synopses using transformer-based language model
    ICCIT 2022
    This study aims to improve the prediction of movie tags from plot summaries by evaluating and comparing the performance of various models, including vanilla neural networks, LSTMs, and several pre-trained transformer-based language models.
  • 2022.12
    BanglaRQA: A Benchmark Dataset for Under-resourced Bangla Language Reading Comprehension-based Question Answering with Diverse Question-Answer Types
    EMNLP 2022
    This paper introduces a novel reading comprehension-based question-answer dataset containing 3,000 Bangla Wikipedia context passages and 14,889 question-answer pairs. The experiments in this work also improve the performance of a pre-trained transformer model, as evidenced by higher EM (exact match) and F1 scores when compared to previous work on other comparable Bangla datasets.

Projects

  • 2021.09 - Present
    Projects Page
    Links to the projects page of my personal website.

Volunteer

  • 2024 - 2025.12
    Organizing Committee Member
    BLP Workshop @ IJCNLP-AACL 2025
    Organizing the Bengali Code Generation Shared Task as part of the Bangla Language Processing Workshop