cv/resume | Mezbaur Rahman

Basics

Name	Mezbaur Rahman
Languages	English (Fluent), Bengali (Native), Hindi (Conversational)
Email	mezbaur00797@gmail.com

Work

2023.08 - Present
Research Assistant

University of Illinois Chicago

Working in the dl4nlpspace Lab under Professor Cornelia Caragea. Research focuses on NLP problems, semi-supervised learning, and learning from noisy labels using large language models.
- Developing novel methods for semi-supervised text classification with LLMs
- Implementing and evaluating methods for learning with noisy labels using LLMs
2020.01 - 2023.07
Lecturer

Islamic University of Technology

Taught undergraduate courses in Computer Science and conducted research in natural language processing and machine learning.
- Taught courses in Data Structures, Algorithms, and Programming
- Supervised undergraduate thesis projects
- Conducted research and published papers

Education

2023.08 - Present

Chicago, Illinois
Doctor of Philosophy

University of Illinois Chicago

Computer Science

CGPA (Current): 4.00/4.00
- Advisor: Cornelia Caragea
- Research Area: Natural Language Processing, Machine Learning
2016.01 - 2019.10

Dhaka, Bangladesh
Bachelor of Science

Islamic University of Technology

Computer Science and Engineering

CGPA: 3.86/4.00
- Thesis Advisor: Abu Raihan Mostafa Kamal
- Graduated with First Class Honors
- Ranked in top 10% of graduating class

Awards

2019.10

Graduated with First Class Honors

Islamic University of Technology

Skills

	NLP & ML Frameworks
	PyTorch
	Pandas
	Hugging Face
	vLLM

	Programming, Tools & Systems
	Python
	C++
	SQL
	Git
	Docker
	Kubernetes

Interests

	Research Interests
	Natural Language Processing
	Semi-supervised Learning
	Noisy Label Learning
	Large Language Models

	Hobbies
	Photography
	Traveling
	Reading
	Tennis

Publications

2025.11

LLM-Guided Co-Training for Text Classification

EMNLP 2025 (Main Conference)

This paper introduces a novel weighted co-training framework guided by Large Language Models (LLMs), in which two encoder-only networks iteratively train each other using sample weights assigned dynamically according to confidence in LLM-generated pseudo-labels. The proposed method achieves state-of-the-art results on 4 out of 5 benchmark datasets and ranks first among 14 semi-supervised learning methods according to the Friedman test, demonstrating that LLMs can serve as effective knowledge amplifiers in semi-supervised text classification.
2025.10

Semantic Label Drift in Cross-Cultural Translation

arXiv preprint (submitted to LREC 2026)

This work investigates semantic label drift that occurs during translation across culturally diverse languages. The study highlights how linguistic and cultural nuances can shift label meanings and proposes an alignment-aware evaluation framework to mitigate cross-cultural label inconsistencies.
2023.06

Multihop Factual Claim Verification Using Natural Language Prompts

CanAI 2023

This research aims to develop a strategy for claim verification using evidence sentences by employing prompt-based fine-tuning of state-of-the-art pre-trained language models. The study also focuses on designing effective language prompts for this task and investigates the increased complexity of claim validation when multiple evidence sentences are involved.
2022.12

Explainable artificial intelligence model for stroke prediction using EEG signal

Sensors

This study employs an explainable machine learning approach to predict stroke in patients using biomarker data derived from EEG signals.
2022.12

An efficient approach to automatic tag prediction from movie plot synopses using transformer-based language model

ICCIT 2022

This study aims to improve the prediction of movie tags from plot summaries by evaluating and comparing the performance of various models, including vanilla neural networks, LSTMs, and several pre-trained transformer-based language models.
2022.12

BanglaRQA: A Benchmark Dataset for Under-resourced Bangla Language Reading Comprehension-based Question Answering with Diverse Question-Answer Types

EMNLP 2022

This paper introduces a novel reading comprehension-based question-answering dataset containing 3,000 Bangla Wikipedia context passages and 14,889 question-answer pairs. The experiments in this work also improve the performance of a pre-trained transformer model, as evidenced by higher EM (exact match) and F1 scores compared to previous work on other comparable Bangla datasets.

Projects

2021.09 - Present
Projects Page

Links to the projects page of my personal website.

Volunteer

2024 - 2025.12
Organizing Committee Member

BLP Workshop @ IJCNLP-AACL 2025

Organizing the Bengali Code Generation Shared Task as part of the Bangla Language Processing Workshop
