Research Experience
Remote Research Assistant
Supervisor: Dr. Laith H. Baniata
Research Professor, Gachon University, South Korea
Email: laith@gachon.ac.kr
June 2024 - Present
1. Towards Robust Chain-of-Thought Prompting with Self-Consistency for Remote Sensing VQA: An Empirical Study Across Large Multimodal Models
- ■ Developed Zero-GeoVision, a task-specific framework that applies zero-shot prompting to draw direct answers from the pretrained knowledge of four large multimodal models (GPT-4o, Grok 3, Gemini 2.5 Pro, and Claude 3.7 Sonnet), serving as a baseline for Remote Sensing Visual Question Answering (RSVQA).
- ■ Designed CoT-GeoReason, a framework that employs chain-of-thought prompting and guides models step by step through feature detection, spatial analysis, and answer synthesis to improve reasoning transparency, using a structured prompt template that explicitly defines the reasoning steps, expected outputs, and intermediate checks for each question.
- ■ Implemented Self-GeoSense, a framework that extends CoT-GeoReason by incorporating self-consistency, generating five independent reasoning chains per question and combining their outputs through majority voting to improve robustness against ambiguous or complex inputs.
- ■ Proposed Geo-Judge, a two-stage evaluation framework: in Stage 1, a GPT-4o-mini-based LMM judge assesses reasoning coherence and answer correctness; in Stage 2, blinded human experts independently review the LMM's reasoning and answers to provide unbiased validation through careful reassessment.
- ■ Spearheaded the research, supported by the National Research Foundation of Korea (Grant No. NRF-2022R1A2C1005316), funded by the Ministry of Science and ICT.
2. Analyzing Diagnostic Reasoning of Vision-Language Models via Zero-Shot Chain-of-Thought Prompting in Medical Visual Question Answering
- ■ Developed and evaluated a zero-shot learning (ZSL) framework for medical visual question answering (MedVQA), enabling large vision–language models (Gemini 2.5 Pro, Claude 3.5 Sonnet, and GPT-4o Mini) to answer diagnostic questions on the PMC-VQA benchmark without fine-tuning.
- ■ Implemented a zero-shot chain-of-thought (Zero-CoT) reasoning approach to guide models through step-by-step logical reasoning, improving interpretability and prediction accuracy in complex medical image analysis tasks such as X-rays and MRI scans.
- ■ Conducted research supported by the National Institute of Health (NIH) project in South Korea (Project No. 2024ER080300), funded by the Basic Science Research Program through the National Research Foundation of Korea (NRF) under the grant NRF-2022R1A2C1005316.
3. Investigating the Predominance of Large Language Models in Low-Resource Bangla Language Over Transformer Models for Hate Speech Detection: A Comparative Analysis
- ■ Applied Zero-Shot Learning to large language models (GPT-3.5 Turbo and Gemini 1.5 Pro) to differentiate harmful speech from benign expressions using multiple low-resource Bengali hate speech datasets.
- ■ Integrated few-shot prompting approaches (5-shot, 10-shot, and 15-shot) for Bangla hate speech detection, providing the model with a limited number of example texts labeled as hate speech or non-hate speech to guide its predictions on subsequent texts.
- ■ Conducted research supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT under the grant NRF-2022R1A2C1005316.
4. SentimentFormer: A Transformer-Based Multi-Modal Fusion Framework for Enhanced Sentiment Analysis of Memes in the Under-Resourced Bangla Language
- ■ Implemented SentimentTextFormer, a text-based model built upon mBERT, fine-tuned to capture nuanced linguistic features and extract sentiment-related insights from Bengali text.
- ■ Developed SentimentImageFormer, an image-based transformer model using Swin Transformer with hierarchical windowed attention to classify sentiment from visual content.
- ■ Designed SentimentFormer, a hybrid model that integrates both text and image modalities by fusing SwiftFormer's visual features with mBERT's textual embeddings at an intermediate layer to enhance cross-modal interactions.
- ■ Carried out research supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT under the grant NRF-2022R1A2C1005316.
Professional Experience
AI Engineer II
Astha.IT
November 2025 - Present
Adaptive HR Assistant: Toby (Ongoing)
- ■ Implemented and deployed a Job Description Builder Agent with human-in-the-loop workflows that can create or enhance job information (descriptions, requirements, technical responsibilities) and allow HR to approve them.
- ■ Built a relational database system to store original job-required fields, original job information, and all versions of Agent-generated drafts to support HR in resuming or deleting work at any time.
- ■ Designed user-centric interfaces for HR to create or edit job postings, view comprehensive job dashboards, and track departmental comments on jobs they created.
- ■ Developed a departmental dashboard for team members to review HR-assigned jobs by accepting, rejecting, or providing feedback, and to view statistics on jobs posted, in progress, or completed.
- Tech Stack Used: Python, PostgreSQL, LangGraph, LangChain, OpenAI (GPT-4o), AWS ECR, App Runner, AWS EC2, FastAPI, React, TypeScript, Tailwind CSS
Senior Application Developer
Dexian (Bangladesh) Limited.
July 2025 - October 2025
Conversational AI Agent Platform for Large-Scale Document Interaction: ShareFlow Agent
- ■ Constructed and deployed a ReAct-based Agentic RAG system integrated with Microsoft SharePoint and user-uploaded document support, incorporating custom guardrails to enable users to build personalized agents for retrieving information from e-learning training materials, recruitment processes, and internal financial document resource pages.
- ■ Orchestrated a Multimodal OCR Agent with a custom toolset for autonomous selection and execution of text extraction, table parsing, and diagram interpretation across diverse unstructured sources (scanned PDFs, images, DOCX, flowcharts, tables, diagrams), including resilient fallback strategies and controlled retry mechanisms with rate limiting for robust, fault-tolerant, and accurate information retrieval.
- ■ Implemented session-based chat ensuring separate user-agent conversations with full history retention for context-aware interactions and a reset option to start anew.
- ■ Enabled users to input up to six leading questions to facilitate interactions, and automatically generated at least three questions based on the agent's instructions and description if the user provided fewer than three.
- ■ Integrated an update feature enabling users to modify existing agents by adding new files, editing leading questions, revising instructions and description, and deleting existing files from the agent's memory.
- ■ Designed an agent-sharing functionality enabling users to share agents publicly with all app users, with individual users, or privately with groups, with automated email notifications to keep collaborators informed.
- ■ Created a user interface to display all agents owned by or shared with the user, with a feature to delete an entire agent along with its knowledge base and associated data from the database at any time.
- ■ Developed an Action Tracking system to log API usage, capturing trigger events, execution time, input queries, and generated responses in the database for monitoring and analytics.
- ■ Took leadership of the project while guiding and supporting junior application developers, promoting their technical growth and upholding high standards of code quality and overall team output.
- ■ Achieved 63% yearly operational cost reduction by optimizing custom agent usage for 80 Sales Managers handling 50+ interactions per day, replacing the existing SharePoint Agent (Microsoft 365 Copilot Agent).
- ■ Optimized RAG query accuracy by 96%, reduced token costs by 42%, and enabled users to upload or select up to 50 files per agent, surpassing Microsoft Copilot's 20-file limit.
- Tech Stack Used: Python, LlamaIndex, LangChain, Azure OpenAI (GPT-4.1, GPT-4o, text-embedding-3-large), Azure Bot Services, Azure SQL, Azure Functions, AlloyDB for PostgreSQL (pgvector), Azure App Service, React, FastAPI, Ragas
Application Developer
Dexian (Bangladesh) Limited.
May 2024 - July 2025
Organizational Intelligence Role Placement System: Org Info
- ■ Implemented a multimodal agent to extract organizational hierarchy from organograms using tree-of-thought (ToT) prompting, incorporating multipath reasoning and Breadth-First Search (BFS) to ensure accurate role placement, and storing the hierarchy in a relational database after cross-checking with existing data and mapping it to Bullhorn records.
- ■ Designed an LLM-based agent that converts natural language queries into optimized SQL (text-to-SQL) using chain-of-thought (CoT) with self-consistency prompting, retrieves relevant organizational data, and integrates the results into the OrgChart front-end framework for hierarchical visualization.
- ■ Engineered a dynamic LLM-based agentic RAG-guided chat interface called "OrgInfo Assistant" that allowed users to interact with specific organizational hierarchies using predefined query types, specialized roles, and goals; user queries were first converted to SQL (text-to-SQL) and executed on a temporary organization-specific database, and the retrieved results were then converted to natural language (SQL-to-text).
- ■ Developed a 7-day summarization of organizational activities by extracting relevant notes on placements, submissions, and communication logs from the Bullhorn database for a targeted organization.
- ■ Set up scheduled jobs that synced organizational data from the Bullhorn database into a SQL database every 7 days, updating existing records and adding new ones.
- ■ Deployed and optimized organizational hierarchy search for Account Managers by eliminating full Bullhorn database queries, reducing search time by 92%, and enabling faster access to relevant data.
- Tech Stack Used: Python, LangChain, LangGraph, Azure OpenAI (GPT-4.1, GPT-4o-Mini, text-embedding-3-small), Prompt Engineering, Azure SQL, Azure App Service, OpenCV, React, FastAPI, WebSocket, Docker
Next-Gen Proposal Insights Automation Engine: RFPMatcher
- ■ Architected a RAG solution utilizing a temporary vector database and Chain-of-Thought prompting with 12 specialized analytical prompts to systematically extract key information, such as client details, scope of work, deliverables, and submission timelines, from Request for Proposal (RFP) documents.
- ■ Orchestrated a comprehensive master database by processing historical RFP responses using 38+ predefined questions (23 deliverable-focused and 15 experience-based), incorporating metadata such as detailed generated answers, chunk summaries, manual classification of RFP responses (winning or losing), and question categories (Deliverables or Experience), while storing vector embeddings to enable semantic search capabilities.
- ■ Designed a Past Experience Matcher system that analyzes new RFP documents by dynamically extracting three key requirements (core client needs or expectations), generates both binary questions (Yes/No: whether the company has handled similar situations before) and descriptive questions (open-ended questions to explore how the company addressed similar challenges or implemented solutions) for each chunk, then queries a master database, performs semantic similarity matching to identify relevant past experiences, and returns comprehensive, context-rich responses with filename references for verification.
- ■ Developed a Table of Contents (TOC) generation pipeline that analyzes extracted RFP key information against predefined section libraries (10 standard and 12 non-standard sections), employing GPT-4 to intelligently select and prioritize relevant sections based on project requirements.
- ■ Implemented a section-based conversational AI platform that generates initial content for each TOC section using extracted key information and past experience data, enables iterative refinement through targeted chat interactions where users can request modifications, maintains conversation history in CouchDB, and exports the final refined content to professionally formatted Word documents for proposal submission.
- ■ Facilitated the sharing of extracted key information and ensured that uploaded documents were saved to a specific SharePoint drive and made available to other users via automated email notifications, keeping the team informed and aligned.
- ■ Reduced manual review time for documents of 100+ pages by 3–5 days and accelerated decision-making through automated extraction and predictive insights, enabling Proposal Managers to focus on strategic bid development.
- Tech Stack Used: Python, LlamaIndex, Azure OpenAI (GPT-3.5 Turbo, GPT-4, text-embedding-3-large), Prompt Engineering, AlloyDB for PostgreSQL (pgvector), CouchDB, Azure App Service, React, FastAPI, DeepEval, Docker
Automated Presentation Takeaways Generator: CaseAligner
- ■ Designed and deployed an LLM-powered application that repurposes existing client-facing PowerPoint presentations for case studies, leveraging chain-of-thought prompting to transform them into new practice areas and industries, enabling Sales Managers to rapidly generate domain-specific demo presentations.
- ■ Implemented an interactive chat interface linked to separated slides, allowing users to query and modify specific slide content, with session-wise conversation history stored for reference.
- ■ Built an automated summarization pipeline to extract and structure key insights from case studies (filtered by user-specified industry and practice area), including client details, project objectives, technology stacks, challenges, benefits, implementation steps, timelines, and outcomes, and exported the summaries into downloadable Word documents for quick reference.
- ■ Developed comprehensive search functionality to locate information across all generated case studies, including references to specific slide numbers for precise navigation.
- ■ Architected an admin panel enabling authorized users to download existing presentations from the knowledge base, modify them as needed, and re-upload them to update the knowledge repository.
- ■ Added export functionality to download slides in the company's official presentation template.
- ■ Significantly accelerated demo preparation by reducing slide crafting time by approximately 90%, enabling Salespersons to focus more on client engagement and closing deals.
- Tech Stack Used: Python, LlamaIndex, Azure OpenAI (GPT-4o-Mini), React, FastAPI, Azure App Service
Smart Recruitment Analytics Tool: AgentDexi
- ■ Designed a RAG system to identify technological trends and track the top 20 most in-demand skills by extracting information from job descriptions scraped from external company postings.
- ■ Developed an automated data analysis pipeline with interactive graphs and charts to compare company-wise data, enabling Technical Recruiters to make data-driven talent acquisition decisions.
- ■ Orchestrated a web Research Agent that autonomously performs web searches using the Serper Google Search API and Tavily Search API to extract executive and company information from diverse sources, retrieving the top 10 ranked URLs with contextual preview summaries and filtering results based on relevance to support downstream data extraction and analysis tasks.
- ■ Implemented a Website Scraping Agent utilizing ScraperAPI to extract targeted data from discovered URLs, processing raw HTML/text content into structured business data, and built a custom RAG pipeline that filters contextually relevant information and generates structured outputs, including tables, organizational reports, and business intelligence summaries.
- ■ Built a Transcript Agentic RAG pipeline that ingests IT/tech video URLs, processes transcripts through semantic chunking, and produces structured, context-aware summaries to surface emerging technologies and in-demand skills.
- ■ Designed an automated data analysis pipeline with interactive graphs and charts that surface company-wise hiring patterns and role-specific demand, equipping 250+ Technical Recruiters with competitive insights to make informed, data-driven talent acquisition decisions.
- Tech Stack Used: Python, LangChain, Langfuse, CrewAI, Azure OpenAI (GPT-4o, text-embedding-3-small), ChromaDB, JobSpy, React, FastAPI
Personalized Assistant for Internal Knowledge Discovery: KnowledgeEngine
- ■ Orchestrated an LLM-based RAG system employing chain-of-thought reasoning to identify relevant internal legal documents across multiple sources and generate accurate, context-aware answers by synthesizing retrieved information.
- ■ Engineered semantic chunking of documents by storing 3–4 QA pairs with summaries per chunk, enabling targeted searches within relevant sections, reducing search space by 60%, speeding up retrieval from long documents, and improving answer precision by 95%.
- ■ Managed a session-based chat interface that delivers responses with precise document page references and maintains conversation history, including reset options.
- ■ Generated suggested questions from user-uploaded documents to guide users and help them explore related information effectively.
- ■ Developed a user interface enabling dynamic management of the RAG knowledge base, allowing users to delete or update documents in real time.
- ■ Applied a set of evaluation metrics (Context Precision, Context Recall, Response Relevancy, Faithfulness, and Factual Correctness) to assess the RAG application's retrieval accuracy and response quality on custom-created gold-standard datasets for key information extraction.
- Tech Stack Used: Python, LangChain, Azure OpenAI (GPT-4o-Mini, text-embedding-3-small), AlloyDB for PostgreSQL, Azure App Service, React, FastAPI, Ragas, Docker
