Educational Q&A Analysis System

Educational Q&A Classification System about Korean & Math

Project Documentation

Overview

Led a comprehensive research and development project to solve the challenge of classifying Korean and Math questions from Sidaeinzae’s TA app. This project taught me that successful AI implementation requires not just model training, but thoughtful criteria design that bridges the gap between messy real-world data and clean computational models.

What I Built

8-Category Classification Framework: Developed sophisticated question taxonomy (Q0-Q7) based on deep research into subject characteristics and student inquiry patterns
Custom GPT-4 Fine-tuning Pipeline: Implemented domain-specific model training for educational Q&A analysis
Interactive Streamlit Dashboard: Built real-time web application for live question analysis and visualization
Korean NLP Processing Engine: Integrated advanced Korean text processing with sentiment analysis capabilities

Key Learnings & Results

Through leading this project, I gained invaluable experience in:

Problem Definition: Transforming an “impossible” classification task (Korean literature) into a solvable framework
Research Leadership: Conducting deep domain research into subject characteristics, assessment patterns, and student behavior
Data Strategy: Developing labeling criteria that accurately model real-world complexity
Model Development: Fine-tuning GPT-4 with custom training datasets
Full-stack Development: Building production-ready web applications with Streamlit and Plotly

Breakthrough Results:

Solved previously “impossible” Korean literature classification that was in abandoned state
Achieved 80% accuracy on complex educational question categorization
Processed hundreds of thousands of question samples with systematic labeling approach
Deployed production system with real-time processing capabilities

Technologies Explored

AI/ML Stack: OpenAI GPT-4 fine-tuning
Web Development: Streamlit, real-time data processing
Visualization: Plotly interactive dashboards, statistical analysis
Data Engineering: Pandas, large-scale dataset processing

What This Project Taught Me

This experience fundamentally changed my approach to NLP and data science. I learned that the most sophisticated models are only as good as the criteria used to define the problem. The challenge wasn’t just technical implementation, but understanding the domain deeply enough to create meaningful categories that capture real-world complexity. This project reinforced my belief that successful AI solutions require both technical expertise and deep domain understanding – exactly the kind of interdisciplinary thinking I want to continue developing.