Educational Q&A Analysis System
Question classification system with GPT-4 fine-tuning
Educational Q&A Classification System about Korean & Math
Project Documentation
Overview
Led a comprehensive research and development project to solve the challenge of classifying Korean and Math questions from Sidaeinzae’s TA app. This project taught me that successful AI implementation requires not just model training, but thoughtful criteria design that bridges the gap between messy real-world data and clean computational models.
What I Built
- 8-Category Classification Framework: Developed sophisticated question taxonomy (Q0-Q7) based on deep research into subject characteristics and student inquiry patterns
- Custom GPT-4 Fine-tuning Pipeline: Implemented domain-specific model training for educational Q&A analysis
- Interactive Streamlit Dashboard: Built real-time web application for live question analysis and visualization
- Korean NLP Processing Engine: Integrated advanced Korean text processing with sentiment analysis capabilities
Key Learnings & Results
Through leading this project, I gained invaluable experience in:
- Problem Definition: Transforming an “impossible” classification task (Korean literature) into a solvable framework
- Research Leadership: Conducting deep domain research into subject characteristics, assessment patterns, and student behavior
- Data Strategy: Developing labeling criteria that accurately model real-world complexity
- Model Development: Fine-tuning GPT-4 with custom training datasets
- Full-stack Development: Building production-ready web applications with Streamlit and Plotly
Breakthrough Results:
- Solved previously “impossible” Korean literature classification that was in abandoned state
- Achieved 80% accuracy on complex educational question categorization
- Processed hundreds of thousands of question samples with systematic labeling approach
- Deployed production system with real-time processing capabilities
Technologies Explored
- AI/ML Stack: OpenAI GPT-4 fine-tuning
- Web Development: Streamlit, real-time data processing
- Visualization: Plotly interactive dashboards, statistical analysis
- Data Engineering: Pandas, large-scale dataset processing
What This Project Taught Me
This experience fundamentally changed my approach to NLP and data science. I learned that the most sophisticated models are only as good as the criteria used to define the problem. The challenge wasn’t just technical implementation, but understanding the domain deeply enough to create meaningful categories that capture real-world complexity. This project reinforced my belief that successful AI solutions require both technical expertise and deep domain understanding – exactly the kind of interdisciplinary thinking I want to continue developing.