Educational Q&A Analysis System

Question classification system with GPT-4 fine-tuning

Educational Q&A Classification System about Korean & Math

Project Documentation

Overview

Led a comprehensive research and development project to solve the challenge of classifying Korean and Math questions from Sidaeinzae’s TA app. This project taught me that successful AI implementation requires not just model training, but thoughtful criteria design that bridges the gap between messy real-world data and clean computational models.

What I Built

  • 8-Category Classification Framework: Developed sophisticated question taxonomy (Q0-Q7) based on deep research into subject characteristics and student inquiry patterns
  • Custom GPT-4 Fine-tuning Pipeline: Implemented domain-specific model training for educational Q&A analysis
  • Interactive Streamlit Dashboard: Built real-time web application for live question analysis and visualization
  • Korean NLP Processing Engine: Integrated advanced Korean text processing with sentiment analysis capabilities

Key Learnings & Results

Through leading this project, I gained invaluable experience in:

  • Problem Definition: Transforming an “impossible” classification task (Korean literature) into a solvable framework
  • Research Leadership: Conducting deep domain research into subject characteristics, assessment patterns, and student behavior
  • Data Strategy: Developing labeling criteria that accurately model real-world complexity
  • Model Development: Fine-tuning GPT-4 with custom training datasets
  • Full-stack Development: Building production-ready web applications with Streamlit and Plotly

Breakthrough Results:

  • Solved previously “impossible” Korean literature classification that was in abandoned state
  • Achieved 80% accuracy on complex educational question categorization
  • Processed hundreds of thousands of question samples with systematic labeling approach
  • Deployed production system with real-time processing capabilities

Technologies Explored

  • AI/ML Stack: OpenAI GPT-4 fine-tuning
  • Web Development: Streamlit, real-time data processing
  • Visualization: Plotly interactive dashboards, statistical analysis
  • Data Engineering: Pandas, large-scale dataset processing

What This Project Taught Me

This experience fundamentally changed my approach to NLP and data science. I learned that the most sophisticated models are only as good as the criteria used to define the problem. The challenge wasn’t just technical implementation, but understanding the domain deeply enough to create meaningful categories that capture real-world complexity. This project reinforced my belief that successful AI solutions require both technical expertise and deep domain understanding – exactly the kind of interdisciplinary thinking I want to continue developing.