About This Project

A comprehensive LLM response quality analyzer built for the GenAI Labs Challenge. It helps you understand how the temperature and top_p parameters affect response characteristics through data-driven metrics.

Quality Metrics Explained

Coherence Score

Measures logical flow and topic consistency within the response

Score range: 0-100

Algorithm:

  • Analyzes sentence transitions and word overlap between consecutive sentences
  • Detects transition words and phrases (however, therefore, furthermore, etc.)
  • Calculates semantic similarity using word overlap ratios
  • Awards bonus points for consistent paragraph structure
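To make the approach concrete, here is a minimal TypeScript sketch of a word-overlap coherence score. The function name scoreCoherence, the transition-word list, and the 70/30 weighting are illustrative assumptions, not the app's actual implementation.

```typescript
// Illustrative sketch of a word-overlap coherence score (0-100).
// The transition list and weights are assumptions, not the app's exact code.
const TRANSITIONS = ["however", "therefore", "furthermore", "moreover", "consequently", "additionally"];

function scoreCoherence(text: string): number {
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  if (sentences.length < 2) return 50; // too short to judge flow

  let overlapSum = 0;
  let transitionHits = 0;
  for (let i = 1; i < sentences.length; i++) {
    const prev = new Set(sentences[i - 1].split(/\s+/));
    const curr = sentences[i].split(/\s+/);
    // Word-overlap ratio between consecutive sentences
    const shared = curr.filter((w) => prev.has(w)).length;
    overlapSum += shared / Math.max(curr.length, 1);
    // Bonus for explicit transition words at the start of a sentence
    if (TRANSITIONS.some((t) => sentences[i].startsWith(t))) transitionHits++;
  }

  const overlapScore = (overlapSum / (sentences.length - 1)) * 70;
  const transitionScore = Math.min(transitionHits / (sentences.length - 1), 1) * 30;
  return Math.round(Math.min(overlapScore + transitionScore, 100));
}
```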

Interpretation:

80-100: Excellent logical flow with strong connections between ideas
60-79: Good coherence with mostly clear transitions
0-59: Poor coherence; ideas seem disconnected or jumpy

Completeness Score

Measures how well the response addresses all aspects of the prompt

Score range: 0-100

Algorithm:

  • Extracts key terms from the prompt (nouns, verbs, important concepts)
  • Checks whether the response addresses these key terms
  • Looks for common response patterns (examples, lists, explanations)
  • Measures depth through response patterns and structure
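A simplified TypeScript sketch of this prompt-coverage idea is shown below. The stop-word list, the 80-point coverage weighting, and the depth-bonus heuristic are assumptions made for illustration.

```typescript
// Illustrative sketch of prompt-coverage (completeness) scoring.
// Stop words, weights, and the depth heuristic are assumptions for demonstration.
const STOP_WORDS = new Set(["the", "a", "an", "is", "are", "of", "to", "and", "or", "in", "on", "for", "what", "how", "why"]);

function scoreCompleteness(prompt: string, response: string): number {
  // Extract rough key terms from the prompt: longer, non-stop-word tokens
  const keyTerms = prompt
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 3 && !STOP_WORDS.has(w));
  if (keyTerms.length === 0) return 100;

  const responseLower = response.toLowerCase();
  const covered = keyTerms.filter((t) => responseLower.includes(t)).length;
  const coverage = covered / keyTerms.length;

  // Small bonus for depth signals: examples, bulleted or numbered lists
  const depthBonus = /for example|e\.g\.|\n[-*\d]/.test(response) ? 20 : 0;
  return Math.round(Math.min(coverage * 80 + depthBonus, 100));
}
```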

Interpretation:

80-100: Thoroughly addresses all aspects of the prompt
60-79: Covers most key points from the prompt
0-59: Incomplete response; many prompt aspects not addressed

Readability Score

Measures how easy the text is to read and understand

Score range: 0-100

Algorithm:

  • Uses the Flesch Reading Ease formula (an industry-standard readability metric)
  • Measures sentence length variance and complexity
  • Evaluates vocabulary complexity through syllable counting
  • Penalizes overly long sentences (> 40 words)
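The sketch below shows how such a score could be computed in TypeScript around the standard Flesch Reading Ease formula (206.835 - 1.015 × words per sentence - 84.6 × syllables per word). The vowel-group syllable heuristic and the 5-point long-sentence penalty are simplified assumptions.

```typescript
// Illustrative Flesch Reading Ease based readability score, clamped to 0-100.
// The syllable heuristic and long-sentence penalty are simplified assumptions.
function countSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(groups ? groups.length : 1, 1);
}

function scoreReadability(text: string): number {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  if (sentences.length === 0 || words.length === 0) return 0;

  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  // Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)
  const flesch =
    206.835 -
    1.015 * (words.length / sentences.length) -
    84.6 * (syllables / words.length);

  // Penalize sentences longer than 40 words
  const longSentences = sentences.filter((s) => s.trim().split(/\s+/).length > 40).length;
  const penalty = longSentences * 5;

  return Math.round(Math.min(Math.max(flesch - penalty, 0), 100));
}
```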

Interpretation:

80-100: Very easy to read and understand
60-79: Moderately easy to read; appropriate complexity
0-59: Hard to read; overly complex or poorly structured

Length Appropriateness

Evaluates whether the response length matches the prompt's requirements

Score range: 0-100

Algorithm:

  • Estimates expected response length based on prompt complexity
  • Analyzes prompt type (question, explanation, list, etc.)
  • Adjusts expectations for different question types
  • Penalizes responses that are too short (incomplete) or too verbose
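A rough TypeScript sketch of this idea follows. The expected word counts per prompt type and the logarithmic deviation penalty are illustrative assumptions, not the app's tuned values.

```typescript
// Illustrative length-appropriateness score (0-100).
// Expected word counts per prompt type are assumptions chosen for demonstration.
function scoreLengthAppropriateness(prompt: string, response: string): number {
  const responseWords = response.split(/\s+/).filter((w) => w.length > 0).length;

  // Rough expected length based on prompt type
  let expected = 150; // default for open-ended questions
  if (/\blist\b|\benumerate\b/i.test(prompt)) expected = 100;
  if (/\bexplain\b|\bdescribe\b|\bwhy\b/i.test(prompt)) expected = 250;
  if (/\bbriefly\b|\bone sentence\b|\byes or no\b/i.test(prompt)) expected = 40;

  // Score decays as the response drifts from the expected length,
  // penalizing both incomplete and overly verbose answers symmetrically.
  const ratio = responseWords / expected;
  const deviation = Math.abs(Math.log2(Math.max(ratio, 0.01)));
  return Math.round(Math.max(100 - deviation * 40, 0));
}
```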

Interpretation:

80-100: Optimal length for the given prompt
60-79: Reasonable length; slightly too short or long
0-59: Significantly too short or excessively verbose

Structural Quality

Measures formatting, organization, and presentation quality

Score range: 0-100

Algorithm:

  • Checks for proper paragraph breaks and spacing
  • Validates list formatting and consistency
  • Looks for code blocks, headers, and structural elements
  • Evaluates punctuation consistency and balanced syntax
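Below is a minimal TypeScript sketch of such a checklist-style score. The individual point values and regex checks are assumptions chosen for demonstration, not the app's exact weighting.

```typescript
// Illustrative structural-quality score (0-100) built from simple format checks.
// Point values and regexes are assumptions for demonstration.
function scoreStructure(text: string): number {
  let score = 40; // baseline for plain, unbroken prose

  // Paragraph breaks and spacing
  if (/\n\s*\n/.test(text)) score += 20;

  // Consistent list formatting (bulleted or numbered items)
  const listItems = text.match(/^\s*([-*]|\d+\.)\s+/gm) ?? [];
  if (listItems.length >= 2) score += 15;

  // Code blocks or markdown headers
  if (/```/.test(text) || /^#{1,6}\s/m.test(text)) score += 15;

  // Balanced brackets and parentheses as a cheap syntax-consistency check
  const open = (text.match(/[([{]/g) ?? []).length;
  const close = (text.match(/[)\]}]/g) ?? []).length;
  if (open === close) score += 10;

  return Math.min(score, 100);
}
```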

Interpretation:

80-100: Well-formatted with clear structure
60-79: Good structure with minor formatting issues
0-59: Poor structure; lacks proper formatting

Key Features

Everything you need to analyze and compare LLM responses

Multiple parameter configurations per experiment
Real-time response generation with OpenAI API
Comprehensive quality metrics (5 distinct measures)
Interactive data visualization (radar & bar charts)
Side-by-side response comparison
Export functionality (JSON & CSV formats)
Experiment history and management
Persistent data storage with SQLite
Responsive design for all devices
Dark mode support

Technology Stack

Built with modern, production-ready technologies

Frontend

Next.js 16
React 19
TypeScript
Tailwind CSS 4

State Management

TanStack Query v5

UI Components

Radix UI
Framer Motion
Recharts

Backend

Next.js API Routes
SQLite (better-sqlite3)

LLM Integration

OpenAI API
Custom Mock Service

Data Export

Papa Parse (CSV)
Native JSON

GenAI Labs Challenge 2025

A full-stack demonstration of LLM parameter analysis and quality metrics

This application was developed as part of the GenAI Labs Challenge to demonstrate expertise in full-stack development, LLM integration, data analysis, and UI/UX design. The project showcases the ability to build production-ready applications that solve real-world problems in the AI space.