About This Project

A comprehensive LLM response quality analyzer built for the GenAI Labs Challenge. It helps you understand how the temperature and top_p parameters affect response characteristics through data-driven metrics.

Quality Metrics Explained

Coherence Score

Measures logical flow and topic consistency within the response

Score range: 0-100

Algorithm:

  • Analyzes sentence transitions and word overlap between consecutive sentences
  • Detects transition words and phrases (however, therefore, furthermore, etc.)
  • Calculates semantic similarity using word overlap ratios
  • Awards bonus points for consistent paragraph structure
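To make the approach concrete, here is a minimal TypeScript sketch of a word-overlap coherence score. The function name scoreCoherence, the transition-word list, and the 70/30 weighting are illustrative assumptions, not the app's actual implementation.

```typescript
// Illustrative sketch of a word-overlap coherence score (0-100).
// The transition list and weights are assumptions, not the app's exact code.
const TRANSITIONS = ["however", "therefore", "furthermore", "moreover", "consequently", "additionally"];

function scoreCoherence(text: string): number {
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  if (sentences.length < 2) return 50; // too short to judge flow

  let overlapSum = 0;
  let transitionHits = 0;
  for (let i = 1; i < sentences.length; i++) {
    const prev = new Set(sentences[i - 1].split(/\s+/));
    const curr = sentences[i].split(/\s+/);
    // Word-overlap ratio between consecutive sentences
    const shared = curr.filter((w) => prev.has(w)).length;
    overlapSum += shared / Math.max(curr.length, 1);
    // Bonus for explicit transition words at the start of a sentence
    if (TRANSITIONS.some((t) => sentences[i].startsWith(t))) transitionHits++;
  }

  const overlapScore = (overlapSum / (sentences.length - 1)) * 70;
  const transitionScore = Math.min(transitionHits / (sentences.length - 1), 1) * 30;
  return Math.round(Math.min(overlapScore + transitionScore, 100));
}
```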

Interpretation:

80-100: Excellent logical flow with strong connections between ideas
60-79: Good coherence with mostly clear transitions
0-59: Poor coherence; ideas seem disconnected or jumpy

Completeness Score

Measures how well the response addresses all aspects of the prompt

Score range: 0-100

Algorithm:

  • Extracts key terms from the prompt (nouns, verbs, important concepts)
  • Checks whether the response addresses these key terms
  • Looks for common response patterns (examples, lists, explanations)
  • Measures depth through response patterns and structure
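A simplified TypeScript sketch of this prompt-coverage idea is shown below. The stop-word list, the 80-point coverage weighting, and the depth-bonus heuristic are assumptions made for illustration.

```typescript
// Illustrative sketch of prompt-coverage (completeness) scoring.
// Stop words, weights, and the depth heuristic are assumptions for demonstration.
const STOP_WORDS = new Set(["the", "a", "an", "is", "are", "of", "to", "and", "or", "in", "on", "for", "what", "how", "why"]);

function scoreCompleteness(prompt: string, response: string): number {
  // Extract rough key terms from the prompt: longer, non-stop-word tokens
  const keyTerms = prompt
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 3 && !STOP_WORDS.has(w));
  if (keyTerms.length === 0) return 100;

  const responseLower = response.toLowerCase();
  const covered = keyTerms.filter((t) => responseLower.includes(t)).length;
  const coverage = covered / keyTerms.length;

  // Small bonus for depth signals: examples, bulleted or numbered lists
  const depthBonus = /for example|e\.g\.|\n[-*\d]/.test(response) ? 20 : 0;
  return Math.round(Math.min(coverage * 80 + depthBonus, 100));
}
```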

Interpretation:

80-100: Thoroughly addresses all aspects of the prompt
60-79: Covers most key points from the prompt
0-59: Incomplete response; many prompt aspects not addressed

Readability Score

Measures how easy the text is to read and understand

Score range: 0-100

Algorithm:

  • Uses the Flesch Reading Ease formula (an industry-standard readability metric)
  • Measures sentence length variance and complexity
  • Evaluates vocabulary complexity through syllable counting
  • Penalizes overly long sentences (> 40 words)
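The sketch below shows how such a score could be computed in TypeScript around the standard Flesch Reading Ease formula (206.835 - 1.015 × words per sentence - 84.6 × syllables per word). The vowel-group syllable heuristic and the 5-point long-sentence penalty are simplified assumptions.

```typescript
// Illustrative Flesch Reading Ease based readability score, clamped to 0-100.
// The syllable heuristic and long-sentence penalty are simplified assumptions.
function countSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(groups ? groups.length : 1, 1);
}

function scoreReadability(text: string): number {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  if (sentences.length === 0 || words.length === 0) return 0;

  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  // Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)
  const flesch =
    206.835 -
    1.015 * (words.length / sentences.length) -
    84.6 * (syllables / words.length);

  // Penalize sentences longer than 40 words
  const longSentences = sentences.filter((s) => s.trim().split(/\s+/).length > 40).length;
  const penalty = longSentences * 5;

  return Math.round(Math.min(Math.max(flesch - penalty, 0), 100));
}
```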

Interpretation:

80-100: Very easy to read and understand
60-79: Moderately easy to read; appropriate complexity
0-59: Hard to read; overly complex or poorly structured

Length Appropriateness

Evaluates whether the response length matches the prompt's requirements

Score range: 0-100

Algorithm:

  • Estimates expected response length based on prompt complexity
  • Analyzes prompt type (question, explanation, list, etc.)
  • Adjusts expectations for different question types
  • Penalizes responses that are too short (incomplete) or too verbose
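A rough TypeScript sketch of this idea follows. The expected word counts per prompt type and the logarithmic deviation penalty are illustrative assumptions, not the app's tuned values.

```typescript
// Illustrative length-appropriateness score (0-100).
// Expected word counts per prompt type are assumptions chosen for demonstration.
function scoreLengthAppropriateness(prompt: string, response: string): number {
  const responseWords = response.split(/\s+/).filter((w) => w.length > 0).length;

  // Rough expected length based on prompt type
  let expected = 150; // default for open-ended questions
  if (/\blist\b|\benumerate\b/i.test(prompt)) expected = 100;
  if (/\bexplain\b|\bdescribe\b|\bwhy\b/i.test(prompt)) expected = 250;
  if (/\bbriefly\b|\bone sentence\b|\byes or no\b/i.test(prompt)) expected = 40;

  // Score decays as the response drifts from the expected length,
  // penalizing both incomplete and overly verbose answers symmetrically.
  const ratio = responseWords / expected;
  const deviation = Math.abs(Math.log2(Math.max(ratio, 0.01)));
  return Math.round(Math.max(100 - deviation * 40, 0));
}
```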

Interpretation:

80-100: Optimal length for the given prompt
60-79: Reasonable length; slightly too short or long
0-59: Significantly too short or excessively verbose

Structural Quality

Measures formatting, organization, and presentation quality

Score range: 0-100

Algorithm:

  • Checks for proper paragraph breaks and spacing
  • Validates list formatting and consistency
  • Looks for code blocks, headers, and structural elements
  • Evaluates punctuation consistency and balanced syntax
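Below is a minimal TypeScript sketch of such a checklist-style score. The individual point values and regex checks are assumptions chosen for demonstration, not the app's exact weighting.

```typescript
// Illustrative structural-quality score (0-100) built from simple format checks.
// Point values and regexes are assumptions for demonstration.
function scoreStructure(text: string): number {
  let score = 40; // baseline for plain, unbroken prose

  // Paragraph breaks and spacing
  if (/\n\s*\n/.test(text)) score += 20;

  // Consistent list formatting (bulleted or numbered items)
  const listItems = text.match(/^\s*([-*]|\d+\.)\s+/gm) ?? [];
  if (listItems.length >= 2) score += 15;

  // Code blocks or markdown headers
  if (/```/.test(text) || /^#{1,6}\s/m.test(text)) score += 15;

  // Balanced brackets and parentheses as a cheap syntax-consistency check
  const open = (text.match(/[([{]/g) ?? []).length;
  const close = (text.match(/[)\]}]/g) ?? []).length;
  if (open === close) score += 10;

  return Math.min(score, 100);
}
```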

Interpretation:

80-100: Well-formatted with clear structure
60-79: Good structure with minor formatting issues
0-59: Poor structure; lacks proper formatting

Key Features

Everything you need to analyze and compare LLM responses

Multiple parameter configurations per experiment
Real-time response generation with OpenAI API
Comprehensive quality metrics (5 distinct measures)
Interactive data visualization (radar & bar charts)
Side-by-side response comparison
Export functionality (JSON & CSV formats)
Experiment history and management
Persistent data storage with SQLite
Responsive design for all devices
Dark mode support

Technology Stack

Built with modern, production-ready technologies

Frontend

Next.js 16
React 19
TypeScript
Tailwind CSS 4

State Management

TanStack Query v5

UI Components

Radix UI
Framer Motion
Recharts

Backend

Next.js API Routes
SQLite (better-sqlite3)

LLM Integration

OpenAI API
Custom Mock Service

Data Export

Papa Parse (CSV)
Native JSON

GenAI Labs Challenge 2025

A full-stack demonstration of LLM parameter analysis and quality metrics

This application was developed as part of the GenAI Labs Challenge to demonstrate expertise in full-stack development, LLM integration, data analysis, and UI/UX design. The project showcases the ability to build production-ready applications that solve real-world problems in the AI space.