How It Works
Go from zero to production-grade evaluation in minutes. Four simple steps to rigorous LLM testing.
Connect Your Model
Point YetixAI at any LLM — OpenAI, Anthropic, open-source, or your own fine-tuned model. Just provide an API endpoint and we handle the rest. Supports streaming, batch, and async inference modes.
- One-line SDK integration
- Support for all major providers
- Custom model endpoints via REST
- Automatic rate limiting and retries
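The retry behavior in the last bullet can be sketched in a few lines. The helper below is illustrative only, not part of the YetixAI SDK, assuming exponential backoff with jitter on any failed call:

```python
import time
import random

def call_with_retries(fn, max_attempts=4, base_delay=0.5):
    """Retry fn with exponential backoff plus jitter on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff: 0.5s, 1s, 2s, ... plus random jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In practice the SDK applies this automatically around every provider call, so transient rate-limit errors never reach your code.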
from yetixai import YetixClient
client = YetixClient(api_key="your-key")
# Register your model
client.models.add(
    name="my-gpt4",
    provider="openai",
    model="gpt-4o"
)

Configure Eval Suites
Choose from built-in evaluation templates for common tasks, or define custom test suites with your own datasets, metrics, and scoring rubrics.
- Pre-built templates for QA, summarization, RAG
- Custom metrics with Python or YAML
- Dataset upload via CSV, JSON, or API
- LLM-as-judge configuration
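A custom Python metric can be as simple as a function that scores one test case. The shape below is a sketch (the `case` dict keys and `suite_score` aggregator are illustrative, not the SDK's actual metric interface):

```python
def exact_match(case):
    """Score one test case: 1.0 if the model output matches the
    expected answer after normalization, else 0.0."""
    predicted = case["output"].strip().lower()
    expected = case["expected"].strip().lower()
    return 1.0 if predicted == expected else 0.0

def suite_score(cases):
    """Aggregate per-case scores into a 0-100 suite score."""
    if not cases:
        return 0.0
    return 100.0 * sum(exact_match(c) for c in cases) / len(cases)
```

A suite's threshold (like the `threshold=90` in the snippet below) is then just a pass bar applied to this aggregate score.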
# Use a built-in suite
suite = client.suites.get("hallucination-v2")
# Or define your own
suite = client.suites.create(
    name="my-qa-tests",
    dataset="./test_cases.json",
    metrics=["accuracy", "relevance"],
    threshold=90
)

Run Evaluations
Execute evaluation runs on demand, on a schedule, or triggered by CI/CD events. Test across thousands of prompts in parallel with detailed per-case results.
- Parallel execution across test cases
- Scheduled and event-triggered runs
- Real-time progress monitoring
- Automatic result caching
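Conceptually, parallel execution with result caching looks like the sketch below, using Python's standard thread pool. The `run_suite` helper and its signature are hypothetical, for illustration only:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_suite(model_fn, prompts, max_workers=8, cache=None):
    """Evaluate prompts in parallel, reusing cached results when present."""
    cache = cache if cache is not None else {}
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for p in prompts:
            if p in cache:
                results[p] = cache[p]  # cache hit: skip the model call
            else:
                futures[pool.submit(model_fn, p)] = p
        # Report progress as each in-flight case finishes
        for done, fut in enumerate(as_completed(futures), start=1):
            p = futures[fut]
            results[p] = cache[p] = fut.result()
            print(f"{done}/{len(futures)} completed")
    return results
```

Caching means a re-run after a failed case only re-evaluates what actually changed.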
# Run evaluation
results = client.evaluate(
    model="my-gpt4",
    suite="my-qa-tests"
)

# Stream progress
for update in results.stream():
    print(f"{update.completed}/{update.total}")

Analyze & Improve
Review dashboards, drill into failures, compare model versions, and track quality trends over time. Export reports, set alerts, and ship better models faster.
- Interactive failure analysis
- Model version comparison
- Trend charts and regression alerts
- Export to PDF, CSV, or API
# Check results
print(f"Score: {results.score}%")
print(f"Passed: {results.passed}/{results.total}")
# Fail CI on regression
assert results.score >= 90
# Export report
results.export("report.pdf")
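The CI assertion above generalizes to version-to-version comparison: diff per-metric scores between a baseline run and a candidate run, and alert on drops. A minimal sketch, assuming each run's scores are available as a metric-to-score dict (the `regression_report` helper is hypothetical, not SDK API):

```python
def regression_report(baseline, candidate, tolerance=1.0):
    """Compare per-metric scores of two runs; flag any metric whose
    score dropped by more than `tolerance` points."""
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is not None and base_score - new_score > tolerance:
            regressions[metric] = (base_score, new_score)
    return regressions
```

An empty report means the candidate is safe to ship; a non-empty one can fail the pipeline or fire an alert.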