Choose this when
Your AI feature already has users, stakeholders, or enough risk that mistakes matter.
AI learning path
Stop judging AI quality by vibes and start building repeatable checks.
You can define task examples, expected behavior, graders, traces, regressions, and review workflows.
Move on when quality discussions point to examples and metrics, not taste.
Do
This is the route through the topic. Watch and open the material inside the step where it is used.
Step 1
Turn real user tasks, edge cases, and failures into a small eval set.
Video · Weights & Biases
Introduces evaluation workflows and measurement for LLM apps.
Guide · Hamel Husain · Intermediate
Your AI app needs quality checks before users see it.
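One way to picture a small eval set for Step 1 is as a list of labeled cases saved as JSONL. This is only a sketch: the `EvalCase` fields, the three `source` labels, and the sample rows are illustrative assumptions, not a format prescribed by any of the resources above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EvalCase:
    case_id: str
    source: str      # assumed labels: "user_task", "edge_case", or "failure"
    input_text: str  # what the user sent
    expected: str    # the behavior we expect (here, a short label)


def build_eval_set() -> list:
    """Return a tiny, made-up eval set with one case per source type."""
    return [
        EvalCase("refund-001", "user_task", "How do I get a refund for order 1234?", "refund"),
        EvalCase("empty-001", "edge_case", "", "clarify"),
        EvalCase("lang-001", "failure", "¿Dónde está mi pedido?", "order_status"),
    ]


def to_jsonl(cases) -> str:
    """Serialize cases as one JSON object per line so they are easy to diff and review."""
    return "\n".join(json.dumps(asdict(c), ensure_ascii=False) for c in cases)


def count_by_source(cases) -> dict:
    """Count cases per source to check that the set is not all happy-path tasks."""
    counts = {}
    for c in cases:
        counts[c.source] = counts.get(c.source, 0) + 1
    return counts


if __name__ == "__main__":
    cases = build_eval_set()
    print(to_jsonl(cases))
    print(count_by_source(cases))
```

Keeping the `source` field makes it easy to see whether real failures and edge cases are represented alongside ordinary user tasks.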
Step 2
Combine exact checks, human review, model grading, and trace inspection.
Video · Arize AI
Use this when moving from examples to traces and debugging.
Cohort course · Hamel Husain and Shreya Shankar · Intermediate
You are shipping AI features and need a serious evaluation workflow.
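The combination described in Step 2 could be sketched roughly as follows. Everything here is an assumption for illustration: `overlap_grader` is a token-overlap stand-in for a real model grader, `Trace` is a hypothetical record of pipeline steps, and the rule for routing a case to human review is one possible policy, not a recommended one.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Trace:
    steps: list = field(default_factory=list)  # e.g. ["retrieve", "generate"]


@dataclass
class GradeResult:
    exact_pass: bool
    model_score: Optional[float]
    trace_ok: bool
    needs_human_review: bool


def exact_check(output: str, expected: str) -> bool:
    """Exact check: case-insensitive string match after trimming whitespace."""
    return output.strip().lower() == expected.strip().lower()


def overlap_grader(output: str, expected: str) -> float:
    """Hypothetical stand-in for a model grader: share of expected tokens found in the output."""
    expected_tokens = set(expected.lower().split())
    if not expected_tokens:
        return 0.0
    output_tokens = set(output.lower().split())
    return len(expected_tokens & output_tokens) / len(expected_tokens)


def trace_check(trace: Trace, required_steps: list) -> bool:
    """Trace inspection: every required step must appear in the recorded trace."""
    return all(step in trace.steps for step in required_steps)


def grade(output: str, expected: str, trace: Trace, required_steps: list,
          model_grader: Optional[Callable[[str, str], float]] = None,
          review_threshold: float = 0.7) -> GradeResult:
    exact = exact_check(output, expected)
    score = model_grader(output, expected) if model_grader else None
    trace_ok = trace_check(trace, required_steps)
    # Assumed policy: send to a human when the trace is incomplete, or when
    # the exact check and the model grader disagree about pass/fail.
    disagree = score is not None and exact != (score >= review_threshold)
    return GradeResult(exact, score, trace_ok, (not trace_ok) or disagree)


if __name__ == "__main__":
    result = grade("Your refund was issued today", "refund issued",
                   Trace(["generate"]), ["retrieve"], overlap_grader)
    print(result)
```

Human review is modeled only as a flag here; in practice the flagged cases would go to whatever review queue the team already uses.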
Step 3
Compare prompts, models, retrieval changes, and releases before users see them.
Video · Promptfoo
Regression testing and adversarial checks for prompt and model changes.
Guide · OpenAI · Intermediate
You need API-level guidance for testing outputs, comparing models, and catching regressions during upgrades.
Open source docs · Promptfoo · Intermediate
You need regression tests for prompts, models, and LLM outputs.
Create a 20-row eval set for one AI workflow and run two prompt versions against it.
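A rough sketch of that exercise, shortened to three rows for readability: run a baseline and a candidate prompt over the same eval set, then list which cases improved and which regressed. `fake_model` is a hypothetical, deterministic stand-in for a real model call, and the prompts, rows, and function names are all made up for illustration.

```python
EVAL_SET = [
    {"id": "t1", "input": "refund for order 1234", "expected": "refund"},
    {"id": "t2", "input": "where is my package", "expected": "order_status"},
    {"id": "t3", "input": "cancel my subscription", "expected": "cancel"},
]

PROMPT_V1 = "Classify the request as refund or order_status."
PROMPT_V2 = "Classify the request as refund, order_status, or cancel."


def fake_model(prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a model call: keyword routing that only
    knows the 'cancel' label if the prompt mentions it."""
    text = user_input.lower()
    if "refund" in text:
        return "refund"
    if "cancel" in text and "cancel" in prompt.lower():
        return "cancel"
    if "package" in text or "order" in text:
        return "order_status"
    return "unknown"


def run_eval(prompt: str, model, eval_set) -> dict:
    """Return {case id: passed?} using an exact match against the expected label."""
    return {case["id"]: model(prompt, case["input"]) == case["expected"]
            for case in eval_set}


def compare(baseline: dict, candidate: dict) -> dict:
    """Summarize pass rates and per-case changes between two runs."""
    return {
        "baseline_pass_rate": sum(baseline.values()) / len(baseline),
        "candidate_pass_rate": sum(candidate.values()) / len(candidate),
        "regressed": sorted(c for c in baseline if baseline[c] and not candidate[c]),
        "improved": sorted(c for c in baseline if not baseline[c] and candidate[c]),
    }


if __name__ == "__main__":
    report = compare(run_eval(PROMPT_V1, fake_model, EVAL_SET),
                     run_eval(PROMPT_V2, fake_model, EVAL_SET))
    print(report)
```

Looking at the `regressed` list rather than only the overall pass rate helps catch a change that fixes some cases while quietly breaking others.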
Intermediate to advanced
Read the evals guide and build a small test set for your own app.
Intermediate
Review the course outcomes and pair it with a real feature you can evaluate.
Intermediate to advanced
Use the book page and related essays as a production engineering path.
Beginner to intermediate
Read the public notes and examples before deciding whether the paid material matches your business.
Intermediate
Review the Maven syllabus and compare it to your current product workflow.
Beginner to intermediate
Browse the How I AI interviews and copy the workflows that match your role.