AI learning guide

Best AI resources for evals and reliability

Learn test sets, traces, prompt regression tests, and quality measurement.

Best practical eval writing guide: LLM Evals. Hamel Husain's guide to writing useful AI evaluations. Start here if you need to measure quality instead of arguing from anecdotes.

Best agent-specific course: Evaluating AI Agents. DeepLearning.AI course focused on testing and improving multi-step agent workflows. Use it when the thing being evaluated calls tools or takes several steps.

Best tracing tool path: Phoenix by Arize. Open-source tracing and evaluation tooling from Arize AI. Use it when you need to inspect what happened during an LLM or agent run.

Evals are how you stop guessing

If an AI feature matters, someone must define what good output looks like. Evals turn that judgement into examples, criteria, traces, graders, and regression checks. Without them, teams end up arguing from anecdotes.
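To make "examples, criteria, graders" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for a real model call, and the `EvalCase` fields and phrase-matching criteria are placeholders for whatever "good output" means in your feature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One example: an input plus the criteria a good output must meet."""
    name: str
    prompt: str
    must_include: list[str]  # phrases a good answer has to contain
    max_words: int           # upper bound on answer length


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    canned = {
        "refund": "You can request a refund within 30 days of purchase.",
        "hours": "We are open.",
    }
    for key, answer in canned.items():
        if key in prompt.lower():
            return answer
    return ""


def grade(case: EvalCase, output: str) -> list[str]:
    """Return the failed criteria; an empty list means the case passed."""
    failures = []
    lowered = output.lower()
    for phrase in case.must_include:
        if phrase.lower() not in lowered:
            failures.append(f"missing phrase: {phrase!r}")
    if len(output.split()) > case.max_words:
        failures.append(f"longer than {case.max_words} words")
    return failures


def run_evals(cases: list[EvalCase], model: Callable[[str], str]) -> dict[str, list[str]]:
    """Run every case through the model and collect failures per case name."""
    return {case.name: grade(case, model(case.prompt)) for case in cases}


CASES = [
    EvalCase("refund_policy", "What is your refund policy?", ["30 days"], 40),
    EvalCase("opening_hours", "What are your opening hours?", ["9am", "5pm"], 40),
]

if __name__ == "__main__":
    for name, failures in run_evals(CASES, fake_model).items():
        print(name, "PASS" if not failures else f"FAIL {failures}")
```

Rerunning the same cases after every prompt or model change is what turns this into a regression check. Real graders are often richer than substring matching, but the shape (case, criteria, grader, report) tends to stay the same.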

Hamel Husain's LLM Evals material is the best general starting point. Evaluating AI Agents is better when the workflow has multiple steps. Phoenix helps when you need to inspect traces rather than only score final text.

Measure the workflow, not only the answer

For agents, RAG, coding tools, and research workflows, the final answer is not enough. You also need to check source use, tool calls, intermediate decisions, refusals, clarification behavior, and whether the system stopped at the right time.
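A workflow-level check might look roughly like the sketch below. The trace format is hypothetical: `Step`, `TraceExpectation`, and the tool names are invented for illustration and do not match the schema of Phoenix or any other tracing tool. The point is only that the grader reads the steps, not just the final text.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded step of an agent run (hypothetical trace format)."""
    kind: str       # "tool_call", "clarify", "refuse", or "final_answer"
    name: str = ""  # tool name, only meaningful for tool_call steps


@dataclass
class TraceExpectation:
    """What a good run should look like for one test input."""
    required_tools: list[str]
    forbidden_tools: list[str] = field(default_factory=list)
    max_steps: int = 6
    expected_ending: str = "final_answer"  # or "refuse" / "clarify"


def check_trace(trace: list[Step], expected: TraceExpectation) -> list[str]:
    """Return a list of problems; an empty list means the run looks right."""
    problems = []
    tools_used = [step.name for step in trace if step.kind == "tool_call"]

    for tool in expected.required_tools:
        if tool not in tools_used:
            problems.append(f"never called {tool}")
    for tool in expected.forbidden_tools:
        if tool in tools_used:
            problems.append(f"called forbidden tool {tool}")

    # Stopping behavior: not too many steps, and the right kind of last step.
    if len(trace) > expected.max_steps:
        problems.append(f"took {len(trace)} steps, limit is {expected.max_steps}")
    ending = trace[-1].kind if trace else "nothing"
    if ending != expected.expected_ending:
        problems.append(f"ended with {ending}, expected {expected.expected_ending}")
    return problems


expectation = TraceExpectation(
    required_tools=["search_docs"],
    forbidden_tools=["send_email"],
    max_steps=3,
)
good_run = [Step("tool_call", "search_docs"), Step("final_answer")]
bad_run = [
    Step("tool_call", "search_docs"),
    Step("tool_call", "send_email"),
    Step("final_answer"),
    Step("tool_call", "search_docs"),  # kept going after answering
]

if __name__ == "__main__":
    print(check_trace(good_run, expectation))
    print(check_trace(bad_run, expectation))
```

Setting `expected_ending` to `"refuse"` or `"clarify"` lets the same check cover inputs where the right behavior is to decline or ask a question rather than answer.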

A good eval resource should help you build small representative datasets before the feature ships. Waiting until after launch usually turns evaluation into damage control.
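As a rough illustration of what "small and representative" can mean in practice, the sketch below tags each example with a category and compares per-category pass rates between a baseline and a candidate version before shipping. The category names and the results are made up for the example; in a real project they would come from running your own graders.

```python
from collections import defaultdict

# Hypothetical graded results: one (category, passed) pair per dataset example.
BASELINE = [
    ("happy_path", True), ("happy_path", True),
    ("edge_case", True), ("edge_case", False),
    ("refusal", True),
]
CANDIDATE = [
    ("happy_path", True), ("happy_path", True),
    ("edge_case", False), ("edge_case", False),
    ("refusal", True),
]


def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of passing examples in each category."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    return {category: passes[category] / totals[category] for category in totals}


def regressions(
    baseline: list[tuple[str, bool]],
    candidate: list[tuple[str, bool]],
    tolerance: float = 0.0,
) -> list[str]:
    """Categories where the candidate scores worse than the baseline."""
    base = pass_rates(baseline)
    cand = pass_rates(candidate)
    return sorted(
        category
        for category in base
        if cand.get(category, 0.0) < base[category] - tolerance
    )


if __name__ == "__main__":
    print("baseline:", pass_rates(BASELINE))
    print("regressed categories:", regressions(BASELINE, CANDIDATE))
```

Splitting by category matters because an overall pass rate can hide a drop in a small but important slice, such as refusals or edge cases.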

Recommended courses and resources

AI SDK v6 Crash Course

Workshop · Matt Pocock · Intermediate

You want a structured AI SDK v6 course that covers model choice, text and object generation, UI streams, agents, persistence, context engineering, evals, and advanced app patterns.

The AI Engineer Roadmap

Free tutorial · Matt Pocock · Beginner to intermediate

You want a guided path through core AI concepts, model selection, the AI engineering mindset, evals, and techniques for improving LLM-powered apps.

LLM Evals

Guide · Hamel Husain · Intermediate

Your AI app needs quality checks before users see it.

Evaluating AI Agents

Short course · DeepLearning.AI · Intermediate

You need to test, trace, and improve agent workflows instead of judging only single LLM responses.

Building and Evaluating Advanced RAG Applications

Short course · DeepLearning.AI · Intermediate

You already know basic RAG and need better retrieval, evaluation, and production-quality patterns.

Roll a learning mission

Pick one small move from this guide instead of opening ten tabs.

About this guide

Author: Learnetto Editorial Team. Learnetto maintains this AI learning directory by organizing public course pages, official documentation, educator material, and practical learning resources.

How it is made: Learnetto uses public course pages, official documentation, educator material, and directory data to compile these recommendations. AI may help draft and organize the page, but recommendations are checked against the listed sources, page topic, and learner intent.

Review policy: We only add a named personal reviewer when that person has substantially reviewed the page. Until then, the page is attributed to Learnetto rather than a founder, editor, or individual expert.

Last updated: July 21, 2026. Suggest a correction if a course, doc, or recommendation is outdated.

Videos to watch

LLM evaluation with W&B · Weights & Biases

AI evals with Phoenix · Arize AI

Promptfoo red teaming · Promptfoo

Educators and sources

Educator / source	Best for	Skills	Start with
Hamel Husain · Hamel's AI evals guides	Builders shipping LLM systems	Evals, RAG, LLM product quality	Read the evals guide and build a small test set for your own app.
Shreya Shankar · AI Evals for Engineers and PMs	Engineers, PMs, AI product teams	Evals, LLM reliability, Product quality	Review the course outcomes and pair it with a real feature you can evaluate.
Matt Pocock · AI Hero	Developers and self-directed learners building with AI coding agents	AI coding, Claude Skills, Agentic workflows, AI SDK, MCP, LLM fundamentals, Personalized learning	Use LLM Fundamentals or the AI Engineer Roadmap if you need concepts, the Vercel AI SDK Tutorial or AI SDK v6 Crash Course if you want to build apps, and the AI Skills catalog if you want practical agent workflows like /teach, /grill-me, /tdd, and /triage.
Agentic AI for Product Managers · Hamza Farooq on Maven	Product managers, AI product leaders, founders	Agentic AI, AI product strategy, Evals, Production AI	Use the course to evaluate one AI product opportunity and define what reliability would mean before implementation.

Resources

AI SDK v6 Crash Course

The AI Engineer Roadmap

LLM Evals

Evaluating AI Agents

Building and Evaluating Advanced RAG Applications

OpenAI eval design guide

OpenAI evals quickstart and datasets

OpenAI all models

OpenAI Working with evals

OpenAI Evaluate agent workflows

OpenAI model optimization

AI Tinkerers One-Shot videos

W&B LLM Evaluation Course

Phoenix by Arize

Langfuse Docs

Promptfoo Intro