AI learning guide

Best AI resources for evals and reliability

Learn to build test sets, read traces, write prompt regression tests, and measure quality.

Quick answer

Best first move

Pick one educator from the table, watch one related video, then open one hands-on resource. The aim is to leave with a working habit, project, or mental model tied to evals.

How to shortlist

Compare audience, level, format, and starting point. Favor a source that matches your current role and gives you exercises, examples, docs, or code you can use immediately.

What to avoid

Avoid collecting dozens of generic AI tips. Use this guide to choose a narrow learning loop: one topic, one educator, one video, one resource, one application in your own work.

Videos to watch

LLM evaluation with W&B

Weights & Biases

AI evals with Phoenix

Arize AI

Promptfoo red teaming

Promptfoo

Educators and sources

Educator / source | Best for | Skills | Start with
Hamel Husain | Builders shipping LLM systems | Evals, RAG, LLM product quality | Read the evals guide and build a small test set for your own app.
Shreya Shankar | Engineers, PMs, AI product teams | Evals, LLM reliability, Product quality | Review the course outcomes and pair the course with a real feature you can evaluate.
Weights & Biases | Developers evaluating and deploying LLM apps | LLM apps, Evals, Experiment tracking, MLOps | Take "Building LLM-powered apps", then the evaluation material.
Arize AI | AI engineers and ML teams | Observability, Evals, Tracing, RAG debugging | Try Phoenix tracing on a small RAG or agent app.
Langfuse | Teams shipping LLM applications | Observability, Prompt management, Evals, Tracing | Instrument a toy app with traces, then add scores and eval datasets.
Vellum | Product teams building with LLMs | Prompt management, Evals, Workflow design | Read the guides on parameters, prompt management, and eval workflows.
Humanloop | Teams iterating on prompts and LLM products | Prompt management, Evals, LLM workflows | Read their writing on evals and prompt management.
Promptfoo | Developers testing prompts and LLM apps | Prompt testing, Evals, Red teaming | Create a promptfoo eval file for one workflow you already use.
Maven | Professionals looking for cohort-based AI courses | AI product, AI leadership, AI workflows, Evals | Filter by role and check instructor outcomes before buying a course.
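Several rows start with the same habit: build a small test set for your own app. A minimal sketch of that loop in Python, assuming a hypothetical `fake_support_bot` function in place of a real model call and two hand-written checks (`contains`, `max_words`); none of these names come from the resources above.

```python
from dataclasses import dataclass, field
from typing import Callable

Check = Callable[[str], bool]

@dataclass
class EvalCase:
    """One row of a hypothetical test set: an input plus checks on the output."""
    name: str
    question: str
    checks: list[Check] = field(default_factory=list)

def contains(text: str) -> Check:
    # Case-insensitive substring check on the model output.
    return lambda output: text.lower() in output.lower()

def max_words(limit: int) -> Check:
    # Length check: the output must not exceed `limit` words.
    return lambda output: len(output.split()) <= limit

def run_eval(app: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the app; a case passes only if all its checks pass."""
    results = {}
    for case in cases:
        output = app(case.question)
        results[case.name] = all(check(output) for check in case.checks)
    return results

# Hypothetical stand-in for your LLM feature; replace with a real model call.
def fake_support_bot(question: str) -> str:
    if "refund" in question.lower():
        return "You can request a refund within 30 days from the Orders page."
    return "Sorry, I am not sure."

cases = [
    EvalCase("refund_policy", "How do I get a refund?", [contains("30 days"), max_words(40)]),
    EvalCase("shipping_time", "How long does shipping take?", [contains("business days")]),
]

results = run_eval(fake_support_bot, cases)
pass_rate = sum(results.values()) / len(results)
print(results)                        # {'refund_policy': True, 'shipping_time': False}
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 50%
```

Even ten to twenty cases like these, drawn from real user inputs, are enough to notice when a prompt change breaks something.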

Resources

LLM Evals

Guide · Hamel Husain · Intermediate

Your AI app needs quality checks before users see it.

Phoenix by Arize

Open source tool and docs · Arize AI · Intermediate

You need to trace, inspect, and evaluate LLM app behavior.

Langfuse Docs

Docs and cookbooks · Langfuse · Intermediate

You need production LLM tracing, scoring, and prompt operations.

Promptfoo Intro

Open source docs · Promptfoo · Intermediate

You need regression tests for prompts, models, and LLM outputs.
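A prompt regression test runs the same cases against the current version and a candidate, then flags cases that used to pass and now fail. A minimal Python sketch, assuming two hypothetical prompt versions faked as plain functions so it runs offline; it is not promptfoo's config format, which expresses test cases and assertions declaratively.

```python
from typing import Callable

Check = Callable[[str], bool]

def evaluate(generate: Callable[[str], str], cases: dict[str, tuple[str, Check]]) -> dict[str, bool]:
    """Return pass/fail per case name for one prompt or model version."""
    return {name: check(generate(text)) for name, (text, check) in cases.items()}

def regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Cases that passed on the baseline but fail (or are missing) on the candidate."""
    return sorted(name for name, ok in baseline.items() if ok and not candidate.get(name, False))

# Hypothetical prompt versions, faked as plain functions.
def prompt_v1(text: str) -> str:
    return text.upper()

def prompt_v2(text: str) -> str:
    # A "shorter answers" change that silently truncates the output.
    return text.upper()[:10]

cases = {
    "keeps_order_id": ("order 12345 is delayed", lambda out: "12345" in out),
    "short_input": ("hi", lambda out: out == "HI"),
}

base = evaluate(prompt_v1, cases)
cand = evaluate(prompt_v2, cases)
print(regressions(base, cand))  # ['keeps_order_id']
```

Reporting only the cases that flipped from pass to fail keeps the signal focused on what the change broke, rather than on failures that were already there.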

AI Evals for Engineers & PMs

Cohort course · Hamel Husain and Shreya Shankar · Intermediate

You are shipping AI features and need a serious evaluation workflow.

Hamel's AI evals guides

Guides · Hamel Husain · Intermediate to advanced

Use this when you want Hamel Husain's writing on evals, RAG, and LLM product quality.

AI Evals for Engineers and PMs

Course · Shreya Shankar · Intermediate

Use this when you want Shreya Shankar's course material on evals and LLM reliability.

W&B Courses

Free courses · Weights & Biases · Intermediate

Use this when you want free Weights & Biases courses on building LLM apps, evals, and experiment tracking.

Phoenix

Docs · Arize AI · Intermediate

Use this when you want Arize AI's docs on observability, tracing, and debugging RAG or agent apps.

Langfuse Docs

Docs · Langfuse · Intermediate

Use this when you want Langfuse's docs on observability, tracing, prompt management, and evals.

Vellum Guides

Guides · Vellum · Beginner to intermediate

Use this when you want Vellum's guides on model parameters, prompt management, and eval workflows.

Humanloop Blog and Docs

Blog · Humanloop · Intermediate

Use this when you want Humanloop's writing on prompt management and evals.

Promptfoo Docs

Docs · Promptfoo · Intermediate

Use this when you want Promptfoo's docs on prompt testing, evals, and red teaming.

Maven AI courses

Cohort courses · Maven · Beginner to advanced

Use this when you want a cohort-based course on AI product work, AI leadership, AI workflows, or evals.

Questions this guide answers

How do I evaluate an AI feature?

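Start small: collect real inputs to the feature, write down what a good output looks like, and turn that into a test set you can re-run whenever the prompt or model changes. Once the feature is live, add tracing (Phoenix or Langfuse) so you can inspect individual runs, add scores, and grow your eval dataset from the failures you find. Hamel Husain's LLM Evals guide is a good first read, and promptfoo is a practical way to automate the checks.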

Which resources teach LLM evals for engineers and PMs?

The AI Evals for Engineers & PMs cohort course by Hamel Husain and Shreya Shankar targets this audience directly. Hamel's evals guides suit engineers who prefer to read first, and the free W&B Courses pair building LLM apps with evaluation material. Maven lists other cohort courses; filter by role and check instructor outcomes before buying.

What should I measure before users see an AI workflow?

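Measure how often outputs pass the checks that matter for your product, such as correctness, required format, and safety, on a test set built from realistic inputs. Re-run the same set after every prompt or model change to catch regressions, and use red teaming (covered in the Promptfoo video and docs) to probe inputs the workflow should refuse or handle carefully.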
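One way to turn "what to measure" into a release decision is to compute a pass rate per check and compare each one with a threshold. A sketch under stated assumptions: the per-case results and the thresholds below are hypothetical, and in practice the results would come from your own graders or test runner.

```python
def pass_rates(results: list[dict[str, bool]]) -> dict[str, float]:
    """Fraction of test cases passing each named check."""
    totals: dict[str, int] = {}
    passes: dict[str, int] = {}
    for row in results:
        for check, ok in row.items():
            totals[check] = totals.get(check, 0) + 1
            passes[check] = passes.get(check, 0) + int(ok)
    return {check: passes[check] / totals[check] for check in totals}

def release_blockers(rates: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Checks whose pass rate falls below its threshold (a missing check counts as 0)."""
    return sorted(c for c, t in thresholds.items() if rates.get(c, 0.0) < t)

# Hypothetical per-case results from a small test set.
results = [
    {"correct": True, "grounded": True, "safe": True},
    {"correct": True, "grounded": False, "safe": True},
    {"correct": False, "grounded": True, "safe": True},
    {"correct": True, "grounded": True, "safe": True},
]
# Hypothetical thresholds; set yours from what a failure costs in your product.
thresholds = {"correct": 0.7, "grounded": 0.9, "safe": 1.0}

rates = pass_rates(results)
print(rates)                                # {'correct': 0.75, 'grounded': 0.75, 'safe': 1.0}
print(release_blockers(rates, thresholds))  # ['grounded']
```

Keeping a separate threshold per check lets a hard requirement such as safety sit at 100% while softer checks tolerate a few misses.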