W&B LLM Evaluation Course
Educator video
Weights & Biases · evals, llm apps, observability
AI education source
W&B Courses
Good for builders who need to measure, debug, and improve LLM apps rather than just demo them.
Start with: Building LLM-powered apps, then the evaluation material.
Educator videos are listed first. Similar videos are labelled and included when they cover the same skills or adjacent topics.
- Educator video: Weights & Biases · evals, llm apps, observability
- Similar video: Arize AI · evals, observability, tracing, rag debugging
- Similar video: Promptfoo · evals, prompt testing, red teaming, security
- Similar video: Chip Huyen · ai engineering, production, systems, mlops
- Similar video: Full Stack Deep Learning · mlops, deployment, product ml, production
- Similar video: MLOps Community · mlops, production ml, ai systems, deployment
Developers evaluating and deploying LLM apps should start here when they need to learn LLM app development, evals, experiment tracking, and MLOps. The strongest fit is a learner who prefers free courses, guides, and examples.
Start with Building LLM-powered apps, then move on to the evaluation material. After that, open one related resource below and write down the exact workflow, concept, or implementation pattern you want to apply.
Use this profile when you are comparing educators by topic, level, format, and practical usefulness rather than browsing random AI content.
Compare the skill coverage, the starting recommendation, and the related videos. If you need to build LLM apps, search the directory for that skill and shortlist three profiles before committing to a course, book, or playlist.
| Resource | By | Kind | Level | Use when |
|---|---|---|---|---|
| LLM Evals | Hamel Husain | Guide | Intermediate | Your AI app needs quality checks before users see it. |
| W&B LLM Evaluation Course | Weights & Biases | Free course | Intermediate | You need to debug and measure LLM app quality. |
| Phoenix by Arize | Arize AI | Open source tool and docs | Intermediate | You need to trace, inspect, and evaluate LLM app behavior. |
| Promptfoo Intro | Promptfoo | Open source docs | Intermediate | You need regression tests for prompts, models, and LLM outputs. |
| Made With ML | Made With ML | Free course | Intermediate | You need production ML habits that transfer to AI systems. |
| AI Evals for Engineers & PMs | Hamel Husain and Shreya Shankar | Cohort course | Intermediate | You are shipping AI features and need a serious evaluation workflow. |
| Full Stack Deep Learning Lectures | Full Stack Deep Learning | Course videos | Intermediate to advanced | You want to cover the whole lifecycle of ML and AI product development. |
| Hamel's AI evals guides | Hamel Husain | Guides | Intermediate to advanced | You want Hamel Husain's material on evals and related AI skills. |
| AI Evals for Engineers and PMs | Shreya Shankar | Course | Intermediate | You want Shreya Shankar's material on evals and related AI skills. |
| Full Stack Deep Learning | Full Stack Deep Learning | Course | Intermediate to advanced | You want Full Stack Deep Learning's material on MLOps and related AI skills. |