Promising · Observability & Evals · New entry · March 2026

Strong signal and real results. Worth committing a pilot to.

LangSmith

The most full-featured LLM observability platform if you're in the LangChain orbit, but the proprietary model and trace-based pricing need scrutiny.

Observability·Evaluation·DevTool·Agentic·LLM

langchain.com

Our Take

What It Is

LangSmith is LangChain's proprietary observability and evaluation platform for LLM applications and agents. It provides end-to-end tracing, debugging, evaluation (human annotation, heuristic checks, LLM-as-judge, and pairwise comparisons), cost tracking, and deployment management. Available as managed cloud, bring-your-own-cloud, and self-hosted (Enterprise license). SDKs cover Python, TypeScript, Go, and Java.

Why It Matters

If you're building with LangChain or LangGraph, the tracing and debugging experience is genuinely unmatched. Agent steps, tool calls, and retrieval operations are visualised with full context automatically. The February 2026 baseline pinning feature lets you designate any experiment as the baseline so that subsequent runs auto-compare against it, which is well-designed for iterative improvement. The unified cost view that tracks costs across full agent workflows (not just LLM calls) solves a real pain point for production agent deployments.
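To make the baseline-pinning workflow concrete, here is a minimal sketch of the comparison loop it enables. This is illustrative pseudocode, not the LangSmith SDK: the `Project`, `pin_baseline`, and `compare` names are assumptions for the example, but the shape matches the described behaviour (pin one experiment, auto-diff every later run against it).

```python
# Illustrative sketch (NOT the LangSmith API): what pinning a baseline
# buys you -- each new experiment is compared against it automatically.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Experiment:
    name: str
    metrics: dict  # e.g. {"correctness": 0.82, "latency_s": 1.4}

@dataclass
class Project:
    baseline: Optional[Experiment] = None

    def pin_baseline(self, exp: Experiment) -> None:
        self.baseline = exp

    def compare(self, exp: Experiment) -> dict:
        # Delta of each shared metric against the pinned baseline.
        if self.baseline is None:
            raise ValueError("no baseline pinned")
        return {
            k: round(exp.metrics[k] - v, 4)
            for k, v in self.baseline.metrics.items()
            if k in exp.metrics
        }

project = Project()
project.pin_baseline(Experiment("v1", {"correctness": 0.78, "latency_s": 1.6}))
deltas = project.compare(Experiment("v2", {"correctness": 0.84, "latency_s": 1.4}))
print(deltas)  # {'correctness': 0.06, 'latency_s': -0.2}
```

The point of pinning is the default: without it, each comparison requires picking a reference run by hand; with it, regressions (here, a latency drop would show as a positive delta) surface on every run.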

Key Developments

  • Feb 2026: Experiment baseline pinning and unified cost view across full agent workflows.
  • Feb 2026: LangSmith Polly launched as an AI assistant for debugging and improving agent behaviour inside the platform.
  • Jan 2026: LangSmith Fetch CLI tool for trace access from terminal. Self-hosted v0.12 with improved cloud feature parity.
  • Late 2025: Pairwise Annotation Queues for structured side-by-side comparison with human evaluation.
  • Late 2025: Product naming consolidation: LangSmith Deployment and Studio under the LangSmith umbrella.

What to Watch

The proprietary vs open-source question is the key strategic consideration. Langfuse (MIT license, 20k+ GitHub stars) offers a compelling open-source alternative. LangSmith's trace-based pricing ($2.50-$5.00 per 1K traces) can escalate quickly for high-volume agent applications. Watch whether LangSmith maintains its feature lead over Langfuse as both platforms mature, and whether the self-hosted offering achieves true feature parity with cloud. The LangChain coupling is real: marketed as framework-agnostic, but the best experience is firmly within the LangChain ecosystem.
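The pricing escalation is easy to see with back-of-envelope arithmetic. The sketch below uses the $2.50-$5.00 per 1K traces range quoted above; the request volumes are assumed figures for illustration, not published benchmarks.

```python
# Back-of-envelope cost sketch for trace-based pricing.
# Price range from the review; traffic figures are assumptions.
def monthly_trace_cost(requests_per_day: int,
                       traces_per_request: float,
                       price_per_1k: float,
                       days: int = 30) -> float:
    """Monthly spend: total traces / 1000 * price per 1K traces."""
    traces = requests_per_day * traces_per_request * days
    return traces / 1000 * price_per_1k

# A modest agent app at 20k requests/day, one trace per request:
print(monthly_trace_cost(20_000, 1, 2.50))  # 1500.0 (low tier)
print(monthly_trace_cost(20_000, 1, 5.00))  # 3000.0 (high tier)
```

At these assumed volumes the observability bill lands in the $1.5K-$3K/month range before any model spend, which is why high-volume teams scrutinise trace-based pricing against self-hostable alternatives.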

Strengths

  • Deep LangChain integration: If building with LangChain/LangGraph, the tracing and debugging experience is unmatched. Agent steps, tool calls, and retrieval operations are visualised automatically.
  • Evaluation breadth: Human annotation, heuristic checks, LLM-as-judge, pairwise comparisons, and custom evaluators. Experiment comparison with pinned baselines is well-designed.
  • Multi-language SDKs: Python, TypeScript, Go, and Java. The Go and Java SDKs matter for polyglot stacks where most competitors are Python/TS only.
  • Cost tracking: Unified view of costs across full agent workflows, not just LLM calls. Custom cost metadata lets you track tool execution costs alongside model costs.
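The unified cost view amounts to a rollup of per-step costs across a workflow, model calls and tool executions alike. A minimal sketch of that aggregation (illustrative only; the step records and field names are assumptions, not the LangSmith data model):

```python
# Illustrative sketch (NOT the LangSmith SDK): rolling up model and tool
# costs per workflow, the kind of view a unified cost report provides.
from collections import defaultdict

# Assumed per-step cost records emitted by an agent run.
steps = [
    {"workflow": "support-agent", "kind": "llm",  "cost_usd": 0.0042},
    {"workflow": "support-agent", "kind": "tool", "cost_usd": 0.0010},  # e.g. web search
    {"workflow": "support-agent", "kind": "llm",  "cost_usd": 0.0031},
]

totals = defaultdict(float)
for step in steps:
    totals[(step["workflow"], step["kind"])] += step["cost_usd"]

workflow_total = sum(totals.values())
print(dict(totals))               # spend broken down by step kind
print(round(workflow_total, 4))   # 0.0083 -- total per agent run
```

Tracking tool costs in the same ledger as model costs is what makes per-run unit economics visible for agents whose expensive steps are not always the LLM calls.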

Considerations

  • Proprietary and closed-source: Unlike Langfuse (MIT, 20k+ stars), LangSmith is closed-source. Self-hosting requires Enterprise license.
  • Trace-based pricing: $2.50-$5.00 per 1K traces. For high-volume agent applications with many sub-traces, costs escalate quickly.
  • LangChain coupling: Marketed as framework-agnostic but best experience is within LangChain ecosystem. LlamaIndex/DSPy users may find Langfuse or Braintrust more natural.
  • Self-hosted lag: Self-hosted releases run behind cloud features. Expect new capabilities to land in cloud first.

More in Observability & Evals

LangSmith · DeepEval · Braintrust · LLM-as-Judge · Langfuse
