Promising · Observability & Evals · New entry · March 2026

Strong signal and real results. Worth committing a pilot to.

LangSmith

The most full-featured LLM observability platform if you're in the LangChain orbit, but the proprietary model and trace-based pricing need scrutiny.

Observability·Evaluation·DevTool·Agentic·LLM

langchain.com

Our Take

What It Is

LangSmith is LangChain's proprietary observability and evaluation platform for LLM applications and agents. It provides end-to-end tracing, debugging, evaluation (human annotation, heuristic checks, LLM-as-judge, and pairwise comparisons), cost tracking, and deployment management. Available as managed cloud, bring-your-own-cloud, and self-hosted (Enterprise license). SDKs cover Python, TypeScript, Go, and Java.

Why It Matters

If you're building with LangChain or LangGraph, the tracing and debugging experience is genuinely unmatched. Agent steps, tool calls, and retrieval operations are visualised with full context automatically. The February 2026 baseline pinning feature lets you designate any experiment as the baseline so that subsequent runs auto-compare against it, which is well-designed for iterative improvement. The unified cost view that tracks costs across full agent workflows (not just LLM calls) solves a real pain point for production agent deployments.
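To make the baseline-pinning workflow concrete, here is a minimal sketch of the comparison loop it enables. This is illustrative pseudocode, not the LangSmith SDK: the `Project`, `pin_baseline`, and `compare` names are assumptions for the example, but the shape matches the described behaviour (pin one experiment, auto-diff every later run against it).

```python
# Illustrative sketch (NOT the LangSmith API): what pinning a baseline
# buys you -- each new experiment is compared against it automatically.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Experiment:
    name: str
    metrics: dict  # e.g. {"correctness": 0.82, "latency_s": 1.4}

@dataclass
class Project:
    baseline: Optional[Experiment] = None

    def pin_baseline(self, exp: Experiment) -> None:
        self.baseline = exp

    def compare(self, exp: Experiment) -> dict:
        # Delta of each shared metric against the pinned baseline.
        if self.baseline is None:
            raise ValueError("no baseline pinned")
        return {
            k: round(exp.metrics[k] - v, 4)
            for k, v in self.baseline.metrics.items()
            if k in exp.metrics
        }

project = Project()
project.pin_baseline(Experiment("v1", {"correctness": 0.78, "latency_s": 1.6}))
deltas = project.compare(Experiment("v2", {"correctness": 0.84, "latency_s": 1.4}))
print(deltas)  # {'correctness': 0.06, 'latency_s': -0.2}
```

The point of pinning is the default: without it, each comparison requires picking a reference run by hand; with it, regressions (here, a latency drop would show as a positive delta) surface on every run.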

Key Developments

  • Feb 2026: Experiment baseline pinning and unified cost view across full agent workflows.
  • Feb 2026: LangSmith Polly launched as an AI assistant for debugging and improving agent behaviour inside the platform.
  • Jan 2026: LangSmith Fetch CLI tool for trace access from terminal. Self-hosted v0.12 with improved cloud feature parity.
  • Late 2025: Pairwise Annotation Queues for structured side-by-side comparison with human evaluation.
  • Late 2025: Product naming consolidation: LangSmith Deployment and Studio under the LangSmith umbrella.

What to Watch

The proprietary vs open-source question is the key strategic consideration. Langfuse (MIT license, 20k+ GitHub stars) offers a compelling open-source alternative. LangSmith's trace-based pricing ($2.50-$5.00 per 1K traces) can escalate quickly for high-volume agent applications. Watch whether LangSmith maintains its feature lead over Langfuse as both platforms mature, and whether the self-hosted offering achieves true feature parity with cloud. The LangChain coupling is real: marketed as framework-agnostic, but the best experience is firmly within the LangChain ecosystem.
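The pricing escalation is easy to see with back-of-envelope arithmetic. The sketch below uses the $2.50-$5.00 per 1K traces range quoted above; the request volumes are assumed figures for illustration, not published benchmarks.

```python
# Back-of-envelope cost sketch for trace-based pricing.
# Price range from the review; traffic figures are assumptions.
def monthly_trace_cost(requests_per_day: int,
                       traces_per_request: float,
                       price_per_1k: float,
                       days: int = 30) -> float:
    """Monthly spend: total traces / 1000 * price per 1K traces."""
    traces = requests_per_day * traces_per_request * days
    return traces / 1000 * price_per_1k

# A modest agent app at 20k requests/day, one trace per request:
print(monthly_trace_cost(20_000, 1, 2.50))  # 1500.0 (low tier)
print(monthly_trace_cost(20_000, 1, 5.00))  # 3000.0 (high tier)
```

At these assumed volumes the observability bill lands in the $1.5K-$3K/month range before any model spend, which is why high-volume teams scrutinise trace-based pricing against self-hostable alternatives.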

Strengths

  • Deep LangChain integration: If building with LangChain/LangGraph, the tracing and debugging experience is unmatched. Agent steps, tool calls, and retrieval operations are visualised automatically.
  • Evaluation breadth: Human annotation, heuristic checks, LLM-as-judge, pairwise comparisons, and custom evaluators. Experiment comparison with pinned baselines is well-designed.
  • Multi-language SDKs: Python, TypeScript, Go, and Java. The Go and Java SDKs matter for polyglot stacks where most competitors are Python/TS only.
  • Cost tracking: Unified view of costs across full agent workflows, not just LLM calls. Custom cost metadata lets you track tool execution costs alongside model costs.
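The unified cost view amounts to a rollup of per-step costs across a workflow, model calls and tool executions alike. A minimal sketch of that aggregation (illustrative only; the step records and field names are assumptions, not the LangSmith data model):

```python
# Illustrative sketch (NOT the LangSmith SDK): rolling up model and tool
# costs per workflow, the kind of view a unified cost report provides.
from collections import defaultdict

# Assumed per-step cost records emitted by an agent run.
steps = [
    {"workflow": "support-agent", "kind": "llm",  "cost_usd": 0.0042},
    {"workflow": "support-agent", "kind": "tool", "cost_usd": 0.0010},  # e.g. web search
    {"workflow": "support-agent", "kind": "llm",  "cost_usd": 0.0031},
]

totals = defaultdict(float)
for step in steps:
    totals[(step["workflow"], step["kind"])] += step["cost_usd"]

workflow_total = sum(totals.values())
print(dict(totals))               # spend broken down by step kind
print(round(workflow_total, 4))   # 0.0083 -- total per agent run
```

Tracking tool costs in the same ledger as model costs is what makes per-run unit economics visible for agents whose expensive steps are not always the LLM calls.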

Considerations

  • Proprietary and closed-source: Unlike Langfuse (MIT, 20k+ stars), LangSmith is closed-source. Self-hosting requires Enterprise license.
  • Trace-based pricing: $2.50-$5.00 per 1K traces. For high-volume agent applications with many sub-traces, costs escalate quickly.
  • LangChain coupling: Marketed as framework-agnostic but best experience is within LangChain ecosystem. LlamaIndex/DSPy users may find Langfuse or Braintrust more natural.
  • Self-hosted lag: Self-hosted releases run behind cloud features. Expect new capabilities to land in cloud first.

More in Observability & Evals

LangSmith · DeepEval · Braintrust · LLM-as-Judge · Langfuse
