Onsomble (onsomble.ai)
Features · How It Works · Pricing · Blog
Sign In · Get Early Access

Research. Write. Present.
All in one workspace.

Product
  • Features
  • Pricing
  • Docs
Resources
  • Blog
  • Changelog
  • Help
Legal
  • Privacy
  • Terms
Connect

© 2026 Onsomble AI. All rights reserved.

Built for knowledge workers who ship.


Contents
  • The Model Quality Moat
  • The Survey Microsoft Can't Ignore
  • Monorepo Navigation That Actually Works
  • Claude Skills Are the Killer Feature Nobody Talks About
  • What This Means for Engineering Leaders
  • The Bottom Line
Microsoft Is So Worried About Claude Code, They're Testing It Against Copilot
Engineering · 8 min read · January 23, 2026


Microsoft just told thousands of engineers to install Claude Code and compare it to Copilot. When you're running internal benchmarks against a competitor, you're not confident you're winning.

Rosh Jayawardena
Data & AI Executive

Microsoft just told thousands of engineers to install Claude Code.

Not as a replacement for Copilot. Alongside it. They want their own engineers to run head-to-head comparisons between the AI coding tool they sell and the one Anthropic built.

Think about what that means for a second.

The company that owns GitHub Copilot - the company that has invested billions in OpenAI (and, to be fair, a lot in Anthropic too) - is so concerned about Claude Code that it needs internal data on how the tool stacks up. The Verge broke the story: Microsoft's Experiences + Devices division (that's Windows, Office, Teams, Edge, and Surface) was asked to install Claude Code last week. Even designers and project managers are being encouraged to prototype with it.

When you're running internal benchmarks against a competitor's product, you're not confident you're winning.

I build with Claude Code every day. I'm developing an AI-powered notebook app called Onsomble - a Next.js frontend, NestJS backend monorepo with LangGraph workflows and RAG capabilities. Claude Code isn't just another tool in my stack. It's become the way I build software.

And I get why Microsoft needs to know what they're up against.

The Model Quality Moat

Here's the thing nobody wants to say out loud: Anthropic's models are better for serious coding work. Not marginally better. Noticeably better.

You see it even in competitor tools. Fire up Cursor or Windsurf and watch which model serious engineers choose. It's Sonnet or Opus. Not because of brand loyalty - because of results.

The numbers back this up. Claude Opus 4.5 hit 80.9% on SWE-bench Verified - the first AI to break 80%, currently the world leader. But benchmarks only tell part of the story.

When I started building Onsomble, I tried every model. GPT-4, Gemini, the works. I kept coming back to Anthropic's models for anything non-trivial. The difference isn't marginal. It's the difference between code that "sort of works" and code that actually understands your architecture.

Stack Overflow's 2025 Developer Survey found Claude Sonnet is used more by professional developers (45%) than by those learning to code (30%). The professionals know.

The Survey Microsoft Can't Ignore

Here's where it gets uncomfortable for the Copilot team.

A Blind survey from December 2025 asked tech professionals which AI tool they actually use. At Microsoft, 34% of respondents said Claude was their primary tool. Copilot? 32%.

It's not just Microsoft. At Meta, 50% reported Claude as their most-used AI model. Only 8% said Meta AI. At Amazon, 54% chose Claude as their go-to.

Engineers vote with their keystrokes. And right now, they're voting for Claude.

This is why Microsoft is running internal comparisons. They've seen the survey data. They know their engineers are already using Claude Code on side projects, maybe on company time. The smart move isn't to ban it - it's to understand exactly where Copilot falls short.

Monorepo Navigation That Actually Works

Most AI coding tools are great at answering questions about individual files. Ask about a function, get a reasonable answer. But codebases aren't individual files. They're systems.

Claude Code understands systems.

Onsomble is a monorepo with a Next.js frontend and NestJS backend. Change a DTO in the backend, and it affects API calls in the frontend, which affects state management in Zustand stores, which affects how components render. Most tools would give me answers for individual files. Claude Code gives me answers for my system.
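To make that ripple effect concrete, here's a hypothetical sketch of the pattern - a single DTO type shared across the monorepo. The names (`CreateNoteDto`, `validateCreateNote`, `buildCreateNoteRequest`) are illustrative, not Onsomble's actual code, and the backend check is sketched without the NestJS framework:

```typescript
// packages/shared/dto.ts - hypothetical single source of truth for the API contract.
export interface CreateNoteDto {
  title: string;
  body: string;
  tags: string[];
}

// apps/api - the kind of runtime check a validation pipe would perform
// on an incoming payload before it reaches a controller.
export function validateCreateNote(input: unknown): input is CreateNoteDto {
  const o = (input ?? {}) as Record<string, unknown>;
  return (
    typeof o.title === "string" &&
    typeof o.body === "string" &&
    Array.isArray(o.tags) &&
    (o.tags as unknown[]).every((t) => typeof t === "string")
  );
}

// apps/web - the frontend builds its request from the same type, so renaming
// a DTO field surfaces compile errors at every call site instead of a 400 at runtime.
export function buildCreateNoteRequest(dto: CreateNoteDto): { method: string; body: string } {
  return { method: "POST", body: JSON.stringify(dto) };
}
```

The point of the shared package is exactly what makes a tool's job hard: one change in `packages/shared` has consequences in two apps, and the tool has to trace all of them.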

This isn't magic. It's context management done right.

The CLAUDE.md system lets you give Claude project-specific context - your architecture decisions, your conventions, your common pitfalls. In a monorepo, you can nest these files: one at the root, one in each major directory. Claude reads them, understands the relationships, and traces dependencies across boundaries.
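For flavor, here's a hypothetical sketch of what a root-level CLAUDE.md for a monorepo like this might contain. The paths and conventions are illustrative, not Onsomble's actual file:

```markdown
# CLAUDE.md (repo root)

## Architecture
- apps/web: Next.js frontend (App Router, Zustand for state)
- apps/api: NestJS backend (LangGraph workflows, RAG pipeline)
- packages/shared: DTOs and types imported by both apps

## Conventions
- Change DTOs in packages/shared first, then update both apps
- Components follow atomic design (atoms → molecules → organisms)

## Common pitfalls
- Validation pipes strip unknown DTO fields; check whitelist settings
  before assuming the frontend sent a bad payload
```

Nested files in `apps/web/` and `apps/api/` then carry the directory-specific detail, so Claude loads the right context for wherever it's working.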

When I ask Claude Code why a certain API call is failing, it doesn't just look at the endpoint. It traces the DTO definition, checks how the frontend is constructing the request, examines the validation pipe, and tells me exactly where the mismatch is.


I spent three months with other tools before switching to Claude Code. The difference in how it navigates complex codebases is stark. It's not thinking about files - it's thinking about systems.

Claude Skills Are the Killer Feature Nobody Talks About

Here's the killer feature that's still badly under-utilised: you can teach Claude Code to think like you.

Claude Skills let you encode your workflows, your debugging approaches, your architectural standards. This isn't just "custom prompts." It's creating specialized agents within Claude Code that follow your exact methodology.

I built a skill for bug investigation. Here's how it works:

  1. I describe a bug
  2. The skill develops multiple hypotheses about what could be causing it
  3. It researches and traces code to invalidate hypotheses one by one
  4. It suggests specific debugging statements, then waits for me to run them
  5. It analyzes the output and identifies the true root cause
  6. It suggests architectural improvements to prevent similar issues

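A skill like this ships as a SKILL.md file with YAML frontmatter, per Anthropic's Agent Skills convention. The body below is an illustrative sketch of how those six steps might be encoded, not my actual file:

```markdown
---
name: bug-investigation
description: Systematic hypothesis-driven debugging. Use when the user reports a bug.
---

# Bug Investigation

When the user describes a bug:

1. List at least three hypotheses for the root cause.
2. Trace the relevant code to invalidate hypotheses one by one; cite file paths.
3. Propose specific debugging statements, then STOP and wait for the user
   to run them and paste the output.
4. Analyze the output and name the confirmed root cause.
5. Suggest an architectural change that prevents this class of bug.
```

The "STOP and wait" instruction is the important part: it keeps the skill from guessing at runtime behavior it hasn't observed.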
This isn't autocomplete. This is pair programming with someone who never forgets your architecture and never gets tired of being systematic.

I have skills for creating components that follow our atomic design system. Skills for writing tests that match our patterns. Skills for planning features with the right level of breakdown.

Over time, Claude Code has become less of a tool and more of a team member that's internalized how we work.

Most engineers I talk to haven't even discovered Skills yet. They're using Claude Code like a better autocomplete. That's like buying a Tesla and only using it for the cup holders.

What This Means for Engineering Leaders

Microsoft's internal benchmark is a preview of what happens when executives finally pay attention to what their engineers are already telling them.

The engineers found something better and started using it. Leadership's response wasn't to ban the competition - it was to study it. That's the right call. The "let's compare them head-to-head" approach is far smarter than pretending the problem doesn't exist.

The real cost isn't the subscription price - Anthropic's API isn't cheap, but that's not the point. The real cost is the productivity delta. Claude Opus 4.5 handles long-horizon coding tasks using up to 65% fewer tokens than previous models while achieving higher pass rates. Those are efficiency gains that compound across your entire engineering org.

If your team is gravitating toward a tool you don't sell, that's not betrayal. That's market research delivered directly to your doorstep.

Microsoft understood this. They didn't issue a ban. They issued a benchmarking exercise. Somewhere in Redmond right now, an engineer is filing a report on exactly where Claude Code outperforms Copilot. That report is going to be uncomfortable reading.

The Bottom Line

Microsoft didn't deploy Claude Code because they think Copilot is winning. They deployed it because they need to know how far behind they are.

I've been building with Claude Code for months now. Every week it gets more embedded in how I work. The model quality keeps improving. The context handling keeps getting smarter. The Skills system lets me compound my workflows over time.

Microsoft's internal test will generate data. But the Blind survey already told us what the engineers think. 34% to 32%. The verdict is in.

The best tool is winning.

#Opinion · #AI Strategy · #ROI · #Agents · #Generative AI · #Enterprise
Rosh Jayawardena
Data & AI Executive

I lead data & AI for New Zealand's largest insurer. Before that, 10+ years building enterprise software. I write about AI for people who need to finish things, not just play with tools.


Continue Reading

  • Gaslighting Your AI Into Better Results: What the Research Actually Shows (Engineering · 8 min read · Jan 29, 2026) - A Reddit post about telling Claude you work at a hospital went viral. Turns out there's actual research explaining why this works across all LLMs.
  • The Complete Guide to RAG Chunking: 6 Strategies with Code (Engineering · 12 min read · Jan 2, 2026) - How you split your documents determines whether RAG finds what you need or returns noise. Here's the complete breakdown with code.
  • RAG vs. Long Context Windows: A Decision Framework for Research Workflows (Engineering · 9 min read · Dec 28, 2025) - Long context windows are getting massive, but that doesn't mean RAG is dead. Here's when each approach actually works, with real numbers.

