SWE-Bench Pro: How Much Rubber Duck Contributes
Indexed to Claude Sonnet 4.6 alone = 100. Absolute scores not disclosed.
[Chart: indexed scale with axis marks at 0, Sonnet = 100, and Opus. Series: Hard problems (3+ files, 70+ steps); Hardest problems (identified across 3 trials).]
74.7%: share of the Sonnet → Opus performance gap closed
+3.8%: hard problems (3+ files / 70+ steps)
+4.8%: hardest problems (3-trial worst cases)
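As a rough illustration of how an indexed score and a "gap closed" figure are commonly computed, here is a minimal sketch. The formula is an assumption about the usual convention, not something the source confirms, and every number in it is hypothetical, since absolute scores are not disclosed. The function names `index_to_baseline` and `gap_closed` are made up for this example.

```python
def index_to_baseline(score: float, baseline: float) -> float:
    """Rescale a raw score so that the baseline score maps to 100."""
    return 100.0 * score / baseline


def gap_closed(combined: float, baseline: float, ceiling: float) -> float:
    """Fraction of the baseline-to-ceiling gap recovered by the combined setup.

    All three arguments must be on the same scale (raw or indexed).
    """
    return (combined - baseline) / (ceiling - baseline)


# Hypothetical indexed values, chosen only to show the arithmetic.
sonnet_alone = 100.0        # baseline, by definition of the index
opus_alone = 120.0          # hypothetical ceiling
sonnet_with_duck = 115.0    # hypothetical combined result

share = gap_closed(sonnet_with_duck, sonnet_alone, opus_alone)
print(f"{share:.1%}")  # prints 75.0%
```

Under this convention, a figure such as 74.7% would mean the combined setup recovers roughly three quarters of the distance between the baseline and the stronger model, whatever the undisclosed absolute scores are.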
Source: GitHub Blog, "GitHub Copilot CLI combines model families for a second opinion" (April 6, 2026)