SWE-Bench Pro: How Much Rubber Duck Contributes

Indexed to Claude Sonnet 4.6 alone = 100. Absolute scores not disclosed.

All problems (average)
  Sonnet 4.6                  100
  Sonnet 4.6 + Rubber Duck    closes ~74.7% of the Sonnet → Opus performance gap
  Opus 4.6                    top of the gap

Hard problems (3+ files, 70+ steps)
  Sonnet 4.6                  100
  Sonnet 4.6 + Rubber Duck    +3.8%

Hardest problems (identified across 3 trials; 3-trial worst cases)
  Sonnet 4.6                  100
  Sonnet 4.6 + Rubber Duck    +4.8%
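
The "gap closed" figure and the "+x%" figures are two different measures: the first is the share of the Sonnet → Opus distance recovered by adding Rubber Duck, the second is a gain relative to the Sonnet baseline of 100. The sketch below shows one plausible way to compute both. The function names are illustrative, and the Opus and combined index values are hypothetical placeholders chosen only to make the arithmetic visible, since the source does not disclose absolute scores or the Opus index.

```python
def gap_closed(baseline: float, combined: float, stronger: float) -> float:
    """Fraction of the baseline-to-stronger gap recovered by the combined setup."""
    gap = stronger - baseline
    if gap <= 0:
        raise ValueError("the stronger model must score above the baseline")
    return (combined - baseline) / gap


def gain_over_baseline(baseline: float, combined: float) -> float:
    """Percentage gain of the combined setup relative to the baseline."""
    return (combined / baseline - 1.0) * 100.0


# Baseline index as defined in the chart.
SONNET = 100.0

# HYPOTHETICAL values, not from the source: the Opus index is not disclosed.
OPUS_HYPOTHETICAL = 110.0
COMBINED_HYPOTHETICAL = 107.47

share = gap_closed(SONNET, COMBINED_HYPOTHETICAL, OPUS_HYPOTHETICAL)
print(f"gap closed: {share:.1%}")

# With a baseline of 100, an index of 103.8 corresponds to a +3.8% gain.
print(f"gain: {gain_over_baseline(SONNET, 103.8):+.1f}%")
```

Because the baseline is fixed at 100, a percentage gain and a gain in index points are numerically the same, which is why the chart can label the same bar "+3.8%" and "+3.8".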

Source: GitHub Blog, "GitHub Copilot CLI combines model families for a second opinion" (April 6, 2026).