I think the current test suite is far too small. For the Claude Code codebase, a sensible next step would be to generate thousands of tests. Without that kind of coverage, regressions are likely, and the existing checks and review process do not appear sufficient to prevent them reliably.
My request is that an entirely LLM-written feature be eligible for merge only once all of those generated tests pass, so that we have objective evidence that the change preserves existing behavior.
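To make the proposed rule concrete, here is a minimal sketch of the gate in Python. It is only an illustration: `GeneratedTestResult`, `eligible_for_merge`, and `failing_tests` are hypothetical names, not anything that exists in the codebase or its CI. Two choices in it are my own assumptions: an empty result set blocks the merge, and changes that are not entirely LLM-written are left to the existing review process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedTestResult:
    """Outcome of one generated test (hypothetical record shape)."""
    name: str
    passed: bool


def eligible_for_merge(results: list[GeneratedTestResult], llm_written: bool) -> bool:
    """Return True if the proposed rule does not block the merge."""
    # Assumption: the rule only applies to entirely LLM-written features;
    # other changes stay under the existing review process.
    if not llm_written:
        return True
    # Assumption: no results means no evidence, so the merge is blocked.
    if not results:
        return False
    # The rule itself: every generated test must pass.
    return all(r.passed for r in results)


def failing_tests(results: list[GeneratedTestResult]) -> list[str]:
    """Names of the generated tests that failed, for the merge report."""
    return [r.name for r in results if not r.passed]
```

The point of the all-or-nothing check is that a single failing generated test is enough to block the merge, and `failing_tests` gives the author a concrete list to work from.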