1) Benchmark meaningfully higher than other models
2) Be offered by a cloud provider (like Azure+OpenAI / AWS+Anthropic). Otherwise you have very little track record on model/API stability, especially looking at the last week.
For us, we’ll probably try it for workflows that don’t currently work with 4.1 or 4 Sonnet.