IMO, a good contest between LLMs would be data compression. Each LLM is given the same pile of text and asked to write compact notes that fit into N pages. Then the original text is taken away, and each model has to answer a set of questions about it using only its own notes.
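A minimal sketch of how such a contest could be scored, in Python. Everything here is hypothetical. `make_notes` and `answer` stand in for calls to whichever LLM is competing, a "page" is assumed to be 3,000 characters, and answers are graded by exact string match, which a real benchmark would probably replace with something more forgiving.

```python
from typing import Callable

# Assumption: one "page" of notes is capped at 3,000 characters.
CHARS_PER_PAGE = 3000


def run_contest(
    text: str,
    questions: list[tuple[str, str]],          # (question, expected answer) pairs
    make_notes: Callable[[str, int], str],     # hypothetical: model writes notes within a char budget
    answer: Callable[[str, str], str],         # hypothetical: model answers from notes only
    pages: int,
) -> float:
    """Return the fraction of questions answered correctly from the notes alone."""
    budget = pages * CHARS_PER_PAGE
    notes = make_notes(text, budget)
    if len(notes) > budget or not questions:
        return 0.0  # over the page budget (disqualified) or nothing to score
    # The model only ever sees `notes` here, never the original `text`.
    correct = sum(
        1
        for q, expected in questions
        if answer(notes, q).strip().lower() == expected.strip().lower()
    )
    return correct / len(questions)
```

Comparing models then comes down to calling `run_contest` with the same text, questions, and page budget for each one and ranking the scores.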
Isn't that just summarization? I'm pretty sure there are benchmarks for this, because people have used summarization to build search indexes (at least they did a few years ago when I was working on this, and there were benchmarks then).