Totally agree. Just yesterday, I was finishing up an article [1] that argues conversation length should be the new "score" on a Turing test.
You assume everyone is a robot and measure how long it takes to tell otherwise.
Such a metric is clearly useless if you cannot tell otherwise.
I am very frustrated by the pattern this article repeats: the author asks ChatGPT to guess whether something is a bot, gets told "well, we can't know for sure, but this is at least a sign of a crappy bot or of human behavior," and then says, "Aha! But a human could act like a crappy bot, or you could train a bot to mimic this exact behavior."
[1]: http://coldattic.info/post/129/