The claim of formal equivalence is restricted to "linear transformers".
The tone of the comment seems to suggest that S.'s claim is slightly ridiculous. Can you point out a concrete technical shortcoming of the paper?
It is valuable to point out connections with existing work, if only to avoid reinventing the wheel and to properly stand on the shoulders of giants: אֵין כָּל חָדָשׁ תַּחַת הַשָּׁמֶשׁ (there is nothing new under the sun).