Whether or not the cosine similarity of either pair is higher depends on the mapping you choose from the strings to the embedding vectors. That mapping can be whichever function you like, and your result will depend entirely on it.
If you choose a straight linear mapping of tokens to a number, then you'd be right.
Extending that, if you choose any mapping which does not do a more extensive remapping from raw syntactic structure to some sort of semantic representation, you'd be right.
That is precisely why we increasingly use learned models to create embeddings, rather than simpler approaches, before applying a similarity metric, whether cosine similarity or something else.
Put another way, there is no inherent reason why you couldn't even have a model where the embeddings for 1 and 3 are identical, and so it is meaningless to talk about the cosine similarity of your sentences without setting out your assumptions about how you will create embeddings from them.
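To make the point concrete, here is a minimal sketch assuming one particular "simplistic" mapping, a bag-of-words vector over a tiny hypothetical vocabulary. Under that mapping, two sentences with opposite meanings come out highly similar, purely because of their shared surface structure:

```python
import math

def cosine(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Naive mapping (an assumption for illustration): each sentence becomes
# a vector of token counts over a fixed vocabulary.
vocab = ["i", "love", "hate", "cats"]

def bag_of_words(sentence):
    tokens = sentence.lower().split()
    return [tokens.count(w) for w in vocab]

a = bag_of_words("i love cats")   # [1, 1, 0, 1]
b = bag_of_words("i hate cats")   # [1, 0, 1, 1]
print(cosine(a, b))               # ~0.667, despite opposite meanings
```

A learned embedding model is free to place those same two sentences far apart, or anywhere else, which is exactly why the choice of mapping has to be stated before the similarity number means anything.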
> meaningless to talk about the cosine similarity of your sentences without setting out your assumptions about how you will create embeddings from them.
I agree, but from a generality point of view, you have to settle on a few shared assumptions to compare between models at all. If you can't, then benchmarks are useless too, outside of extremely narrow measures.
I only addressed structure in the parent, and sure, that can be too generic a statement since it touches only on structure. But I would still assert that structure is an important feature, and arguably a required or at least dominant one when you want to deliver a product for general use.
Given that, I don't think I get much more incorrect by going beyond a few dimensions.
> Discrete entities are often embedded via a learned mapping to dense real-valued vectors in a variety of domains.
From that sentence alone, it is clear that a comparison based on the similarity of the textual version of the sentences is irrelevant to the evaluation in the paper. The paper consistently talks in terms of "learned embeddings" rather than simplistic direct mappings of words.
It's meaningful to talk about cosine similarity for anything that you can quantify in ways such that the cosine similarity reflects a measure you care about. Same applies for any function. If it works, it's meaningful to talk about it whether or not it has a reasonable interpretation beyond that.
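As a hypothetical illustration of that point, the vectors don't need to come from text at all. Suppose we count each user's visits to a few page categories (the data here is invented for the example); cosine similarity then reflects similarity of interest *profiles* while ignoring overall activity level, which may be exactly the measure we care about:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented counts of visits to [news, sports, cooking] pages.
alice = [10, 0, 5]   # mostly news, some cooking
bob   = [2, 0, 1]    # same profile, far less active
carol = [0, 8, 0]    # only sports

print(cosine(alice, bob))    # ~1.0: same profile despite different scale
print(cosine(alice, carol))  # 0.0: no overlap in interests
```

If a quantification like this tracks what you actually want to measure, cosine similarity is meaningful for it; if not, no interpretation of the number will rescue it.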