An integer multiplication always needs a "carry propagate add" step at the end. Consequently, addition is always faster, because that add is simply the final step of a multiplication. (Floating point is a little different, but not significantly so.)
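To make that concrete, here is a rough Python sketch of a shift-and-add multiplier. It assumes unsigned operands and Python's unbounded integers; real hardware uses a fixed width and a Wallace/Dadda tree rather than this linear chain. All function names are hypothetical. The partial products are summed in "carry save" form (a pair of words whose sum is the running total) with no carry travelling between bit positions, but turning that pair into an ordinary binary number still takes one carry propagate add at the end.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied to whole words: returns (s, c) with
    x + y + z == s + c, computed without any carry propagation."""
    s = x ^ y ^ z                                  # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit carry, moved up one place
    return s, c


def carry_propagate_add(a, b):
    """Ordinary addition: carries ripple across bit positions until none remain."""
    while b:
        a, b = a ^ b, (a & b) << 1
    return a


def partial_products(a, b):
    """One row per set bit of b: a shifted left by that bit's position."""
    return [a << i for i in range(b.bit_length()) if (b >> i) & 1]


def multiply(a, b):
    """Sum all partial products in carry-save form, then do one final carry propagate add."""
    s, c = 0, 0
    for row in partial_products(a, b):
        s, c = carry_save_add(s, c, row)
    return carry_propagate_add(s, c)  # the unavoidable last step
```

For example, `multiply(13, 11)` gives 143. Every row is absorbed cheaply by `carry_save_add`; only the final `carry_propagate_add` has to resolve carries across the full word, which is the same work a plain addition does.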
IF you have a chain of additions AND the multiplier has a way to forward the "carry save" representation (i.e., the intermediate result right before the carry propagate add) AND your carry propagate add takes longer than a single clock cycle, you could theoretically get better performance by feeding the additions into the multiplier's accumulation matrix (which is just a big array of adders for summing many terms at once).
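As a rough illustration of that hypothetical datapath, the Python sketch below injects a chain of addends as extra rows into the same carry-save accumulation used for the partial products of `a * b`, so the whole expression `a*b + x1 + x2 + ...` needs only one carry propagate add instead of one per addition. It assumes unsigned operands and unbounded Python integers; the function names are made up for this sketch and do not correspond to any real instruction or hardware interface.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: returns (s, c) with x + y + z == s + c, no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def carry_propagate_add(a, b):
    """Ordinary addition: carries ripple until none remain."""
    while b:
        a, b = a ^ b, (a & b) << 1
    return a


def multiply_then_add_chain(a, b, addends):
    """Hypothetical: compute a*b + sum(addends) with a single carry propagate add.

    The addends are treated as extra rows of the multiplier's accumulation
    matrix, so each one costs a carry-save step rather than a full add.
    """
    rows = [a << i for i in range(b.bit_length()) if (b >> i) & 1]
    s, c = 0, 0
    for row in rows + list(addends):
        s, c = carry_save_add(s, c, row)
    return carry_propagate_add(s, c)  # only one carry propagate add in total
```

For instance, `multiply_then_add_chain(7, 9, [1, 2, 3])` gives 69. The benefit only materialises under all the conditions above: the carry-save pair has to be forwardable, and the final carry propagate add has to be expensive enough that skipping the intermediate ones matters.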
That's a lot of "and"s and "if"s. Nobody does it that way. The best you get is "fused multiply-add".