That's obviously a good use case, and it doesn't even need that large a number of elements, because the data can be consumed directly by the GPU, so no back-and-forth transfer is necessary.
This paper focused on comparison-based sorting. Depth sorting can be done with a GPU radix sort (which is super fast), because with minor modifications to the bit pattern, floating-point and integer comparison give the same ordering for finite, non-NaN values (and games don't care about the non-finite cases).
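To make the "minor modifications" concrete, here is a rough CPU-side sketch in Python of one common way to do it, assuming 32-bit IEEE 754 depths: flip all bits of negative floats and set the sign bit of non-negative ones, then radix sort the resulting unsigned integers. The function names (`float_to_sortable_key`, `depth_sort_indices`) are made up for illustration, and a real GPU radix sort would run the same per-digit passes in parallel rather than with Python lists.

```python
import struct


def float_to_sortable_key(x: float) -> int:
    """Map a float32 value to a uint32 whose unsigned order matches float order.

    Assumes x is finite, not NaN, and representable as a 32-bit float.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        # Negative: flip every bit so more-negative values get smaller keys.
        return bits ^ 0xFFFFFFFF
    # Non-negative: set the sign bit so these sort above all negatives.
    return bits | 0x80000000


def depth_sort_indices(depths):
    """Return indices that order `depths` ascending, via an LSD radix sort.

    Four stable passes of 8 bits each over the integer keys; this is a
    CPU stand-in for what a GPU radix sort does per digit.
    """
    items = [(float_to_sortable_key(d), i) for i, d in enumerate(depths)]
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for item in items:
            buckets[(item[0] >> shift) & 0xFF].append(item)
        # Concatenating buckets in order keeps each pass stable.
        items = [item for bucket in buckets for item in bucket]
    return [i for _, i in items]


depths = [0.5, -1.0, 3.25, 0.0, -0.125, 100.0]
order = depth_sort_indices(depths)
print([depths[i] for i in order])
```

No float comparison happens anywhere in the sort itself; the only float-specific step is the key transform, which is why the comparison-based results from the paper don't really apply to this case.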