Computational photography is about to get really good when it can combine hundreds or thousands of frames into one. Effectively combining 1000 frames is equivalent to a lens and sensor with 1000x the surface area - i.e., exceeding a single frame from a DSLR.
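The noise math behind that claim can be checked with a quick simulation (a minimal numpy sketch with made-up signal and noise values, not any real camera pipeline): averaging N independent frames shrinks the noise standard deviation by sqrt(N), so a 1000-frame stack gains roughly 31x in SNR, the same shot-noise improvement you'd get from collecting 1000x the photons in one exposure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scene: constant signal with Gaussian per-frame noise.
signal = 100.0
noise_sigma = 10.0
n_frames = 1000

frames = signal + rng.normal(0.0, noise_sigma, size=(n_frames, 64, 64))

single = frames[0]             # one noisy exposure
stacked = frames.mean(axis=0)  # naive average of all frames

snr_single = signal / single.std()
snr_stacked = signal / stacked.std()

print(f"single-frame SNR: {snr_single:.1f}")
print(f"1000-frame SNR:   {snr_stacked:.1f}")  # roughly sqrt(1000) ~ 31.6x better
```

This assumes the frames are perfectly aligned and the noise is independent between frames; real burst pipelines spend most of their effort on the alignment part.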
Current methods use optical flow and gyroscopes to align images, but I imagine future methods will use AI to understand motion that optical flow handles poorly (e.g., the way a specular reflection 'moves' across a wine glass).
Saying “AI” does not magically solve anything. The current best ML systems can’t even solve medium-difficulty calculus problems. We’re nowhere near them doing original work like understanding what’s in 1000 images, building a world model from that, and then rendering a better image of that world without hallucinations.
I doubt there's enough information in more than ~4 samples to be worth fusing into one image. Maybe more if you have a full HDR display and supply chain, or if the lighting is really bad; otherwise almost everything comes down to having good taste in SDR tone mapping/image development.
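For a concrete instance of what a tone-mapping choice looks like, here is a minimal sketch of the extended Reinhard global operator (the `white` parameter is a made-up taste knob, not anything from the comment above): it compresses an HDR luminance range into SDR [0, 1], and where you place the white point is exactly the kind of judgment call being described.

```python
import numpy as np

def reinhard_tonemap(hdr, white=4.0):
    """Extended Reinhard operator: L_out = L * (1 + L/white^2) / (1 + L).
    Luminance equal to `white` maps to 1.0; anything brighter clips."""
    out = hdr * (1.0 + hdr / white**2) / (1.0 + hdr)
    return np.clip(out, 0.0, 1.0)

# Hypothetical pixel luminances spanning a wide dynamic range.
hdr = np.array([0.01, 0.5, 1.0, 4.0, 16.0])
sdr = reinhard_tonemap(hdr)
print(sdr)
```

Shadows are passed through almost linearly while highlights are rolled off smoothly; picking `white` per scene is one small piece of the "taste" involved in SDR development.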