
The sequence of model activations is being compressed. S4 treats each activation channel as an independent sequence, applies a learned version of the Laplace transform, and drops the less-significant components.

This is similar to the basic compression you get with PCA or Fourier transforms. These transforms are fully invertible until you drop the less-significant components. Dropping them lets you reconstruct a degraded version of the input, and the transform makes it easy to pick the right components to drop.
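A minimal sketch of that idea using PCA via SVD (numpy only, toy data; not S4 itself): the full transform reconstructs the input exactly, and truncating to the top-k components gives a degraded reconstruction whose error is the energy of the dropped components.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "activations": 200 samples of a 16-channel signal with low-rank structure
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 16))

# PCA via SVD: keeping all components makes the transform fully invertible
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, X)

# Drop the less-significant components: keep only the top-k singular directions
k = 4
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reconstruction error is exactly the energy in the dropped components,
# which is why a good transform makes it easy to pick what to drop
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(err)
```

Here the toy data is exactly rank 4, so keeping k=4 components reconstructs it almost perfectly; with real activations the dropped tail carries some energy and the reconstruction degrades gracefully.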


