Neural nets

designing the fanciest usable differentiable loss surface

What’s that now? Long story, but see the transformer or the Sparse Transformer for particularly well-developed examples and explanations of this sub-field.
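
As a toy illustration of what "differentiable loss surface" means in practice, here is a minimal sketch: a one-hidden-layer tanh network fit to sin(x) by plain gradient descent on a mean-squared-error loss. Everything in it is an assumption made for illustration (the names `forward`, `loss_and_grads` and `train`, the target function, the hyperparameters), and the gradients are derived by hand only to keep it dependency-free; real networks would use automatic differentiation.

```python
import math
import random


def forward(params, x):
    """Tiny network: one tanh hidden layer, scalar output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    pred = sum(v * h for v, h in zip(w2, hidden)) + b2
    return pred, hidden


def loss_and_grads(params, data):
    """Mean squared error and its gradient, via the chain rule by hand."""
    w1, b1, w2, b2 = params
    n, k = len(data), len(w1)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * k, [0.0] * k, [0.0] * k, 0.0
    total = 0.0
    for x, y in data:
        pred, hidden = forward(params, x)
        err = pred - y
        total += err * err / n
        d_pred = 2.0 * err / n  # d(loss)/d(pred) for this sample
        g_b2 += d_pred
        for j in range(k):
            g_w2[j] += d_pred * hidden[j]
            # Back through tanh: d tanh(z)/dz = 1 - tanh(z)^2
            d_pre = d_pred * w2[j] * (1.0 - hidden[j] ** 2)
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
    return total, (g_w1, g_b1, g_w2, g_b2)


def train(data, hidden_units=8, steps=300, lr=0.02, seed=0):
    """Full-batch gradient descent; returns final params and loss history."""
    rng = random.Random(seed)
    params = (
        [rng.uniform(-1, 1) for _ in range(hidden_units)],
        [0.0] * hidden_units,
        [rng.uniform(-1, 1) for _ in range(hidden_units)],
        0.0,
    )
    losses = []
    for _ in range(steps):
        loss, (g_w1, g_b1, g_w2, g_b2) = loss_and_grads(params, data)
        losses.append(loss)
        w1, b1, w2, b2 = params
        # Step downhill on the loss surface.
        params = (
            [w - lr * g for w, g in zip(w1, g_w1)],
            [b - lr * g for b, g in zip(b1, g_b1)],
            [w - lr * g for w, g in zip(w2, g_w2)],
            b2 - lr * g_b2,
        )
    return params, losses


# Usage: fit sin(x) on a small grid in [-2, 2].
data = [(i / 5.0, math.sin(i / 5.0)) for i in range(-10, 11)]
params, losses = train(data)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The point is only that the loss is a smooth function of the parameters, so following its gradient is a usable training procedure; the "fancy" part of the field is designing architectures for which that surface is both expressive and tractable.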

For the transformer network at least, there seems to be an unexpectedly favourable computational trade-off: you can reach a given level of performance faster by training a bigger network.

These networks are absolutely massive (heh) in natural language processing right now.

