Large language models are getting stronger, but scaling them efficiently is becoming harder. Transformers with full self-attention hit bottlenecks: they eat memory.
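
To make the memory bottleneck concrete, here is a minimal numpy sketch (illustrative, not taken from the original post) of naive full self-attention. The function name and shapes are assumptions; the point is that the score matrix is n×n, so activation memory grows quadratically with sequence length.

```python
import numpy as np

def naive_attention(q, k, v):
    """Full self-attention over one head.

    Materializes an (n, n) score matrix, so activation memory
    grows quadratically with sequence length n.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])                      # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                           # shape (n, d)

# Doubling n quadruples the size of the (n, n) score matrix.
n, d = 4096, 64
q = k = v = np.random.randn(n, d).astype(np.float32)
out = naive_attention(q, k, v)
print(out.shape)  # (4096, 64); the scores alone took n*n*4 bytes ~= 64 MB
```

At n = 4096 the scores are already around 64 MB per head per layer in float32; at n = 32768 that same matrix is roughly 4 GB, which is why long-context models avoid materializing it directly.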