Machine Learning Research
10 Million Tokens of Input Context: ATLAS, a transformer-like architecture, can process context windows of up to 10 million tokens
An alternative to attention enables large language models to track relationships among words across extraordinarily wide spans of text.
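The blurb doesn't spell out ATLAS's mechanism, but the key idea behind attention alternatives of this kind is a fixed-size memory that is updated as tokens stream in, so per-token cost stays constant no matter how long the context grows. The sketch below illustrates one representative member of that family, a linear-attention-style key-value memory; it is an illustrative stand-in, not ATLAS's actual method, and every name in it (the feature map `phi`, the dimension `d`, the `step` function, the projection matrices) is a hypothetical choice for the example.

```python
import numpy as np

# Illustrative sketch (not ATLAS's mechanism): a fixed-size key-value
# memory in the spirit of linear attention. Each token writes into a
# d x d memory matrix instead of attending over all previous tokens,
# so per-token compute and state stay O(d^2) regardless of context
# length -- the property that makes multi-million-token windows
# feasible for attention alternatives.

d = 64  # head dimension (hypothetical)

def phi(x):
    """Positive feature map so memory reads act like softmax-free attention."""
    return np.maximum(x, 0.0) + 1e-6

rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

memory = np.zeros((d, d))   # accumulated key-value associations
normalizer = np.zeros(d)    # accumulated keys, used to normalize reads

def step(x):
    """Process one token embedding; cost does not depend on context length."""
    global memory, normalizer
    q, k, v = phi(W_q @ x), phi(W_k @ x), W_v @ x
    memory += np.outer(k, v)                          # write: bind key to value
    normalizer += k
    return (memory.T @ q) / (normalizer @ q + 1e-6)   # read: query the memory

# Stream an arbitrarily long sequence; the state never grows.
for t in range(10_000):  # could just as well be 10 million tokens
    out = step(rng.standard_normal(d))
```

The contrast with standard attention is the point: attention re-scans all previous tokens per step (quadratic overall), while a recurrent memory like this compresses the past into constant-size state, trading exactness of recall for unbounded context length.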