Machine Learning Research
Transformers Energized: Energy-Based Transformers (EBTs) use gradient descent to gradually predict the next token
A new type of transformer can check its work. Instead of producing the next output token in a single forward pass like a typical transformer, it starts with a rough version of the token and refines it step by step, descending the gradient of a learned energy function that scores how well the candidate fits the context.
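The refinement loop can be sketched in a few lines. This is a toy illustration, not the paper's model: the quadratic `energy` function below is a hypothetical stand-in for the learned energy network, and the learning rate and step count are arbitrary. The point is only the mechanism: begin with a rough candidate, then repeatedly nudge it downhill on the energy surface.

```python
TARGET = [0.3, -1.2, 0.7]  # pretend "true" next-token embedding (hypothetical)

def energy(candidate):
    """Lower energy = better prediction. Toy quadratic bowl around TARGET."""
    return sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def energy_grad(candidate):
    """Analytic gradient of the toy quadratic energy."""
    return [2.0 * (c - t) for c, t in zip(candidate, TARGET)]

def refine(candidate, lr=0.1, steps=50):
    """Start from a rough guess; descend the energy gradient step by step."""
    for _ in range(steps):
        grad = energy_grad(candidate)
        candidate = [c - lr * g for c, g in zip(candidate, grad)]
    return candidate

rough = [0.0, 0.0, 0.0]   # the one-shot "rough version" of the token
refined = refine(rough)
print(energy(rough), energy(refined))
```

Each iteration plays the role of the model "checking its work": the energy tells the model how poor the current candidate is, and the gradient tells it which direction improves it.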