Token Compression: Reducing Attention Waste?
An experiment on compressing multiple LLM tokens into one representation for faster decoding and longer effective context.
2 pieces in this tag.
Experiments on extending transformer context length, including training observations, tradeoffs, and lessons from long-context tuning.