Token Compression: Reducing Attention Waste?
An experiment on compressing multiple LLM tokens into one representation for faster decoding and longer effective context.