GPT-4 Predictions LinkedIn Post
A short opinion post on GPT-4 expectations, scaling constraints, and likely directions for next-generation language models.
- Context: Pre-GPT-4
- Size guess: <=10x GPT-3
- Likely driver: Data quality
- Context guess: >50k
This is an archived LinkedIn post from January 2023, before GPT-4 was released. The image preserves the social post; the transcript below keeps the predictions readable and searchable.
Prediction Transcript
Average GPT-4 rumor in 2023.
On a serious note though, GPT-4 probably isn't going to be much bigger than GPT-3, as we have more or less already hit the limits of compute. However, here are some of the things to expect (this is just my personal opinion, based on my knowledge of the field):
- The size won't be more than ~10x GPT-3 at most; more likely it will be about the same size.
- It may include some optimization tricks from recent research, like longer context, better layers, faster attention, etc.
- The quality of the data will probably be the biggest factor. There is a possibility they are training the new model on YouTube video transcripts.
- Multi-modality could be a possibility, like GPT directly outputting images along with the text, but I see it as less probable. More probable is for them to use DALL-E to synthesize images.
- There will probably be more specialized versions or experts like ChatGPT, along with the core special one, i.e. Davinci.
- It will most likely have a longer context. The original post referenced an 8k-token context window, and I guessed that some sort of memory could increase context way beyond 50k.
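To put rough numbers on the size and context guesses above, here is a back-of-the-envelope sketch. It assumes GPT-3's published 175B parameter count for the largest model, treats "8k" and "50k" as 8,000 and 50,000 tokens, and assumes naive dense self-attention, where the score matrix grows with the square of the context length. None of this comes from the post itself, and the function names are made up for illustration; a memory mechanism or faster attention variant would scale differently.

```python
# Back-of-the-envelope numbers for the predictions above.
# Assumptions (not from the post): GPT-3 has 175B parameters, and
# attention is naive dense self-attention (quadratic in context length).

GPT3_PARAMS = 175e9  # published parameter count of the largest GPT-3 model


def size_upper_bound(base_params: float, factor: float = 10.0) -> float:
    """Parameter count implied by a 'no more than ~10x' guess."""
    return base_params * factor


def attention_score_entries(context_tokens: int) -> int:
    """Entries in one attention score matrix (tokens x tokens)."""
    return context_tokens * context_tokens


def relative_attention_cost(new_context: int, old_context: int) -> float:
    """How much larger the score matrix gets when context grows."""
    return attention_score_entries(new_context) / attention_score_entries(old_context)


if __name__ == "__main__":
    bound = size_upper_bound(GPT3_PARAMS)
    print(f"10x GPT-3 upper bound: {bound / 1e12:.2f}T parameters")

    ratio = relative_attention_cost(50_000, 8_000)
    print(f"50k vs 8k context, dense attention: {ratio:.1f}x more score entries")
```

Under these assumptions, a 50k context costs roughly 39 times more attention score entries than an 8k one, which is one way to see why the post expected some form of memory or faster attention rather than simply stretching the window.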
Main Themes
The post focused on compute limits, data quality, multimodality, specialization, faster attention, and longer context. The long-context prediction later connected to my experiments in A Quest for Very Long Context.
#chatgpt #gpt3 #gpt4 #gpt #training #data #research #openai