
GPT-4 Predictions LinkedIn Post

A short opinion post on GPT-4 expectations, scaling constraints, and likely directions for next-generation language models.

Context: Pre-GPT-4
Size guess: <=10x GPT-3
Likely driver: Data quality
Context guess: >50k tokens
Image: GPT-4 prediction meme attached to the original LinkedIn post.

This is an archived LinkedIn post from January 2023, before GPT-4 was released. The image preserves the social post; the transcript below keeps the predictions readable and searchable.

Prediction Transcript

Average GPT-4 rumor in 2023.

On a serious note, though, GPT-4 probably isn't going to be much bigger than GPT-3, since we have more or less already hit the limits of compute. However, here are some things to expect (this is just my personal opinion, based on my knowledge of the field):

  1. The model will be at most ~10x the size of GPT-3, and more likely about the same size.
  2. It may include some optimization tricks from recent research, like longer context, better layers, faster attention, etc.
  3. Data quality will probably be the biggest factor. They might even be training the new model on YouTube video transcripts.
  4. Multimodality is possible, such as GPT directly outputting images alongside text, but I see it as less likely. It is more probable that they will use DALL-E to synthesize images.
  5. There will probably be more specialized versions or "experts" like ChatGPT, alongside the core model, i.e. Davinci.
  6. It will most likely have a longer context. The original post referenced an 8k-token context window, and I guessed that some sort of memory mechanism could extend the context well beyond 50k tokens.

Main Themes

The post focused on compute limits, data quality, multimodality, specialization, faster attention, and longer context. The long-context prediction later connected to my experiments in A Quest for Very Long Context.

#chatgpt #gpt3 #gpt4 #gpt #training #data #research #openai