
OpenAI hired 40 people to rate GPT-3's responses to a range of prewritten questions. Responses that contained sexual or violent language were marked down. This feedback was then used to train InstructGPT to match its responses to the questions in ways that the judges preferred.
OpenAI found that users preferred InstructGPT over GPT-3 more than 70% of the time.
It is exciting that customers prefer the aligned models by such a wide margin.
Users even preferred the responses of a 1.3-billion-parameter InstructGPT model to those of a 175-billion-parameter GPT-3, although the InstructGPT model was more than 100 times smaller. Leike says this suggests that alignment could be an easy way to make language models better.