Is OpenAI o1 All Hype? What is Strawberry Test?


LLMs have a really hard time counting the number of R’s in the word strawberry. This is because LLMs break text into tokens (chunks of characters that are mapped to vectors) rather than individual letters. It is also why one of the codenames for OpenAI’s o1 was Strawberry. The idea: if we can get the model to reason a bit more, it should be able to count the R’s.
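To make the tokenization point concrete, here is a minimal Python sketch. The token split below is hypothetical, because real tokenizers differ by model. The point is that the model receives a few chunks rather than ten separate letters, while ordinary code sees every character.

```python
# Hypothetical subword split of "strawberry"; actual tokenizers vary by model.
tokens = ["str", "aw", "berry"]
word = "".join(tokens)

# A program sees individual characters, so counting is trivial.
r_count = word.count("r")

# R's per token: the letters are buried inside chunks that the model
# treats as single units, so it never "sees" them one at a time.
per_token = {tok: tok.count("r") for tok in tokens}

print(word, r_count)  # strawberry 3
print(per_token)      # {'str': 1, 'aw': 0, 'berry': 2}
```

Two of the three R’s sit in a single chunk ("berry"), which is one plausible reason a model that reasons over chunks miscounts.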

Can I create a prompt that gets most models to respond with the correct number of R’s?

Yes: in the screenshot, four different models count the R’s correctly. (Granted, the prompt only works about 70% of the time.)

The main technique used in o1 is chain of thought. One of the original papers on it was by Jason Wei et al. (2022). The core idea is that, with the right prompt, the model simulates step-by-step reasoning and performs better.
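Below is a minimal sketch of what a chain-of-thought style prompt for the strawberry question might look like. The wording is illustrative and is not the prompt from the screenshot. The second function deterministically produces the kind of intermediate steps such a prompt asks the model to write out, namely spelling the word one letter at a time before counting.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought instruction (illustrative wording)."""
    return (
        f"{question}\n"
        "Think step by step: first spell the word one letter per line, "
        "numbering each letter, then count the matching letters, "
        "and only then give the final answer."
    )


def spelled_out_reasoning(word: str, letter: str) -> str:
    """Produce the letter-by-letter steps the prompt asks the model to write."""
    lines = []
    count = 0
    for i, ch in enumerate(word, start=1):
        if ch.lower() == letter.lower():
            count += 1
            lines.append(f"{i}. {ch}  <- match #{count}")
        else:
            lines.append(f"{i}. {ch}")
    lines.append(f"Answer: {count}")
    return "\n".join(lines)


prompt = build_cot_prompt("How many R's are in the word strawberry?")
steps = spelled_out_reasoning("strawberry", "r")
print(prompt)
print(steps)
```

Writing each letter on its own line turns the question into a sequence of small, easy checks, which is what chain of thought is meant to encourage.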

You can use this technique to write better prompts yourself, or you can build an agent that makes two prompts: one that specifically generates reasoning steps, and a second that adds that reasoning to your query. The result is better responses to questions that benefit from “thinking harder”.
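Here is a minimal sketch of that two-prompt agent. It assumes a hypothetical `call_model` function that takes a prompt string and returns the model's text. In practice you would wire this to whichever LLM API you use. A stub model stands in so that the example runs offline, and the prompt wording is illustrative.

```python
from typing import Callable

# Hypothetical model interface: prompt in, text out. Replace with a real client.
ModelFn = Callable[[str], str]


def generate_reasoning(question: str, call_model: ModelFn) -> str:
    """Prompt 1: ask only for reasoning steps, not the final answer."""
    prompt = (
        "List the reasoning steps needed to answer the question below. "
        "Do not give the final answer.\n\n"
        f"Question: {question}"
    )
    return call_model(prompt)


def answer_with_reasoning(question: str, reasoning: str, call_model: ModelFn) -> str:
    """Prompt 2: send the original question together with the generated steps."""
    prompt = (
        f"Question: {question}\n\n"
        f"Reasoning steps to follow:\n{reasoning}\n\n"
        "Follow the steps, then give the final answer."
    )
    return call_model(prompt)


def two_prompt_agent(question: str, call_model: ModelFn) -> str:
    """Chain the two prompts: reasoning first, then the answer."""
    reasoning = generate_reasoning(question, call_model)
    return answer_with_reasoning(question, reasoning, call_model)


# Stub model so the sketch runs without an API; it records every prompt sent.
sent_prompts: list[str] = []


def fake_model(prompt: str) -> str:
    sent_prompts.append(prompt)
    if prompt.startswith("List the reasoning steps"):
        return "1. Spell the word letter by letter.\n2. Count each 'r'."
    return "Answer: 3"


result = two_prompt_agent("How many R's are in strawberry?", fake_model)
print(result)             # Answer: 3
print(len(sent_prompts))  # 2
```

Keeping the model call behind a plain function makes it easy to swap providers, or to use a cheaper model for the reasoning step and a stronger one for the answer.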

What is neat about o1 is that you don’t need to know about chain of thought. You simply ask your question, and the model thinks hard about it before responding. That is neat, but o1 is also slower and more expensive, and unlike 4o, it cannot do internet searches.

The bottom line: it is valuable to learn when to use which model and how to write good prompts.

*Image credit: DALL·E 3

Software Architect and Senior Full Stack Developer excited about crafting innovative user experiences with GenAI and Blockchain.
