OpenAI launched a new model called o1 (formerly known as “Strawberry”) on September 12th. It is significantly better at reasoning tasks, scoring in the 89th percentile in competitive programming and surpassing PhD-level scores on physics, biology, and chemistry questions.
This model has been trained to use chain of thought to answer questions, rather than simply providing an immediate response. Chain of thought isn't new; in fact, it's been around for quite some time. It involves asking a language model to solve problems by "thinking out loud." A good example is when doing long division; you'll probably find it easier if you write down each step one by one, rather than doing it mentally. Language models work in a similar way: chain of thought creates a logical sequence that keeps the AI focused on the task.
Let's look at this with a simple example:
- Context: We are going to develop a simple game where the player must guess a random number between 1 and 100. The player's objective is to guess a secret number randomly generated by the program. After each attempt, the program will tell them if the number is higher or lower than the secret number.
- Prompt 1: "I want to create a game in Python where the user has to guess a random number between 1 and 100. How can I start by generating a random number?"
- Prompt 2: "I want the user to enter a number and compare it to the secret number. How do I do that?"
- Prompt 3: "I need to give the player feedback on whether their number is higher or lower than the secret number. Generate the procedure."
- Prompt 4: "I want the player to keep trying until they guess the number. How do I implement this?"
- Prompt 5: "I want the game to end when the player guesses the number. How do I make it end?"
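Put together, the five prompts above lead to a program along these lines. This is only a minimal sketch of what the answers might add up to, not output from o1 itself; the function name `play` and its `secret`, `read_guess`, and `show` parameters are assumptions introduced here so the game can be run without a keyboard.

```python
import random


def play(secret=None, read_guess=input, show=print):
    """Run one round of the guessing game and return the number of attempts."""
    # Prompt 1: generate a random secret number between 1 and 100 (inclusive).
    if secret is None:
        secret = random.randint(1, 100)

    attempts = 0
    # Prompt 4: keep asking until the player guesses the number.
    while True:
        # Prompt 2: read the player's guess and turn it into an integer.
        raw = read_guess("Guess a number between 1 and 100: ")
        try:
            guess = int(raw)
        except ValueError:
            show("Please enter a whole number.")
            continue

        attempts += 1
        # Prompt 3: tell the player whether the guess is higher or lower.
        if guess < secret:
            show("Your guess is lower than the secret number.")
        elif guess > secret:
            show("Your guess is higher than the secret number.")
        else:
            # Prompt 5: a correct guess ends the game.
            show(f"Correct! You needed {attempts} attempts.")
            return attempts
```

Calling `play()` with no arguments starts an interactive round in the terminal. Each prompt maps to one small step in the code, which is the point of working through the problem in sequence.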
Here we can see a more advanced example created by Matthew Clifford, which you can find at: Crossword Clues
Previously, chain of thought was simply a prompting technique that improved responses in the original GPT models. However, o1 is different, as it has been trained through reinforcement learning to always use this type of reasoning, without needing to be explicitly prompted. Now, when you ask ChatGPT a question with o1 enabled, an expandable indicator appears, allowing you to see its thought process.
And yes, it also correctly solves the classic strawberry problem (counting how many times the letter "r" appears in "strawberry"), which ChatGPT 4o and Claude get wrong. We've been experimenting a lot with o1 in the last few days, and we'll have much more to say in the coming weeks, but we wanted to share a quick reaction.
A new paradigm in AI
The o1 model introduces a significant shift in the evolution of AI. Until now, progress in AI has primarily relied on two factors: more data and greater computing power during training. However, OpenAI has added a new dimension to improve performance: computation during inference. They have discovered that allowing o1 to take longer to respond improves the accuracy of its answers, as it has more time to process and reason.
This behavior marks a difference compared to previous models, such as GPT-4. When given more time to process responses, GPT-4 often became erratic or focused on irrelevant details. However, o1, thanks to its specialized training, maintains its focus and improves its performance on complex tasks.
This allows OpenAI to optimize o1's performance without needing to create an even larger and more expensive model, such as a hypothetical GPT-7. Instead, they can focus on giving o1 more time to "think" and generate more detailed and accurate responses.
Another example of its “reach” would be:
Physics researchers need to solve complex differential equations to model the behavior of dynamic systems. The o1 model can generate a detailed, step-by-step solution for non-trivial differential equations.
Prompt: Let's solve the second-order differential equation y'' + 5y' + 6y = 0 using the o1 model.
Steps:
- Defining the equation: We enter the equation into the o1 model. Input: Solve y'' + 5y' + 6y = 0 for y.
- Identifying the roots: The model analyzes the characteristic equation r^2 + 5r + 6 = 0 and obtains the roots r = -2 and r = -3.
- Constructing the general solution: With the roots found, the model generates the general solution in the form y(x) = c_1 e^{-2x} + c_2 e^{-3x}.
- Detailed explanation: o1 explains each step, from the characteristic equation to the construction of the general solution with arbitrary constants.
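The result of this worked example is easy to check independently. The sketch below uses only the Python standard library: it solves the characteristic equation with the quadratic formula and then substitutes the general solution back into the differential equation. The helper names and the sample constants c1 = 1.5 and c2 = -0.7 are arbitrary choices made for illustration, and the code only handles the case of two distinct real roots.

```python
import math


def characteristic_roots(a, b, c):
    """Roots of a*r^2 + b*r + c = 0, assuming two distinct real roots."""
    disc = b * b - 4 * a * c
    if disc <= 0:
        raise ValueError("this sketch only handles two distinct real roots")
    sq = math.sqrt(disc)
    return (-b + sq) / (2 * a), (-b - sq) / (2 * a)


def general_solution(r1, r2, c1, c2):
    """Return y, y' and y'' for y(x) = c1*e^(r1*x) + c2*e^(r2*x)."""
    def y(x):
        return c1 * math.exp(r1 * x) + c2 * math.exp(r2 * x)

    def dy(x):
        return c1 * r1 * math.exp(r1 * x) + c2 * r2 * math.exp(r2 * x)

    def d2y(x):
        return c1 * r1 ** 2 * math.exp(r1 * x) + c2 * r2 ** 2 * math.exp(r2 * x)

    return y, dy, d2y


# Characteristic equation of y'' + 5y' + 6y = 0 is r^2 + 5r + 6 = 0.
r1, r2 = characteristic_roots(1, 5, 6)

# Arbitrary constants: any values should satisfy the equation.
y, dy, d2y = general_solution(r1, r2, c1=1.5, c2=-0.7)

# Substitute back: y'' + 5y' + 6y should vanish (up to rounding) at every x.
residuals = [d2y(x) + 5 * dy(x) + 6 * y(x) for x in (0.0, 0.5, 1.0, 2.0)]
print(r1, r2, residuals)
```

The roots come out as -2 and -3, and the residuals are zero up to floating-point rounding, which confirms the general solution above for this particular choice of constants.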
But it doesn't just reason with numbers; it also does so with literature, as you can see in this thread on X by Mehran Jalali: A poem from scratch
o1 and the economics of allocation
One future scenario that could emerge with these types of models is the ability to allocate more time to specific tasks. For example, instead of expecting an immediate response to all queries, o1 could be asked to dedicate more time to addressing complex tasks and return with a result after minutes, hours, or even days.
This concept introduces the idea of "allocation economics," where users or companies managing these models must learn to decide when it is more efficient to use a model that requires more time to think, and how to maximize the value of that allocation. It's like making a bet, since the results of o1 are not revealed until after its reasoning process is complete, so it is essential to know when to make those bets and how to formulate the prompts most effectively.
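One way to picture such a bet is as an expected-value comparison. The sketch below is a toy model, not a description of how anyone actually routes requests: the model names, success probabilities, and costs are all made-up numbers, and `expected_value` and `choose_model` are hypothetical helpers introduced only for this illustration.

```python
# Hypothetical options: name -> (probability of a correct answer, cost per task).
# Every number here is invented for illustration.
OPTIONS = {
    "fast": (0.60, 0.01),
    "reasoning": (0.90, 0.50),
}


def expected_value(p_success, task_value, cost):
    """Expected payoff of one attempt: chance of success times value, minus cost."""
    return p_success * task_value - cost


def choose_model(task_value, options=OPTIONS):
    """Pick the option with the highest expected payoff for a task of this value."""
    return max(
        options,
        key=lambda name: expected_value(options[name][0], task_value, options[name][1]),
    )


low_stakes = choose_model(1.0)     # a quick, low-value query
high_stakes = choose_model(100.0)  # a task where a correct answer is worth a lot
```

With these invented numbers, the low-value query goes to the fast model and the high-value task goes to the slower reasoning model. The cost of extra thinking time only pays off when a correct answer is worth enough.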
While most users probably won't notice a significant difference in their daily use of o1, companies building products based on this technology could benefit considerably. Projects requiring in-depth analysis or complex reasoning will be significantly improved by incorporating o1 instead of models like GPT-4 or Claude.
Additional points of interest:
- The Riemann enigma remains unsolved by o1
Despite the improvements made, o1 has yet to solve some of the most complex mathematical problems, such as the Riemann Hypothesis. This famous problem, which deals with the distribution of prime numbers, has been the subject of a joke by Andrej Karpathy, who pointed out on the X platform that o1 "refuses" to solve it. It will be interesting to see how far this model can go in solving problems that have not yet been tackled by artificial intelligence.
- Can o1 generate entirely new knowledge?
The philosopher Toby Ord has introduced the concept of "hyperpolation," an idea that explores AI's ability to move beyond training data and generate novelty in a way that isn't solely based on known examples. While interpolation is like finding routes between known cities and extrapolation is about predicting beyond the map's boundaries, "hyperpolation" is like exploring new dimensions not reflected in the training data.
This raises a key question about the current limitations of AI: while models can interpolate or extrapolate based on known data, can they truly explore and generate entirely new concepts? Computer scientist Judea Pearl has argued that, while current AI systems can produce advanced, data-driven predictions, they cannot yet achieve the kind of innovation that led to major scientific breakthroughs in the past, such as Eratosthenes' calculation of the Earth's circumference.