OpenAI o3: Difference between revisions


Revision as of 12:29, 25 December 2024 by IceCuba (→Capabilities: WP:OVERCITE)		Revision as of 12:30, 25 December 2024 by IceCuba (→Capabilities)
Line 9:		Line 9:
	o3 demonstrates improved performance over the o1 model in complex tasks, including coding, mathematics, and science. On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attains three times the accuracy of its predecessor.<ref name="auto"/>		o3 demonstrates improved performance over the o1 model in complex tasks, including coding, mathematics, and science. On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attains three times the accuracy of its predecessor.<ref name="auto"/>

	As reported by ''New Scientist'', o3 also scored a record high of 75.7% on the Abstraction and Reasoning Corpus (ARC) developed by Google software engineer François Chollet, a prestigious AI reasoning test,<ref name=":0">{{Cite web |last=Hsu |first=Jeremy |title=OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI |url=https://www.newscientist.com/article/2462000-openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi/ |access-date=2024-12-22 |website=New Scientist |language=en-US}}</ref> but did not yet complete the requirements for the "Grand Prize" requiring 85% accuracy. Without the computing cost requirements imposing by the test, the model also achieves a new record high of 87.5%, while humans score, on average, 84%.<ref name=":0" />		As reported by ''New Scientist'', o3 also scored a record high of 75.7% on the Abstraction and Reasoning Corpus (ARC) developed by Google software engineer François Chollet, a prestigious AI reasoning test, but did not yet complete the requirements for the "Grand Prize" requiring 85% accuracy. Without the computing cost requirements imposing by the test, the model also achieves a new record high of 87.5%, while humans score, on average, 84%.<ref name=":0">{{Cite web |last=Hsu |first=Jeremy |title=OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI |url=https://www.newscientist.com/article/2462000-openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi/ |access-date=2024-12-22 |website=New Scientist |language=en-US}}</ref>

	According to ''TechCrunch'', reinforcement learning was used to teach o3 to "think" before reacting using what OpenAI refers to as a "private chain of thought." The model can allegedly plan ahead and reason through a task, carrying out a sequence of actions over a long period of time to assist in solving the problem, but TechCrunch reported that this does increase the latency of responses.<ref name=":1">{{Cite web |last=Wiggers |first=Maxwell Zeff, Kyle |date=2024-12-20 |title=OpenAI announces new o3 models |url=https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/ |access-date=2024-12-22 |website=TechCrunch |language=en-US}}</ref>		According to ''TechCrunch'', reinforcement learning was used to teach o3 to "think" before reacting using what OpenAI refers to as a "private chain of thought." The model can allegedly plan ahead and reason through a task, carrying out a sequence of actions over a long period of time to assist in solving the problem, but TechCrunch reported that this does increase the latency of responses.<ref name=":1">{{Cite web |last=Wiggers |first=Maxwell Zeff, Kyle |date=2024-12-20 |title=OpenAI announces new o3 models |url=https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/ |access-date=2024-12-22 |website=TechCrunch |language=en-US}}</ref>

Revision as of 12:30, 25 December 2024

Large language model

OpenAI o3 is a generative pre-trained transformer model developed by OpenAI as a successor to the OpenAI o1 model. It is designed to devote additional deliberation time when addressing questions that require step-by-step logical reasoning.

History

The OpenAI o3 model was announced on December 20, 2024. The name "o3" was chosen, skipping "o2", to avoid a trademark conflict with the UK mobile carrier O2. The model is available in two versions: o3 and o3-mini. OpenAI is inviting safety and security researchers to apply for early access to these models until January 10, 2025, and plans to release o3-mini to the public in January 2025.

Capabilities

o3 demonstrates improved performance over the o1 model in complex tasks, including coding, mathematics, and science. On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attains three times the accuracy of its predecessor.

As reported by New Scientist, o3 also scored a record high of 75.7% on the Abstraction and Reasoning Corpus (ARC), a prestigious AI reasoning test developed by Google software engineer François Chollet, but did not meet the requirements for the "Grand Prize", which requires 85% accuracy. Without the computing cost limits imposed by the test, the model achieves a new record high of 87.5%, while humans score 84% on average.

According to TechCrunch, reinforcement learning was used to teach o3 to "think" before responding, using what OpenAI refers to as a "private chain of thought." The model can allegedly plan ahead and reason through a task, carrying out a sequence of actions over a long period of time to help solve the problem, but TechCrunch reported that this increases the latency of responses.

References

Knight, Will. "OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills" – via Wired.com.
https://www.nytimes.com/2024/12/20/technology/openai-new-ai-math-science.html
https://openai.com/index/early-access-for-safety-testing/
https://arstechnica.com/information-technology/2024/12/openai-announces-o3-and-o3-mini-its-next-simulated-reasoning-models/
Hsu, Jeremy. "OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI". New Scientist. Retrieved 2024-12-22.
Zeff, Maxwell; Wiggers, Kyle (2024-12-20). "OpenAI announces new o3 models". TechCrunch. Retrieved 2024-12-22.

