OpenAI o3: Difference between revisions

Browse history interactively ← Previous editContent deleted Content addedVisual WikitextInline

Revision as of 18:19, 22 December 2024 edit2402:8100:315e:6bd5:27dc:6666:5a08:18a3 (talk) Just added a dotTags: Reverted Mobile edit Mobile web edit← Previous edit		Latest revision as of 02:17, 28 December 2024 edit undoDukese805 (talk \| contribs)136 editsm →History: clarify that this is not openai giving you an invitation, but rather a description of an invitationTag: 2017 wikitext editor
(19 intermediate revisions by 12 users not shown)
Line 1:		Line 1:
	{{Short description\|Large language model}}		{{Short description\|Large language model}}
			{{Infobox software

			\| name = o3
⚫	'''OpenAI ~~o.3~~''' is a ] model developed by ] as a successor to the ] model. It is designed to devote additional deliberation time when addressing questions that require step-by-step logical reasoning.<ref name="auto">{{Cite ~~web~~\|url=https://www.wired.com/story/openai-o3-reasoning-model-google-gemini/\|~~title~~=~~OpenAI~~ ~~Upgrades Its Smartest AI Model With Improved Reasoning Skills\|first=Will\|last=Knight~~\|via=]}}</ref><ref>https://www.nytimes.com/2024/12/20/technology/openai-new-ai-math-science.html</ref>
			\| developer = ]
			\| genre = ]
			\| replaces = ]
			\| replaced_by =
			\| license =
			}}
		⚫	'''OpenAI o3''' is a ] (GPT) model developed by ] as a successor to the ] model. It is designed to devote additional deliberation time when addressing questions that require step-by-step logical reasoning.<ref name="auto">{{Cite magazine \|last=Knight \|first=Will \|date=December 20, 2024 \|title=OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills \|url=https://www.wired.com/story/openai-o3-reasoning-model-google-gemini/ \|magazine=Wired \|via=}}</ref><ref>{{Cite web \|date=2024-12-20 \|title=OpenAI Unveils New A.l. That Can 'Reason' Through Math and Science Problems \|url=https://www.nytimes.com/2024/12/20/technology/openai-new-ai-math-science.html \|website=The New York Times}}</ref>

	==History==		==History==
	OpenAI o3 model was announced on December 20, 2024, with the designation "o3" chosen to avoid trademark conflict with the existing UK mobile carrier named ]. The model is available in two versions: o3 and o3-mini. ~~Until~~ ~~January~~ ~~10,~~ ~~2025,~~ ~~access~~ is ~~provided~~ for ~~safety~~ ~~and~~ ~~security~~ ~~researchers~~ ~~through~~ an ~~invitation-based~~ ~~testing~~ ~~program~~.<ref name="auto"/><ref>https://openai.com/index/early-access-for-safety-testing/</ref> OpenAI plans to release o3-mini to the public in January 2025.<ref>https://arstechnica.com/information-technology/2024/12/openai-announces-o3-and-o3-mini-its-next-simulated-reasoning-models/</ref>		The OpenAI o3 model was announced on December 20, 2024, with the designation "o3" chosen to avoid trademark conflict with the existing UK mobile carrier named ]. The model is available in two versions: o3 and o3-mini. OpenAI invited safety and security researchers to apply for early access of these models until January 10, 2025.<ref name="auto"/><ref>{{Cite web \|date=December 20, 2024 \|title=Early access for safety testing \|url=https://openai.com/index/early-access-for-safety-testing/ \|website=OpenAI}}</ref> OpenAI plans to release o3-mini to the public in January 2025.<ref>{{Cite web\|url=https://arstechnica.com/information-technology/2024/12/openai-announces-o3-and-o3-mini-its-next-simulated-reasoning-models/\|title=OpenAI announces o3 and o3-mini, its next simulated reasoning models\|first=Benj\|last=Edwards\|date=December 20, 2024\|website=Ars Technica}}</ref>

	==Capabilities==		==Capabilities==
			] was used to teach o3 to "think" before generating answers, using what ] refers to as a "private ]". This approach enables the model to plan ahead and reason through tasks, performing a series of intermediate reasoning steps to assist in solving the problem, at the cost of needing additional computing power and increasing the ] of responses.<ref name=":1">{{Cite web \|last=Zeff \|first=Maxwell \|last2=Wiggers \|first2=Kyle \|date=2024-12-20 \|title=OpenAI announces new o3 models \|url=https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/ \|access-date=2024-12-22 \|website=TechCrunch \|language=en-US}}</ref>
	o3 demonstrates improved performance over the o1 model in complex tasks, including ], ], and ]. On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attains three times the accuracy of its predecessor.<ref name="auto"/>

			o3 demonstrates improved performance compared to o1 in complex tasks, including ], ], and ].<ref name="auto" /> OpenAI reported that o3 achieved a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online.<ref name=":2">{{Cite web \|last=Franzen \|first=Carl \|last2=David \|first2=Emilia \|date=2024-12-20 \|title=OpenAI confirms new frontier models o3 and o3-mini \|url=https://venturebeat.com/ai/openai-confirms-new-frontier-models-o3-and-o3-mini/ \|access-date=2024-12-26 \|website=VentureBeat \|language=en-US}}</ref>

			On SWE-bench Verified, a ] benchmark assessing the ability to solve real ] issues, o3 scored 71.7%, compared to 48.9% for o1. On ], o3 reached an ] score of 2727, whereas o1 obtained 1891.<ref name=":2" />
⚫	As ~~reported~~ by ], o3 ~~also~~ ~~scored~~ a ~~record~~ ~~high~~ of ~~75.7%~~ ~~on the~~ ~~Abstraction~~ and ~~Reasoning~~ ~~Corpus~~ ~~(ARC)~~ ~~developed~~ by ~~Google~~ ~~software~~ ~~engineer~~ ], a ~~prestigious AI reasoning test,~~<ref name=":0">{{Cite web \|last=Hsu \|first=Jeremy \|title=OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI \|url=https://www.newscientist.com/article/2462000-openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi/ \|access-date=2024-12-22 \|website=New Scientist \|language=en-US}}</ref> but did not yet complete the requirements for the "Grand Prize" requiring 85% accuracy.<ref name=":0" /> Without the computing cost requirements imposing by the test, the model also achieves a new record high of 87.5%,<ref name=":0" /> while humans score, on average, 84%.<ref name=":0" />

		⚫	On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attained three times the accuracy of o1.<ref name="auto" /><ref name=":0">{{Cite web \|last=Hsu \|first=Jeremy \|title=OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI \|url=https://www.newscientist.com/article/2462000-openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi/ \|access-date=2024-12-22 \|website=New Scientist \|language=en-US}}</ref>
	According to ], ] was used to teach o3 to "think" before reacting using what ] refers to as a "private chain of thought."<ref name=":1">{{Cite web \|last=Wiggers \|first=Maxwell Zeff, Kyle \|date=2024-12-20 \|title=OpenAI announces new o3 models \|url=https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/ \|access-date=2024-12-22 \|website=TechCrunch \|language=en-US}}</ref> The model can allegedly plan ahead and reason through a task, carrying out a sequence of actions over a long period of time to assist in solving the problem,<ref name=":1" /> but TechCrunch reported that this does increase the ] of responses.<ref name=":1" />

	==References==		==References==
	{{~~reflist~~}}		{{Reflist}}

	{{OpenAI}}		{{OpenAI}}

			]
			]
			]
	]		]
			]

Latest revision as of 02:17, 28 December 2024

Large language model

o3
Developer(s)	OpenAI
Predecessor	OpenAI o1
Type	Generative pre-trained transformer

OpenAI o3 is a generative pre-trained transformer (GPT) model developed by OpenAI as a successor to the OpenAI o1 model. It is designed to devote additional deliberation time when addressing questions that require step-by-step logical reasoning.

History

The OpenAI o3 model was announced on December 20, 2024, with the designation "o3" chosen to avoid trademark conflict with the existing UK mobile carrier named O2. The model is available in two versions: o3 and o3-mini. OpenAI invited safety and security researchers to apply for early access of these models until January 10, 2025. OpenAI plans to release o3-mini to the public in January 2025.

Capabilities

Reinforcement learning was used to teach o3 to "think" before generating answers, using what OpenAI refers to as a "private chain of thought". This approach enables the model to plan ahead and reason through tasks, performing a series of intermediate reasoning steps to assist in solving the problem, at the cost of needing additional computing power and increasing the latency of responses.

o3 demonstrates improved performance compared to o1 in complex tasks, including coding, mathematics, and science. OpenAI reported that o3 achieved a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online.

On SWE-bench Verified, a software engineering benchmark assessing the ability to solve real GitHub issues, o3 scored 71.7%, compared to 48.9% for o1. On Codeforces, o3 reached an Elo score of 2727, whereas o1 obtained 1891.

On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attained three times the accuracy of o1.

References

^ Knight, Will (December 20, 2024). "OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills". Wired.
"OpenAI Unveils New A.l. That Can 'Reason' Through Math and Science Problems". The New York Times. 2024-12-20.
"Early access for safety testing". OpenAI. December 20, 2024.
Edwards, Benj (December 20, 2024). "OpenAI announces o3 and o3-mini, its next simulated reasoning models". Ars Technica.
Zeff, Maxwell; Wiggers, Kyle (2024-12-20). "OpenAI announces new o3 models". TechCrunch. Retrieved 2024-12-22.
^ Franzen, Carl; David, Emilia (2024-12-20). "OpenAI confirms new frontier models o3 and o3-mini". VentureBeat. Retrieved 2024-12-26.
Hsu, Jeremy. "OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI". New Scientist. Retrieved 2024-12-22.

OpenAI

Products

Foundation models

OpenAI Codex
Generative pre-trained transformer
- GPT-1
- GPT-2
- GPT-3
- GPT-4
- GPT-4o
- o1
- o3

People

CEOs

Board of directors

Current	Sam Altman Adam D'Angelo Sue Desmond-Hellmann Paul Nakasone Nicole Seligman Fidji Simo Lawrence Summers Bret Taylor
Former	Greg Brockman (2017–2023) Reid Hoffman (2019–2023) Will Hurd (2021–2023) Holden Karnofsky (2017–2021) Elon Musk (2015–2018) Ilya Sutskever (2017–2023) Helen Toner (2021–2023) Shivon Zilis (2019–2023)

Category

Categories: