Revision as of 02:47, 22 December 2024 by MolecularPilot (edit summary: "New Scientist article"). Previous revision as of 00:27, 22 December 2024 by Alenoach (edit summary: Adding short description: "Large language model").
Revision as of 02:47, 22 December 2024
Large language model
OpenAI o3 is a generative pre-trained transformer model developed by OpenAI as a successor to the OpenAI o1 model. It is designed to devote additional deliberation time when addressing questions that require step-by-step logical reasoning.
History
The OpenAI o3 model was announced on December 20, 2024. The designation "o3" was chosen to avoid a trademark conflict with the existing UK mobile carrier named O2. The model is available in two versions, o3 and o3-mini, and access is provided through an invitation-based testing program.
Capabilities
o3 demonstrates improved performance over the o1 model in complex tasks, including coding, mathematics, and science. On the ARC-AGI benchmark, which evaluates an AI's ability to handle new, challenging mathematical and logical problems, o3 attains three times the accuracy of its predecessor.
As reported by New Scientist, o3 also scored a record high of 75.7% on the Abstraction and Reasoning Corpus (ARC), a prestigious AI reasoning test developed by Google software engineer François Chollet, but did not reach the 85% accuracy required for the "Grand Prize". Without the computing cost requirements imposed by the test, the model achieves a new record high of 87.5%, while humans score, on average, 84%.
References
- ^ Knight, Will. "OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills" – via Wired.com.
- https://www.nytimes.com/2024/12/20/technology/openai-new-ai-math-science.html
- ^ "OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI". New Scientist. https://www.newscientist.com/article/2462000-openais-o3-model-aced-a-test-of-ai-reasoning-but-its-still-not-agi/. Retrieved 2024-12-22.