OpenAI claims GPT-4 will beat 90% of you in an exam

OpenAI on Tuesday announced the qualified arrival of GPT-4, its latest milestone in the making of call-and-response deep learning models and one that can seemingly outperform its fleshy creators in important exams.

According to OpenAI, the model exhibits "human-level performance on various professional and academic benchmarks." GPT-4 can pass a simulated bar exam in the top 10 percent of test takers, whereas its predecessor, GPT-3.5 (the basis of ChatGPT), scored around the bottom 10 percent.

GPT-4 also performed well on various other exams, like SAT Math (700 out of 800). It's not universally capable, however: it scored only 2 out of 5 on the AP English Language and Composition exam (14th to 44th percentile).

GPT-4 is a large multimodal model, as opposed to a large language model. It accepts queries composed of text and images, and returns its answers in text. It's being made available initially via the waitlisted GPT-4 API and to ChatGPT Plus subscribers in a text-only capacity; image-based input is still being refined.

Despite the addition of a visual input mechanism, OpenAI is not being open about the making of its model. The upstart has chosen not to release details about its size, how it was trained, or what data went into the process.

"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar," the company said in its technical paper [PDF].

In a live stream on YouTube, Greg Brockman, president and co-founder of OpenAI, demonstrated the difference between GPT-4 and GPT-3.5 by asking the models to summarize the OpenAI GPT-4 blog post in a single sentence where every word begins with the letter "G."

GPT-3.5 simply didn't try. GPT-4 returned "GPT-4 generates ground-breaking, grandiose gains, greatly galvanizing generalized AI goals." And when Brockman told the model that the inclusion of "AI" in the sentence doesn't count, GPT-4 revised its response with another G-laden sentence, this time without "AI" in it.

He then went on to have GPT-4 generate the Python code for a Discord bot. More impressively, he took a picture of a hand-drawn mockup of a jokes website, sent the image to Discord, and the GPT-4-backed bot responded with HTML and JavaScript code to realize the mockup site.

Finally, Brockman set up GPT-4 to analyze 16 pages of US tax code to return the standard deduction for a couple, Alice and Bob, with specific financial circumstances. OpenAI's model responded with the correct answer, along with an explanation of the calculations involved.

Beyond better reasoning, evident in its improved test scores, GPT-4 is intended to be more collaborative (iterating as directed to improve previous output), better able to handle lots of text (analyzing or outputting novella-length chunks of around 25,000 words), and capable of accepting image-based input (for object recognition, though that capability isn't yet publicly available).

What's more, GPT-4, according to OpenAI, should be less likely to go off the rails than its predecessors.

"We've spent six months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails," the org says.

People may already be familiar with this "far from perfect" level of safety from the rocky debut of Microsoft Bing's question-answering capabilities, whose underlying Prometheus model is based on GPT-4.

OpenAI acknowledges that GPT-4 "hallucinates facts and makes reasoning errors" like its ancestors, but the org insists the model does so to a lesser extent.

"While still a real issue, GPT-4 significantly reduces hallucinations relative to previous models (which have themselves been improving with each iteration)," the company explains. "GPT-4 scores 40 percent higher than our latest GPT-3.5 on our internal adversarial factuality evaluations."

Pricing for GPT-4 is $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens, where a token is about four characters. There's also a default rate limit of 40,000 tokens per minute and 200 requests per minute.
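At those rates, per-request costs are easy to tally. Here's a minimal sketch of the arithmetic, using the per-1,000-token prices quoted above; the function and variable names are illustrative, not part of OpenAI's API:

```python
# Per-1k-token GPT-4 prices quoted by OpenAI at launch (USD).
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06

def gpt4_request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one API call: prompt and completion tokens
    are billed at different per-1,000-token rates."""
    return (prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K)

# A token is roughly four characters, so an 8,000-token prompt is
# about 32,000 characters of input text.
print(f"${gpt4_request_cost(8000, 1000):.2f}")  # 8k-token prompt, 1k-token reply
```

A call with an 8,000-token prompt and a 1,000-token completion works out to 30 cents, so the default 40,000 tokens-per-minute rate limit caps spending at a few dollars a minute at most.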

Despite ongoing concern about AI risks, there's a rush to bring AI models to market. On the same day GPT-4 arrived, Anthropic, a startup formed by former OpenAI employees, introduced its own chat-based helper called Claude for handling text summarization and generation, search, Q&A, coding, and more. That's also available via a limited preview.

And Google, worried about falling behind in the marketing of AI models, rolled out an API called PaLM for interacting with various large language models, along with a prototyping environment called MakerSuite.

A few weeks earlier, Facebook launched its LLaMA large language model, which Stanford researchers have since turned into the Alpaca model, a development The Register will be covering in more detail later.

"There's still a lot of work to do, and we look forward to improving this model through the collective efforts of the community building on top of, exploring, and contributing to the model," OpenAI concludes. ®
