If you think ChatGPT can't grow any further, you couldn't be more wrong. As a matter of fact, OpenAI is only getting warmed up.
We could say that people haven't adjusted to or fully understood the capabilities of GPT-3 and GPT-3.5 yet, but rumors have been circulating online that GPT-4 is on the horizon.
And there is some good news as well.
In this article we’ll talk about what GPT-4 is, summarize what is currently known about it, and present new information about when and how to obtain this potent AI model.
What is GPT-4?
In their technical report, OpenAI describes GPT-4 as a large multimodal model that can take in text and images and turn them into text.
They further argued that studying such models is crucial because of the wide variety of applications they find in the real world, such as:
- Dialogue systems
- Text summarization
- Machine translation
This is why these models have received so much focus and developed so rapidly over the past few years.
OpenAI adds that a primary motivation for developing such models is to improve natural language understanding and generation, particularly in more nuanced and complex scenarios.
How Does GPT-4 Work?
GPT-4 is a Transformer-style model pre-trained to predict the next token in a document, using both:
- Information that is freely accessible to the public, such as data found online, and
- Licensed information from external sources.
The model was then fine-tuned with human input and reinforcement learning from human feedback (RLHF).
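To make "predicting the next token" more concrete, here is a toy Python sketch. It is not GPT-4's implementation, which OpenAI has not published. The vocabulary, the scores (`logits`), and the function names are all made up for illustration. It only shows the general idea: a model assigns a score to every candidate token, the scores are turned into probabilities, and the most likely token is chosen.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next_token(vocab, logits):
    """Return the most likely next token and the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], dict(zip(vocab, probs))

# Hypothetical vocabulary and scores for the context "The cat sat on the"
vocab = ["mat", "moon", "banana"]
logits = [3.2, 1.1, -0.5]

token, distribution = predict_next_token(vocab, logits)
print(token)
print(distribution)
```

A real language model computes these scores with a neural network over a vocabulary of many thousands of tokens, and usually samples from the distribution rather than always taking the top choice.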
And, given the high level of competition and the inherent risks associated with operating a large model like GPT-4, it is understandable that the report does not go into greater depth regarding the architecture.
In other words, the following details are not available in OpenAI’s report:
- The size of the model
- Training compute
- Dataset construction
- Training method, etc.
There is, however, key data that can shed light on GPT-4's capabilities in greater detail.
For example, OpenAI reveals that GPT-4 underwent a series of tests developed for humans to determine how it would fare in similar scenarios.
Interestingly, GPT-4 does reasonably well on these tests, often outscoring the vast majority of human test takers.
For instance, GPT-4 passes a simulated bar exam with a score around the top 10% of test takers.
On the other hand, GPT-3.5 ranks in the bottom 10%.
Let's explore GPT-4's capabilities a bit further, though.
In this section we’ll cover the 3 critical aspects of GPT-4 capabilities that were demonstrated through different sets of testing.
Let’s dive in!
1. GPT-4 vs Human Tests
OpenAI ran GPT-4 on exams originally designed for humans, using publicly available past or practice exams with both multiple-choice and free-response questions.
Where a question required it, images were included in the input as well.
Furthermore, the evaluation setup was tuned on a validation set of exams, and the final results were reported on held-out test exams.
Each test's total score was calculated by adding multiple-choice and free-response results.
GPT-4 shows human-level performance on the majority of these professional and academic exams.
Most notably, it scores in the top 10% of test takers on a simulated version of the Uniform Bar Examination.
Pretty impressive, right?
In addition, it appears that the model's test-taking prowess is largely the product of the pre-training phase and that RLHF has little to no bearing on this.
As a matter of fact, the RLHF model performs about as well on multiple-choice questions as the base GPT-4 model does, on average, across all the exams OpenAI tested.
But now things start to get interesting.
2. GPT-4 vs GPT-3.5
Using the same industry-standard metrics for evaluating language models, OpenAI also tested the GPT-4 baseline model.
They used few-shot prompting for all benchmarks when evaluating GPT-4, and separately checked each reported benchmark for contamination, that is, whether test data had been included in the training set.
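A contamination check can be pictured as a search for benchmark text inside the training data. The sketch below is a simplified illustration of a substring-matching approach, not OpenAI's actual pipeline. The function names, the 50-character probe length, and the three random probes per example are assumptions chosen for the example.

```python
import random
import re

def normalize(text):
    """Keep only letters and digits so formatting differences don't hide a match."""
    return re.sub(r"[^A-Za-z0-9]", "", text)

def sample_substrings(text, length=50, count=3, rng=None):
    """Pick `count` random substrings of `length` characters (or the whole text if shorter)."""
    rng = rng or random.Random(0)
    if len(text) <= length:
        return [text]
    starts = [rng.randrange(len(text) - length + 1) for _ in range(count)]
    return [text[s:s + length] for s in starts]

def is_contaminated(eval_example, training_docs, rng=None):
    """Flag an evaluation example if any sampled substring appears in a training document."""
    probes = [p for p in sample_substrings(normalize(eval_example), rng=rng) if p]
    cleaned_docs = [normalize(doc) for doc in training_docs]
    return any(p in doc for p in probes for doc in cleaned_docs)

# Hypothetical training documents and benchmark questions
training_docs = [
    "Q: What is the capital of France? A: Paris.",
    "Unrelated text about cooking pasta.",
]
seen = is_contaminated("What is the capital of France?", training_docs)
unseen = is_contaminated("How tall is Mount Everest?", training_docs)
print(seen, unseen)
```

Flagged examples can then be removed and the benchmark re-scored to see whether the result changes. A check like this can miss paraphrased copies and can raise false alarms on common phrases, so it is a rough signal rather than proof.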
For the most part, GPT-4 outperforms both existing language models and previous state-of-the-art (SOTA) systems, which often rely on benchmark-specific crafting or additional training.
Furthermore, GPT-4 has greatly improved upon its predecessors in terms of comprehending the user's intent.
What’s more, on a set of 5,214 prompts submitted via ChatGPT and the OpenAI API, GPT-4's responses were preferred over GPT-3.5's on 70.2% of prompts.
And to evaluate models like GPT-4, OpenAI is open-sourcing Evals, a framework for creating and running benchmarks that examines model performance on a sample-by-sample basis.
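The 70.2% figure is a preference rate: the share of prompts on which GPT-4's response was judged better than GPT-3.5's. Here is a small Python sketch of that arithmetic. The `judgments` list and the function name are hypothetical stand-ins, not data from the report.

```python
def preference_rate(judgments, model="gpt-4"):
    """Fraction of prompts on which `model` produced the preferred response."""
    if not judgments:
        raise ValueError("no judgments to score")
    wins = sum(1 for winner in judgments if winner == model)
    return wins / len(judgments)

# Hypothetical labels: which model's response was preferred for each prompt
judgments = ["gpt-4", "gpt-4", "gpt-3.5", "gpt-4", "gpt-3.5",
             "gpt-4", "gpt-4", "gpt-4", "gpt-3.5", "gpt-4"]
rate = preference_rate(judgments)
print(f"{rate:.1%}")

# Applying the reported rate to the reported prompt count
approx_wins = round(0.702 * 5214)
print(approx_wins)
```

In other words, a 70.2% rate over 5,214 prompts corresponds to roughly 3,660 prompts where GPT-4's answer was preferred.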
Evals is compatible with current benchmarks, allowing for real-world model performance monitoring.
The good news is that OpenAI intends to gradually increase the variety of these benchmarks to better represent a broader range of potential problems and a more challenging set of tasks.
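To show what "sample-by-sample" evaluation means in practice, here is a minimal Python sketch of a benchmark runner. It does not use the OpenAI Evals API. Every name in it (`Sample`, `SampleResult`, `run_benchmark`, `toy_model`) is hypothetical, and the "model" is a lookup table so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    prompt: str
    expected: str

@dataclass
class SampleResult:
    prompt: str
    expected: str
    actual: str
    passed: bool

def run_benchmark(model: Callable[[str], str], samples: List[Sample]) -> List[SampleResult]:
    """Run every sample through the model and record a per-sample verdict."""
    results = []
    for s in samples:
        actual = model(s.prompt).strip()
        results.append(SampleResult(s.prompt, s.expected, actual, actual == s.expected))
    return results

def accuracy(results: List[SampleResult]) -> float:
    """Share of samples the model answered exactly right."""
    return sum(r.passed for r in results) / len(results) if results else 0.0

def toy_model(prompt: str) -> str:
    """Stand-in for a real model: answers from a fixed lookup table."""
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")

samples = [
    Sample("2 + 2 =", "4"),
    Sample("Capital of France?", "Paris"),
    Sample("Largest planet?", "Jupiter"),
]
results = run_benchmark(toy_model, samples)
failures = [r.prompt for r in results if not r.passed]
print(f"accuracy: {accuracy(results):.2f}")
print("failed prompts:", failures)
```

Keeping one result per sample, instead of just a final score, makes it possible to see exactly which prompts a model gets wrong.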
3. GPT-4 vs Visual Inputs
In addition to text-only prompts, GPT-4 accepts prompts that combine images and text, so users can specify any vision or language task.
The model creates textual outputs based on inputs that may include any combination of text and images.
Across a range of domains, GPT-4 shows capabilities on such inputs similar to those it shows on text-only inputs.
That includes documents that mix text with photographs, diagrams, or screenshots.
Test-time methods, such as few-shot prompting and chain-of-thought, originally developed for language models, are just as effective when employing images and text.
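Few-shot prompting and chain-of-thought both come down to how the prompt is written. The Python sketch below builds such a prompt as plain text. The `Q:` / `Reasoning:` / `A:` layout, the function name, and the sample questions are assumptions for illustration. The sketch covers text only, since how images are attached depends on the specific API.

```python
def build_few_shot_prompt(examples, question, chain_of_thought=False):
    """Assemble a text prompt from worked examples followed by a new question."""
    parts = []
    for ex in examples:
        parts.append(f"Q: {ex['question']}")
        if chain_of_thought and ex.get("reasoning"):
            parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"A: {ex['answer']}")
        parts.append("")  # blank line between examples
    parts.append(f"Q: {question}")
    # With chain-of-thought, invite the model to reason before it answers
    parts.append("Reasoning:" if chain_of_thought else "A:")
    return "\n".join(parts)

# Hypothetical worked examples
examples = [
    {"question": "What is 3 + 4?", "reasoning": "3 plus 4 equals 7.", "answer": "7"},
    {"question": "What is 10 - 6?", "reasoning": "10 minus 6 equals 4.", "answer": "4"},
]

cot_prompt = build_few_shot_prompt(examples, "What is 8 + 5?", chain_of_thought=True)
plain_prompt = build_few_shot_prompt(examples, "What is 8 + 5?")
print(cot_prompt)
```

The few-shot version shows the model the expected answer format, while the chain-of-thought version also shows the intermediate reasoning and ends by asking the model to reason before answering.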
But, despite its strengths, GPT-4 shares the same weaknesses as previous GPT versions.
One such weakness is that it is not completely reliable (it "hallucinates" facts and makes reasoning errors).
However, GPT-4 significantly reduces hallucinations relative to GPT-3.5 models: it scores 19 percentage points higher than the latest GPT-3.5 on OpenAI's internal, adversarially designed factuality evaluations.
GPT-4 Safety Metrics & Limitations
OpenAI made significant improvements to many of GPT-4's safety features, including:
- GPT-4 is 82% less likely to answer requests for content that isn't allowed than GPT-3.5.
- GPT-4 responds to sensitive requests, such as those about medical advice or self-harm, in line with OpenAI's policies 29% more often.
- On the RealToxicityPrompts dataset, GPT-4 only makes toxic content 0.73% of the time, while GPT-3.5 does it 6.48% of the time.
Although OpenAI's mitigations make it harder to elicit bad behavior from the model, doing so is still possible.
As an example, the report mentions "jailbreaks", such as adversarial system messages, which can still be used to generate content that violates OpenAI's usage guidelines.
They note that as long as these limitations exist, it is important to complement them with deployment-time safety measures like monitoring for abuse and a pipeline for quick iterative model improvement.
OpenAI’s Key Takeaways on GPT-4
And finally, OpenAI's technical report for GPT-4 highlighted several key takeaways that you should remember when establishing goals for this powerful model.
Some examples are as follows:
✔️ GPT-4 is a large, multimodal model that shows human-level performance on certain difficult professional and academic benchmarks.
✔️ GPT-4 outperforms existing large language models on a collection of NLP tasks and exceeds most reported state-of-the-art systems (which often include task-specific fine-tuning).
✔️ Though usually measured in English, GPT-4's improved capabilities can be demonstrated in many languages.
✔️ Predictable scaling allowed OpenAI to accurately predict GPT-4's loss and capabilities.
✔️ GPT-4's abilities increase its risks.
✔️ They provided methods and results to improve its safety and alignment.
✔️ GPT-4 is a significant step toward safe, widespread AI systems.
And finally, let’s not forget the most important information — the GPT-4 release date.
GPT-4 Release Date
OpenAI announced the release of its large multimodal model GPT-4 on March 14, 2023.
Users reported creating nearly perfect versions of Tetris, Connect Four, Snake, and Pong in the first few hours after the release by simply asking the chatbot to generate code.
However, GPT-4 is only available to those who pay $20 monthly for a ChatGPT Plus subscription, which gives subscribers access to OpenAI's newest language model.
Likewise, you should know that even with this subscription, there will be a limit of 100 messages per user every 4 hours, so you may have limited access.
The Final Word
While only a tiny portion of OpenAI's report on GPT-4 is covered here, we hope it's enough to keep you on the right track (at least until further updates).
GPT-4 will continue to advance, and we will see even more remarkable feats in the future.
Yet, while this AI model's potential is unquestionably vast, it is also difficult to deny that it occasionally gets scary.
It's also worth recognizing the AI-powered tools already available that keep up with these advancements while remaining true to their original purpose.
One such tool is the TextCortex add-on.
What is TextCortex?
TextCortex is an artificial intelligence (AI) writing tool built on the concept of use-case modules to help writers generate ideas and produce high-quality content.
Its primary purpose is to aid writers in breaking through writer's block by offering tools such as:
✒️ Rewriting tool — Provides assistance in rewriting, summarizing, altering the tone, translating, and other aspects of paraphrasing.
✒️ Long-form feature — Allows you to generate a blog post of up to 300 words from a single five-word idea.
✒️ Bullet to email — Lets you easily convert your bullet points into formatted email messages.
✒️ Zeno mode — Based on your initial draft, it will produce the most pertinent results.
✒️ Brainstorming features — Category of features designed to get you started writing.
✒️ AI templates — Easily create any content from keywords and predefined templates.
✒️ Zeno chat — Chat with our AI writer and get the results you want.
Why Bother Considering It?
👍 We offer a freemium account with 10 free daily creations.
👍 You don’t need to provide credit card info to sign up.
👍 Our solutions already successfully serve 10k+ users.
👍 We promise affordable premium plans for upgrades.
Interested in getting a free ride?
Download our Chrome extension to see how TextCortex can easily transform your writing into compelling and effective content on 2000+ platforms, starting today.