OpenAI's GPT-4o, the latest iteration of its most popular large language model, has, unsurprisingly, taken the world by storm. Sure, nothing can beat the first time the world witnessed GPT-3.5's abilities. However, that model had shortcomings and left much to be desired, especially in the accuracy department [1].
Fast-forward roughly a year and a half, and GPT-4o has gained immense popularity. It's safe to say the OpenAI team has surpassed expectations and pushed past many of the limitations of their previous LLMs. Still, the question remains: is there substance behind the hype, and are there actual use cases for GPT-4o? Keep reading to find out!
GPT-4o, OpenAI's latest LLM, was released on May 13, 2024 [2]. As the company's current flagship model, it is available on a subscription basis, although free users can access a trial limited to 100 messages per 3-hour window.
The "o" in GPT-4o stands for "omni" [3], from the Latin word for "every" or "all," and it hints at the model's multimodality. Its predecessors worked mostly with text (GPT-4 could also accept image input), whereas GPT-4o takes it up a notch: it accepts any combination of text, image, audio, and video input and can generate text, image, and audio output.
It's worth noting that earlier OpenAI models could already handle these different modes (text, image, and the rest). However, they existed as separate, single-purpose models, specifically:
GPT-4o combines all these modes, giving users a far less fragmented experience. With previous versions, you'd have to open DALL·E to convert text prompts into high-definition images, and open a separate text-to-speech tool to turn your YouTube video script into a captivating voice narration. GPT-4o's multimodality means you can do all that, and much more, on a single platform.
With these extended capabilities, GPT-4o's use cases emerged almost immediately after its launch in May. Only a few weeks later, we're already running out of fingers to count the number of emerging uses for the platform.
That said, some of the most common use cases for OpenAI's latest LLM include:
Did you know the US loses $3 trillion annually to "bad data" [4]? Bad data is inaccurate, incomplete, or irrelevant data that businesses and organizations can't use. As such, they must spend a lot of money cleaning up and organizing this data to remove errors and discrepancies.
GPT-4o can analyze vast volumes of data in a matter of seconds and draw relevant insights. This GPT version can process spreadsheets, draw charts, create statistical models, and identify patterns and trends. These processes usually take weeks or even months when done manually, but can now take minutes thanks to the AI's improved computational abilities. Data analysis with GPT-4o is also fast and largely accurate, although it's still wise to spot-check the results.
Below is a simple example of a prompt users can use for their data analysis:
Analyze the attached spreadsheet and provide a comprehensive and detailed analysis. Conduct both a technical and statistical analysis and highlight crucial insights. Also, generate a pie chart and line graph for the data analyzed. Use contrasting colors for each variable on the pie chart to make it easier to understand.
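For teams that prefer to automate this through the API rather than the ChatGPT interface, a rough equivalent looks like the sketch below. It is a minimal Python example, not a definitive implementation: it assumes the openai and pandas packages are installed, an OPENAI_API_KEY environment variable is set, and a hypothetical sales.xlsx file exists. The spreadsheet is summarized locally and the summary is sent to GPT-4o for analysis.

```python
# Minimal sketch: summarize a spreadsheet locally, then ask GPT-4o for insights.
# Assumes openai and pandas are installed, OPENAI_API_KEY is set, and a
# hypothetical "sales.xlsx" file exists.
import pandas as pd
from openai import OpenAI

df = pd.read_excel("sales.xlsx")                      # load the spreadsheet
summary = df.describe(include="all").to_string()      # basic statistical summary

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a data analyst."},
        {"role": "user", "content": (
            "Analyze this dataset summary, highlight crucial insights, "
            "and suggest which charts would best visualize it:\n" + summary
        )},
    ],
)
print(response.choices[0].message.content)
```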
The latest version of ChatGPT can now translate audio or conversations from one language to another in real time, breaking the language barriers that normally prevent effective communication. Government agencies and NGOs can use real-time translation during meetings with international stakeholders and other parties. This will undoubtedly strengthen efforts towards global collaboration and enhance partnerships between countries.
While some might argue that this feature isn't groundbreaking, it's important to note that although real-time translation has long happened in international meetings, those translations were rarely as seamless, accurate, or instant.
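For developers, a batch (non-real-time) approximation of this workflow can be stitched together from OpenAI's public APIs. The sketch below is illustrative only: it assumes the openai package, an OPENAI_API_KEY environment variable, and a hypothetical utterance.mp3 recording, and it is not the live Voice Mode pipeline.

```python
# Rough approximation of the translation workflow: transcribe a recorded
# utterance into English with Whisper, then have GPT-4o render it in the
# target language. Assumes a hypothetical "utterance.mp3" recording exists.
from openai import OpenAI

client = OpenAI()

# Whisper's translation endpoint returns an English transcript of the audio.
english_text = client.audio.translations.create(
    model="whisper-1",
    file=open("utterance.mp3", "rb"),
).text

# GPT-4o then translates the English transcript into the listener's language.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Translate the user's text into French."},
        {"role": "user", "content": english_text},
    ],
)
print(reply.choices[0].message.content)
```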
One of the most popular use cases for GPT-4o is role-playing characters under different scenarios. Most people are using this feature to prepare for job interviews: they ask the AI to play the role of an interviewer and pose likely questions. Currently, the feature only works with text and audio, but we could soon see text-to-video capabilities in upcoming updates.
That said, here’s a good example of a prompt for this feature:
Please play the role of an interviewer for a multi-national insurance agency based in the US. Ask possible interview questions and increase their complexity with each subsequent question. Also, rate each of my responses out of 10 and suggest ways I can improve and better my chances of landing the job.
You can also take advantage of this LLM for other role-playing scenarios like:
GPT-4o can simulate plenty of other scenarios; your only limit is your imagination. Another groundbreaking feature of this LLM is that it can pick up nuances in tone to deduce emotion. It can tell when you're nervous, excited, sad, or agitated. The result is more empathetic, realistic interactions and remarkably accurate role-playing.
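If you want to script a text-only version of the interview role-play through the API, a minimal sketch might look like the one below. It assumes the openai package and an OPENAI_API_KEY environment variable; the number of turns and the system prompt are arbitrary choices.

```python
# Text-only sketch of the interviewer role-play: a chat loop that keeps the
# conversation history so GPT-4o can escalate difficulty and rate each answer.
from openai import OpenAI

client = OpenAI()
history = [{
    "role": "system",
    "content": (
        "You are an interviewer for a multinational insurance agency based in "
        "the US. Ask one question at a time, increase the difficulty each turn, "
        "and rate every answer out of 10 with a brief improvement tip."
    ),
}]

for _ in range(3):  # three interview turns; adjust as needed
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    question = reply.choices[0].message.content
    print("Interviewer:", question)
    history.append({"role": "assistant", "content": question})
    history.append({"role": "user", "content": input("Your answer: ")})
```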
You can also use the latest ChatGPT version to analyze images through its computer vision capabilities. There are two ways to approach this: you can use OpenAI's integrated vision features to scan an image live with your device, or you can capture the image and upload it as input.
The AI can pick out elements and recognize patterns in images. Let’s say you’re taking a walk and come across an insect you can’t identify and would like to know more about it. Simply snap a photo of the insect, upload it to GPT-4o, and ask the AI what it is. In less than five seconds, it will summarize everything you need to know about the bug.
Alternatively, you can scan images of line graphs and bar charts and ask the AI to highlight the most crucial insights, or even compile a report on them. Users have also utilized this feature to translate printed text from one language to another. For instance, they can use the AI to translate product manuals from Chinese to English, or to generate alt text and image captions.
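Developers can reproduce the "snap a photo and ask" workflow through the API by attaching an image to a chat request. The sketch below assumes the openai package, an OPENAI_API_KEY environment variable, and a hypothetical insect.jpg file; the image is sent as a base64 data URL.

```python
# Minimal sketch of image analysis: send a photo to GPT-4o with a question.
# Assumes a hypothetical "insect.jpg" file exists.
import base64
from openai import OpenAI

client = OpenAI()
with open("insect.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What insect is this? Summarize what I should know about it."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(reply.choices[0].message.content)
```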
A common question in the AI community is whether GPT-4o’s image analysis capability can extend to medicine and be used to analyze X-rays, MRIs, and other medical scans. Currently, the latest GPT version doesn’t directly give medical diagnoses from X-rays and other medical scans. This is a precautionary measure by OpenAI to avoid misinterpretations, misdiagnoses, and possible lawsuits.
It's worth mentioning that although GPT-4o is great at image analysis, it still needs refinement in some instances. Sometimes the AI misidentifies plants or mistakes objects for similar-looking ones. To be fair, it's also hard for humans to interpret blurry or unclear photos, so in the near future, we're hoping image-enhancing AI will come to the rescue.
The latest GPT version not only creates images from text prompts but can also recreate images in different styles from existing images. When you enter a text prompt or upload an image, the AI will tap into its vast data resources to identify patterns and known elements and generate a response that aligns with your request.
For instance, you can take a selfie with your smartphone, upload it to GPT-4o, and then ask it to recreate the image in the Shoujo anime style, and it will. You can also utilize this AI to refine your photos and computer-generated images while editing. For instance, you can prompt the AI to:
Review the photo I’ve just uploaded and suggest the best filters and cropping details to make the main subject stand out and look more professional.
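On the API side, image generation is, at the time of writing, typically routed through the DALL·E 3 endpoint rather than GPT-4o itself. A minimal sketch, assuming the openai package and an OPENAI_API_KEY environment variable (the prompt and size are placeholders), looks like this:

```python
# Hedged sketch of text-to-image generation via the DALL·E 3 API endpoint.
from openai import OpenAI

client = OpenAI()
result = client.images.generate(
    model="dall-e-3",
    prompt="A portrait of a young traveler in Shoujo anime style, soft pastel palette",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # link to the generated image
```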
GPT-3.5 could already write serviceable code, but the latest version takes the AI's ability to write code to the next level. Not only has GPT-4o widened the range of compatible programming languages, but it has also introduced new capabilities and provides faster and more accurate responses.
Coders have made full use of the latest ChatGPT version to:
It’s common for people to spend hours in meetings but leave with nothing of value. This is usually because the most critical issues get drowned out by minor and off-topic discussions.
Thankfully, you can now use GPT-4o as a meeting facilitator to keep track of crucial details and maintain the meeting's direction. This version of ChatGPT is great at interpreting context, guiding conversations, and summarizing the highlights. That way, meetings can have clear objectives, and participants can leave with concrete takeaways and action items.
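One way to approximate this with the API is to transcribe a recording with Whisper and then have GPT-4o extract decisions and action items. The sketch below assumes the openai package, an OPENAI_API_KEY environment variable, and a hypothetical meeting.mp3 recording.

```python
# Minimal sketch of meeting summarization: transcribe, then summarize.
from openai import OpenAI

client = OpenAI()

transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=open("meeting.mp3", "rb"),
).text

summary = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a meeting facilitator."},
        {"role": "user", "content": (
            "Summarize the key decisions, open questions, and action items "
            "(with owners) from this transcript:\n" + transcript
        )},
    ],
)
print(summary.choices[0].message.content)
```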
OpenAI has always been a champion of social inclusion, sometimes to the point of attracting criticism from within [5]. Visually impaired users can now use GPT-4o's vision mode to engage with real-world environments. Through the aptly named Be My Eyes app, the AI essentially becomes the user's eyes and:
Users can also ask questions about their surroundings and get answers in real time. The best part is that this feature is free for everyone with vision challenges. All they have to do is download the Be My Eyes app from the Google Play Store or the Apple App Store.
The newest version of ChatGPT can also offer practical financial guidance for both individuals and businesses. The simplest way to get advice from this LLM is by describing your financial situation and your goals within a specific timeframe. You can ask questions about saving, budgeting, and investing and get instant advice.
If you want more precise financial advice, you can upload spreadsheets of your finances (income and expenses) and other financial documents. The AI will analyze these documents and suggest ways to streamline your financial plans, reduce unnecessary spending, etc.
What's more, OpenAI lets you integrate its latest model with various finance management apps. That way, the AI can help you keep track of your expenses, warn you when you're going over budget, and suggest areas where you can cut costs.
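Such integrations are usually built with function calling: you describe your finance app's functions to GPT-4o, and the model decides when to call them. The sketch below is illustrative only; get_monthly_expenses is a hypothetical function name, and it assumes the openai package and an OPENAI_API_KEY environment variable.

```python
# Hedged sketch of a finance-app integration via function calling. The tool
# name and schema are hypothetical; a real app would map them to its own API.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_monthly_expenses",  # hypothetical finance-app function
        "description": "Return the user's expenses for a given month, grouped by category.",
        "parameters": {
            "type": "object",
            "properties": {"month": {"type": "string", "description": "Month in YYYY-MM format"}},
            "required": ["month"],
        },
    },
}]

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Am I over budget this month?"}],
    tools=tools,
)

# If GPT-4o decides it needs the data, it returns a tool call instead of text;
# the app would execute it and send the result back in a follow-up message.
message = reply.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print("Model requested:", call.function.name, call.function.arguments)
else:
    print(message.content)
```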
GPT-4o lets you create PowerPoint presentations for work, school, and hobbies with ease. If you want the easiest route, simply prompt the AI to create a presentation by giving it a topic and slide titles. For a more thorough result, you can highlight the areas you'd like covered and other specifics about the presentation.
Alternatively, you can upload an article, research paper, or journal and ask the LLM to generate a presentation from it. GPT-4o lets you specify the number of slides, the tone, and whether you want slides for specific sections like theoretical frameworks, methodologies, results, and discussion. The AI can also generate graphs and charts to enrich your presentation.
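One practical way to turn that output into an actual .pptx file is to ask GPT-4o for a structured outline and build the slides with the python-pptx library. The sketch below assumes the openai and python-pptx packages and an OPENAI_API_KEY environment variable; the topic and slide count are placeholders, and ChatGPT itself can also return a downloadable file, so this is just the API route.

```python
# Hedged sketch: request a JSON slide outline from GPT-4o, then build a .pptx.
import json
from openai import OpenAI
from pptx import Presentation

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": (
            "Create a 5-slide outline on renewable energy as JSON with a "
            '"slides" list, each item having "title" and "bullets" fields.'
        ),
    }],
)
outline = json.loads(reply.choices[0].message.content)

prs = Presentation()
for slide_spec in outline["slides"]:
    slide = prs.slides.add_slide(prs.slide_layouts[1])   # title + content layout
    slide.shapes.title.text = slide_spec["title"]
    slide.placeholders[1].text = "\n".join(slide_spec["bullets"])
prs.save("presentation.pptx")
```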
Most people can't distinguish between the different GPT-4 models. Besides GPT-4 and GPT-4o, OpenAI also developed GPT-4 Turbo, further adding to the confusion. These GPT versions share a few fundamental similarities but are far from the same.
GPT-4 is the foundation for both GPT-4 Turbo and GPT-4o. OpenAI created this version to improve how the model understood user intent and to provide more truthful responses; the previous version, GPT-3.5, was more prone to hallucinations and reasoning mistakes. GPT-4 was also the first GPT version to introduce multimodality, accepting both text and image prompts.
GPT-4 also brought major performance improvements over its predecessor. For instance, this version is reported to have over 1.7 trillion parameters [6], roughly ten times more than GPT-3.5's 175 billion. Another difference worth noting is that it was pre-trained with data up to late 2023 (depending on the version), while its predecessor was pre-trained with data only up until 2021.
GPT-4 Turbo is the middle child of the three. This model improved upon its predecessor, most notably with faster processing, hence the name "turbo." GPT-4 Turbo also understands prompts better and responds more precisely, and it introduced a much larger context window at a lower usage cost.
Contrary to popular belief, GPT-4o is an improved variant of GPT-4 rather than an entirely new version. The significant difference is that GPT-4o has full multimodal capabilities and can work with text, audio, images, and video, whereas GPT-4 works primarily with text (plus image input) and outputs only text.
GPT-4o easily outperforms its counterparts in terms of benchmark scores [7].
The table below summarizes the benchmark scores for notable evaluation tasks:
While the latest GPT version is certainly a huge leap in NLP and generative AI technology, it's not perfect. Some of the most notable limitations include:
There's no doubt that GPT-4o is a significant advancement in AI technology. The newest GPT version offers a wide range of capabilities, and its multimodal functionality makes it versatile and useful for various applications. While it has notable limitations, such as its computational requirements, its benefits far outweigh its drawbacks.
As AI continues to evolve, we can expect even more impressive developments to enhance productivity and innovation across different sectors further.
References
[1] researchgate.net. Comparison table for the accuracy of GPT-3.5 and GPT-4. URL: https://tiny.pl/dp75f. Accessed on June 17, 2024.
[2] openai.com. Hello GPT-4o. URL: https://openai.com/index/hello-gpt-4o/. Accessed on June 17, 2024.
[3] openai.com. Hello GPT-4o. URL: https://openai.com/index/hello-gpt-4o/. Accessed on June 17, 2024.
[4] hbr.org. Bad Data Costs the U.S. $3 Trillion Per Year. URL: https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year. Accessed on June 17, 2024.
[5] businessinsider.com. OpenAI Cofounder Responds to Elon Musk's Criticism That ChatGPT Is Too 'Woke': 'We Made a Mistake'. URL: https://businessinsider.com/news/openai-cofounder-responds-to-elon-musks-criticism-that-chatgpt-is-too-woke-we-made-a/8ekzk6v. Accessed on June 17, 2024.
[6] semrush.com. What Is GPT-4? URL: https://www.semrush.com/blog/gpt-4/. Accessed on June 17, 2024.
[7] openai.com. Hello GPT-4o. URL: https://openai.com/index/hello-gpt-4o/. Accessed on June 17, 2024.