The biggest change to ChatGPT since the launch of ChatGPT.
OpenAI introduced a new model, GPT-4o, that can "see", "listen", and "speak". It will be available on the free version of ChatGPT in the coming days. But what can you do with it?
Hi there,
We took a short break from our newsletter to figure things out and let life happen. We were also trying to understand what value we can actually add to you, our readers.
So here it is: we will keep the newsletters simple and showcase what people around us are making possible with AI.
In store this week
GPT-4o and what can you do with it?
What’s new with ChatGPT?
[New] Unlocked corner!
Now that we have that out of the way, let’s talk about the star of the show…
GPT-4o and what can you do with it?
At its spring event, OpenAI launched a new model named GPT-4o, short for ‘omni’. It can reason across audio, vision, and text in real time, basically “listen”, “see”, and “speak” at an almost human-like pace.
But ChatGPT (with GPT-4) could already see photos. What’s new here?
Now ChatGPT can see things in real time! See these demos to understand what we mean by being able to “see”.
In this example, an OpenAI team member solves a linear equation with GPT-4o guiding each step of the way, since GPT-4o can “see” which steps have already been done and which operations have been performed.
Take a look at another demo: Sal Khan, the founder of Khan Academy, and his son interact with GPT-4o while solving math questions, and GPT-4o “sees” what the student is doing and guides him along the way.
But I could already talk to ChatGPT using Voice Mode. How is this new ability to “speak” or “listen” any different?
Voice Mode on ChatGPT is a tad slow: it takes around 2.8 seconds with GPT-3.5 and around 5.4 seconds with GPT-4 just to start responding. That’s because Voice Mode is a pipeline of three separate models. First, your audio input is transcribed to text; then GPT-3.5 or GPT-4 reads that text, processes it, and outputs a text reply; and a third model converts that reply back to audio.
GPT-4o, on the other hand, directly observes tone, multiple speakers, and background noise, and can respond with laughter, singing, and emotion, which Voice Mode can’t.
Also, you can interrupt it midway like humans sometimes do.
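If you’re curious, here is roughly what that old three-model pipeline looks like when you stitch it together by hand with OpenAI’s Python SDK. This is a minimal sketch, assuming the openai>=1.0 client and an API key in your environment; the model names, voice, and file names are placeholders for illustration, not what ChatGPT actually runs under the hood.

```python
# A rough sketch of the old Voice Mode pipeline: three separate models chained together.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Step 1: transcribe the user's spoken question to text.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: a text-only model reads the transcript and writes a text reply.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = reply.choices[0].message.content

# Step 3: a third model turns the text reply back into speech.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```

Every hop in that chain adds latency, and the text-only model in the middle never hears your tone of voice. GPT-4o collapses the whole thing into a single model trained end to end across audio, vision, and text, which is where the speed and the expressiveness come from.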
This launch has direct implications for a lot of incumbent applications; Duolingo is one example. Within minutes of GPT-4o’s launch, Duolingo’s value fell by around $340 million.
Here’s what GPT-4o can do:
Translate in real time
Detect faces, emotions, and much more
Debug code via voice commands (a rough API sketch follows after the demo links below)
Speak with different voices, tones, and emotions
See more demos in action here and here.
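The code-debugging item above is the easiest one to try yourself through the API today. Here is a minimal sketch, assuming the openai>=1.0 Python SDK: it sends a screenshot of some buggy code to gpt-4o as an image and asks what is wrong. The file name and prompt are placeholders; this is just an illustration, not the desktop-app demo OpenAI showed.

```python
# Hypothetical example: ask gpt-4o to "see" a screenshot of code and spot the bug.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY set in the environment.
import base64

from openai import OpenAI

client = OpenAI()

# Encode a local screenshot as a base64 data URL so it can be sent inline.
with open("buggy_code_screenshot.png", "rb") as img:
    image_b64 = base64.b64encode(img.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What bug do you see in this code, and how would you fix it?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```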
What’s new with ChatGPT?
The GPT-4o model will become available to ChatGPT free users in the coming days. You will be able to speak with it and use images in the free version of ChatGPT.
ChatGPT free users will now have access to features such as:
Experience GPT-4-level intelligence with access to GPT-4o
Browse the web and get real-time answers
Upload files to ChatGPT
Analyze data and create charts in ChatGPT
Upload photos and ask ChatGPT about them
Access the GPT Store and use GPTs created by others
Build a more personalised experience on ChatGPT with Memory
Unlocked corner!
In this section, we will bring you something new and cool every week.
Free AI Strategy Sessions
This week, we are opening up limited 20-minute slots to connect 1:1 with you, answer your questions, understand your specific needs, and explore how Generative AI can help you excel and grow.
If you want to brainstorm with us (Pratik & Tanmay) around AI, just sign up below.
We hope you liked what we featured. As always, we cannot stress enough how eager we are to hear from you. Your remarks, insights, and feedback would be super helpful, so please tell us what you liked, or what you didn’t.
Please, do tell your friends about us.
Here’s wishing you all a happy week ahead :)
See you again, next week.