OpenAI has announced a major update to ChatGPT, its flagship chatbot, adding voice and image capabilities.
According to OpenAI, the voice feature, powered by a new text-to-speech model, lets you "engage in a back-and-forth conversation with your assistant".
The updates, which the company says offer a "new, more intuitive type of interface", will roll out to Plus and Enterprise users over the next two weeks.
Users can enable voice conversations via Settings → New Features in the mobile app.
Spoken input is transcribed by Whisper, OpenAI's open-source speech recognition system, and replies are read aloud in a range of voices developed in collaboration with professional voice actors.
Beyond voice, ChatGPT can now understand images as well.
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate. Sound on pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023
What is so special about image processing?
The company says the feature can help you "troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyse a complex graph for work-related data".
Image understanding is powered by multimodal GPT-3.5 and GPT-4 models. Users can share images with ChatGPT, and a drawing tool in the mobile app lets them highlight the part of an image they want it to focus on.
Vision-based models also present new challenges, ranging from hallucinations about people to users relying on the model’s interpretation of images in high-stakes domains.
Before broader deployment, OpenAI tested the model with red teamers, who probed for risks in domains such as extremism and scientific proficiency, and with a diverse set of alpha testers.
Phased deployment strategy
OpenAI is rolling out these features gradually, a phased approach it ties to the company's goal "to build AGI that is safe and beneficial".
The firm also highlighted potential risks: voice technology opens the door to many creative and accessibility-focused applications, it said, but also creates new challenges, such as the potential for malicious actors to impersonate public figures or commit fraud.
In summary, OpenAI’s latest roll-out significantly broadens what ChatGPT can do.
The features will initially be available only to Plus and Enterprise users, but the company plans to extend them to a wider user base in the coming weeks.