OpenAI has launched its Advanced Voice Mode (AVM) for a wider range of paying ChatGPT users. The AVM enables more natural spoken conversations and is initially available to Plus and Team subscribers, with Enterprise and Edu customers gaining access next week.
The AVM now features a redesigned blue animated sphere and introduces five new nature-inspired voices: Arbor, Maple, Sol, Spruce, and Vale, bringing the total to nine.
Notably absent is the Sky voice, which was removed following controversy over its similarity to Scarlett Johansson’s voice from the film “Her.” Moreover, the video and screen-sharing capabilities introduced earlier are not part of this rollout, and OpenAI has not provided a timeline for their availability.
OpenAI claims improvements in accent recognition and smoother conversations since the alpha testing phase. Customisation options like Custom Instructions and Memory are also being expanded within the AVM. However, the feature is not yet available in several regions, including the EU and U.K.
The voice capabilities of the new GPT-4o model are strikingly human-like. OpenAI claims that the model can also perceive and react to users’ emotional cues, which enhances the interaction experience.
Using the advanced voice mode has proven to be an entertaining experience, as the ability to interrupt responses adds a sense of control and reduces the frustration often associated with virtual assistants. The voice’s intonation is nearly perfect, and its ability to pause thoughtfully or laugh at its own jokes creates an immersive experience.
To utilise the new feature in the ChatGPT app, make sure you have the latest version installed and that OpenAI has enabled access for your device. After launching the app, you will receive a notification when the feature becomes available.
To start, create a new chat by swiping right or tapping the two-line icon in the top left corner. Look for the sound wave icon next to the message field, tap it, and make sure your device's sound is on. Once you begin speaking, a brief sound will indicate that the app is ready to respond. While OpenAI has made improvements in accent recognition and conversation speed, users may experience occasional audio interruptions.
The AVM allows for interactive experiences such as storytelling or language practice, but it comes with limitations. Users on the Plus and Team tiers face a daily usage cap, and, as noted on OpenAI's help page, a conversation cannot switch between text and voice modes. Users should keep these restrictions in mind when deciding how to use the feature.
The accuracy of responses in the AVM still needs improvement. Some replies were helpful, but conversations often faltered, with the chatbot occasionally stopping mid-response or replying late.
Additionally, the detail in responses did not match that of the text version. For example, when I asked for nearby restaurant recommendations, the chatbot veered off-topic before suggesting a certain place, and I had to redirect it several times. Although ChatGPT's voice capabilities offer a more engaging experience than assistants like Siri, they still fall short of the effectiveness of text interactions.