ChatGPT Learns to See and Hear

OpenAI is rolling out an update that expands how you can prompt its AI-powered chatbot. Instead of only typing sentences into a text box, you can now speak your questions aloud or upload a picture. The new features will reach paying ChatGPT users over the next two weeks, with the rest of us getting them shortly after.

OpenAI is shaking up the way we interact with ChatGPT!

OpenAI has added voice chat to ChatGPT, so users can simply speak their questions and hear spoken answers from the bot. Much like Alexa or Google Assistant, the feature converts your speech to text, but it then passes that text to OpenAI's large language model, which should produce more capable answers than a traditional voice assistant. The upgrade is a sign of the times and suggests that most virtual assistants will soon rely on LLMs. With OpenAI ahead of the game, it's an exciting time for AI communication!

Text-to-Speech Features

OpenAI's Whisper model handles the speech-to-text conversion. OpenAI is also introducing a new text-to-speech model that it says can generate "human-like audio from just text and a short snippet of speech." With this update, users can choose from five different voices for ChatGPT. But OpenAI sees uses for the model that go well beyond picking a voice.

OpenAI is working with Spotify to translate podcasts into other languages while preserving the sound of the podcaster's own voice. This opens up a range of possibilities for synthetic voices, and OpenAI is well positioned to play a major part in the industry's growth.

Innovative Technology with Potential Risks?

OpenAI acknowledges the risks that come with these capabilities, including the possibility that synthetic voices could be used to impersonate public figures or commit fraud. As a result, the voice model will be tightly controlled and limited to specific use cases and partnerships.

ChatGPT's new image search works much like Google Lens: you snap a photo, ask a question about it, and get a response. The app's drawing tool, along with voice and text commands, lets you refine your question as you go in a back-and-forth exchange. The feature resembles Google's multimodal search and promises a richer search experience than text prompts alone.

OpenAI recognizes that image search in ChatGPT raises concerns, particularly around privacy and the risk of the bot making incorrect or unauthorized statements about individuals. To address these issues, OpenAI has deliberately limited ChatGPT's ability to analyze people in images and make statements about them. So the futuristic idea of an AI that can identify someone from a simple prompt is not on the immediate horizon. This cautious approach seems like a responsible decision.

Almost a year after ChatGPT's initial launch, OpenAI is still grappling with how to expand the bot's capabilities without creating new problems. As more people use voice control and image search, and as ChatGPT evolves into a comprehensive and genuinely useful virtual assistant, maintaining strict boundaries and control will only get harder.

