⚡️ ChatGPT Goes Multimodal
PLUS: Amazon's $4 billion gamble on Anthropic
It’s Tuesday. OpenAI is supercharging ChatGPT by introducing voice and image capabilities. What does that mean for user experience and privacy? Let’s find out.
Today's Highlights:
- Microsoft's unconventional power source for data centers 
- Elicit’s new AI "research assistant" takes on scientific literature
- Amazon's $4 billion gamble on Anthropic 
DEEP DIVE
OpenAI Introduces Voice and Image to ChatGPT

OpenAI is rolling out two significant updates to ChatGPT: voice conversations and image-based queries. Users will be able to talk with ChatGPT on Android and iOS devices and include images in their prompts on all platforms.
Let's delve into these new features:
Voice Conversations: OpenAI created five distinct voices by working with professional voice actors and using a new text-to-speech model. The model is designed to produce human-like audio from just text and a few seconds of sample speech. In the other direction, the Whisper speech recognition system transcribes the user's spoken words into text.
Image-Based Queries: GPT-3.5 and GPT-4 power ChatGPT's image understanding, letting it apply its language reasoning skills to a wide range of images, from photographs and screenshots to documents that mix text and images. Whether it's meal planning, troubleshooting, or data analysis, all you need to do is share an image with the model.

Potential harm and misuse are very much on OpenAI's radar. The voice feature is restricted to limit the impersonation of public figures, and ChatGPT is limited in how it analyzes and comments on people in images, to protect their privacy.
The voice and image features are rolling out first to Plus and Enterprise users, and OpenAI plans to make them available to a wider user base soon.
PUNCHLINES
PowerPoint or PowerPlant? Microsoft looking to power its data centers with nuclear reactors.
Fizzed Out: Coca-Cola's AI-created Y3000 soda fails to draw in consumers with its bland flavor.
Smarter Reviews: Elicit’s new “research assistant” tool aims to automate the labor of scientific literature review.
AI Overdrive: Effective use of GPT-4 empowers enterprise workers with an impressive 40% performance boost, says Harvard-led study.
Snap's New BFF: Snap partners with Microsoft on ads in its ‘My AI’ chatbot feature.
TLDR
Spotify introduces AI-powered podcast translations: Spotify is using AI to clone the voices of its top podcasters and translate their episodes into other languages. The feature, which aims to give non-English speakers natural-sounding versions in the host's own voice, is now available as Spanish translations of a select number of episodes. Expansion plans include French and German translations.
Getty Images launches copyright-safe AI platform: Getty Images introduces Generative AI by Getty Images, a platform trained on Getty's own library to create images that are safe for commercial use. The platform is built on Nvidia's Picasso technology and blocks prompts for potentially problematic content such as deepfakes. It offers indemnification and API access while maintaining copyright protection for original creators.
Amazon's $4 billion bet signals shift in cloud industry: Amazon Web Services (AWS) invests up to $4 billion in AI startup Anthropic, echoing a strategy previously used by its rivals Microsoft Azure and Google Cloud. This shift in strategy, from winning business without equity investments to buying into firms, underlines the rising turbulence in the cloud sector.
LLMs excel in data compression, DeepMind finds: Google's DeepMind researchers paired LLMs with arithmetic coding, a type of lossless data compression. They showed that LLMs can outperform classical compression algorithms, achieving impressive compression rates even on image and audio data, a task typically handled by domain-specific algorithms. However, because LLMs are far larger and slower than classical compressors, they remain impractical for real-world use.
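For the curious, here is a minimal sketch of the idea behind the compression item above. An arithmetic coder spends roughly -log2(p) bits on each symbol, where p is the probability a predictive model assigned to it, so better predictions mean smaller output. This is not DeepMind's code: the function name `ideal_code_length_bits` is made up for illustration, and a simple adaptive byte-frequency model stands in for the LLM.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal arithmetic coder would need under a simple adaptive model.

    Assumption: the "model" here is just running byte counts with Laplace
    smoothing, a hypothetical stand-in for an LLM's next-token probabilities.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Probability of this byte given only the bytes seen so far.
        p = (counts[byte] + 1) / (seen + 256)
        # Cost of encoding a symbol with probability p is -log2(p) bits.
        total_bits += -math.log2(p)
        counts[byte] += 1
        seen += 1
    return total_bits


sample = b"abracadabra " * 50
model_bits = ideal_code_length_bits(sample)
uniform_bits = 8 * len(sample)  # no model at all: 8 bits per byte
print(f"no model: {uniform_bits} bits, adaptive model: {model_bits:.0f} bits")
```

Swapping the counting model for a stronger predictor is what shrinks the total further, which is the role the LLM plays in the research.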
TRENDING TOOLS
🤖 Baron AI: Deploy ChatGPT natively on any platform
💾 CloseVector: A versatile vector database for easy integration and scalability
🗣 VOMO: Convert spoken words into notes and generate slide decks with GPT-4
👥 Prolific: Gain access to high-quality human data for improving LLMs
🎥 Tavus: Create custom, one-on-one connections with 1000s with one video
That’s all for today—if you have any questions or something interesting to share, please reply to this email. We’d love to hear from you!
P.S. If you want to sign up for the Supercharged newsletter or share it with a friend, you can find us here.
