⚡️ Is GPT-4 Overrated?
PLUS: Visa to Invest $100M in Generative AI Companies
Good morning. GPT-4 has been crushing the competition with superior performance and efficiency, setting new benchmarks for LLMs. Let's dive in.
Today’s Highlights:
Sequoia Capital Shifts Focus to AI Applications
Visa to Invest $100M in Generative AI Companies
Microsoft CEO Voices Concerns on Google-Apple Pact
DEEP DIVE
GPT-4 Crushes Competition, Reinforces New Benchmark Suite
OpenAI’s evolutionary path from GPT-3 to GPT-4 from the paper “GPT-FATHOM”
Benchmarking AI models is crucial for progress in AI. The new GPT-Fathom benchmark suite tackles known shortcomings of existing evaluations to enable a more refined and effective analysis.
Research has shown that LLM evaluations often lack consistency in settings and prompting methods, and that models can be highly sensitive to how a prompt is worded. This lack of consistency makes results hard to compare and hard to reproduce.
What is GPT-Fathom?
Developed by researchers from ByteDance and the University of Illinois, GPT-Fathom is an open-source evaluation suite for LLMs built on OpenAI's Evals framework. The suite focuses on ironing out irregularities and making benchmark settings uniform across the LLMs it evaluates.
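To make the idea of uniform settings concrete, here is a minimal Python sketch. It is not GPT-Fathom's code and does not use the OpenAI Evals API. The names `EvalSettings`, `build_prompt`, and `evaluate`, along with the prompt template and the exact-match scoring, are assumptions made for illustration. The point is only that every model is called with the same prompt, the same number of shots, and the same decoding settings.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: (prompt, temperature, max_tokens) -> completion text.
Model = Callable[[str, float, int], str]
Example = Tuple[str, str]  # (question, reference answer)


@dataclass(frozen=True)
class EvalSettings:
    """Illustrative settings held fixed for every model under test."""
    prompt_template: str = "Question: {question}\nAnswer:"
    num_shots: int = 0
    temperature: float = 0.0
    max_tokens: int = 64


def build_prompt(question: str, shots: Sequence[Example], settings: EvalSettings) -> str:
    """Render the same few-shot prompt for every model."""
    demos = shots[: settings.num_shots]
    parts = [settings.prompt_template.format(question=q) + " " + a for q, a in demos]
    parts.append(settings.prompt_template.format(question=question))
    return "\n\n".join(parts)


def evaluate(model: Model, examples: Sequence[Example],
             shots: Sequence[Example], settings: EvalSettings) -> float:
    """Return exact-match accuracy under the shared settings."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    for question, answer in examples:
        prompt = build_prompt(question, shots, settings)
        prediction = model(prompt, settings.temperature, settings.max_tokens)
        # Case-insensitive exact match; a real suite would use task-specific scoring.
        correct += prediction.strip().lower() == answer.strip().lower()
    return correct / len(examples)
```

Because `EvalSettings` is frozen, the same object can be passed to every model without one run silently changing the prompt or temperature used by the next.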
The Shining GPT-4
If you are a frequent user of LLMs, this won’t come as a surprise. GPT-4, the model behind the paid version of ChatGPT, crushes the competition in most benchmarks—even in a recently published benchmark on hallucinations.
Main evaluation results of GPT-Fathom
Interestingly, Llama 2, currently the best-performing open-source model, outperforms its predecessor on most benchmarks, especially reasoning and comprehension tasks. Nonetheless, the study revealed some weaknesses, particularly in mathematics, coding, and multilingual tasks.
The findings also point to the challenges of training and optimizing LLMs. For instance, a substantial improvement in one capability can come with considerable degradation in another, an effect described as a 'trade-off' or 'seesaw'. This underlines the need for more in-depth research to understand and perhaps overcome these effects.
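A seesaw of this kind can be spotted by comparing per-task scores between two versions of a model. The Python sketch below is one simple way to do that, not the method used in the study. The function name `find_seesaw`, the 5-point threshold, and the scores in the usage example are all hypothetical.

```python
from typing import Dict, List, Union


def find_seesaw(old_scores: Dict[str, float], new_scores: Dict[str, float],
                threshold: float = 5.0) -> Dict[str, Union[List[str], bool]]:
    """Flag tasks that moved by at least `threshold` points between two versions."""
    shared = old_scores.keys() & new_scores.keys()  # only compare tasks both versions ran
    deltas = {task: new_scores[task] - old_scores[task] for task in shared}
    gains = sorted(task for task, d in deltas.items() if d >= threshold)
    losses = sorted(task for task, d in deltas.items() if d <= -threshold)
    # A "seesaw" here means at least one large gain alongside at least one large loss.
    return {"gains": gains, "losses": losses, "seesaw": bool(gains and losses)}


# Made-up scores, purely for illustration.
old = {"coding": 50.0, "math": 60.0, "reading": 70.0}
new = {"coding": 62.0, "math": 51.0, "reading": 71.0}
report = find_seesaw(old, new)
print(report)
```

With these made-up numbers, coding counts as a gain and math as a loss, so the report flags a seesaw. Reading moves by only one point and is ignored.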
PUNCHLINES
Dress Code: Adidas and Moncler’s fashion collab brings AI into the world of high fashion with featured 'AI Adventurers'.
Policing Pixels: Unitary AI picks up $15M for its multimodal approach to video content moderation.
Shuffle with Words: Spotify's potential new feature will generate playlists based on user prompts.
All Hands Off Deck: Helsinki welcomes the new era of maritime commuting with Callboats' autonomous water taxis.
TLDR
Sequoia Capital shifts focus to AI applications: Influenced by its successful investment in OpenAI, Sequoia Capital is now centering its strategy on AI applications, specifically in companies that interact with, rather than create, foundation models. It continues to show interest in other areas like healthcare and defense technology.
Microsoft CEO voices concerns on Google-Apple pact: Microsoft's CEO, Satya Nadella, criticizes the Apple-Google deal, calling it a significant obstacle for other search engines, such as Bing. Despite Bing's improvements, including its recent integration of OpenAI’s chatbot, Nadella suggests the agreement reinforces Google's 90% market dominance.
Visa to Invest $100M in Generative AI Companies: Visa is allotting $100M for investments in generative AI companies through Visa Ventures. Investees would tackle problems in commerce, payments, and fintech, with individual investments ranging from a few million dollars to larger amounts.
Meta uses public social media posts to train its AI: Public posts on Facebook and Instagram were fed into Meta's new AI assistant as part of its training. The AI, recently made available to the public, uses the information to perform tasks like creating digital stickers and editing photos with text instructions. No private posts or messages were used.
Watermarking AI images may not be reliable, warns study: University of Maryland researchers warn that watermarking AI-generated images to fight misinformation and deepfakes may not be a sound security measure. Algorithms can easily defeat this strategy, and there is a trade-off between high performance and robustness. The findings question tactics by Google, Amazon, and OpenAI, which include watermarking as part of their AI safety measures.
TRENDING TOOLS
🎨 Colors: Automatically gather and categorize customer feedback to enhance product development
📝 Visily: Create beautiful wireframes & prototypes seamlessly from various inputs
💻 HARPA AI: Streamline various tasks with a Google Chrome extension that combines different AI tools
💡 Braintrust: Develop AI rapidly without any guesswork
🖼️ Fill 3D: Render photorealistic staging images with accuracy and precision
That’s all for today—if you have any questions or something interesting to share, please reply to this email. We’d love to hear from you!
P.S. If you want to sign up for the Supercharged newsletter or share it with a friend, you can find us here.