⚡️ Multimodal Beauty

PLUS: NVIDIA and Foxconn to create AI factories

Happy Friday. Adept has open-sourced Fuyu-8B, a multimodal AI model that can analyze images and understand text. The GPT-4V competitor streamlines the process of having digital agents understand and interpret visual and textual content. Let’s get right into it.

Today’s Highlights:

  • YouTube developing AI tool to replicate musicians' voices

  • NVIDIA and Foxconn to create AI factories

  • Thrive Capital to Lead Purchase of OpenAI Employee Shares

DEEP DIVE

Adept Unveils Open-Source Model Fuyu-8B

Adept has open-sourced Fuyu-8B, its multimodal GPT-4V competitor, now available through HuggingFace. The model piques interest with its improved OCR capabilities, allowing it to understand charts, documents, and diagrams effortlessly.

Fuyu-8B is built with digital agents in mind: it handles arbitrary image resolutions, answers questions about graphs and diagrams, and supports localization on screen images. Did we mention fast? It delivers responses for large images within an impressive 100 milliseconds.

Though tailored for specific applications, Fuyu-8B also competently navigates typical image understanding benchmarks, like visual question-answering and natural-image-captioning.

The secret lies in the architecture: Fuyu-8B is a vanilla decoder-only transformer rather than a more complex multi-part structure, so it needs no separate image encoder. This simplicity is what allows the model to handle images of any resolution.
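For the curious, here is a rough sketch of how an encoder-free design like this can work in principle: cut the image into fixed-size patches, flatten each patch, and push it through a single linear projection so it can sit in the transformer's input sequence next to text tokens. This is an illustration under stated assumptions, not Adept's actual code. The patch size, model width, "<row-break>" marker, and all function names are hypothetical.

```python
import random


def image_to_patch_sequence(image, patch, weights):
    """Turn a 2-D grayscale image into a sequence of projected patch vectors.

    image   : list of rows, each a list of floats
    patch   : patch side length in pixels (hypothetical value)
    weights : projection matrix with patch*patch rows and d_model columns
    Returns a list whose items are either a d_model-length vector or the
    string "<row-break>", which marks the end of one row of patches.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    d_model = len(weights[0])
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch into a single vector of pixel values.
            pixels = [image[top + dy][left + dx]
                      for dy in range(patch) for dx in range(patch)]
            # Linear projection: (1 x p*p) pixels times (p*p x d_model) weights.
            vector = [sum(p * row[j] for p, row in zip(pixels, weights))
                      for j in range(d_model)]
            sequence.append(vector)
        # The marker tells the model where each row of patches ends,
        # so images of different widths need no special handling.
        sequence.append("<row-break>")
    return sequence


def make_weights(patch, d_model, seed=0):
    """Random stand-in for a learned projection matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(d_model)]
            for _ in range(patch * patch)]


def blank_image(height, width):
    """Uniform gray test image."""
    return [[0.5] * width for _ in range(height)]


PATCH, D_MODEL = 2, 4  # toy values chosen for the example
W = make_weights(PATCH, D_MODEL)
small = image_to_patch_sequence(blank_image(4, 6), PATCH, W)
large = image_to_patch_sequence(blank_image(8, 10), PATCH, W)
print(len(small), len(large))  # prints "8 24"
```

The same projection serves both images; only the sequence length changes with resolution, which is why a design along these lines does not need a fixed input size.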

Fuyu isn't all talk: Adept conducted rigorous evaluations on various prominent image-understanding datasets, where Fuyu-8B performed commendably. Notably, despite having fewer parameters, it surpassed models like Qwen-VL and PaLM-E-12B on multiple metrics.

However, it's not all about outperforming benchmarks. Adept emphasizes that Fuyu-8B and its variants are designed to offer a simpler, highly effective choice among multimodal models.

PUNCHLINES

Light it Up: Oxford's new AI chip harnesses light to process data at unmatched speeds and efficiency.

Power Trip: IBM's brain-inspired chip sidesteps the need to access external memory, saving energy and boosting computing power.

Predictable speeches, unpredictable rates: AI helps forecast Bank of England policymakers' interest rate decisions by analyzing speeches.

Lyric-less: Major music publishers are suing Anthropic, alleging that its chatbot, Claude, violates copyright on song lyrics.

TLDR

YouTube developing AI tool to replicate musicians' voices: YouTube is currently developing an AI-powered tool that will allow users to replicate the voices of famous musicians. The video giant has reportedly approached music companies to obtain the rights to train its new AI tool on songs from their music catalogs, but agreements are yet to be reached.

Adobe's new Photoshop and Premiere Elements: Adobe unveils Photoshop Elements and Premiere Elements 2024, with AI-powered features such as built-in presets for color and tone matching and automatic selection. The software also gets a refreshed, modern look and offers one-click Quick Actions for common edits.

NVIDIA and Foxconn to create AI factories: NVIDIA partners with Foxconn to build AI factories to drive AI applications including self-driving cars and robotics. These factories will pave the way for digital manufacturing and the development of autonomous electric vehicles, robotic systems, and generative AI services.

Thrive Capital to Lead Purchase of OpenAI Employee Shares: Thrive Capital is reportedly leading a deal to purchase OpenAI employee shares that values the company at over $80 billion, up from its last valuation of $27 billion just six months ago. That valuation is 60 times OpenAI's estimated annualized revenue, a sign of investor confidence in its increasingly successful LLMs.
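A quick back-of-the-envelope check on the OpenAI numbers: the snippet below works out the annualized revenue implied by an $80 billion valuation at a 60x multiple, and how much the valuation grew from $27 billion. It uses only the reported figures above; since the deal is described as "over $80 billion", treat both results as rough lower bounds. The variable names are ours.

```python
valuation = 80e9           # reported valuation floor, in dollars
previous_valuation = 27e9  # reported valuation six months earlier
revenue_multiple = 60      # reported valuation-to-revenue multiple

# Revenue implied by the multiple, and growth between the two valuations.
implied_revenue = valuation / revenue_multiple
growth_factor = valuation / previous_valuation

print(f"Implied annualized revenue: ${implied_revenue / 1e9:.2f}B")  # $1.33B
print(f"Growth over previous valuation: {growth_factor:.2f}x")       # 2.96x
```

In other words, the reported figures imply roughly $1.3 billion in annualized revenue and a valuation that has nearly tripled.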

TRENDING TOOLS

📑 Compliance.sh: Simplify and automate your compliance process using AI

💻 Softr AI APP Generator: Create web apps effortlessly from prompts

🔊 PlayHT 2.0 Turbo: Experience a fast conversational AI text-to-speech model

🔍 YouRetriever: Efficiently access You.com's Search API, which is compatible with LLM chains

🌐 AI quick start by BrowserBear: Launch AI web scraping in seconds with no coding needed

That’s all for today—if you have any questions or something interesting to share, please reply to this email. We’d love to hear from you!

P.S. If you want to sign up for the Supercharged newsletter or share it with a friend, you can find us here.
