Supercharged AI
⚡️ GPT-4V's New Rival

PLUS: Dropbox reveals AI-powered search feature

October 12, 2023

Good morning. As AI systems evolve, large multimodal models (LMMs), which accept both image and text inputs, are setting the pace. The current market leader, GPT-4 Vision, now faces tough open-source competition, so let’s see what LLaVA 1.5 brings to the table.

Today’s Highlights:

Dropbox's new AI-powered search feature
DeepMind's shocking financials and staff cut
Klarna's new AI-powered Shopping Lens tool

DEEP DIVE

LLaVA 1.5: Open-Source Alternative to GPT-4V

As LLMs transform how we interact with AI systems, the open-source community is stepping up its game. OpenAI’s GPT-4 Vision (GPT-4V) may be the frontrunner, but its closed-source nature limits how it can be used and adapted. Enter LLaVA 1.5, a potential blueprint for open-source alternatives to GPT-4V.

"LLaVA 1.5 outperforms other open-source LMMs on 11 out of 12 multimodal benchmarks."

— VentureBeat report

LLaVA 1.5 builds on the original LLaVA, pairing a CLIP visual encoder (the OpenAI image-text model that also underpins DALL-E 2) with Vicuna, a fine-tuned variant of Meta’s LLaMA model. The encoder turns an image into visual features, and the language model generates responses to user instructions conditioned on those features, all in one efficient package.
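
To picture how a vision encoder and a language model get connected, here is a minimal pure-Python sketch of the general pattern: a small projection network maps each image-patch feature into the language model's embedding width, so the image becomes a sequence of extra "tokens" placed alongside the text. This is an illustration, not LLaVA's code. The two-layer Linear, GELU, Linear shape, the names `MLPProjector` and `build_input_sequence`, the tiny dimensions, the untrained random weights, and putting image tokens before the text are all assumptions.

```python
import math
import random
from typing import List

Vector = List[float]


def gelu(x: float) -> float:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vec: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Dense layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def init_layer(in_dim: int, out_dim: int, rng: random.Random):
    """Small random weights and zero bias (placeholder values, not trained)."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class MLPProjector:
    """Hypothetical vision-to-language connector: Linear -> GELU -> Linear."""

    def __init__(self, vision_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(vision_dim, llm_dim, rng)
        self.w2, self.b2 = init_layer(llm_dim, llm_dim, rng)

    def __call__(self, patch_features: List[Vector]) -> List[Vector]:
        # Each image-patch feature is projected independently to the LLM embedding width.
        tokens = []
        for feat in patch_features:
            hidden = [gelu(h) for h in linear(feat, self.w1, self.b1)]
            tokens.append(linear(hidden, self.w2, self.b2))
        return tokens


def build_input_sequence(image_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    # Assumption for illustration: image tokens come before the text prompt embeddings.
    return image_tokens + text_embeddings


if __name__ == "__main__":
    projector = MLPProjector(vision_dim=8, llm_dim=16)
    patches = [[0.1 * i for i in range(8)] for _ in range(4)]  # stand-in for encoder output
    text = [[0.0] * 16 for _ in range(3)]  # stand-in for 3 embedded text tokens
    sequence = build_input_sequence(projector(patches), text)
    print(len(sequence), len(sequence[0]))  # 7 positions, each 16-dimensional
```

The design point is that only the small connector has to learn the mapping between the two models' feature spaces, which is part of why this style of model is cheap to train.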

With upgrades such as higher-resolution image input and additional training data, including user-shared ChatGPT conversations, LLaVA 1.5 improves substantially on its predecessor.

Leaves you curious about how, doesn't it?

LLaVA 1.5 connects the vision encoder to the language model through a two-layer multilayer perceptron (MLP), and over 600,000 examples were used in training.
Training was completed within a day on eight A100 GPUs, at a cost of just a few hundred dollars.
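
A quick back-of-envelope check shows why that price is plausible: eight GPUs for about 24 hours is 192 GPU-hours. The hourly rental rates below are assumptions for illustration, not figures from the report.

```python
def training_cost_usd(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Total cost = number of GPUs x wall-clock hours x hourly price per GPU."""
    gpu_hours = num_gpus * hours
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # Assumed A100 rental rates in USD per GPU-hour (not from the source).
    for rate in (1.0, 1.5, 2.0):
        print(f"${rate:.2f}/GPU-hour -> ${training_cost_usd(8, 24, rate):,.0f}")
```

At those assumed rates the run lands between roughly $190 and $385, in line with "a few hundred dollars"; pricier on-demand cloud rates would scale the total up proportionally.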

However, LLaVA 1.5 cannot be used for commercial purposes: part of its training data was generated with ChatGPT, whose terms of use prohibit using its outputs to build competing models. While LLaVA 1.5 may not square off directly against GPT-4V, it delivers on some attractive fronts, including cost-effectiveness and the scalability of using LLMs to generate visual instruction-tuning data.


PUNCHLINES

Power Surge: AI processing could consume as much electricity as the entire country of Ireland, warns a recent paper.

Tagging Content: Adobe and Microsoft lead a coalition to trace the origin of AI-generated content.

DeepMind's Deep Cut: Financials reveal a 39% staff cut and a 40% decrease in profit.

Bugs, Be Gone: Anysphere raises $8M in a round led by the OpenAI Startup Fund to build an AI-powered IDE.

A.I. Aging: New 'HistoAge' AI model uncovers brain aging and neurodegenerative disorders.

TLDR

Dropbox reveals revamped web interface and AI-powered Dash: Dropbox is introducing a redesigned web interface and an open beta of Dash, an AI-powered universal search feature. It is also expanding Dropbox AI, a feature that can summarize and answer queries about user content.

Character.AI provides group chat with multiple AIs: Digital communication startup Character.AI unveils a group chat feature where users can invite AI characters, such as Albert Einstein or Zeus, to create a hybrid human-AI social environment. For now, the feature is available only to the platform's paid subscribers.

L’Oréal inaugurates AI-powered 'Visionary Wall' in Paris: L’Oréal's new creative hub in Paris, Le Visionnaire, features an AI-driven forecasting tool, the Visionary Wall, for monitoring trends and finding inspiration. It uses theme-based waves and is positioned as a cultural barometer. Other features include a multimedia archive, a brand room, and Bluetooth and sound technology.

NuEnergy.ai patents responsible AI governance framework: Ottawa-based AI governance firm NuEnergy.ai has secured a patent on its Machine Trust Index (MTI), a standardized measure for AI oversight. Designed to keep enterprise leaders informed about the trustworthiness of their AI tools, MTI assesses parameters such as privacy, fairness, bias, and security, and caters to a range of industries.

Klarna debuts AI-powered image-search tool, Shopping Lens: Klarna introduces several new offerings, including Shopping Lens, an AI-driven image-search tool that quickly identifies over 10 million items and matches them with 50 million in-app store offers, and a shoppable product video feature. Other additions include in-store product scanning, a cashback program, and express refunds.

TRENDING TOOLS

🧠 LLaVA: A large multimodal model for general-purpose visual and language understanding

📧 Airparser: Extract and export structured data from emails, PDFs, and documents in real-time

📚 Canonica AI: Generate Wikipedia-like articles with an AI tool

🛍️ Kua AI: Produce fast, on-brand, search-optimized e-commerce content across all channels

🔊 Murf: Use AI to generate voiceovers in multiple languages and voices

That’s all for today—if you have any questions or something interesting to share, please reply to this email. We’d love to hear from you!

P.S. If you want to sign up for the Supercharged newsletter or share it with a friend, you can find us here.
