Today’s post presents an easy, practical use case for multimodal chatbots that I’ve been wanting to test and share here for quite some time, but it also shows we’re still far from getting reliable ‘reasoning’ out of LLM tools once even fairly rudimentary visual elements, such as time series charts, are involved. I challenged ChatGPT 4o (as discussed in an earlier post, o1 in the web app still does not support image input), Gemini, and Claude to analyze a stacked area chart representing the evolution of an individual’s net worth. After several attempts and blatant hallucinations by all models, I share the best outputs, which, as usual in most of the SCBN battles published on Talking to Chatbots, came from ChatGPT.

We talk so much about ‘papers’ and the presumed authenticity of their content when we read academic research, in all fields but in machine learning in particular. Ironically, the process that created the medium for disseminating scientific research in the physical-paper era was not that different from the process that produces ‘content’ in the era governed by machine learning algorithms, whether classifiers, search engines, or generative models: ‘shattering’ texts by tokenizing them and creating embeddings, or decomposing pieces of visual art into numeric ‘tensors’ that a deep artificial neural network will later ‘diffuse’ into images that attract clicks for this cringey blog…
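For readers curious what that ‘shattering’ looks like in practice, here is a minimal toy sketch of the two steps mentioned above: tokenizing text into integer IDs and looking each ID up as a vector. Everything here is simplified for illustration: real tokenizers (such as BPE) learn subword merges from data, and real embedding matrices are learned model weights, not pseudo-random vectors.

```python
import hashlib
import random

def tokenize(text, vocab_size=50_000):
    # Toy tokenizer: split on whitespace and hash each word into a fixed vocab.
    # Production tokenizers instead learn subword units from a training corpus.
    return [int(hashlib.md5(w.encode()).hexdigest(), 16) % vocab_size
            for w in text.lower().split()]

def embed(token_ids, dim=8):
    # Toy embedding lookup: derive a deterministic pseudo-random vector per ID.
    # In a real model this is a learned weight matrix of shape (vocab_size, dim).
    vectors = []
    for tid in token_ids:
        rng = random.Random(tid)  # seed on the token ID for determinism
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors

ids = tokenize("shattering texts into tokens")   # 4 words -> 4 integer IDs
vecs = embed(ids)                                # 4 vectors of 8 floats each
```

The point of the sketch is only that the pipeline is mechanical: text goes in, numbers come out, and everything downstream (classification, retrieval, generation) operates on those numbers.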

Black and white line drawing generated by the ControlNet Canny model, depicting a woman in a wetsuit holding a surfboard on the beach. A caption from CLIP Interrogator, describing the image, is superimposed over the drawing.

The intense competition in the chatbot space is reflected in the ever-increasing number of contenders on the LMSYS Chatbot Arena leaderboard, and in my modest contribution, the SCBN Chatbot Battles I introduced in this blog and complete as time allows. Today we’re exploring WildVision Arena, a new project on Hugging Face Spaces that pits vision-language models against each other. The mechanics of WildVision Arena are similar to those of LMSYS Chatbot Arena: a crowd-sourced ranking based on users’ votes. You enter any image (plus an optional text prompt) and are presented with responses from two different models, whose names remain hidden until you vote for the answer that looks better to you. I’m sharing a few examples of what I’ve tested so far, and we’ll end this post with a traditional SCBN battle in which I evaluate the vision-language models against my own use cases.
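Arena-style leaderboards of this kind have typically been scored with Elo-style ratings computed from exactly these anonymous pairwise votes (LMSYS later moved to related Bradley-Terry estimates). As a minimal sketch of a single rating update after one battle, with hypothetical starting ratings and a standard K-factor:

```python
def elo_update(r_a, r_b, winner, k=32):
    # Standard Elo update for one pairwise battle.
    # winner: "a", "b", or "tie". Returns the two updated ratings.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # win probability of A
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated models; a single vote for model A moves 16 points.
ra, rb = elo_update(1000, 1000, "a")  # -> (1016.0, 984.0)
```

With equal ratings the expected score is 0.5, so a win transfers k/2 points; upsets against higher-rated models transfer more, which is what lets a leaderboard converge from nothing but blind votes.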