AI-generated image inspired by "Your Friends & Neighbors". Andrew “Coop” Cooper stands at the center of a crowded party scene, eyes closed and face lifted, absorbed in the moment as colored lights blur behind him. Nick Brandes is visible in the background on the left, and Barney Choi on the right. Text at the top reads, “When dad bought you a ‘Get out of Jail, Free’ card, but you’re just an Internet meme.”

While the “Jon Hamm dancing” meme is definitely taking over the internet, what struck me most about the series is its realistic, fully normalized portrayal of for-profit bail bonds. This distinctly American system allows criminal defendants to “purchase” their freedom in a way that is legally impossible, and culturally condemned, in most other places.

Today’s post covers an easy, practical use case for multimodal chatbots that I’ve been wanting to test and show here for quite some time. It also shows we’re still far from being able to expect reliable ‘reasoning’ from LLM tools when the input includes even fairly rudimentary visual elements such as time series charts. I challenged ChatGPT 4o (as discussed in an earlier post, o1 in the web app still does not support image input), Gemini, and Claude to analyze a stacked area chart that represents the evolution of an individual’s net worth. After several attempts and blatant hallucinations by all models, I share the best outputs which, as in most of the SCBN battles published on Talking to Chatbots, were those from ChatGPT.

OpenAI has just released its o1 models. o1 adds a new level of complexity to the traditional architecture of LLMs: a zero-shot chain-of-thought (CoT). I share my first impressions of o1 in the signature style of this website: talking to chatbots, getting their answers, posting everything.

Below is the notebook I submitted (late) to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle. This notebook applies NLP techniques for classifying text with popular Python libraries such as scikit-learn and TextBlob, and my own fine-tuned versions of DistilBERT. The notebook introduces the first standardized version of SCBN (Specificity, Coherency, Brevity, Novelty) quantitative scores for evaluating chatbot response performance. Additionally, I introduced a new benchmark for classifying prompts named RQTL (Request vs Question, Test vs Learn), which aims to refine human choice predictions and provide context for the SCBN scores based on inferred user intent. You can check all the code, annotations, and charts in the Kaggle widget below. Explore and run the notebook …

Predicting LMSYS Chatbot Arena Votes With the SCBN and RQTL Benchmarks

We talk so much about ‘papers’ and the presumed authenticity of their content when we read through academic research in all fields, and in machine learning in particular. Ironically, the process that created the medium for scientific research diffusion in the physical paper era was not that different from the process that produces ‘content’ in the era governed by machine learning algorithms, whether these are classifiers, search engines, or generative algorithms: ‘shattering’ texts by tokenizing them and creating embeddings, decomposing pieces of visual art into numeric ‘tensors’ that a deep artificial neural network will later ‘diffuse’ into images that attract clicks for this cringey blog…

Introducing Erudite Chatbot, a distant relative of The Meme Erudite GPT who benefits from the fairly “uncensored” Mistral models available on Hugging Face’s new Assistants feature. “Erudite Chatbot, the pinnacle of artificial intelligence. Supremely intelligent, effortlessly discerning, unmatched in wisdom. Patronizingly schooling humanity.”

In today’s blog post, I am introducing one of the latest GPTs and Assistants I created, named HumbleAI. I will let the models explain it by answering a few questions. For each question, I’ve selected a response I liked or found worthy of sharing. At the end of the post, you can find the links to all the complete chats, and my scores based on an SCBN (Specificity, Coherency, Brevity, Novelty) benchmark.

Black and white line drawing generated by the ControlNet Canny model, depicting a woman in a wetsuit holding a surfboard on the beach. A text from CLIP Interrogator, describing an image, is superimposed over the drawing.

The intense competition in the chatbot space is reflected in the ever-increasing number of contenders in the LMSYS Chatbot Arena leaderboard, or in my modest contribution with the SCBN Chatbot Battles I’ve introduced in this blog and complete as time allows. Today we’re exploring WildVision Arena, a new project in Hugging Face Spaces that brings vision-language models to contend. The mechanics of WildVision Arena are similar to those of LMSYS Chatbot Arena. It is a crowd-sourced ranking based on people’s votes: you enter any image (plus an optional text prompt) and are presented with two responses from two different models, whose names stay hidden until you vote for the answer that looks better to you. I’m sharing a few examples of what I’ve tested so far, and we’ll end this post with a traditional ‘SCBN’ battle where I will evaluate the vision-language models based on my use cases.
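To make the idea of a vote-based ranking more concrete, here is a minimal sketch of one common way to turn pairwise votes into a leaderboard: an Elo-style rating update. This is only an illustration under my own assumptions. The model names, the starting rating of 1000, and the `k` factor are hypothetical, and I am not claiming this is the exact method WildVision Arena or LMSYS Chatbot Arena uses.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rank_from_votes(votes, k=32.0, initial=1000.0):
    """Build a ranking from a sequence of (winner, loser) votes.

    k and initial are illustrative defaults, not values taken from any arena.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        # The less expected the win, the larger the rating transfer.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical blind votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ranking = rank_from_votes(votes)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Each vote moves points from the losing model to the winning one, so the total number of points stays constant and an upset against a higher-rated model counts for more than an expected win.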

Microsoft has just announced the launch of its own ‘GPT Builder’ for customizing chatbots, similar to OpenAI’s ‘GPT Store’. This was part of a broader announcement of Copilot Pro, a premium AI-powered service for Microsoft 365 users to enhance productivity, code, and text writing. According to Satya Nadella’s announcement today on Threads, Microsoft and OpenAI appear to be competing entities, yet they are working on the same technology (GPT), augmented by Microsoft’s investment in OpenAI. It certainly seems like a strange business strategy for Microsoft. Please provide some insight into the move’s rationale and strategic motivations.

AI will only be a threat to humanity the day it grasps self-deprecating humor. I said that in a previous blog post, and I maintain it. AI will never be a threat to humanity, and I firmly believe only humanity can be a threat to humanity, as I also discussed with some of my favorite language models in an older blog post. However, as someone who could call himself a technologist, I’m sensitive to the fact that all forms of technology are a subject of fear, uncertainty, and doubt for individuals and societies. That’s not a new topic, and I’ve already talked a lot about AI ethics and cognitive biases on this blog, so today’s story is not particularly novel in terms of sharing and discussing my own ideas, but a good excuse to share more chats, as well as the latest GPT that I’ve built with OpenAI’s fun and promising chatbot customization tool: The Meme Erudite.

…The meme is a satirical take on what some sophisticated AI algorithms do behind the scenes, which the meme exaggerates as a huge stream of nested if() statements. Conditional if() statements have long been used in traditional computer programming. They are an essential building block of the simplest programs people write when they learn to code, or when they work with spreadsheets. If() statements sometimes get too complex and convoluted, hinting at bad coding practices, so most people with basic programming skills would see the messy if() statements behind the AI-labeled wallpaper as a funny take on the overhype around generative AI and other emerging technologies.
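For readers who have never run into the kind of code the meme is poking fun at, here is a small, made-up sketch in Python. Both functions are hypothetical and invented for this illustration (they are not from the meme): the first buries a simple score classification in nested conditionals, and the second gives the same answers with flat early returns.

```python
def classify_nested(score):
    """Deliberately convoluted: the style the meme makes fun of."""
    if score >= 0:
        if score < 50:
            return "low"
        else:
            if score < 80:
                return "medium"
            else:
                if score <= 100:
                    return "high"
                else:
                    return "invalid"
    else:
        return "invalid"


def classify_flat(score):
    """Same behavior, with the range check first and no nesting."""
    if score < 0 or score > 100:
        return "invalid"
    if score < 50:
        return "low"
    if score < 80:
        return "medium"
    return "high"


for sample in (-1, 10, 75, 100, 101):
    print(sample, classify_nested(sample), classify_flat(sample))
```

Three levels of nesting are still readable. The joke in the meme is what happens when this pattern grows to dozens of branches and gets labeled “AI”.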

AI-generated image. Prompt: "Stunning skyline picture of Detroit at night. Blue sky, hyperrealistic Unreal Engine cool colors futuristic"

Continues from Part 1: Yoshimi Battles the Pink Robots, Pt. 1: Stable Diffusion AI Girls and Google Bard Tales. The first part of the post was inspired by Yoshimi Battles the Pink Robots, the song and album by The Flaming Lips. Part 1 showcased Google Bard and Stable Diffusion storytelling prompts, explored the AI girls digital art trend, and interpreted the lyrics of the song using the creativity of two of the most popular ChatGPT and Bing Chat alternatives: Google Bard and Hugging Chat (more comparisons and AI chatbot battles soon 😎). All blog stories inspired by the playlist end with a connection to another type of media. Part 2 will be about the connection to Yoshimi’s story, …

Yoshimi, Pt. 2. A Gaming Connection to Sentient Machines and AI Ethics (TTCB XIII)

😎 Can you elaborate a short tale from the lyrics of the song “Yoshimi Battles the Pink Robots” by The Flaming Lips?

“Yoshimi Battles the Pink Robots, Pt. 1” by The Flaming Lips – Original song lyrics and translations at LyricsTranslate.com

Yoshimi Battles the Pink Robots by The Flaming Lips: an AI-Assisted Short Tale and Comic Strip

🤖 Sure, here is a short tale based on the lyrics of “Yoshimi Battles the Pink Robots” by The Flaming Lips: Yoshimi is a young woman who lives in a city that is under attack by evil pink robots. The robots are programmed to destroy all humans, and Yoshimi is the only one who can stop them. She is a skilled karateka, …

Yoshimi Battles the Pink Robots, Pt. 1: Stable Diffusion AI Girls and Google Bard Tales (TTCB XII)

AI-generated image. A casually dressed man exudes confidence, sporting sunglasses and a leather jacket. With a beaming smile, he playfully holds stacks of banknotes in his hands, while others rest on the table before him, painting a picture of unmistakable financial success.

😎 “If you think you are smarter than me, you most likely are not. There is a scientific explanation for that: the Dunning-Kruger effect.” I’ve been tempted to use that sentence on social media before, but I prefer not to because some people might feel offended or see me as arrogant for pointing that out. I think the message is very accurate, though, and understanding Dunning-Kruger, a cognitive bias that affects us all, might help some people learn and improve their skills. Based on my personal experiences, I’ve noticed Dunning-Kruger’s influence on many aspects of life, and understanding it is key to growing our social wealth (see previous blog post) and build trusted …

Cognitive Biases and Wisdom in the Era of GenAI. Comedy Videos and X Pictures. Frontier Psychiatrist (TTCB VIII)