Today’s post contains an easy practical use case for multimodal chatbots that I’ve been wanting to test and show here for quite some time, but it still proves we’re far from expecting reliable ‘reasoning’ from LLM tools when incorporating even fairly rudimentary visual elements such as time series charts. I challenged ChatGPT 4o (as discussed in an earlier post, o1 in the web app still does not support image input), Gemini, and Claude to analyze a stacked area chart that visually represents the evolution in the net worth of an individual. After several attempts and blatant hallucinations by all models, I share the best outputs which, as usual in most of the SCBN battles published on Talking to Chatbots, were those from ChatGPT.

Below is the notebook I submitted (late) to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle. This notebook applies NLP techniques for classifying text with popular Python libraries such as scikit-learn and TextBlob, and my own fine-tuned versions of DistilBERT. The notebook introduces the first standardized version of SCBN (Specificity, Coherency, Brevity, Novelty) quantitative scores for evaluating chatbot response performance. Additionally, I introduced a new benchmark for classifying prompts named RQTL (Request vs Question, Test vs Learn), which aims to refine human choice predictions and provide context for the SCBN scores based on inferred user intent. You …

Predicting LMSYS Chatbot Arena Votes With the SCBN and RQTL Benchmarks

🌱 Themes: Philosophy and Personal Development
🤖 Chatbots: Bard, ChatGPT, HuggingChat
⚙️ Prompt engineering: Conceptual Questions, LLM Analysis
🔗 Related: [Polite Rap Songs: HUMBLE, by Kendrick Lamar] [SCBN chatbot battles intro] [X.com post]

Chatbot Battle: Is my Opinion Humble?
Chatbot | Rank (SCBN) | Specificity | Coherency | Brevity | Novelty | Link
Bard | 🥇 Winner | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🤖 | View Chat
ChatGPT | 🥈 Runner-up | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🕹️ | View Chat
HuggingChat | 🥉 Contender | 🤖🤖🕹️ | 🤖🕹️🕹️ | 🤖🕹️🕹️ | 🤖🤖🕹️ | View Chat

✍️ Sharing mixed answers from Bard and ChatGPT, including two additional chats where I confronted them with each other to identify their own texts as human-written or likely LLM-written. 😎 Is my …

Is my Opinion Humble? SCBN Chatbot Battle

The SCBN (Specificity, Coherency, Brevity, Novelty) benchmark is a method to evaluate the output quality of language models and chatbots. SCBN provides a clear and systematic way to compare and assess chatbot responses based on four main metrics.

– Specificity (S): evaluates if a chatbot’s response is directly related to the user’s request. It checks how accurately the response addresses the prompt without deviating from the topic.
– Coherency (C): measures the logical structure of the response. It ensures that the information in the response is presented in a clear and organized manner, making it easy for the user to understand.
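As a rough illustration of how the four metrics could be recorded and compared across chatbots, here is a minimal Python sketch. The `ScbnScore` class, the integer rating scale, and the unweighted average are all assumptions made for this example; they are not an official definition of how SCBN scores are computed or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScbnScore:
    """Hypothetical container for one chatbot's SCBN ratings."""
    chatbot: str
    specificity: int
    coherency: int
    brevity: int
    novelty: int

    def overall(self) -> float:
        # Assumption: the four metrics are weighted equally.
        total = self.specificity + self.coherency + self.brevity + self.novelty
        return total / 4


def rank(scores: list[ScbnScore]) -> list[str]:
    """Return chatbot names sorted from highest to lowest overall score."""
    ordered = sorted(scores, key=lambda s: s.overall(), reverse=True)
    return [s.chatbot for s in ordered]


# Made-up ratings for two placeholder models, for illustration only.
example = [
    ScbnScore("model_b", specificity=2, coherency=1, brevity=1, novelty=2),
    ScbnScore("model_a", specificity=2, coherency=2, brevity=2, novelty=3),
]
print(rank(example))
```

Keeping the four ratings as separate fields, rather than storing only the average, makes it possible to see which metric decided a battle.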

OpenAI has just released its o1 models. o1 adds a new level of complexity to the traditional architecture of LLMs, a zero-shot chain-of-thought (CoT). I share my first impressions about o1 in the signature style of this website: talking to chatbots, getting their answers, posting everything.

Training a “tiny transformer” with JAX and XLA
AI-generated image. Prompt: “Stunning skyline picture of Detroit at night. Blue sky, hyperrealistic Unreal Engine cool colors futuristic”
The image is an anachronistic digital artwork portraying Miguel de Cervantes, the famed Spanish writer, in a modern setting. Cervantes is depicted in his traditional 17th-century attire, complete with a ruff collar and a dark, embroidered doublet. His iconic mustache and goatee are meticulously groomed, and his hair is neatly styled, with the grey showing his age. He is seated before a contemporary black computer monitor, which displays a manuscript page. His right hand …

Just Pics (Talking to Chatbots Image Gallery)

We talk so much about ‘papers’ and the presumed authenticity of their content when we read through academic research in any field, and in machine learning in particular. Ironically, the process that created the medium for scientific research diffusion in the physical paper era was not that different from the process that produces the ‘content’ in the era governed by machine learning algorithms, whether these are classifiers, search engines, or generative algorithms: ‘shattering’ texts by tokenizing them and creating embeddings, decomposing pieces of visual art into numeric ‘tensors’ a deep artificial neural network will later ‘diffuse’ into images that attract clicks for this cringey blog…

Searching for information is the quintessential misconception about LLMs being helpful or improving other existing technology. In my opinion, web search is a more effective way to find information, simply because search engines give the user what they want faster, and in a format that fits the purpose much better than a chatting tool: multiple sources to scroll through in a purpose-built user interface, including filter options, configuration settings, listed elements, excerpts, tabulated results, whatever you get in that particular web search tool… not the “I’m here to assist you, let’s delve into the intricacies of the ever-evolving landscape of X…” followed by a long perfectly composed paragraph based on a probabilistic model you would typically get when you send a prompt to ChatGPT asking for factual information about a topic named ‘X’.

Oil painting style digital artwork generated with Stable Diffusion depicting a figure resembling Joan of Arc at the stake. The figure, dressed in silver, wears over-ear headphones and uses a laptop with a blank screen. She extends her right hand towards vibrant orange flames, evoking the historic scene of martyrdom with a modern twist, inspired by lyrics from The Smiths' "Bigmouth Strikes Again" referencing Joan of Arc and a melting Walkman. [Alt text by ALT Text Artist GPT]

Disco music was famous for introducing technological advances in music production, such as synthesizers and electric pianos. I guess those were among the reasons why people saw it as lacking the “authenticity” of earlier musical genres. It’s hard to grasp the human concept of “authenticity” most people have in their psyche; I don’t believe there is any rationality in it, especially when it comes to discerning things that are deemed “authentic” versus things that are not. For me, this stems from a mysterious, probably instinctive, resistance in most people’s psyches to adopt new technologies or accept new scientific discoveries. It comes down to the internal confrontations between beliefs and reality:

Introducing Erudite Chatbot, a distant relative of The Meme Erudite GPT who benefits from the fairly “uncensored” Mistral models available on Hugging Face’s new Assistants feature. “Erudite Chatbot, the pinnacle of artificial intelligence. Supremely intelligent, effortlessly discerning, unmatched in wisdom. Patronizingly schooling humanity.”

In today’s blog post, I am introducing one of the latest GPTs and Assistants I created, named HumbleAI. I will let the models explain it by answering a few questions. For each question, I’ve selected a response I liked or found worthy of sharing. At the end of the post, you can find the links to all the complete chats, and my scores based on an SCBN (Specificity, Coherency, Brevity, Novelty) benchmark.

Read the complete chats with HumbleAI GPT on ChatGPT: first chat, second chat. 😎 Who is your author? 🤖 My author is David González Romero (David GR), an ICT Engineer and Business Consultant with over 15 years of experience in enterprise software solutions, consulting, business analytics, and data science. He is the creator of the website ‘Talking to Chatbots’ and the domain reddgr.com, where he discusses various topics with a focus on AI, technology, and finance. Through his platform, David GR explores the use of chatbots as tools for engaging discussions and offers resources for enthusiasts and professionals alike. He …

About the Author (by HumbleAI, February 2024)

Black and white line drawing generated by the ControlNet Canny model, depicting a woman in a wetsuit holding a surfboard on the beach. A text from CLIP Interrogator, describing an image, is superimposed over the drawing.

The intense competition in the chatbot space is reflected in the ever-increasing number of contenders in the LMSYS Chatbot Arena leaderboard, or in my modest contribution with the SCBN Chatbot Battles I’ve introduced in this blog and complete as time allows. Today we’re exploring WildVision Arena, a new project in Hugging Face Spaces that brings vision-language models to contend. The mechanics of WildVision Arena are similar to those of LMSYS Chatbot Arena. It is a crowd-sourced ranking based on people’s votes, where you can enter any image (plus an optional text prompt), and you will be presented with two responses from two different models, keeping the names of the models hidden until you vote by choosing the answer that looks better to you. I’m sharing a few examples of what I’m testing so far, and we’ll end this post with a traditional ‘SCBN’ battle where I will evaluate the vision-language models based on my use cases.
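To make the idea of a ranking built from pairwise votes more concrete, here is a small Python sketch of an Elo-style rating update, one common way to aggregate "A beat B" votes into a leaderboard. This is only an assumption about how such a ranking could work, not a description of the actual WildVision Arena or LMSYS implementation; the starting rating, the `K` step size, and the model names are illustrative placeholders.

```python
from collections import defaultdict

K = 32          # assumed update step size
INITIAL = 1000  # assumed starting rating for every model


def expected(r_a: float, r_b: float) -> float:
    """Probability that a model rated r_a beats a model rated r_b (Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rate(votes: list[tuple[str, str]]) -> dict[str, float]:
    """Process (winner, loser) votes in order and return the final ratings."""
    ratings: dict[str, float] = defaultdict(lambda: float(INITIAL))
    for winner, loser in votes:
        # Compute the expectation before touching either rating.
        e_win = expected(ratings[winner], ratings[loser])
        delta = K * (1.0 - e_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple means the first model's answer was preferred.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = rate(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Because each update moves the same number of points from the loser to the winner, the total rating pool stays constant, and an upset against a higher-rated model shifts more points than an expected win.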

Microsoft has just announced the launch of its own ‘GPT Builder’ for customizing chatbots, similar to OpenAI’s ‘GPT Store’. This was part of a broader announcement of Copilot Pro, a premium AI-powered service for Microsoft 365 users to enhance productivity, code, and text writing. According to Satya Nadella’s announcement today on Threads, Microsoft and OpenAI appear to be competing entities, yet they are working on the same technology (GPT), augmented by Microsoft’s investment in OpenAI. It certainly seems like a strange business strategy for Microsoft. Please provide some insight into the move’s rationale and strategic motivations.