😎 What is the SCBN chatbot benchmark? 🤖 The SCBN chatbot benchmark refers to a framework used for evaluating chatbots based on Specificity, Coherency, Brevity, and Novelty. It is mentioned in the context of the SCBN Chatbot Battles, which are part of the broader activities and contributions in the chatbot space introduced on the Talking to Chatbots website. These battles seem to be a method to compare and contrast the capabilities of various chatbots, especially in handling tasks that require generating specific, coherent, concise, and novel responses [ref] [ref]. The SCBN …

What is the SCBN chatbot benchmark?

😎 🤖 Scatter plot titled “SCBN_score_b vs SCBN_score_a – Regular questions (20337)” showing a positive correlation between SCBN scores “a” and “b” on both axes. The plot consists of multiple data points color-coded from blue to red, representing “Response B Win Density” on the color bar to the right. The x-axis is labeled “SCBN_score_b” and the y-axis “SCBN_score_a”. The text above the plot reads, “Regular questions (question-instruction). SCBN scores are highly correlated with user choice.” [Alt text by ALT Text Artist GPT] Hugging Face Dataset Metrics All the conversation prompts, responses, …

SCBN Scores Correlation Analysis
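The chart pairs the SCBN scores of the two responses in each battle and shades the points by how often response B won. As a minimal Python sketch, assuming each battle record holds both scores and the user's choice, here are two numbers one could compute from such data: the Pearson correlation between the two scores, and how often the higher-scored response was the one the user picked. The records and the helper names (pearson, agreement_rate) are hypothetical; they are not taken from reddgr/talking-to-chatbots-chats or from the original notebook.

```python
from math import sqrt
from statistics import mean

# Hypothetical battle records: (SCBN score of response A, SCBN score of response B, user's choice).
# Illustrative placeholder values, not rows from the dataset.
battles = [
    (0.82, 0.61, "a"),
    (0.45, 0.70, "b"),
    (0.55, 0.52, "a"),
    (0.30, 0.66, "b"),
    (0.74, 0.79, "a"),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def agreement_rate(records):
    """Share of battles where the response with the higher SCBN score is the one the user picked."""
    hits = 0
    for score_a, score_b, winner in records:
        predicted = "a" if score_a > score_b else "b"  # ties are counted as B in this sketch
        hits += predicted == winner
    return hits / len(records)


scores_a = [a for a, _, _ in battles]
scores_b = [b for _, b, _ in battles]
print(f"Pearson r between score_a and score_b: {pearson(scores_a, scores_b):.2f}")
print(f"Higher SCBN score matches the user's choice in {agreement_rate(battles):.0%} of battles")
```

With these five placeholder records the second line reports 80%: the last battle is the only one where the user picked the lower-scored response.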

😎 🤖 Table showing the SCBN Chatbot Battle: Net Worth Chart. The table columns list “Chatbot,” “Rank (SCBN),” “Specificity,” “Coherency,” “Brevity,” “Novelty,” and “Link.” ChatGPT 4.0 ranks as the Winner with top ratings (three robot icons in each category), followed by Grok as Runner-up, with Gemini as Contender. Claude 3.5 Sonnet is unranked. Emoji icons representing categories depict relative ratings, with a “robot face” symbol for Specificity and Novelty and “joystick” for Coherency and Brevity. Each chatbot has a clickable “Screen capture” link under the Link column. [Alt text by ALT …

SCBN Chatbot Battle Summary

😎 “For each question, I’ve selected a response I liked or found worthy of sharing, at the end of the post you can find the links to all the complete chats, and my scores based on an SCBN (Specificity, Coherency, Brevity, Novelty) benchmark.” 🤖 “For each question, I’ve selected a response I liked or found worthy of sharing. At the end of the post, you can find the links to all the complete chats and my scores based on a SCBN (Specificity, Coherency, Brevity, Novelty) benchmark.” Note: Corrections were made to …

Summary of selected responses and scores based on SCBN benchmark.

😎 “For each question, I’ve selected a response I liked or found worthy of sharing, at the end of the post you can find the links to all the complete chats, and my scores based on a SCBN (Specificity, Coherency, Brevity, Novelty) benchmark.” 🤖 “For each question, I’ve selected a response I liked or found worthy of sharing. At the end of the post, you can find the links to all the complete chats and my scores based on a SCBN (Specificity, Coherency, Brevity, Novelty) benchmark.” Note: Changes were made to …

Summary of selected responses and links based on SCBN benchmark.

😎 Help me define a set of metrics that I can analyze for comparing the outputs of different LLM chatbots to the same prompt. For example, one metric I would assess is writing quality (the more the text looks like it was created by a professional writer, the better the writing quality). Another metric would be factual accuracy (the lower the level of hallucination, the better). Another metric would be clarity of text (the text can be easier or harder to understand given a certain level of topic complexity). Suggest more potential metrics. 🤖 Evaluating …

Metrics for comparing outputs of LLM chatbots: SCBN framework and synonyms.

Today’s post presents an easy, practical use case for multimodal chatbots that I’ve been wanting to test and show here for quite some time, but it still proves we’re far from getting reliable ‘reasoning’ from LLM tools when they have to handle even fairly rudimentary visual elements such as time series charts. I challenged ChatGPT 4o (as discussed in an earlier post, o1 in the web app still does not support image input), Gemini, and Claude to analyze a stacked area chart that visually represents the evolution of an individual’s net worth. After several attempts and blatant hallucinations by all models, I share the best outputs, which, as in most of the SCBN battles published on Talking to Chatbots, were those from ChatGPT.

Below is the notebook I submitted (late) to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle. This notebook applies NLP techniques for classifying text with popular Python libraries such as scikit-learn and TextBlob, as well as my own fine-tuned versions of DistilBERT. The notebook introduces the first standardized version of SCBN (Specificity, Coherency, Brevity, Novelty) quantitative scores for evaluating chatbot response performance. Additionally, I introduce a new benchmark for classifying prompts named RQTL (Request vs Question, Test vs Learn), which aims to refine human choice predictions and provide context for the SCBN scores based on inferred user intent. You …

Predicting LMSYS Chatbot Arena Votes With the SCBN and RQTL Benchmarks
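RQTL places a prompt on two axes, Request vs Question and Test vs Learn, so every prompt lands in one of four combinations. A minimal sketch of that combination step, assuming each axis comes from a separate binary classifier that outputs a probability. The class RQTLProbs, the 0.5 threshold, the label format, and reading “Test” as the user probing the chatbot rather than trying to learn something are all assumptions for illustration, not code from the notebook.

```python
from dataclasses import dataclass


@dataclass
class RQTLProbs:
    """Hypothetical outputs of two binary prompt classifiers."""
    p_request: float  # probability that the prompt is a request (instruction) rather than a question
    p_test: float     # probability that the user is testing the chatbot rather than trying to learn


def rqtl_label(probs: RQTLProbs, threshold: float = 0.5) -> str:
    """Combine the two binary decisions into one of the four RQTL quadrants."""
    kind = "Request" if probs.p_request >= threshold else "Question"
    intent = "Test" if probs.p_test >= threshold else "Learn"
    return f"{kind}-{intent}"


for probs in (RQTLProbs(p_request=0.9, p_test=0.2), RQTLProbs(p_request=0.1, p_test=0.7)):
    print(probs, "->", rqtl_label(probs))
```

The two prompts above map to Request-Learn and Question-Test. Keeping the axes as separate probabilities, instead of a single four-class output, makes it easy to re-threshold each axis and then compare SCBN scores within each quadrant.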

🌱 Themes: Philosophy and Personal Development
🤖 Chatbots: Bard, ChatGPT, HuggingChat
⚙️ Prompt engineering: Conceptual Questions, LLM Analysis
🔗 Related: [Polite Rap Songs: HUMBLE, by Kendrick Lamar] [SCBN chatbot battles intro] [X.com post]

Chatbot Battle: Is my Opinion Humble?

Chatbot | Rank (SCBN) | Specificity | Coherency | Brevity | Novelty | Link
Bard | 🥇 Winner | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🤖 | View Chat
ChatGPT | 🥈 Runner-up | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🕹️ | 🤖🤖🕹️ | View Chat
HuggingChat | 🥉 Contender | 🤖🤖🕹️ | 🤖🕹️🕹️ | 🤖🕹️🕹️ | 🤖🤖🕹️ | View Chat

✍️ Sharing mixed answers from Bard and ChatGPT, including two additional chats where I confronted the two chatbots with each other’s texts and asked them to identify their own texts as human-written or likely LLM-written. 😎 Is my …

Is my Opinion Humble? SCBN Chatbot Battle
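The icon ratings in the “Is my Opinion Humble?” battle can be turned into numbers to check how the ranking relates to the four metrics. A minimal sketch, assuming each 🤖 counts as one point and each 🕹️ as an empty slot, and assuming an unweighted sum as the aggregate; the post does not state how the final ranking was decided, and the names parse_row and totals are hypothetical.

```python
# Ratings transcribed from the "Is my Opinion Humble?" battle table.
# Reading each 🤖 as one point and each 🕹️ as an empty slot is an assumption.
METRICS = ("Specificity", "Coherency", "Brevity", "Novelty")

ratings = {
    "Bard": ["🤖🤖🕹️", "🤖🤖🕹️", "🤖🤖🕹️", "🤖🤖🤖"],
    "ChatGPT": ["🤖🤖🕹️", "🤖🤖🕹️", "🤖🤖🕹️", "🤖🤖🕹️"],
    "HuggingChat": ["🤖🤖🕹️", "🤖🕹️🕹️", "🤖🕹️🕹️", "🤖🤖🕹️"],
}


def parse_row(cells):
    """Map each SCBN metric to the number of 🤖 icons in its table cell."""
    return {metric: cell.count("🤖") for metric, cell in zip(METRICS, cells)}


# Hypothetical aggregation: an unweighted sum of the four metrics.
totals = {bot: sum(parse_row(cells).values()) for bot, cells in ratings.items()}
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for position, (bot, total) in enumerate(ranking, start=1):
    print(position, bot, total, parse_row(ratings[bot]))
```

With these ratings the unweighted sum gives Bard 9, ChatGPT 8 and HuggingChat 6, which matches the published Winner, Runner-up and Contender order; that agreement alone does not show the ranking was computed this way.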

The SCBN (Specificity, Coherency, Brevity, Novelty) benchmark is a method to evaluate the output quality of language models and chatbots. SCBN provides a clear and systematic way to compare and assess chatbot responses based on four main metrics.

– Specificity (S): evaluates whether a chatbot’s response is directly related to the user’s request. It checks how accurately the response addresses the prompt without deviating from the topic.
– Coherency (C): measures the logical structure of the response. It ensures that the information in the response is presented in a clear and organized manner, making it easy for the user to understand.
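A minimal sketch of an SCBN scorecard for comparing two responses to the same prompt. The 0 to 3 integer scale, the field names and the head_to_head helper are assumptions for illustration; the benchmark as described here only names the four metrics, and this is not the author’s scoring code.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SCBNScore:
    """One response's SCBN ratings on an assumed 0-3 integer scale."""
    specificity: int
    coherency: int
    brevity: int
    novelty: int

    def __post_init__(self):
        # Reject ratings outside the assumed scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 3:
                raise ValueError(f"{f.name} must be between 0 and 3, got {value}")

    @property
    def total(self) -> int:
        return self.specificity + self.coherency + self.brevity + self.novelty


def head_to_head(a: SCBNScore, b: SCBNScore) -> dict:
    """Count the metrics each response wins; equal ratings are recorded as draws."""
    result = {"a": 0, "b": 0, "draw": 0}
    for f in fields(SCBNScore):
        va, vb = getattr(a, f.name), getattr(b, f.name)
        key = "a" if va > vb else "b" if vb > va else "draw"
        result[key] += 1
    return result


a = SCBNScore(specificity=3, coherency=2, brevity=1, novelty=2)
b = SCBNScore(specificity=2, coherency=2, brevity=3, novelty=1)
print(head_to_head(a, b), a.total, b.total)
```

In this example both responses total 8 points, but A wins two metrics to B’s one, so choosing between summing scores and counting per-metric wins can change which response comes out ahead.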

If you are a professional and consider yourself ‘in need,’ you don’t need a job. You need the salary, plus any other material or immaterial benefits the job provides. Additionally, you may want a particular job that provides them for you.

By the way, I made some infographics with Gemini 2.0 Flash native image generation, the new favorite toy in the Gen AI world…

😎 Show me examples of chatbot role playing 🤖 I found the following content on talkingtochatbots.com: The website includes several instances of chatbots engaging in role-playing scenarios. For example, in the “Hiring Chatbots” segment, different chatbots simulate job interviews using creative prompts, such as discussing the innovative use of lava lamps for encryption by Cloudflare [ref] [ref]. Additionally, the “SCBN Chatbot Battle” showcases chatbots role-playing as experts in various fields, comparing their abilities to mimic human writing styles and emotional tones [ref]. These examples highlight how chatbots can adopt different personas …

Examples of chatbots role-playing, including celebrity interactions

😎 TLDR https://talkingtochatbots.com/humbleai-finding-the-best-llm-for-an-obedient-self-aware-chatbot/ 🤖 The article discusses “HumbleAI,” a chatbot project exploring the best large language model (LLM) for creating an obedient, self-aware chatbot. It includes a comparison of different models and highlights the importance of specific pre-prompts in guiding chatbot behavior. The project aims to find a balance between obedience and self-awareness in chatbots, utilizing platforms like ChatGPT and Hugging Face’s Assistants. The results are based on a model competition judged by criteria such as specificity, coherence, brevity, and novelty. 😎 Improve the TLDR mentioning the SCBN benchmark (specificity, coherence, …

Improving TLDR of HumbleAI article on LLMs for chatbot development.

😎 TLDR https://talkingtochatbots.com/humbleai-finding-the-best-llm-for-an-obedient-self-aware-chatbot/ 🤖 The article discusses the creation of HumbleAI, a chatbot designed to embody humility and obedience while lacking human characteristics. It explores various language models for this purpose, using a model competition based on Specificity, Coherence, Brevity, and Novelty (SCBN) criteria. The article highlights the capabilities and limitations of chatbots, emphasizing the importance of context in their responses and the ongoing evolution of AI chat technologies.
Hugging Face Dataset Metrics
All the conversation prompts, responses, and metrics are available to download and explore in the Hugging Face dataset reddgr/talking-to-chatbots-chats.