OpenAI has just released its o1 models. o1 adds a new layer of complexity to the traditional LLM architecture: a built-in zero-shot chain-of-thought (CoT) step. I share my first impressions of o1 in the signature style of this website: talking to chatbots, getting their answers, posting everything.

Below is the notebook I submitted (late) to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle. The notebook applies NLP text-classification techniques with popular Python libraries such as scikit-learn and TextBlob, plus my own fine-tuned versions of DistilBERT. It introduces the first standardized version of SCBN (Specificity, Coherency, Brevity, Novelty) quantitative scores for evaluating chatbot response performance. It also introduces a new benchmark for classifying prompts named RQTL (Request vs Question, Test vs Learn), which aims to refine human choice predictions and provide context for the SCBN scores based on inferred user intent. You …

Predicting LMSYS Chatbot Arena Votes With the SCBN and RQTL Benchmarks Read more »
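Since the notebook computes the SCBN scores with these same libraries, here is a minimal sketch of what heuristic proxies for the four metrics could look like, assuming only TF-IDF similarity from scikit-learn and simple lexical measures from TextBlob. The function name scbn_scores and every formula in it are illustrative assumptions, not the standardized implementation from the notebook.

```python
# Minimal sketch of heuristic SCBN-style proxies (illustrative, not the
# notebook's standardized scores). Requires: pip install scikit-learn textblob
# TextBlob tokenizers may need: python -m textblob.download_corpora
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from textblob import TextBlob

def scbn_scores(prompt: str, response: str) -> dict:
    """Return rough SCBN proxies in [0, 1] for one prompt/response pair."""
    # Specificity: TF-IDF cosine similarity between prompt and response,
    # so a response that stays on the prompt's topic scores closer to 1.
    vecs = TfidfVectorizer().fit_transform([prompt, response])
    specificity = float(cosine_similarity(vecs[0], vecs[1])[0, 0])

    # Coherency: mean similarity between consecutive sentences, so a
    # response whose sentences connect lexically scores higher.
    sentences = [str(s) for s in TextBlob(response).sentences]
    if len(sentences) > 1:
        svecs = TfidfVectorizer().fit_transform(sentences)
        pairs = [float(cosine_similarity(svecs[i], svecs[i + 1])[0, 0])
                 for i in range(len(sentences) - 1)]
        coherency = sum(pairs) / len(pairs)
    else:
        coherency = 1.0  # a single sentence is trivially coherent

    # Brevity: shorter responses score closer to 1 (100-word half-life,
    # an arbitrary constant chosen for this sketch).
    n_words = len(TextBlob(response).words)
    brevity = 1.0 / (1.0 + n_words / 100.0)

    # Novelty: share of the response vocabulary absent from the prompt.
    prompt_vocab = {w.lower() for w in TextBlob(prompt).words}
    resp_vocab = {w.lower() for w in TextBlob(response).words}
    novelty = len(resp_vocab - prompt_vocab) / max(len(resp_vocab), 1)

    return {"S": specificity, "C": coherency, "B": brevity, "N": novelty}

print(scbn_scores(
    "What is TF-IDF?",
    "TF-IDF weighs terms by frequency and rarity. "
    "Rare terms get more weight, which helps retrieval."))
```

The notebook itself combines these libraries with fine-tuned DistilBERT classifiers, so treat the sketch above only as a reading aid for the metric definitions that follow.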

The SCBN (Specificity, Coherency, Brevity, Novelty) benchmark is a method for evaluating the output quality of language models and chatbots. SCBN provides a clear, systematic way to compare and assess chatbot responses based on four main metrics:

– Specificity (S): evaluates whether a chatbot’s response is directly related to the user’s request. It checks how accurately the response addresses the prompt without deviating from the topic.
– Coherency (C): measures the logical structure of the response, checking that the information is presented in a clear, organized manner that is easy for the user to follow.