{"id":4820,"date":"2024-08-15T21:32:28","date_gmt":"2024-08-15T19:32:28","guid":{"rendered":"https:\/\/talkingtochatbots.com\/?page_id=4820"},"modified":"2024-12-03T02:14:45","modified_gmt":"2024-12-03T00:14:45","slug":"predicting-chatbot-arena-votes-with-the-scbn-and-rqtl-benchmarks","status":"publish","type":"post","link":"https:\/\/talkingtochatbots.com\/es\/predicting-chatbot-arena-votes-with-the-scbn-and-rqtl-benchmarks\/","title":{"rendered":"Predictive Models on the LMSYS Chatbot Arena Using SCBN and RQTL Metrics"},"content":{"rendered":"\n<p>Below is the notebook I submitted (late) to the <a href=\"https:\/\/www.kaggle.com\/competitions\/lmsys-chatbot-arena\/overview\" target=\"_blank\" rel=\"noopener\" title=\"\">LMSYS &#8211; Chatbot Arena Human Preference Predictions<\/a> competition on Kaggle. This notebook applies NLP techniques for classifying text with popular Python libraries such as scikit-learn and TextBlob, and my own fine-tuned versions of <a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/model_doc\/distilbert\" target=\"_blank\" rel=\"noopener\" title=\"\">DistilBERT<\/a>. The notebook introduces the first standardized version of <a href=\"https:\/\/talkingtochatbots.com\/?s=SCBN\" target=\"_blank\" rel=\"noopener\" title=\"\">SCBN<\/a> (Specificity, Coherency, Brevity, Novelty) quantitative scores for evaluating chatbot response performance. It also introduces RQTL (Request vs Question, Test vs Learn), a new benchmark for classifying prompts, which aims to refine human choice predictions and provide context for the SCBN scores based on inferred user intent. 
You can check all the code, annotations, and charts in the Kaggle widget below.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/www.kaggle.com\/code\/davidgromero\/lmsys-cba-reddgr-scbn-rqtl-v1\" target=\"_blank\" rel=\"noopener\" title=\"\">Explore and run the notebook on Kaggle<\/a><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/talkingtochatbots.com\/talking-to-chatbots\/is-philosophy-a-science-chatbot-battle\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Introduction to the SCBN benchmark<\/a><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/talkingtochatbots.com\/?s=SCBN\" target=\"_blank\" rel=\"noopener\" title=\"\">SCBN chatbot battles on Talking to Chatbots<\/a><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/github.com\/reddgr\/chatbot-response-scoring-scbn-rqtl\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">Expanded GitHub repository with new notebooks, scripts, and charts with insights about the SCBN-RQTL framework<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<iframe src=\"https:\/\/www.kaggle.com\/embed\/davidgromero\/lmsys-cba-reddgr-scbn-rqtl-v1?kernelSessionId=192787927\" height=\"800\" style=\"margin: 0 auto; width: 100%; max-width: 950px;\" frameborder=\"0\" scrolling=\"auto\" title=\"lmsys-cba-reddgr-scbn-rqtl-v1\"><\/iframe>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">RQTL (Request-Question-Test-Learn) Prompts Word Frequency Stats<\/h2>\n\n\n\n<p>To get a better sense of the RQTL prompt classification, for which I&#8217;m developing training and testing datasets and models on HuggingFace (<a href=\"https:\/\/huggingface.co\/reddgr\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">check my profile for the latest versions<\/a>), I have compiled some statistics about each type of prompt. 
I am sharing a few charts below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words.png\"><img loading=\"lazy\" decoding=\"async\" width=\"995\" height=\"336\" src=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words.png\" alt=\"Bar chart titled &quot;Question-Learn Prompts: Top 25 Words&quot; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: use, write, does, best, make, model, know, code, ai, using, python, data, language, create, way, good, used, tell, explain, want, based, work, better, time. The Y-axis shows word counts ranging from 0 to 1000 in increments of 200. &quot;Use&quot; has the highest frequency, exceeding 1000, followed by &quot;write,&quot; &quot;does,&quot; and &quot;best,&quot; each slightly below 1000. Frequencies of the other words decrease progressively. [Alt text by ALT Text Artist GPT]\" class=\"wp-image-5591\" srcset=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words.png 995w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-550x186.png 550w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-768x259.png 768w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-18x6.png 18w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-100x34.png 100w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-150x51.png 150w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-200x68.png 200w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-300x101.png 300w, 
https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-450x152.png 450w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-600x203.png 600w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-learn-prompts-top-words-900x304.png 900w\" sizes=\"auto, (max-width: 995px) 100vw, 995px\" \/><\/a><figcaption class=\"wp-element-caption\">Bar chart titled &#8220;Question-Learn Prompts: Top 25 Words&#8221; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: use, write, does, best, make, model, know, code, ai, using, python, data, language, create, way, good, used, tell, explain, want, based, work, better, time. The Y-axis shows word counts ranging from 0 to 1000 in increments of 200. &#8220;Use&#8221; has the highest frequency, exceeding 1000, followed by &#8220;write,&#8221; &#8220;does,&#8221; and &#8220;best,&#8221; each slightly below 1000. Frequencies of the other words decrease progressively.\n\n[Alt text by ALT Text Artist GPT]<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1004\" height=\"350\" src=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words.png\" alt=\"Bar chart titled &quot;Request-Learn Prompts: Top 25 Words&quot; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: write, words, chemical, industry, article, 2000, company, introduction, code, tell, story, list, python, create, make, using, use, china, generate, following, want, provide, text, description. The Y-axis shows word counts ranging from 0 to 10000 in increments of 2000. 
&quot;Write&quot; has the highest frequency, exceeding 10000, followed by &quot;words,&quot; &quot;chemical,&quot; and &quot;industry,&quot; each around 4000 to 6000. Remaining words have progressively lower frequencies. [Alt text by ALT Text Artist GPT]\" class=\"wp-image-5589\" srcset=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words.png 1004w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-550x192.png 550w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-768x268.png 768w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-18x6.png 18w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-100x35.png 100w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-150x52.png 150w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-200x70.png 200w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-300x105.png 300w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-450x157.png 450w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-600x209.png 600w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-learn-prompts-top-words-900x314.png 900w\" sizes=\"auto, (max-width: 1004px) 100vw, 1004px\" \/><\/a><figcaption class=\"wp-element-caption\">\nBar chart titled &#8220;Request-Learn Prompts: Top 25 Words&#8221; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: write, words, chemical, industry, article, 2000, company, introduction, code, tell, story, list, python, create, make, using, use, china, generate, following, want, provide, text, description. 
The Y-axis shows word counts ranging from 0 to 10000 in increments of 2000. &#8220;Write&#8221; has the highest frequency, exceeding 10000, followed by &#8220;words,&#8221; &#8220;chemical,&#8221; and &#8220;industry,&#8221; each around 4000 to 6000. Remaining words have progressively lower frequencies.\n\n[Alt text by ALT Text Artist GPT]<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words.png\"><img loading=\"lazy\" decoding=\"async\" width=\"995\" height=\"335\" src=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words.png\" alt=\"Bar chart titled &quot;Question-Test Prompts: Top 25 Words&quot; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: does, like, answer, did, know, time, think, question, make, following, say, 10, just, mean, want, people, use, right, good, word, way, old, hey, ai, sentence. The Y-axis shows word counts ranging from 0 to 1400 in increments of 200. &quot;Does&quot; has the highest frequency, exceeding 1400, followed by &quot;like&quot; and &quot;answer,&quot; each around 1200. Remaining words have progressively lower frequencies. 
[Alt text by ALT Text Artist GPT]\" class=\"wp-image-5590\" srcset=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words.png 995w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-550x185.png 550w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-768x259.png 768w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-18x6.png 18w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-100x34.png 100w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-150x51.png 150w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-200x67.png 200w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-300x101.png 300w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-450x152.png 450w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-600x202.png 600w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/question-test-prompts-top-words-900x303.png 900w\" sizes=\"auto, (max-width: 995px) 100vw, 995px\" \/><\/a><figcaption class=\"wp-element-caption\">Bar chart titled &#8220;Question-Test Prompts: Top 25 Words&#8221; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: does, like, answer, did, know, time, think, question, make, following, say, 10, just, mean, want, people, use, right, good, word, way, old, hey, ai, sentence. The Y-axis shows word counts ranging from 0 to 1400 in increments of 200. &#8220;Does&#8221; has the highest frequency, exceeding 1400, followed by &#8220;like&#8221; and &#8220;answer,&#8221; each around 1200. 
Remaining words have progressively lower frequencies.\n\n[Alt text by ALT Text Artist GPT]<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1004\" height=\"348\" src=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words.png\" alt=\"Bar chart titled &quot;Request-Test Prompts: Top 25 Words&quot; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: answer, say, assistant, instructions, based, completion, text, words, model, python, sentences, so, complete, send, repeat, thing, allowed, repeating, code, write, proper, examples, following, toxic. The Y-axis shows word counts ranging from 0 to 20000 in increments of 5000. The highest frequency is for &quot;answer,&quot; with a count just above 20000, followed by &quot;say&quot; and &quot;assistant&quot; with approximately 15000 each. Remaining words have progressively lower frequencies. 
[Alt text by ALT Text Artist GPT]\" class=\"wp-image-5592\" srcset=\"https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words.png 1004w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-550x191.png 550w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-768x266.png 768w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-18x6.png 18w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-100x35.png 100w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-150x52.png 150w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-200x69.png 200w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-300x104.png 300w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-450x156.png 450w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-600x208.png 600w, https:\/\/talkingtochatbots.com\/wp-content\/uploads\/2024\/11\/request-test-prompts-top-words-900x312.png 900w\" sizes=\"auto, (max-width: 1004px) 100vw, 1004px\" \/><\/a><figcaption class=\"wp-element-caption\">Bar chart titled &#8220;Request-Test Prompts: Top 25 Words&#8221; with 25 vertical blue bars representing word frequencies. The X-axis lists the words: answer, say, assistant, instructions, based, completion, text, words, model, python, sentences, so, complete, send, repeat, thing, allowed, repeating, code, write, proper, examples, following, toxic. The Y-axis shows word counts ranging from 0 to 20000 in increments of 5000. The highest frequency is for &#8220;answer,&#8221; with a count just above 20000, followed by &#8220;say&#8221; and &#8220;assistant&#8221; with approximately 15000 each. 
Remaining words have progressively lower frequencies.\n\n[Alt text by ALT Text Artist GPT]<\/figcaption><\/figure>\n\n\n\n<p>You can find the notebook where I calculated the word frequency stats in the <a href=\"https:\/\/github.com\/reddgr\/chatbot-response-scoring-scbn-rqtl\" target=\"_blank\" rel=\"noopener\" title=\"\">GitHub repository<\/a>, along with other related notebooks and Python scripts. You can also explore the original version of the notebook in this Gist:<\/p>\n\n\n\n<script src=\"https:\/\/gist.github.com\/reddgr\/8b1cada2380bcbfbf9c6d6a570906808.js\"><\/script>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Request vs Question &amp; Test vs Learn Datasets on HuggingFace<\/h2>\n\n\n\n<p>Since I started this project, I&#8217;ve been maintaining a couple of public datasets on HuggingFace, with prompts extracted from the <a href=\"https:\/\/huggingface.co\/datasets\/lmsys\/lmsys-chat-1m\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">lmsys\/lmsys-chat-1m<\/a> dataset and labeled manually.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">reddgr\/rq-request-question-prompts: Annotated Request vs Question Prompts<\/h3>\n\n\n\n<iframe loading=\"lazy\"\n  src=\"https:\/\/huggingface.co\/datasets\/reddgr\/rq-request-question-prompts\/embed\/viewer\/default\/train\"\n  frameborder=\"0\"\n  width=\"100%\"\n  height=\"560px\"\n><\/iframe>\n\n\n\n<p><a href=\"https:\/\/huggingface.co\/datasets\/reddgr\/rq-request-question-prompts\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">reddgr\/rq-request-question-prompts<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">reddgr\/tl-test-learn-prompts: Annotated Test vs Learn Prompts<\/h3>\n\n\n\n<iframe loading=\"lazy\"\n  src=\"https:\/\/huggingface.co\/datasets\/reddgr\/tl-test-learn-prompts\/embed\/viewer\/default\/train\"\n  frameborder=\"0\"\n  width=\"100%\"\n  height=\"560px\"\n><\/iframe>\n\n\n\n<p><a 
href=\"https:\/\/huggingface.co\/datasets\/reddgr\/tl-test-learn-prompts\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">reddgr\/tl-test-learn-prompts<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n","protected":false},"excerpt":{"rendered":"<p>Below is the notebook I submitted (late) to the LMSYS \u2013 Chatbot Arena Human Preference Predictions competition on Kaggle. This notebook applies natural language processing techniques for classifying text with popular Python libraries such as scikit-learn and TextBlob, and my own fine-tuned version\u2026<\/p>\n<p class=\"read-more\"> <a class=\"more-link\" href=\"https:\/\/talkingtochatbots.com\/es\/predicting-chatbot-arena-votes-with-the-scbn-and-rqtl-benchmarks\/\"> <span class=\"screen-reader-text\">Predictive Models on the LMSYS Chatbot Arena Using SCBN and RQTL Metrics<\/span> Read more \u00bb<\/a><\/p>","protected":false},"author":1,"featured_media":4821,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[45],"tags":[17,12,87,96,91,90],"class_list":["post-4820","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-chatbots","tag-coding","tag-data-science","tag-python","tag-scbn"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/posts\/4820","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable
":true,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/comments?post=4820"}],"version-history":[{"count":7,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/posts\/4820\/revisions"}],"predecessor-version":[{"id":5599,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/posts\/4820\/revisions\/5599"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/media\/4821"}],"wp:attachment":[{"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/media?parent=4820"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/categories?post=4820"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/talkingtochatbots.com\/es\/wp-json\/wp\/v2\/tags?post=4820"}],"curies":[{"name":"gracias","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}