Below is the notebook I submitted (late) to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle. The notebook applies NLP text-classification techniques with popular Python libraries such as scikit-learn and TextBlob, as well as my own fine-tuned versions of DistilBERT. It introduces the first standardized version of SCBN (Specificity, Coherency, Brevity, Novelty), a set of quantitative scores for evaluating chatbot response performance. It also introduces a new benchmark for classifying prompts, RQTL (Request vs Question, Test vs Learn), which aims to refine human choice predictions and provide context for the SCBN scores based on inferred user intent. You can check all the code, annotations, and charts in the Kaggle widget below.
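As a rough illustration of how the fine-tuned DistilBERT classifiers are invoked, here is a minimal sketch using the Hugging Face transformers pipeline. The model id shown is a placeholder rather than an actual checkpoint name, and the returned labels depend on the specific classifier (check my HuggingFace profile for the real models).

```python
# Minimal sketch: score a prompt with a fine-tuned DistilBERT classifier.
# The model id below is a placeholder, not an actual checkpoint name.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="reddgr/rqtl-prompt-classifier",  # placeholder id; see my HuggingFace profile
)

prompt = "Explain the difference between a list and a tuple in Python."
result = classifier(prompt)
print(result)  # e.g. [{'label': '...', 'score': 0.97}], labels depend on the classifier
```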
RQTL (Request-Question-Test-Learn) Prompts Word Frequency Stats
To get a better sense of the RQTL prompt classification, for which I'm developing training and testing datasets and models on HuggingFace (check my profile for the latest versions), I have compiled some statistics on each type of prompt and share a few charts below:
You can find the notebook where I calculated the word frequency stats in the GitHub repository, along with other related notebooks and Python scripts. You can also explore the original version of the notebook in this Gist:
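For reference, the sketch below shows roughly the kind of per-class word-frequency computation that notebook runs. The column names ("prompt", "rq_label") and the toy rows are illustrative assumptions, not the actual dataset schema.

```python
# Minimal sketch of per-class word-frequency counting (column names are illustrative).
import re
from collections import Counter

import pandas as pd

# Toy data standing in for the labeled RQTL prompts.
df = pd.DataFrame({
    "prompt": [
        "What is the capital of France?",
        "Write a short poem about rain.",
    ],
    "rq_label": ["question", "request"],
})

def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase word characters only.
    return re.findall(r"[a-z']+", text.lower())

word_freqs = {}
for label, group in df.groupby("rq_label"):
    counter = Counter()
    for prompt in group["prompt"]:
        counter.update(tokenize(prompt))
    word_freqs[label] = counter.most_common(20)  # top 20 words per prompt class

print(word_freqs["question"][:5])
```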
Request vs Question & Test vs Learn Datasets on Huggingface
Since I started this project, I’ve been maintaining a couple of public datasets on HuggingFace, with prompts extracted from the lmsys/lmsys-chat-1m dataset and labeled manually.
reddgr/rq-request-question-prompts: Annotated Request vs Question Prompts
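A minimal sketch of pulling the annotated prompts with the Hugging Face datasets library follows; the split and column names come from the dataset card, so the "train" split and the indexing shown here are assumptions.

```python
# Minimal sketch: load the Request vs Question dataset from the Hugging Face Hub.
from datasets import load_dataset

rq_ds = load_dataset("reddgr/rq-request-question-prompts")
print(rq_ds)  # inspect the available splits and columns

# Assuming a "train" split exists (check the dataset card for the actual splits).
print(rq_ds["train"][0])
```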