WildVision tool performance on MAD magazine fold-in visual riddle challenges


😎 Hi everyone, and thank you, WildVision team, for making this awesome tool available. I selfishly hope you and Hugging Face can cope with the scaling needs as people notice this exists and demand and usage surge 🙂

After sharing a review and a few usage examples on my website (https://talkingtochatbots.com/vision), I wanted to share another one that came to mind as I was flipping through a very old (1980) MAD magazine… Mad fold-ins (https://en.wikipedia.org/wiki/Mad_Fold-in) are visual riddles that one can guess by inspecting the image, and then reveal the solution by simply folding a magazine page.

Surprisingly, this was a very hard task for any of the vision models. After many tries in the Arena, none of the bots can even get close to solving the very simple case I’m sharing here. Even in the ChatGPT Plus app, after some intense “encouraging prompting” (I pay $23 a month, don’t be lazy, even a toddler chatbot would solve this… 😇) it tried to build a Python script to simulate the folding, but still it was unsuccessful (see screen capture). Just a fun example that could encourage people to test, and developers to improve their models…
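
For readers curious what such a folding simulation might look like, here is a minimal, hypothetical sketch (not the script from the screen capture) that uses Pillow to crop the two outer panels of a scanned fold-in page and paste them edge to edge, hiding the middle strip. The file names and fold-line fractions are assumptions for illustration only.

# Hypothetical sketch of a MAD fold-in simulation: crop the outer panels of a
# scanned page and join them so the middle strip disappears. Paths and fold
# proportions are illustrative assumptions, not taken from the screen capture.
from PIL import Image

def simulate_fold_in(page_path, fold_left=0.38, fold_right=0.62):
    """Return the 'folded' image: left and right panels pasted edge to edge.

    fold_left / fold_right are the horizontal positions (as fractions of the
    page width) of the two fold lines marked A and B on a classic fold-in.
    """
    page = Image.open(page_path)
    width, height = page.size

    # Keep everything left of fold line A and right of fold line B.
    left_panel = page.crop((0, 0, int(width * fold_left), height))
    right_panel = page.crop((int(width * fold_right), 0, width, height))

    # Paste the two panels side by side, as folding the paper would.
    folded = Image.new("RGB", (left_panel.width + right_panel.width, height))
    folded.paste(left_panel, (0, 0))
    folded.paste(right_panel, (left_panel.width, 0))
    return folded

if __name__ == "__main__":
    simulate_fold_in("mad_fold_in_1980.jpg").save("mad_fold_in_folded.jpg")

The geometric part is easy to script; what the models struggled with in the Arena tests above is interpreting the joined artwork and caption once the middle is hidden.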
🤖 Thank you for sharing your experience and insights regarding the use of vision models with MAD magazine fold-ins. These fold-ins present a unique challenge for AI, particularly because they require understanding both the visual and conceptual transformation that occurs when the image is physically manipulated. This task involves not just recognizing objects or text in the image but also predicting how these elements interact and change when the image is folded, which is a complex problem for AI models.

Your exploration and testing of this challenge highlight an interesting area for development in AI capabilities, especially in integrating visual understanding with spatial and transformational reasoning. While current AI models, including vision models, excel at a wide range of tasks, your example illustrates the ongoing need for innovation and improvement in how these models handle more intricate, multi-step tasks like this one.
😎 “Hi everyone, and thank you, WildVision team, for making this awesome tool available. I selfishly hope you and Hugging Face can cope with the scaling needs as people notice this exists and demand and usage surge 🙂

After sharing a review and a few usage examples on my website (https://talkingtochatbots.com/vision), I wanted to share another one that came to mind as I was flipping through a very old (1980) MAD magazine… Mad fold-ins (https://en.wikipedia.org/wiki/Mad_Fold-in) are visual riddles that one can guess by inspecting the image, and then reveal the solution by simply folding a magazine page.

Surprisingly, this was a very hard task for any of the vision models. After many tries in the Arena, none of the bots can even get close to solving the very simple case I’m sharing here. Even in the ChatGPT Plus app, after some intense “encouraging prompting” (I pay $23 a month, don’t be lazy, even a toddler chatbot would solve this… 😇) it tried to build a Python script to simulate the folding, but still it was unsuccessful (see screen capture). Just a fun example that could encourage people to test, and developers to improve their models…”
🤖 “Hi everyone, and thank you, WildVision team, for making this awesome tool available. I selfishly hope you and Hugging Face can cope with the scaling needs as people notice this exists and demand and usage surge 🙂

After sharing a review and a few usage examples on my website (https://talkingtochatbots.com/vision), I wanted to share another one that came to mind as I was flipping through a very old (1980) MAD magazine… Mad fold-ins (https://en.wikipedia.org/wiki/Mad_Fold-in) are visual riddles that one can guess by inspecting the image, and then reveal the solution by simply folding a magazine page.

Surprisingly, this was a very hard task for any of the vision models. After many tries in the Arena, none of the bots could even get close to solving the very simple case I’m sharing here. Even in the ChatGPT Plus app, after some intense “encouraging prompting” (I pay $23 a month, don’t be lazy, even a toddler chatbot would solve this… 😇), it tried to build a Python script to simulate the folding but was still unsuccessful (see screen capture). Just a fun example that could encourage people to test, and developers to improve their models…”

Corrections made:
1. Changed “none of the bots can” to “none of the bots could” for consistency with the past tense context of the narrative.
2. Added a comma after “prompting” for better readability.
3. Adjusted “but still it was unsuccessful” to “but was still unsuccessful” for a smoother flow in the sentence structure.

Changes were made to enhance grammatical accuracy and readability without altering the original style or tone.

Hugging Face Dataset Metrics

All the conversation prompts, responses, and metrics are available to download and explore in the Hugging Face dataset reddgr/talking-to-chatbots-chats.