Needs vs Wants: Work Life Thoughts and Gemini 2.0 Flash Infographics

When I attend professional networking events, people often ask me what I do for a living.

Although I avoid the brutal honesty of the passive-aggressive statement above, I observe that “corporate people” frame that question, as well as most others relevant to their context, from the wrong angle.
The recruiter who puts a salary on the table, the interviewee who demands a job that pays that money, the salesperson who encourages others to put another kind of money on the table, or the corporate buyer who watches for that money being well spent, all tend to fall into the same trap. They act based on what they think they need and assume the other party will also act on those perceived needs, ignoring the simple fact that what they do is clearly and objectively about wants, and absolutely not about needs.

Alt text by ALT Text Artist GPT
Pointing at some of those colorful infographics and self-help web articles explaining the difference could also come across as ‘passive-aggressive-ish,’ so I instead put it this way:
Need vs Want
If you are a professional and consider yourself ‘in need,’ you don’t need a job. You need the salary, plus any other material or immaterial benefits the job provides. Additionally, you may want a particular job that provides them for you.
If you are a business owner or shareholder and consider yourself ‘in need,’ you don’t need a customer who buys your offering or an employee who works for that customer. What you need are profits, dividends, or shareholder value, plus any other material or immaterial benefits the business provides as a consequence of having those customers and employees. Additionally, you may want someone in particular to be your customer or to service them so that the business provides what you need.
If you are any of the above yet consider yourself privileged, besides being an oddity, you are no different from the others and probably just want the same things. Your status may change what you need in the short term, but it doesn’t change what you want. You might still want to hold a specific job, have someone as your customer, or hire a specific individual to work for your business. Whether you consider yourself in need or not, the ‘need’ doesn’t make any difference.
My take, as someone who perceives himself as privileged (owing to the environment into which I was born, the education and work experience I have had, and the current, though volatile, circumstances of life), is that most people in the workplace and in business would benefit greatly not from dwelling on their own privileges or needs, but from no longer weighing whether others are in need or privileged. It doesn’t make a difference. Still, most people tend to look at others from that angle.
AI That Makes Infographics, Removes Watermarks… Another Slop Generator or Photoshop Killer?
Hey, hold on! This website’s name is “Talking to Chatbots.” You might be wondering where the chatbots are. I thought the topic of “needs vs. wants” would be a good way to test multimodal LLMs: that intermediary stage between traditional text-to-image prompts and having a ‘conversation’ with a chatbot, where you expect an image instead of a text reply, or perhaps a combination of both.
Since the early days of DALL·E being integrated with ChatGPT, one of my favorite funny applications has been to ask for infographics. Beyond the nonsensical, but understandable, texts embedded into the images, it’s sometimes hilarious to see how the training data from millions of Internet articles, like those in the DuckDuckGo screenshot I shared earlier, translates into images created by diffusion models. Such was the case with the satirical DALL·E-generated infographics I shared on social media a long time ago: the Cryptobrobot and the ‘Strategies to Get Rich’ infographic. Here’s the DALL·E infographic for “Needs vs Wants”. Don’t ask me why “designer clothes” and a cartoon of what looks like a woman wearing them were placed in the left column.

And here are a couple more ‘minimalist’ infographics I made with Gemini 2.0 Flash native image generation, the model everybody in the AI echo chamber was talking about as I wrote this article:


Alt text by ALT Text Artist GPT
If money and food are what you need, more food and more money are what you want. I appreciate your honesty, and your simplified illustration of Maslow’s hierarchy of needs, Gemini. In case you’re wondering, the ‘joke’ wasn’t Gemini’s ‘idea’. Actually, ‘zero-shotting’ the same prompt I used on DALL·E often resulted in repetitive, dull, and nonsensical outcomes. Although Imagen 3 (the diffusion model behind Gemini 2.0 Flash) is a good contender, I still prefer DALL·E, Stable Diffusion, or Flux for pure text-to-image generation.

Alt text by ALT Text Artist GPT
Models such as DALL·E, Stable Diffusion, Midjourney, or Flux work great for certain image generation tasks, like obtaining realistic photos, simulating artistic styles, and even advanced photo editing with tools like ControlNet, but they are quite constrained when combined with complex language tasks. Throughout the past two years of writing articles for this blog, I’ve often shown how SOTA AI tools are not as ‘smart’ as we tend to think, and how they get particularly ‘dumber’ the more multimodal they get: GPT-4 was not good at getting jokes; multimodal LLMs from Wild Vision Arena weren’t nearly as smart as a random Redditor; Grok has a decent sense of humor, but it’s a total idiot when it comes to analyzing financial charts…

Alt text by ALT Text Artist GPT
The sudden resurgence of the discourse around AI-generated images sparked by the launch of Gemini 2.0 Flash can be attributed not to the quality of the images themselves, but to the effectiveness of the function calling and behind-the-scenes processing that occur when you provide specific instructions in the chat interface: add chocolate to my croissant, put more money on the chart while leaving the rest untouched, remove this “Shutterstock” watermark… We must acknowledge that Google has achieved the highest level of multimodal ‘understanding’ with Gemini 2.0 Flash, surpassing any other chatbot tool we have seen. Here are some examples of how I played around in the Google AI Studio visual interface:

Alt text by ALT Text Artist GPT

Alt text by ALT Text Artist GPT
There is nothing in Google AI Studio and Gemini 2.0 Flash that cannot be achieved with other diffusion models, text-to-image prompting techniques, and advanced AI tools like ControlNet. Some AI skeptics might have a fair point in saying that much can be achieved with ‘classic’ tools like Photoshop or GIMP, along with a significant dose of human creativity.
Even OpenAI incorporated some cool features, like inpainting, into the ChatGPT app. However, experienced users found these tools to be slow, limited, and dull compared to more powerful, though harder-to-learn, alternatives. Here is one comment I posted on X comparing image generation with Stable Diffusion on my computer against the embarrassingly slow image editing demo (yes, the ‘fakeable’ demo) that OpenAI shared when they launched the feature:
Might as well be an ad for Nvidia GPUs… 8 GB VRAM, 12 images, 40 secs text2image + 32 secs inpainting: pic.twitter.com/qa9bKgDaqw
— David G. R. (@dgromero) April 4, 2024
Google’s Gemini 2.0 Flash, the Latest AI Hype Machine
It’s no surprise that the AI crowd has received Gemini 2.0 Flash with huge enthusiasm, labeling it as a potential game changer. Here’s just one of the many social media posts fueling the hype:

Alt text by ALT Text Artist GPT
You can try Gemini 2.0 Flash on Google AI Studio, in the Gemini app, or with my Google Colab notebook. For inspiration on what to try, and for further reading, check out the AI-generated article I curated on Perplexity:
Google’s Gemini 2.0 Flash: A Major Breakthrough in Multimodal Gen AI

A Google Colab Notebook to Test Your Gemini API Key and Send Prompts in Code
Face swapping is surely one of the most popular uses of gen AI image models. Countless websites and apps are essentially built around API calls to a diffusion model, combined with a straightforward user interface. This reflects how easy it is to find fun use cases for manipulating images, even though not everyone is necessarily “skilled” in prompting or using advanced AI tools. The latest Gemini model is, again, not doing anything that can’t be done with others, but it significantly streamlines the process, opens up impressively easy new uses, and bridges the gap between complex tools like Photoshop or the Stable Diffusion Web UI and the overly simple “AI wrapper apps” that proliferate on the Internet.
To conclude this post, I thought it would be interesting to connect some fun and easy applications of Gemini 2.0 Flash, like face swapping, with the corporate clichés and LinkedIn stuff the post began with. Here’s a relatable joke I came across on Pinterest; I thought I would give it a new look to explore what we can do with this new AI toy:

Alt text by ALT Text Artist GPT
As I mentioned earlier, I’ve put all these examples into a Jupyter Notebook, which you can easily connect to your Gemini API key (there’s a pretty ‘generous’ free tier and usage limits as I write these lines) to use the model from Python code. If you are into coding, this gives you a more personalized and flexible way to work with the model than the Google AI Studio UI. You can click here to run the notebook yourself on Google Colab, or copy the code from the Gist below and run it elsewhere:
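In case the embedded Gist doesn’t render for you, here’s a minimal sketch of what a text-to-image request looks like with the google-genai Python SDK. This is an illustration rather than the exact notebook code: the model name, prompt, and output handling are assumptions based on the experimental native image generation endpoint available as I write this.

```python
# Minimal sketch, not the exact Gist/notebook code: text-to-image with the
# google-genai Python SDK (pip install google-genai pillow).
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # paste your own API key here

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # assumed name of the experimental model with native image output
    contents="Create a minimalist infographic contrasting needs and wants",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The reply interleaves text parts and inline image parts: print the former, save the latter.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("needs_vs_wants_infographic.png")
```

The same generate_content call also accepts images as inputs alongside text, which is what makes the prompt-only editing shown later in this post reproducible from code.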
… And Then Came OpenAI and the Ghibli Fever
Just a few days after I published this post, OpenAI rolled out a major update to their image generation model: 4o image generation. With this update, OpenAI definitely seems to be steering away from the ‘DALL·E’ branding, focusing on integrating image generation and editing into the ChatGPT interface. This clearly represents a response to the groundbreaking Gemini model I evaluated in this post.
I also curated a Perplexity page about the viral Studio Ghibli-style image trend for those who might have been living in a remote cave without access to social media during this time. If you want to skip the slop, I can summarize it with a (sort of) Ghibli-fied version of a meme I posted here when introducing OpenAI o1:

Alt text by ALT Text Artist GPT
Like pretty much everyone else, I’ve been playing around with the model in ChatGPT (not yet available in the OpenAI API as I write this), and going into my impressions, opinions, and a comparison with Gemini 2.0 Flash would definitely require an entirely new post. Given that I’ll probably never have the time for that, I’ll end this update with a “model mashup”… Here’s my definitive version of the “Greatest Weakness Interview Meme”, Ghibli-fied with ChatGPT and edited with Gemini:

Alt text by ALT Text Artist GPT
A very useful feature in ChatGPT, significantly improved after the latest update, is inpainting: the ability to select a specific part of the image that needs to be modified, while ensuring that the rest remains (almost 100%) unchanged:

Alt text by ALT Text Artist GPT
In Google AI Studio, we can try the same approach, which usually yields decent results. However, there is no visual interface for selecting a region manually, so everything relies on the prompt. That is a limitation, but it was more than sufficient to make a minor edit to the cartoon and keep the characters consistent (a code sketch of the same prompt-only approach appears right after the image below):

Alt text by ALT Text Artist GPT
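For those who prefer the API over the AI Studio UI, here is a minimal sketch of that prompt-only editing flow, assuming the same google-genai SDK and experimental model as in the earlier snippet; the file names and the edit instruction are placeholders, not the exact prompt I used:

```python
# Minimal sketch of prompt-only image editing via the API: pass the source image
# together with a text instruction. File names, prompt, and model name are placeholders.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

source = Image.open("cartoon_panel.png")  # hypothetical input: the panel to retouch

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # assumed name of the experimental model with native image output
    contents=[
        "Make a minor edit so the character matches the other panels; leave everything else untouched.",
        source,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save whatever edited image comes back.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("cartoon_panel_edited.png")
```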
Putting jokes and hype aside, we must credit OpenAI for developing the best user interfaces for generative AI models, and for a remarkable ability to bring them to the masses.

Alt text by ALT Text Artist GPT
Outpainting is another crucial feature in AI image generation that was typically only possible with advanced tools leveraging open source models, such as Stable Diffusion web UI and ComfyUI. In ChatGPT’s latest update, extending a square image into a 16:9 image is just one prompt away: “extend the sides so it’s a 16:9 image, keeping the scene and setting intact.”

Alt text by ALT Text Artist GPT