Needs vs Wants: Work Life Thoughts and Gemini 2.0 Flash Infographics

When I attend professional networking events, people often ask me what I do for a living. 

Screenshot of an X post showing a tweet by @ExcelHumor, which reads, “poorly explain what you do for a living.” Below is a reply from @dgromero, stating, “Like every other human, I eat for a living; otherwise, I would die. To facilitate payments for the food and shelter I have needed to live so far, my major sources of income have come from wages in IT and finance businesses, among others.” Published on x.com

Although I avoid the brutal honesty of the passive-aggressive statement above, I observe that “corporate people” frame that question, as well as most others relevant to their context, from the wrong angle. 

The recruiter who puts a salary on the table, the interviewee who demands a job that pays that salary, the salesperson who encourages others to put another kind of money on the table, and the corporate buyer who makes sure that money is well spent all tend to fall into the same trap. They act based on what they think they need and assume the other party will also act on those perceived needs, ignoring the simple facts that make what they do clearly and objectively about wants, and absolutely not about needs.

Screen capture of a DuckDuckGo image search results page for “needs vs wants.” The results include various infographic-style images, diagrams, and illustrations comparing needs and wants, often using text, arrows, and icons. Some images feature split-screen designs, while others use objects such as road signs or blocks labeled “needs” and “wants.” Many images contain educational content, such as lists and worksheets, to explain the differences between needs and wants.
Alt text by ALT Text Artist GPT

Pointing at some of those colorful infographics and self-help web articles explaining the difference could also come across as ‘passive-aggressive-ish,’ so I instead put it this way:


Need vs Want

If you are a professional and consider yourself ‘in need,’ you don’t need a job. You need the salary, plus any other material or immaterial benefits the job provides. Additionally, you may want a particular job that provides them for you.

If you are a business owner or shareholder and consider yourself ‘in need,’ you don’t need a customer who buys your offering or an employee who works for that customer. What you need are profits, dividends, or shareholder value, plus any other material or immaterial benefits the business provides as a consequence of having those customers and employees. Additionally, you may want someone in particular to be your customer or to service them so that the business provides what you need.

If you are any of the above yet consider yourself privileged, besides being an oddity, you are no different from the others and probably just want the same things. Your status may change what you need in the short term, but it doesn’t change what you want. You might still want to hold a specific job, have someone as your customer, or hire a specific individual to work for your business. The ‘need’ doesn’t make any difference, whether you consider yourself in need or not.

My take, as someone who perceives himself as privileged (owing to the environment into which I was born, the education and work experience I have had, and the current, though volatile, circumstances of life), is that most people in the workplace and in business would benefit greatly not from thinking about their own privileges or needs, but from no longer wondering whether others are in need or are privileged. It doesn’t make a difference. Still, most people tend to look at others from that angle.

Originally posted on LinkedIn.


AI That Makes Infographics, Removes Watermarks… Another Slop Generator or Photoshop Killer?

Hey, hold on! This website’s name is “Talking to Chatbots.” You might be wondering where the chatbots are. I thought the topic of “needs vs. wants” would be a good way to test multimodal LLMs. It’s that intermediate stage between traditional text-to-image prompts and having a ‘conversation’ with a chatbot, where you expect an image instead of a text reply, or perhaps a combination of both.

Since the early days of DALL·E being integrated with ChatGPT, one of my favorite funny applications has been to ask for infographics. Beyond the nonsensical, but understandable, texts embedded into the images, it’s sometimes hilarious to see how the training data from millions of Internet articles, like those in the DuckDuckGo screenshot I shared earlier, translates into images created by diffusion models. Such was the case with the satirical DALL·E-generated infographics I shared on social media a long time ago: the Cryptobrobot and the ‘Strategies to Get Rich’ infographic. Here’s the DALL·E infographic for “Needs vs Wants”. Don’t ask me why “designer clothes” and a cartoon of what looks like a woman wearing them were placed in the left column.

“Needs vs Wants”. Image generated with DALL·E

And here are a couple more ‘minimalist’ infographics I made with Gemini 2.0 Flash native image generation, the model everybody in the AI echo chamber was talking about as I wrote this article:

“Needs vs Wants”. Created with Gemini 2.0 Flash image generation
Image with a "NEEDS VS WANTS" comparison, divided into two sections. The left side has a light blue background and is labeled "FOOD," displaying a plate with a sandwich and a brown bread sandwich. The right side has a yellow background and is labeled "MORE FOOD" and "MORE MONEY," featuring multiple stacks of green banknotes.Alt text by ALT Text Artist GPT
Image with a “NEEDS VS WANTS” comparison, divided into two sections. The left side has a light blue background and is labeled “FOOD,” displaying a plate with a sandwich and a brown bread sandwich. The right side has a yellow background and is labeled “MORE FOOD” and “MORE MONEY,” featuring multiple stacks of green banknotes.
Alt text by ALT Text Artist GPT

If money and food are what you need, more food and more money is what you want. I appreciate your honesty, and your simplified illustration of Maslow’s hierarchy of needs, Gemini. In case you’re wondering, the ‘joke’ wasn’t Gemini’s ‘idea’. Actually, ‘zero-shotting’ the same prompt I used on DALL·E often resulted in repetitive, dull, and nonsensical outcomes. Although Gemini 2.0 Flash’s native image generation is a good contender, I still prefer DALL·E, Stable Diffusion, or Flux for pure text-to-image generation.

Screen capture of a Windows Explorer window displaying multiple image files with filenames related to “Needs vs Wants.” The thumbnails show various graphical comparisons of needs and wants, with different layouts, colors, and icon styles. Some images contain text such as “NEEDS VS WANTS,” “FOOD,” “MONEY,” and “MORE MONEY.”
Alt text by ALT Text Artist GPT

Models such as DALL·E, Stable Diffusion, Midjourney, or Flux work great for certain image generation tasks, like obtaining realistic photos, simulating artistic styles, and even advanced photo editing with tools like ControlNet, but they are quite constrained when combined with complex language tasks. Throughout the past two years of writing articles for this blog, I’ve often shown how SOTA AI tools are not as ‘smart’ as we tend to think, and how they get particularly ‘dumber’ the more multimodal they get: GPT-4 was not good at getting jokes; multimodal LLMs from Wild Vision Arena weren’t nearly as smart as a random Redditor; Grok has a decent sense of humor, but it’s a total idiot when it comes to analyzing financial charts.

Screen capture of an X post by George Arrowsmith (@ThatArrowsmith). The post states, “Just discovered a fantastic new use for Google Gemini Flash 2.0 Image Generation:” and includes two images of a man in a kitchen reading from a tablet. The first image has a “Shutterstock” watermark, while the second image is the same but with the watermark removed. A text box between the images reads, “Remove the ‘Shutterstock’ watermark from this image.”
Alt text by ALT Text Artist GPT

The sudden resurgence of discourse around AI-generated images, sparked by the launch of Gemini 2.0 Flash, can be attributed not to the quality of the images themselves, but to the effectiveness of the function calling and behind-the-scenes processing that occur when you provide specific instructions in the chat interface: ‘add chocolate to my croissant’, ‘put more money on the chart, while leaving the rest untouched’, ‘remove this “Shutterstock” watermark’… We must acknowledge that Google has achieved the highest level of multimodal ‘understanding’ with Gemini 2.0 Flash, surpassing any other chatbot tool we have seen. Here are some examples of how I played around in the Google AI Studio visual interface:

Screen capture of Google AI Studio showing an image generation process using the Gemini 2.0 Flash model. The interface displays a project titled “Adding Banknotes Right Side.” Three versions of an image labeled “NEEDS VS WANTS” are shown, with progressive modifications based on text prompts. The first version features a single banknote stack on each side, the second adds more banknotes on the right, and the third introduces food on the left while increasing food on the right. The right panel displays model settings, including token count, temperature, and advanced options.
Alt text by ALT Text Artist GPT
Screen capture of Google AI Studio displaying an image generation process using the Gemini 2.0 Flash model. The project is titled “Branch of Adding Banknotes Right Side.” The interface shows multiple iterations of an image labeled “NEEDS VS WANTS,” progressively modified based on text prompts. The first version features food and money on the left, with more money on the right. The second version increases the amount of food on the right while keeping the rest unchanged. The right panel contains model settings such as token count, temperature, and advanced options.
Alt text by ALT Text Artist GPT

There is nothing in Google AI Studio and Gemini 2.0 Flash that cannot be achieved with other diffusion models, text-to-image prompting techniques, and advanced AI tools like ControlNet. Some AI skeptics might have a fair point in saying that much can be achieved with ‘classic’ tools like Photoshop or GIMP, along with a significant dose of human creativity.

Even OpenAI incorporated some cool features, like inpainting, into the ChatGPT app. However, experienced users found these tools slow, limited, and dull compared to more powerful, though harder-to-learn, alternatives. Here is one comment I posted on X comparing image generation with Stable Diffusion on my computer with the embarrassingly slow image editing demo (yes, the ‘fakeable’ demo) that OpenAI shared when they launched the feature:


Google’s Gemini 2.0 Flash, the Latest AI Hype Machine

It’s no surprise that the AI crowd has received Gemini 2.0 Flash with huge enthusiasm, labeling it as a potential game changer. Here’s just one of the many social media posts fueling the hype:

Screen capture of an X (formerly Twitter) post by user Deedy (@deedydas). The post discusses Google’s new AI image editing model, stating that it could replace 99% of Photoshop. A list follows, describing various image editing capabilities such as generating passport photos, altering appearances, decorating houses, animating sprites, and modifying hair. The post mentions that Gemini Flash 2.0 experimental is accessible on Google AI Studio and concludes with a statement about the product’s potential. Engagement metrics, including views, comments, retweets, and likes, are visible at the bottom.
Alt text by ALT Text Artist GPT

You can try Gemini 2.0 Flash on Google AI Studio, in the Gemini app, or through my Google Colab notebook. For inspiration on what to try, and for further reading, check the AI-generated article I curated on Perplexity:

Google’s Gemini 2.0 Flash: A Major Breakthrough in Multimodal Gen AI


A Google Colab Notebook to Test Your Gemini API Key and Send Prompts in Code

Face swapping is surely one of the most popular uses of gen AI image models. Countless websites and apps are essentially built around API calls to a diffusion model, combined with a straightforward user interface. This reflects how easy it is to find fun use cases for manipulating images, even for people who aren’t necessarily “skilled” in prompting or using advanced AI tools. The latest Gemini model is, again, not doing anything that can’t be done with others, but it significantly streamlines the process, opens up impressively easy new uses, and bridges the gap between complex tools, like Photoshop or the Stable Diffusion Web UI, and the overly simple “AI wrapper apps” that proliferate on the Internet.

To conclude this post, I thought it would be interesting to connect some fun and easy applications of Gemini 2.0 Flash, like face swapping, with the corporate clichés and LinkedIn stuff it began with. Here’s a relatable joke I came across on Pinterest; I thought I’d give it a new look to explore what we can do with this new AI toy:

Image with two sections. The left side contains the original version of a meme, while the right side is a modified version where faces have been changed using Gemini Flash 2.0 image generation. Speech bubbles were added afterward. Both sides depict a job interview setting with two sequences. In both versions, the top part shows a man and a woman interviewing a candidate. The interviewer asks, “What is your greatest weakness?” The candidate responds, “Honesty.” The interviewer replies, “I don’t think honesty is a weakness…” The bottom part in both versions shows the candidate smiling and saying, “I don’t give a f*** what you think.” In the right-side version, the candidate is a bald man while keeping the layout and text the same.
Alt text by ALT Text Artist GPT

As I mentioned earlier, I’ve put all these examples into a Jupyter Notebook, which you can easily connect to your Gemini API key (there’s a pretty ‘generous’ free tier and usage limits as I write these lines) to use the model with Python code. This allows you to work with the model in a more personalized and flexible way than the Google AI Studio UI, if you are into coding. You can click here to run the notebook yourself on Google Colab, or copy the code from the below Gist and run it elsewhere:
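In case the embedded Gist doesn’t load, the basic pattern boils down to something like the sketch below. This is a minimal sketch assuming the google-genai Python SDK, not the notebook’s exact code; the model name and configuration reflect what was available as I wrote this, so check the Gemini API docs for the current values:

# pip install google-genai pillow
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

# Assumption: the API key is passed directly; the notebook may read it from user input instead.
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Ask the model to return both text and image parts in the response.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental image-capable model at the time of writing
    contents="Create a minimalist infographic comparing needs vs wants",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Walk the response parts: print any text, save any image to disk.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("needs_vs_wants.png")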


… And Then Came OpenAI and the Ghibli Fever

Just a few days after I published this post, OpenAI rolled out a major update to their image generation model: 4o image generation. With this update, OpenAI definitely seems to be steering away from the ‘DALL·E’ branding, focusing on integrating image generation and editing into the ChatGPT interface. This clearly represents a response to the groundbreaking Gemini model I evaluated in this post.

I also curated a Perplexity page about the Studio Ghibli-style images viral trend for those who might have been living in a remote cave without access to social media during this time. If you want to skip the slop, I can summarize it with a (sort of) Ghibli-fied version of a meme I posted here when introducing OpenAI o1:

Cartoon illustration of a job interview where a bald character responds to the question “What is your biggest strength?” with the word “Hype”, followed by a handshake scene where the smiling interviewer says “You’re hired”, with a large OpenAI sign visible on the wall behind them
Alt text by ALT Text Artist GPT

Like pretty much everyone else, I’ve been playing around with the model in ChatGPT (not yet available in the OpenAI API as I write this), but going into my impressions, opinions, and a comparison with Gemini 2.0 Flash would require an entirely new post. Given that I’ll probably never have the time for that, I’ll end this update with a “model mashup”… Here’s my definitive version of the “Greatest Weakness Interview Meme”, Ghibli-fied with ChatGPT and edited with Gemini:

Cartoon illustration of a job interview with two panels. In the top panel, a male interviewer holding a sheet of paper asks, “What is your greatest weakness?” while a female interviewer looks surprised as the candidate responds with, “Honesty.” The woman replies, “I don’t think honesty is a weakness.” In the bottom panel, the candidate, now smiling and leaning forward, says, “I don’t give a f*** what you think,” while facing the woman seated with her back to the viewer.
Alt text by ALT Text Artist GPT

A very useful feature in ChatGPT, significantly improved after the latest update, is inpainting: the ability to select a specific part of the image to be modified, while ensuring that the rest remains (almost 100%) unchanged:

Screen capture of the ChatGPT image editor interface showing a cartoon-style scene where a man in a suit is speaking with a woman seated across a desk, viewed from behind. A blue selection mask outlines the back of the woman’s head and upper shoulders. The instruction “make the lady’s hair slightly longer” is typed in the input field at the bottom of the interface. A circular upward arrow icon, used to submit the prompt and generate the edited image, appears in the lower-right corner.
Alt text by ALT Text Artist GPT

In Google AI Studio, we can try the same approach, which usually yields decent results. However, there is no visual interface for selecting manually, so everything relies on the prompt. That’s a limitation, although it was more than sufficient to make a minor edit to the cartoon and keep the characters consistent:

Screen capture of the Google AI Studio interface showing an image generation task using the Gemini 2.0 Flash model. The prompt “Make the lady’s hair slightly longer and voluminous, keeping the rest of the image untouched” is displayed above the generated result. The interface shows a before-and-after comparison of a cartoon-style office scene where the woman’s hairstyle is modified in the output image. The right panel displays run settings including output format, token count, temperature slider, and advanced settings.
Alt text by ALT Text Artist GPT
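The same prompt-only editing can be reproduced through the API by passing the source image alongside the instruction in the request. Again, a hedged sketch under the same assumptions as the snippet above, with placeholder file names:

# Prompt-only image editing: send the source image together with the instruction.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

source = Image.open("interview_cartoon.png")  # placeholder file name

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[
        "Make the lady's hair slightly longer and voluminous, keeping the rest of the image untouched",
        source,  # the SDK accepts PIL images directly in the contents list
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save the first image part the model returns.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("interview_cartoon_edited.png")
        break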

Putting jokes and hype aside, we must credit OpenAI for developing the best user interfaces for generative AI models, and for a remarkable ability to bring them to the masses.

Screen capture of the ChatGPT interface showing an image generation result based on the prompt: “Restyle the photo into ghibli art style.” The original photo at the top shows two interviewers and a candidate in a modern office setting. Below, the generated image replicates the same scene in a Ghibli-style illustration, with the same composition and characters rendered in hand-drawn animation style.
Alt text by ALT Text Artist GPT

Outpainting is another crucial feature in AI image generation that used to be possible only with advanced tools leveraging open source models, such as the Stable Diffusion web UI and ComfyUI. In ChatGPT’s latest update, extending a square image into a 16:9 image is just one prompt away: “extend the sides so it’s a 16:9 image, keeping the scene and setting intact.”

Screen capture of the ChatGPT interface showing an image generation result in response to the prompt: “extend the sides so it’s a 16:9 image, keeping the scene and setting intact.” The generated image displays a cartoon-style office scene where a man in a suit is speaking to a woman seated across a desk, viewed from behind.
Alt text by ALT Text Artist GPT
