$title =

1000 Words vs 1300 Tokens (what is an image?)

;

$content = [

To wit, “a picture is worth a thousand words.”

The saying is usually attributed to Fred Barnard, in 1921, who (quite naturally) presented it as adapted from a Chinese proverb.

He had to adapt it because he was in advertising, using a new medium to do an age-old thing: sell things.

In order to sell something at the time, you had to describe it. If you go back and see the ads of the day, you’ll find them (to modern sensibilities) to be VERY text dense.

A simple line drawing, a *DESCRIPTION* of what the benefit is…and a whole lot of work on the end of the perceiver to understand what’s going on here and engage with the medium.

But technology was advancing, printing was advancing, and you could *show* people things. This changed the game *of commerce for humans*.

Now, of course, we have machines that do laundry, and they are considered such a technological nothing that people complain we don’t have machines to do laundry.

We literally have machines that do both laundry and dishes and BOTH WERE INVENTED AND POPULARIZED WITHIN THE LAST 100 YEARS!

And so the technology continues to advance, and now it encroaches on a realm that was, for a time (a very short time in human history), solely the province of specially trained folks: the realm of creating highly polished, professional images of anything.

Those 1,000 words.

Recall, we’ve thought (and felt and expressed the notion) for a long time that a picture and a thousand words were close to the same. Now we have something close to proof of that.

Google’s introductory blog post for its latest image model, Gemini 2.5 Flash Image, includes this bit.

That highlighted bit, the roughly 1,290 tokens per image, is where I want to focus. We are fudging a bit by treating 1,290 (call it 1,300) as 1,000, but there ya go. As this model has the ability (like many others) to create nearly any image that humans have created or seen, there is a strong argument that this “1,000 or so tokens/words” is a solid line for where the graph goes infinite (i.e. it will work for 99.9999% of use cases).
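For anyone who wants to check the fudge, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb that one English text token is about three quarters of a word. That ratio is my assumption, not something from Google’s post, and image output tokens are not literally text tokens, so treat this as an analogy rather than a measurement.

```python
# Rule-of-thumb assumption: about 0.75 English words per text token.
# This ratio is a heuristic for text tokenizers, not a figure from Google's post.
WORDS_PER_TOKEN = 0.75

# Tokens per image, the figure discussed above.
IMAGE_TOKENS = 1290


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a token count into an approximate English word count."""
    return tokens * words_per_token


approx_words = tokens_to_words(IMAGE_TOKENS)
print(f"{IMAGE_TOKENS} tokens is very roughly {approx_words:.0f} words")
```

Under that assumption the number lands a little under a thousand words, which is about as close to the old saying as a rule of thumb is going to get.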

WORDS VS TOKENS (definitions)

So this is all my layman/user level understanding: A token is roughly a word (or a piece of a word) that the model reads in context. Take the two phrases: A) “I like art”, and B) “That is like, totally art”. Each, when tokenized, would include the word “like” as a token. The token itself may be the same in both, but because the words around it differ, the model does not treat the two uses as the same thing; they are not interchangeable. This is only to say that the *word order matters*. It’s not 1,000 random words that describe any image, it’s the sequence of a particular 1,000 words in that order.
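Here is a toy sketch of that idea. The `toy_tokenize` and `neighbors` helpers are made up for illustration; real model tokenizers split text into subword pieces and are far more involved. The point is only that “like” shows up in both phrases with different company, and that the same words in a different order make a different sequence.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word and punctuation tokens.

    A toy stand-in for illustration, not how a real model tokenizer works.
    """
    return re.findall(r"[a-z']+|[^\sa-z']", text.lower())


def neighbors(tokens: list[str], word: str) -> tuple:
    """Return the tokens directly before and after the first use of `word`."""
    i = tokens.index(word)
    before = tokens[i - 1] if i > 0 else None
    after = tokens[i + 1] if i + 1 < len(tokens) else None
    return before, after


a = toy_tokenize("I like art")
b = toy_tokenize("That is like, totally art")
c = toy_tokenize("art like I")  # same words as phrase A, shuffled

# "like" appears in both phrases, but with different neighbors.
print(neighbors(a, "like"))
print(neighbors(b, "like"))

# Same bag of words, different sequence.
print(sorted(a) == sorted(c), a == c)
```

Phrases A and the shuffled version contain exactly the same words, yet as sequences they are not equal, which is the whole reason a prompt is more than a pile of keywords.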

Given that understanding, if you find yourself unable to consistently create images that conform to what you imagine those words to mean, *use more words*. Fully describe the vision. Explore the possibilities that exist *using words*.

We already have a system for translating all of reality into text (natural language). Now we have the common means to *translate it back and forth to images*. Explore them and enjoy!

Let the machines do the laundry and dishes! And tell them what art to make for you too!

];

$date =

;

$author =

;