When A Word Makes A Thousand Pictures

This post is about how I created the illustrations for my AI-generated children’s picture book, Strawberries Are Not Berries. If you’ve not read it, head over here.

Jump to the sections below with these quick links: Picking My AI Artist, Switching Artists Mid-journey, The Aarght of Prompting, Treading In Murky Waters, and More To Explore.

Picking My AI Artist

Without giving it too much thought, I started out with DALL·E, OpenAI’s text-to-image generator. By then, I had already seen many of DALL·E’s images floating around the Internet, and based on that portfolio of work, I believed it had the potential to be my creative partner on this project.
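(For context, everything in this post was done through the web interface, but the same request can be scripted. Below is a minimal sketch assuming OpenAI’s Python SDK with an API key set in the environment; the model name, size and prompt are illustrative and may differ depending on the version you’re on.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask DALL·E for a batch of candidate illustrations from a single prompt.
response = client.images.generate(
    model="dall-e-2",
    prompt="a cute, friendly jellyfish, children's book illustration",
    n=4,              # number of candidate images
    size="512x512",
)

# Each result comes back as a temporary URL you can download and review.
for i, image in enumerate(response.data):
    print(i, image.url)
```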

Starting out, I didn’t have a fixed art style in mind and was very open to whatever DALL·E would throw at me. That said, I knew I wanted the cute, friendly and imaginative illustrations that characterize most children’s picture books. Things were off to a promising start when I successfully generated a series of adorable jellyfish.


However, it got a little trickier when I moved on to inanimate objects. Attempts to anthropomorphize them didn’t quite work out. And while the AI could understand natural language very well, it wasn’t nearly as adept at rendering letters and numbers within the image.


Another big struggle I faced was maintaining the consistency of style across the images. Keeping ‘children’s book illustration style’ constant in the prompts did not seem to help very much as you can see. 


Once the novelty of generating AI images wore off, I took a hard look at all of DALL·E’s output. Immediately I knew that none of it would get past an editor or publisher. Many of the images had an ineffably unsettling, unfinished quality; somehow they carried the distinct look of having been created by artificial intelligence, like an uncanny valley of art. Dissatisfied with the results, I made the switch to MidJourney.

Switching Artists Mid-journey

MidJourney can only be accessed through an instant messaging platform, and upon entering, it becomes clear right away why that platform is called Discord. The chaotic chatrooms are constantly buzzing with prompts and images, and it was somewhat strange to see everyone just talking at the MidJourney bot and no one really talking to each other.

The way it works is that you send a prompt as a chat message to the bot, and it returns four images in about a minute or so (longer if many requests are queued ahead of you). From there, you can choose to upscale any of the four or generate variations of them. Once I had figured my way around this rather unintuitive UI, I generated my first images.


Right off the bat, the images were astounding. They had curves and textures that made them feel hand-drawn. Most importantly, they were able to capture the soft and playful charm of children’s illustrations, producing highly detailed and rich worlds to great effect. 

Only a couple of details would give away the fact that the images were created by AI. For one, MidJourney isn’t always good with faces: characters can end up having five too many eyes or mouths. Positioning may not always make sense either (check out the two kids wedged into the table below). And on occasion, you’ll find random alien-looking creatures in your image, which is thankfully excusable and sometimes even delightful to have in a children’s book.


The Aarght of Prompting

Without a doubt, writing prompts for these generators is in itself an art to master. At the start, I went in typing a prompt the way I would type a search in Google Images. That only left me exasperated and poorer in free credits. 

Perhaps the one good thing to come from MidJourney’s messy Discord chatrooms is that you get to see other people’s prompts and results. Some prompts ran the length of short stories and were not unlike the hashtag spam of those annoying social media posts. That’s how I learned the lingo and began fluffing up my prompts with more specific descriptors.

While the results got markedly better, I soon realized that more isn’t always more. MidJourney seemed like a kid with a short attention span, picking up only on certain words at its whim.


(Prompt: A cute kid at the playground dancing with his arms in the air and lightning sparks coming out from his elbow) MidJourney strangely refused to set him in a playground despite numerous attempts.


Due to the model’s stochastic nature, you can never fully control its output. To avoid frustration, I’ve learned that it’s best to strike a balance: like a creative director, you need a clear vision to guide the model well, yet at the same time you need to give it ample creative freedom to roam.

Should you wish for the model to match a very specific image in your head, disappointment is sure to follow. On the flip side, not having a clear view of what you want to achieve will lead to the same discouraging end. 

I was delighted to stumble upon this very helpful guide on prompting put together by dallery gallery. It’s a compilation of useful prompt terminology as well as a great showcase of AI’s expansive artistic repertoire. It’s a pity I only found it after completing my book, but I’ll be sure to come back to it for my next project.

Treading In Murky Waters

Instead of describing every single detail of a desired image, it’s often easiest to capture a particular style by naming an artist or illustrator in the prompt. This is where my ethical alarm bells began to sound.

It’s common in the creative industry to draw reference from the work of other creators, but this felt different. It’s hard to draw the line between inspiration and plagiarism when you can’t peek into the black box of AI. Furthermore, most of these artists never consented to having their works used to train the model. So, to sidestep these moral quandaries, I avoided mentioning any names in my prompts.


However, that didn’t mean my conscience was completely at ease. Time and again, signatures and watermarks would pop up in the generated artwork. Were these copied and pasted from existing copyrighted works? Or was the model just trained to mimic them as stylistic features? Either way, it’s disconcerting, and it once again reminds you that permission from the original creators was unlikely to have been granted.

More To Explore

Despite these controversies (which I trust will be worked out more clearly with artists in the near future), I’ll definitely be continuing my exploration of this emerging space of AI art. There’s just an unbelievable wealth of things to learn and experiment with right now.

For instance, I only just found out that you can extract and set an image’s seed. Seeds control reproducibility and might just be the answer to my problem with consistency. I also can’t wait to try my hand at fine-tuning a model on my very own computer, which would mean not having to scrimp on free credits. More on these when I begin my next project!
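(To make the seed idea concrete, here’s a minimal sketch of seed-controlled generation using an open model run locally with Hugging Face’s diffusers library. This is just my assumption of what local experimentation might look like, not what I used for the book, and the model name and prompt are illustrative. Fixing the seed means the same prompt reproduces the same image.)

```python
import torch
from diffusers import StableDiffusionPipeline

# Load an open text-to-image model (weights are downloaded on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

prompt = "a cute, friendly jellyfish, children's book illustration"

# A fixed seed makes the run reproducible: same prompt + same seed = same image.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt, generator=generator).images[0]
image.save("jellyfish.png")
```

In MidJourney itself, the rough equivalent is appending a --seed parameter to the prompt, which should help keep recurring characters looking the same across pages.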

Wendy Aw