Image to text with LLaVA and Ollama

Sometimes I have trouble writing alt text for images because I don’t know where to start or how to phrase things. I’ve found that LLaVA is good enough to get me going.

I run it via Ollama, like this:

ollama run llava:13b "describe this image concisely: ./some_image.png"

For example, for this image:

[Image: Illustration with three cartoon characters. On the left is a female character with her hand on her hip. In the center is the same female character with the same pose, but with no face. On the right is a black silhouette of a male character.]
Figure 1: Concept art of characters in Coffee Run, by Blender Studio.

I got the following, which is a decent starting point:

The image shows a digital illustration of three characters. On the left, there is a female character with blue hair and casual attire, displaying a confident stance with her hands on her hips. In the center, another similar female character is depicted, also with blue hair but wearing a tank top instead of a shirt. To the right, there is an outline of a male figure with no visible features or clothing, suggesting his form is not fully visible in the illustration. The background is dark with a purple hue, providing a stark contrast to the characters.
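
If you'd rather script this than type the command each time, here is a minimal Python sketch that sends the same prompt to Ollama's local HTTP API instead of going through the CLI. It assumes the Ollama server is running on its default address (http://localhost:11434) and that llava:13b has already been pulled. The function names build_payload and describe_image are made up for illustration and are not part of Ollama.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_path, model="llava:13b",
                  prompt="describe this image concisely"):
    """Build the JSON body for one non-streaming generate request."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # the API expects base64 strings, not file paths
        "stream": False,        # ask for a single JSON object, not a chunk stream
    }


def describe_image(image_path, url=OLLAMA_URL):
    """Send the image to a running Ollama server and return its description."""
    body = json.dumps(build_payload(image_path)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

Calling describe_image("./some_image.png") should return roughly the same kind of text as the command above. As with the CLI, treat the result as a first draft to edit rather than finished alt text.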