With thanks to Jon Hunt and Robin Keech from Cuespeak for their input
Images are a key part of aphasia assessment and treatment (and plenty of other areas of practice!). But it can be really difficult and/or expensive to find high quality images. Artificial Intelligence can now generate high quality images from a text prompt - I was interested in whether this might be useful to create customised images quickly and affordably.
I selected 80 nouns, 80 verbs and 40 sentences at random from aphasia assessments and treatment software. I used DALL-E 2 to try to create a suitable image for each target. I entered up to six prompts and stopped if I judged that the image represented the target.
You can view all of the prompts and resulting images on this page (takes a while to load them all).
❗️Even though many images represented the target word, they looked a bit off. These imperfections could be distracting or unsettling for users.
Obviously, these were not considered successful. But they are funny.
See the full set of strange images here.
If you're a clinician, a quick Google image search may be enough for everyday use. But for anything that will be published, such as assessments, treatment materials or aphasia-friendly information, finding suitable copyright-free images is hard.
This brief exploration shows that AI image generation could become a low-cost, rapid source of copyright-free images that can be heavily customised. This is a very new technology that will only get better. In fact, in the short time since I ran this test, DALL-E and many other generators have improved substantially. Combining AI images with manual editing (e.g. DALL-E 2 + Photoshop) will probably become the new normal.
Complexity/frequency: Unfortunately, the most complex and low-frequency items were the most difficult to generate. It's easy to find an image of a glass of juice on the internet but very hard to find an image of 'the cat chases the dog'. Yet AI seems to struggle more with the latter, at present.
Syntax: Part of the difficulty with more complex scenes was that DALL-E 2 was not accurately parsing the syntax. Prepositions and adjectives were inconsistently applied, and often to the wrong noun. This is a known limitation.
Bias: DALL-E 2 produced a diverse range of races without prompting, but it has been intentionally programmed to do so because early models defaulted to white-looking people. AI is not biased in itself but reflects and exaggerates biases in the human-generated data used to train it. Most AI image generators have been trained on English-language, Western-culture images and captions, meaning that generating images for other cultures may be substantially less accurate and efficient.
We need to see what people think about these images - I plan to investigate the acceptability and accuracy of AI-generated images compared to 'human generated' images.
Pierce, J. E. (2023). AI-generated images for speech pathology—An exploratory application to aphasia assessment and intervention materials. American Journal of Speech-Language Pathology. https://doi.org/10.1044/2023_AJSLP-23-00142
If you'd like to try using DALL-E 2 yourself, I recommend this excellent prompt guide.
You may also want to compare my prompts with the results to see what worked best.
Below, I have summarised what I learned about prompts. Note that each AI image generator (DALL-E, Midjourney, Photoshop, Stable Diffusion...) will have its own quirks, and that DALL-E 2 involves a cost.
Tips for enhancing results in DALL-E 2:
- Expect randomness – the same prompt will produce better and worse results across multiple attempts.
- Picture the type of result you want before creating the prompt. This encourages a more specific prompt.
- Prompt as if you are captioning an existing image in a newspaper. Read stock photograph descriptions to get a feel for wording and style, as DALL-E 2 was trained on image-caption pairs.
- Present tense seems to work best.
- Multiple clauses can be used to specify additional requirements: medium, source, lighting, camera attributes.
- Specify camera zoom and angle, as DALL-E 2 often defaults to close-ups.
- Adjectives can be very effective but are not consistently applied to the correct noun – keep trying!
- Duplication has been reported to be effective at focusing on a particular description and improving its quality, e.g. "A smiling girl is tickled, laughing, bright lighting, happy".
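The clause-based structure in these tips can be sketched as a small helper that assembles a caption-style prompt from a subject plus optional clauses. This is purely an illustration of the wording pattern – the function name, parameters and defaults are my own invention, and DALL-E 2 itself simply accepts a free-text string:

```python
def build_prompt(subject, medium=None, lighting=None, camera=None, emphasise=None):
    """Assemble a caption-style prompt from optional clauses.

    Hypothetical helper for illustration only: it just concatenates
    clauses in a consistent order, matching the tip that additional
    requirements (medium, lighting, camera attributes) can be added
    as extra clauses, with duplication for emphasis at the end.
    """
    clauses = [subject]
    if medium:
        clauses.append(medium)      # e.g. "a colour photograph"
    if lighting:
        clauses.append(lighting)    # e.g. "bright lighting"
    if camera:
        clauses.append(camera)      # e.g. "wide shot, eye level"
    if emphasise:
        clauses.append(emphasise)   # duplicate a key word to reinforce it
    return ", ".join(clauses)


# Reproduces the duplication example from the tips above.
prompt = build_prompt(
    "A smiling girl is tickled, laughing",
    lighting="bright lighting",
    emphasise="happy",
)
print(prompt)  # A smiling girl is tickled, laughing, bright lighting, happy
```

Keeping prompts in a structured form like this also makes it easy to rerun the same target with one clause varied at a time, which is useful given how inconsistently the model applies individual words.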