
This chapter covers
- Generating digital images and video
- Using AI-assisted video editing and text-to-video tools
- Generating presentation resources
- Converting audio to text and text to audio
Text and programming code are natural targets for generative AI. After all, next to binary, those are the languages with which your computer has the most experience. So, intuitively, it was no great surprise that AI could generate the kinds of resources we discussed in the previous chapter.
But images, audio, and video are a very different story. That’s because visual and audio data
- Are inherently more complex and high-dimensional than text
- Lack symbolic representations and carry more nuanced meaning, making it challenging to apply traditional programming techniques directly
- Can be highly subjective and ambiguous, making it difficult to build automated systems that can consistently and accurately interpret such data
- Lack inherent context, making it harder for computer systems to confidently derive meaning
- Require significant computational resources for processing
Nevertheless, tools for generating media resources have been primary drivers of the recent explosion of interest in AI. So the rest of this chapter explores the practical use of AI-driven digital media creation services.