Creating this was interesting! There wasn’t much of a struggle with ChatGPT for stylization; it mostly got that right. The difficulty came with getting the content of the images right, especially the last image in the video (the hands). Getting decent-looking hands wasn’t the problem. The problem was creating a clear distinction between the two sides of the image: it got color and content flipped several times. The narration audio definitely needed some enhancement with reverb. I’d love to see more tools like echo, reverb, and distortion included in voice generation in the future.