For this particular presentation I think it's helpful to have text and images. Often I'm not a fan of that, but for what you're doing I liked being able to read the text as you read or referred to it. Same with viewing the painting.
There is a podcast that I follow (typically an hour or more in length) which streams on Google Podcasts as well as YouTube; in that case I listen to the audio stream and watch the video only rarely, when there's something specific that I want to see. In your case I think the visuals are pretty integral. Having a purely audio stream might make it easier to access (in the car, for example), but as long as the length is around ten minutes, I, for one, would watch the video.