Technology Question - How Best To Create "Video" With Speech Matched To Scrolling Text

  • I am thinking ahead to the Lucretius book review discussion, and I am realizing that we have a continuing need for a computer method that will allow us to create video that is mostly the spoken reading of text that scrolls by on the screen as it is being read. For example, a video of an audio reading of a letter of Epicurus in which the text of the letter scrolls by as it is read.


    I have confronted this issue before in creating the "Foundations of Epicurean Philosophy" video without really coming to a satisfactory conclusion. It is very tedious to match text to the audio and flash it by in segments, so what I am wondering is this:


    It ought to be possible to record a segment of text on audio, and then move that audio file into a video processor where it can be matched to a text file of the same material. By "matched" I mean this: in most video editors the audio file shows up on screen as a picture of the waveform of appropriate length. What then needs to be done is to add the text file into the editor, and then drag it out to the same length as the audio waveform along with a "scroll this" command. The result would be something like the opening scene of the original Star Wars movie, except that the text need not fade into the background; it would simply scroll from the bottom of the screen to the top.


    We ought to have a number of people who are capable of creating text excerpts like this, so I would like to develop a recommended procedure in which we identify the program and the method to recommend. To make that accessible to as many people as possible, it would be desirable if the programs were free (as, for example, the AUDACITY audio editor is).


    But the first order of business is probably to find a method that as many people as possible will find convenient, whether it is "free" or not.


    I will work on this and report back in this thread.


    My preferred tool is linux, but we aren't limited to that. As for free programs, the Audacity audio editor will do all the audio editing that we need, and such free video editors as KDENLIVE and OPENSHOT may do some of the video, but not necessarily the scrolling effect that is really needed.


    If you are knowledgeable in computers and have a suggestion I would really appreciate it.

  • I got it so that the scroll speed matches the speed of the Text-To-Speech. Here's my demo of it:


    I'll make a user interface so you can either upload audio or use your system's Text-To-Speech.
    I'm gonna create an installer so it's easy for other people to use, I'll do that tomorrow.

  • WOW KYLE THANK YOU! I was hoping that there might already exist some kind of sourceforge or github project to do such a thing, but it never occurred to me that it might be possible that someone here could make one specially for this use!


    The demo looks great! I looked over the code and it seems so simple, but I don't know the first thing about Python programming.


    I see you have a place to specify the color of the background, and the video size. That's excellent too. I can imagine that someone doing this might want to take the resulting video and surround it with some kind of frame or titling, and the ability to specify those settings ought to make it easy enough to take the result and do that in any video editor. I can see someone's first request might be the ability to specify a fixed title, a special font, and a background image instead of a simple color, and if I were to dare to make requests those would probably be it, but even as is this looks highly usable. (As to the font I mean the default font for the entire scrolling text, not the ability to change fonts within it - I presume that would require its own word processor. But it's not clear to me whether the Python program takes its input text in ASCII format, or RTF, or what.)

    Many many thanks for taking this initiative!!

  • The input text must be in ASCII format.

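    import string

    printable = set(string.printable)

    # join() turns the filtered characters back into one string (needed on Python 3)
    audio_txt = ["".join(filter(lambda x: x in printable, origin_txt))]


    The two assignments strip every character that isn't printable ASCII.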
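
    One side effect of filtering on string.printable is that curly quotes, long dashes, and accented letters vanish outright rather than being replaced. A minimal sketch of a gentler cleanup follows, assuming Python 3; the function name to_printable_ascii and the REPLACEMENTS table are hypothetical and are not part of the script discussed in this thread.

```python
import string
import unicodedata

# Hypothetical table of common typographic characters and ASCII stand-ins.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
    "\u2026": "...",                # ellipsis character
}

PRINTABLE = set(string.printable)


def to_printable_ascii(text):
    """Return text reduced to printable ASCII, keeping as much as possible."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # NFKD splits an accented letter into its base letter plus a combining
    # mark; the mark is not printable ASCII, so the filter below drops it.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch in PRINTABLE)


if __name__ == "__main__":
    print(to_printable_ascii("\u201cPleasure\u201d is the t\u00e9los"))
```

    Running a text file through something like this before handing it to the script would keep the quotation marks and unaccented letters that the plain filter discards.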


    I simplified the script a little bit and added comments to make it easier to follow.


    I added options for user to specify a bunch of new stuff, including backgrounds.


    > if I were to dare to make requests

    No, feel free to be creative with requests!



    Here's another demo, I chose ugly red text but it's easy to change in the ALL CAPS options at the top of the script (FONT_COLOR)



    Next is to add a static title!


  • Oops, I forgot I added that to my edit draft.


    Edit: Ok I added the demo.


    Next is adding explanation for beginners how to set it up on different operating systems, creating an installer, and static title text.


    (Also, I should add a background offset.)


    (Also, I love how my Text to Speech says "them" like "them-i-ser-fer". I don't know why it does that)


  • I can imagine using this in at least two ways: (1) using it to produce just the scrolling text, and then dropping the resulting video into a video editing program to customize the area around the box with the text; (2) as is, just as you are doing, combining a user-spoken MP3 with the text that is being spoken. You're adding options that will make option (2) very viable, but if it gets to be too huge a project, option (1) should always be workable too. My experience is that it's next to impossible to drop a large amount of text into any standard video-editor program, so this will solve that problem elegantly!

  • What kind of text-to-speech engine are you using? I am aware that Android has something, but is your engine something that is native to Windows or Mac or Linux or Python or what?


    In the end I am thinking that human voices will be the main use of this, but as text-to-speech improves (IF it improves) then that will be viable too. I say "if" because my experience was that the Ivona packages were leading the field before they were bought (by Amazon I think) and effectively removed from the market.

  • I added the option for choosing a premade audio file and it should work well. I'd recommend not leaving any silence at the beginning or end of the audio file so it matches with the scrolling text.


    The scrolling text will use the length of the audio file you give it (in AUDIO_FILE, if you leave it blank it will generate audio through TTS) in order to calculate its scrolling speed.
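
    The arithmetic behind that can be sketched roughly as below. This is an illustration only, assuming Python 3 and assuming the text starts just below the frame and finishes just above it; the function names and the pixel figures are hypothetical, and the actual script may compute its speed differently.

```python
def scroll_speed(text_height_px, frame_height_px, audio_seconds):
    """Pixels per second so the text enters at the bottom edge when the
    audio starts and has fully left the top edge when the audio ends."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    # The text block has to cover its own height plus the frame height.
    return (text_height_px + frame_height_px) / audio_seconds


def text_top_at(t, text_height_px, frame_height_px, audio_seconds):
    """Y coordinate of the top of the text block at time t in seconds,
    measured downward from the top of the frame."""
    speed = scroll_speed(text_height_px, frame_height_px, audio_seconds)
    return frame_height_px - speed * t


if __name__ == "__main__":
    # Hypothetical figures: 3000 px of rendered text, 720 px frame, 2 minutes of audio.
    print(scroll_speed(3000, 720, 120))
    print(text_top_at(60, 3000, 720, 120))
```

    Because the speed is derived from the full length of the audio, any silence at the start or end of the file stretches the duration and makes the text drift relative to the voice, which is why trimming it helps.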


    I uploaded the changes. Next I'm going to create an installer so it's easy for others to use

  • > What kind of text-to-speech engine are you using?

    The python library will use whatever is native to your operating system, in my case it's using Mac.


    I tested the human reading option and it works.

    Here's a demo of human reading

  • BTW it will only work with MP3 (I could change that tho if y'all want, I think I will add m4a)


  • ok, I have work and after like 5, I can set up easy install instructions

  • THANK you Kyle! I am working with my first effort now....


    Edit: Well it is taking me a while to install dependencies but I am getting there. No doubt most people of normal experience will not be able to do that, so this is something that can be handled through an installer?

  • An installer will work.


    I'm also considering making this into a web application so that you can just create the video by pasting text into a website and download it from the website

  • WOW that would be even better! I am afraid I am asking too much! ;-)


    At any rate, I have all the python dependencies loaded, I believe, and the program appears to start, (the selection box moves to the next line) and there are no error messages, but it doesn't seem to be doing anything very fast, and I don't see a process I can identify as running. It did however create an audio.mp3 file that was not there in the beginning ;-)


    I should maybe say I am running debian stretch (MX-18)

  • Yeah I should make it print something at the start to let you know it's working.


    It takes a very long time to create the video.
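
    A rough sketch of what such a status message could look like, assuming Python 3 and assuming the renderer knows how many frames it has finished out of a total; progress_line is a hypothetical helper, not something in the script.

```python
def progress_line(done, total, elapsed_seconds):
    """Build a one-line status message from the frames rendered so far."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = done / total
    if fraction == 0:
        # Nothing rendered yet, so there is no basis for an estimate.
        return "Rendered 0/%d frames (0%%), estimating time remaining" % total
    # Assume the remaining frames take as long per frame as the finished ones.
    remaining = elapsed_seconds * (1 - fraction) / fraction
    return "Rendered %d/%d frames (%.0f%%), roughly %.0f s remaining" % (
        done, total, fraction * 100, remaining)


if __name__ == "__main__":
    print(progress_line(0, 1000, 0.0))
    print(progress_line(250, 1000, 30.0))
```

    Printing a line like this every few hundred frames would make it obvious that the program is busy rather than stalled.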

  • OK if it takes a long time then maybe I am good! I will let it run and just see what happens.


    I presume that part of the time is that it is generating the text-to-speech audio. I have an mp3 file that I am ready to use, but I decided to run it first using the Google TTS just so I could step through making the program work as you yourself have it running. I did substitute my own text file (the Vatican Sayings, in pure ASCII) in place of yours.