Thanks for the suggestion, but unfortunately individual voice contributions will not help much, for a couple of reasons:
- The material that I am trying to make use of now generally runs an hour or more, including such things as the Velleius section of On the Nature of the Gods, long sections of Cicero's other works where Epicurus is mentioned, and even the entire Lucretius poem, so the recordings are also going to be long.
- I want to target long-term, constant improvement: continually upgrading both the audio and the video with better voices, and eventually with illustrations and moving pictures, to stay on top of the latest technology.
- I've been thinking about what "we" bring to the table in this kind of project, and it seems to me that what no one else is likely to duplicate is preparing modern American English versions of these texts based on the public domain translations. There are many people who can handle this kind of technology, but IMHO not many who are well equipped to judge what the texts "should" say in modern American English, just as you've done with Menoeceus. We CANNOT let the perfect be the enemy of the good, because we'll never reach perfection, but we can produce reasonably accurate modern English versions that will be usable by people who will simply never be willing to read Bailey or the others. And in truth there's no reason to use 18th or 19th century phrasing in these renderings anyway.
- And our goal ought to be something that can be TOTALLY freely distributed, both as to the text and as to the final media form. As to the text, posting it on GitHub makes clear that anyone can use it as a base, even if they don't want to cooperate with us.
- I am also going to post the following paragraph in the first post to document the public domain engine that I'm using now. Hopefully we will have some techies who know how to do this better than I do, and they can help us improve the voice rendering faster. But at the moment, the current engine works well enough to allow us to run the texts through it and proof the results. The renderings will get better and better as the technology improves, but the texts won't improve unless we take the time to have people who "understand" them supervise the texts that are input.
- Even the current Coqui TTS engine can no doubt perform better in more competent hands than mine. I feel sure I am not optimizing my settings correctly, as some of the text sounds fine while in other sections the model goes wild with emotion. To some extent I am addressing those problems with word choice and order, but there are technical issues too that I don't understand yet.
Quote:

This project employs the Coqui TTS open-source text-to-speech engine, specifically the multilingual, multi-speaker “XTTS-v2” model. XTTS-v2 is a transformer-based neural network designed for expressive, natural-sounding speech generation across many languages and voices. It reproduces human intonation and rhythm by processing text at the sentence level and synthesizing waveforms directly from the model’s internal acoustic representations. The implementation uses the Python Coqui-TTS library, running locally in CPU mode for portability and stability, with customized parameters to control pace and clarity—such as a sampling rate of 22.05 kHz, a pause duration of 0.25 seconds between segments, and a neutral male speaker profile known as “Craig Gutsy.” Audio segments are generated sequentially, merged into continuous narration, and paired with synchronized subtitles and a video overlay produced through FFmpeg. The entire process is fully open-source, requiring no external API calls, and emphasizes reproducibility, offline privacy, and fine control over both narration and presentation.
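For anyone who wants to tinker, here is a minimal sketch of the pipeline the quoted paragraph describes: sentence-level synthesis with XTTS-v2, with segments joined by 0.25 seconds of silence at 22.05 kHz. The helper functions and file names below are my own illustrations, not part of the Coqui TTS library; only the `TTS.api` calls at the bottom are the library's actual interface.

```python
"""Sketch of the narration pipeline: sentence-level XTTS-v2 synthesis,
segments joined with 0.25 s pauses at a 22.05 kHz sampling rate."""

import re

SAMPLE_RATE = 22050      # 22.05 kHz, as described above
PAUSE_SECONDS = 0.25     # silence inserted between segments

def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter for illustration; the real engine does
    # its own sentence-level processing internally.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def join_with_pauses(segments: list[list[float]],
                     rate: int = SAMPLE_RATE,
                     pause: float = PAUSE_SECONDS) -> list[float]:
    # Concatenate audio segments (as raw sample lists), inserting
    # `pause` seconds of silence between consecutive segments.
    silence = [0.0] * int(rate * pause)
    merged: list[float] = []
    for i, seg in enumerate(segments):
        if i:
            merged.extend(silence)
        merged.extend(seg)
    return merged

if __name__ == "__main__":
    # Actual synthesis (downloads the XTTS-v2 model on first run,
    # runs on CPU by default):
    from TTS.api import TTS
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    for i, sentence in enumerate(split_sentences("Text of the rendering goes here.")):
        tts.tts_to_file(text=sentence, language="en",
                        speaker="Craig Gutsy",   # built-in XTTS-v2 speaker
                        file_path=f"segment_{i:04d}.wav")
    # The segment files would then be merged and paired with subtitles
    # and a video overlay through FFmpeg, as described above.
```

The guarded block at the bottom is the only part that needs the Coqui package installed; the two helpers just make explicit what the 0.25 s pause setting means at the sample level.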
The Coqui TTS engine is distributed under the Mozilla Public License 2.0 (MPL-2.0), a weak-copyleft open-source license that allows free use, modification, and redistribution of the software, provided that changes to MPL-licensed source files remain publicly available under the same terms. Note, however, that the XTTS-v2 model weights themselves are released under Coqui's separate Public Model License, which restricts commercial use, so the engine code and the voice model carry different terms. This keeps derivative work on the engine part of the open ecosystem while allowing both commercial and non-commercial use of the code itself.