New project: a podcast version of my blog, generated entirely locally. First, an LLM rewrites each blog post into a form better suited to speech, and a second LLM verifies that the first one did not mangle it. That gets the written post into a more listenable form. Then I run text to speech, which I verify via speech to text; if the round-trip output is not good, the parameters are adjusted automatically.
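The round-trip check above can be sketched roughly like this. `synthesize` and `transcribe` stand in for whatever local TTS/STT models are in use (both are hypothetical placeholders, as is the `param_grid` shape); the only concrete part is comparing the transcript back against the source text and retrying with new parameters when the match is poor:

```python
from difflib import SequenceMatcher


def similarity(original: str, transcript: str) -> float:
    """Rough text similarity in [0, 1] after lowercasing and
    collapsing whitespace."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(original), norm(transcript)).ratio()


def verify_tts(text, synthesize, transcribe, param_grid, threshold=0.9):
    """Try TTS parameter sets in order until the STT round-trip
    is close enough to the input text.

    `synthesize(text, **params)` -> audio and `transcribe(audio)` -> str
    are assumed interfaces, not any specific library's API.
    """
    for params in param_grid:
        audio = synthesize(text, **params)
        transcript = transcribe(audio)
        if similarity(text, transcript) >= threshold:
            return audio, params
    raise RuntimeError("no parameter set produced an acceptable transcript")
```

The similarity threshold is a judgment call; a stricter metric (word error rate, for instance) would work the same way in the loop.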
I even had to swap out the phonemizer, which meant recoding the inference model a bit.
The text to speech components are on my GitHub.
The podcast is value4value enabled: if your podcasting app supports it, you can stream sats per minute and send boosts. Boosts are integrated directly into my Podcasting 2.0 dashboard (also on my GitHub).
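For context, value4value support lives in the podcast's RSS feed via the Podcasting 2.0 `podcast:value` tag. A minimal sketch (the node pubkey and split values here are placeholders, not my actual setup):

```xml
<podcast:value type="lightning" method="keysend">
  <!-- hypothetical recipient: replace with a real Lightning node pubkey -->
  <podcast:valueRecipient name="host"
                          type="node"
                          address="02abc...placeholder"
                          split="100" />
</podcast:value>
```

Apps that understand this tag read the recipient list from the feed and route streamed sats and boosts accordingly.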