Do you know of any great open-weights text-to-speech models that handle intonation well? They don't need to clone voices.

I've tried Suno Bark, but it sometimes hallucinates, and I need it to read literally what's written. I also tried F5-TTS, but its intonation is not great and its speed varies a lot: when it reads multiple texts, the speaking rate differs from one generation to the next. Its duration predictor is also unreliable and sometimes causes the audio to be cut off.

Have I missed something?

English-only is fine for now.

This post and comments are published on Nostr.