Do you know of any great open-weights text-to-speech models that handle intonation well? They do not need to clone voices.

I've tried Suno Bark, but it sometimes hallucinates, and I need the reading to be exactly what's written. I also tried F5-TTS, but the intonation is not great and the speaking rate varies a lot, so when it reads multiple texts, the speed of the output speech differs between generations. Its duration predictor is also not great and sometimes causes cutoffs.

Have I missed something?

English only is OK for now.