The latest fine-tune is still hallucinating a lot. Once it has learned out-of-date information or biases, it is hard to make it unlearn them: for example, getting it to stop recommending products that are no longer available or no longer good for the purpose.

It no longer thinks I'm a hockey player, but I did not start INESS, nor have I written a book called Anarchopedia.

The fine-tune is based on migtissera's Synthia-70B. Not great, not terrible. These QLoRA fine-tunes for 12 epochs (learning rate 0.0003) seem to pick up only slight hints of the new data, and a lot of bias from the original model still shows through. I have 5,000 question-answer examples in the training data. Any hints?
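
For reference, here's a minimal sketch of roughly the setup I described, on the peft/trl stack. The epochs and learning rate match what I used; everything else (LoRA rank, target modules, batch size, the `qa_train.jsonl` path and its `text` field) is a placeholder, and the exact `SFTTrainer` kwargs have shifted between trl versions, so treat this as approximate:

```python
# Rough sketch: QLoRA fine-tune of Synthia-70B, 12 epochs at lr 3e-4.
# Dataset path and LoRA hyperparameters are illustrative guesses.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "migtissera/Synthia-70B"

# 4-bit NF4 quantization of the base weights -- the "Q" in QLoRA.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    # Attention projections only here; also targeting the MLP projections
    # (gate_proj/up_proj/down_proj) gives the adapter more leverage to
    # override what the base model "knows".
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# The 5,000 Q-A pairs, one prompt+answer string per record under "text".
dataset = load_dataset("json", data_files="qa_train.jsonl", split="train")

args = TrainingArguments(
    output_dir="synthia-70b-qlora",
    num_train_epochs=12,
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=lora,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()
```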