Anyone found a way to run LLaMA 13B model on Apple Silicon?

PyTorch's MPS backend does not implement every operation yet, and bitsandbytes, which is needed for 8-bit quantized models, depends heavily on CUDA.
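One way to sidestep both issues is to skip 8-bit quantization entirely and load the model in fp16 on MPS, letting unsupported ops fall back to the CPU via `PYTORCH_ENABLE_MPS_FALLBACK`. A minimal sketch, assuming `torch`, a recent `transformers` with LLaMA support, and a local Hugging Face-format checkpoint (the `path/to/llama-13b` path is a placeholder); fp16 weights for 13B need roughly 26 GB of unified memory, so this is untested guesswork for smaller machines:

```python
import os

# Must be set before torch is imported: ops missing from the MPS
# backend then fall back to CPU instead of raising an error.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-13b")
model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-13b",
    torch_dtype=torch.float16,  # fp16 avoids bitsandbytes/CUDA entirely
).to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

This trades memory for compatibility: no 8-bit quantization, but no CUDA dependency either, and the CPU fallback papers over the MPS operator gaps at some speed cost.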