This guy has built a cluster of Raspberry Pis, in a 3D-printed rack, to run a 65B LLaMA chatbot.
"Yeah. I have ChatGPT at home. Not a silly 7b model. A full-on 65B model that runs on my pi cluster, watch how the model gets loaded across the cluster with mmap and does round-robin inferencing 🫡 (10 seconds/token) (sped up 16x)"
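For anyone wondering what "loaded across the cluster with mmap" and "round-robin" might look like, here is a rough Python sketch. It is only one possible reading of the quote, not the author's actual setup: the function names, the node count, and the idea of assigning layers to nodes by index are all assumptions made for illustration. The only numbers taken from the post are 10 seconds/token and the 16x video speed-up.

```python
import mmap


def assign_layers_round_robin(num_layers, num_nodes):
    """Map each layer index to a node index, cycling through the nodes.

    Hypothetical scheme: the post does not say how work is split.
    """
    return {layer: layer % num_nodes for layer in range(num_layers)}


def read_slice_with_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` using a read-only memory map.

    Only the pages that are touched get read from disk, so a node can open
    a large weights file without copying all of it into RAM.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def generation_time(tokens, seconds_per_token=10.0, video_speedup=16.0):
    """Return (real seconds, seconds as seen in the sped-up video)."""
    real = tokens * seconds_per_token
    return real, real / video_speedup
```

At the quoted speed, `generation_time(100)` gives 1000 seconds of real time for a 100-token reply, which is 62.5 seconds in the 16x video.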