This is cool. I have OpenWebUI running against a local ollama instance (with local models) on my Mac, tunneled through holesail to my nginx.
I also run my Venice ollama-like proxy (link below) to access the LLaMA-3.1-405B model, which won't run on my laptop.
And I access it as a PWA from my GrapheneOS phone.
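Under the hood, both backends speak the same ollama-style HTTP API, so OpenWebUI can treat the remote 405B and the local models as interchangeable connections. A minimal sketch of what that looks like outside the UI (the proxy URL, port and model tags are placeholders for my setup, not something to copy verbatim):

```python
# Minimal sketch: both backends expose an ollama-style /api/chat endpoint,
# so the only difference is the base URL and the model tag you point at.
# URLs, ports and the proxy model tag below are placeholders.
import requests

LOCAL_OLLAMA = "http://localhost:11434"   # stock ollama on the Mac
VENICE_PROXY = "http://localhost:9999"    # my ollama-like Venice proxy (placeholder port)

def chat(base_url: str, model: str, prompt: str) -> str:
    resp = requests.post(
        f"{base_url}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

# Remote inference on the big model through the proxy...
print(chat(VENICE_PROXY, "llama-3.1-405b", "Hello from the tunnel"))
# ...and private local inference on the Mac.
print(chat(LOCAL_OLLAMA, "gemma2:27b", "Hello again, fully local this time"))
```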
What you see in the video, then, is:
- I run the first inference on Venice's LLaMA-405B using my project:
You can also get a lifetime Venice Pro account there.
- Then I switch to private local inference with Gemma2-27B, which runs on my Mac
- Then I turn it into a picture using the MidJourney prompt generator:
- (the resulting image is not generated through Open WebUI, only through Venice's FLUX model with the Pro account)
- Then I ask what the eso-level of this conversation is with my Eso Level Estimator 8000 (a rough sketch of how such presets work follows this list):
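Mechanically, the MidJourney prompt generator and the Eso Level Estimator 8000 boil down to a system prompt layered on top of a model. A rough sketch of that idea against the raw ollama API, assuming plain system-prompt presets (the prompts and model tag below are illustrative, not the real ones):

```python
# Rough sketch: a "prompt generator" / "estimator" preset is just a system
# prompt prepended to the chat. System prompts and model tag are illustrative.
import requests

def preset_chat(model: str, system: str, prompt: str,
                base_url: str = "http://localhost:11434") -> str:
    resp = requests.post(
        f"{base_url}/api/chat",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

conversation = "…summary of the conversation so far…"

image_prompt = preset_chat(
    "gemma2:27b",
    "Turn the user's text into a single MidJourney-style image prompt.",
    conversation,
)
eso_level = preset_chat(
    "gemma2:27b",
    "Rate how esoteric the user's text is on a scale of 1 to 8000 and say why.",
    conversation,
)
print(image_prompt, eso_level, sep="\n")
```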
The future is now.