
11 Deploying an LLM on a Raspberry Pi: How low can you go?
This chapter covers
- Setting up a Raspberry Pi server on your local network
- Converting and quantizing a model to GGUF format
- Serving your model as a drop-in replacement for the OpenAI GPT model
- What to do next and how to make it better
The bitterness of poor quality remains long after the sweetness of low price is forgotten.
Welcome to one of our favorite projects: serving an LLM on a device smaller than it should ever be served on. This project pushes the technology to its edge, and following along will let you flex everything you've learned in this book. We'll deploy an LLM to a Raspberry Pi and set it up as an LLM service you can query from any device on your home network. For all the hackers out there, this exercise should open the door to many home projects. For everyone else, it's a chance to solidify your understanding of the limitations of LLMs and to appreciate the community that has made this possible.
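To make the "drop-in replacement" idea concrete, here is a minimal sketch of what querying the finished service could look like from another machine on your network, using the official openai Python client pointed at the Pi instead of OpenAI's servers. The address, port, and model name below are placeholders; yours will depend on how you set up the server later in the chapter.

```python
# A minimal sketch, assuming the Pi exposes an OpenAI-compatible API.
# The base_url, api_key, and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://raspberrypi.local:8000/v1",  # hypothetical Pi address and port
    api_key="not-needed",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; many local servers ignore this field
    messages=[{"role": "user", "content": "Hello from across the network!"}],
)
print(response.choices[0].message.content)
```

Because the request shape matches OpenAI's chat completions API, existing code written against GPT models can be repointed at the Pi by changing little more than the base URL.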