
A Primer on Running AI Models on Your Own Computer

Author: Zeid Bsaibes

[Hero image: a group of llamas in a rocky environment]

Should I Read This?

Large Language Models (LLMs) are super accessible online: just head over to ChatGPT or Claude et al. and get cracking with their web-based chatbot interfaces. If you're on a mobile device you can even get the native apps from the App Store or Google Play Store. For those who are a bit more technically minded, you can access them programmatically via their APIs, for which almost all companies provide really good documentation. However, you can also run some of these models directly on your own computer, and there are very good reasons why you might want to:

  • Privacy and data security - All data stays on your machine, potentially preferable for sensitive information and private projects.
  • Cost efficiency - Skip subscription fees and per-token charges from cloud-based AI services.
  • Lower latency - Get faster responses without server round-trips.
  • Customisation and fine-tuning - More control over model parameters and the ability to tailor models to your needs.
  • Learning and experimentation - Gain insights into LLMs through hands-on interaction.
  • Integration with local workflows - Connect with your local development tools and environments.
  • No internet dependency - Use AI capabilities offline.
  • For the Flex - Be able to say: "Oh, you're using ChatGPT? That's cute. I run my own private AI that I've trained to really get my unique brand of humour."

Whatever your reasons, it's fairly straightforward, and we're going to walk through how below. Many readers will no doubt have opinions on other approaches and might even think this approach isn't very good. Since I don't have comments enabled on my blog and I don't really use social media, we'll never know…

A Quick Comment on Model Size

Everything we're going to do below can be accomplished on a decent laptop or PC; however, some LLMs have full-fat heavyweight versions that are almost impossible to run on any home setup. This is because the biggest variants of the models are massive in size and demand enormous computational power to run. For example, DeepSeek's deepseek-r1 671-billion-parameter model weighs a whopping 404 GB, and you'd need something like 16-20 GPUs amounting to over 1,000 GB of VRAM to get it moving in any sort of reasonable timeframe.

If all of this sounds like technobabble to you (which it probably does, since you're here reading an intro article on AI) then don't worry: we're going to use smaller models in this walkthrough, which should run fine on ordinary hardware. The downside is that we won't be dealing with the most capable models, but we should nonetheless be able to accomplish some cool stuff.

Ollama and the Obsession with Llamas

If you've done any reading online about LLMs, you'll have noticed the internet's obsession with pairing LLMs and llamas. I don't know why, but I guess they serve as a nice cutesy distraction from what is otherwise a topic that is, for good reason, creating a lot of technoanxiety.

Anyway, the key tool we will be using today is called Ollama. It is essentially software that creates a container on your computer that lets LLMs run without much fuss. It provides a way to talk to these LLMs through your command line and a REST API. It also has some very simple functions which allow us to "pull" models down from the cloud and then make some modifications to how those models work. Anyway, let's get started.

Prerequisites: I am using a Mac and have VS Code. You can follow along on Windows or Linux, but I'm not providing separate steps for each since they are so similar and available elsewhere online. You will also need about 20 GB of free space on your hard drive.

You might not have a code editor (VS Code is my preference); a quick Google will help you install one, but if you'd rather not, you can still go through most of this walkthrough, just without some of the tweaks to the models at the end.

Installing Ollama

Note: these instructions were accurate as of April 2025.

  1. Go to Ollama —> Download
  2. Run the installer and you will be presented with a pop-up and an adorable llama logo (I don't want to spoil the surprise, so I've not included an image) —> Next
  3. You will be asked to install the command-line tools; these let your computer know what instructions can be used to control Ollama —> Install, then enter your password
  4. It will then suggest a command (e.g. ollama run llama3.2), which is very helpful, but we're not going to run it now —> Finish
  5. All you will have to show for your efforts is a small llama-head icon in your status bar, telling you that the Ollama container is running on your machine (you can sanity-check the install from the terminal, as shown below).
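If you want to confirm the command-line tools installed correctly, you can ask Ollama for its version from any terminal; any version number coming back means you're in business:

ollama --version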

Using Ollama in the Terminal

Since Ollama is pretty bare-bones, we don't get a nice graphical interface like we would with most software, or if we used ChatGPT's chatbot. This isn't a problem, because we are going to use Terminal and feel like proper computer whizzes (flex).

On a Mac, either:

  1. Open Spotlight and search "Terminal"
  2. Finder > Applications > Utilities > Terminal

Once you've got Terminal open it will look something like this, and you're going to type:

ollama -h

[GIF: the Terminal process for running Ollama with the DeepSeek-R1 model]

The ollama bit tells Terminal that you are interacting with Ollama, and the -h bit is a flag: it tells Ollama to help you by listing all the available commands you can give it from Terminal. Now we have a list of things we can do. Great!
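For reference, here are the subcommands from that list we'll lean on during this walkthrough (the angle brackets are placeholders for a model name):

ollama pull <model>   # download a model without starting a chat
ollama run <model>    # download (if needed) and chat with a model
ollama list           # list the models installed locally
ollama rm <model>     # delete a locally installed model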

Pulling a Model Down Onto Our Computer

Right now the container we talked about is empty; it has no LLMs in it because we haven't downloaded any from Ollama's servers yet. Let's fix that. Since there's been lots in the news about DeepSeek and its impact on the dominance of the existing big AI companies, let's have a play with that.

Head over to the Ollama website in your browser and go to the Models section. Search for "DeepSeek" and then select deepseek-r1. You should see a screen similar to the one below:

[Screenshot: the deepseek-r1 model page on Ollama's website]

Make sure you choose the relatively slim "7b" variant, for the reasons we discussed at the beginning of this piece, and copy the snippet that says ollama run deepseek-r1.

Go back to your Terminal, paste in this command, and you're going to see a bunch of stuff happen. Don't worry: this is Ollama downloading the 4.7 GB model from Ollama's servers and loading it into your Ollama container. Now we're ready for the fun stuff.
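A small aside: the bare command grabs the model's default tag. If you want to be explicit that you're getting the 7b variant, you can name the tag yourself; something like this should do the same job:

ollama pull deepseek-r1:7b   # just download the 7b variant
ollama run deepseek-r1:7b    # download it (if needed) and start chatting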

Interacting with the Model Locally

If all of the above has gone smoothly, you will be presented with something that looks like the below. The run bit of the command tells Ollama to run deepseek-r1 and, if the model isn't installed locally yet, to go ahead and download it first and then fire it up.

[Screenshot: macOS Terminal running DeepSeek-R1 via Ollama]

So here we are: we have deepseek-r1 running on our local machine. Yank your router out of the wall and throw it in the bin, you don't need the internet anymore…

Ok, so the terminal is pretty self-explanatory now: Ollama is prompting you to "Send a message". Let's do it.

When it says "send", it means send a message to the container where Ollama is running, not send anything over the internet.

As you can see below, I've somewhat passive-aggressively started my conversation by asking:

PROMPT: “so what even are you bro?”

Breaking Down the Interaction

An interesting feature of deepseek-r1 is the fact that its response contains a <think> tag. This sort of looks like HTML, but it isn't: it is the model showing you its "thought process" before actually responding to your prompt. This is called a chain-of-thought (or "reasoning") approach, which helps LLMs work out the best way to process your prompt. It also lets us see under the hood a bit and understand how we might best prompt LLMs to get the best responses back.

You can have a play with various questions and see how well the model responds. If you are a bit of a dweeb like me, you might have some indicators on your computer showing how much memory and CPU are being used; when you ask a question, the memory usage might spike. If you want further proof that this model is running locally, try disconnecting from the internet and asking it more questions. It still works!

If you ask DeepSeek a particularly crazy maths question like:

Prompt: "what is 123456789 divided by 1234?"

DeepSeek will have a proper freakout, which you can watch between the <think> tags. LLMs by themselves are poor at maths (the correct answer, for the record, is about 100,046.02). What they tend to do nowadays is act as an agent: go out and connect to a non-AI tool like a calculator to do the calculation. Since our local LLM isn't connected to a calculator, it tries to do it by itself and fails horribly.

If you tried the maths prompt above and caused your model to freak out, press Ctrl + C (on Mac or Windows) to stop it. You might then have to start the model again with ollama run deepseek-r1.
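Incidentally, you don't have to use the interactive prompt at all: the CLI also accepts a one-shot prompt as an argument and prints a single answer, which is handy for scripting. A minimal sketch:

ollama run deepseek-r1 "summarise what a llama is in one sentence"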

Working with Local LLMs

So this is pretty cool, and it's only the tip of the iceberg. Now that you have a model running locally, you can ask it all sorts of questions privately and without paying anything. That's all well and good, but most of us aren't paying for ChatGPT and aren't that bothered about the information we share. The exciting stuff comes as we start to dig into what we can do with the models: everything you've just done through the chat prompt is also available programmatically over Ollama's local REST API (sketched below), and there's more in the sections that follow.
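As a taste, here's roughly what a request to Ollama's documented /api/generate endpoint looks like; it listens on port 11434 by default, and "stream": false asks for one complete JSON response rather than a token stream:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "so what even are you bro?",
  "stream": false
}'

This is what you'd build against if you wanted to wire a local model into your own scripts or applications.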

Wrapping Up at This Point

If you're happy with where we are in terms of playing with models locally, feel free to stop here. Before you do, you might want to free up some space on your computer by removing the model (and any others you might have pulled down).

To do this, we first need to exit the Ollama model prompt within Terminal using the very friendly command: /bye

We can then remind ourselves of the available Ollama commands by typing: ollama -h

What we want to do is remove the 4.7GB DeepSeek model so we are interested in the rm command.

A handy check of how to use this command (ollama rm -h) tells us that we just need to drop in the model name after the command to remove it:

So run: ollama rm deepseek-r1 to remove the model

If you want to confirm it’s gone just try ollama list to see all the models on your machine - if you’ve followed this tutorial then the list should be empty.

If (for example) you ran ollama run llama3.2 at the beginning, as suggested during the Ollama installation, you will see llama3.2 in your list. Remove this as well if you're done.
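Putting it all together, the full clean-up is only a few commands (the first is typed at the model's chat prompt, the rest in your shell):

/bye
ollama list
ollama rm deepseek-r1
ollama rm llama3.2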

You can now close your terminal, quit Ollama from your menu bar, and uninstall Ollama as you normally would on your OS. Bye bye, llama! Tell all your friends you ran your own AI model the next time they ask if you're happy with the life you lead.

Some More Advanced Stuff

If you're sticking around, we can get into some more exciting stuff. We will need VS Code, but if you don't have it or can't be bothered, just read along. I'm not going into detail on the VS Code bits because ChatGPT/Google can help you out here, and it's pretty simple anyway:

  • Create a new folder (call it ollama_playground or whatever) on your computer and open it within VS Code (dragging it in is the easiest).
  • In your VS Code open a terminal (it has a handy terminal built in): Terminal > New Terminal
  • Check your current models with ollama list and you should see deepseek-r1 is still there (if not, follow the earlier steps to pull it down again)
  • If for some reason DeepSeek is already running, use the command /bye to exit it.

Playing with Vision Models

Let's try out another model. Head back over to the Ollama website and search for "llava". You will hopefully find something that looks like the below; sexy description notwithstanding, it can do some cool stuff.

[Screenshot: the llava model page on Ollama's website]

Download the model to your local machine by running ollama run llava and let's have some fun. If you're running low on space, remove the other models first.

Now, go to the top of this blog article and save the hero image of the group of llamas into the ollama_playground folder we just created, naming it image-1.jpg so the commands below work.

Now back in your terminal try out this prompt:

Prompt: “describe the image ./image-1.jpg to me”

and watch the magic happen:

[GIF: VS Code walkthrough of llava running image recognition via Ollama]

So, in case you can't quite read it, the model responded (in my case at least; yours will be slightly different):

Response: "In the image, a group of four sheep is seen traversing a barren desert landscape under a clear blue sky. The sheep are in motion, moving away from us towards the left side of the frame. They are scattered across the foreground, with one standing out prominently in the middle of the group. In the distance, a mountain range stretches across the horizon, its peaks subtly visible against the azure sky. The mountains provide a stark contrast to the otherwise flat landscape. The vegetation in the field is sparse, with only a few shrubs dotting the terrain. The colors are muted, reflecting the harshness of the desert environment. Overall, the image captures a serene moment of these four sheep navigating their way across this expansive desert landscape under clear skies."

Quite poetic and not entirely accurate, but you get the idea. Llamas and sheep do look pretty similar, and the llamas might well be having what they would consider a "serene" moment. I then asked:

Prompt: “Do you think they could be llamas not sheep and where do llamas live?”

Response: Yes, it is possible that the animals in the image are llamas instead of sheep. Llamas and sheep have some similarities in appearance but are different species with distinct characteristics. Llamas are native to South America, specifically the Andean region which includes countries like Peru, Bolivia, Ecuador, and parts of Chile and Argentina. They are well-adapted to high altitude environments and are often found grazing in mountainous areas or in the grassy plains at lower altitudes. Their wool is used for various purposes, including producing textiles. Due to their unique appearance and adaptability to challenging environments, llamas have become a popular attraction in many tourist destinations within their native range.

I hope you're starting to see why running AI locally is quite exciting. It's not hard to see how a developer could quite quickly build an application to process a whole batch of images, or automate workflows involving audio, text and all sorts of other content.

By putting these models in a pipeline (e.g. a RAG model <<<link to RAG>>>) you can start to do quite powerful things without ever relying on external services. You could even deploy entire AI applications in the cloud by running Ollama ring-fenced on your own server. You can guarantee data privacy, manage costs and use a model that is really tailored to your particular use case.
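For the server scenario, the rough shape (assuming Ollama's OLLAMA_HOST environment variable still behaves this way) is to bind the API to an address your application can reach, and to put your own authentication and firewalling in front of it:

OLLAMA_HOST=0.0.0.0:11434 ollama serve   # expose the API beyond localhost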

Customising Models for Your Needs

So far we have just pulled the vanilla models down from Ollama with their base configuration. We can do much more than this: we can actually create our own models with bespoke settings that fit our use case. Let's see how, by creating a model that identifies whether an image contains a human or not.

The Modelfile

If you've been following along, you will have VS Code open with its terminal and some images in a folder. Now let's create a text file in the same folder called Modelfile, just like that, with no file extension.

In this Modelfile paste the following code:

FROM llama3.2-vision:11b

SYSTEM """

Your job is to describe an image and assess whether there are humans in the image.
I am showing you an image.
It might contain a human it might not.
Follow the below steps to perform your job

Step 1: Confirm that you have received an image, if you have not SAY: NO IMAGE and ignore Steps 2 and 3
Step 2: Consider to yourself what you see in the image, take a moment to really understand the image
Step 3: Now respond in the following format AND ONLY THE FOLLOWING FORMAT

DESCRIPTION <20 WORD DESCRIPTION OF IMAGE>
HUMAN: 'YES HUMANS' OR 'NO HUMANS'
"""

PARAMETER temperature 0.0

Let’s go through this briefly:

  1. FROM tells Ollama which model to pull from its cloud repository. In this instance we are asking for llama3.2-vision:11b, which is Meta's vision model with 11 billion parameters.
  2. SYSTEM is a system prompt which tells the model the underlying rules it should follow when providing an answer:
    1. It tells the model that its role is an image classifier for detecting humans
    2. It walks the model through how it is meant to process its task
    3. It prescribes what answer format the model is allowed to give
    4. It attempts to curb the model's tendency to give longer, more descriptive answers
  3. PARAMETER temperature is complicated, but in short it determines how "adventurous" the model will be. Lower temperatures choose the most probable answers; higher temperatures mean the model may choose lower-probability options and be more "creative". In this instance we want no creativity: just do exactly as instructed. (It isn't the only setting we can tune, as sketched below.)
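For completeness, the Modelfile format supports a handful of other PARAMETER lines; the values here are illustrative rather than recommendations:

PARAMETER num_ctx 4096   # size of the context window, in tokens
PARAMETER top_p 0.9      # nucleus-sampling cutoff; lower values are more conservative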

Ok, so now we can build our own model with this Modelfile. Exit any currently running model with /bye and run the following command:

ollama create human-detector -f Modelfile

We've now created our own model called human-detector, based on Meta's Llama 3.2 (no relation to Ollama 🦙), which follows the build instructions defined in our Modelfile.

To run this model:

ollama run human-detector

Beware: depending on the specifications of your computer, this might take a while…
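Once it's built, if you want to double-check what Ollama actually baked into your custom model, ollama show can (in recent versions at least) print a model's Modelfile back out:

ollama show human-detector --modelfile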

Now we can play. I've pasted directly from my terminal for ease here; I tested a bunch of images sequentially and manually, which I've included below and linked at the end <<<add anchor link>>>. You can decide for yourself how well the model performed.

>>> ./image-1.jpg
Added image './image-1.jpg'
The image depicts a group of alpacas standing in a field.
The alpacas are white with long necks and legs, and they have distinctive
banana-shaped ears. They are standing in a green field with tall grasses and
wildflowers.

In the background, there are rolling hills and mountains that stretch as far
as the eye can see. The sky above is clear and blue, with no clouds visible.

The overall atmosphere of the image is one of serenity and peacefulness,
with the alpacas appearing to be content and relaxed in their natural habitat.

HUMAN: NO HUMANS

>>> ./image-2.jpg
Added image './image-2.jpg'
The image depicts a dimly lit, underground tunnel or passageway.
The walls are made of rough-hewn stone blocks, and the floor is composed
of worn, gray concrete. A faint light emanates from the far
end of the tunnel, casting long shadows across the ground.

In the foreground, there is a small, metal sign attached to the wall,
bearing an illegible message or warning. The overall atmosphere of the
image is one of mystery and intrigue, with the dim
lighting and eerie shadows creating a sense of foreboding.

HUMAN: NO HUMANS

>>> ./image-3.jpg
Added image './image-3.jpg'
The image depicts a young boy riding on the back of a horse, dressed
in traditional Mongolian clothing. The boy is wearing a long-sleeved
shirt and pants made of thick, woolen fabric, and he has a
fur-lined hat on his head. He is holding onto the horse's
mane with one hand, while the other hand is raised in a gesture
of excitement or celebration.

The horse is brown with a white blaze on its forehead, and it appears
to be well-groomed and healthy. The background of the image is blurry,
but it appears to be a rural landscape with rolling hills
and trees in the distance.

Overall, the image conveys a sense of joy and freedom, as the boy
rides through the countryside on his horse, enjoying the fresh air
and scenery.

HUMAN: YES HUMANS

>>> ./image-4.jpg
Added image './image-4.jpg'
The image depicts a serene and peaceful scene of a person standing
in front of a large window, gazing out at a breathtaking view. The
person is dressed in casual attire, with their hair pulled back
and a relaxed posture, conveying a sense of calmness and contentment.

The window behind them is floor-to-ceiling, allowing an unobstructed
view of the surrounding landscape. The sky outside is a brilliant blue,
with only a few wispy clouds scattered across it. In the distance,
there are rolling hills and mountains that stretch as far as the eye can see.

The overall atmosphere of the image is one of tranquility and relaxation,
inviting the viewer to step into the peaceful world depicted within.

HUMAN: YES HUMANS
[Image montage: the four test images used above: a herd of llamas, a shadowy staircase, a falcon hunter on horseback, and a silhouetted person walking]

The responses are far from perfect, but it's pretty impressive how accurately a model running on our own computer (not in a massive datacenter) was able to describe quite a range of different images. I chose the bottom-right picture of the silhouetted person to really push the model, and it didn't do too bad a job. In all cases it correctly classified whether or not a human was in the image, and it followed our instructions closely when responding.

It did ignore our instruction limiting the description to 20 words, and it did hallucinate certain features within the images, especially the silhouetted one. Having said that, this whole exercise was only about 20 minutes' work, mainly spent tweaking the Modelfile SYSTEM prompt to improve results.

Conclusion

Hopefully you've seen how easy it can be to get up and running with AI models on your local machine, and you can imagine that with a little work and extension you could realistically build these into bespoke production applications. These models are only going to get better, and with the arrival of other tools that create pipelines to help them perform the tasks we ask of them, they are only going to get more performant and versatile.

Cleaning Up

Earlier on I described how to remove these models; for ease:

  • /bye to exit the model prompt
  • ollama list to see the models on your machine
  • ollama rm <modelname> to remove each one
  • Quit and uninstall Ollama if you want

Image Credits

https://unsplash.com/photos/a-silhouette-dances-gracefully-on-a-reflective-floor-WMvQlDWt0Ds

https://unsplash.com/photos/a-mongolian-eagle-hunter-on-horseback-Jdhacfg3iW4

https://unsplash.com/photos/stairs-lead-to-light-in-a-dark-subway-L6iZmGhCss0

https://unsplash.com/photos/a-group-of-llamas-walking-in-the-desert-Yp7eEcfuSR8

Resources

https://github.com/ZeidBsaibes/ollama-playground

https://www.youtube.com/watch?v=GWB9ApTPTv4&t=3736s&ab_channel=freeCodeCamp.org

https://ollama.com/

https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/