Should I Read This?
Large Language Models (LLMs) are extremely accessible online: you just head over to ChatGPT, Claude, et al. and get cracking with their web-based chatbot interfaces. If you’re on a mobile device you can even get the native apps from the App Store or Google Play Store. For those who are a bit more technically minded, you can access them programmatically via their APIs, for which almost all of the companies provide really good documentation. However, you can also run some of these models directly on your own computer, and there are very good reasons why you might want to do this:
- Privacy and data security - All data stays on your machine, potentially preferable for sensitive information and private projects.
- Cost efficiency - Skip subscription fees and per-token charges from cloud-based AI services.
- Lower latency - Get faster responses without server round-trips.
- Customisation and fine-tuning - More control over model parameters and the ability to tailor models to your needs.
- Learning and experimentation - Gain insights into LLMs through hands-on interaction.
- Integration with local workflows - Connect with your local development tools and environments.
- No internet dependency - Use AI capabilities offline.
- For the Flex - Be able to say: "Oh, you're using ChatGPT? That's cute. I run my own private AI that I've trained to really get my unique brand of humour."
Whatever your reasons, it’s fairly straightforward, and we’re going to walk through how below. Many readers will no doubt have opinions on other approaches and might even think this approach isn’t very good. Since I don’t have comments enabled on my blog and I don’t really use social media, we’ll never know…
A Quick Comment on Model Size
Everything we’re going to do below can be accomplished with a decent laptop or PC; however, some LLMs have full-fat heavyweight versions that are almost impossible to run on any home setup. The biggest variants of these models are massive and demand enormous computational power. For example, DeepSeek’s deepseek-r1 671-billion-parameter model weighs a whopping 404 GB, and you’d need something like 16-20 GPUs amounting to over 1,000 GB of VRAM to get it moving in any sort of reasonable timeframe.
If all of this sounds like technobabble to you (which it probably does, since you’re here reading an intro article on AI) then don’t worry: we’re going to use smaller models in this walkthrough, which should run comfortably on a normal machine. The downside is that we won’t be dealing with the most capable models, but we should nonetheless be able to accomplish some cool stuff.
Ollama and the Obsession with Llamas
If you’ve done any reading online about LLMs you’ll have noticed the internet’s obsession with pairing LLMs and llamas. I don’t know why, but I guess they serve as a nice cutesy distraction from what is otherwise a topic that is, for good reason, creating a lot of technoanxiety.
Anyway, the key tool we will be using today is called Ollama. It is essentially software that creates a container on your computer that lets LLMs run without much fuss. It provides a way to talk to these LLMs through your command line and through a REST API. It also has some very simple functions which allow us to “pull” models down from the cloud and then modify how those models work. Let’s get started.
Prerequisites: I am using a Mac and have VS Code. You can follow along on Windows or Linux, but I’m not providing separate steps for each as they are very similar and available elsewhere online. You will also need about 20 GB of free space on your hard drive.
You might not have a code editor (VS Code is my preference); a quick Google will help, but if you’d rather not install one you can still go through most of this walkthrough, just without some of the tweaks to the models at the end.
Installing Ollama
Note: these instructions were accurate as of April 2025.
- Go to Ollama —> Download
- Run the installer and you will be presented with a pop-up and an adorable llama logo (I don’t want to spoil the surprise so I’ve not included an image) —> Next
- You will be asked to install the command line tools; these let your computer know which instructions can be used to control Ollama —> Install and enter your password
- It will then provide a suggested command (e.g. ollama run llama3.2) which is very helpful, but we’re not going to run it now —> Finish
- All you will have to show for your efforts now is a small llama head icon in your menu bar, showing that the Ollama container is running on your machine (there’s a quick sanity check for this below).
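Once you have a terminal open (covered in the next section), a quick sanity check that the install worked is to ask Ollama for its version number; the exact number will differ on your machine:

ollama --version
# should print the installed version, something like "ollama version is 0.x.x"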
Using Ollama in the Terminal
Since Ollama is pretty bare-bones we don’t get a nice graphical interface like we would with most software, or with ChatGPT’s chatbot. This isn’t a problem, because we are going to use the Terminal and feel like a proper computer whiz (flex).
On a Mac, either:
- Open your Spotlight and Search “Terminal”
- Finder > Applications > Utilities > Terminal
Once you’ve got terminal open it will look something like this and you’re going to type:
ollama -h

The ollama bit is telling Terminal that you are interacting with Ollama, and the -h bit is a flag. This flag tells Ollama to help you by listing all the available commands you can give it from the Terminal. Now we have a list of things we can do, great!
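For reference, these are the handful of commands from that list that we’ll actually lean on in this walkthrough (each also accepts -h for its own help text):

# download a model from Ollama's servers without starting a chat
ollama pull <model>
# download (if needed) and start an interactive chat with a model
ollama run <model>
# show which models are currently on your machine
ollama list
# delete a model to free up disk space
ollama rm <model>
# build a customised model from a Modelfile (we'll use this later)
ollama create <name> -f Modelfile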
Pulling a Model Down Onto Our Computer
Right now the container we talked about is empty: it has no LLMs in it because we haven’t downloaded any from Ollama’s servers yet. Let’s fix that. Since there’s been lots in the news about DeepSeek and its impact on the dominance of the existing big AI companies, let’s have a play with that.
Head over to the Ollama website in your browser and go to the Models section. Search for “DeepSeek” and then select deepseek-r1. You should see a screen similar to the below:

Make sure you choose the relatively slim “7b” variant, for the reasons we discussed at the beginning of this piece, and copy the snippet that says ollama run deepseek-r1.
Go back to your Terminal, paste in this command, and you’re going to see a bunch of stuff happen. Don’t worry: this is Ollama downloading the 4.7 GB model from Ollama’s servers and loading it into your Ollama container. Now we’re ready for the fun stuff.
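If you’d rather download the model without immediately dropping into a chat session, you can pull it explicitly and then check that it landed; the 7b tag below matches the variant chosen on the model page:

# fetch just the 7-billion-parameter variant (roughly 4.7 GB)
ollama pull deepseek-r1:7b
# confirm it now appears in your local model list
ollama list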
Interacting with the Model Locally
If all of the above has gone smoothly you will be presented with something that looks like the below. What the run bit of the command has done is tell Ollama to run deepseek-r1 and, if it’s not installed locally yet, to go ahead and download it first and then fire it up.

So here we are: we have deepseek-r1 running on our local machine. Yank your router out of the wall and throw it in the bin, you don’t need the internet anymore.
OK, so the Terminal is pretty self-explanatory now: Ollama is telling you what to do (“Send a message”). Let’s do it.
When it says “send”, it means send a message to the container where Ollama is running, not send it over the internet.
As you can see below, I’ve somewhat passive-aggressively started my conversation and asked it:
PROMPT: “so what even are you bro?”

Breaking Down the Interaction
An interesting feature of deepseek-r1 is the fact that its response contains a <think> tag. This sort of looks like HTML but it isn’t. What it’s doing here is showing you the model’s “thought process” prior to actually responding to your prompt. This is a “chain-of-thought” reasoning approach, which helps LLMs work out the best way to process your prompt. It lets us see under the hood a bit and understand how we might best prompt LLMs to get the best responses back.
You can have a play with various questions and see how well the model responds. If you are a bit of a dweeb like me you might have some indicators on your computer showing how much memory and CPU is being used. When you ask a question the memory usage might spike. If you want further proof this model is running locally try disconnecting from the internet and asking it more questions. It still works!
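Because Ollama also exposes a local REST API (it listens on port 11434 by default), you don’t even need the chat prompt: from a second Terminal window you can send the same model a request with curl. A minimal sketch, assuming deepseek-r1 is still installed:

# ask the locally running model a question over the local API
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "In one sentence, what are you?",
  "stream": false
}'

Nothing in that request leaves your machine; it only travels to the Ollama process running behind the llama icon in your menu bar.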
If you ask DeepSeek a particularly crazy maths question like:
Prompt:“what is 123456789 divided by 1234?”
DeepSeek will have a proper freakout, which you can watch unfold between the <think> tags. LLMs by themselves are poor at maths; what they tend to do nowadays is act as an agent and call out to a non-AI tool, like a calculator, to do the calculation. Since our local LLM isn’t connected to a calculator it tries to do the sum itself and fails horribly.
If the maths prompt above caused your model to freak out, press Ctrl + C (Mac or Windows) to stop it. You might then have to start the model again with ollama run deepseek-r1.
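For reference, this is exactly the sort of job you’d hand off to a boring non-AI tool, which is what an agent with a calculator would do. The built-in bc calculator (present on macOS and most Linux systems) gets the real answer instantly:

# the answer DeepSeek struggles to reach: roughly 100046.02
echo "scale=4; 123456789 / 1234" | bc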
Working with Local LLMs
So this is pretty cool, and it is only the tip of the iceberg. What is exciting is that now you have a model running locally you can ask it all sorts of questions privately and without paying anything. This is all well and good, but most of us aren’t paying for ChatGPT and aren’t that bothered about the information we share. The exciting stuff comes as we dig into what more we can do with the models; see the next section below.
Wrapping Up at This Point
If you’re happy with where we are in terms of playing with models locally, feel free to stop here. Before you do, you might want to free up some space on your computer by removing the model (and any others you might have pulled down).
To do this, first exit the Ollama model prompt within the Terminal using the very friendly command: /bye
We can then remind ourselves of the available Ollama commands by typing: ollama -h
What we want to do is remove the 4.7 GB DeepSeek model, so we are interested in the rm command.
A handy check of how to use this command (ollama rm -h) tells us that we just need to drop the model name in after the command to remove it.
So run ollama rm deepseek-r1 to remove the model.
If you want to confirm it’s gone, try ollama list to see all the models on your machine; if you’ve followed this tutorial the list should now be empty.
If (for example) you ran ollama run llama3.2 at the beginning, as suggested during the Ollama installation, you will see llama3.2 in your list. Remove this as well if you’re done.
You can now close your Terminal, quit Ollama from your menu bar, and uninstall Ollama as you normally would on your OS. Bye bye, llama! Tell all your friends you ran your own AI model the next time they ask you if you’re happy with the life you lead.
Some More Advanced Stuff
If you’re sticking around for a bit, we can do some more exciting stuff. We will need to open VS Code, but if you don’t have it or can’t be bothered then just read along. I’m not going into detail on the VS Code bits because ChatGPT/Google can help you out here, and it’s pretty simple too:
- Create a new folder (call it ollama_playground or whatever) on your computer and open it within VS Code (dragging it in is the easiest; or create it from the terminal, as sketched after this list).
- In VS Code, open a terminal (it has a handy one built in): Terminal > New Terminal
- Check your current models with ollama list and you should see deepseek-r1 is still there (if not, follow the prior steps to pull it down again).
- If for some reason DeepSeek is already running, use the command /bye to exit it.
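If you prefer to do that setup from the terminal rather than clicking around, something like the following works; it assumes VS Code’s optional code command has been added to your PATH, so skip that line if it hasn’t:

# make the playground folder and move into it
mkdir ollama_playground
cd ollama_playground
# open the folder in VS Code (or just drag it in instead)
code .
# check which models are still installed locally
ollama list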
Playing with Vision Models
Let’s try out another model. Head back over to the Ollama website and search for “llava”. You will hopefully find something that looks like the below; sexy description notwithstanding, it can do some cool stuff.
Download the model onto your local machine by running ollama run llava and let’s have some fun. If you’re running low on space, remove other models first.
Now, go to the top of this blog article and save the hero image of a group of llamas into the ollama_playground folder we’ve just created. Save it as image-1.jpg so the below prompt works.
Now back in your terminal try out this prompt:
Prompt: “describe the image ./image-1.jpg to me”
and watch the magic happen:

So if you can’t quite read it, the model responded (in my case at least - it will be slightly different for you):
Response: “In the image, a group of four sheep is seen traversing a barren desert landscape under a clear blue sky. The sheep are in motion, moving away from us towards the left side of the frame. They are scattered across the foreground, with one standing out prominently in the middle of the group. In the distance, a mountain range stretches across the horizon, its peaks subtly visible against the azure sky. The mountains provide a stark contrast to the otherwise flat landscape. The vegetation in the field is sparse, with only a few shrubs dotting the terrain. The colors are muted, reflecting the harshness of the desert environment. Overall, the image captures a serene moment of these four sheep navigating their way across this expansive desert landscape under clear skies.”
Quite poetic and not entirely accurate, but you get the idea. Llamas and sheep do look pretty similar, and the llamas might well be having what they would consider a “serene” moment. I then asked:
Prompt: “Do you think they could be llamas not sheep and where do llamas live?”
Response: Yes, it is possible that the animals in the image are llamas instead of sheep. Llamas and sheep have some similarities in appearance but are different species with distinct characteristics. Llamas are native to South America, specifically the Andean region which includes countries like Peru, Bolivia, Ecuador, and parts of Chile and Argentina. They are well-adapted to high altitude environments and are often found grazing in mountainous areas or in the grassy plains at lower altitudes. Their wool is used for various purposes, including producing textiles. Due to their unique appearance and adaptability to challenging environments, llamas have become a popular attraction in many tourist destinations within their native range.
I hope you’re starting to see why running AI locally is quite exciting. It’s not hard to see how a developer could quite quickly build an application that processes a whole bunch of images, or automates workflows involving audio, text and all sorts of other content.
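To give a flavour of how that might look, llava can also be driven through Ollama’s local REST API, which accepts base64-encoded images alongside the prompt. A rough sketch, assuming you’re in the ollama_playground folder on macOS (on Linux, swap base64 -i image-1.jpg for base64 -w 0 image-1.jpg):

# encode the image and send it, with a question, to the locally running llava model
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Are there any llamas in this image?\",
  \"stream\": false,
  \"images\": [\"$(base64 -i image-1.jpg)\"]
}"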
By putting these models in a pipeline (e.g. a RAG pipeline <<<link to RAG>>>) you can start to do quite powerful things without ever having to rely on external services. You could even deploy entire AI applications in the cloud by running Ollama ring-fenced on your own server. You can guarantee data privacy, manage costs and use a model that is really tailored to your particular use case.
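As a very rough sketch of that last idea: the Ollama server decides which address it listens on via the OLLAMA_HOST environment variable, so on your own server you could expose the API to your application (the server address below is a placeholder, and you’d still want proper network controls in front of anything like this):

# on the server: make the Ollama API listen on all interfaces, not just localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# from your application machine: list that server's models over its API
curl http://your-server-address:11434/api/tags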
Customising Models for Your Needs
So far we have just pulled vanilla models down from Ollama and used their base configuration. We can do much more than this: we can actually create our own models with bespoke settings that fit our use case. Let’s see how, by creating a model that identifies whether or not an image contains a human.
The Modelfile
If you’ve been following along you will have VS Code open with its terminal and some images in a folder. Now let’s create a text file in the same folder called Modelfile: just like that, with no file extension.
In this Modelfile paste the following code:
FROM llama3.2-vision:11b

SYSTEM """

Your job is to describe an image and assess whether there are humans in the image.
I am showing you an image.
It might contain a human it might not.
Follow the below steps to perform your job

Step1: Confirm that you have received an image, if you have not SAY: NO IMAGE and ignore Steps 2 and 3
Step2: Consider to yourself what you see in the image, take a moment to really understand the image
Step3: Now respond in the following format AND ONLY THE FOLLOWING FORMAT

DESCRIPTION <20 WORD DESCRIPTION OF IMAGE>
HUMAN:'YES HUMANS' OR 'NO HUMANS'
"""

PARAMETER temperature 0.0
Let’s go through this briefly:
- FROM tells Ollama which model to pull from its cloud repository. In this instance we are asking for llama3.2-vision:11b, which is Meta’s vision model with 11 billion parameters.
- SYSTEM is a system prompt which tells the model the underlying rules it should follow when providing an answer:
  - It tells the model that its role is to act as an image classifier for detecting humans
  - It walks the model through how it is meant to process its task
  - It prescribes what answer format the model is allowed to give
  - It attempts to stop the model’s tendency to give longer, more descriptive answers
- PARAMETER temperature is complicated, but it determines how “adventurous” the model will be. Lower temperatures choose the most probable outputs; higher temperatures mean the model may choose lower-probability options and be more “creative”. In this instance we want no creativity: just do exactly as instructed.
OK, so now we can build our own model with this Modelfile. Exit any currently running model with /bye and run the following command:
ollama create human-detector -f Modelfile
We’ve now created our own model called human-detector, based on Meta’s Llama 3.2 (no relation to Ollama 🦙), which follows the build instructions defined in our Modelfile.
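If you want to sanity-check what went into the new model, ollama show prints its details; recent versions also accept a --modelfile flag that echoes back the build instructions (if yours doesn’t, the plain command still shows the system prompt and parameters):

# inspect the model we just created: base model, parameters, system prompt
ollama show human-detector
# in recent Ollama versions, print the underlying Modelfile
ollama show human-detector --modelfile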
To run this model:
ollama run human-detector
Beware: depending on the specifications of your computer this might take a while…
Now we can play. I’ve pasted directly from my terminal for ease here; I tested a bunch of images sequentially and manually, which I’ve included below and linked at the end <<<add anchor link>>>. You can decide for yourself how well the model performed.
>>> ./image-1.jpg
Added image './image-1.jpg'
The image depicts a group of alpacas standing in a field. The alpacas are white with long necks and legs, and they have distinctive banana-shaped ears. They are standing in a green field with tall grasses and wildflowers.

In the background, there are rolling hills and mountains that stretch as far as the eye can see. The sky above is clear and blue, with no clouds visible.

The overall atmosphere of the image is one of serenity and peacefulness, with the alpacas appearing to be content and relaxed in their natural habitat.

HUMAN: NO HUMANS

>>> ./image-2.jpg
Added image './image-2.jpg'
The image depicts a dimly lit, underground tunnel or passageway. The walls are made of rough-hewn stone blocks, and the floor is composed of worn, gray concrete. A faint light emanates from the far end of the tunnel, casting long shadows across the ground.

In the foreground, there is a small, metal sign attached to the wall, bearing an illegible message or warning. The overall atmosphere of the image is one of mystery and intrigue, with the dim lighting and eerie shadows creating a sense of foreboding.

HUMAN: NO HUMANS

>>> ./image-3.jpg
Added image './image-3.jpg'
The image depicts a young boy riding on the back of a horse, dressed in traditional Mongolian clothing. The boy is wearing a long-sleeved shirt and pants made of thick, woolen fabric, and he has a fur-lined hat on his head. He is holding onto the horse's mane with one hand, while the other hand is raised in a gesture of excitement or celebration.

The horse is brown with a white blaze on its forehead, and it appears to be well-groomed and healthy. The background of the image is blurry, but it appears to be a rural landscape with rolling hills and trees in the distance.

Overall, the image conveys a sense of joy and freedom, as the boy rides through the countryside on his horse, enjoying the fresh air and scenery.

HUMAN: YES HUMANS

>>> ./image-4.jpg
Added image './image-4.jpg'
The image depicts a serene and peaceful scene of a person standing in front of a large window, gazing out at a breathtaking view. The person is dressed in casual attire, with their hair pulled back and a relaxed posture, conveying a sense of calmness and contentment.

The window behind them is floor-to-ceiling, allowing an unobstructed view of the surrounding landscape. The sky outside is a brilliant blue, with only a few wispy clouds scattered across it. In the distance, there are rolling hills and mountains that stretch as far as the eye can see.

The overall atmosphere of the image is one of tranquility and relaxation, inviting the viewer to step into the peaceful world depicted within.

HUMAN: YES HUMANS

The responses are far from perfect, but it’s pretty impressive how well a model running on our own computer (not a massive datacentre) was able to describe quite a range of different images. I chose the bottom-right picture of the silhouetted person to really push the model, and it didn’t do too bad a job. In all cases it correctly classified whether or not a human was in each image and followed our instructions closely when responding.
It did ignore our instruction limiting the description to 20 words, and it did hallucinate certain features within the images, especially in the silhouetted one. Having said that, this whole exercise was only about 20 minutes of work, mainly spent tweaking the Modelfile SYSTEM prompt to improve results.
Conclusion
Hopefully you’ve seen how easy it can be to get up and running with AI models on your local machine, and you can imagine that with a little bit of work and extension you could realistically build these into bespoke production applications. These models are only going to get better, and with the arrival of other tools that create pipelines to help them perform the tasks we ask of them, they are only going to get more performant and versatile.
Cleaning Up
Earlier on I described how to remove these models; for ease (the full sequence is consolidated in the snippet below):
- /bye to exit the model prompt
- ollama list to see which models are installed
- ollama rm <modelname> to remove each one you no longer want
- Quit and uninstall Ollama if you want
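Put together, and assuming you pulled the same models as this walkthrough, the cleanup looks something like this:

# inside the model chat prompt
/bye
# back at the normal Terminal prompt
ollama list
ollama rm deepseek-r1
ollama rm llava
ollama rm llama3.2-vision:11b
ollama rm human-detector
# should now come back empty
ollama list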
Image Credits
https://unsplash.com/photos/a-silhouette-dances-gracefully-on-a-reflective-floor-WMvQlDWt0Ds
https://unsplash.com/photos/a-mongolian-eagle-hunter-on-horseback-Jdhacfg3iW4
https://unsplash.com/photos/stairs-lead-to-light-in-a-dark-subway-L6iZmGhCss0
https://unsplash.com/photos/a-group-of-llamas-walking-in-the-desert-Yp7eEcfuSR8
Resources
https://github.com/ZeidBsaibes/ollama-playground
https://www.youtube.com/watch?v=GWB9ApTPTv4&t=3736s&ab_channel=freeCodeCamp.org
https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/

A walkthrough of AI fundamentals which helps contextualise the current excitement around Retrieval-Augmented Generation (RAG) and Agentic AI applications.