2026-0320b
This is for Ollama v0.16.1.
Ollama is a free program for running many different LLMs (large language models) on your own PC. It helps if you have an Nvidia graphics card to run the LLMs on, because models take a lot of memory and processing power.
How much VRAM do I need?
VRAM is the memory on your graphics card (aka GPU). It is faster than your system RAM. The amount of VRAM you need depends on the actual file size of the model you want to run. If you want to run a model that is 10GB, you should have more than 10GB of VRAM, i.e. about 12GB, because you should save 10-20% of VRAM for processing.
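The 10-20% headroom rule above can be turned into a quick calculation. A minimal sketch (the function name and the default 20% figure are my own choices, not something from Ollama):

```python
def min_vram_gb(model_file_gb, headroom=0.20):
    """Estimate the VRAM needed to run a model comfortably.

    Reserves a fraction of VRAM (default 20%) for processing,
    so the model file itself only gets the remainder.
    """
    return model_file_gb / (1.0 - headroom)

# A 10GB model with 20% headroom wants roughly 12.5GB of VRAM,
# which matches the "about 12GB" rule of thumb above.
print(round(min_vram_gb(10), 1))  # -> 12.5
```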
Download and install Ollama for your OS: https://ollama.com/download/windows
Ollama can also interface with online AI providers. For this you will need an API key, which generally costs money.
Using the CLI
Open a cmd.exe window in the program directory. We will use the CLI (command line interface) for this example. The Ollama executable files on Windows are in:
c:\Users\USERNAME\AppData\Local\Programs\Ollama
I have a cmd.exe shortcut made to open right in this directory. The shortcut target is:
C:\Windows\System32\cmd.exe /k c:\apps\autoexec.bat
My autoexec.bat just sets up paths to commonly used utilities. The "Start in" directory is:
c:\users\USERNAME\AppData\Local\Programs\Ollama\
Now type this command to start the server:
ollama serve
The Ollama server must be running before any other CLI commands will work.
Now open another cmd.exe window in the same directory,
c:\users\USERNAME\AppData\Local\Programs\Ollama\
Type this to get the version of Ollama:
ollama --version
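The running server can also answer a version check over HTTP on its default port 11434 (GET /api/version). Here is a minimal Python sketch using only the standard library; the live network call is commented out so the snippet works even without a server, and the JSON parsing is shown on a sample reply of the documented shape:

```python
import json
# from urllib.request import urlopen  # uncomment to query a live server

def parse_version(body):
    """Extract the version string from the server's JSON reply."""
    return json.loads(body)["version"]

# With the server running ("ollama serve"), the real call would be:
# body = urlopen("http://localhost:11434/api/version").read()
# Here we parse a sample reply instead:
print(parse_version('{"version": "0.16.1"}'))  # -> 0.16.1
```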
Download a model
Ollama can download a model; this is called "pulling". Go to the Ollama models page at https://ollama.com/search. You can search for a model by typing its name in the search box. Let's search for "qwen", a great model for programmers.
Control-click a model name to open the model page in a new browser window. The model page has more details on how to use the model, like whether it generates images. We will download, or "pull", the model "llama3.2:3b", which has a file size of 3.4GB. Use this command:
ollama pull llama3.2:3b
Wait for the model file to download. It will automatically be put in the right directory.
To run a model, type this at the CLI:
ollama run llama3.2:3b
You should see a text prompt, which currently says "Send a message (/? for help)". Now you can ask it questions.
Enter a question like "How many days are in March?"
We can ask it another question like "What are the top 5 programming languages that you have? Make a bullet list with one bullet per entry, and make a clickable link for the source. Make sure the source link is on the same line as the bulleted data." You have to be quite specific with what you want.
Quit the CLI interface by typing /bye.
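The CLI is not the only way to ask questions: the same server exposes an HTTP API, and POST /api/generate takes a JSON body with the model name and prompt. A minimal sketch using only the Python standard library; the network call is commented out so the snippet runs even when no server is up:

```python
import json
# from urllib.request import urlopen, Request  # uncomment for a live call

def build_generate_request(model, prompt):
    """Build the JSON body for Ollama's POST /api/generate endpoint.

    "stream": False asks for one complete reply instead of a
    token-by-token stream.
    """
    return {"model": model, "prompt": prompt, "stream": False}

body = build_generate_request("llama3.2:3b", "How many days are in March?")
print(json.dumps(body))

# With the server running, the real call would look like:
# req = Request("http://localhost:11434/api/generate",
#               data=json.dumps(body).encode(),
#               headers={"Content-Type": "application/json"})
# reply = json.loads(urlopen(req).read())
# print(reply["response"])
```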
To start the GUI type: "ollama app". On Windows you do need those double quotes because there is a space in the executable file name.
A window should pop up in a few seconds. On the right side you should see a "Send a message" prompt area where you type in your prompt. As part of the prompt area you should see a dropdown box to choose a model. Click the dropdown box and you will see a list of models. If there is a download icon next to a model, you have NOT downloaded it yet. Click the download icon to download another model, or choose a model you already have, like llama3.2:3b.
We can type a question like "How many square miles does India have?"
Wait a few seconds for the result. The wait time will be longer if you don't have a GPU or don't have much GPU memory.
Settings
In the upper left of the GUI window there is a gear icon that says "Settings". Click it.
There are a few other things here which we won't get into.
To generate images you need a model that generates images, for example:
https://ollama.com/jmorgan/z-image-turbo
This model generates images and is 32GB in size. Pull and run it with:
ollama pull jmorgan/z-image-turbo
ollama run jmorgan/z-image-turbo