Notes on Ollama AI local LLM

2026-0320b

1 Introduction

This is for Ollama v0.16.1.

Ollama is a free program that runs many different LLMs (large language models) on your own PC. An Nvidia graphics card helps, because models take a lot of memory and processing power.

See https://ollama.com

How much VRAM do I need?

VRAM is the memory on your graphics card (aka GPU). It is faster than your system RAM. The amount of VRAM you need depends on the actual file size of the model you want to run. To run a model that is 10GB, you should have more than 10GB of VRAM, i.e. about 12GB; keep 10-20% of VRAM free for processing.
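The headroom rule above can be sketched as a quick calculation. This is only an illustration of the 10-20% rule of thumb, not part of Ollama; the function name is made up for the example.

```python
def recommended_vram_gb(model_file_gb, headroom=0.2):
    """VRAM (GB) suggested for a model of the given file size,
    keeping a fraction of headroom free for processing."""
    return model_file_gb / (1 - headroom)

# A 10GB model with 20% headroom suggests about 12.5GB of VRAM,
# consistent with the "about 12GB" figure above.
print(round(recommended_vram_gb(10), 1))  # → 12.5
```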

2 Basic Ollama, for Windows

Download and install Ollama on your OS. https://ollama.com/download/windows

Ollama can also interface with online AI providers. For this you will need an API key, which generally costs money.

Using the CLI

Now open a cmd.exe window in the program directory and start Ollama. We will use the CLI (command line interface) for this example. The Ollama executable files on Windows are in: c:\Users\USERNAME\AppData\Local\Programs\Ollama

I made a shortcut that opens directly to this directory. The cmd.exe shortcut target is C:\Windows\System32\cmd.exe /k c:\apps\autoexec.bat. My autoexec.bat just sets up paths to commonly used utilities. The Start In directory is c:\Users\USERNAME\AppData\Local\Programs\Ollama\.

Now type this command to start the server: ollama serve. The Ollama server must be running before any other CLI commands will work.

Now open another cmd.exe window in the same directory, c:\Users\USERNAME\AppData\Local\Programs\Ollama\.

Type this to get the version of Ollama: ollama --version.
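A running server can also be checked over its local REST API, which is another way to confirm it is up. The /api/version endpoint and port 11434 are Ollama's documented defaults; the helper names here are made up for the sketch.

```python
import json
import urllib.request

def parse_version(body: str) -> str:
    """Extract the version string from the JSON the server returns."""
    return json.loads(body)["version"]

def server_version(host="http://localhost:11434"):
    """Ask a running Ollama server for its version via the REST API.
    GET /api/version on port 11434 is the Ollama default."""
    with urllib.request.urlopen(host + "/api/version") as resp:
        return parse_version(resp.read().decode())

# Offline check against a sample response body:
print(parse_version('{"version": "0.16.1"}'))  # → 0.16.1
```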

Download a model

Ollama can download a model; this is called "pulling". Go to the Ollama models page at https://ollama.com/search. You can search for a model by typing its name in the search box. Let's search for "qwen", a great model for programmers.

Control-click the model name to open the model page in a new browser tab. We will download, or "pull", the model llama3.2:3b, which has a file size of 3.4GB. Use this command: ollama pull llama3.2:3b.

The model page should have more details on how to use it, such as whether it is a model that generates images.

Wait for the model file to download. It will automatically be put in the right directory.
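Pulling can also be done through the server's REST API (POST /api/pull), which is the same operation the CLI performs. A minimal sketch of building the request body, assuming the current API's "model" field name:

```python
import json

def pull_request(model: str) -> bytes:
    """Build the JSON body for Ollama's REST pull endpoint (POST /api/pull).
    "stream": False asks for a single final status instead of progress lines."""
    return json.dumps({"model": model, "stream": False}).encode()

print(pull_request("llama3.2:3b"))
```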

To run a model, type this at the CLI: ollama run llama3.2:3b. You should see a text prompt that says "Send a message (/? for help)". Now you can ask it questions.

Enter a question like "How many days are in March?"

We can ask it another question like "What are the top 5 programming languages that you have? Make a bullet list with one bullet per entry, and make a clickable link for the source. Make sure the source link is on the same line as the bulleted data." You have to be quite specific about what you want.
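Questions can also be sent programmatically through the server's /api/generate endpoint, Ollama's documented REST API for one-shot prompts. A minimal sketch, assuming a server on the default port; `ask` and `extract_reply` are names made up for the example.

```python
import json
import urllib.request

def extract_reply(body: str) -> str:
    """Pull the generated text out of the server's JSON response."""
    return json.loads(body)["response"]

def ask(model: str, prompt: str, host="http://localhost:11434") -> str:
    """Send one prompt to a running Ollama server, no streaming."""
    data = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(host + "/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return extract_reply(resp.read().decode())

# Needs the server running and the model pulled:
# print(ask("llama3.2:3b", "How many days are in March?"))
```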

Quit the CLI interface by typing /bye.

3 Use the GUI

To start the GUI, type: "ollama app". On Windows you do need those double quotes because there is a space in the executable file name.

A window should pop up in a few seconds. On the right side you should see a "Send a message" prompt area where you type your prompt. The prompt area includes a dropdown box for choosing a model. Click the dropdown box to see a list of models. A download icon next to a model means you have NOT downloaded it yet. Click the download icon to download another model, or choose a model you already have, like llama3.2:3b.

We can type a question like "How many square miles does India have?"

Wait a few seconds for the result. The wait time will be longer if you don't have a GPU or don't have much GPU memory.

Settings

In the upper left of the GUI window there is a gear icon that says "Settings". Click it.

  1. The Model Location tells you where the models are actually stored if you want to back them up.

There are a few other things here which we won't get into.
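For reference, the model location shown in Settings corresponds to Ollama's documented storage default, which can be overridden with the OLLAMA_MODELS environment variable. A small sketch of that lookup:

```python
import os
from pathlib import Path

def models_dir() -> Path:
    """Where Ollama stores pulled models: OLLAMA_MODELS if set,
    otherwise ~/.ollama/models (the documented default)."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"

print(models_dir())
```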

4 Model features

  1. Chat models are models you ask questions; one is llama3.2. Other models are specialized to answer programming questions, like Qwen.
  2. MoE (mixture-of-experts) models only load part of the model into memory at one time, so a person with low VRAM can use models whose file size is larger than the VRAM they have. One example is "granite3.1-moe".
  3. Medical models. Some models are trained on medical data, like MedAIBase/MedGamma1.5.

5 Generating images

To generate images you need a model that generates images.

5.1 jmorgan/z-image-turbo

https://ollama.com/jmorgan/z-image-turbo

This model generates images. It is 32GB in size.

  1. Pull command: ollama pull jmorgan/z-image-turbo
  2. Run command: ollama run jmorgan/z-image-turbo

6 Links

  1. Discord invite: it's at the bottom of this page: https://docs.ollama.com/ There is a Reddit group as well.
  2. Ollama docs: https://docs.ollama.com/