
Running Local AI on Any Laptop: A Beginner's Guide to Ollama and AnythingLLM

Arnab

My laptop sounds like a small aircraft when I have more than five Chrome tabs open. It’s three years old, the fan is permanently confused, and I’ve accepted that it will never run Cyberpunk 2077.

Last month I ran a 7-billion-parameter language model on it anyway.

No GPU. No cloud subscription. Just two free tools and maybe twenty minutes of fiddling. If you’ve been curious about running your own AI models but figured you needed expensive hardware, I was in the same boat. Here’s what I found out: you probably don’t need it.

Why I Started Running AI Locally

Full disclosure: I was skeptical. My homelab runs on repurposed hardware and wishful thinking. When people started talking about “local LLMs,” I assumed it was another thing I’d never touch without a graphics card that costs more than my car.

Then a friend showed me what he was doing with Ollama on a 2019 ThinkPad. No cloud APIs. No monthly fees. Just asking questions and getting answers, completely offline.

That got my attention. A few weeks later, I had my own setup running. Here’s why I kept it:

Privacy. When you’re experimenting with AI, you ask dumb questions. You test it with work documents you shouldn’t upload to the cloud. You want to keep your curiosity private. Local models don’t phone home. What you type stays on your machine.

It’s actually free. No trial periods. No “you’ve used all your tokens.” Once it’s set up, you can hammer it as much as you want. I’ve had sessions where I asked hundreds of questions just to see where things broke.

You learn how this stuff works. Clicking around ChatGPT teaches you how to prompt. Running models locally teaches you about token limits, quantization, model sizes—the actual mechanics. That knowledge transfers to other projects.

No one can change the terms on you. OpenAI decides to block certain topics? Switch models. Anthropic changes pricing? Download something else. You’re not locked into anyone’s ecosystem.

Ollama: The Thing That Just Works

Ollama is what happens when someone decides the entire LLM stack is too complicated and fixes it. One command downloads a model. Another runs it.

ollama pull llama3.2
ollama run llama3.2

That’s it. You’re now chatting with a language model running on your hardware.

The model library has the usual suspects—Llama 3.2, Mistral, Gemma, Phi-3. They come pre-quantized, which is a fancy way of saying “compressed so they fit in your RAM without turning into gibberish.”
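A rough way to see why quantization matters: the weights alone take roughly parameter count times bits per weight. This back-of-envelope sketch (it ignores context and runtime overhead, so real usage is higher) shows why a 4-bit 7B model fits in laptop RAM where a 16-bit one doesn't:

```shell
# Approximate RAM needed just to hold a 7B model's weights at different
# precisions. The absolute numbers are loose; the ratio is the point.
params=7000000000
for bits in 16 8 4; do
  awk -v p="$params" -v b="$bits" \
    'BEGIN { printf "%2d-bit: ~%.1f GB\n", b, p * b / 8 / 1e9 }'
done
# 16-bit: ~14.0 GB
#  8-bit: ~7.0 GB
#  4-bit: ~3.5 GB
```

That ~3.5 GB figure is why a pre-quantized 7B model runs on an ordinary laptop at all.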

What If My Laptop Is Too Old?

Ollama Cloud launched recently with a free tier. Same commands, same interface—the model just runs on their servers instead of your machine.

I tested this on my parents’ Lenovo Mini PC (the one I wrote about in the Proxmox post). That thing struggles with YouTube. But with Ollama Cloud configured, it ran Mistral without breaking a sweat. The requests go out, the response comes back, and you don’t really notice the difference unless you’re watching latency.

The free tier has limits. For messing around and learning? More than enough.

AnythingLLM: Because Clicking Beats Typing

The command line is great for setup. For daily use, I want something nicer. AnythingLLM gives you a ChatGPT-like interface that talks to your local Ollama instance.

It’s not just a pretty face, though. The thing that kept me using it is RAG.

RAG Without the Buzzwords

RAG stands for Retrieval-Augmented Generation, which is consultant-speak for “the AI can read your documents.”

Here’s what actually happens:

  1. You upload a PDF or text file
  2. AnythingLLM breaks it into chunks and creates embeddings (numerical representations of meaning—don’t overthink this)
  3. When you ask a question, it finds the relevant chunks
  4. The LLM generates an answer using your documents as context
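Step 3 is just vector math: the question gets embedded too, and its vector is compared against each chunk’s vector, usually by cosine similarity. Here’s a toy version in plain shell with made-up three-number “embeddings” (real embeddings have hundreds of dimensions, but the comparison works the same way):

```shell
# Cosine similarity between two space-separated vectors: 1.0 means "same
# direction" (similar meaning), 0.0 means unrelated.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]^2; nb += y[i]^2 }
    printf "%.3f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine "1 0 1" "1 0 1"   # identical vectors -> 1.000
cosine "1 0 1" "0 1 0"   # nothing in common -> 0.000
```

AnythingLLM does this against every chunk of every uploaded document and hands the top matches to the model.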

I use this for technical manuals I should read but won’t. Upload the PDF, ask “how do I configure X,” and it points me to the right section. Sometimes it’s wrong. But it’s wrong faster than I could find the answer myself.

You can also feed it your own writing, meeting notes, research papers—anything you want to query later. It turns a generic LLM into something that knows about your stuff.


Setting It Up

I’ve tested these steps on both Windows and Linux. Pick your section and follow along.

Windows Installation

Step 1: Install Ollama

Download the installer from ollama.com and run it. That’s the whole process—it sets itself up and runs in the background.

Open PowerShell and verify:

ollama --version

Step 2: Install AnythingLLM

Download from anythingllm.com and install like any other Windows app.

Step 3: Connect Them

When you launch AnythingLLM, it asks for an LLM provider. Pick “Ollama.” The endpoint should already say http://localhost:11434. Click connect.

Step 4: Get a Model

In PowerShell:

ollama pull llama3.2

Or if you want something slightly smarter and your hardware can handle it:

ollama pull mistral

Step 5: Start Chatting

In AnythingLLM, go to Settings → LLM and select your model. Save. Start asking questions.

Linux Installation

Step 1: Install Ollama

One command:

curl -fsSL https://ollama.com/install.sh | sh

This sets up a systemd service, so Ollama starts automatically. Verify with:

ollama --version

Step 2: Install AnythingLLM

You have options. The AppImage is simplest:

mkdir -p ~/apps
cd ~/apps
wget https://anythingllm.com/releases/AnythingLLM-Desktop-1.0.0.AppImage
chmod +x AnythingLLM-Desktop-1.0.0.AppImage
./AnythingLLM-Desktop-1.0.0.AppImage

Or Docker, if that’s your thing:

docker run -d -p 3001:3001 \
  --name anythingllm \
  -v ~/anythingllm:/app/storage \
  ghcr.io/mintplex-labs/anythingllm

Then hit http://localhost:3001 in your browser.
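One Docker caveat worth knowing (my assumption here is a Linux host with Docker 20.10 or newer): inside the container, localhost means the container itself, so AnythingLLM won’t find Ollama running on your machine. Adding a host-gateway alias is one way around it:

```shell
# Same container as above, plus an alias so the container can reach the host.
# The host-gateway mapping is a Docker 20.10+ feature; adjust for your setup.
docker run -d -p 3001:3001 \
  --name anythingllm \
  --add-host=host.docker.internal:host-gateway \
  -v ~/anythingllm:/app/storage \
  ghcr.io/mintplex-labs/anythingllm
```

With that alias in place, give AnythingLLM http://host.docker.internal:11434 as the Ollama endpoint instead of localhost.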

Step 3: Connect Them

Open AnythingLLM, select Ollama as your provider, confirm the endpoint is http://localhost:11434, and connect.

Step 4: Get a Model

ollama pull llama3.2

Step 5: Configure

Settings → LLM in AnythingLLM, pick your model, save. You’re done.


Actually Using RAG

Once you’re set up, try this:

  1. Create a new workspace in AnythingLLM
  2. Upload a PDF or text file
  3. Wait for the embedding to finish (there’s a progress bar)
  4. Ask questions about the document

Good test documents:

  • Technical manuals you’ve been avoiding
  • Your own blog posts or writing
  • Meeting transcripts
  • Research papers in your field

The interface shows which document it pulled from, so you can verify the answers aren’t hallucinated. Sometimes they are. That’s when you know to dig deeper.

Things That Went Wrong (So You Don’t Have To)

“Connection refused” in AnythingLLM

Ollama isn’t running. On Linux:

systemctl status ollama
systemctl start ollama

On Windows, check the system tray for the Ollama icon.

Model is painfully slow

Use a smaller model. Llama 3.2 comes in more than one size:

ollama pull llama3.2:1b  # Fast, dumb
ollama pull llama3.2:3b  # Medium, okay (plain "llama3.2" pulls this one)

Out of memory

Close Chrome. Seriously. If that doesn’t help, switch to Ollama Cloud:

export OLLAMA_HOST=https://api.ollama.cloud

You’ll need an API key from their dashboard.

RAG gives generic answers

Check that the embedding finished. Also, be specific in your questions—mention the document name. “What does the Kubernetes deployment guide say about rolling updates?” works better than “how do rolling updates work?”

Docker installation can’t find documents

Volume mount issue. Check:

docker logs anythingllm

Look for permission errors. The ~/anythingllm directory needs to exist and be writable.


Where I Went From Here

Once you have this running, here’s what I’d suggest:

Try different models. Llama 3.2 for general stuff. Mistral when you need better reasoning. Gemma if you’re curious about Google’s take. Swap them in seconds.

Build a personal knowledge base. I uploaded my homelab documentation and can now ask “what’s the IP for the Proxmox backup server?” without remembering where I wrote it down.

Create workspaces for different things. I have one for coding help, one for writing feedback, one for random experimentation. They keep conversations organized.

Play with the API. Ollama has a REST API. I’m working on a script that pulls my calendar and lets me ask questions about my schedule. It’s useless but fun.
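If you want to poke at the API yourself, it listens on the same port AnythingLLM talks to. A minimal sketch, assuming Ollama is running locally and llama3.2 is pulled (the prompt and variable name are just mine; check the response format against the Ollama docs):

```shell
# One-off, non-streaming request to the local Ollama API.
BODY='{"model": "llama3.2", "prompt": "Explain a systemd service in one sentence.", "stream": false}'

curl -s http://localhost:11434/api/generate -d "$BODY" \
  || echo "No response. Is Ollama running?"
```

Everything else, including AnythingLLM itself, is ultimately just dressing around calls like this one.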

The best way to learn is to break things. Ask weird questions. Upload documents that are too long. See where the model gives up. That’s how you build intuition for what these tools can actually do.

If you want to go deeper, the Ollama docs and AnythingLLM docs explain more. But honestly, you’ll learn more by just using it.

Three weeks ago, I thought local AI was for people with better hardware than me. Now I run models on a laptop that struggles with video calls. Turned out the barrier was mostly in my head.

