My laptop sounds like a small aircraft when I have more than five Chrome tabs open. It’s three years old, the fan is permanently confused, and I’ve accepted that it will never run Cyberpunk 2077.
Last month I ran a 7-billion-parameter language model on it anyway.
No GPU. No cloud subscription. Just two free tools and maybe twenty minutes of fiddling. If you’ve been curious about running your own AI models but figured you needed expensive hardware, I was in the same boat. Here’s what I found out: you probably don’t need it.
Why I Started Running AI Locally#
Full disclosure: I was skeptical. My homelab runs on repurposed hardware and wishful thinking. When people started talking about “local LLMs,” I assumed it was another thing I’d never touch without a graphics card that costs more than my car.
Then a friend showed me what he was doing with Ollama on a 2019 ThinkPad. No cloud APIs. No monthly fees. Just asking questions and getting answers, completely offline.
That got my attention. A few weeks later, I had my own setup running. Here’s why I kept it:
Privacy. When you’re experimenting with AI, you ask dumb questions. You test it with work documents you shouldn’t upload to the cloud. You want to keep your curiosity private. Local models don’t phone home. What you type stays on your machine.
It’s actually free. No trial periods. No “you’ve used all your tokens.” Once it’s set up, you can hammer it as much as you want. I’ve had sessions where I asked hundreds of questions just to see where things broke.
You learn how this stuff works. Clicking around ChatGPT teaches you how to prompt. Running models locally teaches you about token limits, quantization, model sizes—the actual mechanics. That knowledge transfers to other projects.
No one can change the terms on you. OpenAI decides to block certain topics? Switch models. Anthropic changes pricing? Download something else. You’re not locked into anyone’s ecosystem.
Ollama: The Thing That Just Works#
Ollama is what happens when someone decides the entire LLM stack is too complicated and fixes it. One command downloads a model. Another runs it.
ollama pull llama3.2
ollama run llama3.2
That’s it. You’re now chatting with a language model running on your hardware.
The model library has the usual suspects—Llama 3.2, Mistral, Gemma, Phi-3. They come pre-quantized, which is a fancy way of saying “compressed so they fit in your RAM without turning into gibberish.”
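Curious how much that compression actually saves? Once you’ve pulled a model or two, Ollama will tell you what it takes up on disk, and on recent versions ollama show reports the quantization level too:

ollama list                 # downloaded models with their on-disk size
ollama show llama3.2        # parameter count, quantization, and other details

The size you see there is also a rough guide to how much RAM the model wants while it’s running.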
What If My Laptop Is Too Old?#
Ollama Cloud launched recently with a free tier. Same commands, same interface—the model just runs on their servers instead of your machine.
I tested this on my parent’s Lenovo Mini PC (the one I wrote about in the Proxmox post). That thing struggles with YouTube. But with Ollama Cloud configured, it ran Mistral without breaking a sweat. The requests go out, the response comes back, and you don’t really notice the difference unless you’re watching latency.
The free tier has limits. For messing around and learning? More than enough.
AnythingLLM: Because Clicking Beats Typing#
The command line is great for setup. For daily use, I want something nicer. AnythingLLM gives you a ChatGPT-like interface that talks to your local Ollama instance.
It’s not just a pretty face, though. The thing that kept me using it is RAG.
RAG Without the Buzzwords#
RAG stands for Retrieval-Augmented Generation, which is consultant-speak for “the AI can read your documents.”
Here’s what actually happens:
- You upload a PDF or text file
- AnythingLLM breaks it into chunks and creates embeddings (numerical representations of meaning—don’t overthink this)
- When you ask a question, it finds the relevant chunks
- The LLM generates an answer using your documents as context
I use this for technical manuals I should read but won’t. Upload the PDF, ask “how do I configure X,” and it points me to the right section. Sometimes it’s wrong. But it’s wrong faster than I could find the answer myself.
You can also feed it your own writing, meeting notes, research papers—anything you want to query later. It turns a generic LLM into something that knows about your stuff.
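If you want proof that an embedding is nothing exotic, you can generate one yourself through Ollama’s API. This is a quick sketch for the curious, assuming a reasonably recent Ollama and an embedding model like nomic-embed-text; AnythingLLM handles all of this internally, so it’s strictly optional:

ollama pull nomic-embed-text
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "How do I configure rolling updates?"
}'

You get back a long array of numbers. Sentences with similar meanings produce similar arrays, and that similarity is what the retrieval step is matching on.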
Setting It Up#
I’ve tested these steps on both Windows and Linux. Pick your section and follow along.
Windows Installation
Step 1: Install Ollama
Download the installer from ollama.com and run it. That’s the whole process—it sets itself up and runs in the background.
Open PowerShell and verify:
ollama --version
Step 2: Install AnythingLLM
Download from anythingllm.com and install like any other Windows app.
Step 3: Connect Them
When you launch AnythingLLM, it asks for an LLM provider. Pick “Ollama.” The endpoint should already say http://localhost:11434. Click connect.
Step 4: Get a Model
In PowerShell:
ollama pull llama3.2
Or if you want something slightly smarter and your hardware can handle it:
ollama pull mistral
Step 5: Start Chatting
In AnythingLLM, go to Settings → LLM and select your model. Save. Start asking questions.
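If you’d rather sanity-check the model from the terminal first, you can also fire a one-off prompt straight from PowerShell (the question is just an example):

ollama run llama3.2 "Explain quantization in one sentence"

With a quoted prompt it prints the answer and exits; run it without one and you get an interactive chat (type /bye to leave).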
Linux Installation
Step 1: Install Ollama
One command:
curl -fsSL https://ollama.com/install.sh | sh
This sets up a systemd service, so Ollama starts automatically. Verify with:
ollama --version
Step 2: Install AnythingLLM
You have options. The AppImage is simplest:
mkdir -p ~/apps
cd ~/apps
wget https://anythingllm.com/releases/AnythingLLM-Desktop-1.0.0.AppImage
chmod +x AnythingLLM-Desktop-1.0.0.AppImage
./AnythingLLM-Desktop-1.0.0.AppImage
Or Docker, if that’s your thing:
docker run -d -p 3001:3001 \
--name anythingllm \
-v ~/anythingllm:/app/storage \
ghcr.io/mintplex-labs/anythingllm
Then hit http://localhost:3001 in your browser.
Step 3: Connect Them
Open AnythingLLM, select Ollama as your provider, confirm the endpoint is http://localhost:11434, and connect.
Step 4: Get a Model
ollama pull llama3.2
Step 5: Configure
Settings → LLM in AnythingLLM, pick your model, save. You’re done.
Actually Using RAG#
Once you’re set up, try this:
- Create a new workspace in AnythingLLM
- Upload a PDF or text file
- Wait for the embedding to finish (there’s a progress bar)
- Ask questions about the document
Good test documents:
- Technical manuals you’ve been avoiding
- Your own blog posts or writing
- Meeting transcripts
- Research papers in your field
The interface shows which document it pulled from, so you can verify the answers aren’t hallucinated. Sometimes they are. That’s when you know to dig deeper.
Things That Went Wrong (So You Don’t Have To)#
“Connection refused” in AnythingLLM
Ollama isn’t running. On Linux:
systemctl status ollama
systemctl start ollama
On Windows, check the system tray for the Ollama icon.
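Either way, the quickest sanity check is hitting the API directly; a healthy install answers on port 11434:

curl http://localhost:11434        # should reply with "Ollama is running"

(In PowerShell, call curl.exe so you get real curl instead of the Invoke-WebRequest alias.)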
Model is painfully slow
Use a smaller model. Llama 3.2 comes in more than one size:
ollama pull llama3.2:1b # Fast, dumb
ollama pull llama3.2 # Medium, okay
Out of memory
Close Chrome. Seriously. If that doesn’t help, switch to Ollama Cloud:
export OLLAMA_HOST=https://api.ollama.cloud
You’ll need an API key from their dashboard.
RAG gives generic answers
Check that the embedding finished. Also, be specific in your questions—mention the document name. “What does the Kubernetes deployment guide say about rolling updates?” works better than “how do rolling updates work?”
Docker installation can’t find documents
Volume mount issue. Check:
docker logs anythingllm
Look for permission errors. The ~/anythingllm directory needs to exist and be writable.
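If the directory is the culprit, something like this usually sorts it out. The UID is my assumption about what the container runs as; the permission error in the logs will tell you the real one:

mkdir -p ~/anythingllm
sudo chown -R 1000:1000 ~/anythingllm   # assumes the container runs as UID 1000
docker restart anythingllm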
Where I Went From Here#
Once you have this running, here’s what I’d suggest:
Try different models. Llama 3.2 for general stuff. Mistral when you need better reasoning. Gemma if you’re curious about Google’s take. Swap them in seconds.
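Each of those is one pull away, and when the disk starts filling up, removing a model you’ve lost interest in is just as quick:

ollama pull gemma3     # check the Ollama library for the current tag; this one is my guess
ollama rm gemma3       # deletes the downloaded weights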
Build a personal knowledge base. I uploaded my homelab documentation and can now ask “what’s the IP for the Proxmox backup server?” without remembering where I wrote it down.
Create workspaces for different things. I have one for coding help, one for writing feedback, one for random experimentation. They keep conversations organized.
Play with the API. Ollama has a REST API. I’m working on a script that pulls my calendar and lets me ask questions about my schedule. It’s useless but fun.
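The API is plain HTTP on the same port AnythingLLM talks to, so a script is mostly just a POST request. A minimal sketch of a one-shot question (the prompt is obviously just an example):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Give me three good test questions for a brand-new local model.",
  "stream": false
}'

The reply is JSON with the generated text in a response field, which makes it easy to feed into whatever you’re scripting.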
The best way to learn is to break things. Ask weird questions. Upload documents that are too long. See where the model gives up. That’s how you build intuition for what these tools can actually do.
If you want to go deeper, the Ollama docs and AnythingLLM docs explain more. But honestly, you’ll learn more by just using it.
Three weeks ago, I thought local AI was for people with better hardware than me. Now I run models on a laptop that struggles with video calls. Turned out the barrier was mostly in my head.
