LLMS, Vision, and RAG

What I find useful outside of code

Mar 7, 2026

Here are my thoughts on the AI bubble so far as it relates specifically to me.

We’re gonna hand-wave the broader financial, geopolitical, environmental, safety, and other concerns. All are valid but out of scope.

Agentic coding has been talked to death so here are two other cool things.

Local Models

A couple times a year I have to produce a significant amount of descriptive text for images.

A large enough volume that I quickly hit the limits of the various companies I was paying a monthly subscription to.

Ollama has been a great addition to my tool-belt. I can run a model on my own hardware, and set up a job to run overnight that will hit gemma3 running on Ollama with every single image. And then capture the output in a database. I can then export that to a csv and do a 2nd pass of edits to clean them up instead of starting at a blank page for each one.

Double digits are doable by hand, triple digits start looking for a way to introduce automation.

Here’s the webapp version.

Hybrid Models

As my collection of homebrew notes grew, so did my interest in RAG.

The very basic version is uploading docs to a chat in the web browser. But that runs into its limitations quickly.

Then I tried all local. But that time I was limited my infrastructure and the context windows it could support.

So this time I’m trying a hybrid approach.

Run the RAG pipeline locally to ingest the documents, fetch the chunks from a SQLite vector store, embed them in the message, and then hit cloud models via Zen.

Why am I building this?

I like the AGENTS.md that Carson Gross recently shared. It won’t write the code for you, but it will help you do the writing.

I don’t mind the bot writing parts of my code. I’m instructing the computer to do something; there’s something funny about telling the nondeterministic function to generate deterministic commands for the fancy calculator.

But I don’t want the clanker to write human language for me. I feared the act of writing long enough to now cherish every word I add and cut.

I wanted some kind of tool that can sit in my terminal. It would be restricted from doing any ghost writing, but it would provide feedback in the form of line and developmental edits. Two modes to switch between.

So first I ingest my notes, they’re processed and the vectorized data is stored locally on my device. And then when I want to prompt the model, the chunks are fetched locally and then sent off the cloud. Not 100% private, but not as bad sending everything.

I’m looking for a good enough rock here, not a diamond.

I picked Ollama because it works well on MacOS. Picked Zen because I find OpenRouter overwhelming with all its options. Why I picked SQLite should be obvious.

Why call it RAGE CLI?

First, it’s Retrieval AuGmented Editor CLI. Second, I love acronyms. Third, I like lists. Fourth, I am angry that everyone is told me to “use AI” and then me expected to figure out what it’s actually good for.

Prototype’s up. Don’t have a timeline on finishing it. Not in a rush.