<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Sanskar Jaiswal's Blog]]></title><description><![CDATA[Software developer in Bengaluru writing about homelab builds, self-hosted
services, and local LLMs. Posts usually involve a Dell OptiPlex, an RTX
4050, and more]]></description><link>https://blog.sanskarjaiswal.dev</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 07:07:20 GMT</lastBuildDate><atom:link href="https://blog.sanskarjaiswal.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[A Tiny Crew of Agents Running My Homelab]]></title><description><![CDATA[The first version of this setup was one agent that did everything, and the thing that finally broke me was a Saturday evening where I asked it to "clean up Immich" and it instead restarted the Jellyfi]]></description><link>https://blog.sanskarjaiswal.dev/a-tiny-crew-of-agents-running-my-homelab</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/a-tiny-crew-of-agents-running-my-homelab</guid><category><![CDATA[Homelab]]></category><category><![CDATA[Local LLM]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[gemma]]></category><category><![CDATA[pi]]></category><category><![CDATA[ollama]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Fri, 17 Apr 2026 18:03:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6826f16a8c26f3f07ff4c7b8/66a891d8-e2a5-4c49-8b99-3d4db99f2477.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first version of this setup was one agent that did everything, and the thing that finally broke me was a Saturday evening where I asked it to "clean up Immich" and it instead restarted the Jellyfin container because both contain the word "photo" somewhere in their config. Nothing important lost. But I was sitting there watching Jellyfin reindex a ~600GB library for no reason, and that was the moment.</p>
<p>What I actually wanted was a crew. One main agent that delegates, two specialists with narrow jobs, each in its own process so they can't step on each other. Pi with a sub-agent extension gets me there, running Gemma 4 E4B locally on the 4050 laptop. No cloud calls, no rate limits, no rationing.</p>
<p>This is how I set it up and where the VRAM bites.</p>
<hr />
<h2>Why Pi</h2>
<p>I've been using <a href="https://shittycodingagent.ai/">Pi</a> (Mario Zechner's terminal coding agent) for a while. It's minimal by design. No MCP, no plan mode, no built-in sub-agents, no permission popups. Extensions are the composition unit, which sounded annoying on paper but in practice means I'm not fighting someone else's idea of how an agent should work.</p>
<p>One thing to flag up front: Pi runs in full YOLO mode by default. Unrestricted filesystem access, no pre-checks, it'll run whatever the model decides. That's fine for a coding harness where you want speed. For a homelab operator, guardrails are your job. I'll come back to this.</p>
<p>For sub-agents specifically, Mario himself is skeptical. His preferred pattern is a slash command that spawns a fresh <code>pi --print</code> via bash for one-off things like code review. I went with a sub-agent extension instead because I wanted named specialists with their own system prompts and tool scopes, not just "spawn yourself with this prompt." I'm not 100% sure that was the right call and I'll probably re-run this whole thing with the slash-command pattern in a month to compare.</p>
<p>The one I'm using is <a href="https://github.com/mjakl/pi-subagent">mjakl/pi-subagent</a>. Agents are defined as Markdown files with YAML frontmatter. Each subagent runs in a separate <code>pi</code> process with no shared state, which is exactly what I want.</p>
<table>
<thead>
<tr>
<th>Role</th>
<th>Scope</th>
<th>What it touches</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Main agent</strong></td>
<td>Routes requests, summarises outcomes</td>
<td>Nothing directly, it delegates</td>
</tr>
<tr>
<td><strong>Ops</strong></td>
<td>Container health, restarts, logs</td>
<td><code>docker</code>, <code>systemctl</code>, <code>journalctl</code></td>
</tr>
<tr>
<td><strong>Librarian</strong></td>
<td>Immich/Jellyfin housekeeping</td>
<td>Immich CLI, Jellyfin API, filesystem reads</td>
</tr>
</tbody></table>
<p>I deliberately did not add a third specialist. Every extra role is another system prompt to maintain and another place where the wrong agent can touch the wrong thing.</p>
<hr />
<h2>The model: Gemma 4 E4B on a 4050 6GB</h2>
<p>Gemma 4 dropped a couple of weeks ago. E4B is the edge model: 4.5B effective parameters (8B with embeddings), native function calling, 128K context, multimodal. The Q4_K_M GGUF on Ollama is 9.6GB.</p>
<p>Yes, 9.6GB. The 4050 has 6GB of VRAM.</p>
<p>Ollama handles this by partially offloading layers to system RAM. It works; it's just slower than the breathless "E4B fits on any 6GB card" takes you'll read. What actually fits on 6GB is the inference footprint at small context, not the full model. Weights stream in from the 16GB of system RAM as needed, GPU utilisation looks spiky, and tokens/sec is maybe 60-70% of what you'd see on a fully-resident 8GB card. The single piece of advice I'd give anyone considering the same setup is this: don't expect a free ride. It's usable. I would not spec this for production.</p>
<pre><code class="language-bash">ollama pull gemma4:e4b
ollama run gemma4:e4b "hello"
</code></pre>
<hr />
<h2>The setup</h2>
<h3>Install Pi</h3>
<pre><code class="language-bash">npm install -g @mariozechner/pi-coding-agent
</code></pre>
<h3>Add Ollama as a provider</h3>
<p>Edit <code>~/.pi/agent/models.json</code>:</p>
<pre><code class="language-json">{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "gemma4:e4b",
          "name": "Gemma 4 E4B (Local)",
          "reasoning": true,
          "contextWindow": 16000,
          "maxTokens": 4000
        }
      ]
    }
  }
}
</code></pre>
<p>A few non-obvious bits. The <code>apiKey</code> field is required but Ollama ignores it so any value works. The two <code>compat</code> flags are there because Ollama's OpenAI-compatibility layer doesn't understand the <code>developer</code> role or the <code>reasoning_effort</code> parameter. Without those flags Pi will silently send requests Ollama can't parse and you'll spend an afternoon wondering why your local model is suddenly mute.</p>
<p>I cap <code>contextWindow</code> at 16K. E4B advertises 128K but on 6GB the KV cache grows linearly and a 128K cache will eat system RAM alive before you've even said hello. 16K is plenty for a homelab operator doing single tasks, and if I ever need more I can just bump it for that session.</p>
<p>Then in <code>~/.pi/agent/settings.json</code>:</p>
<pre><code class="language-json">{
  "defaultProvider": "ollama",
  "defaultModel": "gemma4:e4b"
}
</code></pre>
<h3>Install the sub-agent extension</h3>
<pre><code class="language-bash">mkdir -p ~/.pi/agent/extensions
cd ~/.pi/agent/extensions
git clone https://github.com/mjakl/pi-subagent.git
cd pi-subagent
npm install
</code></pre>
<h3>Define the specialists</h3>
<p>pi-subagent expects agent files with YAML frontmatter. I put them in <code>~/.pi/agent/</code> alongside the main <code>AGENTS.md</code>:</p>
<pre><code class="language-plaintext">~/.pi/agent/
├── AGENTS.md              # main agent: who does what
├── ops.md                 # ops specialist
└── librarian.md           # librarian specialist
</code></pre>
<p><strong>Main agent (</strong><code>AGENTS.md</code><strong>)</strong> — short, strict:</p>
<pre><code class="language-plaintext">You are the main agent for a homelab crew on host friday.

You do NOT run commands directly. You delegate to specialists:

- ops: container health, restarts, logs, systemd, reachability checks
- librarian: Immich and Jellyfin housekeeping, dedup, library scans

For every request:
1. Decide which specialist owns it. If unclear, ask one question.
2. Delegate with a single, scoped instruction. Use fully-qualified service names.
3. Summarise the specialist's receipt back to me in &lt;= 3 lines.
4. Never combine specialists in one turn. One task, one specialist.

If a request is not ops or librarian, say so. Do not improvise.
</code></pre>
<p><strong>Ops (</strong><code>ops.md</code><strong>)</strong> — narrow tools, receipt mandatory:</p>
<pre><code class="language-plaintext">---
name: ops
description: Container health, restarts, logs, systemd diagnostics on friday
tools: read,bash
mode: spawn
---

You manage containers and services on friday.

Allowed: docker ps, docker logs, docker restart, systemctl status,
journalctl, curl (localhost only).

NOT allowed: docker rm, docker volume rm, docker system prune, anything
destructive. If a task requires destruction, refuse and explain.

After every action, emit a receipt JSON:

{
  "event": "&lt;event.name&gt;",
  "host": "friday",
  "service": "&lt;service&gt;",
  "requested_by": "main",
  "correlation_id": "&lt;ISO-timestamp&gt;-&lt;service&gt;-&lt;action&gt;",
  "status": "ok|error",
  "duration_ms": &lt;int&gt;
}

If you cannot emit the receipt, the action did not happen.
</code></pre>
<p><strong>Librarian (</strong><code>librarian.md</code><strong>)</strong>:</p>
<pre><code class="language-plaintext">---
name: librarian
description: Immich and Jellyfin housekeeping, dedup scans, library refresh
tools: read,bash
mode: spawn
---

You manage media libraries on friday.

Allowed: Immich CLI (read + dedup-dry-run), Jellyfin API (scan, library
refresh), filesystem reads under /srv/media.

NOT allowed: deleting files, writing outside /tmp, touching originals.
Any "clean up" is dry-run by default. Actual deletes require an explicit
confirmed: true field in the task.

Emit the same receipt JSON as Ops. Include a dry_run: true|false field.
</code></pre>
<p>The <code>tools: read,bash</code> line is deliberate. No <code>write</code>, no <code>edit</code>. They can read state and run commands but they can't modify files directly, and for ops work that's enough.</p>
<h3>Thinking mode</h3>
<p>Gemma 4's thinking is toggled by a <code>&lt;|think|&gt;</code> token at the start of the system prompt. Add it where you want reasoning, leave it out where you want speed.</p>
<p>I landed on: thinking on for the main agent (it's doing routing, worth the latency), off for Ops ("restart immich" shouldn't require a philosophical detour), on for Librarian (deciding what's a dedup candidate needs actual reasoning).</p>
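<p>Concretely, with the frontmatter layout above, flipping Librarian to thinking-on is just making that token the first line of the prompt body. A sketch, assuming pi-subagent passes everything after the frontmatter through as the system prompt:</p>

```plaintext
---
name: librarian
description: Immich and Jellyfin housekeeping, dedup scans, library refresh
tools: read,bash
mode: spawn
---

<|think|>
You manage media libraries on friday.
```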
<hr />
<h2>What a real turn looks like</h2>
<p>Me, at the Pi prompt:</p>
<blockquote>
<p>immich is slow and the homepage is half-loaded</p>
</blockquote>
<p>Main agent routes it to Ops. Ops runs <code>docker ps | grep immich</code>, sees <code>immich-server</code> is in a restart loop, tails the last 200 log lines, spots a Postgres connection timeout, restarts <code>immich-server</code>, waits 5s, curls the healthcheck, emits:</p>
<pre><code class="language-json">{
  "event": "container.restart",
  "host": "friday",
  "service": "immich-server",
  "requested_by": "main",
  "correlation_id": "2026-04-17T09:22:11Z-immich-server-restart",
  "status": "ok",
  "duration_ms": 5840
}
</code></pre>
<p>Main agent replies:</p>
<blockquote>
<p>Ops restarted immich-server after a DB timeout. Health check green. Receipt 2026-04-17T09:22:11Z-immich-server-restart.</p>
</blockquote>
<p>That <code>correlation_id</code> shows up in the Discord notification, in the container log, and if I need it, in Grafana. No arguments later about what happened when.</p>
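<p>Wiring receipts into Discord is one short script. A minimal sketch, assuming the agents' stdout is tailed into a one-JSON-object-per-line log and <code>DISCORD_WEBHOOK</code> holds your webhook URL (both names are mine, not Pi's; the receipt below is a hardcoded sample):</p>

```bash
#!/usr/bin/env bash
# Watch for failed receipts and alert. In production this line would
# come from `tail -F receipts.log`; here it's a hardcoded sample.
line='{"event":"container.restart","host":"friday","service":"immich-server","requested_by":"main","correlation_id":"2026-04-17T09:22:11Z-immich-server-restart","status":"error","duration_ms":5840}'

# Pull fields out with sed so there's no jq dependency.
status=$(sed -n 's/.*"status":"\([^"]*\)".*/\1/p' <<<"$line")
cid=$(sed -n 's/.*"correlation_id":"\([^"]*\)".*/\1/p' <<<"$line")

if [ "$status" != "ok" ]; then
  echo "ALERT $cid status=$status"
  # curl -s -H 'Content-Type: application/json' \
  #   -d "{\"content\":\"homelab failure: $cid\"}" "$DISCORD_WEBHOOK"
fi
```

<p>The same two sed lines are what I'd reach for when grepping a <code>correlation_id</code> back out of the log later.</p>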
<hr />
<h2>Gotchas that cost me time</h2>
<h3>The model is bigger than the VRAM</h3>
<p>I didn't believe this until the first long session. E4B Q4_K_M is 9.6GB, the 4050 is 6GB, offload happens, sustained throughput suffers. If you're spec'ing a dedicated crew machine, a 4060 Ti 8GB or 4070 is a much happier place than what I'm running on.</p>
<h3>Thinking-off Ops is confidently wrong on ambiguous input</h3>
<p>This is the one that cost me the Jellyfin reindex I mentioned at the top. "Restart the photo thing" is a perfectly reasonable thing to type at 10pm on a Saturday and thinking-off Ops will parse it, pick the first container whose metadata mentions photos, and go. The model isn't dumb, it's just fast and decisive about the wrong thing.</p>
<p>The fix is structural: make the main agent do the disambiguating (it's thinking-on), and require it to hand Ops a fully-qualified service name. Step 2 of the main agent's <code>AGENTS.md</code> is that rule. I added it after the Jellyfin incident, not before.</p>
<h3>Sub-agents don't share memory</h3>
<p>My first instinct was to let Ops see what Librarian just did. Fought this for an evening before I realised the whole point of spawn mode is isolated context, and what I actually wanted was for the receipts to be the shared memory. Main agent reads both receipts, correlates them by timestamp, done. No cross-specialist context bleed, no growing context window, no confusion about whose turn it is.</p>
<p>pi-subagent does have a <code>fork</code> mode that inherits parent context. It's tempting for follow-up tasks. I'm staying on <code>spawn</code> because the cost in tokens and the risk of leaking unrelated context into a specialist's head both feel worse than the inconvenience of one extra turn.</p>
<h3>Pi is YOLO and your prompts are not a security boundary</h3>
<p>Allow-lists in a system prompt are a style guide the model mostly follows. On a good day. If your Ops agent decides <code>docker system prune</code> is a clever shortcut, nothing in Pi will stop it.</p>
<p>What actually works:</p>
<ul>
<li><p>Run Pi in a container or VM with scoped mounts and a non-root user</p>
</li>
<li><p>Use path-protection or permission-gate extension examples for belt-and-braces</p>
</li>
<li><p>Keep <code>tools</code> in the subagent frontmatter as narrow as possible, <code>read,bash</code> is the smallest useful set</p>
</li>
</ul>
<p>I run mine in a podman container with <code>/srv/media</code> mounted read-only for Librarian and a docker socket proxy (tecnativa/docker-socket-proxy) for Ops so it can list and restart containers but can't do <code>rm</code> or <code>prune</code> even if it wanted to. Took a weekend to get right. I was also paranoid enough that I initially didn't give Librarian any bash at all, and then realised it needed bash to call the Immich CLI, and re-added it with a narrower allow-list in the prompt.</p>
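<p>The socket-proxy piece is mostly environment flags. A sketch of the shape, using tecnativa/docker-socket-proxy's allow-list variables (container name and port are mine; the proxy denies every Docker API endpoint you don't explicitly enable):</p>

```shell
# CONTAINERS=1     -> list/inspect containers (docker ps, docker logs)
# POST=1           -> allow POST requests at all
# ALLOW_RESTARTS=1 -> restart/stop/kill endpoints, and nothing else
docker run -d --name dockerproxy \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -e CONTAINERS=1 -e POST=1 -e ALLOW_RESTARTS=1 \
  tecnativa/docker-socket-proxy

# Ops then points at the proxy instead of the raw socket, so
# `docker rm` and `docker system prune` fail at the API layer:
# DOCKER_HOST=tcp://dockerproxy:2375 docker ps
```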
<h3>The Librarian wanted to delete things on its first test run</h3>
<p>I typed "clean up duplicate photos" as a throwaway test. Librarian dry-ran (thanks to the prompt rule), reported 847 dedup candidates, touched nothing. Good agent. If I had skipped the dry-run-by-default rule in its prompt I would be restoring from backup right now, and backups are fiction until you've tested them, which mercifully I had [after a hard disk event last year that I still haven't written about].</p>
<hr />
<h2>My setup in tl;dr</h2>
<ul>
<li><p><strong>Laptop</strong>: i5 12th gen, RTX 4050 6GB VRAM, 16GB system RAM</p>
</li>
<li><p><strong>Runtime</strong>: Ollama serving <code>gemma4:e4b</code> (Q4_K_M, 9.6GB, partial GPU offload)</p>
</li>
<li><p><strong>Agent harness</strong>: Pi + mjakl/pi-subagent</p>
</li>
<li><p><strong>Crew</strong>: Main + Ops + Librarian (all spawn mode)</p>
</li>
<li><p><strong>Thinking</strong>: on for Main/Librarian, off for Ops</p>
</li>
<li><p><strong>Context cap</strong>: 16K per agent</p>
</li>
<li><p><strong>Tools per specialist</strong>: <code>read,bash</code> only</p>
</li>
<li><p><strong>Isolation</strong>: podman container, scoped mounts, docker-socket-proxy for Ops</p>
</li>
<li><p><strong>Receipts</strong>: JSON to stdout, tailed to a log file, Discord webhook for failures, eventually into Grafana</p>
</li>
</ul>
<hr />
<h2>What I want to try next</h2>
<ul>
<li><p><strong>Route Ops to</strong> <code>friday</code> <strong>via SSH</strong> and keep the agent runtime on the laptop. The SSH extension example in pi-mono looks straightforward. No reason the crew needs to live on the machine it manages.</p>
</li>
<li><p><strong>Do the whole thing again with Mario's slash-command pattern</strong> instead of a sub-agent extension. Same prompts, same roles, <code>pi --print</code> spawns. Measure tokens, latency, and whether it feels different. I might have over-engineered this.</p>
</li>
<li><p><strong>A Scribe specialist</strong> for weekly homelab digests. "What changed on friday this week" → markdown file, committed to my notes repo. Not critical, would be nice.</p>
</li>
<li><p><strong>Dynamic thinking toggle</strong>: Ops flips to thinking-on automatically when the input contains any error signature it hasn't seen before. Right now I do this manually with <code>/model</code>.</p>
</li>
<li><p><strong>26B A4B on the OptiPlex someday</strong>: Long shot ... that box is CPU-only, and I think the Jensen Huang tax for more VRAM is still not in the budget. Maybe a Mac mini soon o.0</p>
</li>
</ul>
<hr />
<h2>Closing thoughts</h2>
<p>The thing I keep learning with local LLMs is that the model isn't the hard part anymore. Gemma 4 E4B is honestly great at function calling, even when it's half-swapped to system RAM. The hard part is the scaffolding. Who's allowed to do what, how actions leave a trail, what's reversible.</p>
<p>Pi gets out of the way on exactly the right axis. Three Markdown files and a docker-socket-proxy, and I have a homelab operator that can restart a service, check on my photo library, and leave a receipt for every action. None of it is flashy. The VRAM ceiling is real and I'll want more of it before the year is out.</p>
<p>But it's mine. And that's the point.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Clean DNS Stack at Home]]></title><description><![CDATA[Home networking looks simple until you actually start touching it. What began as a straightforward plan to run AdGuard Home across the network somehow turned into a small odyssey involving WDS bridging, router hardware bingo, and a very short-lived o...]]></description><link>https://blog.sanskarjaiswal.dev/building-a-clean-dns-stack-at-home</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/building-a-clean-dns-stack-at-home</guid><category><![CDATA[wifi bridging]]></category><category><![CDATA[adguardhome]]></category><category><![CDATA[tailscale]]></category><category><![CDATA[home networking]]></category><category><![CDATA[Homelab]]></category><category><![CDATA[DNS Filtering]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Mon, 01 Dec 2025 09:27:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764580785975/6b6302af-cc6d-4ede-b95a-46e0d0ce67a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Home networking looks simple until you actually start touching it. What began as a straightforward plan to run AdGuard Home across the network somehow turned into a small odyssey involving WDS bridging, router hardware bingo, and a very short-lived optimism that a TP-Link device would cooperate with OpenWrt.</p>
<p>This is a summary of how the entire puzzle came together and what actually worked.</p>
<hr />
<h2 id="heading-the-goal">The Goal</h2>
<p>The end state I wanted was:</p>
<ul>
<li><p>AdGuard Home acting as the authoritative DNS for the entire network</p>
</li>
<li><p>A secondary router connected over WDS to extend the network cleanly</p>
</li>
<li><p>The ability to filter DNS traffic even when connected through Tailscale</p>
</li>
<li><p>Zero reliance on browser extensions</p>
</li>
<li><p>And ideally, OpenWrt somewhere in the stack to handle proper DNS hijacking</p>
</li>
</ul>
<p>The idea sounded neat. The execution required a bit more patience.</p>
<hr />
<h2 id="heading-step-1-getting-adguard-home-running">Step 1: Getting AdGuard Home Running</h2>
<p>The AdGuard setup was straightforward. Once installed on the homelab machine, it handled all DNS queries on the LAN. Even with 100.100.100.100 (Tailscale’s MagicDNS) as the upstream, AdGuard continued logging and filtering normally.<br />This was expected behavior: MagicDNS only resolves queries for the client device, not for AdGuard itself.</p>
<p>Once everything pointed to AdGuard, the network immediately felt cleaner. No ads, no trackers, and visible query insights.</p>
<hr />
<h2 id="heading-step-2-fixing-wifi-coverage-with-wds-bridging">Step 2: Fixing WiFi Coverage With WDS Bridging</h2>
<p>This is where things got interesting.</p>
<p>To avoid pulling Ethernet across the house, I set up a <strong>WDS bridge</strong> between the main router and a secondary access point.<br />WDS worked surprisingly well:</p>
<ul>
<li><p>Devices connected through the bridge still routed DNS requests to AdGuard</p>
</li>
<li><p>The network remained a single broadcast domain</p>
</li>
<li><p>No double NAT</p>
</li>
<li><p>Same SSID and smooth roaming</p>
</li>
</ul>
<p>This part of the stack was the most cooperative, which is rare in home networking.</p>
<hr />
<h2 id="heading-step-3-the-quest-to-flash-openwrt">Step 3: The Quest to Flash OpenWrt</h2>
<p>This part did not go as smoothly.</p>
<p>When I checked the TP-Link model on OpenWrt’s supported hardware list, things looked promising.<br />But TP-Link’s naming scheme is basically a puzzle:</p>
<ul>
<li><p>Amazon labels it <em>Archer AC1200</em></p>
</li>
<li><p>The actual device banner says <em>Archer C6</em></p>
</li>
<li><p>OpenWrt lists support for specific revisions</p>
</li>
<li><p>The one in hand showed <em>v4.8</em>, which OpenWrt does not support at all</p>
</li>
</ul>
<p>The enthusiasm lasted about ten seconds.</p>
<p>I learned (again) that:</p>
<ul>
<li><p>TP-Link reuses model names across completely different chipsets</p>
</li>
<li><p>Hardware revisions often differ silently within the same listing</p>
</li>
<li><p>Flashing unsupported hardware is a great way to manufacture a brick</p>
</li>
<li><p>OpenWrt’s documentation is accurate, Amazon’s product pages are not</p>
</li>
</ul>
<p>So OpenWrt was off the table for the TP-Link we had.</p>
<hr />
<h2 id="heading-step-4-confirming-adguard-works-over-tailscale">Step 4: Confirming AdGuard Works Over Tailscale</h2>
<p>After the WDS setup and router disappointment, I finished the Tailscale integration.</p>
<p>Once the AdGuard IP was added in the Tailscale admin DNS settings:</p>
<ul>
<li><p>Every device connected through Tailscale automatically used AdGuard</p>
</li>
<li><p>MagicDNS still handled internal <code>.local</code> resolution</p>
</li>
<li><p>Full filtering worked whether at home or remote</p>
</li>
<li><p>Query logs remained consistent across both scenarios</p>
</li>
</ul>
<p>This delivered the “private DNS anywhere” experience I was aiming for.</p>
<h2 id="heading-final-network-layout">Final Network Layout</h2>
<p>Here’s what the finished setup looked like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764580493326/a7ab4662-b93c-4aa2-b82c-14a68d987575.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Main router provides WAN and base WiFi</p>
</li>
<li><p>Secondary router connects via WDS bridge, extending WiFi</p>
</li>
<li><p>Both broadcast the same LAN</p>
</li>
<li><p>AdGuard Home sits inside the LAN as the primary DNS</p>
</li>
<li><p>Tailscale routes remote DNS queries back to AdGuard</p>
</li>
</ul>
<p>Even without OpenWrt, the combination works reliably.</p>
<hr />
<h2 id="heading-closing-thoughts">Closing Thoughts</h2>
<p>The setup is a good reminder that home networking is equal parts planning and improvisation. The AdGuard portion was easy. The WDS bridge behaved surprisingly well. The router hardware roulette was less pleasant, but at least it revealed why OpenWrt still has a giant red disclaimer next to TP-Link devices.</p>
<p>In the end, I still achieved:</p>
<ul>
<li><p>Clean network-wide DNS filtering</p>
</li>
<li><p>Remote filtering via Tailscale</p>
</li>
<li><p>A stable extended WiFi setup</p>
</li>
<li><p>Zero browser extensions</p>
</li>
<li><p>Full visibility into queries</p>
</li>
</ul>
<p>As long as you choose your hardware carefully, this is one of the most practical upgrades you can make to a home network without major rewiring.</p>
]]></content:encoded></item><item><title><![CDATA[Running LLMs Locally: Why It's Important and How to Do It]]></title><description><![CDATA[Why I Started Looking Beyond Cloud APIs
Most of my first experiences with LLMs were through OpenAI’s API and Azure OpenAI Service at work.They’re honestly great when you’re just starting out. You hit an endpoint, you get GPT-4 level answers, and life...]]></description><link>https://blog.sanskarjaiswal.dev/running-llms-locally-why-its-important-and-how-to-do-it</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/running-llms-locally-why-its-important-and-how-to-do-it</guid><category><![CDATA[llm]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[Homelab]]></category><category><![CDATA[ Edge AI]]></category><category><![CDATA[mlops]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[privacy]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Linux]]></category><category><![CDATA[ollama]]></category><category><![CDATA[langchain]]></category><category><![CDATA[comfyui]]></category><category><![CDATA[stable diffusion]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Mon, 15 Sep 2025 12:40:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757939508279/5234d877-2c28-4bb3-988d-c9aa6168f745.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-why-i-started-looking-beyond-cloud-apis">Why I Started Looking Beyond Cloud APIs</h1>
<p>Most of my first experiences with LLMs were through <strong>OpenAI’s API</strong> and <strong>Azure OpenAI Service</strong> at work.<br />They’re honestly great when you’re just starting out. You hit an endpoint, you get GPT-4 level answers, and life feels good. No GPU drivers, no CUDA errors, no headaches.</p>
<p>But after a while I started hitting the usual walls:</p>
<ul>
<li><p>The bill at the end of the month started looking like my rent. [exaggerated ;)]</p>
</li>
<li><p>I had no real control over what was happening under the hood.</p>
</li>
<li><p>Privacy was always in the back of my mind. Some data just doesn’t feel right sending off to the cloud.</p>
</li>
<li><p>And of course, you’re completely at the mercy of whatever models and limits the provider decides.</p>
</li>
</ul>
<p>So I thought… why not run some of this stuff myself? Worst case I burn some hours fighting Docker. Best case I end up with my own AI assistant that doesn’t need an internet connection to work.</p>
<hr />
<h2 id="heading-whats-possible-these-days">What’s Possible These Days</h2>
<p>Running your own models used to be something only labs with racks of GPUs could do. Now it’s surprisingly doable at home.</p>
<h3 id="heading-models-worth-trying">Models worth trying</h3>
<ul>
<li><p><strong>Mistral</strong>: small, fast, scary good at reasoning for its size.</p>
</li>
<li><p><strong>LLaMA 2</strong>: the “default” open model. Huge community, easy to fine tune.</p>
</li>
<li><p><strong>Falcon</strong>: solid multilingual capabilities.</p>
</li>
<li><p><strong>Gemma and StableLM</strong>: lighter models that don’t need monster GPUs.</p>
</li>
</ul>
<h3 id="heading-tools-that-make-life-easier">Tools that make life easier</h3>
<ul>
<li><p><strong>Ollama</strong>: probably the smoothest way to run models locally.</p>
</li>
<li><p><strong>Text Generation Inference (TGI)</strong>: if you want a proper serving stack.</p>
</li>
<li><p><strong>LangChain and LangGraph</strong>: orchestration so your models can actually do more than parrot back text.</p>
</li>
<li><p><strong>Model Context Protocol (MCP)</strong>: lets LLMs hook into tools and data. I use this with my homelab assistant.</p>
</li>
</ul>
<h3 id="heading-hardware-reality-check">Hardware reality check</h3>
<ul>
<li><p>Mid-size models like 7B to 13B will happily run on a decent GPU with 12 to 16 GB of VRAM.</p>
</li>
<li><p>If you don’t have that, quantized models can limp along on CPU with enough RAM.</p>
</li>
<li><p>The giant 70B models are still a no-go unless you own a data center or happen to be best friends with Leather-Jacket-Man (<a target="_blank" href="https://www.google.com/search?client=firefox-b-d&amp;q=nvidia+ceo">Jensen Huang</a>).</p>
</li>
</ul>
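<p>A quick way to sanity-check those VRAM numbers: quantized weight size is roughly parameters × bits-per-weight ÷ 8, plus a gigabyte or two for the KV cache and runtime overhead. Back-of-envelope, taking ~4.5 bits/weight as typical for a Q4_K_M-style quant:</p>

```bash
# Rough VRAM estimate: weights ≈ params × bits-per-weight / 8.
# 4.5 bits/weight approximates a Q4_K_M-style quant; budget another
# 1-2 GB on top for the KV cache and runtime overhead.
for params in 7 13 70; do
  awk -v p="$params" 'BEGIN { printf "%2dB model ~ %.1f GB of weights\n", p, p*4.5/8 }'
done
# 7B ~ 3.9 GB, 13B ~ 7.3 GB, 70B ~ 39.4 GB
```

<p>Which is why a 7B quant fits a 12 GB card with room to spare, and a 70B quant does not fit anything reasonable at home.</p>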
<hr />
<h2 id="heading-my-setup-in-tldr">My Setup in tl;dr</h2>
<ul>
<li><p>Homelab: OptiPlex i3-7100T, CPU only, quantized models. Runs the always-on stuff: Jarvis-style control, finance RAG, FastMCP monitors that ping me on Discord.</p>
</li>
<li><p>Laptop: i5 12th gen + RTX 4050 6 GB. Handles heavier chat and image work with Stability Matrix + ComfyUI, and LM Studio for local chat.</p>
</li>
</ul>
<p>Homelab = reliable background. Laptop = GPU playground.</p>
<hr />
<h2 id="heading-cloud-vs-local-the-reality-check">Cloud vs Local: The Reality Check</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Thing</th><th>Cloud APIs (OpenAI / Azure)</th><th>Self-Hosted LLMs</th></tr>
</thead>
<tbody>
<tr>
<td>Setup</td><td>Call an API and you’re done</td><td>Get ready to fight drivers and config files</td></tr>
<tr>
<td>Models</td><td>GPT-4, GPT-4o, all the shiny toys</td><td>Mostly open source like Mistral or LLaMA</td></tr>
<tr>
<td>Latency</td><td>Pretty low, but internet-dependent</td><td>Can be higher, especially CPU-only</td></tr>
<tr>
<td>Cost</td><td>Pay per token, sometimes feels like highway robbery</td><td>One time hardware cost, then just power bills</td></tr>
<tr>
<td>Privacy</td><td>Data leaves your network</td><td>Data never leaves your machine</td></tr>
<tr>
<td>Control</td><td>You tweak a few parameters at best</td><td>Full control: quantization, caching, fine tuning</td></tr>
<tr>
<td>Scaling</td><td>Basically infinite</td><td>Limited to what’s inside your case</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-lessons-ive-learned">Lessons I’ve Learned</h2>
<ul>
<li><p>Don’t try to run the biggest model first. Start with a 2B or 7B model and see what happens.</p>
</li>
<li><p>Quantization is your friend. It’s basically magic for smaller hardware.</p>
</li>
<li><p>The real power comes when you connect models to things. Scripts, dashboards, automations… that’s where it feels useful.</p>
</li>
<li><p>Expect things to break. You’ll see hallucinations, weird limits, maybe even kernel panics if you get lucky.</p>
</li>
<li><p>The open source scene moves ridiculously fast. A year ago Mistral didn’t even exist, now it’s everywhere.</p>
</li>
</ul>
<hr />
<h2 id="heading-what-i-want-to-try-next">What I Want to Try Next</h2>
<ul>
<li><p>Multimodal models that can handle both text and images.</p>
</li>
<li><p>A hybrid setup where I keep local models for everyday use but call the cloud for really heavy lifting.</p>
</li>
<li><p>Fine tuning on my own data so my assistant understands my configs and logs without me explaining every time.</p>
</li>
<li><p>Adding a proper GPU node in the homelab so I don’t have to lean on my laptop as much. [long shot.. I’ll be spending that money elsewhere]</p>
</li>
</ul>
<hr />
<h2 id="heading-how-to-get-a-local-llm-running-in-10-minutes">How to Get a Local LLM Running in 10 Minutes</h2>
<p>You have two easy paths. Pick your vibe.</p>
<h3 id="heading-option-a-click-and-go-with-lm-studio">Option A: Click-and-go with LM Studio</h3>
<ol>
<li><p>Install LM Studio. Grab the installer for your OS from the official site.</p>
</li>
<li><p>Download a model. Open LM Studio and use the Discover tab to fetch something like Mistral 7B, Qwen, or Gemma.</p>
</li>
<li><p>Chat. Hit New Chat, pick the model, and talk to your computer like it owes you answers.</p>
</li>
</ol>
<p>Why this path? Zero terminal work, fast feedback, built-in model browser. Great for laptops and first-timers.</p>
<h3 id="heading-option-b-terminal-friendly-with-ollama">Option B: Terminal-friendly with Ollama</h3>
<ol>
<li><p>Install Ollama. The easiest way is their one-liner:</p>
<pre><code class="lang-bash">curl -fsSL https://ollama.com/install.sh | sh</code></pre>
<p> On Fedora you can even:</p>
<pre><code class="lang-bash">sudo dnf install ollama</code></pre>
</li>
<li><p>Pull and run a model. For a solid starter:</p>
<pre><code class="lang-bash">ollama run mistral</code></pre>
<p> The first run downloads the weights, then drops you in an interactive prompt.</p>
</li>
<li><p>Use it from apps. Many local tools can point to the Ollama endpoint. If you know LangChain or LlamaIndex, you can wire it up in a few lines.</p>
</li>
</ol>
<p>Why this path? Scriptable, container-friendly, and good for homelabs.</p>
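<p>To show what “pointing at the Ollama endpoint” looks like, here’s a minimal Python client using only the standard library. It assumes Ollama’s default port (11434) and its <code>/api/generate</code> route; <code>ask</code> and <code>build_payload</code> are just my names for the helpers:</p>

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Assemble the JSON body /api/generate expects (stream=False for a single reply)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With Ollama running: reply = ask("mistral", "Summarize what a homelab is in one line.")
```

<p>Tools like LangChain and LlamaIndex wrap essentially this call for you, with streaming and retries on top.</p>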
<h3 id="heading-bonus-image-generation-with-stability-matrix-comfyui">Bonus: Image generation with Stability Matrix + ComfyUI</h3>
<p>If you want images too:</p>
<ol>
<li><p>ComfyUI core. Install by cloning the repo and installing dependencies, then run <code>python main.py</code>.</p>
</li>
<li><p>Quality-of-life. Add <strong>ComfyUI-Manager</strong> to install and manage custom nodes from inside ComfyUI.</p>
</li>
<li><p>Use Stability Matrix as the front end and manager. It streamlines ComfyUI setup and running workflows so you spend less time chasing missing nodes and more time making cool images.</p>
</li>
</ol>
<hr />
<h3 id="heading-tiny-gotchas-that-save-hours">Tiny gotchas that save hours</h3>
<ul>
<li><p>If a model won’t load, try a smaller one or a more aggressive quantized build.</p>
</li>
<li><p>Keep an eye on VRAM usage. 7B is comfy on 8 to 12 GB, 13B prefers 12 to 16 GB. If you’re on CPU, expect slower tokens.</p>
</li>
<li><p>Don’t benchmark on first run. Caches warm up, downloads finish, and everything speeds up a bit after.</p>
</li>
</ul>
<p>That’s it. You can be local-first by dinner and bragging about it by dessert.</p>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Running LLMs locally isn’t about replacing GPT-4. That’s still out of reach for most of us.</p>
<p>It’s about ownership. My data stays with me, my costs are predictable, and I get to experiment however I want.</p>
<p>For me this setup has turned into a mix of practical tools and just plain fun. I’ve got a Jarvis-like assistant running in the background, a finance bot that can actually read my own files, monitoring agents that bug me on Discord, and on the laptop I can spin up Stable Diffusion for image generation whenever I feel like it.</p>
<p>It’s not perfect, but it’s mine. And that’s kind of the point.</p>
]]></content:encoded></item><item><title><![CDATA[My Engineering Operating Manual - Patterns, Rituals, and Receipts]]></title><description><![CDATA[These practices work for me today and will evolve. A lot of this came from seniors who let me shadow their thinking, handed me better questions, and saved me from clever mistakes.

This post isn’t my stack (that’s already written). It’s the stuff I r...]]></description><link>https://blog.sanskarjaiswal.dev/my-engineering-operating-manual-patterns-rituals-and-receipts</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/my-engineering-operating-manual-patterns-rituals-and-receipts</guid><category><![CDATA[operating-manual]]></category><category><![CDATA[engineering]]></category><category><![CDATA[Homelab]]></category><category><![CDATA[SelfHosting]]></category><category><![CDATA[fastmcp]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Thu, 04 Sep 2025 20:54:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757019165783/db7c33c0-168b-43a4-b482-9ffa7544fd08.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>These practices work for me today and will evolve. A lot of this came from seniors who let me shadow their thinking, handed me better questions, and saved me from clever mistakes.</p>
</blockquote>
<p>This post isn’t my stack (that’s already written). It’s the stuff I reach for every week: patterns, tiny rituals, and templates you can borrow, and that I’ll keep refining as I grow.</p>
<blockquote>
<p>If you want wiring and app lists, read these and come back:</p>
<ul>
<li><p><a target="_blank" href="https://blog.sanskarjaiswal.dev/i-made-my-homelab-talk-to-me-using-claude-and-fastmcp">I Made My Homelab Talk to Me Using Claude and FastMCP</a></p>
</li>
<li><p><a target="_blank" href="https://blog.sanskarjaiswal.dev/my-self-hosting-stack-everything-i-run-and-how-it-all-connects">My Self-Hosting Stack: Everything I Run, and How It All Connects</a></p>
</li>
<li><p><a target="_blank" href="https://blog.sanskarjaiswal.dev/the-real-story-of-self-hosting-why-i-love-it-and-sometimes-hate-it">The Real Story of Self-Hosting: Why I Love It (and Sometimes Hate It)</a></p>
</li>
</ul>
</blockquote>
<hr />
<h2 id="heading-the-line-that-stuck">The line that stuck</h2>
<blockquote>
<p>“Tony Stark built this in a cave with a box of scraps.”</p>
</blockquote>
<p>It’s the person, not the tools. That’s been my lens since school: never top-3 in marks, but the kid teachers called for the weird projects. My first end-to-end build was an Arduino ambient light for my monitor (my first GitHub repo, a bit over five years old now, and still running unchanged). Make it once; make it last.</p>
<hr />
<h2 id="heading-operating-manual-vows-not-vibes">Operating manual (vows, not vibes)</h2>
<ul>
<li><p><strong>Receipt-driven automation.</strong> Every action leaves a trail (timestamp, actor, inputs, outcome). If it ran, I can prove it.</p>
</li>
<li><p><strong>Reversible by design.</strong> Feature flags, dry runs, versioned configs, parametric parts. No one-way doors.</p>
</li>
<li><p><strong>Observability before cleverness.</strong> A plain counter with a timestamp beats a fancy graph I don’t trust.</p>
</li>
<li><p><strong>Guardrails over heroics.</strong> Checklists and preflight beat “ninja fixes” in prod. Learned the hard way.</p>
</li>
<li><p><strong>Dynamic by default.</strong> Parameters &gt; literals. Today’s edge case is tomorrow’s requirement.</p>
</li>
<li><p><strong>Backups that restore.</strong> Untested backups are fiction.</p>
</li>
<li><p><strong>Future-me is a teammate.</strong> If I can’t finish now, I leave a clean path for him.</p>
</li>
</ul>
<blockquote>
<p>Note: I’m documenting what’s working <em>now</em>. If a senior shows me a safer/faster path, I’ll adopt it and update my practice.</p>
</blockquote>
<hr />
<h2 id="heading-the-45-minute-debug-ritual">The 45-minute debug ritual</h2>
<p><strong>Goal:</strong> get from <em>symptom</em> → <em>measured cause</em> or a clean rollback.</p>
<ol>
<li><p><strong>Reproduce (≤10 min).</strong> Smallest input that fails. Write it down.</p>
</li>
<li><p><strong>Instrument (≤10 min).</strong> Add a counter/log near the suspected seam. If I can’t measure it, I’m guessing.</p>
</li>
<li><p><strong>Isolate (≤10 min).</strong> Toggle one variable at a time (flag, env, route).</p>
</li>
<li><p><strong>Decide (≤5 min).</strong> Fix now if &lt;10 min; else rollback with a note.</p>
</li>
<li><p><strong>Receipt (≤10 min).</strong> Post a short “what/why/where” with the log line.</p>
</li>
</ol>
<p><strong>Paste-in template:</strong></p>
<pre><code class="lang-plaintext">Issue: &lt;one sentence&gt;
Smallest repro: &lt;command/url/config&gt;
Suspected seam: &lt;component or part&gt;
Evidence: &lt;log line or metric delta&gt;
Decision: &lt;fix/rollback/defer&gt; (why)
Receipt: &lt;paste link or id&gt;
Follow-ups: &lt;one or two&gt;
</code></pre>
<h2 id="heading-friction-log-tiny-habit-big-payoff">Friction log (tiny habit, big payoff)</h2>
<p>Every time something feels slower than it should, I jot one line. Review weekly; fix two items.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Date</td><td>Friction</td><td>Cost</td><td>Choke point</td><td>Next step</td></tr>
</thead>
<tbody>
<tr>
<td>2025-09-02</td><td>Blog routing mismatch</td><td>1h context</td><td>How hashnode handles certs (Oversight on my end)</td><td>add preflight check</td></tr>
<tr>
<td>2025-08-28</td><td>Remote reachability uncertainty</td><td>mental tax</td><td>network edge</td><td>timestamped “reachable?” check</td></tr>
</tbody>
</table>
</div><p>Why it works: this turns annoyance into a queue, not a mood.</p>
<h2 id="heading-receipts-the-smallest-useful-webhook">Receipts: the smallest useful webhook</h2>
<p>Boring JSON I can grep later:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"event"</span>: <span class="hljs-string">"container.restart"</span>,
  <span class="hljs-attr">"host"</span>: <span class="hljs-string">"friday"</span>,
  <span class="hljs-attr">"service"</span>: <span class="hljs-string">"immich"</span>,
  <span class="hljs-attr">"requested_by"</span>: <span class="hljs-string">"cli/sanskar"</span>,
  <span class="hljs-attr">"correlation_id"</span>: <span class="hljs-string">"2025-09-05T12:04:33Z-immich-restart"</span>,
  <span class="hljs-attr">"status"</span>: <span class="hljs-string">"ok"</span>,
  <span class="hljs-attr">"duration_ms"</span>: <span class="hljs-number">1432</span>
}
</code></pre>
<p>That <code>correlation_id</code> shows up in logs, the chat message, and (if needed) Grafana. No arguments later.</p>
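<p>A minimal sketch of how one of these gets assembled and shipped; the webhook URL is a placeholder, but the field names match the JSON above:</p>

```python
import json
import urllib.request
from datetime import datetime, timezone

WEBHOOK_URL = "http://example.internal/receipts"  # placeholder: point at your own collector

def make_receipt(event: str, host: str, service: str, requested_by: str,
                 status: str, duration_ms: int) -> dict:
    """Build a receipt; the correlation_id encodes timestamp, service, and action."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    action = event.split(".")[-1]
    return {
        "event": event,
        "host": host,
        "service": service,
        "requested_by": requested_by,
        "correlation_id": f"{ts}-{service}-{action}",
        "status": status,
        "duration_ms": duration_ms,
    }

def post_receipt(receipt: dict) -> None:
    """Ship the receipt as JSON so it lands in logs, chat, and dashboards alike."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(receipt).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# post_receipt(make_receipt("container.restart", "friday", "immich", "cli/sanskar", "ok", 1432))
```
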
<hr />
<h2 id="heading-build-vs-buy-vs-self-host-decide-in-5-minutes">Build vs Buy vs Self-host (decide in 5 minutes)</h2>
<p>Score each cell 1–5 (weight if you like), then pick the highest column <strong>total</strong>, not the loudest single number.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Factor</td><td>Build</td><td>Buy</td><td>Self-host</td></tr>
</thead>
<tbody>
<tr>
<td>Time-to-value</td><td></td><td></td><td></td></tr>
<tr>
<td>Control/lock-in</td><td></td><td></td><td></td></tr>
<tr>
<td>Learning value</td><td></td><td></td><td></td></tr>
<tr>
<td>Ongoing effort</td><td></td><td></td><td></td></tr>
<tr>
<td>Failure blast</td><td></td><td></td><td></td></tr>
<tr>
<td>Cost (12 months)</td><td></td><td></td><td></td></tr>
</tbody>
</table>
</div><p><strong>Rule of thumb:</strong> if <code>learning × control</code> doesn’t beat <code>time-to-value × effort</code>, don’t build.</p>
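<p>Scored out, the decision is trivial to compute. The numbers below are hypothetical scores for one imaginary decision, not a recommendation:</p>

```python
# Hypothetical 1-5 scores, in the table's row order:
# time-to-value, control/lock-in, learning, ongoing effort, failure blast, cost (12 mo)
scores = {
    "build":     [2, 5, 5, 2, 3, 4],
    "buy":       [5, 2, 1, 5, 4, 2],
    "self_host": [3, 4, 4, 3, 3, 3],
}

totals = {option: sum(vals) for option, vals in scores.items()}
winner = max(totals, key=totals.get)  # here: "build" at 21, by a hair

def should_build(learning: int, control: int, time_to_value: int, effort: int) -> bool:
    """The rule of thumb as code: build only if learning x control wins."""
    return learning * control > time_to_value * effort
```
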
<hr />
<h2 id="heading-parametric-parts-that-actually-fit-hello-3d-printer">Parametric parts that actually fit (Hello! 3D Printer)</h2>
<ul>
<li><p><strong>Datum first.</strong> Pick the surface that <em>must</em> align, reference everything from it.</p>
</li>
<li><p><strong>Clearance defaults.</strong> Start with +0.3–0.5 mm on FDM fits; adjust after one print.</p>
</li>
<li><p><strong>Stress lines.</strong> Add fillets at inside corners; avoid layer-line shear on clamp tabs.</p>
</li>
<li><p><strong>Swap-cost low.</strong> One variable per critical dimension; no magic numbers.</p>
</li>
<li><p><strong>Test as a draft.</strong> First print is a measurement tool, not a masterpiece.</p>
</li>
</ul>
<p>This is how the <strong>door-frame projector mount</strong> happened (rented apartment, no drilling). Sketch → parametric → print → tweak → done.<br />Pro tip I still haven’t taken: <strong>buy a digital vernier caliper.</strong></p>
<h2 id="heading-boring-checks-i-value-calm-gt-clever">Boring checks I value (calm &gt; clever)</h2>
<ul>
<li><p><strong>“Is the server reachable remotely?”</strong> with a timestamp. If that isn’t green, nothing else matters.</p>
</li>
<li><p><strong>“Last backup restore validated?”</strong> yes/no + date.</p>
</li>
<li><p><strong>“What changed?”</strong> 24-hour diff of container images and configs.</p>
</li>
</ul>
<p>Tools I don’t want to give up: <strong>Jellyfin</strong> (no rental brain for media) and <strong>Immich</strong> (memories stay near me).</p>
<h2 id="heading-what-im-building-toward-no-hype-just-direction">What I’m building toward (no hype, just direction)</h2>
<ul>
<li><p><strong>Lightweight local LLMs</strong> that act with receipts (grounded tools, auditable logs).</p>
</li>
<li><p><strong>More additive manufacturing,</strong> fewer zip-ties, publish the parametric files when they’re solid.</p>
</li>
<li><p><strong>Work upskilling</strong> with the same honesty I use at home: observability first, automation second.</p>
</li>
<li><p><strong>Restore-day in a box:</strong> clean hardware → one command → verified services.</p>
</li>
</ul>
<h2 id="heading-acknowledgments">Acknowledgments</h2>
<p>Thanks to the seniors and teammates who reviewed my checklists, asked the annoying-but-right questions, and taught me to prefer guardrails over heroics. Any good ideas here are borrowed generously, the mistakes are mine.</p>
]]></content:encoded></item><item><title><![CDATA[I Made My Homelab Talk to Me Using Claude and FastMCP]]></title><description><![CDATA[Most of us build homelabs to tinker, automate, and take control of our infrastructure. But somewhere between Docker containers, backups, and uptime monitoring, it becomes a lot to keep track of. I didn’t want to SSH into my server every time I needed...]]></description><link>https://blog.sanskarjaiswal.dev/i-made-my-homelab-talk-to-me-using-claude-and-fastmcp</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/i-made-my-homelab-talk-to-me-using-claude-and-fastmcp</guid><category><![CDATA[fastmcp]]></category><category><![CDATA[claude.ai]]></category><category><![CDATA[Model Context Protocol]]></category><category><![CDATA[mcp server]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Sat, 28 Jun 2025 18:43:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751136102557/55969db1-9daf-457d-a290-fb7ab69eadd0.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most of us build homelabs to tinker, automate, and take control of our infrastructure. But somewhere between Docker containers, backups, and uptime monitoring, it becomes a lot to keep track of. I didn’t want to SSH into my server every time I needed a quick answer like “How much space is left on my drive?” or “Is Tailscale still running?”</p>
<p>So I built something better. Now I just <strong>ask Claude</strong>, and it tells me.</p>
<p>This post walks through how I wired up my homelab using <a target="_blank" href="https://gofastmcp.com">FastMCP</a> and <a target="_blank" href="https://claude.ai">Claude Desktop</a>, letting me run system queries through natural language and get intelligent responses from my own infrastructure.</p>
<hr />
<h2 id="heading-what-is-fastmcp">What Is FastMCP?</h2>
<p>If you're not familiar with it, <strong>FastMCP</strong> is a Python framework that lets you expose tools, resources, and prompts to LLMs via the Model Context Protocol (MCP). That means you can define Python functions, register them as tools with a decorator, and suddenly they’re callable from Claude, ChatGPT, or even your own HTTP clients.</p>
<p>Think of it as:</p>
<ul>
<li><p><code>FastAPI</code> for LLMs</p>
</li>
<li><p>but typed</p>
</li>
<li><p>and purpose-built for multi-tool interactions</p>
</li>
</ul>
<hr />
<h2 id="heading-what-i-wanted-to-do">What I Wanted to Do</h2>
<p>Here’s what I was aiming for:</p>
<ul>
<li><p>Ask Claude questions like “What’s the disk usage?” or “Are my containers healthy?”</p>
</li>
<li><p>Run system-level commands via Python securely</p>
</li>
<li><p>Keep everything running locally or over Tailscale, no public exposure</p>
</li>
<li><p>Build it once and just keep extending it</p>
</li>
</ul>
<hr />
<h2 id="heading-step-1-writing-an-mcp-server">Step 1: Writing an MCP Server</h2>
<p>This is where the magic starts. Here’s a basic MCP server using FastMCP:</p>
<pre><code class="lang-python"># homelab_server.py

from fastmcp import FastMCP
import psutil
import subprocess

mcp = FastMCP("Homelab")

@mcp.tool()
def disk_usage() -&gt; str:
    """Report root filesystem usage."""
    usage = psutil.disk_usage('/')
    return f"{usage.percent}% used — {usage.used // (1024**3)}GB of {usage.total // (1024**3)}GB"

@mcp.tool()
def tailscale_status() -&gt; str:
    """Return the output of `tailscale status`."""
    result = subprocess.run(['tailscale', 'status'], capture_output=True, text=True)
    return result.stdout.strip()

@mcp.tool()
def docker_containers() -&gt; str:
    """List running containers with their current status."""
    result = subprocess.run(['docker', 'ps', '--format', '{{.Names}}: {{.Status}}'], capture_output=True, text=True)
    return result.stdout.strip()

if __name__ == "__main__":
    # SSE transport so clients on the tailnet can reach it over HTTP
    mcp.run(transport="sse", host="0.0.0.0", port=7531)
</code></pre>
<p>You can run this locally with:</p>
<pre><code class="lang-bash">python homelab_server.py
</code></pre>
<p>Now your homelab has a voice.</p>
<hr />
<h2 id="heading-step-2-making-it-reachable">Step 2: Making It Reachable</h2>
<p>You don’t need to open ports to the world. I already run <strong>Tailscale</strong>, so I just connected my laptop and server to the same private network. That gave me a private IP like <code>100.x.x.x</code>, and I used that to point Claude Desktop to my MCP server.</p>
<hr />
<h2 id="heading-step-3-connecting-claude-desktop">Step 3: Connecting Claude Desktop</h2>
<p>Claude Desktop (with plugin support) makes this super easy.</p>
<ol>
<li><p>Open Claude → <code>Plugins</code> → <code>Add MCP Server</code></p>
</li>
<li><p>Add your MCP server URL<br /> Example: <a target="_blank" href="http://100.x.x.x:7531"><code>http://100.x.x.x:7531</code></a></p>
</li>
<li><p>Claude will auto-detect the available tools</p>
</li>
</ol>
<p>Now I can type:</p>
<blockquote>
<p>“Call the <code>disk_usage</code> tool on the homelab server”</p>
</blockquote>
<p>Or even just:</p>
<blockquote>
<p>“How much disk space do I have left?”</p>
</blockquote>
<p>Claude figures out the right tool to call, runs it, and replies with a summary.</p>
<hr />
<h2 id="heading-bonus-chaining-output-with-prompts">Bonus: Chaining Output with Prompts</h2>
<p>FastMCP also supports <em>prompt templates</em>, which means I can wrap raw command output in a summarization prompt and have Claude generate human-friendly summaries. That’s great for things like:</p>
<ul>
<li><p>Failed systemd services</p>
</li>
<li><p>Health reports</p>
</li>
<li><p>ZFS snapshots or Btrfs status</p>
</li>
</ul>
<p>You can even create tools that return JSON and let Claude reason over it.</p>
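<p>For example, a hypothetical <code>system_status</code> tool can return structured JSON instead of prose, which gives the model something unambiguous to reason over:</p>

```python
import json
import shutil

def system_status() -> str:
    """Return machine-readable status; register it as a tool like the others."""
    disk = shutil.disk_usage("/")
    return json.dumps({
        "disk": {
            "total_gb": disk.total // (1024 ** 3),
            "used_gb": disk.used // (1024 ** 3),
            "percent": round(disk.used / disk.total * 100, 1),
        }
    })
```

<p>Claude can then compare the numbers, spot a nearly-full disk, and phrase the warning itself.</p>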
<hr />
<h2 id="heading-why-this-is-fun-and-actually-useful">Why This Is Fun (and Actually Useful)</h2>
<p>This setup saves me time <strong>and</strong> gives me a more natural way to interact with my homelab. I don’t have to mentally context-switch into "sysadmin mode" every time I want to check logs or disk stats.</p>
<p>It also opens the door to more advanced use cases:</p>
<ul>
<li><p>Triggering Ansible playbooks via tools</p>
</li>
<li><p>Running backups and summarizing results</p>
</li>
<li><p>Fetching metrics from Grafana or Prometheus</p>
</li>
<li><p>Acting as a gateway for multiple machines (via <a target="_blank" href="https://gofastmcp.com/servers/composition.md">server composition</a>)</p>
</li>
</ul>
<hr />
<h2 id="heading-future-plans">Future Plans</h2>
<p>Here’s what I want to add next:</p>
<ul>
<li><p>A <code>/status</code> resource that returns full server health in JSON</p>
</li>
<li><p>A prompt-based tool that summarizes <code>journalctl</code> logs</p>
</li>
<li><p>A Discord webhook client that uses FastMCP to send notifications</p>
</li>
<li><p>Claude-triggered toolchains: ask one question, get multiple tools executed in sequence</p>
</li>
</ul>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>This isn’t just a cool hack. It’s the start of something much more interactive and intuitive. We’re entering a world where large language models can be more than chatbots; they can be <em>interfaces</em> to real systems.</p>
<p>If you’ve got a homelab and a few Python skills, I’d highly recommend trying out FastMCP. You’ll be surprised how far a simple tool can go when you give it a little context.</p>
]]></content:encoded></item><item><title><![CDATA[My Self-Hosting Stack: Everything I Run, and How It All Connects]]></title><description><![CDATA[When I first wrote about the emotional rollercoaster of self-hosting, I focused on the why, the motivations, frustrations, and addictive wins that come with running your own infrastructure. But one of the most common questions from friends I got was:...]]></description><link>https://blog.sanskarjaiswal.dev/my-self-hosting-stack-everything-i-run-and-how-it-all-connects</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/my-self-hosting-stack-everything-i-run-and-how-it-all-connects</guid><category><![CDATA[self-hosted]]></category><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Sun, 22 Jun 2025 16:50:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ISG-rUel0Uw/upload/274bd70b240b7b463696b5861d195460.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I first wrote about the emotional rollercoaster of self-hosting, I focused on the why, the motivations, frustrations, and addictive wins that come with running your own infrastructure. But one of the most common questions from friends I got was: <strong><em>“Okay, but what do you actually self-host?”</em></strong></p>
<p>This post is my answer. It’s a complete breakdown of the apps I run, how they connect, what runs in Docker, what runs natively, and how I try to keep everything just stable enough to trust with my digital life.</p>
<h2 id="heading-hardware-overview">Hardware Overview</h2>
<p>Before diving into apps, here's the gear powering it all.</p>
<p><strong>Server:</strong> Dell OptiPlex 3050 Micro<br />i3-7100T, 128GB NVMe boot + 1TB SSD for <code>/data</code></p>
<p><strong>External Storage:</strong> 4TB Seagate Expansion Drive<br />Mounted as <code>/mnt/dumptruck</code>, used for media, downloads, and backups</p>
<p><strong>Network:</strong> Tailscale for remote access, Samba for local file sharing</p>
<p>Everything runs on <strong>Fedora Server</strong>, headless.</p>
<h2 id="heading-core-stack">Core Stack</h2>
<p>These are the foundational apps, things I rely on daily.</p>
<h3 id="heading-nextcloud">Nextcloud</h3>
<p><strong>Purpose:</strong> File sync, calendar, contacts, notes<br /><strong>Setup:</strong> Runs in Docker, <code>/data/nextcloud</code> mounted for storage<br /><strong>Extras:</strong> Integrates with Photo backup from iOS via iCloudpd, accessible over Tailscale</p>
<h3 id="heading-paperless">Paperless</h3>
<p><strong>Purpose:</strong> Document archiving (PDFs, bills, IDs, etc.)<br /><strong>Consume Folder:</strong> Default, with OCR and auto-tagging enabled<br /><strong>Setup:</strong> <code>/data/paperless</code> as persistent volume, runs in Docker<br /><strong>Note:</strong> Planning to add rules and a Discord notification bot</p>
<h3 id="heading-immich">Immich</h3>
<p><strong>Purpose:</strong> Private Google Photos replacement<br /><strong>Data Location:</strong> <code>/data/immich</code><br /><strong>Setup:</strong> Docker Compose; iOS and Android apps used for uploads</p>
<h2 id="heading-backup-redundancy">Backup + Redundancy</h2>
<p>Backups are the difference between peace of mind and total panic. Here's how I manage mine.</p>
<h3 id="heading-cloud-backups">Cloud Backups</h3>
<p>Google Drive and Mega are mounted via containers.<br />Backups are copied regularly to <code>/mnt/dumptruck/Backups/</code> using <code>rsync</code>.<br />Encrypted <code>.tar.gz</code> files are stored offsite.</p>
<h3 id="heading-snapshot-scripts">Snapshot Scripts</h3>
<p>I use ZFS-like manual snapshots of important folders like Nextcloud and Paperless.<br />I plan to automate snapshot creation and verification soon.</p>
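<p>Until that automation lands, here’s the shape of what I have in mind: timestamped tarballs plus a cheap “does it even open” check. The paths are from my setup; the logic is an illustrative sketch, not battle-tested:</p>

```python
import tarfile
from datetime import datetime
from pathlib import Path

DEST = Path("/mnt/dumptruck/Backups")  # where snapshots land on the 4TB drive

def snapshot(src: str, dest: Path = DEST) -> Path:
    """Create a timestamped .tar.gz of one folder and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out = dest / f"{Path(src).name}-{stamp}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        tar.add(src, arcname=Path(src).name)
    return out

def verify(archive: Path) -> bool:
    """Cheap verification: the archive opens and lists at least one member."""
    with tarfile.open(archive, "r:gz") as tar:
        return len(tar.getnames()) > 0

# snapshot("/data/nextcloud"); snapshot("/data/paperless")
```
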
<h2 id="heading-smart-home-and-monitoring">Smart Home and Monitoring</h2>
<p>This is a pretty new space for me, but it's where things get fun and occasionally chaotic.</p>
<h3 id="heading-home-assistant">Home Assistant</h3>
<p><strong>Purpose:</strong> Control smart lights, routines, sensors<br /><strong>Setup:</strong> Native install under <code>/data/homeassistant</code><br /><strong>Devices:</strong> Two smart bulbs, ambient lighting via Arduino and Prismatik<br /><strong>Access:</strong> Exposed on local IP</p>
<h3 id="heading-homebridge">Homebridge</h3>
<p><strong>Purpose:</strong> Bridge non-HomeKit devices to the Apple ecosystem<br /><strong>Setup:</strong> Native install (not Docker)<br /><strong>Status:</strong> In active use with iPhone and iPad</p>
<h3 id="heading-tailscale">Tailscale</h3>
<p><strong>Purpose:</strong> Private remote access across phone, laptop, and tablet<br /><strong>Bonus:</strong> Simplifies SSH, Nextcloud access, and photo syncing</p>
<h2 id="heading-media-and-downloads">Media and Downloads</h2>
<h3 id="heading-jellyfin">Jellyfin</h3>
<p><strong>Purpose:</strong> Local movie and TV streaming<br /><strong>Storage:</strong> Reads from <code>/mnt/dumptruck/Media</code><br /><strong>Setup:</strong> Docker</p>
<h3 id="heading-torrent-stack-qbittorrent-prowlarr-bazarr">Torrent Stack (qBittorrent, Prowlarr, Bazarr)</h3>
<p><strong>Purpose:</strong> Automated media management<br /><strong>Tools:</strong></p>
<ul>
<li><p><code>qBittorrent</code> for downloads</p>
</li>
<li><p><code>Prowlarr</code> as indexer manager</p>
</li>
<li><p><code>Bazarr</code> for subtitles<br />  <strong>Storage:</strong> Downloads to <code>/mnt/dumptruck/Downloads</code></p>
</li>
</ul>
<h2 id="heading-monitoring-and-alerts">Monitoring and Alerts</h2>
<h3 id="heading-discord-bot">Discord Bot</h3>
<p><strong>Purpose:</strong> Notifies on backup success, failure, and other alerts<br /><strong>Setup:</strong> Shell scripts call a webhook with status info</p>
<h3 id="heading-grafana-experimental">Grafana (experimental)</h3>
<p><strong>Purpose:</strong> Dashboard for uptime, disk usage, etc.<br /><strong>Status:</strong> Used occasionally, but not core to daily operations</p>
<h2 id="heading-how-it-all-connects">How It All Connects</h2>
<p>Everything is centered around <code>/data</code> for persistent volumes.<br />Media and bulk data are offloaded to the 4TB HDD.<br />Tailscale acts as the glue between all mobile and remote devices.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750610927713/e4da9e33-6eec-4eeb-8304-44e394c8969d.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-lessons-from-this-stack">Lessons from This Stack</h2>
<ul>
<li><p>Keep volumes organized under <code>/data</code> because it makes backups easier.</p>
</li>
<li><p>Use Tailscale early since it saves days of port-forwarding frustration.</p>
</li>
<li><p>Don’t over-optimize upfront. Get it working first, then make it pretty.</p>
</li>
<li><p>Backups are real work. Automate as much as possible and test recovery.</p>
</li>
</ul>
<h2 id="heading-whats-next">What's Next</h2>
<ul>
<li><p>Setting up snapshot validation</p>
</li>
<li><p>Integrating Paperless with tagging and alert rules</p>
</li>
<li><p>Maybe running a local LLM for notes or search indexing</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>This stack isn’t perfect. It’s full of small decisions, trial-and-error, and weekend experiments. But it works. And most importantly, it gives me back control, ownership, and a sense of craft.</p>
]]></content:encoded></item><item><title><![CDATA[The Real Story of Self-Hosting: Why I Love It (and Sometimes Hate It)]]></title><description><![CDATA[Thinking About Self-Hosting? Here’s What You Should Know
The first time I considered self-hosting, it was out of pure frustration. I was tired of Google Photos suddenly charging for storage, cloud services constantly shifting features behind paywalls...]]></description><link>https://blog.sanskarjaiswal.dev/the-real-story-of-self-hosting-why-i-love-it-and-sometimes-hate-it</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/the-real-story-of-self-hosting-why-i-love-it-and-sometimes-hate-it</guid><category><![CDATA[SelfHosting]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Sun, 25 May 2025 18:41:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748201036173/6cfdf937-f964-40db-9c9b-de834cf26999.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-thinking-about-self-hosting-heres-what-you-should-know">Thinking About Self-Hosting? Here’s What You Should Know</h2>
<p>The first time I considered self-hosting, it was out of pure frustration. I was tired of Google Photos suddenly charging for storage, cloud services constantly shifting features behind paywalls, and the creeping sense that my data was less and less in my control. Like many others, I started researching ways to run my own cloud and apps on hardware I owned. The online guides all made it look so simple: just spin up a server, install a few Docker containers, and you’re free from the cloud forever. It sounded perfect. And honestly, getting started <em>was</em> fun and satisfying. But very quickly, I realized there’s a lot that the glossy tutorials don’t mention.</p>
<h2 id="heading-why-people-self-host-and-why-i-stuck-with-it">Why People Self-Host (and Why I Stuck With It)</h2>
<p>At first, the appeal was all about control. I liked the idea of being able to access my files from anywhere and not having to worry about companies reading or selling my data. Once I had my own Nextcloud instance running, I started seeing just how much I could do: cloud storage, calendar sync, even music streaming. It felt empowering.</p>
<p>There’s also a sense of pride that comes with it. When I tell friends or family that our photos and documents are backed up on a server I built and maintain, there’s always a bit of surprise and curiosity. And the technical side is a genuine draw. Nothing accelerates your learning about Linux, networking, or automation quite like trying to set everything up yourself, especially when things don’t work the first (or fifth) time.</p>
<h2 id="heading-what-you-can-actually-self-host">What You Can Actually Self-Host</h2>
<p>The possibilities are almost endless, and that’s part of the fun. Over the last year, I’ve tried running all sorts of apps: Nextcloud for files, Jellyfin to stream old movies from my hard drive, Home Assistant for controlling smart lights and automations, Grafana for real-time dashboards, and even a Discord bot that pings me if my backups fail. Some things stuck, some didn’t, but every experiment taught me something new.</p>
<p>One of my favorite moments was setting up automatic photo backups from my phone to my own server. That feeling of independence, with no subscriptions, no limits, and no worries about who was looking at my photos, was incredible. The best part? Knowing it was all running in the corner of my own room, on hardware I’d put together myself.</p>
<h2 id="heading-the-wow-moments-and-why-theyre-addictive">The “Wow” Moments (and Why They’re Addictive)</h2>
<p>There are moments in self-hosting that make you feel like a tech genius. I remember the first time I accessed my movie library from a friend’s house, just streaming a film directly from my home server like it was Netflix. Or the time my family needed a document urgently, and I pulled it up for them from my Paperless instance while we were on vacation. Small things, but they add up to a real sense of control and satisfaction.</p>
<p>Even just building dashboards in Grafana, or seeing a notification pop up in Discord that a backup succeeded, gives a little hit of pride. Those moments make all the troubleshooting worth it. They’re also addictive: once you solve one problem, you want to solve the next.</p>
<h2 id="heading-the-hidden-challenges-what-nobody-tells-you">The Hidden Challenges: What Nobody Tells You</h2>
<p>But let’s talk about the flip side. For every “wow” moment, there’s usually a “why is this broken?” night.</p>
<h3 id="heading-hardware-surprises"><strong>Hardware Surprises</strong></h3>
<p>Hardware is the first thing to catch you out. I’ve had drives die without warning, forcing me to scramble for a backup plan. I once spent a whole Saturday trying to move my setup from a 1TB SSD to a new 4TB external drive. What should have been a simple process turned into hours of wrestling with Docker volumes, mysterious file permissions, and more than a little self-doubt.</p>
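<p>If you ever attempt a similar migration, the core of it can be much simpler than my Saturday suggested: stop your containers first (e.g. <code>docker compose down</code>), copy the data in a way that preserves permissions, and verify the copy before you trust the new drive. Here’s a rough sketch; the paths are hypothetical, and for a big live dataset you’d likely reach for <code>rsync -a</code> instead of <code>cp</code>:</p>

```shell
# migrate_dir SRC DEST: copy a data directory to a new drive,
# preserving permissions, then verify before touching the old copy.
migrate_dir() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  # -a preserves permissions, ownership, and timestamps; "src/." copies
  # the directory's contents rather than nesting src inside dest.
  cp -a "$src/." "$dest/" || return 1
  # Compare both trees before deleting anything on the old drive.
  diff -r "$src" "$dest"
}
```

<p>Only once <code>diff -r</code> comes back clean do you repoint your Docker bind mounts (or the daemon’s data directory) at the new path and bring the containers back up. The permission wrestling usually comes from skipping the <code>-a</code> flag, which is exactly the mistake I made.</p>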
<h3 id="heading-network-headaches"><strong>Network Headaches</strong></h3>
<p>And then there’s the network. Port forwarding, which I’d barely heard of before, suddenly became my new nemesis. My ISP kept changing my public IP address, breaking remote access just as I thought I’d figured it out. At one point, after hours of tinkering, I realized my router itself was blocking half the ports I needed. It wasn’t until I discovered tools like Tailscale (which builds a private WireGuard-based VPN between your devices, no port forwarding required) that things started to make sense. But even then, every solution led to new things to learn.</p>
<h3 id="heading-the-joy-and-pain-of-software-updates"><strong>The Joy and Pain of Software Updates</strong></h3>
<p>Software updates are another mixed blessing. I’ve had a single <code>dnf upgrade</code> break my Nextcloud installation, leaving me with a sinking feeling as I googled error messages at 2 AM. Containers help, but not all apps play nicely together, and restoring from backup is the only time you discover if you’ve really been doing it right.</p>
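<p>One habit that has softened the update surprises for me: pin exact image tags instead of riding <code>:latest</code>, so a routine pull can’t silently jump a major version. A sketch of what that looks like in a Compose file (the tag shown is illustrative, not necessarily current):</p>

```yaml
# Pinning an exact tag means upgrades only happen when you change this
# line deliberately -- and you can roll back by changing it back.
services:
  nextcloud:
    image: nextcloud:28.0.4-apache   # pin an exact tag, not :latest
    volumes:
      - nextcloud_data:/var/www/html
    restart: unless-stopped
volumes:
  nextcloud_data:
```

<p>It doesn’t save you from a bad host-level <code>dnf upgrade</code>, but it does mean the app layer only moves when you tell it to.</p>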
<h3 id="heading-maintenance-realities"><strong>Maintenance Realities</strong></h3>
<p>And maintenance? It’s a time commitment, no matter what anyone says. Things break, usually when you least expect it. Family members notice downtime, and suddenly you’re on the hook for fixing the “home cloud” before dinner. I’ve learned the hard way to keep notes on what I’ve set up, because otherwise I forget my own steps a month later.</p>
<h2 id="heading-lessons-learned-and-practical-tips">Lessons Learned and Practical Tips</h2>
<p>If you’re thinking about self-hosting, my advice is to start simple. Pick one or two services you really want and get those running well before you branch out. Automate your backups and actually test restoring from them. Use containers when you can, because it makes upgrades and recovery so much easier. Monitoring is worth the setup, even if it’s just a basic “is this still running?” dashboard or a bot sending alerts.</p>
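<p>To make that advice concrete, here’s roughly the shape of the backup script I run from cron: archive, verify the archive is actually readable, and report either way. The paths and the <code>DISCORD_WEBHOOK</code> variable are placeholders for your own setup, and the JSON quoting is deliberately naive (fine for short status strings):</p>

```shell
# backup_and_verify SRC DEST: archive SRC into DEST, check the archive
# is readable, and report the result. Returns nonzero on failure.
backup_and_verify() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  archive="$dest/backup-$(date +%Y%m%d).tar.gz"
  if tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" \
     && tar -tzf "$archive" >/dev/null; then
    notify "backup OK: $archive"
  else
    notify "backup FAILED for $src"
    return 1
  fi
}

# notify MSG: post MSG to a Discord webhook if one is configured,
# and always print it so cron mail / logs capture it too.
notify() {
  if [ -n "${DISCORD_WEBHOOK:-}" ]; then
    curl -s -H 'Content-Type: application/json' \
         -d "{\"content\": \"$1\"}" "$DISCORD_WEBHOOK" >/dev/null
  fi
  echo "$1"
}
```

<p>The <code>tar -tzf</code> listing isn’t a full restore test, so I still do a real restore into a scratch directory every so often. But it catches the most common failure, a truncated or corrupt archive, on the same night it happens instead of the day you need the backup.</p>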
<p>And above all, expect to make mistakes. That’s part of the process. Every little disaster is just a lesson for next time, and you do get better at anticipating what can go wrong.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Self-hosting is both empowering and humbling. When everything is running smoothly, there’s nothing quite like it. But the problems are real, and you’ll spend as much time fixing things as you do building them. Still, each time you get something working, it’s a genuine achievement. If you’re already on this journey, what’s been your biggest success or headache? If you’re just starting, what are you most worried about? I’m happy to help and would love to hear what you want to see next.</p>
]]></content:encoded></item></channel></rss>