<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Sanskar Jaiswal's Blog]]></title><description><![CDATA[Software developer in Bengaluru writing about homelab builds, self-hosted
services, and local LLMs. Posts usually involve a Dell OptiPlex, an RTX
4050, and more]]></description><link>https://blog.sanskarjaiswal.dev</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 07:07:20 GMT</lastBuildDate><atom:link href="https://blog.sanskarjaiswal.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[A Tiny Crew of Agents Running My Homelab]]></title><description><![CDATA[The first version of this setup was one agent that did everything, and the thing that finally broke me was a Saturday evening where I asked it to "clean up Immich" and it instead restarted the Jellyfi]]></description><link>https://blog.sanskarjaiswal.dev/a-tiny-crew-of-agents-running-my-homelab</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/a-tiny-crew-of-agents-running-my-homelab</guid><category><![CDATA[Homelab]]></category><category><![CDATA[Local LLM]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[gemma]]></category><category><![CDATA[pi]]></category><category><![CDATA[ollama]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Fri, 17 Apr 2026 18:03:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6826f16a8c26f3f07ff4c7b8/66a891d8-e2a5-4c49-8b99-3d4db99f2477.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first version of this setup was one agent that did everything, and the thing that finally broke me was a Saturday evening where I asked it to "clean up Immich" and it instead restarted the Jellyfin container because both contain the word "photo" somewhere in their config. Nothing important lost. But I was sitting there watching Jellyfin reindex a ~600GB library for no reason, and that was the moment.</p>
<p>What I actually wanted was a crew. One main agent that delegates, two specialists with narrow jobs, each in its own process so they can't step on each other. Pi with a sub-agent extension gets me there, running Gemma 4 E4B locally on the 4050 laptop. No cloud calls, no rate limits, no rationing.</p>
<p>This is how I set it up and where the VRAM bites.</p>
<hr />
<h2>Why Pi</h2>
<p>I've been using <a href="https://shittycodingagent.ai/">Pi</a> (Mario Zechner's terminal coding agent) for a while. It's minimal by design. No MCP, no plan mode, no built-in sub-agents, no permission popups. Extensions are the composition unit, which sounded annoying on paper but in practice means I'm not fighting someone else's idea of how an agent should work.</p>
<p>One thing to flag up front: Pi runs in full YOLO mode by default. Unrestricted filesystem access, no pre-checks, it'll run whatever the model decides. That's fine for a coding harness where you want speed. For a homelab operator, guardrails are your job. I'll come back to this.</p>
<p>For sub-agents specifically, Mario himself is skeptical. His preferred pattern is a slash command that spawns a fresh <code>pi --print</code> via bash for one-off things like code review. I went with a sub-agent extension instead because I wanted named specialists with their own system prompts and tool scopes, not just "spawn yourself with this prompt." I'm not 100% sure that was the right call and I'll probably re-run this whole thing with the slash-command pattern in a month to compare.</p>
<p>The one I'm using is <a href="https://github.com/mjakl/pi-subagent">mjakl/pi-subagent</a>. Agents are defined as Markdown files with YAML frontmatter. Each subagent runs in a separate <code>pi</code> process with no shared state, which is exactly what I want.</p>
<table>
<thead>
<tr>
<th>Role</th>
<th>Scope</th>
<th>What it touches</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Main agent</strong></td>
<td>Routes requests, summarises outcomes</td>
<td>Nothing directly, it delegates</td>
</tr>
<tr>
<td><strong>Ops</strong></td>
<td>Container health, restarts, logs</td>
<td><code>docker</code>, <code>systemctl</code>, <code>journalctl</code></td>
</tr>
<tr>
<td><strong>Librarian</strong></td>
<td>Immich/Jellyfin housekeeping</td>
<td>Immich CLI, Jellyfin API, filesystem reads</td>
</tr>
</tbody></table>
<p>I deliberately did not add a third specialist. Every extra role is another system prompt to maintain and another place where the wrong agent can touch the wrong thing.</p>
<hr />
<h2>The model: Gemma 4 E4B on a 4050 6GB</h2>
<p>Gemma 4 dropped a couple of weeks ago. E4B is the edge model: 4.5B effective parameters (8B with embeddings), native function calling, 128K context, multimodal. The Q4_K_M GGUF on Ollama is 9.6GB.</p>
<p>Yes, 9.6GB. The 4050 has 6GB of VRAM.</p>
<p>Ollama handles this by partially offloading layers to system RAM. It works; it's just slower than the breathless "E4B fits on any 6GB card" takes you'll read. What actually fits on 6GB is the inference footprint at small context, not the full model. Weights stream in from the 16GB of system RAM as needed, GPU utilisation looks spiky, and tokens/sec is maybe 60-70% of what you'd see on a fully-resident 8GB card. The single piece of advice I'd give anyone considering the same setup is this: don't expect a free ride. It's usable. I would not spec this for production.</p>
<pre><code class="language-bash">ollama pull gemma4:e4b
ollama run gemma4:e4b "hello"
</code></pre>
<hr />
<h2>The setup</h2>
<h3>Install Pi</h3>
<pre><code class="language-bash">npm install -g @mariozechner/pi-coding-agent
</code></pre>
<h3>Add Ollama as a provider</h3>
<p>Edit <code>~/.pi/agent/models.json</code>:</p>
<pre><code class="language-json">{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "gemma4:e4b",
          "name": "Gemma 4 E4B (Local)",
          "reasoning": true,
          "contextWindow": 16000,
          "maxTokens": 4000
        }
      ]
    }
  }
}
</code></pre>
<p>A few non-obvious bits. The <code>apiKey</code> field is required but Ollama ignores it so any value works. The two <code>compat</code> flags are there because Ollama's OpenAI-compatibility layer doesn't understand the <code>developer</code> role or the <code>reasoning_effort</code> parameter. Without those flags Pi will silently send requests Ollama can't parse and you'll spend an afternoon wondering why your local model is suddenly mute.</p>
<p>I cap <code>contextWindow</code> at 16K. E4B advertises 128K but on 6GB the KV cache grows linearly and a 128K cache will eat system RAM alive before you've even said hello. 16K is plenty for a homelab operator doing single tasks, and if I ever need more I can just bump it for that session.</p>
<p>Then in <code>~/.pi/agent/settings.json</code>:</p>
<pre><code class="language-json">{
  "defaultProvider": "ollama",
  "defaultModel": "gemma4:e4b"
}
</code></pre>
<h3>Install the sub-agent extension</h3>
<pre><code class="language-bash">mkdir -p ~/.pi/agent/extensions
cd ~/.pi/agent/extensions
git clone https://github.com/mjakl/pi-subagent.git
cd pi-subagent
npm install
</code></pre>
<h3>Define the specialists</h3>
<p>pi-subagent expects agent files with YAML frontmatter. I put them in <code>~/.pi/agent/</code> alongside the main <code>AGENTS.md</code>:</p>
<pre><code class="language-plaintext">~/.pi/agent/
├── AGENTS.md              # main agent: who does what
├── ops.md                 # ops specialist
└── librarian.md           # librarian specialist
</code></pre>
<p><strong>Main agent (</strong><code>AGENTS.md</code><strong>)</strong> — short, strict:</p>
<pre><code class="language-plaintext">You are the main agent for a homelab crew on host friday.

You do NOT run commands directly. You delegate to specialists:

- ops: container health, restarts, logs, systemd, reachability checks
- librarian: Immich and Jellyfin housekeeping, dedup, library scans

For every request:
1. Decide which specialist owns it. If unclear, ask one question.
2. Delegate with a single, scoped instruction. Use fully-qualified service names.
3. Summarise the specialist's receipt back to me in &lt;= 3 lines.
4. Never combine specialists in one turn. One task, one specialist.

If a request is not ops or librarian, say so. Do not improvise.
</code></pre>
<p><strong>Ops (</strong><code>ops.md</code><strong>)</strong> — narrow tools, receipt mandatory:</p>
<pre><code class="language-plaintext">---
name: ops
description: Container health, restarts, logs, systemd diagnostics on friday
tools: read,bash
mode: spawn
---

You manage containers and services on friday.

Allowed: docker ps, docker logs, docker restart, systemctl status,
journalctl, curl (localhost only).

NOT allowed: docker rm, docker volume rm, docker system prune, anything
destructive. If a task requires destruction, refuse and explain.

After every action, emit a receipt JSON:

{
  "event": "&lt;event.name&gt;",
  "host": "friday",
  "service": "&lt;service&gt;",
  "requested_by": "main",
  "correlation_id": "&lt;ISO-timestamp&gt;-&lt;service&gt;-&lt;action&gt;",
  "status": "ok|error",
  "duration_ms": &lt;int&gt;
}

If you cannot emit the receipt, the action did not happen.
</code></pre>
<p><strong>Librarian (</strong><code>librarian.md</code><strong>)</strong>:</p>
<pre><code class="language-plaintext">---
name: librarian
description: Immich and Jellyfin housekeeping, dedup scans, library refresh
tools: read,bash
mode: spawn
---

You manage media libraries on friday.

Allowed: Immich CLI (read + dedup-dry-run), Jellyfin API (scan, library
refresh), filesystem reads under /srv/media.

NOT allowed: deleting files, writing outside /tmp, touching originals.
Any "clean up" is dry-run by default. Actual deletes require an explicit
confirmed: true field in the task.

Emit the same receipt JSON as Ops. Include a dry_run: true|false field.
</code></pre>
<p>The <code>tools: read,bash</code> line is deliberate. No <code>write</code>, no <code>edit</code>. They can read state and run commands but they can't modify files directly, and for ops work that's enough.</p>
<h3>Thinking mode</h3>
<p>Gemma 4's thinking is toggled by a <code>&lt;|think|&gt;</code> token at the start of the system prompt. Add it where you want reasoning, leave it out where you want speed.</p>
<p>I landed on: thinking on for the main agent (it's doing routing, worth the latency), off for Ops ("restart immich" shouldn't require a philosophical detour), on for Librarian (deciding what's a dedup candidate needs actual reasoning).</p>
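<p>Concretely, with the frontmatter layout above, flipping Librarian to thinking-on is just making that token the first line of the prompt body. A sketch, assuming pi-subagent passes everything after the frontmatter through as the system prompt:</p>

```plaintext
---
name: librarian
description: Immich and Jellyfin housekeeping, dedup scans, library refresh
tools: read,bash
mode: spawn
---

<|think|>
You manage media libraries on friday.
```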
<hr />
<h2>What a real turn looks like</h2>
<p>Me, at the Pi prompt:</p>
<blockquote>
<p>immich is slow and the homepage is half-loaded</p>
</blockquote>
<p>Main agent routes it to Ops. Ops runs <code>docker ps | grep immich</code>, sees <code>immich-server</code> is in a restart loop, tails the last 200 log lines, spots a Postgres connection timeout, restarts <code>immich-server</code>, waits 5s, curls the healthcheck, emits:</p>
<pre><code class="language-json">{
  "event": "container.restart",
  "host": "friday",
  "service": "immich-server",
  "requested_by": "main",
  "correlation_id": "2026-04-17T09:22:11Z-immich-server-restart",
  "status": "ok",
  "duration_ms": 5840
}
</code></pre>
<p>Main agent replies:</p>
<blockquote>
<p>Ops restarted immich-server after a DB timeout. Health check green. Receipt 2026-04-17T09:22:11Z-immich-server-restart.</p>
</blockquote>
<p>That <code>correlation_id</code> shows up in the Discord notification, in the container log, and if I need it, in Grafana. No arguments later about what happened when.</p>
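<p>Wiring receipts into Discord is one short script. A minimal sketch, assuming the agents' stdout is tailed into a one-JSON-object-per-line log and <code>DISCORD_WEBHOOK</code> holds your webhook URL (both names are mine, not Pi's; the receipt below is a hardcoded sample):</p>

```bash
#!/usr/bin/env bash
# Watch for failed receipts and alert. In production this line would
# come from `tail -F receipts.log`; here it's a hardcoded sample.
line='{"event":"container.restart","host":"friday","service":"immich-server","requested_by":"main","correlation_id":"2026-04-17T09:22:11Z-immich-server-restart","status":"error","duration_ms":5840}'

# Pull fields out with sed so there's no jq dependency.
status=$(sed -n 's/.*"status":"\([^"]*\)".*/\1/p' <<<"$line")
cid=$(sed -n 's/.*"correlation_id":"\([^"]*\)".*/\1/p' <<<"$line")

if [ "$status" != "ok" ]; then
  echo "ALERT $cid status=$status"
  # curl -s -H 'Content-Type: application/json' \
  #   -d "{\"content\":\"homelab failure: $cid\"}" "$DISCORD_WEBHOOK"
fi
```

<p>The same two sed lines are what I'd reach for when grepping a <code>correlation_id</code> back out of the log later.</p>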
<hr />
<h2>Gotchas that cost me time</h2>
<h3>The model is bigger than the VRAM</h3>
<p>I didn't believe this until the first long session. E4B Q4_K_M is 9.6GB, the 4050 is 6GB, offload happens, sustained throughput suffers. If you're spec'ing a dedicated crew machine, a 4060 Ti 8GB or 4070 is a much happier place than what I'm running on.</p>
<h3>Thinking-off Ops is confidently wrong on ambiguous input</h3>
<p>This is the one that cost me the Jellyfin reindex I mentioned at the top. "Restart the photo thing" is a perfectly reasonable thing to type at 10pm on a Saturday and thinking-off Ops will parse it, pick the first container whose metadata mentions photos, and go. The model isn't dumb, it's just fast and decisive about the wrong thing.</p>
<p>The fix is structural: make the main agent do the disambiguating (it's thinking-on), and require it to hand Ops a fully-qualified service name. Step 2 of the main agent's <code>AGENTS.md</code> is that rule. I added it after the Jellyfin incident, not before.</p>
<h3>Sub-agents don't share memory</h3>
<p>My first instinct was to let Ops see what Librarian just did. Fought this for an evening before I realised the whole point of spawn mode is isolated context, and what I actually wanted was for the receipts to be the shared memory. Main agent reads both receipts, correlates them by timestamp, done. No cross-specialist context bleed, no growing context window, no confusion about whose turn it is.</p>
<p>pi-subagent does have a <code>fork</code> mode that inherits parent context. It's tempting for follow-up tasks. I'm staying on <code>spawn</code> because the cost in tokens and the risk of leaking unrelated context into a specialist's head both feel worse than the inconvenience of one extra turn.</p>
<h3>Pi is YOLO and your prompts are not a security boundary</h3>
<p>Allow-lists in a system prompt are a style guide the model mostly follows. On a good day. If your Ops agent decides <code>docker system prune</code> is a clever shortcut, nothing in Pi will stop it.</p>
<p>What actually works:</p>
<ul>
<li><p>Run Pi in a container or VM with scoped mounts and a non-root user</p>
</li>
<li><p>Use path-protection or permission-gate extension examples for belt-and-braces</p>
</li>
<li><p>Keep <code>tools</code> in the subagent frontmatter as narrow as possible, <code>read,bash</code> is the smallest useful set</p>
</li>
</ul>
<p>I run mine in a podman container with <code>/srv/media</code> mounted read-only for Librarian and a docker socket proxy (tecnativa/docker-socket-proxy) for Ops so it can list and restart containers but can't do <code>rm</code> or <code>prune</code> even if it wanted to. Took a weekend to get right. I was also paranoid enough that I initially didn't give Librarian any bash at all, and then realised it needed bash to call the Immich CLI, and re-added it with a narrower allow-list in the prompt.</p>
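<p>The socket-proxy piece is mostly environment flags. A sketch of the shape, using tecnativa/docker-socket-proxy's allow-list variables (container name and port are mine; the proxy denies every Docker API endpoint you don't explicitly enable):</p>

```shell
# CONTAINERS=1     -> list/inspect containers (docker ps, docker logs)
# POST=1           -> allow POST requests at all
# ALLOW_RESTARTS=1 -> restart/stop/kill endpoints, and nothing else
docker run -d --name dockerproxy \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -e CONTAINERS=1 -e POST=1 -e ALLOW_RESTARTS=1 \
  tecnativa/docker-socket-proxy

# Ops then points at the proxy instead of the raw socket, so
# `docker rm` and `docker system prune` fail at the API layer:
# DOCKER_HOST=tcp://dockerproxy:2375 docker ps
```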
<h3>The Librarian wanted to delete things on its first test run</h3>
<p>I typed "clean up duplicate photos" as a throwaway test. Librarian dry-ran (thanks to the prompt rule), reported 847 dedup candidates, touched nothing. Good agent. If I had skipped the dry-run-by-default rule in its prompt I would be restoring from backup right now, and backups are fiction until you've tested them, which mercifully I had [after a hard disk event last year that I still haven't written about].</p>
<hr />
<h2>My setup in tl;dr</h2>
<ul>
<li><p><strong>Laptop</strong>: i5 12th gen, RTX 4050 6GB VRAM, 16GB system RAM</p>
</li>
<li><p><strong>Runtime</strong>: Ollama serving <code>gemma4:e4b</code> (Q4_K_M, 9.6GB, partial GPU offload)</p>
</li>
<li><p><strong>Agent harness</strong>: Pi + mjakl/pi-subagent</p>
</li>
<li><p><strong>Crew</strong>: Main + Ops + Librarian (all spawn mode)</p>
</li>
<li><p><strong>Thinking</strong>: on for Main/Librarian, off for Ops</p>
</li>
<li><p><strong>Context cap</strong>: 16K per agent</p>
</li>
<li><p><strong>Tools per specialist</strong>: <code>read,bash</code> only</p>
</li>
<li><p><strong>Isolation</strong>: podman container, scoped mounts, docker-socket-proxy for Ops</p>
</li>
<li><p><strong>Receipts</strong>: JSON to stdout, tailed to a log file, Discord webhook for failures, eventually into Grafana</p>
</li>
</ul>
<hr />
<h2>What I want to try next</h2>
<ul>
<li><p><strong>Route Ops to</strong> <code>friday</code> <strong>via SSH</strong> and keep the agent runtime on the laptop. The SSH extension example in pi-mono looks straightforward. No reason the crew needs to live on the machine it manages.</p>
</li>
<li><p><strong>Do the whole thing again with Mario's slash-command pattern</strong> instead of a sub-agent extension. Same prompts, same roles, <code>pi --print</code> spawns. Measure tokens, latency, and whether it feels different. I might have over-engineered this.</p>
</li>
<li><p><strong>A Scribe specialist</strong> for weekly homelab digests. "What changed on friday this week" → markdown file, committed to my notes repo. Not critical, would be nice.</p>
</li>
<li><p><strong>Dynamic thinking toggle</strong>: Ops flips to thinking-on automatically when the input contains any error signature it hasn't seen before. Right now I do this manually with <code>/model</code>.</p>
</li>
<li><p><strong>26B A4B on the OptiPlex someday</strong>: Long shot ... that box is CPU-only, and I think the Jensen Huang tax for more VRAM is still not in the budget. Maybe a Mac mini soon o.0</p>
</li>
</ul>
<hr />
<h2>Closing thoughts</h2>
<p>The thing I keep learning with local LLMs is that the model isn't the hard part anymore. Gemma 4 E4B is honestly great at function calling, even when it's half-swapped to system RAM. The hard part is the scaffolding. Who's allowed to do what, how actions leave a trail, what's reversible.</p>
<p>Pi gets out of the way on exactly the right axis. Three Markdown files and a docker-socket-proxy, and I have a homelab operator that can restart a service, check on my photo library, and leave a receipt for every action. None of it is flashy. The VRAM ceiling is real and I'll want more of it before the year is out.</p>
<p>But it's mine. And that's the point.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Clean DNS Stack at Home]]></title><description><![CDATA[Home networking looks simple until you actually start touching it. What began as a straightforward plan to run AdGuard Home across the network somehow turned into a small odyssey involving WDS bridging, router hardware bingo, and a very short-lived o...]]></description><link>https://blog.sanskarjaiswal.dev/building-a-clean-dns-stack-at-home</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/building-a-clean-dns-stack-at-home</guid><category><![CDATA[wifi bridging]]></category><category><![CDATA[adguardhome]]></category><category><![CDATA[tailscale]]></category><category><![CDATA[home networking]]></category><category><![CDATA[Homelab]]></category><category><![CDATA[DNS Filtering]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Mon, 01 Dec 2025 09:27:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764580785975/6b6302af-cc6d-4ede-b95a-46e0d0ce67a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Home networking looks simple until you actually start touching it. What began as a straightforward plan to run AdGuard Home across the network somehow turned into a small odyssey involving WDS bridging, router hardware bingo, and a very short-lived optimism that a TP-Link device would cooperate with OpenWrt.</p>
<p>This is a summary of how the entire puzzle came together and what actually worked.</p>
<hr />
<h2 id="heading-the-goal">The Goal</h2>
<p>The end state I wanted was:</p>
<ul>
<li><p>AdGuard Home acting as the authoritative DNS for the entire network</p>
</li>
<li><p>A secondary router connected over WDS to extend the network cleanly</p>
</li>
<li><p>The ability to filter DNS traffic even when connected through Tailscale</p>
</li>
<li><p>Zero reliance on browser extensions</p>
</li>
<li><p>And ideally, OpenWrt somewhere in the stack to handle proper DNS hijacking</p>
</li>
</ul>
<p>The idea sounded neat. The execution required a bit more patience.</p>
<hr />
<h2 id="heading-step-1-getting-adguard-home-running">Step 1: Getting AdGuard Home Running</h2>
<p>The AdGuard setup was straightforward. Once installed on the homelab machine, it handled all DNS queries on the LAN. Even with 100.100.100.100 (Tailscale’s MagicDNS) as the upstream, AdGuard continued logging and filtering normally.<br />This was expected behavior: MagicDNS only resolves queries for the client device, not for AdGuard itself.</p>
<p>Once everything pointed to AdGuard, the network immediately felt cleaner. No ads, no trackers, and visible query insights.</p>
<hr />
<h2 id="heading-step-2-fixing-wifi-coverage-with-wds-bridging">Step 2: Fixing WiFi Coverage With WDS Bridging</h2>
<p>This is where things got interesting.</p>
<p>To avoid pulling Ethernet across the house, I set up a <strong>WDS bridge</strong> between the main router and a secondary access point.<br />WDS worked surprisingly well:</p>
<ul>
<li><p>Devices connected through the bridge still routed DNS requests to AdGuard</p>
</li>
<li><p>The network remained a single broadcast domain</p>
</li>
<li><p>No double NAT</p>
</li>
<li><p>Same SSID and smooth roaming</p>
</li>
</ul>
<p>This part of the stack was the most cooperative, which is rare in home networking.</p>
<hr />
<h2 id="heading-step-3-the-quest-to-flash-openwrt">Step 3: The Quest to Flash OpenWrt</h2>
<p>This part did not go as smoothly.</p>
<p>When I checked the TP-Link model on OpenWrt’s supported hardware list, things looked promising.<br />But TP-Link’s naming scheme is basically a puzzle:</p>
<ul>
<li><p>Amazon labels it <em>Archer AC1200</em></p>
</li>
<li><p>The actual device banner says <em>Archer C6</em></p>
</li>
<li><p>OpenWrt lists support for specific revisions</p>
</li>
<li><p>The one in hand showed <em>v4.8</em>, which OpenWrt does not support at all</p>
</li>
</ul>
<p>The enthusiasm lasted about ten seconds.</p>
<p>I learned (again) that:</p>
<ul>
<li><p>TP-Link reuses model names across completely different chipsets</p>
</li>
<li><p>Hardware revisions often differ silently within the same listing</p>
</li>
<li><p>Flashing unsupported hardware is a great way to manufacture a brick</p>
</li>
<li><p>OpenWrt’s documentation is accurate, Amazon’s product pages are not</p>
</li>
</ul>
<p>So OpenWrt was off the table for the TP-Link we had.</p>
<hr />
<h2 id="heading-step-4-confirming-adguard-works-over-tailscale">Step 4: Confirming AdGuard Works Over Tailscale</h2>
<p>After the WDS setup and router disappointment, I finished the Tailscale integration.</p>
<p>Once the AdGuard IP was added in the Tailscale admin DNS settings:</p>
<ul>
<li><p>Every device connected through Tailscale automatically used AdGuard</p>
</li>
<li><p>MagicDNS still handled internal <code>.local</code> resolution</p>
</li>
<li><p>Full filtering worked whether at home or remote</p>
</li>
<li><p>Query logs remained consistent across both scenarios</p>
</li>
</ul>
<p>This delivered the “private DNS anywhere” experience I was aiming for.</p>
<h2 id="heading-final-network-layout">Final Network Layout</h2>
<p>Here’s what the finished setup looked like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764580493326/a7ab4662-b93c-4aa2-b82c-14a68d987575.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Main router provides WAN and base WiFi</p>
</li>
<li><p>Secondary router connects via WDS bridge, extending WiFi</p>
</li>
<li><p>Both broadcast the same LAN</p>
</li>
<li><p>AdGuard Home sits inside the LAN as the primary DNS</p>
</li>
<li><p>Tailscale routes remote DNS queries back to AdGuard</p>
</li>
</ul>
<p>Even without OpenWrt, the combination works reliably.</p>
<hr />
<h2 id="heading-closing-thoughts">Closing Thoughts</h2>
<p>The setup is a good reminder that home networking is equal parts planning and improvisation. The AdGuard portion was easy. The WDS bridge behaved surprisingly well. The router hardware roulette was less pleasant, but at least it revealed why OpenWrt still has a giant red disclaimer next to TP-Link devices.</p>
<p>In the end, I still achieved:</p>
<ul>
<li><p>Clean network-wide DNS filtering</p>
</li>
<li><p>Remote filtering via Tailscale</p>
</li>
<li><p>A stable extended WiFi setup</p>
</li>
<li><p>Zero browser extensions</p>
</li>
<li><p>Full visibility into queries</p>
</li>
</ul>
<p>As long as you choose your hardware carefully, this is one of the most practical upgrades you can make to a home network without major rewiring.</p>
]]></content:encoded></item><item><title><![CDATA[Running LLMs Locally: Why It's Important and How to Do It]]></title><description><![CDATA[Why I Started Looking Beyond Cloud APIs
Most of my first experiences with LLMs were through OpenAI’s API and Azure OpenAI Service at work.They’re honestly great when you’re just starting out. You hit an endpoint, you get GPT-4 level answers, and life...]]></description><link>https://blog.sanskarjaiswal.dev/running-llms-locally-why-its-important-and-how-to-do-it</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/running-llms-locally-why-its-important-and-how-to-do-it</guid><category><![CDATA[llm]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[Homelab]]></category><category><![CDATA[ Edge AI]]></category><category><![CDATA[mlops]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[privacy]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Linux]]></category><category><![CDATA[ollama]]></category><category><![CDATA[langchain]]></category><category><![CDATA[comfyui]]></category><category><![CDATA[stable diffusion]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Mon, 15 Sep 2025 12:40:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757939508279/5234d877-2c28-4bb3-988d-c9aa6168f745.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-why-i-started-looking-beyond-cloud-apis">Why I Started Looking Beyond Cloud APIs</h1>
<p>Most of my first experiences with LLMs were through <strong>OpenAI’s API</strong> and <strong>Azure OpenAI Service</strong> at work.<br />They’re honestly great when you’re just starting out. You hit an endpoint, you get GPT-4 level answers, and life feels good. No GPU drivers, no CUDA errors, no headaches.</p>
<p>But after a while I started hitting the usual walls:</p>
<ul>
<li><p>The bill at the end of the month started looking like my rent. [exaggerated ;)]</p>
</li>
<li><p>I had no real control over what was happening under the hood.</p>
</li>
<li><p>Privacy was always in the back of my mind. Some data just doesn’t feel right sending off to the cloud.</p>
</li>
<li><p>And of course, you’re completely at the mercy of whatever models and limits the provider decides.</p>
</li>
</ul>
<p>So I thought… why not run some of this stuff myself? Worst case I burn some hours fighting Docker. Best case I end up with my own AI assistant that doesn’t need an internet connection to work.</p>
<hr />
<h2 id="heading-whats-possible-these-days">What’s Possible These Days</h2>
<p>Running your own models used to be something only labs with racks of GPUs could do. Now it’s surprisingly doable at home.</p>
<h3 id="heading-models-worth-trying">Models worth trying</h3>
<ul>
<li><p><strong>Mistral</strong>: small, fast, scary good at reasoning for its size.</p>
</li>
<li><p><strong>LLaMA 2</strong>: the “default” open model. Huge community, easy to fine tune.</p>
</li>
<li><p><strong>Falcon</strong>: solid multilingual capabilities.</p>
</li>
<li><p><strong>Gemma and StableLM</strong>: lighter models that don’t need monster GPUs.</p>
</li>
</ul>
<h3 id="heading-tools-that-make-life-easier">Tools that make life easier</h3>
<ul>
<li><p><strong>Ollama</strong>: probably the smoothest way to run models locally.</p>
</li>
<li><p><strong>Text Generation Inference (TGI)</strong>: if you want a proper serving stack.</p>
</li>
<li><p><strong>LangChain and LangGraph</strong>: orchestration so your models can actually do more than parrot back text.</p>
</li>
<li><p><strong>Model Context Protocol (MCP)</strong>: lets LLMs hook into tools and data. I use this with my homelab assistant.</p>
</li>
</ul>
<h3 id="heading-hardware-reality-check">Hardware reality check</h3>
<ul>
<li><p>Mid-size models like 7B to 13B will happily run on a decent GPU with 12 to 16 GB of VRAM.</p>
</li>
<li><p>If you don’t have that, quantized models can limp along on CPU with enough RAM.</p>
</li>
<li><p>The giant 70B models are still a no-go unless you own a data center or happen to be best friends with Leather-Jacket-Man (<a target="_blank" href="https://www.google.com/search?client=firefox-b-d&amp;q=nvidia+ceo">Jensen Huang</a>).</p>
</li>
</ul>
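<p>A quick way to sanity-check those VRAM numbers: quantized weight size is roughly parameters × bits-per-weight ÷ 8, plus a gigabyte or two for the KV cache and runtime overhead. Back-of-envelope, taking ~4.5 bits/weight as typical for a Q4_K_M-style quant:</p>

```bash
# Rough VRAM estimate: weights ≈ params × bits-per-weight / 8.
# 4.5 bits/weight approximates a Q4_K_M-style quant; budget another
# 1-2 GB on top for the KV cache and runtime overhead.
for params in 7 13 70; do
  awk -v p="$params" 'BEGIN { printf "%2dB model ~ %.1f GB of weights\n", p, p*4.5/8 }'
done
# 7B ~ 3.9 GB, 13B ~ 7.3 GB, 70B ~ 39.4 GB
```

<p>Which is why a 7B quant fits a 12 GB card with room to spare, and a 70B quant does not fit anything reasonable at home.</p>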
<hr />
<h2 id="heading-my-setup-in-tldr">My Setup in tl;dr</h2>
<ul>
<li><p>Homelab: OptiPlex i3-7100T, CPU only, quantized models. Runs the always-on stuff: Jarvis-style control, finance RAG, FastMCP monitors that ping me on Discord.</p>
</li>
<li><p>Laptop: i5 12th gen + RTX 4050 6 GB. Handles heavier chat and image work with Stability Matrix + ComfyUI, and LM Studio for local chat.</p>
</li>
</ul>
<p>Homelab = reliable background. Laptop = GPU playground.</p>
<hr />
<h2 id="heading-cloud-vs-local-the-reality-check">Cloud vs Local: The Reality Check</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Thing</th><th>Cloud APIs (OpenAI / Azure)</th><th>Self-Hosted LLMs</th></tr>
</thead>
<tbody>
<tr>
<td>Setup</td><td>Call an API and you’re done</td><td>Get ready to fight drivers and config files</td></tr>
<tr>
<td>Models</td><td>GPT-4, GPT-4o, all the shiny toys</td><td>Mostly open source like Mistral or LLaMA</td></tr>
<tr>
<td>Latency</td><td>Pretty low, but internet-dependent</td><td>Can be higher, especially CPU-only</td></tr>
<tr>
<td>Cost</td><td>Pay per token, sometimes feels like highway robbery</td><td>One time hardware cost, then just power bills</td></tr>
<tr>
<td>Privacy</td><td>Data leaves your network</td><td>Data never leaves your machine</td></tr>
<tr>
<td>Control</td><td>You tweak a few parameters at best</td><td>Full control: quantization, caching, fine tuning</td></tr>
<tr>
<td>Scaling</td><td>Basically infinite</td><td>Limited to what’s inside your case</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-lessons-ive-learned">Lessons I’ve Learned</h2>
<ul>
<li><p>Don’t try to run the biggest model first. Start with a 2B or 7B model and see what happens.</p>
</li>
<li><p>Quantization is your friend. It’s basically magic for smaller hardware.</p>
</li>
<li><p>The real power comes when you connect models to things. Scripts, dashboards, automations… that’s where it feels useful.</p>
</li>
<li><p>Expect things to break. You’ll see hallucinations, weird limits, maybe even kernel panics if you get lucky.</p>
</li>
<li><p>The open source scene moves ridiculously fast. A year ago Mistral didn’t even exist, now it’s everywhere.</p>
</li>
</ul>
<hr />
<h2 id="heading-what-i-want-to-try-next">What I Want to Try Next</h2>
<ul>
<li><p>Multimodal models that can handle both text and images.</p>
</li>
<li><p>A hybrid setup where I keep local models for everyday use but call the cloud for really heavy lifting.</p>
</li>
<li><p>Fine tuning on my own data so my assistant understands my configs and logs without me explaining every time.</p>
</li>
<li><p>Adding a proper GPU node in the homelab so I don’t have to lean on my laptop as much. [long shot.. I’ll be spending that money elsewhere]</p>
</li>
</ul>
<hr />
<h2 id="heading-how-to-get-a-local-llm-running-in-10-minutes">How to Get a Local LLM Running in 10 Minutes</h2>
<p>You have two easy paths. Pick your vibe.</p>
<h3 id="heading-option-a-click-and-go-with-lm-studio">Option A: Click-and-go with LM Studio</h3>
<ol>
<li><p>Install LM Studio. Grab the installer for your OS from the official site.</p>
</li>
<li><p>Download a model. Open LM Studio and use the Discover tab to fetch something like Mistral 7B, Qwen, or Gemma.</p>
</li>
<li><p>Chat. Hit New Chat, pick the model, and talk to your computer like it owes you answers.</p>
</li>
</ol>
<p>Why this path? Zero terminal work, fast feedback, built-in model browser. Great for laptops and first-timers.</p>
<h3 id="heading-option-b-terminal-friendly-with-ollama">Option B: Terminal-friendly with Ollama</h3>
<ol>
<li><p>Install Ollama. The easiest way is their one-liner:</p>
<pre><code class="lang-bash">curl -fsSL https://ollama.com/install.sh | sh</code></pre>
<p> On Fedora you can even:</p>
<pre><code class="lang-bash">sudo dnf install ollama</code></pre>
</li>
<li><p>Pull and run a model. For a solid starter:</p>
<pre><code class="lang-bash">ollama run mistral</code></pre>
<p> The first run downloads the weights, then drops you in an interactive prompt.</p>
</li>
<li><p>Use it from apps. Many local tools can point to the Ollama endpoint. If you know LangChain or LlamaIndex, you can wire it up in a few lines.</p>
</li>
</ol>
<p>Why this path? Scriptable, container-friendly, and good for homelabs.</p>
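<p>To show what “pointing at the Ollama endpoint” looks like, here’s a minimal Python client using only the standard library. It assumes Ollama’s default port (11434) and its <code>/api/generate</code> route; <code>ask</code> and <code>build_payload</code> are just my names for the helpers:</p>

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Assemble the JSON body /api/generate expects (stream=False for a single reply)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With Ollama running: reply = ask("mistral", "Summarize what a homelab is in one line.")
```

<p>Tools like LangChain and LlamaIndex wrap essentially this call for you, with streaming and retries on top.</p>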
<h3 id="heading-bonus-image-generation-with-stability-matrix-comfyui">Bonus: Image generation with Stability Matrix + ComfyUI</h3>
<p>If you want images too:</p>
<ol>
<li><p>ComfyUI core. Install by cloning the repo and installing dependencies, then run <code>python main.py</code>.</p>
</li>
<li><p>Quality-of-life. Add <strong>ComfyUI-Manager</strong> to install and manage custom nodes from inside ComfyUI.</p>
</li>
<li><p>Use Stability Matrix as the front end and manager. It streamlines ComfyUI setup and running workflows so you spend less time chasing missing nodes and more time making cool images.</p>
</li>
</ol>
<hr />
<h3 id="heading-tiny-gotchas-that-save-hours">Tiny gotchas that save hours</h3>
<ul>
<li><p>If a model won’t load, try a smaller one or a more aggressive quantized build.</p>
</li>
<li><p>Keep an eye on VRAM usage. 7B is comfy on 8 to 12 GB, 13B prefers 12 to 16 GB. If you’re on CPU, expect slower tokens.</p>
</li>
<li><p>Don’t benchmark on first run. Caches warm up, downloads finish, and everything speeds up a bit after.</p>
</li>
</ul>
<p>That’s it. You can be local-first by dinner and bragging about it by dessert.</p>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Running LLMs locally isn’t about replacing GPT-4. That’s still out of reach for most of us.</p>
<p>It’s about ownership. My data stays with me, my costs are predictable, and I get to experiment however I want.</p>
<p>For me this setup has turned into a mix of practical tools and just plain fun. I’ve got a Jarvis-like assistant running in the background, a finance bot that can actually read my own files, monitoring agents that bug me on Discord, and on the laptop I can spin up Stable Diffusion for image generation whenever I feel like it.</p>
<p>It’s not perfect, but it’s mine. And that’s kind of the point.</p>
]]></content:encoded></item><item><title><![CDATA[My Engineering Operating Manual - Patterns, Rituals, and Receipts]]></title><description><![CDATA[These practices work for me today and will evolve. A lot of this came from seniors who let me shadow their thinking, handed me better questions, and saved me from clever mistakes.

This post isn’t my stack (that’s already written). It’s the stuff I r...]]></description><link>https://blog.sanskarjaiswal.dev/my-engineering-operating-manual-patterns-rituals-and-receipts</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/my-engineering-operating-manual-patterns-rituals-and-receipts</guid><category><![CDATA[operating-manual]]></category><category><![CDATA[engineering]]></category><category><![CDATA[Homelab]]></category><category><![CDATA[SelfHosting]]></category><category><![CDATA[fastmcp]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Thu, 04 Sep 2025 20:54:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757019165783/db7c33c0-168b-43a4-b482-9ffa7544fd08.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>These practices work for me today and will evolve. A lot of this came from seniors who let me shadow their thinking, handed me better questions, and saved me from clever mistakes.</p>
</blockquote>
<p>This post isn’t my stack (that’s already written). It’s the stuff I reach for every week: patterns, tiny rituals, and templates you can borrow, and that I’ll keep refining as I grow.</p>
<blockquote>
<p>If you want wiring and app lists, read these and come back:</p>
<ul>
<li><p><a target="_blank" href="https://blog.sanskarjaiswal.dev/i-made-my-homelab-talk-to-me-using-claude-and-fastmcp">I Made My Homelab Talk to Me Using Claude and FastMCP</a></p>
</li>
<li><p><a target="_blank" href="https://blog.sanskarjaiswal.dev/my-self-hosting-stack-everything-i-run-and-how-it-all-connects">My Self-Hosting Stack: Everything I Run, and How It All Connects</a></p>
</li>
<li><p><a target="_blank" href="https://blog.sanskarjaiswal.dev/the-real-story-of-self-hosting-why-i-love-it-and-sometimes-hate-it">The Real Story of Self-Hosting: Why I Love It (and Sometimes Hate It)</a></p>
</li>
</ul>
</blockquote>
<hr />
<h2 id="heading-the-line-that-stuck">The line that stuck</h2>
<blockquote>
<p>“Tony Stark built this in a cave with a box of scraps.”</p>
</blockquote>
<p>It’s the person, not the tools. That’s been my lens since school: never top-3 in marks, but the kid teachers called for the weird projects. My first end-to-end build was an Arduino ambient light for my monitor (my first GitHub repo, a bit over five years old now, and still running unchanged). Make it once; make it last.</p>
<hr />
<h2 id="heading-operating-manual-vows-not-vibes">Operating manual (vows, not vibes)</h2>
<ul>
<li><p><strong>Receipt-driven automation.</strong> Every action leaves a trail (timestamp, actor, inputs, outcome). If it ran, I can prove it.</p>
</li>
<li><p><strong>Reversible by design.</strong> Feature flags, dry runs, versioned configs, parametric parts. No one-way doors.</p>
</li>
<li><p><strong>Observability before cleverness.</strong> A plain counter with a timestamp beats a fancy graph I don’t trust.</p>
</li>
<li><p><strong>Guardrails over heroics.</strong> Checklists and preflight beat “ninja fixes” in prod. Learned the hard way.</p>
</li>
<li><p><strong>Dynamic by default.</strong> Parameters &gt; literals. Today’s edge case is tomorrow’s requirement.</p>
</li>
<li><p><strong>Backups that restore.</strong> Untested backups are fiction.</p>
</li>
<li><p><strong>Future-me is a teammate.</strong> If I can’t finish now, I leave a clean path for him.</p>
</li>
</ul>
<blockquote>
<p>Note: I’m documenting what’s working <em>now</em>. If a senior shows me a safer/faster path, I’ll adopt it and update my practice.</p>
</blockquote>
<hr />
<h2 id="heading-the-45-minute-debug-ritual">The 45-minute debug ritual</h2>
<p><strong>Goal:</strong> get from <em>symptom</em> → <em>measured cause</em> or a clean rollback.</p>
<ol>
<li><p><strong>Reproduce (≤10 min).</strong> Smallest input that fails. Write it down.</p>
</li>
<li><p><strong>Instrument (≤10 min).</strong> Add a counter/log near the suspected seam. If I can’t measure it, I’m guessing.</p>
</li>
<li><p><strong>Isolate (≤10 min).</strong> Toggle one variable at a time (flag, env, route).</p>
</li>
<li><p><strong>Decide (≤5 min).</strong> Fix now if &lt;10 min; else rollback with a note.</p>
</li>
<li><p><strong>Receipt (≤10 min).</strong> Post a short “what/why/where” with the log line.</p>
</li>
</ol>
<p><strong>Paste-in template:</strong></p>
<pre><code class="lang-plaintext">Issue: &lt;one sentence&gt;
Smallest repro: &lt;command/url/config&gt;
Suspected seam: &lt;component or part&gt;
Evidence: &lt;log line or metric delta&gt;
Decision: &lt;fix/rollback/defer&gt; (why)
Receipt: &lt;paste link or id&gt;
Follow-ups: &lt;one or two&gt;
</code></pre>
<h2 id="heading-friction-log-tiny-habit-big-payoff">Friction log (tiny habit, big payoff)</h2>
<p>Every time something feels slower than it should, I jot one line. Review weekly; fix two items.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Date</td><td>Friction</td><td>Cost</td><td>Choke point</td><td>Next step</td></tr>
</thead>
<tbody>
<tr>
<td>2025-09-02</td><td>Blog routing mismatch</td><td>1h context</td><td>How hashnode handles certs (Oversight on my end)</td><td>add preflight check</td></tr>
<tr>
<td>2025-08-28</td><td>Remote reachability uncertainty</td><td>mental tax</td><td>network edge</td><td>timestamped “reachable?” check</td></tr>
</tbody>
</table>
</div><p>Why it works: this turns annoyance into a queue, not a mood.</p>
<h2 id="heading-receipts-the-smallest-useful-webhook">Receipts: the smallest useful webhook</h2>
<p>Boring JSON I can grep later:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"event"</span>: <span class="hljs-string">"container.restart"</span>,
  <span class="hljs-attr">"host"</span>: <span class="hljs-string">"friday"</span>,
  <span class="hljs-attr">"service"</span>: <span class="hljs-string">"immich"</span>,
  <span class="hljs-attr">"requested_by"</span>: <span class="hljs-string">"cli/sanskar"</span>,
  <span class="hljs-attr">"correlation_id"</span>: <span class="hljs-string">"2025-09-05T12:04:33Z-immich-restart"</span>,
  <span class="hljs-attr">"status"</span>: <span class="hljs-string">"ok"</span>,
  <span class="hljs-attr">"duration_ms"</span>: <span class="hljs-number">1432</span>
}
</code></pre>
<p>That <code>correlation_id</code> shows up in logs, the chat message, and (if needed) Grafana. No arguments later.</p>
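<p>A minimal sketch of how one of these gets assembled and shipped; the webhook URL is a placeholder, but the field names match the JSON above:</p>

```python
import json
import urllib.request
from datetime import datetime, timezone

WEBHOOK_URL = "http://example.internal/receipts"  # placeholder: point at your own collector

def make_receipt(event: str, host: str, service: str, requested_by: str,
                 status: str, duration_ms: int) -> dict:
    """Build a receipt; the correlation_id encodes timestamp, service, and action."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    action = event.split(".")[-1]
    return {
        "event": event,
        "host": host,
        "service": service,
        "requested_by": requested_by,
        "correlation_id": f"{ts}-{service}-{action}",
        "status": status,
        "duration_ms": duration_ms,
    }

def post_receipt(receipt: dict) -> None:
    """Ship the receipt as JSON so it lands in logs, chat, and dashboards alike."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(receipt).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# post_receipt(make_receipt("container.restart", "friday", "immich", "cli/sanskar", "ok", 1432))
```
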
<hr />
<h2 id="heading-build-vs-buy-vs-self-host-decide-in-5-minutes">Build vs Buy vs Self-host (decide in 5 minutes)</h2>
<p>Score each cell 1–5 (weight if you like), then pick the highest column <strong>total</strong>, not the loudest single number.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Factor</td><td>Build</td><td>Buy</td><td>Self-host</td></tr>
</thead>
<tbody>
<tr>
<td>Time-to-value</td><td></td><td></td><td></td></tr>
<tr>
<td>Control/lock-in</td><td></td><td></td><td></td></tr>
<tr>
<td>Learning value</td><td></td><td></td><td></td></tr>
<tr>
<td>Ongoing effort</td><td></td><td></td><td></td></tr>
<tr>
<td>Failure blast</td><td></td><td></td><td></td></tr>
<tr>
<td>Cost (12 months)</td><td></td><td></td><td></td></tr>
</tbody>
</table>
</div><p><strong>Rule of thumb:</strong> if <code>learning × control</code> doesn’t beat <code>time-to-value × effort</code>, don’t build.</p>
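<p>Scored out, the decision is trivial to compute. The numbers below are hypothetical scores for one imaginary decision, not a recommendation:</p>

```python
# Hypothetical 1-5 scores, in the table's row order:
# time-to-value, control/lock-in, learning, ongoing effort, failure blast, cost (12 mo)
scores = {
    "build":     [2, 5, 5, 2, 3, 4],
    "buy":       [5, 2, 1, 5, 4, 2],
    "self_host": [3, 4, 4, 3, 3, 3],
}

totals = {option: sum(vals) for option, vals in scores.items()}
winner = max(totals, key=totals.get)  # here: "build" at 21, by a hair

def should_build(learning: int, control: int, time_to_value: int, effort: int) -> bool:
    """The rule of thumb as code: build only if learning x control wins."""
    return learning * control > time_to_value * effort
```
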
<hr />
<h2 id="heading-parametric-parts-that-actually-fit-hello-3d-printer">Parametric parts that actually fit (Hello! 3D Printer)</h2>
<ul>
<li><p><strong>Datum first.</strong> Pick the surface that <em>must</em> align, reference everything from it.</p>
</li>
<li><p><strong>Clearance defaults.</strong> Start with +0.3–0.5 mm on FDM fits; adjust after one print.</p>
</li>
<li><p><strong>Stress lines.</strong> Add fillets at inside corners; avoid layer-line shear on clamp tabs.</p>
</li>
<li><p><strong>Swap-cost low.</strong> One variable per critical dimension; no magic numbers.</p>
</li>
<li><p><strong>Test as a draft.</strong> First print is a measurement tool, not a masterpiece.</p>
</li>
</ul>
<p>This is how the <strong>door-frame projector mount</strong> happened (rented apartment, no drilling). Sketch → parametric → print → tweak → done.<br />Pro tip I still haven’t taken: <strong>buy a digital vernier caliper.</strong></p>
<h2 id="heading-boring-checks-i-value-calm-gt-clever">Boring checks I value (calm &gt; clever)</h2>
<ul>
<li><p><strong>“Is the server reachable remotely?”</strong> with a timestamp. If that isn’t green, nothing else matters.</p>
</li>
<li><p><strong>“Last backup restore validated?”</strong> yes/no + date.</p>
</li>
<li><p><strong>“What changed?”</strong> 24-hour diff of container images and configs.</p>
</li>
</ul>
<p>Tools I don’t want to give up: <strong>Jellyfin</strong> (no rental brain for media) and <strong>Immich</strong> (memories stay near me).</p>
<h2 id="heading-what-im-building-toward-no-hype-just-direction">What I’m building toward (no hype, just direction)</h2>
<ul>
<li><p><strong>Lightweight local LLMs</strong> that act with receipts (grounded tools, auditable logs).</p>
</li>
<li><p><strong>More additive manufacturing,</strong> fewer zip-ties, publish the parametric files when they’re solid.</p>
</li>
<li><p><strong>Work upskilling</strong> with the same honesty I use at home: observability first, automation second.</p>
</li>
<li><p><strong>Restore-day in a box:</strong> clean hardware → one command → verified services.</p>
</li>
</ul>
<h2 id="heading-acknowledgments">Acknowledgments</h2>
<p>Thanks to the seniors and teammates who reviewed my checklists, asked the annoying-but-right questions, and taught me to prefer guardrails over heroics. Any good ideas here are borrowed generously, the mistakes are mine.</p>
]]></content:encoded></item><item><title><![CDATA[I Made My Homelab Talk to Me Using Claude and FastMCP]]></title><description><![CDATA[Most of us build homelabs to tinker, automate, and take control of our infrastructure. But somewhere between Docker containers, backups, and uptime monitoring, it becomes a lot to keep track of. I didn’t want to SSH into my server every time I needed...]]></description><link>https://blog.sanskarjaiswal.dev/i-made-my-homelab-talk-to-me-using-claude-and-fastmcp</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/i-made-my-homelab-talk-to-me-using-claude-and-fastmcp</guid><category><![CDATA[fastmcp]]></category><category><![CDATA[claude.ai]]></category><category><![CDATA[Model Context Protocol]]></category><category><![CDATA[mcp server]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Sat, 28 Jun 2025 18:43:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751136102557/55969db1-9daf-457d-a290-fb7ab69eadd0.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most of us build homelabs to tinker, automate, and take control of our infrastructure. But somewhere between Docker containers, backups, and uptime monitoring, it becomes a lot to keep track of. I didn’t want to SSH into my server every time I needed a quick answer like “How much space is left on my drive?” or “Is Tailscale still running?”</p>
<p>So I built something better. Now I just <strong>ask Claude</strong>, and it tells me.</p>
<p>This post walks through how I wired up my homelab using <a target="_blank" href="https://gofastmcp.com">FastMCP</a> and <a target="_blank" href="https://claude.ai">Claude Desktop</a>, letting me run system queries through natural language and get intelligent responses from my own infrastructure.</p>
<hr />
<h2 id="heading-what-is-fastmcp">What Is FastMCP?</h2>
<p>If you're not familiar with it, <strong>FastMCP</strong> is a Python framework that lets you expose tools, resources, and prompts to LLMs via the Model Context Protocol (MCP). That means you can define Python functions, register them as tools with a decorator, and suddenly they’re callable from Claude, ChatGPT, or even your own HTTP clients.</p>
<p>Think of it as:</p>
<ul>
<li><p><code>FastAPI</code> for LLMs</p>
</li>
<li><p>but typed</p>
</li>
<li><p>and purpose-built for multi-tool interactions</p>
</li>
</ul>
<hr />
<h2 id="heading-what-i-wanted-to-do">What I Wanted to Do</h2>
<p>Here’s what I was aiming for:</p>
<ul>
<li><p>Ask Claude questions like “What’s the disk usage?” or “Are my containers healthy?”</p>
</li>
<li><p>Run system-level commands via Python securely</p>
</li>
<li><p>Keep everything running locally or over Tailscale, no public exposure</p>
</li>
<li><p>Build it once and just keep extending it</p>
</li>
</ul>
<hr />
<h2 id="heading-step-1-writing-an-mcp-server">Step 1: Writing an MCP Server</h2>
<p>This is where the magic starts. Here’s a basic MCP server using FastMCP:</p>
<pre><code class="lang-python"># homelab_server.py

from fastmcp import FastMCP
import psutil
import subprocess

mcp = FastMCP("Homelab")

@mcp.tool()
def disk_usage() -&gt; str:
    """Report root filesystem usage."""
    usage = psutil.disk_usage('/')
    return f"{usage.percent}% used — {usage.used // (1024**3)}GB of {usage.total // (1024**3)}GB"

@mcp.tool()
def tailscale_status() -&gt; str:
    """Return the output of `tailscale status`."""
    result = subprocess.run(['tailscale', 'status'], capture_output=True, text=True)
    return result.stdout.strip()

@mcp.tool()
def docker_containers() -&gt; str:
    """List running containers with their current status."""
    result = subprocess.run(['docker', 'ps', '--format', '{{.Names}}: {{.Status}}'], capture_output=True, text=True)
    return result.stdout.strip()

if __name__ == "__main__":
    # SSE transport so clients on the tailnet can reach it over HTTP
    mcp.run(transport="sse", host="0.0.0.0", port=7531)
</code></pre>
<p>You can run this locally with:</p>
<pre><code class="lang-bash">python homelab_server.py
</code></pre>
<p>Now your homelab has a voice.</p>
<hr />
<h2 id="heading-step-2-making-it-reachable">Step 2: Making It Reachable</h2>
<p>You don’t need to open ports to the world. I already run <strong>Tailscale</strong>, so I just connected my laptop and server to the same private network. That gave me a private IP like <code>100.x.x.x</code>, and I used that to point Claude Desktop to my MCP server.</p>
<hr />
<h2 id="heading-step-3-connecting-claude-desktop">Step 3: Connecting Claude Desktop</h2>
<p>Claude Desktop (with plugin support) makes this super easy.</p>
<ol>
<li><p>Open Claude → <code>Plugins</code> → <code>Add MCP Server</code></p>
</li>
<li><p>Add your MCP server URL<br /> Example: <a target="_blank" href="http://100.x.x.x:7531"><code>http://100.x.x.x:7531</code></a></p>
</li>
<li><p>Claude will auto-detect the available tools</p>
</li>
</ol>
<p>Now I can type:</p>
<blockquote>
<p>“Call the <code>disk_usage</code> tool on the homelab server”</p>
</blockquote>
<p>Or even just:</p>
<blockquote>
<p>“How much disk space do I have left?”</p>
</blockquote>
<p>Claude figures out the right tool to call, runs it, and replies with a summary.</p>
<hr />
<h2 id="heading-bonus-chaining-output-with-prompts">Bonus: Chaining Output with Prompts</h2>
<p>FastMCP also supports <em>prompt templates</em>, which means I can wrap raw command output in a summarization prompt and have Claude generate human-friendly summaries. That’s great for things like:</p>
<ul>
<li><p>Failed systemd services</p>
</li>
<li><p>Health reports</p>
</li>
<li><p>ZFS snapshots or Btrfs status</p>
</li>
</ul>
<p>You can even create tools that return JSON and let Claude reason over it.</p>
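<p>For example, a hypothetical <code>system_status</code> tool can return structured JSON instead of prose, which gives the model something unambiguous to reason over:</p>

```python
import json
import shutil

def system_status() -> str:
    """Return machine-readable status; register it as a tool like the others."""
    disk = shutil.disk_usage("/")
    return json.dumps({
        "disk": {
            "total_gb": disk.total // (1024 ** 3),
            "used_gb": disk.used // (1024 ** 3),
            "percent": round(disk.used / disk.total * 100, 1),
        }
    })
```

<p>Claude can then compare the numbers, spot a nearly-full disk, and phrase the warning itself.</p>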
<hr />
<h2 id="heading-why-this-is-fun-and-actually-useful">Why This Is Fun (and Actually Useful)</h2>
<p>This setup saves me time <strong>and</strong> gives me a more natural way to interact with my homelab. I don’t have to mentally context-switch into "sysadmin mode" every time I want to check logs or disk stats.</p>
<p>It also opens the door to more advanced use cases:</p>
<ul>
<li><p>Triggering Ansible playbooks via tools</p>
</li>
<li><p>Running backups and summarizing results</p>
</li>
<li><p>Fetching metrics from Grafana or Prometheus</p>
</li>
<li><p>Acting as a gateway for multiple machines (via <a target="_blank" href="https://gofastmcp.com/servers/composition.md">server composition</a>)</p>
</li>
</ul>
<hr />
<h2 id="heading-future-plans">Future Plans</h2>
<p>Here’s what I want to add next:</p>
<ul>
<li><p>A <code>/status</code> resource that returns full server health in JSON</p>
</li>
<li><p>A prompt-based tool that summarizes <code>journalctl</code> logs</p>
</li>
<li><p>A Discord webhook client that uses FastMCP to send notifications</p>
</li>
<li><p>Claude-triggered toolchains: ask one question, get multiple tools executed in sequence</p>
</li>
</ul>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>This isn’t just a cool hack. It’s the start of something much more interactive and intuitive. We’re entering a world where large language models can be more than chatbots; they can be <em>interfaces</em> to real systems.</p>
<p>If you’ve got a homelab and a few Python skills, I’d highly recommend trying out FastMCP. You’ll be surprised how far a simple tool can go when you give it a little context.</p>
]]></content:encoded></item><item><title><![CDATA[My Self-Hosting Stack: Everything I Run, and How It All Connects]]></title><description><![CDATA[When I first wrote about the emotional rollercoaster of self-hosting, I focused on the why, the motivations, frustrations, and addictive wins that come with running your own infrastructure. But one of the most common questions from friends I got was:...]]></description><link>https://blog.sanskarjaiswal.dev/my-self-hosting-stack-everything-i-run-and-how-it-all-connects</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/my-self-hosting-stack-everything-i-run-and-how-it-all-connects</guid><category><![CDATA[self-hosted]]></category><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Sun, 22 Jun 2025 16:50:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ISG-rUel0Uw/upload/274bd70b240b7b463696b5861d195460.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I first wrote about the emotional rollercoaster of self-hosting, I focused on the why, the motivations, frustrations, and addictive wins that come with running your own infrastructure. But one of the most common questions from friends I got was: <strong><em>“Okay, but what do you actually self-host?”</em></strong></p>
<p>This post is my answer. It’s a complete breakdown of the apps I run, how they connect, what runs in Docker, what runs natively, and how I try to keep everything just stable enough to trust with my digital life.</p>
<h2 id="heading-hardware-overview">Hardware Overview</h2>
<p>Before diving into apps, here's the gear powering it all.</p>
<p><strong>Server:</strong> Dell OptiPlex 3050 Micro<br />i3-7100T, 128GB NVMe boot + 1TB SSD for <code>/data</code></p>
<p><strong>External Storage:</strong> 4TB Seagate Expansion Drive<br />Mounted as <code>/mnt/dumptruck</code>, used for media, downloads, and backups</p>
<p><strong>Network:</strong> Tailscale for remote access, Samba for local file sharing</p>
<p>Everything runs on <strong>Fedora Server</strong>, headless.</p>
<h2 id="heading-core-stack">Core Stack</h2>
<p>These are the foundational apps, things I rely on daily.</p>
<h3 id="heading-nextcloud">Nextcloud</h3>
<p><strong>Purpose:</strong> File sync, calendar, contacts, notes<br /><strong>Setup:</strong> Runs in Docker, <code>/data/nextcloud</code> mounted for storage<br /><strong>Extras:</strong> Integrates with Photo backup from iOS via iCloudpd, accessible over Tailscale</p>
<h3 id="heading-paperless">Paperless</h3>
<p><strong>Purpose:</strong> Document archiving (PDFs, bills, IDs, etc.)<br /><strong>Consume Folder:</strong> Default, with OCR and auto-tagging enabled<br /><strong>Setup:</strong> <code>/data/paperless</code> as persistent volume, runs in Docker<br /><strong>Note:</strong> Planning to add rules and a Discord notification bot</p>
<h3 id="heading-immich">Immich</h3>
<p><strong>Purpose:</strong> Private Google Photos replacement<br /><strong>Data Location:</strong> <code>/data/immich</code><br /><strong>Setup:</strong> Docker Compose; iOS and Android apps used for uploads</p>
<h2 id="heading-backup-redundancy">Backup + Redundancy</h2>
<p>Backups are the difference between peace of mind and total panic. Here's how I manage mine.</p>
<h3 id="heading-cloud-backups">Cloud Backups</h3>
<p>Google Drive and Mega are mounted via containers.<br />Backups are copied regularly to <code>/mnt/dumptruck/Backups/</code> using <code>rsync</code>.<br />Encrypted <code>.tar.gz</code> files are stored offsite.</p>
<h3 id="heading-snapshot-scripts">Snapshot Scripts</h3>
<p>I use ZFS-like manual snapshots of important folders like Nextcloud and Paperless.<br />I plan to automate snapshot creation and verification soon.</p>
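<p>Until that automation lands, here’s the shape of what I have in mind: timestamped tarballs plus a cheap “does it even open” check. The paths are from my setup; the logic is an illustrative sketch, not battle-tested:</p>

```python
import tarfile
from datetime import datetime
from pathlib import Path

DEST = Path("/mnt/dumptruck/Backups")  # where snapshots land on the 4TB drive

def snapshot(src: str, dest: Path = DEST) -> Path:
    """Create a timestamped .tar.gz of one folder and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out = dest / f"{Path(src).name}-{stamp}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        tar.add(src, arcname=Path(src).name)
    return out

def verify(archive: Path) -> bool:
    """Cheap verification: the archive opens and lists at least one member."""
    with tarfile.open(archive, "r:gz") as tar:
        return len(tar.getnames()) > 0

# snapshot("/data/nextcloud"); snapshot("/data/paperless")
```
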
<h2 id="heading-smart-home-and-monitoring">Smart Home and Monitoring</h2>
<p>This is a pretty new space for me, but it's where things get fun and occasionally chaotic.</p>
<h3 id="heading-home-assistant">Home Assistant</h3>
<p><strong>Purpose:</strong> Control smart lights, routines, sensors<br /><strong>Setup:</strong> Native install under <code>/data/homeassistant</code><br /><strong>Devices:</strong> Two smart bulbs, ambient lighting via Arduino and Prismatik<br /><strong>Access:</strong> Exposed on local IP</p>
<h3 id="heading-homebridge">Homebridge</h3>
<p><strong>Purpose:</strong> Bridge non-HomeKit devices to the Apple ecosystem<br /><strong>Setup:</strong> Native install (not Docker)<br /><strong>Status:</strong> In active use with iPhone and iPad</p>
<h3 id="heading-tailscale">Tailscale</h3>
<p><strong>Purpose:</strong> Private remote access across phone, laptop, and tablet<br /><strong>Bonus:</strong> Simplifies SSH, Nextcloud access, and photo syncing</p>
<h2 id="heading-media-and-downloads">Media and Downloads</h2>
<h3 id="heading-jellyfin">Jellyfin</h3>
<p><strong>Purpose:</strong> Local movie and TV streaming<br /><strong>Storage:</strong> Reads from <code>/mnt/dumptruck/Media</code><br /><strong>Setup:</strong> Docker</p>
<h3 id="heading-torrent-stack-qbittorrent-prowlarr-bazarr">Torrent Stack (qBittorrent, Prowlarr, Bazarr)</h3>
<p><strong>Purpose:</strong> Automated media management<br /><strong>Tools:</strong></p>
<ul>
<li><p><code>qBittorrent</code> for downloads</p>
</li>
<li><p><code>Prowlarr</code> as indexer manager</p>
</li>
<li><p><code>Bazarr</code> for subtitles<br />  <strong>Storage:</strong> Downloads to <code>/mnt/dumptruck/Downloads</code></p>
</li>
</ul>
<h2 id="heading-monitoring-and-alerts">Monitoring and Alerts</h2>
<h3 id="heading-discord-bot">Discord Bot</h3>
<p><strong>Purpose:</strong> Notifies on backup success, failure, and other alerts<br /><strong>Setup:</strong> Shell scripts call a webhook with status info</p>
<h3 id="heading-grafana-experimental">Grafana (experimental)</h3>
<p><strong>Purpose:</strong> Dashboard for uptime, disk usage, etc.<br /><strong>Status:</strong> Used occasionally, but not core to daily operations</p>
<h2 id="heading-how-it-all-connects">How It All Connects</h2>
<p>Everything is centered around <code>/data</code> for persistent volumes.<br />Media and bulk data are offloaded to the 4TB HDD.<br />Tailscale acts as the glue between all mobile and remote devices.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750610927713/e4da9e33-6eec-4eeb-8304-44e394c8969d.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-lessons-from-this-stack">Lessons from This Stack</h2>
<ul>
<li><p>Keep volumes organized under <code>/data</code> because it makes backups easier.</p>
</li>
<li><p>Use Tailscale early since it saves days of port-forwarding frustration.</p>
</li>
<li><p>Don’t over-optimize upfront. Get it working first, then make it pretty.</p>
</li>
<li><p>Backups are real work. Automate as much as possible and test recovery.</p>
</li>
</ul>
<h2 id="heading-whats-next">What's Next</h2>
<ul>
<li><p>Setting up snapshot validation</p>
</li>
<li><p>Integrating Paperless with tagging and alert rules</p>
</li>
<li><p>Maybe running a local LLM for notes or search indexing</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>This stack isn’t perfect. It’s full of small decisions, trial-and-error, and weekend experiments. But it works. And most importantly, it gives me back control, ownership, and a sense of craft.</p>
]]></content:encoded></item><item><title><![CDATA[The Real Story of Self-Hosting: Why I Love It (and Sometimes Hate It)]]></title><description><![CDATA[Thinking About Self-Hosting? Here’s What You Should Know
The first time I considered self-hosting, it was out of pure frustration. I was tired of Google Photos suddenly charging for storage, cloud services constantly shifting features behind paywalls...]]></description><link>https://blog.sanskarjaiswal.dev/the-real-story-of-self-hosting-why-i-love-it-and-sometimes-hate-it</link><guid isPermaLink="true">https://blog.sanskarjaiswal.dev/the-real-story-of-self-hosting-why-i-love-it-and-sometimes-hate-it</guid><category><![CDATA[SelfHosting]]></category><dc:creator><![CDATA[Sanskar Jaiswal]]></dc:creator><pubDate>Sun, 25 May 2025 18:41:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748201036173/6cfdf937-f964-40db-9c9b-de834cf26999.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-thinking-about-self-hosting-heres-what-you-should-know">Thinking About Self-Hosting? Here’s What You Should Know</h2>
<p>The first time I considered self-hosting, it was out of pure frustration. I was tired of Google Photos suddenly charging for storage, cloud services constantly shifting features behind paywalls, and the creeping sense that my data was less and less in my control. Like many others, I started researching ways to run my own cloud and apps on hardware I owned. The online guides all made it look so simple: just spin up a server, install a few Docker containers, and you’re free from the cloud forever. It sounded perfect. And honestly, getting started <em>was</em> fun and satisfying. But very quickly, I realized there’s a lot that the glossy tutorials don’t mention.</p>
<h2 id="heading-why-people-self-host-and-why-i-stuck-with-it">Why People Self-Host (and Why I Stuck With It)</h2>
<p>At first, the appeal was all about control. I liked the idea of being able to access my files from anywhere and not having to worry about companies reading or selling my data. Once I had my own Nextcloud instance running, I started seeing just how much I could do: cloud storage, calendar sync, even music streaming. It felt empowering.</p>
<p>There’s also a sense of pride that comes with it. When I tell friends or family that our photos and documents are backed up on a server I built and maintain, there’s always a bit of surprise and curiosity. And the technical side is a genuine draw. Nothing accelerates your learning about Linux, networking, or automation quite like trying to set everything up yourself, especially when things don’t work the first (or fifth) time.</p>
<h2 id="heading-what-you-can-actually-self-host">What You Can Actually Self-Host</h2>
<p>The possibilities are almost endless, and that’s part of the fun. Over the last year, I’ve tried running all sorts of apps: Nextcloud for files, Jellyfin to stream old movies from my hard drive, Home Assistant for controlling smart lights and automations, Grafana for real-time dashboards, and even a Discord bot that pings me if my backups fail. Some things stuck, some didn’t, but every experiment taught me something new.</p>
<p>One of my favorite moments was setting up automatic photo backups from my phone to my own server. That feeling of independence, with no subscriptions, no limits, and no worries about who was looking at my photos, was incredible. The best part? Knowing it was all running in the corner of my own room, on hardware I’d put together myself.</p>
<h2 id="heading-the-wow-moments-and-why-theyre-addictive">The “Wow” Moments (and Why They’re Addictive)</h2>
<p>There are moments in self-hosting that make you feel like a tech genius. I remember the first time I accessed my movie library from a friend’s house, just streaming a film directly from my home server like it was Netflix. Or the time my family needed a document urgently, and I pulled it up for them from my Paperless instance while we were on vacation. Small things, but they add up to a real sense of control and satisfaction.</p>
<p>Even just building dashboards in Grafana, or seeing a notification pop up in Discord that a backup succeeded, gives a little hit of pride. Those moments make all the troubleshooting worth it. They’re also addictive: once you solve one problem, you want to solve the next.</p>
<h2 id="heading-the-hidden-challenges-what-nobody-tells-you">The Hidden Challenges: What Nobody Tells You</h2>
<p>But let’s talk about the flip side. For every “wow” moment, there’s usually a “why is this broken?” night.</p>
<h3 id="heading-hardware-surprises"><strong>Hardware Surprises</strong></h3>
<p>Hardware is the first thing to catch you out. I’ve had drives die without warning, forcing me to scramble for a backup plan. I once spent a whole Saturday trying to move my setup from a 1TB SSD to a new 4TB external drive. What should have been a simple process turned into hours of wrestling with Docker volumes, mysterious file permissions, and more than a little self-doubt.</p>
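<p>If you ever attempt a similar migration, the core of it can be much simpler than my Saturday suggested: stop your containers first (e.g. <code>docker compose down</code>), copy the data in a way that preserves permissions, and verify the copy before you trust the new drive. Here’s a rough sketch; the paths are hypothetical, and for a big live dataset you’d likely reach for <code>rsync -a</code> instead of <code>cp</code>:</p>

```shell
# migrate_dir SRC DEST: copy a data directory to a new drive,
# preserving permissions, then verify before touching the old copy.
migrate_dir() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  # -a preserves permissions, ownership, and timestamps; "src/." copies
  # the directory's contents rather than nesting src inside dest.
  cp -a "$src/." "$dest/" || return 1
  # Compare both trees before deleting anything on the old drive.
  diff -r "$src" "$dest"
}
```

<p>Only once <code>diff -r</code> comes back clean do you repoint your Docker bind mounts (or the daemon’s data directory) at the new path and bring the containers back up. The permission wrestling usually comes from skipping the <code>-a</code> flag, which is exactly the mistake I made.</p>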
<h3 id="heading-network-headaches"><strong>Network Headaches</strong></h3>
<p>And then there’s the network. Port forwarding, which I’d barely heard of before, suddenly became my new nemesis. My ISP kept changing my public IP address, breaking remote access just as I thought I’d figured it out. At one point, after hours of tinkering, I realized my router itself was blocking half the ports I needed. It wasn’t until I discovered tools like Tailscale (which builds a private WireGuard-based VPN between your devices, no port forwarding required) that things started to make sense. But even then, every solution led to new things to learn.</p>
<h3 id="heading-the-joy-and-pain-of-software-updates"><strong>The Joy and Pain of Software Updates</strong></h3>
<p>Software updates are another mixed blessing. I’ve had a single <code>dnf upgrade</code> break my Nextcloud installation, leaving me with a sinking feeling as I googled error messages at 2 AM. Containers help, but not all apps play nicely together, and restoring from backup is the only time you discover if you’ve really been doing it right.</p>
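<p>One habit that has softened the update surprises for me: pin exact image tags instead of riding <code>:latest</code>, so a routine pull can’t silently jump a major version. A sketch of what that looks like in a Compose file (the tag shown is illustrative, not necessarily current):</p>

```yaml
# Pinning an exact tag means upgrades only happen when you change this
# line deliberately -- and you can roll back by changing it back.
services:
  nextcloud:
    image: nextcloud:28.0.4-apache   # pin an exact tag, not :latest
    volumes:
      - nextcloud_data:/var/www/html
    restart: unless-stopped
volumes:
  nextcloud_data:
```

<p>It doesn’t save you from a bad host-level <code>dnf upgrade</code>, but it does mean the app layer only moves when you tell it to.</p>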
<h3 id="heading-maintenance-realities"><strong>Maintenance Realities</strong></h3>
<p>And maintenance? It’s a time commitment, no matter what anyone says. Things break, usually when you least expect it. Family members notice downtime, and suddenly you’re on the hook for fixing the “home cloud” before dinner. I’ve learned the hard way to keep notes on what I’ve set up, because otherwise I forget my own steps a month later.</p>
<h2 id="heading-lessons-learned-and-practical-tips">Lessons Learned and Practical Tips</h2>
<p>If you’re thinking about self-hosting, my advice is to start simple. Pick one or two services you really want and get those running well before you branch out. Automate your backups and actually test restoring from them. Use containers when you can, because it makes upgrades and recovery so much easier. Monitoring is worth the setup, even if it’s just a basic “is this still running?” dashboard or a bot sending alerts.</p>
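<p>To make that advice concrete, here’s roughly the shape of the backup script I run from cron: archive, verify the archive is actually readable, and report either way. The paths and the <code>DISCORD_WEBHOOK</code> variable are placeholders for your own setup, and the JSON quoting is deliberately naive (fine for short status strings):</p>

```shell
# backup_and_verify SRC DEST: archive SRC into DEST, check the archive
# is readable, and report the result. Returns nonzero on failure.
backup_and_verify() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  archive="$dest/backup-$(date +%Y%m%d).tar.gz"
  if tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" \
     && tar -tzf "$archive" >/dev/null; then
    notify "backup OK: $archive"
  else
    notify "backup FAILED for $src"
    return 1
  fi
}

# notify MSG: post MSG to a Discord webhook if one is configured,
# and always print it so cron mail / logs capture it too.
notify() {
  if [ -n "${DISCORD_WEBHOOK:-}" ]; then
    curl -s -H 'Content-Type: application/json' \
         -d "{\"content\": \"$1\"}" "$DISCORD_WEBHOOK" >/dev/null
  fi
  echo "$1"
}
```

<p>The <code>tar -tzf</code> listing isn’t a full restore test, so I still do a real restore into a scratch directory every so often. But it catches the most common failure, a truncated or corrupt archive, on the same night it happens instead of the day you need the backup.</p>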
<p>And above all, expect to make mistakes. That’s part of the process. Every little disaster is just a lesson for next time, and you do get better at anticipating what can go wrong.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Self-hosting is both empowering and humbling. When everything is running smoothly, there’s nothing quite like it. But the problems are real, and you’ll spend as much time fixing things as you do building them. Still, each time you get something working, it’s a genuine achievement. If you’re already on this journey, what’s been your biggest success or headache? If you’re just starting, what are you most worried about? I’m happy to help and would love to hear what you want to see next.</p>
]]></content:encoded></item></channel></rss>