﻿<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Dsecurity]]></title><description><![CDATA[Dsecurity]]></description><link>https://dsecurity.hashnode.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1593680282896/kNC7E8IR4.png</url><title>Dsecurity</title><link>https://dsecurity.hashnode.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 18 Jun 2026 15:38:48 GMT</lastBuildDate><atom:link href="https://dsecurity.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How I Stressed My SQLite Job Queue to 5,000 Continuous Tasks on an Android Phone (And Why It Outperformed the Cloud)]]></title><description><![CDATA[Every developer building a side project or a home automation pipeline eventually hits the same roadblock. 
You have a script running in the cloud (maybe a web scraper, a webhook handler, or an AI agent]]></description><link>https://dsecurity.hashnode.dev/how-i-stressed-my-sqlite-job-queue-to-5-000-continuous-tasks-on-an-android-phone-and-why-it-outperformed-the-cloud</link><guid isPermaLink="true">https://dsecurity.hashnode.dev/how-i-stressed-my-sqlite-job-queue-to-5-000-continuous-tasks-on-an-android-phone-and-why-it-outperformed-the-cloud</guid><category><![CDATA[Python]]></category><category><![CDATA[SQLite]]></category><category><![CDATA[backend]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[automation]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[DSECURITY]]></dc:creator><pubDate>Tue, 26 May 2026 15:59:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6a0293d8fca21b0d4b8c98e0/28e127b7-0d2e-47fb-90d3-531cbb34d817.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every developer building a side project or a home automation pipeline eventually hits the same roadblock. You have a script running in the cloud (maybe a web scraper, a webhook handler, or an AI agent), and you need it to trigger a physical action on a local device—like an old Android phone running Termux, or a Raspberry Pi behind a strict home firewall.</p>
<p>The standard industry advice is immediate: <em>"Just spin up Celery and back it with RabbitMQ or Redis."</em> But for independent developers, indie hackers, and hobbyists, that answer feels terrible. Deploying and maintaining a heavy, memory-hungry message broker for low-to-medium workloads is massive operational overkill. It’s expensive, introduces vendor lock-in if you go the managed cloud route, and forces you to manage background daemons that eat up system resources.</p>
<p>The alternative? Opening inbound network ports or setting up reverse SSH tunnels, which is a security nightmare.</p>
<p>That is why I built <strong>Intent Bus</strong>—a zero-infrastructure job coordination system powered entirely by a minimal Flask core and a local SQLite database. Instead of maintaining persistent stateful connections or heavy external brokers, your cloud script simply writes tasks to an HTTP endpoint, and your cross-device edge workers safely poll for jobs using basic outbound HTTP requests. It gives you atomic locking, priority scheduling, exponential backoffs, and dead-letter queues out of the box with zero operational friction.</p>
<p>But when you tell developers you built a concurrent message queue on top of SQLite, the skepticism is instant. <em>"SQLite has a single-writer bottleneck." "It will thrash the disk." "It can't scale under real worker contention."</em></p>
<p>We just launched live on Product Hunt, and my co-maintainer Zan and I decided to stop talking about theory. We decided to break it. Here is how we stress-tested Intent Bus across the cloud, the limits we hit, and the completely unexpected way an old smartphone stole the show.</p>
<hr />
<h2>The Testing Strategy: Finding the Ceiling</h2>
<p>To find exactly where our architectural trade-offs would give out, we established a rigorous profiling matrix across three distinct deployment environments:</p>
<ol>
<li><strong>PythonAnywhere (Free Tier):</strong> A standard Python hosting environment.</li>
<li><strong>Render (Free Tier Docker Container):</strong> A typical lightweight cloud container setup.</li>
<li><strong>Android 12 Device (Termux via local ARM CPU &amp; Flash Storage):</strong> True edge hosting.</li>
</ol>
<p>We subjected the broker to progressively brutal worker/job ratios to watch the latency curve and catch database locks or lease drops in real time.</p>
<table>
<thead>
<tr>
<th>Profile</th>
<th>Concurrency Workload</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Low</strong></td>
<td>5 active workers polling and fulfilling 50 total jobs.</td>
</tr>
<tr>
<td><strong>Medium</strong></td>
<td>15 active workers polling and fulfilling 500 total jobs.</td>
</tr>
<tr>
<td><strong>Heavy</strong></td>
<td>40 active workers polling and fulfilling 2,000 total jobs.</td>
</tr>
<tr>
<td><strong>Extreme</strong></td>
<td>100+ active workers polling and fulfilling 5,000 continuous jobs.</td>
</tr>
</tbody></table>
<hr />
<h2>Phase 1: The Cloud Baseline (PythonAnywhere vs. Render)</h2>
<p>We kicked off our benchmarks on a <strong>PythonAnywhere Free Tier</strong> instance. It failed as soon as the load became concurrent. Because the free tier handles requests on a single thread, concurrent workers polling the queue quickly built up a request backlog that never cleared. At medium load, it hit 100% network errors. A single-threaded server cannot keep up with many workers polling at once.</p>
<p>Next, we shifted to a multi-threaded <strong>Docker deployment on Render's free tier</strong>. This is where the SQLite architecture started to shine. </p>
<ul>
<li>On <strong>Medium load</strong>, Render sailed through with a <strong>99.5% success rate</strong>, averaging <strong>11.30 jobs/sec</strong> with a tight P99 worker latency of just 0.517 seconds. </li>
<li>When we pushed it to <strong>Heavy load</strong> (40 workers, 2,000 jobs), the system held steady at a <strong>99.34% success rate</strong> and maintained a <strong>14.18 jobs/sec</strong> throughput.</li>
</ul>
<p>Because we engineered Intent Bus around a strictly optimized SQLite Write-Ahead Logging (WAL) configuration, the database handled concurrent reads comfortably. Under heavy writer contention, our locking mechanism gracefully degraded—tail latencies extended out to a P99 of 2.891 seconds, but the queue never crashed, never locked up, and leaked exactly zero jobs.</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0ky9zjx3dx9eb6zq2ave.jpg" alt="Render stress test" /></p>
<h2>Phase 2: The Android Experiment (Pushing to the Extreme)</h2>
<p>With a stable cloud baseline of ~14 jobs/sec, we decided to run an unconventional experiment: What happens if we host the central message broker right on the edge? We spun up the Intent Bus server inside <strong>Termux on a standard Android 12 phone</strong>, using its internal ARM processor and mobile flash storage as the entire infrastructure backbone.</p>
<p>The results completely blew past our expectations.</p>
<p>When we unleashed the <strong>Heavy load test</strong> (40 workers, 2,000 jobs) against the smartphone server, the mobile hardware didn't just keep up; it roughly doubled the cloud container's throughput, clocking <strong>28.04 jobs per second</strong> with a P99 latency of 2.556 seconds. Our best explanation is that the phone's flash storage didn't have to contend with the shared hypervisor overhead of a free-tier container, so SQLite's local disk I/O ran much faster.</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9ljqto4gpsy0p3uji3en.jpg" alt="Android stress test" /></p>
<h3>Pushing to the Absolute Brink: 5,000 Continuous Jobs</h3>
<p>Emboldened by the phone's performance, we threw our absolute worst-case scenario at it: <strong>The Extreme Profile</strong>. We unleashed over 100 concurrent worker loops, firing a non-stop barrage of <strong>5,000 jobs over a sustained 4.5-minute beating (267 seconds)</strong>.</p>
<p>This is what real system engineering looks like when it hits a hardware wall:</p>
<ol>
<li><strong>Graceful Degradation:</strong> As the mobile flash storage and ARM chip finally throttled under the sustained heat and massive file-write contention, our average throughput dropped from its peak down to a sustained <strong>18.52 jobs/sec</strong>. Tail latencies spiked heavily, pushing the P99 worker response out to 9.002 seconds.</li>
<li><strong>Absolute Structural Integrity:</strong> Despite the hardware bottleneck, look at the error counters. <strong>Zero network drops. Zero lease losses. Zero publish rejects.</strong> Even when the device was gasping for air, the protocol's locking logic and database state transitions remained flawless. </li>
<li><strong>The Final Score:</strong> The system completed the marathon with a <strong>98.89% success rate</strong> over 5,000 continuous tasks, showing that the queue stays consistent even when the hardware is saturated.</li>
</ol>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/heayk7xm01ft1gnz1r43.jpg" alt="Android extreme-profile stress test" /></p>
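<p>As a quick sanity check on the Extreme profile figures, here is a small calculation. It assumes the reported throughput counts only successfully completed jobs; the post does not say how the harness computes it, so treat that as a guess that happens to match the numbers.</p>
<pre><code class="language-python"># Figures quoted for the Extreme profile above.
total_jobs = 5000
elapsed_seconds = 267
success_rate = 0.9889

# Raw rate, counting every job regardless of outcome.
raw_rate = total_jobs / elapsed_seconds

# Assumption: the reported throughput only counts successful jobs.
effective_rate = raw_rate * success_rate

print(round(raw_rate, 2), round(effective_rate, 2))
</code></pre>
<p>Under that assumption the effective rate lands on the reported 18.52 jobs/sec.</p>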
<hr />
<h2>Architecture Takeaways: Why This Matters for Indie Devs</h2>
<p>We didn't build Intent Bus to replace Kafka, RabbitMQ, or high-throughput enterprise systems. If you are building a multi-region corporate platform processing tens of thousands of requests per second, you need a traditional distributed broker architecture. </p>
<p>But if you are an indie hacker, a DevOps engineer tying personal scripts together, or an automation enthusiast coordinating a heterogeneous fleet of edge machines, these benchmarks change things. </p>
<p>They show that <strong>a lightweight, properly optimized SQLite WAL backend is robust enough</strong> to work through thousands of jobs from 100+ concurrent workers with zero data corruption. It means you can bypass the cost and complexity of cloud server infrastructure. You can literally pull an old phone out of a drawer, install Termux, and host a reliable, self-contained automation queue entirely for free.</p>
<hr />
<h2>We are Live on Product Hunt!</h2>
<p>Intent Bus is 100% open-source and written in Python. While we provide a dedicated Python SDK to handle signature generation and worker loops automatically, the underlying protocol is lightweight enough to interact with using nothing but pure <code>bash</code> and <code>curl</code>.</p>
<p>Our code is fully verified, our CI/CD pipelines are green, and our Product Hunt launch is officially live right now! If you want to check out the codebase, audit our security protocol updates, or leave us your honest feedback, check out the links below:</p>
<p>👉 <strong>Leave a Review on Product Hunt:</strong> <a href="https://www.producthunt.com/products/intent-bus/reviews/new">Review</a></p>
<p>👉 <strong>GitHub Main Server:</strong> <a href="https://github.com/dsecurity49/Intent-Bus">Intent-Bus</a></p>
<p>👉 <strong>GitHub SDK Repo:</strong> <a href="https://github.com/dsecurity49/Intent-Bus-sdk">Python-SDK</a></p>
<p><em>I’ll be hanging out in the Product Hunt comments section all day to answer your questions—I'd love to hear your thoughts on our SQLite implementation, our capability routing, or our edge-computing performance results!</em></p>
]]></content:encoded></item><item><title><![CDATA[Building a zero-infra job queue with SQLite (and stress-testing it to 14 jobs/sec)]]></title><description><![CDATA[I needed to trigger automation scripts on an old Android phone from a cloud VPS, without opening ports or spinning up Redis. So I built a job queue out of Flask and SQLite.
Here is a deep dive into wh]]></description><link>https://dsecurity.hashnode.dev/building-a-zero-infra-job-queue-with-sqlite-and-stress-testing-it-to-14-jobs-sec</link><guid isPermaLink="true">https://dsecurity.hashnode.dev/building-a-zero-infra-job-queue-with-sqlite-and-stress-testing-it-to-14-jobs-sec</guid><category><![CDATA[Python]]></category><category><![CDATA[SQLite]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[DSECURITY]]></dc:creator><pubDate>Mon, 25 May 2026 10:17:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6a0293d8fca21b0d4b8c98e0/e2415b9b-f0c1-4f4e-8bb5-1c2c43a38769.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I needed to trigger automation scripts on an old Android phone from a cloud VPS, without opening ports or spinning up Redis. So I built a job queue out of Flask and SQLite.</p>
<p>Here is a deep dive into what it is, how SQLite handles the concurrency, and what happened when I tried to stress-test it in production.</p>
<h2>What it is (and what it isn't)</h2>
<p>Intent Bus is designed to be the missing link between simple <code>cron</code> jobs and heavy enterprise message brokers.</p>
<p><strong>What it aims for:</strong></p>
<ul>
<li>Giving indie hackers and home-lab enthusiasts actual execution guarantees (retries, dead-letter queues) without needing to deploy or maintain broker infrastructure.</li>
<li>Running anywhere. You can deploy the server on a free cloud tier, and run workers on a Raspberry Pi, a Termux phone, or a cheap VPS.</li>
</ul>
<p><strong>What it is not:</strong></p>
<ul>
<li>It is not a Kafka replacement. It is not meant for microservices processing 10,000 transactions per second. If you have enterprise scale, use enterprise tools.</li>
</ul>
<h2>The Core Features</h2>
<p>Cron is strictly fire-and-forget, with no coordination and silent failures. To fix that, Intent Bus provides:</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Atomic Locking</strong></td>
<td>Prevents double-claiming. A cryptographic lease token is issued to the claiming worker.</td>
</tr>
<tr>
<td><strong>Reliable Delivery</strong></td>
<td>If a worker crashes or drops offline, the lease expires and the job requeues with exponential backoff.</td>
</tr>
<tr>
<td><strong>Dead-Letter Queue</strong></td>
<td>Failed jobs (e.g., after 3 attempts) are archived for inspection rather than deleted.</td>
</tr>
<tr>
<td><strong>Capability Routing</strong></td>
<td>Workers advertise what they can do; jobs require what they need (e.g., require a worker with <code>ffmpeg</code>).</td>
</tr>
<tr>
<td><strong>Priority Queues</strong></td>
<td>High-priority intents are always claimed and processed first.</td>
</tr>
</tbody></table>
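<p>To make capability routing concrete, here is a minimal sketch of the matching rule. The function name <code>can_run</code> and the data shapes are hypothetical and not taken from the Intent Bus codebase; the only idea carried over is that a job is eligible when the worker advertises everything the job requires.</p>
<pre><code class="language-python">def can_run(worker_capabilities, job_requirements):
    """Return True when the worker advertises everything the job requires."""
    return set(job_requirements).issubset(worker_capabilities)


# Hypothetical workers and a job that needs ffmpeg.
phone = {"termux", "notify"}
media_box = {"ffmpeg", "gpu"}
job = {"requires": ["ffmpeg"]}

print(can_run(phone, job["requires"]))      # False
print(can_run(media_box, job["requires"]))  # True
</code></pre>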
<h2>The SQLite Atomic Lock</h2>
<p>The biggest challenge with using SQLite for a queue is concurrent writes. If 40 workers poll the server in the same millisecond, exactly one of them must win each job; no two workers may claim the same one.</p>
<p>Intent Bus solves this by strictly enforcing SQLite's <code>WAL</code> (Write-Ahead Logging) mode and relying on the <code>UPDATE ... RETURNING</code> clause introduced in SQLite 3.35.0.</p>
<p>When a worker asks for a job, the server executes a single atomic statement:</p>
<pre><code class="language-sql">UPDATE intents
SET
    status = 'claimed',
    claimed_at = :now,
    claim_expires_at = :lease_exp,
    claim_token = :token,
    claim_attempts = claim_attempts + 1
WHERE id = (
    SELECT id FROM intents 
    WHERE status = 'open' AND run_at &lt;= :now
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
)
RETURNING id, payload, claim_token;
</code></pre>
<p>Because SQLite serializes writes, this guarantees that only one worker will ever receive the ephemeral <code>claim_token</code> required to fulfill or fail that specific job.</p>
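<p>Here is a rough, self-contained sketch of how such a claim could be driven from Python's built-in <code>sqlite3</code> module. The schema is a hypothetical, trimmed-down stand-in (no <code>run_at</code> scheduling or lease expiry), and the <code>claim</code> helper is illustrative rather than the server's actual code. It needs a Python build linked against SQLite 3.35 or newer.</p>
<pre><code class="language-python">import secrets
import sqlite3

# Hypothetical minimal schema; the real intents table has more columns.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE intents ("
    " id INTEGER PRIMARY KEY, payload TEXT, priority INTEGER,"
    " status TEXT DEFAULT 'open', claim_token TEXT)"
)
db.executemany(
    "INSERT INTO intents (payload, priority) VALUES (?, ?)",
    [("low", 0), ("urgent", 9)],
)
db.commit()

CLAIM_SQL = """
UPDATE intents
SET status = 'claimed', claim_token = :token
WHERE id = (
    SELECT id FROM intents
    WHERE status = 'open'
    ORDER BY priority DESC, id ASC
    LIMIT 1
)
RETURNING id, payload, claim_token
"""


def claim(conn):
    """Claim the best open job, or return None when nothing is open."""
    token = secrets.token_hex(16)
    rows = conn.execute(CLAIM_SQL, {"token": token}).fetchall()
    conn.commit()
    return rows[0] if rows else None


first = claim(db)
second = claim(db)
third = claim(db)
print(first[1], second[1], third)  # urgent low None
</code></pre>
<p>The higher-priority job is handed out first, each job is returned exactly once, and a third claim finds nothing left to take.</p>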
<h2>The Stress Test: A Tale of Two Deployments</h2>
<p>I wanted to see exactly where SQLite would choke under concurrent worker load, so I wrote an enterprise-style stress harness.</p>
<p><strong>Attempt 1: The PythonAnywhere Disaster</strong><br />
I originally deployed the architecture to PythonAnywhere's free tier. The results were catastrophic. </p>
<p>When I launched the high-intensity concurrent test, it didn't even manage to start. I stepped it down to the medium-intensity test (15 concurrent workers fighting for 500 jobs), and it threw <strong>100% network errors</strong>. </p>
<p>Why? Because PythonAnywhere's free tier serves the app from a single web worker, one request at a time. The concurrent polling requests bottlenecked at the WSGI layer, queued up, and timed out before SQLite even knew what hit it. It wasn't a database lock issue; it was a server concurrency issue.</p>
<p><strong>Attempt 2: Docker on Render</strong><br />
I moved the exact same SQLite file and Flask app to a basic Docker container on Render, configured with multiple application threads (<code>--threads 4</code>). </p>
<p>Render's free tier aggressively puts your container to sleep when idle, meaning the very first request takes about 30 seconds to wake the server up (the dreaded cold start). But once it's awake? It handles the load beautifully.</p>
<p>Here is the benchmark data from the Render Docker deployment:</p>
<table>
<thead>
<tr>
<th>Configuration</th>
<th>Workers</th>
<th>Jobs</th>
<th>Success</th>
<th>P99 Latency</th>
<th>Throughput</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Light</strong></td>
<td>5</td>
<td>50</td>
<td>100%</td>
<td>0.594s</td>
<td>3.72 j/s</td>
</tr>
<tr>
<td><strong>Medium</strong></td>
<td>15</td>
<td>500</td>
<td>98.75%</td>
<td>0.517s</td>
<td>13.27 j/s</td>
</tr>
<tr>
<td><strong>Heavy</strong></td>
<td>40</td>
<td>2000</td>
<td>99.01%</td>
<td>2.586s</td>
<td>13.62 j/s</td>
</tr>
</tbody></table>
<p>Even with 40 concurrent workers furiously hammering the single SQLite file for write locks, there were <strong>0 network errors, 0 lost leases, and 0 rate limit crashes</strong>. It comfortably sustained over 13 jobs per second. </p>
<h2>Limits &amp; The Theoretical Future</h2>
<p>Right now, the system is tuned for safety. Payloads are strictly limited to 8KB to prevent database bloat, and <code>busy_timeout</code> PRAGMAs are set to allow workers to wait gracefully for locks. </p>
<p>Could it go further? In theory, yes. If you moved off a free cloud container onto a dedicated VPS with NVMe storage, and tuned SQLite for pure speed (e.g., <code>PRAGMA synchronous=OFF</code>, which trades away durability if the machine crashes or loses power), you could probably push 50-100 jobs a second. Furthermore, the architecture is simple enough that you could easily swap the <code>get_db()</code> call to point to PostgreSQL if you needed horizontal scaling.</p>
<p>But honestly, if you hit that scale, you <em>should</em> just bite the bullet and deploy Redis. The entire philosophy of Intent Bus is avoiding that infrastructure jump until it is absolutely necessary. </p>
<p>The server runs on Docker, Raspberry Pi, or any free cloud tier. Workers run anywhere that speaks HTTP — including Termux.</p>
<ul>
<li><strong>Core Server:</strong> <a href="https://github.com/dsecurity49/Intent-Bus">GitHub - Intent Bus</a></li>
<li><strong>Python SDK:</strong> <a href="https://github.com/dsecurity49/Intent-Bus-sdk">GitHub - Intent Bus SDK</a></li>
</ul>
<p>I would love to hear your feedback. Has anyone else experimented with using SQLite for job queueing? What edge cases did you run into?</p>
]]></content:encoded></item><item><title><![CDATA[I ditched Redis and built a job queue on SQLite. Here's what actually broke.]]></title><description><![CDATA[I wanted to trigger a notification on my Android phone whenever a script finished running on a cloud server.
The obvious answer is Firebase, or Redis with a worker, or RabbitMQ. I looked at all of the]]></description><link>https://dsecurity.hashnode.dev/i-ditched-redis-and-built-a-job-queue-on-sqlite-here-s-what-actually-broke</link><guid isPermaLink="true">https://dsecurity.hashnode.dev/i-ditched-redis-and-built-a-job-queue-on-sqlite-here-s-what-actually-broke</guid><category><![CDATA[SQLite]]></category><category><![CDATA[backend]]></category><category><![CDATA[Python]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[DSECURITY]]></dc:creator><pubDate>Tue, 12 May 2026 03:22:35 GMT</pubDate><content:encoded><![CDATA[<p>I wanted to trigger a notification on my Android phone whenever a script finished running on a cloud server.</p>
<p>The obvious answer is Firebase, or Redis with a worker, or RabbitMQ. I looked at all of them. Every option meant spinning up something I would need to maintain forever just to move a handful of jobs between two devices.</p>
<p>So I built my own. A tiny HTTP job queue backed by Flask and SQLite. I called it Intent Bus.</p>
<p>This is not a story about why SQLite is underrated. You've read that article. This is a story about what happened when I actually tried to break it under concurrent load — and the bugs I found in my own test before I even got to the database.</p>
<hr />
<h2>The Core Problem: Atomic Claiming</h2>
<p>The hardest part of any job queue is making sure exactly one worker claims each job. With Redis you get <code>SETNX</code>. With RabbitMQ you get acknowledgements baked in. With SQLite you get... a file.</p>
<p>My first instinct was a <code>SELECT</code> followed by an <code>UPDATE</code>. That's wrong. Between those two statements, ten workers running concurrently can all select the same job and all think they claimed it.</p>
<p>The fix is a single <code>UPDATE</code> statement with SQLite's <code>RETURNING</code> clause, run inside a <code>BEGIN IMMEDIATE</code> transaction:</p>
<pre><code class="language-sql">BEGIN IMMEDIATE;

WITH candidate AS (
    SELECT id FROM intents
    WHERE status = 'open'
      AND run_at &lt;= :now
      AND expires_at &gt; :now
      AND claim_attempts &lt; max_attempts
      AND namespace = :namespace
      AND (publisher = :key OR visibility = 'public')
      AND (target_worker IS NULL OR target_worker = :worker_id)
    ORDER BY priority DESC, run_at ASC
    LIMIT 1
)
UPDATE intents
SET status = 'claimed',
    claimed_at = :now,
    claim_expires_at = :now + :timeout,
    claimed_by = :claimer,
    claim_attempts = claim_attempts + 1
WHERE id = (SELECT id FROM candidate)
  AND (status = 'open' OR (
    status = 'claimed' AND claim_expires_at &lt; :now
  ))
RETURNING id, goal, payload, claim_attempts;

COMMIT;
</code></pre>
<p><code>BEGIN IMMEDIATE</code> acquires a write lock before any reads happen. The CTE selects the candidate, the UPDATE applies the condition a second time as a guard, and <code>RETURNING</code> hands back the result in the same statement. Either you get a row back or you don't. No race condition possible.</p>
<p><code>RETURNING</code> requires SQLite 3.35+. The server refuses to start on older versions.</p>
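<p>The effect of <code>BEGIN IMMEDIATE</code> is easy to see with two connections to the same file. The sketch below is only an illustration, not server code: it sets <code>timeout=0</code> so the second writer fails right away instead of waiting, which makes the write lock visible. The file path and table are throwaway assumptions.</p>
<pre><code class="language-python">import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "queue.db")

# isolation_level=None: we issue BEGIN and COMMIT ourselves.
# timeout=0: fail immediately instead of waiting for the lock.
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("PRAGMA journal_mode=WAL")
a.execute("CREATE TABLE intents (id INTEGER PRIMARY KEY, status TEXT)")

a.execute("BEGIN IMMEDIATE")        # worker A takes the write lock
try:
    b.execute("BEGIN IMMEDIATE")    # worker B cannot take it too
    outcome = "both got the lock"
except sqlite3.OperationalError as exc:
    outcome = str(exc)
a.execute("COMMIT")                 # A releases the lock

b.execute("BEGIN IMMEDIATE")        # now B succeeds
b.execute("COMMIT")
print(outcome)
</code></pre>
<p>With a non-zero busy timeout, worker B would wait for A's commit instead of raising, which is the behaviour a queue wants.</p>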
<hr />
<h2>What the Queue Actually Does</h2>
<p>Before the stress test, a quick summary of what I built:</p>
<ul>
<li>Jobs have a <code>goal</code> (string), <code>payload</code> (JSON), <code>namespace</code>, and <code>visibility</code> (private or public)</li>
<li>Workers poll <code>/claim</code>, execute, and call <code>/fulfill</code> or <code>/fail</code></li>
<li>Failed jobs are requeued with exponential backoff: <code>now + (backoff_base × 2^attempts) + jitter</code></li>
<li>After <code>max_attempts</code> failures, jobs move to a dead-letter queue</li>
<li>Workers can advertise capabilities (<code>X-Worker-Capabilities</code>) and jobs can require them</li>
<li>Optional HMAC-SHA256 signing for replay protection</li>
</ul>
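<p>The backoff formula in that list can be sketched as follows. The function name and the defaults are assumptions: a <code>backoff_base</code> of 5 seconds and up to 2 seconds of jitter are chosen to line up with the roughly 10-second first retry and the <code>random.uniform(0, 2)</code> jitter mentioned later in this post.</p>
<pre><code class="language-python">import random
import time


def next_run_at(attempts, backoff_base=5.0, max_jitter=2.0, now=None):
    """Return now + (backoff_base * 2**attempts) + jitter."""
    if now is None:
        now = time.time()
    jitter = random.uniform(0, max_jitter)
    return now + backoff_base * (2 ** attempts) + jitter


# With now=0.0 the return value is simply the delay in seconds.
for attempts in (1, 2, 3):
    print(attempts, round(next_run_at(attempts, now=0.0), 1))
</code></pre>
<p>Each extra failed attempt doubles the wait, and the jitter keeps jobs that failed together from becoming claimable at the same instant.</p>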
<p>The whole server is a single <code>flask_app.py</code>. No external dependencies beyond Flask and Werkzeug.</p>
<hr />
<h2>The Stress Test</h2>
<p>I wanted to know how SQLite WAL mode handles concurrent writers in practice, not in theory.</p>
<p>I wrote a test that runs three phases.</p>
<h3>Phase 1A: Idempotency</h3>
<p>Send the same job twice with the same <code>Idempotency-Key</code> header. The server should create exactly one job.</p>
<p>The server computes a SHA-256 hash of the request body and stores it alongside the idempotency key. On the second request, the hashes match and the cached response is returned. The second write never touches the intents table.</p>
<p>Result: one job created. ✓</p>
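<p>A minimal in-memory sketch of that idempotency check might look like this. The dictionary and list stand in for the server's tables, the <code>publish</code> helper is hypothetical, and the response for a reused key with a different body is my own assumption; the post only describes the matching-hash case.</p>
<pre><code class="language-python">import hashlib

# Hypothetical in-memory stand-ins for the server's tables.
idempotency_cache = {}
intents = []


def publish(idempotency_key, body):
    """Create a job once per key; replay the cached response on repeats."""
    body_hash = hashlib.sha256(body).hexdigest()
    cached = idempotency_cache.get(idempotency_key)
    if cached is not None:
        stored_hash, response = cached
        if stored_hash == body_hash:
            return response  # same request again: no new write
        # Assumption: a reused key with a different body is rejected.
        return {"error": "key reused with a different body"}
    intents.append(body)
    response = {"id": len(intents)}
    idempotency_cache[idempotency_key] = (body_hash, response)
    return response


first = publish("key-1", b'{"goal": "notify"}')
second = publish("key-1", b'{"goal": "notify"}')
print(first == second, len(intents))  # True 1
</code></pre>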
<h3>Phase 1B: Max Attempts</h3>
<p>Create a job with <code>max_attempts: 2</code>. Claim it and fail it twice. The third claim attempt should return <code>204 No Content</code> because the job is now dead.</p>
<p>This one exposed a bug in my test, not the server.</p>
<p>After the first <code>/fail</code>, the server sets <code>run_at = now + (backoff_base × 2^1) + jitter = now + ~10 seconds</code>. The job is open but not yet claimable. My test immediately tried to claim again, got <code>204</code>, and reported ✓ PASS — but that was a false positive. The job wasn't dead. It was just in backoff.</p>
<p>I had to add a <code>time.sleep(15)</code> between attempts to let the backoff window clear. After that fix, the test genuinely passed.</p>
<p>This was useful to catch. In production, a worker that hammers <code>/claim</code> immediately after <code>/fail</code> will always get <code>204</code> during the backoff window — which is correct behaviour, but looks like an empty queue if you're not expecting it.</p>
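<p>On the worker side, one way to cope with this is to treat <code>204</code> as "nothing claimable yet" rather than "queue empty forever". The loop below is a sketch under assumptions: <code>claim</code> is any callable returning a <code>(status, job)</code> pair, a successful claim is assumed to return <code>200</code>, and the fake responses simulate a job that is in backoff for the first two polls.</p>
<pre><code class="language-python">import time


def poll_until_claimed(claim, max_polls=5, idle_sleep=0.01):
    """Poll claim() until it yields a job or max_polls is exhausted."""
    for _ in range(max_polls):
        status, job = claim()
        if status == 200:
            return job
        if status == 204:
            # Either the queue is empty or the job is still in backoff.
            time.sleep(idle_sleep)
            continue
        raise RuntimeError(f"unexpected status {status}")
    return None


# Fake server: the job only becomes claimable on the third poll.
responses = iter([(204, None), (204, None), (200, {"goal": "notify"})])
print(poll_until_claimed(lambda: next(responses)))
</code></pre>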
<h3>Phase 2: Concurrent Publishers and Consumers</h3>
<p>50 concurrent publishers. 50 concurrent consumers (plus 5 extra to ensure the queue drained completely).</p>
<p>Both pools used <code>ThreadPoolExecutor</code> with <code>max_workers=2</code> against the live PythonAnywhere server.</p>
<pre><code>[RESULTS] Time Elapsed: 82.48 seconds
[RESULTS] Published: 50/50
[RESULTS] Claimed &amp; Fulfilled: 50/50
[RESULTS] Errors/Collisions: 0
</code></pre>
<p>Zero dropped jobs. Zero duplicate claims.</p>
<p>The 82 seconds is almost entirely network latency: PythonAnywhere is in Chicago, I'm in Bangalore, and each signed HTTP request takes around 0.8 seconds round trip. At <code>max_workers=2</code>, requests mostly run sequentially. The elapsed time is not a concurrency bottleneck; it's roughly <code>100 requests × 0.8s</code>.</p>
<p>I ran the same test with <code>max_workers=20</code>. The time dropped to under 20 seconds. Errors stayed at zero. <code>BEGIN IMMEDIATE</code> serializes writes correctly — competing claims block and wait rather than corrupting each other.</p>
<hr />
<h2>What Actually Surprised Me</h2>
<p><strong>SQLite WAL mode handles read-write concurrency better than I expected.</strong></p>
<p>WAL (Write-Ahead Log) mode lets readers and writers operate concurrently without blocking each other. Readers never block writers. Writers only block other writers. For a job queue where claims are the critical write path, this is ideal.</p>
<p><strong>The <code>busy_timeout</code> pragma is important.</strong></p>
<p>Without it, SQLite throws <code>SQLITE_BUSY</code> immediately when a second writer tries to acquire a lock. I set <code>PRAGMA busy_timeout = 30000</code> — SQLite will retry for up to 30 seconds before giving up. Combined with WAL mode, this handles brief contention cleanly and degrades gracefully to a <code>503</code> on real overload.</p>
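<p>For reference, this is roughly what that configuration looks like with Python's built-in <code>sqlite3</code> module. The function is named <code>get_db</code> after the helper mentioned in this post, but its body and the file path are my assumptions, not the server's actual code.</p>
<pre><code class="language-python">import os
import sqlite3
import tempfile


def get_db(path):
    """Open a connection with WAL journaling and a 30-second busy timeout."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout = 30000")
    return conn


path = os.path.join(tempfile.mkdtemp(), "intent_bus.db")
conn = get_db(path)
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
busy_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(mode, busy_ms)  # wal 30000
</code></pre>
<p>WAL mode is stored in the database file and persists across connections, while <code>busy_timeout</code> is per connection and has to be set every time one is opened.</p>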
<p><strong>Jitter on top of exponential backoff is genuinely necessary.</strong></p>
<p>Without jitter, failed jobs from multiple workers all requeue at the same future timestamp. They all become claimable at the same moment, causing a thundering herd — every worker races to claim simultaneously. The <code>+ random.uniform(0, 2)</code> in the backoff formula spreads retries across a 2-second window and flattens the spike.</p>
<p><strong>I found the bug in my own test before I found one in the server.</strong></p>
<p>The backoff false-positive in Phase 1B was caught because I actually read the stress test output carefully. If I had just seen <code>✓ PASS</code> and moved on, I would have shipped a test that passed merely because the job was still in backoff, with no way to distinguish that from "job is dead."</p>
<p>Lesson: when writing a test for timing-dependent behaviour, always verify the reason for the result, not just the result.</p>
<hr />
<h2>What It's Not</h2>
<p>SQLite is single-writer. That's a constraint you accept when you choose it.</p>
<p>This is not a replacement for Kafka, RabbitMQ, or Celery at scale. For hundreds of jobs per day with dozens of workers, SQLite is genuinely good enough and operationally simpler than anything else. For thousands of concurrent writers or horizontal scaling across multiple nodes, you need PostgreSQL at minimum.</p>
<p>The upgrade path is contained: <code>get_db()</code> is the only function that talks to SQLite directly. Swapping the backend is one function.</p>
<hr />
<h2>Where It Is Now</h2>
<p>The server is live at <code>dsecurity.pythonanywhere.com</code>. There's a Python SDK on PyPI:</p>
<pre><code class="language-bash">pip install intent-bus
</code></pre>
<pre><code class="language-python">from intent_bus import IntentClient

client = IntentClient(api_key="your_key")

# Publish from a cloud server
client.publish("notify", {"message": "deploy finished"})

# Worker on your phone (Termux)
client.listen("notify", handler=lambda p: print(p["message"]))
</code></pre>
<p>GitHub: <a href="https://github.com/dsecurity49/Intent-Bus">github.com/dsecurity49/Intent-Bus</a></p>
<p>The next real test is getting external developers running workers in their own environments. If you're running small automations and want to try it, tester keys are available via GitHub Discussions.</p>
<p>The part I'm still most curious about: what breaks at <code>max_workers=50</code> on a free-tier PythonAnywhere instance. I haven't pushed it there yet.</p>
]]></content:encoded></item></channel></rss>