Goal: make durability demos less “trust me”. Click “Run double proof” → it produces N, docker kill -s KILL, restarts, consumes/commits, then proves “no loss + no dupes” while exposing the entire trail (proof JSON + docker inspect/events/log tail).
To clarify: I’m looking for 2–3 teams who already run Kafka/Redpanda/NATS
and are open to testing an alternative focused on simpler ops and lower tail latency.
Not selling anything, just early feedback and real workloads.
Thanks, this is genuinely helpful framing.
We made the mistake of thinking in “roles” (GTM cofounder) instead of pulling people into the problem and watching for ownership.
We’re now doing short problem interviews with early users / people who engaged deeply with our Show HN, and tracking who (1) reframes the problem, (2) proposes concrete next steps, and (3) follows up unprompted. Those are strong signals.
Also +1 on the “technical storyteller” point our ideal partner might be technical but customer-obsessed rather than a traditional sales profile.
One question: when you’ve seen this work best, what’s a good lightweight way to test alignment/commitment before talking equity (e.g., a 2-4 week project sprint, shared doc, pre-defined milestones)?
Classic HTTP Range is byte-oriented, but custom range units (e.g. `Range: offsets=…`) or using `Link` headers for pagination both fit log semantics well.
I kept the initial API explicit (`offset` / `limit`) to stay obvious for curl users, but offset-range via headers is something I want to experiment with, especially if it helps generic tooling.
I’d love a world where “consume an event log” is a standard protocol and client-side tooling doesn’t care which broker is behind it.
Feed API is very close to the mental model I’d want: stable offsets, paging, resumability, and explicit semantics over HTTP. Ayder’s current wedge is keeping the surface area minimal and obvious (curl-first), but long-term I’d much rather converge toward a shared model than invent yet another bespoke API.
If you’re open to it, I’d be very curious what parts of Feed API were hardest to standardize in practice and where you felt the tradeoffs landed in real systems.
I don't have that much to offer... we just implemented it for a few different backends sitting on top of SQL. The concept works (obviously as there is not much there). The main challenge was getting safe export mechanisms from SQL, i.e. a column in tables you can safely use as cursor. The complexity in achieving that was our only problem really.
But because there wasn't any official spec it was a topic of bikeshedding organizationally. That would have been avoided by having more mature client libs and spec provided externally..
This spec is I a bit complex but it is complexity that is needed to support a wide range of backend/database technologies.. Simpler specs are possible by making more assumptions/hardcoding of how backend/DB works.
It has been a few years since I worked with this, but reading it again now I still like it in this version. (This spec was the 2nd iteration.)
The partition splitting etc was a nice idea that wasn't actually implemented/needed in the end. I just felt it was important that it was in the protocol at the time.
That makes a lot of sense the hard part isn’t “HTTP paging”, it’s defining a safe cursor (in SQL that becomes “which column is actually stable/monotonic”), and without an external spec/libs it turns into bikeshedding. In Ayder the cursor is an explicit per-partition log offset, so resumability/paging is inherent, which is why Feed API’s mental model resonates a lot. I’d love to see a minimal “event log profile” of that spec someday.
Totally fair, if this were “single-node HTTP handler on localhost”, then yeah, you can hit big numbers quickly in many stacks.
The point of these numbers is the envelope: 3-node consensus (Raft), real network (not loopback), and sync-majority writes (ACK after 2/3 replicas) plus the crash/recovery semantics (SIGKILL → restart → offsets/data still there).
If you have a quick Python setup that does majority-acked replication + fast crash recovery with similar measurements, I’d honestly love to compare apples-to-apples happy to share exact scripts/config and run the same test conditions.
Good NICs get data out in a microsecond or two. That's still off by quite the order of magnitude, but that could be up to the network topology in question.
Durable consensus means this is waiting for confirmed write to disk on a majority of nodes, it will always be much slower than the time it takes a NIC to put bits on the wire. That's the price of durability until someone figures out a more efficient way.
I'm not sure if you're going out of your way to be a dick or just obtuse but 1) that's not true on most SSDs, 2) there's overhead with all the indirection on a Digital Ocean droplet, and 3) this is obviously a straight forward user space implementation that's going to have all kinds of scheduler overhead. I'm not sure who it's for but it seems to make some reasonable trades for simplicity.
LLM tics like the bits I quoted feel more like marketingspeak by committee than an actual readme written by a human. I don't have any particular suggestions of what to write, but you just don't need to be this punchy in a readme. LLMs love this style though, for some reason.
When I read this type of prose it makes me feel like the author is more worried about trying to sell me something than just describing the project.
For instance, you don't need to tell me the numbers are "real". You just have to show me they're covering real-world use-cases, etc. LLMs love this sort of "telling not showing" where it's constantly saying "this is what I'm going to tell you, this is what I'm telling you, this is what I told you" structure. They do it within sections and then again at higher levels. They have, I think, been overindexed on "five-paragraph essays". They do it way more than most human writers do.
That’s a great comparison nsq is a project I have a lot of respect for.
I think there’s a similar philosophy around simplicity and operator experience.
Where Ayder diverges is in durability and recovery semantics nsq intentionally
trades some of that off to stay lightweight.
The goal here is to keep the “easy to run” feeling, but with stronger guarantees
around crash recovery and replication.