We write a *lot* of EventMachine code here; I have some questions. He writes: Ev...

bascule · on Oct 25, 2012

> I'm not sure I understand how EventMachine "guts the Ruby internals"

EventMachine does not use the Ruby IO primitives (e.g. TCPSocket, UDPSocket, etc) as the basis of its IO abstraction, and instead has reimplemented its own set of primitives for doing IO.

Because of this it can't take advantage of work being done in Ruby core to advance Ruby's socket layer. For this reason IPv6 support langered, among other problems.

This also severely complicates making multiple implementations of the EventMachine API, such as its JRuby backend (which maps onto Java NIO)

I think the real problem is EventMachine's original goal was to be a cross-language I/O backend similar to libevent or libev, but since it wasn't a particularly good one, the only language that wound up using it was Ruby. Compare to Twisted, which is built on libevent, or to Node, which is built on libev/libuv

gnufied · on Oct 25, 2012

Author of Packet (https://github.com/gnufied/packet) a Reactor library that did use Ruby IO primitives here. I wrote this before, eventmachine had pure Ruby reactor. Non blocking calls were introduced in Ruby 1.8.5 IIRC and my library made heavy use of them, and they were buggy. Some issues I found:

* select() call used to segfault. * the non blocking read/write behaved very differently under different OSes and I am not talking Windows. There were differences between Linux and OSX for example. * Again frequent crashes when reading/writing when nonblock flag is set.

I am sure, situation is lot better now but for building a reactor library, native Ruby IO primitives fell short. There is always lack of advanced selectors (Epoll/KQueue) as well.

Now as I replied to thomas(?) below and since you yourself have wrapped libevent for Ruby 1.9, there were severe limitation in interpreter back then for even libevent wrapper to work. So I guess, given historical reasons, it made sense Eventmachine did not use native Ruby IO primitives or libevent.

tptacek · on Oct 25, 2012

I like libevent and all, but I don't think Twisted's use of libevent is a particularly big win for Twisted.

Again, this is apocryphal, but I remember someone trying to wrap Ruby around libevent and failing; I remember there being a reason this had to be done bespoke. And having fallen into this particular NIH trap many times before: there's just not a whole lot to a simple socket I/O loop.

What's been your experience with JRuby/EventMachine?

bascule · on Oct 25, 2012

For what it's worth, I've done two libev bindings, rev/Cool.io (originally in 2008) and nio4r in 2011

gnufied · on Oct 25, 2012

Zed shaw tried to make libevent work with Ruby, but this was when we had Ruby 1.8.5 and there were problems with Ruby interpreter that prevented this (around threading IIRC).

cdavid · on Oct 25, 2012

At least 2 years ago, twisted did not use libevent, it was mostly pure python except for a very tiny piece of code (and optional stuff of course).

I am sure you could write a libevent-based reactor, but twisted was started a long time ago (10 years ? The twisted book in o'Reilly was published in 2005), before libevent existed I think.

tptacek · on Oct 25, 2012

No, libevent existed before Twisted; its first release was in 2000, and we used it extensively at Arbor Networks just a couple years later.

cdavid · on Oct 31, 2012

Thanks for the correction. Seems that I confused the dates with libev that came after.

draegtun · on Oct 25, 2012

> ... Compare to Twisted, which is built on libevent, or to Node, which is built on libev/libuv

Or compared to AnyEvent which works with [m]any event loop - https://metacpan.org/module/AnyEvent

smegel · on Oct 25, 2012

> Golang is militantly anti-event; it doesn't even offer a select primitive

s/anti-event/anti-callback/. They are not the same thing. Under the hood, Golang is doing the eventing and select() for you, while letting you write simple procedural code.

> I try to force myself to write socket code like I did when I was 13, reading a line, parsing it, and writing its response, but that code is brittle and harder to follow than a sane set of handler functions.

How is simple, procedural code hard to follow? Even the way you write it sounds simple "read, process, write". That is really more hard to follow than a series of onRead, onWrite callbacks?

> I'm not arguing that evented code is the best answer to every problem

No, but like many others you seem to be confusing and conflating call-back driven code with event-loop based code. Everyone agrees on the benefits of event loops over kernel threads. Not everyone agrees that call-backs are the best interface to event loops - more and more people are switching on to green threads (which look just like normal threaded code) as the best and simplest interface to the event-loop - as seen by Golang, gevent, Eventmachine, Coro (Perl) etc etc.

tptacek · on Oct 25, 2012

Real evented code isn't a series of onRead/onWrite callbacks, because real evented code buffers strategically, and abstracts itself into state machines with simple callback interfaces, so that the rest of the program can be written in terms of higher level events.

I don't want to turn this into "events work for every program", because like I said above: I don't think event interfaces work for all kinds of programs. I've found them poorly suited to full-featured web frameworks, for instance.

I did not predict this incredibly boring "evented runtime vs. evented API" controversy, but will dispense with it by saying that it is incredibly boring, so you win it in advance. :|

cmccabe · on Oct 25, 2012

I don't see the "evented runtime vs. evented API" issue as boring. This kind of thing is the core of any application you are going to build; it's hardly a minor or nitpicky issue.

Using non-blocking APIs requires you to turn your application "inside out", putting state that would normally have been on the stack into a state machine. You wind up in a maze of callbacks and low-level details.

This kind of thing might be appropriate for writing high performance code in C, but it's a mystery to me why anyone would want to do it in a higher-level language.

lmm · on Oct 25, 2012

It forces you to keep your methods short with reasonably appropriate boundaries between them. Good programmers should be doing this anyway, but an event/callback API at least prevents you from writing a 300-line "do everything" method.

davidw · on Oct 25, 2012

> green threads (which look just like normal threaded code) as the best and simplest interface to the event-loop - as seen by Golang, gevent, Eventmachine, Coro (Perl) etc etc.

That list is just screaming for someone to write a comment mentioning that you forgot Erlang.

ismarc · on Oct 25, 2012

When I saw the title, I thought it was going to be coverage of how under-the-hood EventMachine's model only allows it to scale up to a certain throughput per-process (which is reasonably high for most uses of it) or how there is some fundamental complexity in how it's written that pretty much guarantees there will be livelock conditions (and deadlock conditions). So I share your bewilderment about why they chose the things to highlight that they did.

The lock conditions it has are for particular workloads at particular throughput/utilization, so 99% of the people using it won't hit those conditions initially (and a lot of people aren't writing systems that will ever reach those limits). However, we have a particular service that acts as a TCP multiplexer/router/load balancer to provide high availability for some of our mission critical applications. A little over a year ago, I initially wrote it using EventMachine in a couple of weeks, then spent almost a month trying to find what I thought was a bug in my code where a full deadlock would occur if 3 or more connections were established in a small enough time frame. Turned out to be a bug in EventMachine. After fixing that one, I found another where if 5 or more open connections fired the same event within a small enough time frame, all 5 would hit livelock and given enough connections hitting the condition, the whole process would deadlock. Once I hit the second bug in EventMachine itself directly related to concurrency handling, I switched to Netty and rewrote the whole thing in Scala in about a week and it's been rock solid since.

I've been doing some development on the side in Go because its particular flavor of types and concurrency model is fascinating. It's not that Go is anti-event, it's that it's overwhelmingly stream oriented (not low-level streams, but data streams). Once I finally hit that moment of clarity that goroutines/channels was all about connecting streams of data and not connecting raw streams, they became a much more natural solution. However, don't get me started on the difference between a non-blocking read on a channel and a blocking read on a channel.

tptacek · on Oct 25, 2012

You're threading, I presume? We do high speed high connection rate work in EventMachine (for instance, to work with order entry for trading exchanges) and have never seen anything like this --- but we religiously avoid Ruby threads.

ismarc · on Oct 25, 2012

Yeah we are, as it was the least-convoluted method for handling the multiplexing (but wanting to maintain the same handler for all of them). Burned entirely too much time assuming that documented ways to handle things were actually rock solid, not rarely used. The first bug where 3 connections could deadlock everything was unrelated to threading. It was related to how it created identifiers for each connection and would create identical identifiers for two different connections so it would think there was data pending for a socket, but the data had already been read, so it tried to do a blocking read when there wasn't data to read. I never fully tracked down the second bug, but it was more likely related to using threading.

One thing we did have was a large majority of the connections were coming from the same IP (but different source port) and a peak connection rate of ~200 connections per second per server and would last a few seconds (so at any given moment, roughly 1000 connections being established or with data in flight). I'd be really interested in hearing what your traffic patterns are roughly like (and hand-wavy what you have event machine doing, like proxying requests, building/returning its own responses, etc.) if you're willing to share at all.

stock_toaster · on Oct 25, 2012

  > Golang is militantly anti-event; it doesn't even offer a select primitive! Just alternating read/write on two different sockets seems to demand threads!

Go only blocks the goroutine. It uses native event primitives (epoll/kqueue/etc) under the hood so that IO is non-blocking to the rest of the process (e.g. other goroutines). You said you have used Go for 2 months, so I assume you are aware of that.

So..can you be more specific about what you were talking about in that statement?

tptacek · on Oct 25, 2012

There are times when you might reach for select() not to scale the whole program, but because for instance you're trying to hot potato data from a Reader to a Writer.

Obviously, the whole of Golang's concurrency model is lightweight threads scheduled on I/O events. But that's not exposed to the programmer; in fact, it's hermetically sealed away from the programmer from what I can tell.

So now you know what I meant by that statement.

zemo · on Oct 25, 2012

every time you perform i/o, the current goroutine yields to the scheduler.

If you want to manually yield to the scheduler, you call runtime.Gosched() and the calling goroutine yields. If you want to lock a goroutine into a specific thread, you call runtime.LockOSThread(). Everything is single-threaded to start until you call runtime.GOMAXPROCS(), but that's intended to go away in the very near future. So... I'm not sure what you mean by sealed away; the Go runtime does the sensible things that it can do automatically, but you still have the ability to change the scheduling behavior if you really want to.

what do you mean by "doesn't offer a select primitive"? I'm not familiar with your definition of select, because Go has a select keyword for concurrency control, and I'm under the impression you're talking about select as its defined in EventMachine, could you elaborate?

tptacek · on Oct 25, 2012

I'm referring to the system call. Sorry, I can see how that would cause lots of confusion.

stock_toaster · on Oct 25, 2012

Thanks. I think I understand a bit better now.

Go provides concurrency primitives (scheduler yielding i/o, channels, goroutines, etc) upon which you can build your own "events". I agree (and this seems to be your point) that Go doesn't provide a canned event mechanism that you can just hook into for callbacks on IO.

For example in the Go http server[1] a goroutine accepts in a loop and kicks off goroutines to process and handle new requests.

[1]: http://golang.org/src/pkg/net/http/server.go?s=30200:30241#L...

tptacek · on Oct 25, 2012

Sure, I'm using the Go http server.

Take the example of a simple proxy to see where I'm coming from. Sure, I'm happy to have Go manage all the connections and sockets and I'm happy to spawn new goroutines for each connection and all that. But a proxy accepts an inbound connection, makes an outbound connection, and then monitors the outbound and inbound sides for data. The loop to do this with sockets could be a simple two-descriptor read select(2) call. But instead, I have to spawn two more goroutines, one to "monitor" (really, read) from inbound, and one for outbound.

smegel · on Oct 25, 2012

> But instead, I have to spawn two more goroutines, one to "monitor" (really, read) from inbound, and one for outbound.

What does Golang's select/case not do that you want?

tptacek · on Oct 25, 2012

Is there some feature of the language or the libraries that I am missing that turns a socket (err, a TCPConn) into a channel that I can read with Golang's select construction? Seriously asking. That would be awesome.

smegel · on Oct 25, 2012

No, I think you would need to wrap the TCPConn in a goroutine that does the Read, handles any errors, then passes the bytes back to the consumer via a channel. At least it would be a very generic and small goroutine to do this.

I agree that would be a nice feature though.

tptacek · on Oct 26, 2012

You know what's happening in this thread? I made it sound like I was criticizing Go for needing to spawn goroutines to do this. I'm not! It makes sense, in the context of Go, to do it this way, even though as a C programmer by training that's not my first thought on how to do it.

I'm not criticizing Go for being anti-event; I'm just observing that it is. Idiomatic Go --- like, the code in the standard library --- has a strong bias towards straight-line code.

smegel · on Oct 26, 2012

> I'm not criticizing Go for being anti-event; I'm just observing that it is. Idiomatic Go --- like, the code in the standard library --- has a strong bias towards straight-line code.

I think I understand where you are coming from now - I think you are saying Go is "anti-event-based-callback-driven" rather than "anti-event-loop-implementation" - which is absolutely true. Go's concurrency model is build on CSP (Hoare's Communicating Sequential Processes)[1] which seems to advocate procedural threads rather than callbacks.

[1] http://golang.org/doc/go_faq.html#goroutines

drivebyacct2 · on Oct 25, 2012

Is there a large overhead for having to use a goroutine rather than the lower level? Do you think it's appropriate for Go to expose that, or do you just miss it?

cmccabe · on Oct 25, 2012

> Is there a large overhead for having to use a > goroutine rather than the lower level?

No.

aaronblohowiak · on Oct 25, 2012

Would you just select { case <- reader... case writer <- data..} ?

I know this isn't the same as posix select, but it does let you have one goroutine coordinate the hot potato..

tptacek · on Oct 25, 2012

That would require a goroutine for each socket to hot potato from the socket to a channel, right?

aaronblohowiak · on Oct 26, 2012

Yes. This allows you to have the "when any of these socket's state changes" semantics in your controller/dispatcher function. I would like it if they had such an interface in stdlib...

If you really do just need to take data from one Reader and send it to a Writer, you can make a new Pipe.

agentS · on Oct 25, 2012

> I try to force myself to write socket code like I did when I was 13, reading a line, parsing it, and writing its response, but that code is brittle and harder to follow than a sane set of handler functions.

Do like the http library does and separate out socket handling to be in terms of net.Listeners/net.Conns, and create a handler abstraction for yourself. Your socket code will be testable, because its trivial to write fake net.Conns that are backed with byte buffers. Your handler code will be testable because they are in terms of parsed objects.

I tend to write socket code once per server project, and then leave it alone for months, so this isn't a big deal for me. What makes Go nice for me is that I can block in client code, and retain the efficiency of using a select loop.

I don't have access to the source for the proxy I wrote for work, but here's one I whipped up quickly: http://play.golang.org/p/Fz19qSehCg

Go doesn't expose select, and has the runtime do that for you; but this allows them to make all Go libraries share a select loop, which has nice performance characteristics. Although, now that I think about it, it is probably possible to have a userspace implementation of select that works atop the runtime's shared select loop. Hmmm...

tptacek · on Oct 26, 2012

I ended up writing my proxy this way. See the back-to-back "go copy" lines? That's what blew my mind. I get why Go works this way, but wow would I ever not write code that way in a threaded C program.

gchpaco · on Oct 25, 2012

From hard learned professional experience: EventMachine isn't written very well, when I say that I mean specific things like "whoever wrote the sub process handling in EventMachine wrote code that is uniquely wrong on every platform I've ever heard of." If you are not familiar with this (and are curious, in some sort of macabre way), I urge you do grep for SIGCHLD or wait in the source and see what you find.

What EventMachine does (or at least what it did a year or so ago when I was debugging this) is this: sub processes are equated to popen. When the input side of that pipe closes (that is, when the sub process closes STDOUT) the process will be finally be waited upon—and if it doesn't terminate in a hard coded timeout which by the way blocks the rest of your program, then it will be forcibly gunned down, with SIGKILL if necessary.

Among the problems that arise here, note that unless your daemon script is written to unconditionally drop STDOUT upon forking (uncommon) and you attempt to launch a daemon from within a sub process you are managing using EventMachine, the subprocess itself will terminate quickly, the daemon will go on its merry way, and your driver program will never, ever tell you it has finished running until that daemon is dead and anything it has spawned that might possibly use STDOUT is also dead. And god forbid it close that stream and then dare to continue running, for EventMachine will shoot it dead within IIRC 20 seconds, and lock up your driver program for the duration to boot.

Programmers who understand how the Unix process model works will write a very small signal handler for SIGCHLD that writes a byte on a pipe or some similar method of notifying the main event loop and call wait on the child immediately and then close its end of those pipes. I am reliably informed by those who understand the Windows process model that what EventMachine does is even more wrong there. This is a subsystem that was not written by anyone who knew what popen does, could not be bothered (or was perhaps incompetent to read) what any of a dozen standard implementations of it do, and appears to have debugged the code into some form of submission and then released it upon an unsuspecting public.

This is the only colossally wrong decision they made that I can list off the top of my head, but that's because it was so stupid I stopped looking for trouble after that. EventMachine does not handle anything but a very straightforward select loop very well, and I am sufficiently terrified of what lives under the covers in that system that I would rather write the select by hand (massive pain though it may be) than let this system anywhere near it. The thing that really alarms me is that people build walls of cardboard like NeverBlock (which reaches deeply into the guts of the Ruby software I/O and replaces it with EventMachine driven coroutines) atop this foundation of sand and then wonder when it falls over sideways in an impenetrable and impossible to debug fashion.

Coroutine programming (for that is, essentially, what we are talking about) can be a very elegant way to solve certain problems, but it works best when it is simple, or it least localized (e.g. samefringe). In an event driven server, every little piece must be audited carefully to ensure that it does not block. You get all the same problems any preemptive concurrency model does, with some added nastiness; in exchange you get some slightly better scalability numbers. It is at its best in a fairly simple program such as, say, nginx in its proxy configuration, where it speaks streams and SSL and talks to some application server on the other end of a different stream for anything sophisticated.

tptacek · on Oct 25, 2012

Nobody uses EventMachine to manage daemon processes. Subprocesses in EventMachine aren't "equated to popen"; they exist for the sole purpose of doing evented I/O popen-style. I'd be careful about calling a developer "incompetent" because they write something that doesn't admit to arbitrary use cases.

gchpaco · on Oct 25, 2012

I'm being elliptic about why here because I don't think I can talk about the internal architecture of that system in public, but warning people off one particularly stupid third party bug that we fixed in our internal fork is not, I believe, a problem. Anyway, we certainly did use it to manage daemon processes, although not deliberately; we had a daemon that communicated with external software about system events, and running shell scripts was part of that. We didn't necessarily anticipate folks running 'service httpd start' in those shell scripts, but it was not an inherently unreasonable thing to do.

And this isn't "arbitrary use cases"; this is an explicitly supported function that is completely contrary to good practice and sane behavior and, to boot, has the ability to arbitrarily kill programs for impenetrable reasons and block for significant periods of time (the central sin of event driver programming). You can't tell me that if you saw something like this in a random crypto library you wouldn't immediately tell everybody to stop using it; why should EM's developers get a pass for their, yes, incompetently written popen? I would actually be considerably happier if it wasn't in the library at all; at least then it wouldn't be wrong.

tptacek · on Oct 25, 2012

Do you have other examples of how badly constructed EventMachine is, or is it just that you can't use their process I/O stuff as a daemon manager?

I was using Adam Langley's net/ssl code in Golang to build an HTTPS proxy, and only after several hours of hair-pulling did I discover that Langley hadn't implemented the compat SSL2 handshake that Firefox uses with proxies. net/ssl in Go was, for no good reason other than an omission, unsuited for use as an HTTPS proxy. Should I say net/ssl was incompetently written? That seems like a bad idea to me.

adgar2 · on Oct 25, 2012

> But there are problems --- like, backend processing, or proxies, or routers and transformers, or feed processors --- where event loops are the most natural way to express a performant solution.

Your backend processing apparently fits in one machine, since you "hook Mysql up through Redis." I'm personally astonished you get so much done without a distributed environment tolerant of the relevant failures I see when I read that sentence.