
It appears to be for the Myricom LANai processor, which is available to others. It's an older 32-bit RISC processor, used as the offload engine for their NICs.

Here's gcc for it: https://github.com/myri/lanai-gcc

Sounds like it's not particularly interesting unless you want to write your own network offload code.

Edit: It's probably just the offload processor they've chosen for their in-house routers/switches/servers. And they probably want their own firmware for security/NSA reasons, for performance, or both.



Didn't know what an offload engine was, so I looked up TCP offload engine (TOE) on Wikipedia: https://en.wikipedia.org/wiki/TCP_offload_engine

The page states: """ A generally accepted rule of thumb is that 1 hertz of CPU processing is required to send or receive 1 bit/s of TCP/IP.[3] For example, 5 Gbit/s (625 MB/s) of network traffic requires 5 GHz of CPU processing. """

Question for people in this line of work: is this accurate / reasonable?

Thanks


From the linked paper:

The generally accepted rule of thumb is that 1bps of network link requires 1Hz of CPU processing. Figures 11, 12 give a full story of this rule of thumb. (where Hz/bps ratio = %CPU utilization * processor speed / bandwidth). It had held up remarkably well over the years, albeit only for bulk data transfer at large sizes. For smaller transfers, we found the processing requirement to be 6-7 times as expected. Moreover, the figures show that network processing is not scaling with CPU speeds. The processing needs per byte increase when going from 800MHz to 2.4GHz. This happens because as CPU speed increases, the disparity between memory and I/O latencies versus CPU speeds intensifies.
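To make the quoted formula concrete, here is a minimal sketch of the Hz/bps ratio as the paper defines it. The function name and the sample measurement are made up for illustration and are not taken from the paper's figures.

```python
def hz_per_bps(cpu_utilization: float, cpu_hz: float, bandwidth_bps: float) -> float:
    """Hz/bps ratio = %CPU utilization * processor speed / bandwidth.

    cpu_utilization is a fraction (0.5 means 50%).
    """
    return cpu_utilization * cpu_hz / bandwidth_bps


# Hypothetical measurement: 50% of a 2.4 GHz core spent moving 1 Gbit/s.
ratio = hz_per_bps(0.50, 2.4e9, 1e9)
print(f"{ratio:.2f} Hz per bit/s")
```

A ratio near 1 means the rule of thumb holds for that workload; the paper's 6-7x figure for small transfers would show up here as a ratio of roughly 6 to 7.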


Does anyone know if memory latency is still causing problems in common implementations (like Linux and FreeBSD)? I would think that the parts that are the bottleneck could be rewritten with that in mind and gain quite a bit from it.


TSO really turbocharged bulk TCP transfers, so now 2 GHz can drive 10 Gbps as long as you're sending >=64KB chunks. This has made performance brittle, because 10 Gbps of small packets requires 10x-100x as much CPU as 10 Gbps of bulk traffic. Also, receiving requires more cycles than transmitting.
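Putting those numbers in the same Hz-per-bit/s terms as the rule of thumb (a rough sketch; it assumes the whole 2 GHz core is consumed, and the 10x and 100x multipliers are just the ends of the range quoted above):

```python
def hz_per_bps(cpu_hz: float, bandwidth_bps: float) -> float:
    """CPU cycles per second spent per bit per second of traffic."""
    return cpu_hz / bandwidth_bps


# Bulk transfer with TSO: a 2 GHz core driving 10 Gbit/s.
bulk = hz_per_bps(2e9, 10e9)

# Small packets need 10x-100x as much CPU for the same bit rate.
small_low, small_high = bulk * 10, bulk * 100

print(f"bulk with TSO: {bulk:.1f} Hz per bit/s")
print(f"small packets: {small_low:.0f}-{small_high:.0f} Hz per bit/s")
```

So bulk traffic comes in around 5x better than the old 1 Hz per bit/s rule, while small-packet traffic lands well above it.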


Indeed, it's common to see DDoS protection offerings defined both in throughput (e.g. 50 Gb/s) and packets per second (e.g. 30 Mpps), which results in a bottleneck in packet size (e.g. ~200 bytes) at high throughput.
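The ~200 byte figure falls out of dividing the two limits. A quick sketch, using the example numbers above (the function names are mine):

```python
def crossover_packet_bytes(throughput_bps: float, packets_per_second: float) -> float:
    """Packet size at which both limits are hit at the same time."""
    return throughput_bps / 8 / packets_per_second


def limiting_factor(packet_bytes: float, throughput_bps: float,
                    packets_per_second: float) -> str:
    """Which advertised limit is reached first for a given packet size."""
    if packet_bytes < crossover_packet_bytes(throughput_bps, packets_per_second):
        return "packets per second"
    return "throughput"


# Example offering from the comment: 50 Gb/s and 30 Mpps.
size = crossover_packet_bytes(50e9, 30e6)
print(f"crossover at about {size:.0f} bytes per packet")
print("64-byte packets are limited by:", limiting_factor(64, 50e9, 30e6))
print("1500-byte packets are limited by:", limiting_factor(1500, 50e9, 30e6))
```

Below the crossover size the pps limit is exhausted before the link is full, which is why floods of small packets are the hard case.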


I'm very biased against “approximately X per Y” statements in general, because just about any non-pathological curve has a linear tangent somewhere or other. If the intervals over which that proportional approximation holds are not stated, it's a pretty haphazard means of estimation.

Also, what kind of physical/logical process and attendant costs is it encoding? Does one per second of anything require one per second of something else? What if the processor were half- or double-the-bits?


Just now I was working on integrating and optimizing networking for some microcontroller-based software (using lwIP). In the end, I reached 7 MB/s send and 10 MB/s bulk receive speed.

I just did the calculation. It says for 10MB/s I'd need 80MHz. The chip runs at 84MHz. So, it's pretty close :)
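For anyone who wants to redo the arithmetic, a small sketch (it assumes 1 MB = 10^6 bytes, and the function name is just for illustration):

```python
def rule_of_thumb_hz(bytes_per_second: float) -> float:
    """CPU clock predicted by the 1 Hz per 1 bit/s rule of thumb."""
    return bytes_per_second * 8  # 8 bits per byte, 1 Hz per bit/s


CHIP_HZ = 84e6  # the 84 MHz microcontroller mentioned above

for label, rate in [("receive", 10e6), ("send", 7e6)]:
    needed = rule_of_thumb_hz(rate)
    print(f"{label}: {rate / 1e6:.0f} MB/s -> {needed / 1e6:.0f} MHz "
          f"({needed / CHIP_HZ:.0%} of the chip's clock)")
```

By the same rule, the 7 MB/s send rate works out to 56 MHz, so sending is the direction that falls short of the estimate on this chip.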


There are other things you can offload now as well. Search for info on DPDK, Intel QuickAssist, etc. These sorts of things will allow off-the-shelf hardware to displace expensive, proprietary ASIC-accelerated routers, firewalls, etc.


I thought this too, but someone in the email thread explicitly mentioned the Myricom, and the _response_ to that message was that it was purely internal hardware, not useful for others, so it may just be a coincidence.


But then they go on to say that their "previous binutils used" was https://github.com/myri/lanai-binutils.

It certainly seems believable that this is a third-party processor architecture, but Google has a contract with them to build specific models of that processor that meet their needs (possibly alongside some other proprietary hardware, like a high-speed NIC), and those models aren't sold to the general public. That's pretty common outside x86, right? For instance, is there a way for me to buy a BCM2835 other than by buying a Raspberry Pi?


If you search a little, you'll find that Google hired Myricom's CEO, Founder/CTO, and several engineers. I suppose it's possible they licensed the IP too.


Now that's meta. Using Google to uncover their own secrets / low-profile activities.


Heh. I suppose there's an app idea in there somewhere...click on a company and visualize notable inbound/outbound talent migrations.


Chips like the BCM2835 are available for purchase; you just need to have a very large purchase order.



