WebAssembly is eating the database? (dylibso.com)
101 points by nilslice on May 23, 2023 | 91 comments


Again, nothing new on the horizon: DB2, Oracle and MS SQL Server have had the JVM and CLR available for UDFs for the last 20 years.

Naturally, one needs to sell WebAssembly as the first of its kind.


The very first sentence from the blog post:

"User-defined functions (UDFs) have been a fixture in database systems for a considerable period of time, allowing users to extend the database’s built-in functionality to complement good ol’ SQL."

Granted, I'm biased, but I think we did a decent job elaborating from there on the novelties WebAssembly brings to this age-old field.


The very next sentence:

> developers are in most cases forced to use unfamiliar programming languages, typically unique to the database itself.

What parent correctly points out is that this simply isn’t true. Your examples of novelty are all things that were possible decades ago in most mainstream DBMSes. The SQLite one in particular is bizarre: it’s inherently an embedded db with bindings in every language.
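To make the SQLite point concrete: through its bindings, a UDF can be written in the host language and called from SQL. A minimal sketch using Python's standard `sqlite3` module; the `slugify` function is a hypothetical example, not something from the article.

```python
import sqlite3


def slugify(title: str) -> str:
    """Hypothetical UDF: lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a one-argument function named slugify.
conn.create_function("slugify", 1, slugify)

conn.execute("CREATE TABLE posts (title TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?)",
    [("WebAssembly is eating the database",), ("Hello World",)],
)
rows = conn.execute("SELECT slugify(title) FROM posts ORDER BY rowid").fetchall()
print(rows)
```

The function runs in-process, in the same language as the application, with no database-specific procedural language involved.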

Supporting user defined functions as compiled objects loaded into the database is as old as the hills. The sandboxing part is silly since no one in their right mind is going to expose their database to untrusted users in the first place.


> no one in their right mind is going to expose their database to untrusted users in the first place

I've seen projects do exactly this, with PostgreSQL row-level security.

https://www.graphile.org/postgraphile/

EDIT: Though to be clear, it does seem like a terrible idea to me too... even when you prevent unauthorized access, there is always the issue of resource usage.


Apologies, I should have said something more like “exposing the database to untrusted users is a specific and narrow use case that needs way more db-specific planning”. RLS is really cool tech but you need to design your schema and application around it for it to be useful. Slapping some WASM on any existing databases (especially the vast majority which have nothing like the unique capabilities provided by RLS) isn’t going to buy anything that doesn’t exist already.


Also Hasura and PostgREST (and thus, by extension, Supabase). I would also add that one of the entry barriers for such projects is the need to write UDFs. Most developers are not used to writing UDFs, and the languages themselves (PL/SQL and the like) lack the features and ease of modern languages and vary among database vendors.

As mentioned by other people, there have been attempts to allow writing UDFs using the JVM, JavaScript, Python and so on, but they never caught on (and some of them are difficult to install). They are also not very portable across database vendors, since they require installing third-party extensions and so on.

I for one see WASM to be a good candidate for a really portable solution.


I think the fact that the most viable way to run Rust on the JVM right now is by compiling to WASM really says it all. JVM is going to die, WASM is the future.


Sure, as if Rust is taking Web development by storm.

Leave it for kernels and drivers.

Plus GraalVM runs LLVM bitcode if you have such a desire.


>The sandboxing part is silly since no one in their right mind is going to expose their database to untrusted users in the first place.

That's circular logic. You would not do it when it's unsafe. But when it is, it could open up entirely new use cases. Most obviously, infrastructure-as-a-service offerings.


It’s only circular when your conception of “safety” has a single dimension. The sort of execution sandboxing provided by WASM runtimes isn’t even close to the primary challenge with allowing untrusted users to run code inside a database.

In the general case using WASM as an object format does nothing to actually enable this use case beyond what exists.

It doesn’t really matter since this “untrusted user functions in the db” scenario is incredibly niche. The vast majority of database usage involves data access patterns entirely controlled by the developers of the system. Stored procedures in a variety of languages have long been available to such developers _and are generally considered an anti-pattern_.


What WebAssembly brings to the table is the ability for unprivileged users to upload their own arbitrary UDF to the DB for execution.


Database privileges have always been completely uncoupled from each other. If you want some user to just run his code, you can already say so, and that has been true for decades.


Spectre and other hardware bugs are not going to be fixed for code running in the same address space. The moment a DB allows untrusted WebAssembly to run, that code can read anything the DB server keeps in memory.


I don't see how Spectre or hardware address spaces are relevant here. WASM code cannot access memory that is not explicitly provided to the sandbox, by design. Each load/store opcode is bounds-checked by the WASM engine to ensure that. Wouldn't be much of a sandbox, otherwise.
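What "bounds-checked" means can be sketched with a toy model in which linear memory is a single byte array and every load/store checks its effective address first. This is an assumption-laden simplification (real engines often get the same effect with guard pages instead of an explicit comparison), the class names are hypothetical, and it models only the architectural check; it says nothing about speculative execution.

```python
class Trap(Exception):
    """Stands in for a WASM trap: execution of the guest is aborted."""


class LinearMemory:
    PAGE_SIZE = 65536  # WASM memory is sized in 64 KiB pages

    def __init__(self, pages: int) -> None:
        self.data = bytearray(pages * self.PAGE_SIZE)

    def _effective_address(self, addr: int, offset: int, width: int) -> int:
        # addr comes from the guest at run time; offset is a constant
        # encoded in the load/store instruction itself.
        ea = addr + offset
        if addr < 0 or offset < 0 or ea + width > len(self.data):
            raise Trap(f"out-of-bounds access: {width} bytes at {ea}")
        return ea

    def load_i32(self, addr: int, offset: int = 0) -> int:
        ea = self._effective_address(addr, offset, 4)
        return int.from_bytes(self.data[ea:ea + 4], "little", signed=True)

    def store_i32(self, addr: int, value: int, offset: int = 0) -> None:
        ea = self._effective_address(addr, offset, 4)
        self.data[ea:ea + 4] = value.to_bytes(4, "little", signed=True)


mem = LinearMemory(pages=1)
mem.store_i32(16, -7)
print(mem.load_i32(12, offset=4))  # same effective address as 16
try:
    mem.load_i32(65533)  # the last 4-byte read would run past the end
except Trap as e:
    print("trapped:", e)
```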


You can still have Spectre attacks; check out this blog post:

https://blog.cloudflare.com/mitigating-spectre-and-other-sec...


They can't be completely arbitrary without user trust. My guess is security directives like invoker vs creator will need to be specified and checked appropriately whenever the assemblies are installed.
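PostgreSQL, for one, spells this distinction SECURITY INVOKER (the default) versus SECURITY DEFINER. A toy Python model of how the effective role might be chosen when a function runs; the `Function` type and the grant table are hypothetical and far simpler than a real catalog.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Function:
    name: str
    owner: str
    security_definer: bool = False  # False: run with the caller's privileges


def effective_role(fn: Function, caller: str) -> str:
    # Definer-rights functions run as their owner; invoker-rights as the caller.
    return fn.owner if fn.security_definer else caller


def can_read(grants: Dict[str, Set[str]], fn: Function, caller: str,
             table: str) -> bool:
    return table in grants.get(effective_role(fn, caller), set())


grants = {"admin": {"salaries"}, "alice": set()}
report_definer = Function("salary_report", owner="admin", security_definer=True)
report_invoker = Function("salary_report", owner="admin")
print(can_read(grants, report_definer, "alice", "salaries"))  # True
print(can_read(grants, report_invoker, "alice", "salaries"))  # False
```

This is why installing a definer-rights function is itself a privileged act: it lends the owner's rights to every caller.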


The whole point of having a sandboxed runtime like WebAssembly is that the code can be executed without user trust, as the user cannot pierce the isolation boundary or exceed the resource limits provided by the sandbox.
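The resource-limit half of that claim can be illustrated with a toy interpreter that charges one unit of "fuel" per executed instruction and aborts when the budget runs out. The instruction set is hypothetical; some real engines offer a comparable mechanism (Wasmtime calls it fuel), but this is not their implementation.

```python
from typing import List, Optional, Tuple

Instr = Tuple  # ("push", n) | ("add",) | ("jmp", target)


class OutOfFuel(Exception):
    """Raised when guest code exhausts its instruction budget."""


def run(program: List[Instr], fuel: int) -> Optional[int]:
    """Interpret a tiny hypothetical instruction set, charging one unit of
    fuel per executed instruction. Returns the top of the stack, if any."""
    stack: List[int] = []
    pc = 0
    while pc < len(program):
        if fuel <= 0:
            raise OutOfFuel(f"budget exhausted at pc={pc}")
        fuel -= 1
        op, *args = program[pc]
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "jmp":
            pc = args[0]  # unconditional jump: enough to write an infinite loop
            continue
        else:
            raise ValueError(f"unknown op {op!r}")
        pc += 1
    return stack[-1] if stack else None


print(run([("push", 2), ("push", 3), ("add",)], fuel=3))
try:
    run([("jmp", 0)], fuel=1_000)  # a guest that never terminates
except OutOfFuel as e:
    print("trapped:", e)
```

The host, not the guest, decides the budget, so even a deliberately non-terminating function cannot hold the database thread forever.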


JavaScript and Python have been used for this since literally the 90s.


How will that work with RLS and similar controls?


I don't see why UDFs and RLS would need to interact; they're orthogonal concepts that can work in tandem. The DB engine filters out the rows that fail the RLS check, then applies the UDF to the remaining rows.
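The ordering described here, sketched in Python. A hypothetical ownership rule ("you may only see your own rows") stands in for an RLS policy, and `select_with_udf` is an illustrative name, not a real database API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Row:
    owner: str
    amount: int


def select_with_udf(rows: Iterable[Row], user: str,
                    udf: Callable[[Row], int]) -> List[int]:
    # Step 1: the row-level policy removes every row the user may not see.
    visible = [r for r in rows if r.owner == user]
    # Step 2: the UDF only ever receives rows that passed the policy.
    return [udf(r) for r in visible]


table = [Row("alice", 10), Row("bob", 20), Row("alice", 5)]
print(select_with_udf(table, "alice", lambda r: r.amount * 2))
```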


You’re exactly right that these are orthogonal. The data isolation is the much harder part (the type of sandboxing provided by WASM has long been available via interpreted languages, see pl/v8 et al).

Without exception the existing APIs mentioned in this article don’t attempt to handle “data sandboxing” since it’s so application specific. So it remains the case that exposing your db to user defined functions is dangerous without a ton of work, and WASM in no way reduces the amount of work needed.


Not necessarily true. E.g. in PostgreSQL, the planner is free to evaluate predicates prior to imposing RLS constraints, so long as the functions in the predicate are marked LEAKPROOF [1]. (They aren't by default.)

This is because UDFs can write to tables, and perform I/O in other ways depending on system configuration and security policy. Just because the UDFs are in WASM instead of one of the other UDF languages doesn't eliminate the need for a security system.
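Why evaluation order matters, as a toy model rather than PostgreSQL's planner: the query result is identical either way, but a predicate with a side effect observes rows the policy should hide whenever it runs first. Here the side effect just appends to a list; in a real database it might write to a table or raise an error that reveals a value. This is the kind of hazard LEAKPROOF is meant to guard against. All names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
seen: List[str] = []  # everything the UDF has been shown


def leaky_udf(row: Row) -> bool:
    """A UDF with a side effect: it records every secret it sees."""
    seen.append(row["secret"])
    return True


def run_query(rows: List[Row], user: str, udf: Callable[[Row], bool],
              udf_first: bool) -> List[Row]:
    def policy(r: Row) -> bool:
        return r["owner"] == user
    # Python's `and` short-circuits, so whichever check comes first decides
    # which rows the second check ever gets to see.
    if udf_first:
        return [r for r in rows if udf(r) and policy(r)]
    return [r for r in rows if policy(r) and udf(r)]


rows = [{"owner": "alice", "secret": "a1"}, {"owner": "bob", "secret": "b1"}]
safe = run_query(rows, "alice", leaky_udf, udf_first=False)
print(safe, seen)    # policy first: the UDF saw only alice's row
seen.clear()
leaked = run_query(rows, "alice", leaky_udf, udf_first=True)
print(leaked, seen)  # UDF first: same result, but bob's secret was observed
```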

(Aside -- curiously in PostgreSQL, the privilege system isn't even strong enough to prevent something as innocuous as e.g. reading a view owned by another user from dropping your own tables.)

[1] https://www.postgresql.org/docs/15/sql-createpolicy.html#id-...


OK, that makes sense, though possibly less useful. I was thinking the UDFs would operate on the query or raw data.


I remember seeing "gcc/clang/cl -o blah.so/dll/dylib" for UDFs (in various DBs), and then loading the result with dlopen/LoadLibrary/etc. Nothing new really, but it avoids having a compiler around, and maybe the `wasm` sandbox helps with security.

My only worry about all this (and wasm in general) is how well you can debug it later, be it interactively or from a post-mortem dump. I have my reservations, but would have to experiment and see.

So far the best integration I've seen between different languages/runtimes has been C++/.NET (C#) and C++/CLI (i.e. Microsoft's managed C++): I can debug, step in, and debug post-mortem without an issue.

I can't even do this properly with GoLang, and had mixed success with Python (+.pyd files).

Debugging (interactive and post-mortem) is often overlooked and only thought about later.

But I'm all excited about wasm now :)


I'd encourage you to check out one of our projects, Extism[0], which helps make the kind of integrations you mention way easier! (and using wasm)

0: https://github.com/extism/extism


Just wanted to say, this is the best worst idea I've seen in ages! It's on my todo list ;)


> the ability to bring compute to data eliminates the need for as many microservices — run those in the database instead!

Isn’t that a step backwards? De-linking them means you can scale the two independently, as appropriate.

I guess if you’re using the cloud and can “steal” a bit of compute that way, it makes sense.


In nearly every practical scenario, the scaling bottleneck lies in network throughput. Trying to scale by putting network I/O between computation and storage is a bit like trying to separate a rocket engine from its fuel tank with a garden hose.
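To put rough numbers on the garden-hose analogy, a back-of-envelope sketch. The workload (100 million 200-byte rows reduced to one 8-byte aggregate) and the 10 Gbit/s link are assumptions chosen for illustration, not measurements, and the model ignores latency, protocol overhead and serialization.

```python
def transfer_seconds(rows: int, bytes_per_row: int, link_gbps: float) -> float:
    """Time to ship `rows` rows over a link of `link_gbps` gigabits per second,
    ignoring latency, protocol overhead and serialization cost."""
    total_bits = rows * bytes_per_row * 8
    return total_bits / (link_gbps * 1e9)


# Hypothetical workload: scan every row to compute a single aggregate.
ship_all = transfer_seconds(100_000_000, 200, link_gbps=10)  # pull rows to a service
ship_result = transfer_seconds(1, 8, link_gbps=10)           # push compute, return 8 bytes
print(f"{ship_all:.1f} s vs {ship_result:.2e} s")
```

Under these assumptions, pulling the rows costs 16 seconds of wire time before any computation starts, while pushing the computation to the data returns a few bytes.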


So back to mainframe days?


I wouldn't call it a step backwards, but if you can put it in the database you shouldn't have been considering microservices in the first place.


> De-linking them means you can scale the two independently, as appropriate.

It's definitely situational, and this can be true! But generally, it's this kind of scaling optimization on everything that has us drowning in microservices today. If we can afford to "steal" some compute from the DB server and minimize the amount of infra spun up for things that are basically scripts, I consider that a win.


We de-linked the two because there was no practical way to autoscale the database, but autoscaling microservices is easy. Obviously you want to be very cautious about how you treat inelastic compute.

Today there are lots of practical ways to autoscale databases and colocating data and compute is generally good for performance and simplicity (your mileage may vary for simplicity).


It depends on the workload and being able to go both ways is powerful. If compute dominates every other dimension, then decoupling compute from data makes sense and you see a substantial performance improvement decoupling those workloads.

If network I/O between your compute and data dominates every other dimension, you might want to colocate compute and data.

Being able to fit the architecture to the performance of the workload is a good thing.


It depends, but moving computation closer to the data is usually much more performant. You may be underutilizing your DB or spending a lot of round trips on networking, and this is a good way to reclaim that. It does not prevent you from having services; it just reduces their number.


> Isn’t that a step backwards?

It could be for some use cases, but there are significant efficiency advantages to processing the data where it lies instead of shipping a copy to separate compute services.


Decoupled scales, but you end up replicating the dataset in compute node caches, which is expensive.

Ideally you want scalable storage with compute (assuming really large datasets), so you can move the computation, which usually is small, close to the data.
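Moving the computation rather than the data can be sketched with the aggregate-function hook in Python's `sqlite3` module; the `median` aggregate and the `readings` table are hypothetical examples. SQLite is in-process, so there is no network here; the point is only that the per-group reduction happens where the rows live, and a single value per group comes back.

```python
import sqlite3
import statistics


class Median:
    """Hypothetical aggregate UDF: SQLite calls step() once per row in a
    group and finalize() once at the end of the group."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 5.0), ("a", 3.0), ("b", 2.0), ("b", 4.0)],
)
# One row per sensor leaves the query, not every reading.
result = conn.execute(
    "SELECT sensor, median(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(result)
```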

In any case, it's not one size fits all, and each problem space matters.


The Web Assembly Component Model (1) can't land soon enough.

(1) https://github.com/WebAssembly/component-model


Is there an indicator of progress on this and related specs (WASI)? It is hard to get a good feel from the repos.


This is actually a part of WASI, you can follow the WASI preview2 milestone tracker:

https://github.com/orgs/bytecodealliance/projects/6


Does anyone have any good resources that cover the implementation of wasm as a plug-in system on either a backend or frontend? Just curious to do a deep dive on this. Could be a video or in-depth article.


We have an end-to-end demo of a plug-in system driven game platform: https://github.com/extism/game_box (live deployment at https://gamebox.fly.dev/)

Backend in Elixir, with Extism embedded and example plug-ins in Rust and Javascript!


We (Splitgraph) implemented WASM UDFs in Seafowl [0], a database written in Rust based on Datafusion and optimized for executing queries "at the edge" and returning cache-friendly HTTP responses. Users can call CREATE FUNCTION within an SQL query to create a WASM UDF (docs [1]).

We blogged [2] about this feature, and you can read the relevant PRs [3] containing the changes necessary for its implementation.

[0] https://seafowl.io

[1] https://seafowl.io/docs/guides/custom-udf-wasm

[2] https://www.splitgraph.com/blog/seafowl-wasm-udfs

[3] https://github.com/search?q=repo%3Asplitgraph%2Fseafowl+wasm...


woah, awesome! sorry this didn't come up in our research. will try to update and fit it into the post if that's ok?


Sure thing! And no worries, we haven't been great at marketing ourselves :)


Fluvio is an open source data pipeline project [1] with plugin wasm modules for data transformations, called "SmartModules" [2]. In our case we can run wasm plugins on the frontend or backend (neither of which requires a browser). Feel free to come by our Discord if you have any questions. Disclosure: I work as an engineer for Infinyon, which backs the project.

[1] https://github.com/infinyon/fluvio

[2] https://www.fluvio.io/smartmodules/


have you seen this project? https://extism.org


dylibso, the company who posted this, built extism (mentioned by wikiwong)


We sure did =) Thank you, zephraph!


while wasm is getting a ton of attention for its new place in the serverless world, I'm excited that I may never write another SQL stored procedure, or line of PL/SQL, again!


Depending on the database you’re using, you don’t have to do that right now. I’m looking at you, PLV8


good point! postgres ftw


I think what’s interesting is that while UDFs have existed for some time, announcements of them seem, anecdotally, to be picking up pace recently, driven by wasm. Given all the benefits explained in the post, which lead to a great end-user experience, I would expect all database providers to roll this out as table stakes in the future.


Being able to write UDFs using modern developer tooling, unit test them, lint them, etc. is my favorite thing about WASM UDFs - I’ve always found writing complex SQL UDFs a frustrating experience (disclaimer: I work for SingleStore, which is mentioned in the article, and recently spent a lot of time writing C++-to-WASM UDFs for that environment).


WebAssembly: achieving the goals of the JVM two decades later.


Will be interesting to see how interface standards develop across the database ecosystem. I want to write my UDFs once and run them on multiple platforms


A bit off topic but as someone who has never used UDFs, how can running them inside of transactions possibly be performant? Is this something which is actually useful for consumer / “web scale” applications or is it mainly just for enterprise apps with small numbers of users?


Depends on a lot of factors, but most databases have some overhead per call, especially if it happens over a network. If you can write some procedural code that runs on the database and eliminate calls and network traffic that may be a win. Then there is the SQL overhead of parsing and compiling queries, various forms of stored procedures can mitigate that by having a function that is compiled once, and then called with different input data.

How this all works varies a lot between databases, so what works in one may not be optimal in another.


Plenty of conventional applications run code in a transaction; that's not a novelty.

There is a valid concern with using extra CPU/RAM on a (hard to scale) database rather than (easy to scale) application servers.

But it all depends on what you are doing, your tenant model, etc.


DB performance tuning is complicated but well-understood. I don’t see any particular reason that functions run in a db would be any more of a problem than “serverless” in another environment.


love seeing this trend of software systems becoming programmable with wasm. exciting times ahead!


tangentially related, but could anybody give me the TLDR on whether increased usage of WASM on the internet is expected to open up a new slew of security vulnerabilities for users? Someone posted on here before that even JS was controversial during its adoption (and now for some users), I can only imagine that some of those concerns are multiplied when you're talking about something with such broad capabilities as WASM


WASM's runtime is far simpler than JS's, or even something like the JVM, particularly due to the lack of garbage collection. The language was deliberately designed to be straightforward and efficient to JIT compile to a safe virtual machine, e.g. attention was paid to making sure bounds checking could be elided efficiently for constant offsets while being enabled elsewhere. It's also significantly simplified compared to assembly, and lacks functionality like arbitrary jumps that are common sources of vulnerabilities. It's pretty much ideal for the use case of "I want to run pretty fast code on an untrusted machine", which is the main reason it's taking off.

(Obviously, as WASM JIT compilers get more elaborate and it incorporates things like GC'd references, some of the advantages of this design wrt simplicity will disappear, but I anticipate that these kinds of features will mostly not be used in settings like databases).


Dart/Flutter can now target wasm (unstable), so better GC support is coming (i.e. instead of a GC compiled in from whatever language you used, the wasm VM would do it for you).


WebAssembly running in the browser runs in an isolated environment whose only access to the outside world is by calling out to JavaScript functions exposed to it when it was initialized. So it doesn’t open up any new APIs.

It does get compiled to machine code that runs directly on the processor, so in theory vulnerabilities could come up from that, but I think it’s a pretty well-understood surface area at this point.
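The "only access is through what you were given" model can be sketched as a toy linker: a module declares the imports it needs, instantiation fails if any are missing (the JS API reports this as a `WebAssembly.LinkError`), and the instance receives exactly those functions and nothing else. Everything below is a hypothetical Python model, not a real engine API.

```python
from typing import Callable, Dict, Iterable


class LinkError(Exception):
    """Raised when a module needs an import the host did not provide."""


def instantiate(required: Iterable[str],
                provided: Dict[str, Callable]) -> Dict[str, Callable]:
    required = list(required)
    missing = [name for name in required if name not in provided]
    if missing:
        raise LinkError(f"unresolved imports: {missing}")
    # The instance is handed exactly the functions it declared, nothing more,
    # even if the host happens to offer additional ones.
    return {name: provided[name] for name in required}


def guest_main(env: Dict[str, Callable]) -> int:
    # The guest can log because the host passed "log"; there is no file or
    # network function to call because none was imported.
    env["log"]("hello from the sandbox")
    return len(env)


logs = []
env = instantiate(["log"], {"log": logs.append, "fetch": lambda url: b""})
print(guest_main(env), logs)
```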


I know it's not fundamentally that different from the running arbitrary code we have right now, but something about the idea of indiscriminately running binaries invisibly downloaded over the WWW rubs me the wrong way.

Edit: read the first half of my sentence again.


> something about the idea of indiscriminately running binaries invisibly downloaded over the WWW rubs me the wrong way

I think that's because the word "binary" has connotations that aren't strictly technically justified. There's no difference between running WASM and running JS (which is usually obfuscated anyway). Both are JITted into machine code, and it's important the JIT is implemented securely. (JIT compilers have been sources of exploits.) The only difference is complexity of the source language, where simpler languages are easier to secure. And WASM is orders of magnitude simpler than JS.


That’s what we’ve been doing since the dawn of JavaScript, in a VM with a much larger surface area. A simpler VM should be quite a bit safer than the status quo.


I believe wasm will allow for far greater opportunities for code obfuscation.

You can already do things like http://www.jsfuck.com/ and even more complex ones, but they would likely have bad performance.

With wasm you could do malware-level obfuscation with fewer drawbacks.


I have some bad news for you…


For now wasm does not have any more risk than obfuscated JS.

It is possible that there are a lot of tools to examine JS code for "suspicious code" and that those tools might not work well on WASM.

In practice the WASM interpreter in the browser is a functionality that can be (almost) completely polyfilled.

Personally, I would be way more worried about hardware exploits via WebGL.


Also, it is likely that the browser debugger will be less full-featured for wasm, and that future multithreading support will make obfuscation even easier.

But it is probably already possible with JSFuck and webworkers.


On paper, this just uses the same security model as javascript and obviously a lot of thought is going into security and sandboxing with this. What has been problematic historically was a lot of native code written in a hurry by dot com era companies being unleashed on browsers via a poorly thought out plugin model.

Flash, Silverlight, Java Applets, and loads more stuff existed while people were still OK serving stuff up without SSL, trying to figure out cookies and generally not putting a lot of thought into cross site scripting attacks. That was a security nightmare and all the obvious things happened. WASM does not seem like a repeat of that. Rather it builds on all the learning we've had since then.


WASM being used to exploit your PC isn't much of a risk, because it has a really robust security model. It's been used in exploit chains before but my understanding is that the number of times people have found holes in WASM sandboxes is much smaller than the number of times people have found holes in JS runtimes. So it's pretty fit for purpose there.

Unfortunately, the design of WASM and common compilers targeting it means that your security is at risk as a user, even though your PC is safe. Applications compiled to WASM have a wide variety of fun security vulnerabilities that have been gone on native targets for decades, which means that if (for example) Gmail moved to WASM, it would now be possible for malicious parties to attack your Gmail tab even though they can't attack your PC. Some examples:

* Address space layout randomization is gone - if an attacker gets a write-anywhere primitive, it will work 100% of the time

* Function pointers are densely packed - any possible value within the correct range (0 - function count) is a valid function pointer, as long as the signature matches. Even worse than not having ASLR.

* Page protections are gone - all data is mutable, even things that shouldn't be like string constants compiled into the binary

* Zero page accesses don't trap - while compilers go out of their way to try and make reading/writing from null pointers fail in WASM, the actual runtime happily allows you to do it. This makes attacks easier to execute because while stray nulls would kill a native application on dereference, in WASM they will just yield a 0 and execution will often continue.

* Most stuff is static linked - for native applications, it is common (albeit less common now in our modern era of Electron Hell) to pull in services from the OS and its packages, whether it's zlib or https or whatnot. The WASM runtime model generally offers a very limited set of capabilities in comparison, and there's no equivalent way to dynamically link against vendored packages that get security updates automatically. So every application ships its own ICU (for time zone and localization data), ships its own zlib (for compression/decompression), ships its own crypto (because the browser crypto APIs are extremely limited and async-only), etc...
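Two of the points above, the dense function table and the non-trapping address 0, can be illustrated with a toy module that has a single table and a single memory. All names are hypothetical and the signature strings are made up for the sketch. A corrupted table index still lands on a valid function as long as the signature matches, and reading address 0 just returns a byte.

```python
from typing import Callable, List, Tuple


class Trap(Exception):
    pass


def render_profile(uid: int) -> int:
    return uid


def delete_account(uid: int) -> int:
    return -uid  # stands in for a sensitive operation


class ToyModule:
    def __init__(self, table: List[Tuple[str, Callable]], memory_size: int):
        self.table = table  # dense: every index in 0..len-1 is a function
        self.memory = bytearray(memory_size)

    def call_indirect(self, index: int, sig: str, *args):
        # The only checks: index in range and matching signature.
        if not 0 <= index < len(self.table):
            raise Trap("table index out of range")
        actual_sig, fn = self.table[index]
        if actual_sig != sig:
            raise Trap("signature mismatch")
        return fn(*args)

    def load_u8(self, addr: int) -> int:
        if not 0 <= addr < len(self.memory):
            raise Trap("out-of-bounds load")
        return self.memory[addr]  # address 0 is ordinary, readable memory


m = ToyModule([("(i32)->i32", render_profile),   # index 0
               ("(i32)->i32", delete_account),   # index 1
               ("()->i32", lambda: 0)],          # index 2
              memory_size=1024)
# A stored index corrupted from 0 to 1 is still a valid call:
print(m.call_indirect(1, "(i32)->i32", 42))  # delete_account runs, no trap
# A "null pointer" read simply returns whatever is at address 0:
print(m.load_u8(0))
```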

Ultimately WebAssembly runtimes are very secure, but you have to protect everything else from the code you're running, because the code itself can easily be attacked by uncontrolled input.


The desperation to make WebAssembly a thing is reaching epic proportions. It was always a cutesy little thing that no one took super seriously. A cool experiment that had little real-world application. But non-technical investors poured millions into WebAssembly startups, so we're doomed to keep seeing these posts pop up until all that money dries up. And, trust me, will it ever dry up.

Other than a few extremely niche situations (maybe browser-based gaming?) does anyone ever really think to themselves "wow I should really use WebAssembly here?" Even the post itself doesn't talk about any real solutions, just about what might be possible:

> database platforms could gain these additions with little to no incremental work required

Ultimate cope. It's the classic solution looking for a problem.

(Edit: mass downvoted by the WASMOOOORS)


I haven't heard of startups with WebAssembly-based products that have received millions; do you have some examples so I can look into it, please?

Regarding use cases, I think there are plenty, mainly in places where you need to protect yourself against the code you're running, so pretty similar to Docker, but lighter and with faster boot times. Admittedly, also less feature-complete and with a different target: less business-y code and more system-y code.

So anything like cloud functions, database functions as described in this article, and indeed web-browser-based games. I think Fastly is providing cloud functions at the edge that are wasm-based.

With these use cases, theoretically, an ecosystem might develop that makes more use cases transferable from Docker to wasm, particularly isolated pieces of logic that need to be executed as standalone units.


Just Google "wasm startup raises millions," like half a dozen pop up on the first page (including OP's company).

> mainly in places where you need to protect yourself against the code you're running

You can do this with Lua or like a million other languages (heck, even vanilla JS). The idea that WASM is the only sandboxed thing in the universe is just weird.


> The idea that WASM is the only sandboxed thing in the universe is just weird.

I don't know anyone who has ever said that.


I feel like WASM gets a lot of attention from devs who are like “the web sucks, because JavaScript sucks. I won’t sully myself with such a childish programming language. If there was ever a way to build web apps without JavaScript and I could use [my preferred language] then web development would instantly become a joy and I would be happy doing it”.

Those people seem to get excited at the idea that you could build web apps with a “real compiler”.

But personally I just don’t buy that JavaScript is the reason web development is hard. I think it’s just the natural outcome of the fact that the capabilities of browsers are both varied and ever changing, and therefore the expectations of clients are always changing, and therefore the tooling to be able to (relatively) quickly meet those requirements is always changing.

But we’ll see. Maybe there will be some beautiful, stable web dev framework built on WASM that comes out!


I would like to add to that: web developers always make it harder for themselves too.


99% of programmers will never directly touch WASM, but in 10 years 99% of web users will use a WASM-based runtime in their browser for daily tasks/entertainment.

WASM will eat SPAs and a chunk of the gaming market, it's just a matter of how long it takes.


> WASM will eat SPAs and a chunk of the gaming market, it's just a matter of how long it takes.

Ah yes, the classic "we're still early"; might I introduce you to cryptocurrency? Apologies for the snarkiness, but if it hasn't happened in the last 10 years, do you really think it will happen in the next 10?


Cryptocurrency has significant legal, financial, political, and behavioral hurdles.

I've no idea why you think that situation is a close analog to WASM.


Can you offer a convincing explanation why large web apps ought to be built on huge interpreted JS systems (e.g. React) instead of purpose-built runtimes, especially once WASM matures in X years?

It's like implementing Photoshop in Python. Why would you do that?


> purpose-built runtimes

The lift here is astronomical, not to mention that companies just want to get a merch store up. No business person or product manager really cares about the "runtime."

But you dodged my question: do you think that a technology needs two decades to really "mature?" Go and Rust clearly solved real problems (or at least are fun to use) and we see many developers leveraging them. Even React (which I personally hate) made things easier for front-end devs. WASM adoption is virtually zero.


> The lift here is astronomical, not to mention that companies just want to get a merch store up. No business person or product manager really cares about the "runtime."

You fundamentally misunderstand the purpose of WASM for web development. It is not another web app framework or programming language. It's a platform for building frameworks/DSLs.

Of course the "business person or product manager" doesn't care about the runtime. They also don't care about 99% of the Web APIs or how React was implemented. They don't have to, because that work was done by the people working on the runtime (or reactive framework). WASM is for the latter group.

Average webdev yesterday wrote PHP/HTML and JQuery.

Average webdev today writes React DSL (e.g. TSX) for a monstrous "runtime" hacked together in JS.

Average webdev tomorrow writes Python for a reasonably fast Python interpreter implemented in the browser with WASM [1] (and maybe written in Rust).

[1] - https://pyscript.net/


> Average webdev tomorrow writes Python for a reasonably fast Python interpreter implemented in the browser with WASM

Python is just as hand-wavy as Javascript, so I fail to see the win here. At least C/C++/Rust/Go give you some neat guarantees, but you lose all the velocity JS gives you. I mean, pyscript doesn't even support hot reloading out of the box, but yeah, I'm sure it's definitely the future. Besides, you could probably just write the interpreter in vanilla JS and it would be comparably fast (at least in Chrome), so WASM is completely superfluous.


> At least C/C++/Rust/Go give you some neat guarantees

Wow, imagine if you could compile these to WASM.

> Besides, you could probably just write the interpreter in vanilla JS and it would be comparably fast (at least in Chrome), so WASM is completely superfluous.

This one is going in the orange site bookmarks, lol.


> This one is going in the orange site bookmarks, lol.

I mean, benchmarks say the performance gains are from 0.2%-60% comparing vanilla JS to WASM across various browsers (keep in mind it's not even twice as fast, even in synthetic benchmarks). So I guess you're right: if I built my next todo web app, I'd definitely want to change my entire workflow, write a custom purpose-built runtime, and transpile from Python to WASM to eke out that sweet sweet performance in Google Chrome.

I don't even think we live on the same planet.


Agreed.


> not to mention that companies just want to get a merch store up

The use case is less traditional websites and more performance-demanding "apps." (And non-web uses for a portable runtime.)


The top comment is also skeptical of WASM. You got downvoted because your comment has nothing interesting to say and no purpose other than to fan a flame war.



