Hacker News
Designing the perfect TypeScript schema validation library (vriad.com)
98 points by colinmcd on March 12, 2020 | 75 comments


I built something like this a few months ago but struggled to explain to almost everyone except die hard TypeScript fans what the big deal was.

I explained it was all about the type inference from the schema so I could make guarantees across runtime boundaries.
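
A minimal sketch of what "type inference from the schema" means, assuming a hand-rolled `Schema<T>` interface. The names `Schema`, `Infer`, `objectOf`, `str`, and `num` are hypothetical and are not the API of Zod or type-safe-validator:

```typescript
// Hypothetical mini-validator: one runtime definition, one inferred static type.
interface Schema<T> {
  parse(input: unknown): T; // throws on invalid input
}

const str: Schema<string> = {
  parse(input) {
    if (typeof input !== "string") throw new Error("expected string");
    return input;
  },
};

const num: Schema<number> = {
  parse(input) {
    if (typeof input !== "number") throw new Error("expected number");
    return input;
  },
};

// Extracts the static type that a schema validates.
type Infer<S> = S extends Schema<infer T> ? T : never;

function objectOf<S extends Record<string, Schema<unknown>>>(
  shape: S
): Schema<{ [K in keyof S]: Infer<S[K]> }> {
  const fields: Record<string, Schema<unknown>> = shape;
  return {
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("expected object");
      }
      const record = input as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const key of Object.keys(fields)) {
        out[key] = fields[key].parse(record[key]);
      }
      return out as { [K in keyof S]: Infer<S[K]> };
    },
  };
}

const dogSchema = objectOf({ dogName: str, age: num });
type Dog = Infer<typeof dogSchema>; // { dogName: string; age: number }

// Data crossing a runtime boundary is checked before it is typed as Dog.
const dog: Dog = dogSchema.parse(JSON.parse('{"dogName":"Fido","age":4}'));
```

The static type `Dog` is never written by hand, so it cannot drift from the runtime check.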

This is what I came up with, though this library looks like it may be more comprehensive than what I wrote. Struggled to come up with a good name to represent exactly what it did...

https://www.npmjs.com/package/type-safe-validator

The ability of the TypeScript compiler to allow this stuff continues to make it one of my favourite languages to work with, despite it still being JavaScript underneath.


I guess it's something in the air! Given the popularity of the other libraries, it's surprising that there isn't an optimal one.

And yeah, unless you've gone down the rabbit hole of type safety obsession, it's hard to understand the necessity of this. I like that you specifically call out REST API validation in your docs...the difficulty of type safe API implementation was what drove me to build zod.

I'm using Zod plus a hand-rolled codegen tool to implement an end-to-end type safe RPC API that validates all data at runtime AND generates a statically typed Typescript SDK for use on the client.

PS If people like the sound of that I might publish that too. Let me know if it sounds interesting.


I think it sounds super interesting, would love to see how you've solved it :)


Wow I love it. It seems like a lot of us have encountered similar problems with TS. I am planning to shift what I made into something like yours, using right/left instead of throwing errors.
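
For readers unfamiliar with the right/left style, here is a rough sketch of returning a result value instead of throwing. The `Either` type and `parseAge` function are hypothetical illustrations, not presi's or io-ts's actual API:

```typescript
// "left" carries the failure, "right" carries the validated value.
type Either<L, R> =
  | { tag: "left"; error: L }
  | { tag: "right"; value: R };

function parseAge(input: unknown): Either<string, number> {
  if (typeof input !== "number" || Math.floor(input) !== input || input < 0) {
    return { tag: "left", error: "age must be a non-negative integer" };
  }
  return { tag: "right", value: input };
}

const goodAge = parseAge(4);
const badAge = parseAge("four");

// Callers must inspect the tag before they can touch the value.
const report =
  badAge.tag === "left" ? "invalid: " + badAge.error : "ok: " + badAge.value;
```

The trade-off is that every caller has to branch on the tag, which is the ergonomic cost other commenters mention when they talk about avoiding `Either`.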

https://www.npmjs.com/package/presi


I've been working on something similar, also struggling with `io-ts`. However, I want the schema to be serializable (possibly compatible with JSON Schema) so that a rich ecosystem of tools can be built upon it. There is still a long way to go though.

My library `@hediet/cli` [1] uses this technology to offer a browser based UI for CLI applications [2]. In the long term, I want to replace my use of `io-ts` with this technology in my JSON RPC implementation [3], so that I can implement something similar to Swagger for JSON RPC.

I always felt the possibilities of JSON Schema are not really explored, which I think might be due to bad design of JSON Schema (it's really hard to consider all these denormalizations, it's like HTML 20 years ago).

I'll have a more detailed look at your library once I find time!

[1] https://github.com/hediet/ts-cli/blob/master/cli/README.md

[2] https://github.com/hediet/ts-cli/blob/master/cli/docs/gui.pn...

[3] https://github.com/hediet/typed-json-rpc


While I think this is pretty cool, given his use case (wanting to use a strongly typed client and server), I can't see why one wouldn't just use GraphQL with TypeScript on both the client and server. I've done this and I love it:

1. Define the GraphQL interface with GraphQL's typedef language. GraphQL then takes care of validating all the request and responses at runtime.

2. Use something like https://github.com/dotansimha/graphql-code-generator to generate all your TypeScript types from your GraphQL schema. Place these types on your resolvers and then you get compile-time type checking.

3. Since GraphQL actually exposes your schema as part of the endpoint, clients can use the same tool to keep their TypeScript types in sync, and get the same typing benefits when writing the client code.


One use case of validators is for config files.


FWIW, you can somewhat accomplish this with just the `config` library and config files written as TypeScript.


Sure, this discussion is full of options, but none are really perfect, so there are plenty of opportunities for bikeshedding!


I guess you could call trying to offer a helpful suggestion bike shedding ¯\_(ツ)_/¯


Excuse me if I have offended you, my intent was rather far from that.

The bike-shedding is to which one to pick right now, when there doesn't seem to be a clear superior choice.

I saw your suggestion as just as a bit half-baked as the ad-hoc stuff I actually use now, and the other in-chimings in the discussion. (io-ts, utility-types, runtypes, etc. all are great, but as the title indicates it feels there should be one schema system specifically designed for TS. That can be used for description and validation of all interfaces of a TS system, including incoming requests, static and runtime configuration, outgoing data, and mapping these to internal domain modeling.)


> Excuse me if I have offended you, my intent was rather far from that.
>
> The bike-shedding is to which one to pick right now, when there doesn't seem to be a clear superior choice.

Fair enough. My intent was not bike shedding, but just mentioning a thing that exists if you would like to use it. In my experience most people are not aware `config` supports TypeScript config files.

> I saw your suggestion as just as a bit half-baked as the ad-hoc stuff I actually use now, and the other in-chimings in the discussion.

For what it's worth, for libraries which require config I combine io-ts (well my wrapper around it) for validation and the types it generates for type safety. It is not an experience that feels half-baked to me. It guarantees that clients using my libraries provide valid configuration at runtime, and allows them to get static guarantees of the same (which I also employ in clients which use those libraries).

> (io-ts, utility-types, runtypes, etc. all are great, but as the title indicates it feels there should be one schema system specifically designed for TS. That can be used for description and validation of all interfaces of a TS system, including incoming requests, static and runtime configuration, outgoing data, and mapping these to internal domain modeling.)

IMO there doesn't need to be just one, as long as people can find one that suits their needs. I certainly use io-ts for all of the things you describe, it is designed specifically for TS, and I am quite happy with it.


Been meaning to respond to this properly.

I tried this exact approach initially but gave up after experiencing a series of frustrations.

1. I'm operating in a highly-connected, relation-heavy data domain (medical) where the queries I needed to do required a LOT of joins. I wasn't good enough at SQL to make the queries performant in my resolvers. I ended up switching to Neo4j and all the performance problems went away. Unfortunately Neo4j doesn't enforce ANY field-level schema whatsoever, so Zod provides the type enforcement for the whole data layer.

2. I had a lot of trouble getting the codegen pipelines to work properly, and I got really sick of using codegen to create types for my queries. I also just don't like defining my queries as strings in the first place. There were scenarios where I would run slightly different variants of a query depending on some conditional logic and I had to define these queries separately. I'd much prefer to use a client-side query builder that can be used in conjunction with conditional logic but I couldn't find something to that effect (this was 15 months ago, perhaps things are different now).

3. This is far dumber but I've enjoyed building my own schema definition tooling so I can include whatever metadata on my types without needing to split up my definitions in one place. For instance, on every property of a given model, I'm able to include a name (e.g. "First name") that I can use to auto-generate forms and the like. With GraphQL I would define the model properties in GQL and the other metadata in a separate file. Again, it's dumb but it's a preference I have.

4. I don't like that you have to define relations on both sides in GQL. I'd rather "register" a bi-directional "edge" in one place, instead of splitting it into its two unidirectional "components". Again, just a (rather nitpicky) personal preference. I'm building some schema definition and querying tooling on top of Zod to this effect.

5. Typing the mutations was a huge PITA. Many mutation parameters are a slightly modified version of an already-defined type (e.g., a `createUser` mutation will accept a User instance but without the `id` property). But because all the definitions are in GQL, there isn't an easy way to represent this without highly duplicative typings, like this from the GraphQL docs:

  input MessageInput {
    content: String
    author: String
  }

  type Message {
    id: ID!
    content: String
    author: String
  }

  type Mutation {
    createMessage(input: MessageInput): Message
    updateMessage(id: ID!, input: MessageInput): Message
  }

I'm planning to look back into this given the upcoming release of Prisma 2 and the maturation of prisma-nexus, which I think solve some of these complaints.
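
For contrast with point 5, this is roughly how the same "type minus `id`" relationship can be written when the definitions live in TypeScript. It is a sketch only: `ChatMessage`, `ChatMessageInput`, and `createMessage` are hypothetical names, and the fields are made non-nullable for brevity:

```typescript
interface ChatMessage {
  id: string;
  content: string;
  author: string;
}

// The create input is derived from the stored type, so the two cannot drift.
type ChatMessageInput = Omit<ChatMessage, "id">;

let nextMessageId = 1;

function createMessage(input: ChatMessageInput): ChatMessage {
  // The server assigns the id; everything else comes from the input.
  return { id: String(nextMessageId++), ...input };
}

const created = createMessage({ content: "hello", author: "colin" });
```

Adding a field to `ChatMessage` automatically adds it to `ChatMessageInput`, which is the duplication the GraphQL snippet cannot avoid.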


This is great. I recently built a fairly large amount of functionality around io-ts, adding support for mixed required/optional fields, eliminating the need to deal with `Either` monads, and a whole lot of other stuff that's mostly just relevant to my company's usage. I think if this had been available at the time I was looking for an underlying library, it would have been a no brainer to choose this one.

Worth noting, another option is Runtypes[1], which also looks great. I can't remember off the top of my head why I ultimately picked io-ts over Runtypes, but it's another one for folks to consider (and I'd be curious what the Zod author thinks of it).

[1] https://github.com/pelotom/runtypes


Oh, I do want to add one bit of minor feedback for the author: one of the things I like about the io-ts interface is that fields/codecs are values unless they require a parameter (e.g. `t.string` vs `z.string()`). It's a little bit easier for me to read. It's also not clear to me at a glance whether calling `z.string()` is creating a new instance of something and whether that affects the behavior of a given schema.


I personally prefer to standardize everything as a function. It also leaves some breathing room in case I decide to include parameters as part of a future API augmentation.


I hope you'll give me the opportunity to try to convince you otherwise.

I understand the context of this is that io-ts puts an onerous emphasis on FP concepts and data structures. But FP principles, when applied in an effective and usable way, are quite good. They make it easier to reason about and maintain code.

One such principle is that a "function" with no parameters is a good sign that it will cause some kind of side effect. I sort of hinted at this when I questioned the design, when I said that it wasn't clear to me whether calling these functions was producing new instances and whether those new instances affect behavior.

I would also argue that leaving room to parameterize primitive types is leaving room to make a drastically more complicated API in the future. A `string` with "options" suddenly becomes its own sub-API.

While io-ts is more complicated, once learnt its API is more predictable. A `Type` is a value. An `interface` (or similar) type is a `Record<string, Type>`. Always. `Type`s can be augmented/composed/etc (and you may provide a function that accepts one or more `Type`s as parameters to accomplish that), but they always resolve to a value. The value can be reused without concern about side effects.


Very strongly agreeing with this assessment and hoping Colin will come to agree.

A zero-argument function is either a constant, or a side-effect. Given we don’t want a side-effect, exposing a constant instead of an effectful-looking function is preferable.
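
To make the question concrete, here is a small sketch of the two styles. It is an illustration only and says nothing about how Zod or io-ts are actually implemented; `StringType`, `stringValue`, and `stringFactory` are hypothetical names:

```typescript
interface StringType {
  readonly kind: "string";
}

// Constant style (like `t.string`): every use refers to one shared value.
const stringValue: StringType = { kind: "string" };

// Factory style (like `z.string()`): each call builds a fresh object.
function stringFactory(): StringType {
  return { kind: "string" };
}

const first = stringValue;
const second = stringValue;
const constantIsShared = first === second; // same reference

const factoryIsShared = stringFactory() === stringFactory(); // two distinct objects
```

With a constant the reader knows identity is stable. With a factory the reader has to check the documentation to learn whether distinct instances behave differently.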


Re: runtypes. Not sure how I missed this, it looks like an excellent tool.

No support for recursive types (which I personally want/need for my project).

I really like their API for constraint checking...might have to steal that...

[UPDATE] runtypes does support recursive types! My bad!


FWIW, runtypes is now part of my standard utility kit (along with utility-types[0]) for every TypeScript project I start. I don't leave home without it, and (AFAIK) it checks off every box mentioned as a problem by the author of the article.

At my employer, I just whacked it straight into our new API codebase on day one and people took to it immediately. It's great, great stuff.

[0]: https://github.com/piotrwitek/utility-types


Second runtypes! It's really Yup but Typescript-first, which is exactly what a lot of people need, per OP's article. The great thing is that there's no need to standardize these libraries - it's trivial, for instance, to build a quick utility wrapper around https://react-hook-form.com/api/#validationResolver that hooks into your runtime-compiletime schema library of choice!


I'd really like something like a compiler step/plugin to generate the validators from the declared typescript types.


You can generate JSON Schema from TS with this library: https://github.com/YousefED/typescript-json-schema

It's certainly conceivable to build an equivalent for Zod but you'll eventually want to validate types that aren't expressible in TS (e.g. integers).


This looks like it generates JSON Schema, but not runtime validators?


For that you could use something like ajv. Json schema has the advantage of being usable across different languages, but it does come with the overhead of maintaining those files and keeping it in sync with your other repos.


This is basically that but in the opposite direction. It also has the advantage of being able to express validations that you can't express in the type system.


I should point out that there's nothing in Zod (yet) that isn't expressible in Typescript. I'll eventually get around to doing string/numerical validations (`.isEmail()`, etc) that go beyond TS but currently all functionality has an equivalent in TS.


Do you have an example of the kind of validation that couldn't be expressed in the type system?


Min or max length of an input. Any type of string validation like email or something similar that's validated via regex.


root_axis gave some good examples. Also types like `int` or `decimal` (which you can also pair with branded static types, but they don't actually guarantee anything other than their name). Really any kind of refinement you can imagine on primitive types.
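
A rough sketch of pairing a runtime refinement with a branded static type. The names `Int`, `isInt`, `toInt`, and `pageOffset` are hypothetical:

```typescript
// The brand exists only at compile time; at runtime an Int is a plain number.
type Int = number & { readonly __brand: "Int" };

// The runtime check is what actually gives the brand its meaning.
function isInt(value: unknown): value is Int {
  return typeof value === "number" && isFinite(value) && Math.floor(value) === value;
}

function toInt(value: unknown): Int {
  if (!isInt(value)) throw new Error("expected an integer");
  return value;
}

// Only values that passed through toInt/isInt are accepted here.
function pageOffset(page: Int, pageSize: Int): number {
  return page * pageSize;
}

const offset = pageOffset(toInt(2), toInt(10));
```

Passing a bare `number` to `pageOffset` is a compile error, but nothing stops code from casting with `as Int`, which is why the brand on its own guarantees nothing beyond its name.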


Yeah, that's really something that feels like it should be a part of the language, because it would be so useful!


It's explicitly one of the language's non-goals[1].

> Add or rely on run-time type information in programs, or emit different code based on the results of the type system. Instead, encourage programming patterns that do not require run-time metadata.

[1] https://github.com/Microsoft/TypeScript/wiki/TypeScript-Desi...


I understand it (there's already enough to do at type level that runtime level is a non-goal), but types are purely indicative at IO boundaries. Validation still needs to happen.
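
A short sketch of why a type annotation at an IO boundary is only a claim. `AppConfig` and `isAppConfig` are hypothetical names:

```typescript
interface AppConfig {
  port: number;
}

// Incoming payload: note that port is a string here, not a number.
const rawPayload = '{"port":"8080"}';

// Compiles without complaint, because JSON.parse returns `any`.
const unchecked: AppConfig = JSON.parse(rawPayload);
const actualPortType = typeof unchecked.port; // the annotation was wrong at runtime

// A hand-written guard does the validation the annotation never did.
function isAppConfig(value: unknown): value is AppConfig {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { port?: unknown }).port === "number"
  );
}

const parsedPayload: unknown = JSON.parse(rawPayload);
const payloadIsValid = isAppConfig(parsedPayload);
```

Schema libraries exist largely to generate guards like `isAppConfig` so they do not have to be written and kept in sync by hand.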


That's right. So since it won't be a part of the language, libraries like this are being developed.


Adding to the pile of similar approaches, there's TypeBox [1] which uses JSON Schema as an intermediate artifact (validation can happen with ajv or other libs), and extracts static types for TypeScript.

Having the JSON Schema intermediate is useful when using Fastify [2], so that request validation can happen with less boilerplate.

[1] https://github.com/sinclairzx81/typebox

[2] https://www.fastify.io/docs/latest/Validation-and-Serializat...


Also did something similar[1]. My motivations were a better API (mine is modeled off of Elm's), and better error messages: I have a use case where users need to be shown raw, dev-tools-style interactive data dumps, with error information overlaid on top. The next version will also enable data generation from schemas.

Also, and perhaps most importantly, this library has no respect for undefined, because particularly in the context of data modeling, undefined is complete nonsense.

[1] https://github.com/ai-labs-team/ts-utils#decoder


I find `undefined` better than `null`. Either a user has contact information, or the contact information is undefined (and not null). How do you model that?


See, I take the opposite view: consider that the 'model' is the abstract structure of the data which your application knows about.

In your case, contact info is a piece of related data that your application knows about, can traverse to, parse, and consume, and it can be either present or absent. To me, `null` fits that case perfectly (personally I'd use Maybe, but, same idea). Finally, in cases where you're modeling a collection of values where the keys are unknown, see `dict()`.


What is `dict()`? What's the meaning of `null`? `undefined` literally means the value is not defined. `null` only has a technical meaning, which says its value is the null pointer.


That's not what null means in javascript and typescript. There, it's more like 'null' signals that there is no value, and 'undefined' signals that there is no variable.


All runtime differences are quite negligible if you use typescript, at least in my experience. You never accidentally use unassigned variables in typescript. Also, `{ x: undefined }` and `{}` are distinguishable.

Undefined plays well together with optional fields in typescript, null does not and requires normalization! `undefined` in a JSON array is a problem though.
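
These claims can be checked directly. A small sketch using only standard JavaScript behaviour (the variable names are arbitrary):

```typescript
const explicitKey: { x?: number | undefined } = { x: undefined };
const missingKey: { x?: number | undefined } = {};

// Reading the property gives the same result either way...
const sameValue = explicitKey.x === missingKey.x; // true

// ...but the two objects are distinguishable by key presence.
const hasExplicitKey = "x" in explicitKey; // true
const hasMissingKey = "x" in missingKey; // false

// JSON drops undefined object properties entirely,
const objectJson = JSON.stringify(explicitKey); // "{}"

// but an array slot cannot be dropped, so undefined is written as null.
const arrayJson = JSON.stringify([1, undefined, 3]); // "[1,null,3]"
```

The last line is the array problem: an `undefined` entry comes back from a JSON round trip as `null`.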


You might want to take a look at ts-json-validator [1] as well. It creates type-aware validators for json-schema.

[1] https://github.com/ostrowr/ts-json-validator


There is also class-validator, which I've used and I find elegant and working well:

https://github.com/typestack/class-validator


Huh, not sure how I missed this.

I don't love the class-based declarations: it's a bit verbose and doesn't allow you to use things like the spread operator to "mix in" fields into objects. There's also some redundancy required for basic types:

  @IsString()
  firstName: string;

For my purposes, I also needed support for recursive types, unions, and intersections, which I don't believe are supported.

But their validation built-ins go way beyond Zod (IsEmail, Min, Max, Contains, native Date support, etc). Thanks for sharing.


No mention of runtypes which looks pretty similar? https://github.com/pelotom/runtypes


Huh. Totally missed this, it looks like an excellent tool.

No support for recursive types (which I personally want/need for my project).

I really like their API for constraint checking...might have to steal that...

[UPDATE] runtypes actually does support recursive types! My bad! Great lib.


Wow, I made the same thing and put v1 up a couple of days ago.

https://github.com/tetranoir/presi

A main difference is I tried to get rid of having a separate line to create the Type and I tried to get rid of having to learn a new library's API as much as possible.


This looks interesting too, but the syntax feels really weird to me.


A goal was you can treat it Exactly like an "Interface". I agree it looks totally nonsensical but I think of it as a text replacement.


The assignment to class properties reminds me an awful lot of stuff I've seen in Python, particularly in Django/REST Framework.

But the class wrapping thing would just break my brain every time I tried to use it. Calling plain functions just feels a lot more natural to me.


Are there any plans for building validation support like this directly into TypeScript? If not, why not?

TypeScript seems designed to be incrementally added to large existing projects. Why then isn't there a standard way to validate objects coming from the untyped parts of your project?


It's explicitly one of their non-goals[1]:

> Add or rely on run-time type information in programs, or emit different code based on the results of the type system. Instead, encourage programming patterns that do not require run-time metadata.

This approach (build runtime behavior that produces static types) is the intended one.

[1] https://github.com/Microsoft/TypeScript/wiki/TypeScript-Desi...


Something similar, but for flow and focused more on functional combinators https://github.com/appliedblockchain/assert-combinators


Is the repo private? I get a 404


Oh thank you, mistake, fixed.


I've used a few similar libraries [0][1][2] and wrote one for a personal project. If we categorize them as embedded DSLs for runtime type checking with some support for static interop, they all share three major flaws:

1. Sub-optimal developer experience: significantly noisier syntax compared to pure typescript, convoluted typescript errors, and slower type checking.

2. Unfixable edge cases in static type checking: Features like conditional types work less reliably on the types produced by the library.

3. Some typescript features can't be supported in the library (again, conditional types).

I think a better way to approach the runtime+static type checking is to do it as a babel plugin, which would fix the DX and edge-case problems and also gracefully degrade in the case of typescript features that can't work in runtime.

Since babel can now parse typescript, it is trivial to write a babel plugin that takes regular typescript files and converts their type annotations into runtime values [4]. Those values can then be fed into a simpler type checking library, giving us runtime type checking on top of the native static type checking experience.

[0] https://github.com/gcanti/io-ts

[1] https://github.com/pelotom/runtypes

[2] mobx-state-tree also has a schema validation library that works to some extent statically: https://github.com/mobxjs/mobx-state-tree/

[3] https://github.com/gcanti/io-ts#branded-types--refinements

[4] https://gist.github.com/AriaMinaei/2f1229178abad4363f5180db2...


A while ago, I used this library called ts-interface-builder to generate runtime validation functions from TS types. It worked really well, I was able to use the types from my front-end without any modifications for back-end validation.

Also, used the inferred response types from endpoint functions and passed them back to the front-end to use with front-end fetch calls. This was probably the biggest improvement in terms of productivity and improving accuracy. Imagine every time you change an endpoint's response, any fetch calls that are impacted would just show errors at compile time/in your editor.
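
A minimal sketch of that second idea, assuming client and server share one TypeScript codebase. `getDogEndpoint`, `GetDogResponse`, and `describeDog` are hypothetical names, and a JSON round trip stands in for the network:

```typescript
// Server side: the return type is inferred, never written by hand.
function getDogEndpoint(id: number) {
  return { id, dogName: "Fido", age: 4 };
}

// Shared: the response type is derived from the endpoint itself.
type GetDogResponse = ReturnType<typeof getDogEndpoint>;

// Client side: code is written against the derived type.
function describeDog(dog: GetDogResponse): string {
  return dog.dogName + " (" + dog.age + ")";
}

// Stand-in for a fetch call: serialize on the server, parse on the client.
const overTheWire = JSON.parse(JSON.stringify(getDogEndpoint(1))) as GetDogResponse;
const summary = describeDog(overTheWire);
```

Renaming `dogName` in the endpoint now breaks `describeDog` at compile time. The `as GetDogResponse` cast is still unchecked at runtime, so this complements validation rather than replacing it.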


Possibly stupid question, but why does the dog array validate:

  const dogsList = z.array(dogSchema);

  dogSchema.parse([
   { name: 'Cujo', neutered: null },
   { name: 'Fido', age: 4, neutered: true },
  ]); // passes

Since 'Cujo' doesn't have an age? Assuming it's the same dogSchema as in the previous block, age is required, right?

Oh, and a minor typo: This lets you confidently This way you can confidently....


Ah I see I misread your comment.

You're right, Cujo also shouldn't have validated. In a previous version of the post `neutered` was nullable and `age` was optional, but I decided to save the discussion of nullables/optional until later so I changed it. Good catch!!


Whoops! Should be `dogsList.parse(...)`.

Also fixed the other typo :)

Thanks!!


I don’t want to derail the thread, but just a heads up: the layout is very broken on my phone (iPhone). The text is clipped on the left side which basically makes the article unreadable!


Consider the thread derailed :P

Just fixed this! I'd done exactly zero testing on mobile (built the site from scratch yesterday).


Haha! Timed that well didn’t we...


Sorry to be a pain but even a few rules (even resetting some things) would make it much easier to read on mobile!


Fixed! I'd done exactly zero testing on mobile (built the site from scratch yesterday).


You mention creating object types with optional keys is cumbersome in io-ts. How is that solved in zod, exactly? What allows you to map `foo: union([bar, undefined])` to `foo?: bar | undefined` (note the question mark on the left hand side)? There’s nothing in the declaration to give away why this wouldn’t yield `foo: bar | undefined` which is what I believe you’d get out of io-ts.

Looks useful - I would have an easier time introducing this than io-ts.


Good question! It wasn't easy to get the question mark on the left-hand side, but it is possible.

Here's the Zod equivalent:

  const C = z.object({
    foo: z.string(),
    bar: z.number().optional(),
  });

  type C = z.TypeOf<typeof C>;
  /* {
    foo: string;
    bar?: number | undefined
  } */

And here's the code that pulls this off:

  type OptionalKeys<T extends z.ZodRawShape> = {
    [k in keyof T]: undefined extends T[k]['_type'] ? k : never;
  }[keyof T];

  type RequiredKeys<T extends z.ZodRawShape> = Exclude<keyof T, OptionalKeys<T>>;

  type ObjectType<T extends z.ZodRawShape> = {
    [k in OptionalKeys<T>]?: T[k]['_type'];
  } &
    { [k in RequiredKeys<T>]: T[k]['_type'] };

  export class ZodObject<T extends z.ZodRawShape> extends z.ZodType<
    ObjectType<T>, // { [k in keyof T]: T[k]['_type'] },
    ZodObjectDef<T>
  >{ 
    // ...
  }
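
The same mapped-type trick can be tried without Zod. In this sketch `FieldDef` and `Shape` are hypothetical stand-ins for `z.ZodType` and `z.ZodRawShape`, keeping only the phantom `_type` property:

```typescript
// Stand-in for a schema node: only the phantom `_type` matters here.
interface FieldDef<T> {
  readonly _type: T;
}
type Shape = Record<string, FieldDef<unknown>>;

// Keys whose value type admits undefined...
type OptionalKeys<T extends Shape> = {
  [K in keyof T]: undefined extends T[K]["_type"] ? K : never;
}[keyof T];

// ...and all the remaining keys.
type RequiredKeys<T extends Shape> = Exclude<keyof T, OptionalKeys<T>>;

// Optional keys get the `?` modifier; required keys do not.
type ObjectType<T extends Shape> = {
  [K in OptionalKeys<T>]?: T[K]["_type"];
} & { [K in RequiredKeys<T>]: T[K]["_type"] };

type Example = ObjectType<{
  foo: FieldDef<string>;
  bar: FieldDef<number | undefined>;
}>;

// Both assignments type-check: `bar` may be omitted, `foo` may not.
const withoutBar: Example = { foo: "a" };
const withBar: Example = { foo: "a", bar: 1 };
```

Removing `foo` from either object literal produces a compile error, while `bar` can be left out entirely.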


Thanks for the reply. So, careful application of mapped types and removing the ability to type a property as `foo: bar | undefined`. I understand this is desirable a lot of times especially if you can’t affect the format of what’s being parsed, but I’m not sure this is unambiguously better.

FWIW it’s made my life easier to say the keys will always be there, but the values are possibly undefined. Less room for ambiguous interpretation.


Looks pretty similar to how I did it with io-ts! I'm pretty surprised that they don't support it by now.


With some effort, you can actually mix optionals and required with io-ts as well. If I ever get permission to open source my io-ts wrapper I'd be glad to show you how :)


I look forward to that!

It's definitely possible to wrap `io-ts` to get a better interface, especially if you spend some (or rather, a LOT) of time figuring out the type declarations...


Can't seem to spot a link to Zod's code Colin. Fancy making it a bit more obvious in the post?

Edit: Found it: https://github.com/vriad/zod


Just made it significantly more obvious...Thanks for pointing that out!


Call me old fashioned, but "mission-critical" and "rock-solid" are inherently at odds with anything that runs on a client's machine. Doubly so for something that runs in an interpreted language, especially one that doesn't ship with its own interpreter to said client.

In these cases I'd far rather see an old fashioned reliable server-side application built in a language with a lot of built in safety. The less you ask the client to do, the more reliable the application.


Engineering has to consider cost too. Haskell might be amazing at this, but that might cost a lot more. TS sounds like a complicated beast, and of course has a lot of security trade offs (npm, myriad of unverified/unaudited packages), but using TS on both the client and the server makes things simpler, can cut down costs, help with time to market, yadda-yadda.

Sure, TS has other problems too. (Soundness issues in its type system.) But still much safer than C/C++ in many aspects.

Java might be an old fashioned server-side thing. But again, it has a rather old ecosystem and it's not exactly known for its high quality and security consciousness.

Contrast that with Rust, which is young, but tries to (or had already?) establish itself as the de-facto ecosystem for critical/safety/performance things.

TS is basically that. Rust for the masses. For anyone who picked up JS and found themselves at end of a bootcamp, or anyone who wants to step beyond being a webdev. So compared to vanilla JS (and pure C, and pure python, and maybe even pure Java) TS stuff is rock solid. (Thanks to browser vendors spending a lot on browser/DOM/JS-engine security.)

Furthermore. Security _must_ consider ergonomics. Otherwise it will be bypassed. (Eg it simply won't spread, users will work around the secure way, etc.) So if you can simply use the same validation things on your client to provide early in-situ feedback about what's wrong with the input, you can build more robust user interaction flows, which help with people using your secure product.



