A new Go API for Protocol Buffers (golang.org)
228 points by zdw on March 2, 2020 | 97 comments


> The github.com/golang/protobuf module is APIv1.

> The google.golang.org/protobuf module is APIv2. We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider.

That's such a weird choice, and will be quite confusing. Now every time you see protobuf being imported, you have to be sure to pay extra attention to the domain used, and correctly remember which one is v1 or v2. If only go modules established a clear way to differentiate API versions using the import path!


The goal of issuing a major version is to create a separate package namespace so it can be imported alongside other major versions. By treating major versions as different packages, you get efficiencies in VCS history and a common namespace.

However, the goal is to create a new package name that can be imported alongside previous versions. Everything else is just method. Don't confuse the goal with the method.


People care about versions. If it’s harder or more confusing to understand what version you’re on, that’s more than “just method”. Otherwise we’d just use UUIDs for package names and call it a day.


Julia uses UUIDs masked by package names.


> If only go modules established a clear way to differentiate API versions using the import path!

It did, they just chose not to follow it :/ I wonder what rsc thinks about this decision.
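
For reference, the convention puts the major version in the import path; here's a minimal sketch of how the two modules sit side by side today, and what the convention would have produced instead (the /v2 path is hypothetical and was never published):

  package main

  import (
      // APIv1 and APIv2 as actually published: same package name, different host.
      _ "github.com/golang/protobuf/proto"
      _ "google.golang.org/protobuf/proto"
      // The semantic import versioning convention would instead have produced a
      // path like "github.com/golang/protobuf/v2/proto" (hypothetical, never published).
  )

  func main() {}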


I know, that sentence was meant to be ironic :)


Rsc?



Thanks!


Then you see the version imported being v1.X where X>20 and of course that makes it obvious that it’s v2?

Are version numbers 2.0 and higher explicitly banned?

Edit: found an explanation further down (spoiler: not a very good one, but a very Go one).


It's Google.

Valid versions are in the 0 < x < 1.0 range.


I believe this was discussed and decided quite early on: https://docs.google.com/document/d/19kfhro7-CnBdFqFk7l4_Hmwa...

>According to the Import Compatibility Rule for versioned Go, this implies that the new proto package must have a new import path. Since "github.com/protobuf/proto/v2" will incur great confusion with the proto2 syntax, we may take this time to use an import path like "google.golang.org/proto" (similar to how gRPC lives at "google.golang.org/grpc").


Random question, but how did they get the source code highlighting in that doc?


Code Blocks add-on from the G Suite Marketplace (new doc -> Add-ons -> Install -> Code Blocks)


They did get one benefit from this choice:

> the version number alone should be enough to unambiguously differentiate between APIv1 and APIv2.

...which is how version numbers normally work. They've made a big breaking change in a 1.X release, so it isn't semver, but that isn't the end of the world. It also works around having to put the major version number in the import path, so really it works like most other non-Go, non-semver projects.


I've seen many people append a version to the package name. So they could have done `protobuf.v2` (granted the original would be missing a version but still, at least this is more explicit)


Would be nice if they just made a packages.golang.org, sort of like how Python has PyPI, Rust has crates.io and so on... They could just redirect to the relevant repos...


Not to be too Rust evangelist here, but why the emphasis on reflection instead of further support for generics?

On the Rust side, macros can[1] transform .proto files into data structure definitions at compile time, and instances of the structs passed around have the benefit of strong typing, code completion in editors, lints and tests can easily validate properties of protobuf objects and so on. And all without adding more codegen steps which is nice - the macros can replace a build system having to do codegen as a separate step.

How often do you need to mutate a protocol buffer and remove (redact, as they say) fields, generating a completely invalid protocol buffer object? Will anything downstream match the ".proto" of an object that's missing a bunch of required fields? Wouldn't it be easier to just put all of the sensitive data into an optional "sensitive_data" field that's defined by another .proto, and strip that?

[1] - https://github.com/danburkert/prost#generated-code-example


A Go API for protobufs can't support generics because Go doesn't support generics, yet. I guess the protobufs team didn't want to wait until Go v2, probably a good call given Go v1.x is going to be around for years.

In Go, a code generator converts the .proto files into Go code, which gives the same benefits you see with the Rust macros. It's arguable if a code generation step is better or worse than macros.

The reflection is needed if you have received a protobuf the code doesn't have a definition for. You can now implement something like a gRPC proxy or gRPC load balancer in Go without needing to compile it with code from the specific .proto files. You also appear to be able to access annotations on the message definitions, which are not embedded in the generated Go structs. Rust may well have similar features in its API. Java certainly does. A gRPC proxy is one use case for redacting sensitive data, e.g. when you use it to create an audit log of the messages in the requests and responses.
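
A rough sketch of that use case with the new API, assuming you already have the message descriptor from somewhere (a registry, or a descriptor set shipped out of band), so no generated code for the message is needed:

  package main

  import (
      "fmt"

      "google.golang.org/protobuf/proto"
      "google.golang.org/protobuf/reflect/protoreflect"
      "google.golang.org/protobuf/types/dynamicpb"
  )

  // dumpFields decodes raw bytes against a descriptor obtained at runtime and
  // walks the populated fields by name, without any generated struct for the message.
  func dumpFields(md protoreflect.MessageDescriptor, raw []byte) error {
      msg := dynamicpb.NewMessage(md)
      if err := proto.Unmarshal(raw, msg); err != nil {
          return err
      }
      msg.Range(func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
          fmt.Printf("%s = %v\n", fd.Name(), v.Interface())
          return true
      })
      return nil
  }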


If you don't have the proto definition, there's nothing you can do except pass the object through unmodified. And you should not need a reflection API for that (unless the v1 API was totally messed up in other ways).

Without the definition (and knowing the type!) there won't be anything around to tell the reflection API what the names, types and annotations are. All you would have is field numbers mapped to opaque blobs of data.


You can pass around a type descriptor (itself a proto message) along with the opaque message.
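
Something like this, for example; a hedged sketch using protodesc and dynamicpb, where the FileDescriptorProto travels next to the payload and "Event" is a made-up message name:

  package main

  import (
      "fmt"

      "google.golang.org/protobuf/proto"
      "google.golang.org/protobuf/reflect/protodesc"
      "google.golang.org/protobuf/reflect/protoregistry"
      "google.golang.org/protobuf/types/descriptorpb"
      "google.golang.org/protobuf/types/dynamicpb"
  )

  // decodeWithShippedDescriptor rebuilds a live descriptor from a
  // FileDescriptorProto that travelled alongside the payload, then decodes
  // the payload against it.
  func decodeWithShippedDescriptor(fdp *descriptorpb.FileDescriptorProto, payload []byte) (proto.Message, error) {
      fd, err := protodesc.NewFile(fdp, protoregistry.GlobalFiles)
      if err != nil {
          return nil, err
      }
      md := fd.Messages().ByName("Event")
      if md == nil {
          return nil, fmt.Errorf("message Event not found in shipped descriptor")
      }
      msg := dynamicpb.NewMessage(md)
      if err := proto.Unmarshal(payload, msg); err != nil {
          return nil, err
      }
      return msg, nil
  }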


> It's arguable if a code generation step is better or worse than macros.

I'd rather just have generics.

Code generation for protos is nice for several reasons, but using it as an alternative to generics is not nice at all. I work on a number of projects with generics in internal classes that are not exposed as an API, and using code generation just to get those to work seems like a very big hammer for a small nail.


> It's arguable if a code generation step is better or worse than macros.

I don't need elaborate, brittle shell scripts to correctly update my compiler macros in order of dependency. I only need the compiler.


That's not very scalable. Putting more logic in a macro means the compiler needs to reprocess it each time the file is touched, even when the part that's changed is not the protos. A separate code gen step means the proto compiler doesn't need to be invoked as often, and the generated files don't need to be recompiled either.

> Will anything downstream match the ".proto" of an object that's missing a bunch of required fields?

You may have missed it, but in proto3, required fields have been removed. All fields are optional now. https://stackoverflow.com/a/31814967/1102119 TL;DR: it's a code evolution problem.


That doesn't sound right, I don't think there's any reason a proc macro couldn't cache the .proto file's generated code and use the cache between runs.


Compile time generic programming is not a replacement for runtime reflection. A great example of this is if you want to use dynamically generated protocol buffers.

A great use case for this is if you want to store each row in a database as a proto message. Instantiate a descriptor based on the table schema and go.


> Compile time generic programming is not a replacement for runtime reflection.

Why not?

> A great example of this is if you want to use dynamically generated protocol buffers.

When would you want to do that? If your data isn't statically typed, isn't it more appropriate to use a type designed for dynamic shapes (e.g. a list or hashmap)?

> A great usecase for this is if you want to store each row in a database as a proto message.

Wouldn't a reasonable approach to this be to turn each row into a list (of variants) and serialize that?


> Compile time generic programming is not a replacement for runtime reflection.

Yes, it is, especially since the former can be used to implement the latter.


Any talk of required fields is a non-starter. Just forget about it. Don't use them.

Whether a field is "sensitive" may change after the field is defined and used all over the place, so you can't just move it without changing all the code. But you can easily add an option (like [non_sensitive=true]) to any field so your redaction code picks it up.
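
One way to read that with the new reflection API; a sketch only, where optionspb.E_NonSensitive is a hypothetical generated extension for that custom option:

  package main

  import (
      "google.golang.org/protobuf/proto"
      "google.golang.org/protobuf/types/descriptorpb"

      optionspb "example.com/optionspb" // hypothetical package generating the custom option
  )

  // redact clears every field that is not marked [non_sensitive = true].
  func redact(m proto.Message) {
      mr := m.ProtoReflect()
      fields := mr.Descriptor().Fields()
      for i := 0; i < fields.Len(); i++ {
          fd := fields.Get(i)
          opts, _ := fd.Options().(*descriptorpb.FieldOptions)
          keep, _ := proto.GetExtension(opts, optionspb.E_NonSensitive).(bool)
          if !keep {
              mr.Clear(fd)
          }
      }
  }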


But the protoc compiler already generates data structure definitions in the target client language. So what's the point of a parallel macro system? Either it's a subset of what protoc can do, or equal to it. The former is questionable; the latter redundant.


> The github.com/golang/protobuf module is APIv1.

> The google.golang.org/protobuf module is APIv2.

> We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider.

> (Why start at version v1.20.0? To provide clarity. We do not anticipate APIv1 to ever reach v1.20.0, so the version number alone should be enough to unambiguously differentiate between APIv1 and APIv2.)

Stuff like this is what makes Go confusing to newcomers.

For some time I couldn’t understand why people were complaining so much about the (now deprecated) GOPATH environment variable, and later about the confusion of how to use all the different and incompatible dependency managers. Then, suddenly, last year it hit me… Go is confusing to people who haven’t followed the language from the beginning.

I started programming in Go in 2013 so for me it was very easy to adapt to every change in the language and the community.

I can only imagine all the ambiguous stuff people have to understand when learning the language in 2019-2020.


I loved Go's simplicity when it came out.

However, it has become clear to me that it was a little too simple to handle all the new requirements and additions that came after v1.

That's how you get weird comments-as-code (// +build linux,386 darwin,!cgo), strings-as-attributes (tags), tacked-on module versioning / vendoring, non-typed generics (interface{} everywhere), somewhat-typed generics (go generate), etc.

Go was not quite ready to handle all these directions people wanted the language to stretch to and as a result it's now already full of annoying warts.

I'll still pick Go if I need a simple service with amazing performance. Funnily enough though, I have no need for any of the (non-performance) additions that came after 1.6.


I think the bit that got me around GOPATH was that the last time I'd written Go, that was the only way. Then I fired up Go again, v1.12, and decided to take advantage of a Go module for a new project, which I naturally put beside my existing Go code. And then I got to learn about the settings for GO111MODULE.

Having the build tools a core part of your language itself is both a strength and a weakness for Go, I think.


> the (now deprecated) GOPATH environment variable

I've been writing Go since 2015ish and this is news to me...


Interesting that even Google decided to do complex versioning gymnastics and introduce a new namespace in order to avoid how `go mod` adds the version as a path component.


Nah, we just really wanted to stop using an import path tied to a specific hosting provider. Once we decided to change to an entirely different path, we waffled a lot over whether to tag it v1 or v2. There were arguments for and against either, so we eventually picked one and went with it.


Tying package names to hosting provider URLs or other kinds of locators was probably a bad idea.

What if those things move to different domains, or different directory structures? Wouldn't that break every Go program that depends on them?

I've been using Java, Python, and Ruby for a while. They all have various pain points for managing dependencies but they do get one thing right: packages have names and versions, names are opaque strings that are not coupled to a particular hosting provider, and they don't care what directory on my laptop I use to store my dependencies and my code.

The Go dependency model seems like a real step backwards in usability and maintenance. It's like they are forcing one software org's standards on the rest of us. We don't all work at Google! Just let us depend on things that have names and version numbers!


Sure, but why version the new package as v1.20 instead of v2.x?


Because they would have had to add a `/v2` suffix to the import path, and it seems like they didn't want to add that. The question is, why is this not released as v1.0.0 then? Still trying to understand the reasoning for this.


Reasoning went something like this:

We could tag the new API v2:

+ Makes the "v1" and "v2" distinction very clear in the import path.

- Confusing: google.golang.org/protobuf@v1 doesn't exist, but v2 does.

- In ten years, hopefully nobody cares about the old github.com/golang/protobuf and the confusion is gone.

We could tag the new API v1:

- Less visually distinct in the import path.

+ Seems to make sense for the first version of google.golang.org/protobuf to be a v1.

+ If we decide it was a terrible idea, it's easier to go from v1 to v2 than to roll back from v2 to v1.

We waffled back and forth for quite a while on this, and eventually decided that the first version of google.golang.org/protobuf would be a v1. Then as we got closer to release (but with a certain amount of usage of v0 in the wild), we decided not to second-guess that decision but to start with a version that wouldn't overlap with any version of github.com/golang/protobuf to avoid confusion when someone reports a bug in "v1.0.1".

Maybe it was the wrong choice. If it was the worst choice we've made in the new API, I'll be happy!


> The google.golang.org/protobuf module is APIv2. We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider.

This seems like a good example why tying the name of the package in the code with the place where it can be downloaded was a very bad idea. Simply changing hosting providers is now a backwards compatibility break for your clients.

If Go imports didn't work this way, you could have simply offered the old v1 from both github.com and google.golang.org, and used a clear v2 for the v2. Sure, you would have similar problems if you wanted to change the logical package structure for some reason (say, when an open-source package moves to a different organization), but that is a much rarer case than switching code repos.

However, given that that ship has probably long sailed, you probably picked the right choice.

I would also note that v2 in the import path feels a bit strange, since it makes users of the newest code need to know about the history of the package, but there are also clear advantages which probably outweigh this (I believe this is absolutely not the case for the decision to include the hosting provider in the import path).


Indeed, tying the name of a package to a hosting provider is a bad idea! That's why we changed to an import path which isn't tied to one. google.golang.org/protobuf is not tied to any particular provider, and we can redirect it wherever we want.

See "go help importpath" for details on how this works.


> - Confusing: google.golang.org/protobuf@v1 doesn't exist, but v2 does.

> google.golang.org/protobuf is not tied to any particular provider, and we can redirect it wherever we want.

I agree, this is confusing. Why doesn't google.golang.org/protobuf@v1 return what's at github.com/golang/protobuf, then, if only to provide a tidy answer for those who wonder what v1 looked like?


Because the Go package name is github.com/golang/protobuf. Serving it from a different address is already a breaking change.


When I read the article, my brain passed right over the different domain names; the two are even the exact same number of characters, set in a monospace font, and my brain passes right over the domain of golang imports to the end of the path. That's where the interesting bit is (usually!)

I find this a super confusing and unintuitive choice, but I suppose it's too late for changing it now.

Sincerely, an otherwise mostly happy protobuf user


I think you’re overestimating how much people pay attention to the host part of import paths. Since there have been quite a few projects that implemented a sort of redirect (i.e. ‘vanity’ imports [0]), most people’s eyes just glaze over the small change in the URL.

I would strongly suggest using /v2 at the end. /v1 not existing on *.golang.org is not a problem in practice, but if it ends up being one you could just set that up as a vanity import pointing to github.com for the old version.

[0] https://sagikazarmark.hu/blog/vanity-import-paths-in-go/


What are you going to do for APIv3?

Version it as google.golang.org/protobuf@v2? google.golang.org/protobuf@v3 (but @v2 doesn't exist)? Release a @v2 which is identical to @v1?


We used up our breaking change budget for the next decade with this release, so we'll think about it in 2030.


Admit it, once Windows 22 is out the door, protobuf 1.22 is shippin' as well! Yieehaaw!

Rust .22 with breaking changes, will be breaking 4 years after that as well!


Can you explain why it's not possible to create `google.golang.org/protobuf@v1` retroactively?

Also:

> In ten years, hopefully nobody cares about the old github.com/golang/protobuf and the confusion is gone.

is the corollary that this change will cause 10 years of confusion?


Thanks for all the work. I hope my rant doesn't come across badly, but I truly believe the versioning along with the import path is confusing. I hope the team goes over the versioning again and comes up with something that makes more sense.


When you were waffling back and forth on the decision, did anyone note that “re-using v1 is a weird choice and is going to be the main thing people talk about”?


> - Confusing: google.golang.org/protobuf@v1 doesn't exist, but v2 does.

This would not be confusing at all. The UUID package I use just skipped from major version 3 to major version 7 to avoid confusion with UUIDv4, UUIDv5 and a possible UUIDv6. It was well-documented, and even if it wasn't, the upgrade was absolutely painless, as it would have been either way.


You could give google.golang.org/protobuf@v1 an //importpath to the github package, so go would warn people about it.


Oh I didn't realize that was mandatory. I thought omitting v2 from the path was sort of OK until you explained this. Now it just seems wrong.


> we just really wanted to stop using an import path tied to a specific hosting provider

Are there any plans to fix this for the whole language, for others as well? i.e. use namespaces, module names in the source but move the actual github.com, google.com or geocities.com URLs outside.


The support has been there for a long time. `go help importpath`


That makes the most sense, and throws a great seasoning of context on that decision. Thanks for providing that here.


So both old and new are available through the new domain?


"we just really wanted to stop using an import path tied to a specific hosting provider"

e.g. github is now owned by Microsoft and it's a little awkward.

If Google had won out on their attempt to buy GitHub, they wouldn't be talking about a "specific hosting provider".

Though it's a little weird that people take this at face value, when the majority of the userbase has been questioning it since the first release.


Congrats on the big release!

One thing which saddens me a lot, though, is the returned types you have to type-assert because they had to avoid an import cycle. I don’t know if it could be avoided (probably not) but it’s still unergonomic.

> (Why the type assertion? Since the generated descriptorpb package depends on protoreflect, the protoreflect package can't return the concrete options type without causing an import cycle.)
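
Concretely, the assertion in question looks something like this (a small sketch, using a well-known type purely for illustration):

  package main

  import (
      "fmt"

      "google.golang.org/protobuf/types/descriptorpb"
      "google.golang.org/protobuf/types/known/durationpb"
  )

  func main() {
      md := (&durationpb.Duration{}).ProtoReflect().Descriptor()
      // The assertion the release notes describe: Options() comes back as an
      // interface, and the caller asserts the concrete descriptorpb type.
      opts := md.Options().(*descriptorpb.MessageOptions)
      fmt.Println(opts.GetDeprecated())
  }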


Does someone know if this new version uses fast generated code for (de)serialization? The previous version was pretty slow (because it was based on reflection, I think), so many people used https://github.com/gogo/protobuf for good performance.


Hopefully this is good news. The old Protobuf package had ossified, with lots of ergonomic issues that the maintainers seemed uninterested in addressing.

One perennial annoyance is how awkwardly some Protobuf stuff ends up being represented in Go. For example, "oneof" types:

  message Event {
    oneof payload {
      Create create = 1;
      Delete delete = 2;
    }
  }

  message Create {
    string id = 1;
  }

  message Delete {
    string version = 1;
  }
You get these messy types (I've elided most of the yucky marshaling-related stuff):

  type Event struct {
    Payload isEvent_Payload `protobuf_oneof:"item"`
  }

  type isEvent_Payload interface {
    isEvent_Payload()
    MarshalTo([]byte) (int, error)
    Size() int
  }

  type Event_Create struct {
    Create *Create `protobuf:"bytes,1,opt,name=create,proto3,oneof"`
  }

  type Event_Delete struct {
    Delete *Delete `protobuf:"bytes,1,opt,name=delete,proto3,oneof"`
  }
There are multiple problems here. One: Why the extra struct? For example, to create a single event, you have to do:

  Event{
    Payload: &Event_Create{
      Create: &Create{ID: "123"},
    },
  },
...instead of just:

  Event{
    Payload: &Create{ID: "123"},
  },
Secondly, isEvent_Payload is private! So given a Create or a Delete, you can't store it in a single variable:

  func nextEvent() isEvent_Payload { // <-- not possible
    if something {
      return &Create{...}
    } else {
      return &Delete{...}
    }
  }
The reason for this appears to be so that the generated serialization can be very dumb and not rely on type switches. But I don't buy that this is how it has to be.
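
Continuing the example above, consuming the oneof today still means a type switch over those generated wrapper structs (sketch against the types shown earlier):

  func payloadKind(event *Event) string {
      switch event.Payload.(type) {
      case *Event_Create:
          return "create"
      case *Event_Delete:
          return "delete"
      default:
          return "unset or unrecognized"
      }
  }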

There are other issues. Overall, the whole package seems designed from the bottom up for machines (mumbles possibly also by machines), not humans. The upshot is that using Protobuf types as "first-class" types — meaning the types you actually use internally in the meat of your app, as opposed to in the controller glue that lives in your API and mediates between the API and the internals — feels super messy.

As an aside, anyone know what issues this paragraph refers to?

  The google.golang.org/protobuf/encoding/protojson package
  converts protocol buffer messages to and from JSON using the
  canonical JSON mapping, and fixes a number of issues with the
  old jsonpb package that were difficult to change without
  causing problems for existing users.
Did they finally fix the zero value problem with jsonpb.go [1]?

[1] https://github.com/gogo/protobuf/issues/218


The original golang codegen omitted oneof support completely on the grounds that the concept didn't apply to golang.

Eventually they realized that oneof isn't an optional feature of proto, but the solution always felt rushed and bolted on, because it was.

I hope the second try at proto api addresses these warts. I'm generally a golang fan - but not in this area.

Citations:

https://github.com/golang/protobuf/issues/29


To quote one of the maintainers of the golang/protobuf implementation:

> The jsonpb package is intended to be the faithful implementation of the proto<->JSON specification, for which the C++ implementation is considered the canonical "reference" implementation.

And that C++ implementation does some kinda dumb things. For example, it marshals the protobuf type of int64 into a string when marshaling from a protobuf struct into JSON.

https://github.com/golang/protobuf/pull/916

I believe that the package they mention there is meant to be a more "canonical" protobuf <-> json marshaller than the existing jsonpb package.


IMO marshaling int64 to string is required for interoperability; RFC7159 recommends not assuming that more than IEEE754's 53 bits of integer precision are available:

https://tools.ietf.org/html/rfc7159#section-6
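
A tiny sketch of what that looks like with the new protojson package (examplepb is a made-up generated package with `message Thing { int64 id = 1; }`):

  package main

  import (
      "fmt"

      "google.golang.org/protobuf/encoding/protojson"

      examplepb "example.com/examplepb" // hypothetical generated package
  )

  func main() {
      // 2^53+1 cannot be represented exactly as a double, which is why the
      // canonical JSON mapping emits 64-bit integers as strings.
      b, _ := protojson.Marshal(&examplepb.Thing{Id: 9007199254740993})
      fmt.Println(string(b)) // {"id":"9007199254740993"}
  }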


It's horrible, I know. Try mixing that with gRPC too and it gets even worse.


You need the intermediate struct, because there may be multiple entries in the oneof that share the same type. Not having the intermediate struct would make it impossible to distinguish them.


That seems like a rare edge case. Simple solution: Only use an intermediate struct if actually needed.


That has the downside of causing build breakage if seemingly unrelated fields are added/removed from Protobuf messages.


Do you know whether they fixed oneof with APIv2?


No, I don't think so! My point was that I hope this change will renew development a bit and perhaps allow more breaking changes.


Hmm! Hehe I had hope. I can't imagine them doing breaking changes after a fresh api release though sadly.


Hmm. I think it would be quite simple to build an authentication proxy/firewall based on this without having to know the exact underlying message.

Maybe tagging fields with auth_userid and/or auth_token.


Hackernews, where the highlight of mentioning common libraries is the fact that they're done so in Go and Rust.


I find that to be a feature. If I open the article and/or comments and find out this was about a library for Ruby or Java then I wasted my time because I don't use those. Likewise, a non-Go or non-Rust dev can ignore Go and/or Rust tagged library articles.


If only we could filter out stuff like this using some tags...


Umm... when will pb allow me to decode a field in a buffer without having to decode the entire object in the buffer? I'm talking about a field in a pb message, NOT a byte stream I hand-encoded without reference to a message, e.g. more like FlatBuffers.
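
Not a full answer, but the new module does expose the raw wire format through the protowire package, so you can at least skip straight to one field without decoding the rest; a rough sketch (the caller still has to know the field number and that it's a length-delimited field):

  package main

  import (
      "google.golang.org/protobuf/encoding/protowire"
  )

  // findBytesField scans the wire format for the first occurrence of a
  // length-delimited field with the given number, skipping everything else
  // without decoding it. No message definition required.
  func findBytesField(b []byte, want protowire.Number) ([]byte, bool) {
      for len(b) > 0 {
          num, typ, n := protowire.ConsumeTag(b)
          if n < 0 {
              return nil, false
          }
          b = b[n:]
          if num == want && typ == protowire.BytesType {
              v, n := protowire.ConsumeBytes(b)
              if n < 0 {
                  return nil, false
              }
              return v, true
          }
          n = protowire.ConsumeFieldValue(num, typ, b)
          if n < 0 {
              return nil, false
          }
          b = b[n:]
      }
      return nil, false
  }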


This article doesn't do a great job explaining exactly what new features are a part of this release. Seems to me like it's finally enabling some of the useful type iteration stuff that is possible in other versions of protobufs.


It is odd that the Go team is all in on semantic import paths, but basically everyone else is like “I will blow up the moon if it means I don’t have to use v2 in my import path.”


v2 is used so that the import path is different. The idea is that by breaking the API you are basically making a new module instead of improving the existing one (metaphorically, of course), and it doesn't make sense for what is effectively a different module to have the same import path.

If you want to change the import path and also release a new version, then simply changing the import path is enough. No need to put v2.

See it as them releasing a totally new package that just happens to have the same name, if that helps.


I wonder if we will start seeing people add v1.example.com, v2.example.com etc... to their git servers to avoid having to change the paths in all of their private projects.


Seems like you need something like a DNS server for repos.


I thought the general consensus on protocol buffers was "Meh"?


Why would it be? At this point, it's a simple, effective and fast RPC framework with a lot of cross-language support. If you need something like it, there's not many other choices available with this big of an ecosystem.

There's even GRPC-Web being worked on to bring it to the browser. That being said, it can be overused just like anything else.


It is a bit meh. The IDL and generated code are very ugly, and it's neither the fastest nor the most compact (in serialized bytes) format. That said, it's also very widely adopted, and very mature.


Something like it needs to exist. JSON is stupidly inefficient for large sets of numbers. Except a framework just like grpc has been around for 25 years... but shhhh, don't tell hipsters about IIOP, they are like new parents and need to discover things for themselves.


do you have any links to benchmarks showing json serialization inefficiency for large sets of numbers?

I found this which shows json isn't too much worse than others, but it doesn't go into extreme cases

https://github.com/eishay/jvm-serializers/wiki


Anything that represents numeric values as text will be an order of magnitude slower to deserialize than a binary format. I don’t have a published benchmark, but my testing is easy to replicate: I’m storing an array of around 50 million uint64 values in a contiguous block of data to maximize deserialization speed. To get a sense of the difference I also tested using a JSON array as a more user-friendly storage format, and the penalty of deserializing the JSON and populating the array was around 20x in runtime cost in my use case. The platforms I tested on were C# and Java.


I'm wondering if the whole idea of "general consensus" really works for technical problems. There are so many factors involved in the choices engineers make that you can't say the general consensus is "meh", as it really depends on your use case. No tool will ever fit all problems, so every tool will always be "meh" for someone.


You thought wrong?


Seems so.


I think for large organizations they probably make sense. They're potentially a lot less data per request than JSON.

For the average app? Overkill, sure.


Overkill? It doesn’t take a lot of work to add proto buffers to an existing project (if you’re already using a reasonable build system, anyway).


If you have weeks to spend optimizing build systems, linters/editors, CI/CD, etc...


I also recently learned about grpcurl that makes using protobufs a lot easier. Still not as easy as json but I think protobufs are the future.


CBOR. Binary JSON with some extra features.

I like PB, but it was too hard in my application to sync all of the sides with the same proto as it was developing rapidly.

For me schemaless was better, and now we have extremely easy JSON conversion... but I did have to write the parser myself.

I personally can’t agree on PB being “the future”.



