> The google.golang.org/protobuf module is APIv2. We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider.
That's such a weird choice, and it will be quite confusing. Now every time you see protobuf being imported, you have to pay extra attention to the domain used and correctly remember which one is v1 and which is v2. If only Go modules established a clear way to differentiate API versions using the import path!
The goal of issuing a major version is to create a separate package namespace so it can be imported alongside other major versions. By treating major versions as different packages, you also get efficiencies in VCS history and a common namespace.
However, the goal is to create a new package name that can be concurrently imported with previous versions. Everything else is just method. Don't confuse the goal with the method.
People care about versions. If it’s harder or more confusing to understand what version you’re on, that’s more than “just method”. Otherwise we’d just use UUIDs for package names and call it a day.
> According to the Import Compatibility Rule for versioned Go, this implies that the new proto package must have a new import path. Since "github.com/protobuf/proto/v2" will incur great confusion with the proto2 syntax, we may take this time to use an import path like "google.golang.org/proto" (similar to how gRPC lives at "google.golang.org/grpc").
> the version number alone should be enough to unambiguously differentiate between APIv1 and APIv2.
...which is how version numbers normally work. They've made a big breaking change in a 1.X release, so it isn't semver, but that isn't the end of the world. It works around having to put the major version number in the import path too, so really it works like most other non-Go, non-semver, projects.
I've seen many people append a version to the package name, so they could have done `protobuf.v2` (granted, the original would be missing a version, but still, at least this is more explicit).
Would be nice if they would just make packages.golang.org, sorta like how Python has PyPI, Rust has crates.io and so on... It could just redirect to the relevant repos...
Not to be too Rust evangelist here, but why the emphasis on reflection instead of further support for generics?
On the Rust side, macros can[1] transform .proto files into data structure definitions at compile time, and instances of the structs passed around have the benefit of strong typing, code completion in editors, lints and tests can easily validate properties of protobuf objects and so on. And all without adding more codegen steps which is nice - the macros can replace a build system having to do codegen as a separate step.
How often do you need to mutate a protocol buffer and remove (redact, as they say) fields, generating a completely invalid protocol buffer object? Will anything downstream match the ".proto" of an object that's missing a bunch of required fields? Wouldn't it be easier to just put all of the sensitive data into an optional "sensitive_data" field that's defined by another .proto, and strip that?
A Go API for protobufs can't support generics because Go doesn't support generics, yet. I guess the protobufs team didn't want to wait until Go v2, probably a good call given Go v1.x is going to be around for years.
In Go, a code generator converts the .proto files into Go code, which gives the same benefits you see with the Rust macros. It's arguable whether a code generation step is better or worse than macros.
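For instance, the generated code gives you ordinary typed structs; a minimal sketch (the pb package and its Person message are hypothetical stand-ins for whatever protoc-gen-go emits from your .proto):

package main

import (
    "fmt"
    "log"

    "google.golang.org/protobuf/proto"

    pb "example.com/myapp/personpb" // hypothetical generated package
)

func main() {
    p := &pb.Person{Name: "Ada"}  // plain struct fields, statically typed
    data, err := proto.Marshal(p) // wire-format encoding via the proto package
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("encoded %d bytes\n", len(data))
}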
The reflection is needed if you have received a protobuf the code doesn't have a definition for. You can now implement something like a gRPC proxy or gRPC load balancer in Go without needing to compile it with code from the specific .proto files. You also appear to be able to access annotations on the message definitions, which are not embedded in the generated Go structs. Rust may well have similar features in its API. Java certainly does. A gRPC proxy is a use case for redacting sensitive data, for example when you use it to create an audit log of the messages in the requests and responses.
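As a rough sketch of what that kind of redaction looks like with the new reflection API (the isSensitive predicate is a placeholder; in practice it might inspect a custom field option):

package redact

import (
    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/reflect/protoreflect"
)

// Redact clears every populated field that the caller-supplied predicate
// flags as sensitive, without compile-time knowledge of the message type.
func Redact(m proto.Message, isSensitive func(protoreflect.FieldDescriptor) bool) {
    refl := m.ProtoReflect()
    refl.Range(func(fd protoreflect.FieldDescriptor, _ protoreflect.Value) bool {
        if isSensitive(fd) {
            refl.Clear(fd)
        }
        return true // keep iterating over the populated fields
    })
}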
If you don't have the proto definition, there's nothing you can do except pass the object through unmodified. And you should not need a reflection API for that (unless the v1 API was totally messed up in other ways).
Without the definition (and knowing the type!) there won't be anything around to tell the reflection API what the names, types and annotations are. All you would have is field numbers mapped to opaque blobs of data.
> It's arguable whether a code generation step is better or worse than macros.
I'd rather just have generics.
Code generation for protos is nice for several reasons, but using it as an alternative to generics is not nice at all. I work on a number of projects with generics in internal classes that are not exposed as an API, and using code generation just to get those to work seems like a very big hammer for a small nail.
That's not very scalable. Putting more logic in a macro means the compiler needs to reprocess it each time the file is touched, even when the part that's changed is not the protos. A separate code gen step means the proto compiler doesn't need to be invoked as often, and the generated files don't need to be recompiled either.
> Will anything downstream match the ".proto" of an object that's missing a bunch of required fields?
You may have missed it, but in proto3, required fields have been removed. All fields are optional now. https://stackoverflow.com/a/31814967/1102119 TL;DR: it's a code evolution problem.
That doesn't sound right, I don't think there's any reason a proc macro couldn't cache the .proto file's generated code and use the cache between runs.
Compile time generic programming is not a replacement for runtime reflection. A great example of this is if you want to use dynamically generated protocol buffers.
A great use case for this is if you want to store each row in a database as a proto message. Instantiate a descriptor based on the table schema and go.
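A minimal sketch of that with the new API, assuming a single int64 "id" column (the descriptor is assembled at runtime from a descriptorpb definition and then instantiated with dynamicpb):

package main

import (
    "fmt"
    "log"

    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/reflect/protodesc"
    "google.golang.org/protobuf/reflect/protoreflect"
    "google.golang.org/protobuf/types/descriptorpb"
    "google.golang.org/protobuf/types/dynamicpb"
)

func main() {
    // Describe a one-column "row" message at runtime (schema is illustrative).
    file := &descriptorpb.FileDescriptorProto{
        Name:    proto.String("row.proto"),
        Syntax:  proto.String("proto3"),
        Package: proto.String("example"),
        MessageType: []*descriptorpb.DescriptorProto{{
            Name: proto.String("Row"),
            Field: []*descriptorpb.FieldDescriptorProto{{
                Name:   proto.String("id"),
                Number: proto.Int32(1),
                Type:   descriptorpb.FieldDescriptorProto_TYPE_INT64.Enum(),
                Label:  descriptorpb.FieldDescriptorProto_LABEL_OPTIONAL.Enum(),
            }},
        }},
    }
    fd, err := protodesc.NewFile(file, nil) // nil resolver: no imports to resolve here
    if err != nil {
        log.Fatal(err)
    }
    md := fd.Messages().Get(0)

    // Build and populate an instance without any generated code.
    msg := dynamicpb.NewMessage(md)
    msg.Set(md.Fields().ByName("id"), protoreflect.ValueOfInt64(42))

    data, err := proto.Marshal(msg)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("encoded %d bytes\n", len(data))
}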
> Compile time generic programming is not a replacement for runtime reflection.
Why not?
> A great example of this is if you want to use dynamically generated protocol buffers.
When would you want to do that? If your data isn't statically typed, isn't it more appropriate to use a type designed for dynamic shapes (e.g. a list or hashmap)?
> A great use case for this is if you want to store each row in a database as a proto message.
Wouldn't a reasonable approach to this be to turn each row into a list (of variants) and serialize that?
Any talk of required fields is a non-starter. Just forget about it. Don't use them.
Whether a field is "sensitive" may change after the field is defined and used all over the place, so you can't just move it without changing all the code. But you can easily add an option (like [non_sensitive=true]) to any field so your redaction code picks it up.
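A sketch of what the check might look like against the new API; the policypb package and its E_NonSensitive extension are hypothetical stand-ins for wherever the custom option is actually defined:

package policy

import (
    "google.golang.org/protobuf/proto"
    "google.golang.org/protobuf/reflect/protoreflect"
    "google.golang.org/protobuf/types/descriptorpb"

    policypb "example.com/myorg/policypb" // hypothetical package defining the option
)

// fieldIsNonSensitive reports whether a field was annotated with the
// hypothetical [non_sensitive = true] custom option.
func fieldIsNonSensitive(fd protoreflect.FieldDescriptor) bool {
    opts, ok := fd.Options().(*descriptorpb.FieldOptions)
    if !ok || opts == nil {
        return false
    }
    return proto.GetExtension(opts, policypb.E_NonSensitive).(bool)
}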
But the protoc compiler already generates data structure definitions in the target (client) language. So what's the point of a parallel macro system? Either it's a subset of what protoc can do, or it's equivalent; the former is questionable, the latter redundant.
> We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider.
> (Why start at version v1.20.0? To provide clarity. We do not anticipate APIv1 to ever reach v1.20.0, so the version number alone should be enough to unambiguously differentiate between APIv1 and APIv2.)
Stuff like this is what makes Go confusing to newcomers.
For some time I couldn’t understand why people were complaining so much about —the now deprecated— GOPATH environment variable, and later about the confusion of how to use all the different and incompatible dependency managers. Then, suddenly last year, it hit me… Go is confusing to people who haven’t followed the language from the beginning.
I started programming in Go in 2013 so for me it was very easy to adapt to every change in the language and the community.
I can only imagine all the ambiguous stuff people have to understand when learning the language in 2019-2020.
However, it has become clear to me that Go was a little too simple to handle all the new requirements and additions that came after v1.
That's how you get weird comments-as-code (// +build linux,386 darwin,!cgo), strings-as-attributes (tags), tacked-on module versioning / vendoring, non-typed generics (interface{} everywhere), somewhat-typed generics (go generate), etc.
Go was not quite ready to handle all these directions people wanted the language to stretch to and as a result it's now already full of annoying warts.
I'll still pick Go if I need a simple service with amazing performance. Funnily enough though, I have no need for any of the (non-performance) additions that came after 1.6.
I think the bit that got me around GOPATH was last time I'd written Go, that was the only way. Then I fired up Go again, v1.12 and decided to take advantage of a Go module for a new project, which I naturally put beside my existing Go code. And then I got to learn about the settings for GO111MODULE.
Having the build tools be a core part of the language itself is both a strength and a weakness for Go, I think.
Interesting that even Google decided to do complex versioning gymnastics and introduce a new namespace in order to avoid how `go mod` adds the version as a path component.
Nah, we just really wanted to stop using an import path tied to a specific hosting provider. Once we decided to change to an entirely different path, we waffled a lot over whether to tag it v1 or v2. There were arguments for and against either, so we eventually picked one and went with it.
Tying import paths to hosting-provider URLs or other kinds of locators was probably a bad idea in the first place.
What if those things move to different domains, or different directory structures? Wouldn't that break every Go program that depends on them?
I've been using Java, Python, and Ruby for a while. They all have various pain points for managing dependencies but they do get one thing right: packages have names and versions, names are opaque strings that are not coupled to a particular hosting provider, and they don't care what directory on my laptop I use to store my dependencies and my code.
The Go dependency model seems like a real step backwards in usability and maintenance. It's like they are forcing one software org's standards on the rest of us. We don't all work at Google! Just let us depend on things that have names and version numbers!
Because they would have had to add a `/v2` suffix to the import path, and it seems like they didn't want to add that. The question is, why is this not released as v1.0.0 then? Still trying to understand the reasoning for this.
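For reference, this is roughly how the /v2 route works under semantic import versioning; the example.com/mylib module and its Version function below are hypothetical, just to show that each major gets a distinct import path and both can coexist:

// In the consumer's go.mod (hypothetical module paths):
//
//     require (
//         example.com/mylib v1.5.0
//         example.com/mylib/v2 v2.0.1
//     )
package main

import (
    "fmt"

    mylibv1 "example.com/mylib"    // major version 1: no suffix in the path
    mylibv2 "example.com/mylib/v2" // major version 2: /v2 suffix, a distinct package
)

func main() {
    // Both majors are importable side by side (Version is a hypothetical API).
    fmt.Println(mylibv1.Version(), mylibv2.Version())
}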
We could tag the new API v2:
+ Makes the "v1" and "v2" distinction very clear in the import path.
- Confusing: google.golang.org/protobuf@v1 doesn't exist, but v2 does.
- In ten years, hopefully nobody will care about the old github.com/golang/protobuf, and the /v2 suffix would just be a vestigial source of confusion.
We could tag the new API v1:
- Less visually distinct in the import path.
+ Seems to make sense for the first version of google.golang.org/protobuf to be a v1.
+ If we decide it was a terrible idea, it's easier to go from v1 to v2 than to roll back from v2 to v1.
We waffled back and forth for quite a while on this, and eventually decided that the first version of google.golang.org/protobuf would be a v1. Then as we got closer to release (but with a certain amount of usage of v0 in the wild), we decided not to second-guess that decision but to start with a version that wouldn't overlap with any version of github.com/golang/protobuf to avoid confusion when someone reports a bug in "v1.0.1".
Maybe it was the wrong choice. If it was the worst choice we've made in the new API, I'll be happy!
> The google.golang.org/protobuf module is APIv2. We have taken advantage of the need to change the import path to switch to one that is not tied to a specific hosting provider.
This seems like a good example of why tying the name of the package in the code to the place where it can be downloaded was a very bad idea. Simply changing hosting providers is now a backwards-compatibility break for your clients.
If Go imports didn't work this way, you could have simply offered the old v1 from both github.com and google.golang.org, and used a clear v2 for the v2. Sure, you would have similar problems if you wanted to change the logical package structure for some reason (say, when an open-source package moves to a different organization), but that is a much rarer case than switching code repos.
However, given that that ship has probably long sailed, you probably picked the right choice.
I would also note that v2 in the import path feels a bit strange, since it makes users of the newest code need to know about the history of the package, but there are also clear advantages which probably out-weigh this (I believe this is absolutely not the case for the decision to include the hosting provider in the import path).
Indeed, tying the name of a package to a hosting provider is a bad idea! That's why we changed to an import path which isn't tied to one. google.golang.org/protobuf is not tied to any particular provider, and we can redirect it wherever we want.
See "go help importpath" for details on how this works.
> - Confusing: google.golang.org/protobuf@v1 doesn't exist, but v2 does.
> google.golang.org/protobuf is not tied to any particular provider, and we can redirect it wherever we want.
I agree, this is confusing. Why doesn't google.golang.org/protobuf@v1 return what's at github.com/golang/protobuf, then, if only to provide a tidy answer for those who wonder what v1 looked like?
When I read the article, my brain passed right over the different domain names; the two are even the exact same number of characters, set in a monospace font, and my brain passes right over the domain of golang imports to the end of the path. That's where the interesting bit is (usually!)
I find this a super confusing and unintuitive choice, but I suppose it's too late for changing it now.
Sincerely,
an otherwise mostly happy protobuf user
I think you’re overestimating how much people pay attention to the host part of the import paths. Since there have been quite a few projects that implemented a sort of redirect (i.e. ‘vanity’ imports [0]), most people’s eyes just glaze over the small change in the URL.
I would strongly suggest using /v2 at the end. /v1 not existing on *.golang.org is not a problem in practice, but if it ever becomes one, you could just set that up as a vanity import pointing to github.com for the old version.
Thanks for all the work. And I hope my rant doesn't come across as too negative. But I truly believe the versioning along with the import path is confusing. I hope the team goes over the versioning again and comes up with something that makes more sense.
When you were waffling back and forth on the decision, did anyone note that “re-using v1 is a weird choice and is going to be the main thing people talk about”?
> - Confusing: google.golang.org/protobuf@v1 doesn't exist, but v2 does.
This would not be confusing at all. The UUID package I use just skipped from major version 3 to major version 7 to avoid confusion with UUIDv4, UUIDv5 and a possible UUIDv6. It was well-documented, and even if it wasn't, the upgrade was absolutely painless, as it would have been either way.
> we just really wanted to stop using an import path tied to a specific hosting provider
Are there any plans to fix this for the whole language, for others as well? I.e. use namespaces and module names in the source, but move the actual github.com, google.com or geocities.com URLs outside of the code.
One thing which saddens me a lot though, is returning types you have to type assert because they had to avoid an import cycle. I don’t know if it could be avoided (probably not) but it’s still unergonomic.
> (Why the type assertion? Since the generated descriptorpb package depends on protoreflect, the protoreflect package can't return the concrete options type without causing an import cycle.)
Does someone know if this new version uses fast generated code for (de)serialization? The previous version was pretty slow (because it was based on reflection, I think), so many people used https://github.com/gogo/protobuf for good performance.
Hopefully this is good news. The old Protobuf package had ossified, with lots of ergonomic issues that the maintainers seemed uninterested in addressing.
One perennial annoyance is how awkwardly some Protobuf stuff ends up being represented in Go. For example, "oneof" types:
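(Roughly speaking, for a message Event with a oneof payload containing Create and Delete members, names assumed here for illustration, the generated Go looks something like the sketch below: every member gets its own single-field wrapper struct, and reading or setting the payload always goes through that wrapper.)

type Event struct {
    // Payload holds exactly one member of the oneof, wrapped in a
    // per-member struct that implements the unexported interface below.
    Payload isEvent_Payload
}

type isEvent_Payload interface {
    isEvent_Payload()
}

type Event_Create struct {
    Create *Create
}

type Event_Delete struct {
    Delete *Delete
}

func (*Event_Create) isEvent_Payload() {}
func (*Event_Delete) isEvent_Payload() {}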
Secondly, isEvent_Payload is unexported! So given a Create or a Delete, you can't store it in a single variable:
func nextEvent() isEvent_Payload { // <-- not possible: isEvent_Payload is unexported
    if something {
        return &Event_Create{Create: &Create{...}}
    } else {
        return &Event_Delete{Delete: &Delete{...}}
    }
}
The reason for this appears to be so that the generated serialization can be very dumb and not rely on type switches. But I don't buy that this is how it has to be.
There are other issues. Overall, the whole package seems designed from the bottom up for machines (mumbles: possibly also by machines), not humans. The upshot is that using Protobuf types as "first-class" types — meaning the types you actually use internally in the meat of your app, as opposed to in the controller glue that lives in your API and mediates between the API and the internals — feels super messy.
As an aside, anyone know what issues this paragraph refers to?
The google.golang.org/protobuf/encoding/protojson package
converts protocol buffer messages to and from JSON using the
canonical JSON mapping, and fixes a number of issues with the
old jsonpb package that were difficult to change without
causing problems for existing users.
Did they finally fix the zero value problem with jsonpb.go [1]?
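I'm not sure which specific issue [1] refers to, but for what it's worth the new protojson package does expose zero-value emission as an explicit option; a minimal sketch (the pb.Event message is hypothetical):

package main

import (
    "fmt"

    "google.golang.org/protobuf/encoding/protojson"

    pb "example.com/myapp/eventpb" // hypothetical generated package
)

func main() {
    m := &pb.Event{} // every field left at its zero value
    opts := protojson.MarshalOptions{
        EmitUnpopulated: true, // include zero-valued fields instead of dropping them
    }
    out, err := opts.Marshal(m)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))
}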
To quote one of the maintainers of the golang/protobuf implementation:
> The jsonpb package is intended to be the faithful implementation of the proto<->JSON specification, for which the C++ implementation is considered the canonical "reference" implementation.
And that C++ implementation does some kinda dumb things. For example, it marshals the protobuf type of int64 into a string when marshaling from a protobuf struct into JSON.
IMO marshaling int64 to string is required for interoperability; RFC7159 recommends not assuming that more than IEEE754's 53 bits of integer precision are available:
You need the intermediate struct, because there may be multiple entries in the oneof that share the same type. Not having the intermediate struct would make it impossible to distinguish them.
I find that to be a feature. If I open the article and/or comments and find out this was about a library for Ruby or Java then I wasted my time because I don't use those. Likewise, a non-Go or non-Rust dev can ignore Go and/or Rust tagged library articles.
Umm... when will pb allow me to decode a single field in a buffer without having to decode the entire object? I'm talking about a field in a pb message, NOT a byte stream I hand-encoded without reference to a message, e.g. more like FlatBuffers.
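The closest thing available is probably dropping down to the wire format yourself; a rough sketch with the new protowire package, scanning for one varint field by number and skipping everything else (the field number and the varint assumption are just for the example):

package main

import (
    "fmt"

    "google.golang.org/protobuf/encoding/protowire"
)

// findVarintField scans a serialized message for the first occurrence of
// the given field number encoded as a varint, skipping everything else.
func findVarintField(buf []byte, want protowire.Number) (uint64, bool) {
    for len(buf) > 0 {
        num, typ, n := protowire.ConsumeTag(buf)
        if n < 0 {
            return 0, false // malformed input
        }
        buf = buf[n:]
        if num == want && typ == protowire.VarintType {
            v, m := protowire.ConsumeVarint(buf)
            if m < 0 {
                return 0, false
            }
            return v, true
        }
        // Skip this field's value without interpreting it.
        n = protowire.ConsumeFieldValue(num, typ, buf)
        if n < 0 {
            return 0, false
        }
        buf = buf[n:]
    }
    return 0, false
}

func main() {
    // Example wire bytes: field 1 encoded as varint 150 (08 96 01).
    v, ok := findVarintField([]byte{0x08, 0x96, 0x01}, 1)
    fmt.Println(v, ok) // 150 true
}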
This article doesn't do a great job explaining exactly what new features are a part of this release. Seems to me like it's finally enabling some of the useful type iteration stuff that is possible in other versions of protobufs.
It is odd that the Go team is all in on semantic import paths, but basically everyone else is like “I will blow up the moon if it means I don’t have to use v2 in my import path.”
v2 is used so that the import path is different. The idea is that by breaking the API, you are basically making a new module instead of improving one (metaphorically of course), and it doesn't make sense for what is effectively a new module to keep the same import path.
If you want to change the import path and also release a new version, then simply changing the import path is enough. No need to put v2.
See it as them releasing a totally new package that just happens to have the same name, if that helps.
I wonder if we will start seeing people add v1.example.com, v2.example.com etc... to their git servers to avoid having to change the paths in all of their private projects.
Why would it be? At this point, it's a simple, effective and fast RPC framework with a lot of cross-language support. If you need something like it, there's not many other choices available with this big of an ecosystem.
There's even gRPC-Web being worked on to bring it to the browser. That being said, it can be overused just like anything else.
It is a bit meh. The IDL and generated code are very ugly, and it's neither the fastest nor the most compact (in serialized bytes) format. That said, it's also very widely adopted, and very mature.
Something like it needs to exist. JSON is stupidly inefficient for large sets of numbers. Except a framework just like grpc has been around for 25 years... but shhhh, don't tell hipsters about IIOP, they are like new parents and need to discover things for themselves.
Anything that represents numeric values as text will be an order of magnitude slower to deserialize than a binary format. I don't have a published benchmark, but in my testing (which is easy to replicate) I'm storing an array of around 50 million uint64 values in a contiguous block of data to maximize deserialization speed. To get a sense of the difference I also tested using a JSON array as a more user-friendly storage format, and the penalty of deserializing the JSON and populating the array was around 20x in runtime cost for my use case. The platforms I tested on were C# and Java.
I'm wondering if the whole idea of "general consensus" really works for technical problems. There are so many factors involved in the choices engineers make that you can't say that general consensus calls something "meh", as it really depends on your use case. No tool will ever fit all general problems, so it will always be "meh" for someone.