Skip to content

Add support for generating protobuf compliant DESCRIPTORs #112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 25, 2025

Conversation

edaniels
Copy link
Contributor

Summary

This adds a new option google_protobuf_descriptors which will rely on protobuf's descriptor capabilities. This fixes reflection and lets libraries like https://github.com/foxglove/mcap/tree/main/python/mcap-protobuf-support work.

Checklist

  • If code changes were made then they have been tested.
    • I have updated the documentation to reflect the changes.
  • This PR fixes an issue.
  • This PR adds something new (e.g. new method or parameters).
    • This change has an associated test.

@@ -127,6 +141,10 @@ class {{ (service.py_name + "Base") | add_to_all }}(ServiceBase):
{% endif %}

raise grpclib.GRPCError(grpclib.const.Status.UNIMPLEMENTED)
{% if method.server_streaming %}
{# yielding here changes the return type from a coroutine to an async_generator #}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdrienVannson this is a driveby but it is truly needed for grpclib to function properly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you

@edaniels edaniels changed the title Add support for generate protobuf compliant DESCRIPTORs Add support for generating protobuf compliant DESCRIPTORs Jun 15, 2025
@MichaelSchuldes
Copy link

Nice work. I tested it out for my use case (betterosi a library for working with ASAM OSI).

ASAM OSI use nesting when defining proto messages and enums (e.g., BoundingBox.Type). The betterproto2 compiler ignores the nesting and just concatenates the class names. @edaniels Your code keeps the nesting (which is more true to the protos). When using the generated code I get the error:

Couldn't build proto file into descriptor pool: invalid name: unexpected '.' (BoundingBox.Type)

class BoundingBoxType(betterproto2.Enum):
    """
    Definition of different types of object contained within the bounding box
    """

    @betterproto2.classproperty
    def DESCRIPTOR(self) -> EnumDescriptor:
        return OSI_COMMON_PROTO_DESCRIPTOR.enum_types_by_name["BoundingBox.Type"]

@edaniels
Copy link
Contributor Author

Nice work. I tested it out for my use case (betterosi a library for working with ASAM OSI).

ASAM OSI use nesting when defining proto messages and enums (e.g., BoundingBox.Type). The betterproto2 compiler ignores the nesting and just concatenates the class names. @edaniels Your code keeps the nesting (which is more true to the protos). When using the generated code I get the error:

Couldn't build proto file into descriptor pool: invalid name: unexpected '.' (BoundingBox.Type)

class BoundingBoxType(betterproto2.Enum):
    """
    Definition of different types of object contained within the bounding box
    """

    @betterproto2.classproperty
    def DESCRIPTOR(self) -> EnumDescriptor:
        return OSI_COMMON_PROTO_DESCRIPTOR.enum_types_by_name["BoundingBox.Type"]

I'm looking into this. So of course between google's compiler and ours, the dot is not there on google's but is on ours. The strange part is that the dot is being set in the FileDescriptorProto being sent to the plugin. Maybe google has code that fixes their own up?

I'm looking at https://github.com/protocolbuffers/protobuf/blob/b339f136b2d60427a076e98ad9eb982bf8228def/upb/reflection/internal/def_builder.c#L318.

@MichaelSchuldes can you tell me how you're invoking the compiler? I want to make sure I get it straight.

@edaniels
Copy link
Contributor Author

Ah I see the issue. On it

@edaniels
Copy link
Contributor Author

edaniels commented Jun 15, 2025

@MichaelSchuldes fixed!

This the proto I tested with:

syntax = "proto3";

import "one/two/other.proto";
import "friendly.proto";
import "google/protobuf/struct.proto";
import "google/protobuf/timestamp.proto";

package my.really.super.ha.ha.ha.cool.helloworld;

message CoolRequest {
  string name = 1;
  message Result {
    string LOOK_FOR_ME = 1;
  }
  Result result_usage = 2;
}

message CoolReply {
  string message = 1;
  inner.thingy.other.HelloEnum other = 2;
}

enum CoolEnum {
  ONE = 0;
  TWO = 1;
}

service Greeter {
  rpc SayHello (CoolRequest) returns (stream CoolReply) {}
}

Python:

    with open("test.mcap", "wb") as f, Writer(f) as writer:
        writer.write_message(
            topic="/point_cloud",
            message=CoolRequest(name="hello", result_usage=CoolRequestResult(LOOK_FOR_ME="hi!")),
        )

mcap

 mcap cat --json test.mcap
{"topic":"/point_cloud","sequence":0,"log_time":1749995689.155294000,"publish_time":1749995689.155294000,"data":{"name":"hello","resultUsage":{"LOOKFORME":"hi!"}}}
mcap list schemas test.mcap                                         
id	name                                                	encoding	data                                                                                     	
1 	my.really.super.ha.ha.ha.cool.helloworld.CoolRequest	protobuf	syntax = "proto3";                                                                       	
  	                                                    	        	                                                                                         	
  	                                                    	        	message inner.thingy.other.Otherequest {                                                 	
  	                                                    	        	  optional name string = 1;                                                              	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message inner.thingy.other.Othereply {                                                   	
  	                                                    	        	  optional message string = 1;                                                           	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message hifriend.HelloRequest {                                                          	
  	                                                    	        	  optional name string = 1;                                                              	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message google.protobuf.Struct {                                                         	
  	                                                    	        	  repeated fields .google.protobuf.Struct.FieldsEntry = 1;                               	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message google.protobuf.Value {                                                          	
  	                                                    	        	  optional null_value .google.protobuf.NullValue = 1;                                    	
  	                                                    	        	  optional number_value double = 2;                                                      	
  	                                                    	        	  optional string_value string = 3;                                                      	
  	                                                    	        	  optional bool_value bool = 4;                                                          	
  	                                                    	        	  optional struct_value .google.protobuf.Struct = 5;                                     	
  	                                                    	        	  optional list_value .google.protobuf.ListValue = 6;                                    	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message google.protobuf.ListValue {                                                      	
  	                                                    	        	  repeated values .google.protobuf.Value = 1;                                            	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message google.protobuf.Timestamp {                                                      	
  	                                                    	        	  optional seconds int64 = 1;                                                            	
  	                                                    	        	  optional nanos int32 = 2;                                                              	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message my.really.super.ha.ha.ha.cool.helloworld.CoolRequest {                           	
  	                                                    	        	  optional name string = 1;                                                              	
  	                                                    	        	  optional result_usage .my.really.super.ha.ha.ha.cool.helloworld.CoolRequest.Result = 2;	
  	                                                    	        	}                                                                                        	
  	                                                    	        	message my.really.super.ha.ha.ha.cool.helloworld.CoolReply {                             	
  	                                                    	        	  optional message string = 1;                                                           	
  	                                                    	        	  optional other .inner.thingy.other.HelloEnum = 2;                                      	
  	                                                    	        	}                                                                                        	
  	                                                    	        	                                                                                       

Actually not sure why the schema from mcap doesn't print out the embedded message but it seems to work fine. Shows up with grpc_cli so I'm not concerned:

grpc_cli type localhost:50051 my.really.super.ha.ha.ha.cool.helloworld.CoolRequest 
message CoolRequest {
  message Result {
    string LOOK_FOR_ME = 1 [json_name = "LOOKFORME"];
  }
  string name = 1 [json_name = "name"];
  .my.really.super.ha.ha.ha.cool.helloworld.CoolRequest.Result result_usage = 2 [json_name = "resultUsage"];
}

grpc_cli type localhost:50051 my.really.super.ha.ha.ha.cool.helloworld.CoolRequest.Result
message Result {
  string LOOK_FOR_ME = 1 [json_name = "LOOKFORME"];
}

@MichaelSchuldes
Copy link

@edaniels This works for me know. Thanks for the work!

Your last change changed the naming of enum values for me. BoundingBox.Type -> TYPE_UNKOWN becomes UNKOWN in the python class. Is this intended?

@edaniels
Copy link
Contributor Author

That's great! For the enum, that behavior has been on main since https://github.com/betterproto/python-betterproto2-compiler/commit/bd7de203e16e949666b2844b3dec1eb7c4ed523c#diff-184eade9a866873252783b1ccbf0f408efe1ae5dc357cc634c1545e80763194aR16 (2023).

@edaniels
Copy link
Contributor Author

@AdrienVannson pre 3.13 should be fixed up now. Had a bad f-string that works magically in 3.13

@AdrienVannson
Copy link
Contributor

Thank you! I will have a closer look soon

@edaniels
Copy link
Contributor Author

hi @AdrienVannson, any time to look at this this week? Thanks!

This reverts commit a85929c.
"""
return "\n".join(
[
f"{self.get_descriptor_name(f)} = _descriptor_pool.Default().AddSerializedFile({f.SerializeToString()})"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to change how we do this. it's reaching too deep and ends up breaking a MergeFrom in some random google cloud library.

@edaniels
Copy link
Contributor Author

@AdrienVannson @MichaelSchuldes, I stopped adding to the Default() DescriptorPool because it was causing problems downstream and its reaching into implementation details that we're better off not reaching into. Michael, everything should still work as you except. For reflection, I've updated the descriptors.md for the workaround. Also put up vmagamedov/grpclib#204.

@edaniels
Copy link
Contributor Author

Are we good to approve?

@@ -127,6 +141,10 @@ class {{ (service.py_name + "Base") | add_to_all }}(ServiceBase):
{% endif %}

raise grpclib.GRPCError(grpclib.const.Status.UNIMPLEMENTED)
{% if method.server_streaming %}
{# yielding here changes the return type from a coroutine to an async_generator #}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you

@edaniels edaniels requested a review from AdrienVannson June 25, 2025 12:32
@edaniels
Copy link
Contributor Author

@AdrienVannson ready for another look. Thanks for the feedback re: the duplicative imports

@@ -12,9 +14,11 @@ service Test {
message ExampleRequest {
string example_string = 1;
int64 example_integer = 2;
google.protobuf.Struct example_struct = 3;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this now helps ensure if we import something like Struct, that we also add it to the proto descriptor pool prior to adding our file descriptor

@AdrienVannson
Copy link
Contributor

Thanks! One last thing: can you remove all the import_suffix parameters that were added to importing.py? They shouldn't be needed anymore.

After this I think we can finally merge everything!

@edaniels
Copy link
Contributor Author

Thanks! One last thing: can you remove all the import_suffix parameters that were added to importing.py? They shouldn't be needed anymore.

After this I think we can finally merge everything!

Ah yes, whoops. On my way to work. I'll do it when I get in.

@AdrienVannson AdrienVannson requested a review from Copilot June 25, 2025 13:30
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds an option to generate Google Protobuf DESCRIPTOR fields for better reflection support across libraries.

  • Introduces google_protobuf_descriptors compiler option, updates parser/settings to propagate it, and modifies templates to inject DESCRIPTOR class properties and register serialized files.
  • Extends test utilities and CI to generate and validate descriptor outputs, and adds fresh tests for descriptor integration and grpclib reflection.
  • Updates documentation (mkdocs and a new descriptors.md), package dependencies (adding protobuf), and the CI workflow to include descriptor artifacts.

Reviewed Changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/betterproto2_compiler/src/betterproto2_compiler/templates/header.py.j2 Unconditionally imported Descriptor/EnumDescriptor, should be guarded
src/betterproto2_compiler/src/betterproto2_compiler/compile/importing.py Parameter typo import_suffx vs import_suffix
docs/descriptors.md CLI option call missing leading hyphens in example
src/betterproto2_compiler/src/betterproto2_compiler/settings.py Added google_protobuf_descriptors setting
src/betterproto2/tests/grpc/test_message_enum_descriptors.py New tests for DESCRIPTOR availability
Comments suppressed due to low confidence (1)

betterproto2/docs/descriptors.md:5

  • Specify the full protoc flag syntax including leading dashes, e.g. --python_betterproto2_opt=google_protobuf_descriptors, to avoid confusion about how to pass the option.
By default, betterproto2 doesn't generate these as it introduces a dependency on `protobuf`. If you're okay with this dependency and want to generate DESCRIPTORs, use the compiler option `python_betterproto2_opt=google_protobuf_descriptors`.

@AdrienVannson
Copy link
Contributor

(I just wanted to try Copilot reviews, no need to change anything :) )

@edaniels
Copy link
Contributor Author

(I just wanted to try Copilot reviews, no need to change anything :) )

It was surprisingly almost accurate!

@AdrienVannson
Copy link
Contributor

AdrienVannson commented Jun 25, 2025

Actually, I have a last and very small remark: I noticed that the descriptors added to the generated files by your PR are significantly longer than the descriptors added by the standard protobuf python compiler.

b'\n\x0cdouble.proto\x12\x06double"\x1c\n\x04Test\x12\x14\n\x05count\x18\x01 \x01(\x01R\x05countJk\n\x06\x12\x04\x00\x00\x06\x01\n\x08\n\x01\x0c\x12\x03\x00\x00\x12\n\x08\n\x01\x02\x12\x03\x02\x00\x0f\n\n\n\x02\x04\x00\x12\x04\x04\x00\x06\x01\n\n\n\x03\x04\x00\x01\x12\x03\x04\x08\x0c\n\x0b\n\x04\x04\x00\x02\x00\x12\x03\x05\x04\x15\n\x0c\n\x05\x04\x00\x02\x00\x05\x12\x03\x05\x04\n\n\x0c\n\x05\x04\x00\x02\x00\x01\x12\x03\x05\x0b\x10\n\x0c\n\x05\x04\x00\x02\x00\x03\x12\x03\x05\x13\x14b\x06proto3'

vs

b'\n\x0c\x64ouble.proto\x12\x06\x64ouble\"\x15\n\x04Test\x12\r\n\x05\x63ount\x18\x01 \x01(\x01\x62\x06proto3'

It seems to be the case because your are serializing the field source_code_info of FileDescriptorProto, while the google implementation doesn't. The documentation of this field actually says: "This field contains optional information about the original source code. You may safely remove this entire field without harming runtime functionality of the descriptors -- the information is needed only by development tools."

Can you replace the descriptors method of OutputTemplate by:

        descriptors = []

        for f in self.input_files:
            # Remove the source_code_info field since it is not needed at runtime.
            source_code_info = f.source_code_info
            f.source_code_info = None

            descriptors.append(
                f"{self.get_descriptor_name(f)} = default_google_proto_descriptor_pool.AddSerializedFile({bytes(f)})"
            )

            f.source_code_info = source_code_info

        return "\n".join(descriptors)

?

@edaniels
Copy link
Contributor Author

Actually, I have a last and very small remark: I noticed that the descriptors added to the generated files by your PR are significantly longer than the descriptors added by the standard protobuf python compiler.

b'\n\x0cdouble.proto\x12\x06double"\x1c\n\x04Test\x12\x14\n\x05count\x18\x01 \x01(\x01R\x05countJk\n\x06\x12\x04\x00\x00\x06\x01\n\x08\n\x01\x0c\x12\x03\x00\x00\x12\n\x08\n\x01\x02\x12\x03\x02\x00\x0f\n\n\n\x02\x04\x00\x12\x04\x04\x00\x06\x01\n\n\n\x03\x04\x00\x01\x12\x03\x04\x08\x0c\n\x0b\n\x04\x04\x00\x02\x00\x12\x03\x05\x04\x15\n\x0c\n\x05\x04\x00\x02\x00\x05\x12\x03\x05\x04\n\n\x0c\n\x05\x04\x00\x02\x00\x01\x12\x03\x05\x0b\x10\n\x0c\n\x05\x04\x00\x02\x00\x03\x12\x03\x05\x13\x14b\x06proto3'

vs

b'\n\x0c\x64ouble.proto\x12\x06\x64ouble\"\x15\n\x04Test\x12\r\n\x05\x63ount\x18\x01 \x01(\x01\x62\x06proto3'

It seems to be the case because your are serializing the field source_code_info of FileDescriptorProto, while the google implementation doesn't. The documentation of this field actually says: "This field contains optional information about the original source code. You may safely remove this entire field without harming runtime functionality of the descriptors -- the information is needed only by development tools."

Can you replace the descriptors method of OutputTemplate by:

        descriptors = []

        for f in self.input_files:
            # Remove the source_code_info field since it is not needed at runtime.
            source_code_info = f.source_code_info
            f.source_code_info = None

            descriptors.append(
                f"{self.get_descriptor_name(f)} = default_google_proto_descriptor_pool.AddSerializedFile({bytes(f)})"
            )

            f.source_code_info = source_code_info

        return "\n".join(descriptors)

?

Yep, great idea. Done.

Copy link
Contributor

@AdrienVannson AdrienVannson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!

@AdrienVannson AdrienVannson merged commit f5404a6 into betterproto:main Jun 25, 2025
13 checks passed
@edaniels edaniels deleted the descriptors branch June 25, 2025 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reflections don't work (I may know why) Support Proto Descriptors
3 participants