Skip to content

Conversation

Sasasu
Copy link
Contributor

@Sasasu Sasasu commented Jul 3, 2023

Postgresql has an endian-dependent format for the bitflag in varlena. currently pgrx only support smallendian. adding bigendian support.

Greenplum uses the bigendian format for all platforms.


I will double check if there is some copy&past error tomorrow. :>

Postgresql has a different format for the bitflag in varlean. currently pgrx
only support smallendian. adding bigendian support.

Greenplum uses the bigendian format for all platforms.
@Sasasu Sasasu force-pushed the sa/varatt_order branch from d5ee002 to 5979a66 Compare July 3, 2023 12:42
@thomcc
Copy link
Contributor

thomcc commented Jul 3, 2023

Greenplum uses the bigendian format for all platforms.

This means target_endian = "little" is wrong, no?

@eeeebbbbrrrr
Copy link
Contributor

The pgrx project has no facilities for testing on big endian architectures. I haven't yet validated that the new big endian impls of these macros are correct/identical-to-postgres-impls, but even if they are, we can't accept this simply because we can't test it.

Is this part of a larger effort by Greenplum to offer pgrx support? If so, I'd like an opportunity to speak to someone in the company. My email is e_ridge @ tcdi.com

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, of course, highlights a deeper problem, which is that we need to actually sit down and agree on the feature for enabling Greenplum support.

Comment on lines +15 to +18
#[cfg(target_endian = "big")]
pub use varlena_bigendian::*;
#[cfg(target_endian = "little")]
pub use varlena_littleendian::*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what you said

Suggested change
#[cfg(target_endian = "big")]
pub use varlena_bigendian::*;
#[cfg(target_endian = "little")]
pub use varlena_littleendian::*;
#[cfg(any(feature = "greenplum7", target_endian = "big"))]
pub use varlena_bigendian::*;
#[cfg(not(any(feature = "greenplum7", target_endian = "big")))]
pub use varlena_littleendian::*;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, @workingjubilee's suggestion here is just for demonstration purposes. We are not in any position to invent a greenplum7 feature yet. Such a thing will require a significant amount of planning, design, and understanding on our part, along with the ability to test it... and test it, apparently, on disparate CPU architectures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal is to add greenplum support, but this PR is not related to greenplum. and this PR did not add the greenplum7 feature. If this PR gets merged, I will add gp7 support based on this PR.

@workingjubilee workingjubilee changed the title varlean: support bitflag in bigendian format. varlena: support bitflag in bigendian format. Jul 3, 2023
@Sasasu
Copy link
Contributor Author

Sasasu commented Jul 4, 2023

Greenplum uses the bigendian format for all platforms.

This means target_endian = "little" is wrong, no?

this PR is not related to greenplum. for pgrx target_endian = "little" is correct.
but greenplum remove the littleendian code, and always use bigendian bitflag on all platforms for an unknown reason.

this bitflag does not use to calculate, so it is possible to use bigendian bitflag on littleendian platforms.

this PR is adding the support for bitflag in bigendian format. not related to greenplum.

@workingjubilee
Copy link
Member

I understand that, but we strongly prefer to be able to test changes for validation of their correctness, especially with the varlena-reading tests. It is very hard for us to validate changes are correct without a big-endian system to test on, and unfortunately it isn't possible to use Rust's byte-swapping facilities such as from_be_bytes and to_le_bytes to reimplement these macros, to my understanding of the bit-manipulations. If it were, I would unhesitatingly recommend using an "endian-oblivious" implementation.

Most of all, I don't want to approve this and then find out some other part of our code is busted as hell on big-endian systems. Better for it to fail in some blatantly obvious way... I have many grievances with Postgres but it is surprisingly good at recovering from rampant data corruption and erroring when it enters invalid states. I'd rather someone who tried to use PGRX on a big-endian system encounter an early, spectacular failure, than get far enough for them to induce a more subtle data corruption that Postgres does not detect.

@Sasasu
Copy link
Contributor Author

Sasasu commented Jul 4, 2023

Most of all, I don't want to approve this and then find out some other part of our code is busted as hell on big-endian systems.

this is Ok, there is no modern CPU support big-endian anymore. I agree with you.
I will close this PR after I verified on my machine. there still have a problem in get_nullable_datum or varsize_any_exhdr.


and, just an ask, Can you accept moving most of functions to cshim? This will introduce some function call overhead, which will slightly reduce performance. this ABI problem is already handled in the C side (by static inline and macro functions)


for Greenplum support, there are much more works to do. not only the things mentioned in issue 1179.
I will complete my usage first.

@eeeebbbbrrrr
Copy link
Contributor

eeeebbbbrrrr commented Jul 4, 2023

this is Ok, there is no modern CPU support big-endian anymore. I agree with you.
I will close this PR after I verified on my machine. there still have a problem in get_nullable_datum or varsize_any_exhdr.

If you're talking about e3464a1, please make a separate PR for that -- it does look like it confused length in bytes with number of elements.

and, just an ask, Can you accept moving most of functions to cshim? This will introduce some function call overhead, which will slightly reduce performance. this ABI problem is already handled in the C side (by static inline and macro functions)

Most the functions being discussed in this PR were in the cshim at one point. The primary problem with the cshim is the ability to easily cross-compile it. Some downstream pgrx users enjoy cross compiling from x86_64 to aarch64. In fact, that's a key feature of PL/Rust.

Internally, we've discussed inventing some very lightweight "C to Rust" transformer that could programmatically port these types of C macros. Of course, this a) doesn't exist yet, and b) wouldn't cover everything, but I think it's a better direction. We definitely agree that hand-porting these macros is error-prone.

Generally speaking, any of the Postgres #define macros or static inline functions that aren't central to general pgrx operation -- things that would be fine if unavailable when the cshim feature is disabled -- are fine candidates for the cshim. Core support for the varlena type is not one of those things, however.

for Greenplum support, there are much more works to do. not only the things mentioned in issue 1179.
I will complete my usage first.

We'd like to support Greenplum.

It's important to recognize that pgrx is developed with an understanding of the official Postgres sources/internals. Supporting deviations from that will require a lot of effort.

As I mentioned earlier, I'd love an opportunity to talk to someone at Greelplum/vmware about this. In order for the pgrx core team to agree to support multiple, different products we're going to need some coordination first. I don't want to encourage you or your team to start sending us Greenplum PRs without that coordination.

@radarwave
Copy link

@eeeebbbbrrrr Thanks for your comments, it would be great to talk with you, and I believe we discussed Zombodb&Greenplum integration before. As a start, maybe we can discuss this in Greenplum open-source Slack channel with both the engineering and product teams. If needed, we can coordinate a time for the meeting.

Slack: greenplum.slack.com
Channel: #extensions

@Sasasu
Copy link
Contributor Author

Sasasu commented Jul 5, 2023

If you're talking about e3464a1, please make a separate PR for that

#1190

Core support for the varlena type is not one of those things, however.

Ok, let us wait for rust-lang/rust-bindgen#2369

I'd love an opportunity to talk to someone at Greelplum/vmware about this.

From my side, this is a 20% time project. there are some people who want to use rust in his project and find pgrx is incompatible with Greenplum. and I can write Rust, and I am helping with his project. I don't know the plan at the company level.

and pgrx is working with Greenplum on my fork. I think we need more tests and start internal usage first (not the company plan). I will close this PR one week after (or you can close this PR at any time).

@workingjubilee
Copy link
Member

and pgrx is working with Greenplum on my fork

Is Sasasu@9f83dbb the fork you mean?

@Sasasu
Copy link
Contributor Author

Sasasu commented Jul 6, 2023

Is Sasasu@9f83dbb the fork you mean?

yes, this commit breaks Postgresql support. but add Greenplum support (with cfg = pg12).

@Sasasu Sasasu closed this Jul 10, 2023
@eeeebbbbrrrr
Copy link
Contributor

My comment over there applies here too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants