change result and more optimizations #148

Open
wants to merge 7 commits into base: result-as-tuple

Conversation

@bymoye (Contributor) commented Jul 1, 2025

No description provided.

@bymoye (Contributor, Author) commented Jul 1, 2025

I also modified one thing in the code (after some quick research): I changed

.call((raw_bytes_data.to_vec(),), None)?

to:

.call1((PyBytes::new(py, raw_bytes_data),))?

Reason: this reduces data copying from two operations (creating a Rust Vec, then Python's internal copy) to one (a single copy directly into Python-managed memory).
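The difference can be sketched in plain Rust (no pyo3; the function names and the receiver-side copy are illustrative stand-ins, not psqlpy's actual code):

```rust
// Sketch: why passing the borrowed slice directly saves one copy.
// `old_path` models `.call((raw_bytes_data.to_vec(),), None)`,
// `new_path` models `.call1((PyBytes::new(py, raw_bytes_data),))`.

// Old path: materialize an owned Vec first (copy #1), then the receiver
// copies it into its own buffer (copy #2, standing in for Python's
// internal copy of the passed bytes).
fn old_path(raw_bytes_data: &[u8]) -> Vec<u8> {
    let intermediate: Vec<u8> = raw_bytes_data.to_vec(); // copy #1
    intermediate.clone() // copy #2
}

// New path: hand the borrowed slice straight to the receiver, which makes
// the single copy into its own memory (as PyBytes::new does).
fn new_path(raw_bytes_data: &[u8]) -> Vec<u8> {
    raw_bytes_data.to_vec() // the only copy
}

fn main() {
    let data = b"raw bytes";
    assert_eq!(old_path(data), new_path(data));
    println!("identical bytes either way; the new path copies once");
}
```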

@bymoye (Contributor, Author) commented Jul 1, 2025

@chandr-andr, in 8cc74de I made further optimizations to improve performance. I only ran simple tests with pytest, no benchmarking. Can you run some tests? If there's a performance regression, feel free to roll back the code.
It would also be good to add a memory-usage test!

@chandr-andr (Member)

@bymoye Thank you very much for the refactoring, it looks awesome.
I can't run the benchmarks right now because I have to work (unfortunately).

Will run them in the evening.

@chandr-andr (Member)

@bymoye I ran the benchmarks and found a difference. It's not big, but nothing got worse either.

@chandr-andr (Member)

The rest of the changes look awesome, thank you a lot!

@bymoye (Contributor, Author) commented Jul 2, 2025

OK! Thank you for testing.

@bymoye (Contributor, Author) commented Jul 2, 2025

@chandr-andr I looked at your benchmark code, and I don't think the current test measures the right thing. An engine benchmark should test the performance of the engine, not of the database, so the queries should not use ORDER BY RANDOM. Instead they should put as little load on the database as possible (so that the database is essentially idle) while still returning the same result to Python, where the engine does the actual work (data conversion and so on).
Returning as much data as possible then exercises the engine itself.

@bymoye (Contributor, Author) commented Jul 2, 2025

For example, a single-row query could use SELECT 1;
multi-row queries could use id < 101, id < 1001, and id < 10001;
and the benchmark should return the actual row data instead of the current "ok".

@bymoye (Contributor, Author) commented Jul 2, 2025

Time complexity: the original code is $O(N \cdot D)$ (for $N$ rows and $D$ columns), while the optimized code reduces it to $O(N)$, significantly lowering complexity and preventing performance degradation.

Space complexity: both versions are $O(N)$.

Optimized: memory is allocated once and reused throughout, ensuring stability and efficiency.
Original: repeated allocation and deallocation (memory churn) introduces additional overhead.

Memory copies:

Original: copy operations occur $N \cdot (D - 1)$ times (handled internally by Rust).
Optimized: reduced to $0$ copies, since all operations go through references.

Total copied elements:

Original: $N \cdot (D + 1)$ elements copied.
Optimized: reduced to $2N$ elements.

These optimizations should bring significant performance improvements when reading query result values (especially for large result sets).
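To make the scale concrete, a quick arithmetic check of those copy counts with illustrative sizes (N = 1000 rows, D = 10 columns; the numbers are examples, not measurements):

```rust
// Copy-count formulas from the analysis above, evaluated for sample sizes.
fn copied_elements_original(n: u64, d: u64) -> u64 {
    n * (d + 1) // original: N * (D + 1) elements copied
}

fn copied_elements_optimized(n: u64) -> u64 {
    2 * n // optimized: 2N elements copied
}

fn main() {
    let (n, d) = (1_000u64, 10u64);
    assert_eq!(copied_elements_original(n, d), 11_000);
    assert_eq!(copied_elements_optimized(n), 2_000);
    println!(
        "N={n}, D={d}: original copies {} elements, optimized copies {}",
        copied_elements_original(n, d),
        copied_elements_optimized(n)
    );
}
```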

@bymoye (Contributor, Author) commented Jul 2, 2025

Take the following table as an example:

field_one field_two field_three
a1        b1        c1
a2        b2        c2
a3        b3        c3

Query the entire table and process the data through psqlpy. The code then goes through:

  1. Unloading (9 copies; identical in both versions)
    This step produces: vec!["a1", "b1", "c1", "a2", "b2", "c2", "a3", "b3", "c3"]
  2. Transport (internal processing)
    In the original code, a new Vec is created for each row and every element is copied into it, which can hurt when there is a lot of data.
    In the optimized code, no copy is performed; each row is described by a near-zero-cost pointer (a slice) into the flat buffer, so no strings are copied.
  3. Return to Python (9 copies; identical in both versions)
    Converting Rust String values into Python str objects.
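The transport step above can be sketched in plain Rust (illustrative types, not psqlpy's internal value types): the flat buffer from step 1 is split into rows either by copying each row into a fresh Vec, or by borrowing slices into the buffer.

```rust
// Original transport: a fresh Vec per row; every String is cloned,
// i.e. N * D element copies during transport.
fn rows_as_copies(flat: &[String], width: usize) -> Vec<Vec<String>> {
    flat.chunks(width).map(|row| row.to_vec()).collect()
}

// Optimized transport: each row is a borrowed slice into the flat buffer;
// only (pointer, length) pairs are created, no String is copied.
fn rows_as_views(flat: &[String], width: usize) -> Vec<&[String]> {
    flat.chunks(width).collect()
}

fn main() {
    let flat: Vec<String> = ["a1", "b1", "c1", "a2", "b2", "c2", "a3", "b3", "c3"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    let copies = rows_as_copies(&flat, 3);
    let views = rows_as_views(&flat, 3);

    assert_eq!(copies.len(), 3);
    assert_eq!(views[1], &flat[3..6]); // second row, borrowed not copied
    println!("{} rows, second row: {:?}", views.len(), views[1]);
}
```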

@bymoye changed the title from "change result" to "change result and more optimizations" on Jul 2, 2025
@chandr-andr (Member)

> I checked your code, and I think the current test is not right. Because the engine benchmark should test the performance of the engine, not the performance of the database. [...]

Hello! The psqlpy benchmark is for querying and database-communication performance. I have some raw (de)serialization tests locally; I just haven't had enough time to make them fully complete and push them to the repo and documentation.

I totally agree with you, thanks for the optimizations.

@chandr-andr (Member)

@bymoye Can you check clippy? There are some errors.

@bymoye (Contributor, Author) commented Jul 3, 2025

Hi @chandr-andr,
I'll look into the clippy issues later.
One other thing you might want to look at: I ran a test I wrote locally and got the following results:

🎯 extreme_query Performance Comparison

Library Status Throughput (ops/s) Avg Latency (ms) Memory Delta (MB)
AsyncPG Success 32.3 30916.50 0.3
Psycopg3 Success 32.2 31050.15 0.1
PSQLPy Success 16.2 61665.21 2.1
Databases Success 8.3 120346.22 0.9

🎯 bulk_insert Performance Comparison

Library Status Throughput (ops/s) Avg Latency (ms) Memory Delta (MB)
AsyncPG Success 32.4 30893.69 0.1
PSQLPy Success 32.2 31030.38 0.1
Psycopg3 Success 32.1 31117.88 0.1
Databases Success 8.1 123635.79 0.1

🎯 concurrent_read Performance Comparison

Library Status Throughput (ops/s) Avg Latency (ms) Memory Delta (MB)
AsyncPG Success 32.3 30893.80 0.1
PSQLPy Success 32.3 30953.28 0.0
Psycopg3 Success 32.2 31029.34 0.0
Databases Success 8.4 118577.62 0.4

🎯 complex_query Performance Comparison

Library Status Throughput (ops/s) Avg Latency (ms) Memory Delta (MB)
Psycopg3 Success 33.9 29478.60 0.0
PSQLPy Success 32.6 30641.29 0.0
AsyncPG Success 31.5 31770.39 0.0
Databases Success 13.2 75851.61 0.1

✅ Cross-Library Performance Comparison Complete!

I investigated the cause a little (not in depth), and the problem may be that the execute method builds a complete StatementBuilder regardless of whether any parameters were passed in. Maybe there should be a fast track for simple, parameterless queries?
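The proposed fast track could look roughly like this (plain Rust sketch; all names are hypothetical except StatementBuilder, which is only mentioned, not modeled, here):

```rust
// Sketch: dispatch parameterless queries down a cheap simple-query path
// instead of always building a full prepared statement.
fn execute(query: &str, params: &[&str]) -> String {
    if params.is_empty() {
        // Fast path: nothing to bind, so skip statement building entirely
        // and send the text as a simple query.
        format!("simple_query:{query}")
    } else {
        // Full path: stands in for the prepared-statement machinery
        // (parameter binding, statement building, etc.).
        format!("prepared({} params):{query}", params.len())
    }
}

fn main() {
    assert_eq!(execute("SELECT 1", &[]), "simple_query:SELECT 1");
    assert_eq!(execute("SELECT $1", &["42"]), "prepared(1 params):SELECT $1");
    println!("parameterless queries bypass statement building");
}
```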

@bymoye (Contributor, Author) commented Jul 3, 2025

Another thing: in the current pytest suite, it seems the connection is never disconnected at the end, so its resources stay occupied (the connection is not released correctly).
