This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

psqlpy slower than psycopg #62

Closed
BimaAdi opened this issue Jul 3, 2024 · 7 comments
Comments

@BimaAdi

BimaAdi commented Jul 3, 2024

I made my own benchmark comparing psqlpy and psycopg; you can find the comparison and the code in this GitHub repo: https://github.com/BimaAdi/psqlpy-vs-psycopg. I benchmarked 2 use cases:

  1. Insert bulk (insert many rows in one transaction)
  2. Get many (fetch many rows from the database)
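(Not from the repo, but the measurement loop for each case can be sketched like this. `time_case` and `dummy_case` are hypothetical stand-ins for the repo's actual insert/get coroutines; the sleep simulates a database round-trip.)

```python
import asyncio
from time import perf_counter

async def time_case(run_case, runs: int = 10) -> list[float]:
    """Run an async benchmark case `runs` times; return per-run seconds."""
    timings: list[float] = []
    for _ in range(runs):
        start = perf_counter()
        await run_case()
        timings.append(perf_counter() - start)
    return timings

async def dummy_case() -> None:
    # Stand-in for insert_bulk / get_bulk: sleeps instead of hitting a database.
    await asyncio.sleep(0.01)

timings = asyncio.run(time_case(dummy_case, runs=3))
avg = sum(timings) / len(timings)
print(f"runs: {len(timings)}, avg: {avg:.4f}s")
```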

I found that psqlpy is slower than psycopg. Here are the results (all in seconds):

Insert Bulk Time

| run | psqlpy | psycopg |
|-----|--------|---------|
| 1   | 0.271  | 0.096   |
| 2   | 0.269  | 0.184   |
| 3   | 0.319  | 0.216   |
| 4   | 0.380  | 0.097   |
| 5   | 0.378  | 0.117   |
| 6   | 0.288  | 0.139   |
| 7   | 0.287  | 0.159   |
| 8   | 0.267  | 0.101   |
| 9   | 0.285  | 0.107   |
| 10  | 0.448  | 0.181   |
| avg | 0.319  | 0.140   |

Get Bulk Time

| run | psqlpy | psycopg |
|-----|--------|---------|
| 1   | 0.002  | 0.002   |
| 2   | 0.005  | 0.003   |
| 3   | 0.004  | 0.004   |
| 4   | 0.002  | 0.002   |
| 5   | 0.005  | 0.002   |
| 6   | 0.002  | 0.002   |
| 7   | 0.002  | 0.003   |
| 8   | 0.003  | 0.002   |
| 9   | 0.003  | 0.002   |
| 10  | 0.003  | 0.002   |
| avg | 0.003  | 0.002   |

Machine spec

  • Intel Core i7 (9th gen)
  • 16 GB RAM
  • SSD

Although only by a small margin, psycopg consistently beats psqlpy. What's wrong with my benchmark? Did I do something wrong, or is it due to some other factor?

@chandr-andr
Member

Hello! Thank you for your benchmarks.
But I've found some strange things.

First of all, let's see my results:

insert bulk time
shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.027  | 0.034   |
| 2   | 0.025  | 0.024   |
| 3   | 0.024  | 0.023   |
| 4   | 0.022  | 0.021   |
| 5   | 0.022  | 0.021   |
| 6   | 0.022  | 0.021   |
| 7   | 0.023  | 0.02    |
| 8   | 0.022  | 0.021   |
| 9   | 0.023  | 0.021   |
| 10  | 0.024  | 0.021   |
| avg | 0.023  | 0.023   |
get bulk time
shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.001  | 0.001   |
| 2   | 0.001  | 0.001   |
| 3   | 0.001  | 0.001   |
| 4   | 0.001  | 0.001   |
| 5   | 0.001  | 0.001   |
| 6   | 0.001  | 0.001   |
| 7   | 0.001  | 0.001   |
| 8   | 0.001  | 0.001   |
| 9   | 0.001  | 0.001   |
| 10  | 0.001  | 0.001   |
| avg | 0.001  | 0.001   |

They are approximately the same. Let me explain why.

So, what did I find strange?

  1. When inserting new rows with psqlpy, you don't really use a transaction.
    https://github.com/BimaAdi/psqlpy-vs-psycopg/blob/main/bench_psqlpy.py#L29
    This method is asynchronous.
  2. You are using an old version of PostgreSQL. As I can see, you are on version 12; I didn't run any tests against PostgreSQL 12.
    So the question is: why so old?

The correct (for your test) PSQLPy bench function is:

from time import time

from psqlpy import ConnectionPoolBuilder

# create_table, insert_bulk, get_bulk, drop_table, BenchTypedDict and the
# POSTGRESQL_* settings come from the benchmark repo (bench_psqlpy.py).

async def bench() -> BenchTypedDict:
    bench_time: BenchTypedDict = {"insert_bulk_time": 0.0, "get_bulk_time": 0.0}
    # Build the pool before try/finally so db_pool is always defined in finally.
    db_pool = (
        ConnectionPoolBuilder()
        .user(POSTGRESQL_USER)
        .password(POSTGRESQL_PASSWORD)
        .host(POSTGRESQL_HOST)
        .port(POSTGRESQL_PORT)
        .dbname(POSTGRESQL_DATABASE)
        .build()
    )
    try:
        async with db_pool.acquire() as connection:
            async with connection.transaction() as trans:
                await create_table(trans)

                start = time()
                await insert_bulk(trans)
                bench_time["insert_bulk_time"] = time() - start

                start = time()
                await get_bulk(trans)
                bench_time["get_bulk_time"] = time() - start

                await drop_table(trans)
    finally:
        db_pool.close()
    return bench_time

The main explanation:
PSQLPy aims first of all to give you performance in real high-load applications, because of how Rust's Tokio runtime manages asynchronous tasks.
Let's imagine what is happening in the test you made: you always have 1 connection that is already connected to PostgreSQL, and all the driver must do is query the database and get the results. There is practically no window for optimization.
On the other hand, when you have a lot of concurrent requests to the database, it matters how the driver manages the connection pool and how the programming language manages the event-loop tasks.
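(The concurrency point can be illustrated without any database at all. This is a sketch, not psqlpy or psycopg code: `LATENCY` and `fake_query` are made-up stand-ins, with `asyncio.sleep` simulating a query round-trip. Fifty sequential "queries" cost ~50× the latency, while fifty concurrent ones overlap their waits and finish in roughly one latency period; that overlap is where pool management and runtime scheduling start to matter.)

```python
import asyncio
from time import perf_counter

LATENCY = 0.02  # pretend every "query" costs 20 ms of network round-trip

async def fake_query() -> None:
    await asyncio.sleep(LATENCY)  # stand-in for awaiting the database driver

async def main() -> tuple[float, float]:
    # One connection, one query at a time: total time ~ 50 * LATENCY.
    start = perf_counter()
    for _ in range(50):
        await fake_query()
    sequential = perf_counter() - start

    # 50 concurrent requests: the event loop overlaps the waits,
    # so total time ~ LATENCY plus scheduling overhead.
    start = perf_counter()
    await asyncio.gather(*(fake_query() for _ in range(50)))
    concurrent = perf_counter() - start
    return sequential, concurrent

sequential, concurrent = asyncio.run(main())
print(f"sequential: {sequential:.3f}s, concurrent: {concurrent:.3f}s")
```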

@chandr-andr
Member

I've made a test with PostgreSQL 12 (from your docker-compose).

insert bulk time

shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.029  | 0.045   |
| 2   | 0.026  | 0.027   |
| 3   | 0.024  | 0.024   |
| 4   | 0.025  | 0.023   |
| 5   | 0.025  | 0.025   |
| 6   | 0.025  | 0.025   |
| 7   | 0.021  | 0.023   |
| 8   | 0.024  | 0.024   |
| 9   | 0.028  | 0.023   |
| 10  | 0.027  | 0.026   |
| avg | 0.025  | 0.027   |
get bulk time
shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.0    | 0.001   |
| 2   | 0.0    | 0.0     |
| 3   | 0.0    | 0.0     |
| 4   | 0.0    | 0.0     |
| 5   | 0.0    | 0.0     |
| 6   | 0.0    | 0.0     |
| 7   | 0.0    | 0.0     |
| 8   | 0.0    | 0.0     |
| 9   | 0.0    | 0.0     |
| 10  | 0.0    | 0.0     |
| avg | 0.0    | 0.0     |

@BimaAdi
Author

BimaAdi commented Jul 4, 2024

When inserting new rows with psqlpy, you don't really use a transaction.
https://github.com/BimaAdi/psqlpy-vs-psycopg/blob/main/bench_psqlpy.py#L29
This method is asynchronous.

I want the transaction applied only during the bulk insert, not on the entire function.

You are using an old version of PostgreSQL. As I can see, you are on version 12; I didn't run any tests against PostgreSQL 12.
So the question is: why so old?

It's the only version I had on my machine. I will try it on the latest PostgreSQL (16).

The main explanation:
PSQLPy aims first of all to give you performance in real high-load applications, because of how Rust's Tokio runtime manages asynchronous tasks.
Let's imagine what is happening in the test you made: you always have 1 connection that is already connected to PostgreSQL, and all the driver must do is query the database and get the results. There is practically no window for optimization.
On the other hand, when you have a lot of concurrent requests to the database, it matters how the driver manages the connection pool and how the programming language manages the event-loop tasks.

Great explanation! Next time I will try benchmarking it using a web server like FastAPI with high RPS. Maybe it will give a different result.

@chandr-andr
Member

I want the transaction applied only during the bulk insert, not on the entire function.

Okay, I thought you needed a transaction everywhere, because this is standard psycopg behavior: "Psycopg has a behaviour that may seem surprising compared to psql: by default, any database operation will start a new transaction."
(https://www.psycopg.org/psycopg3/docs/basic/transactions.html). So, practically, I did exactly the same in PSQLPy, but explicitly.

Great explanation! Next time I will try benchmarking it using a web server like FastAPI with high RPS. Maybe it will give a different result.

You can try our psqlpy-stress to do that: just install bombardier and run the application.
We are going to publish our various benchmarks this month. Spoiler: the real performance gain shows up only when PostgreSQL is located not on localhost but somewhere else (another server, for example). In my opinion that's the most important case, because it's really uncommon for the application and the database to be on the same machine; that setup applies only to small, non-high-load applications.

@chandr-andr
Member

@BimaAdi If you don't mind, I can transfer the issue to discussions, since it's not really an issue, more like a discussion. If we find real downsides of PSQLPy, I'll be happy to improve performance.

@BimaAdi
Author

BimaAdi commented Jul 4, 2024

@chandr-andr OK, I don't mind if this issue becomes a discussion. I made this issue because there were 0 discussions on the discussion tab, and I read issue #43 about benchmarks as well.

@chandr-andr
Member

Yeah, I understand. The library is young (4.5 months), so there isn't a lot of activity yet.

@psqlpy-python psqlpy-python locked and limited conversation to collaborators Jul 4, 2024
@chandr-andr chandr-andr converted this issue into discussion #64 Jul 4, 2024

