Implement count() method #74
Comments
I would agree to add In almost all cases I know, it is safe to not show the total count of whatever you return to the user.
len() and bsize() don't help if the customer wants to count filtered tuples without actually loading them to the client (over the network, in the general case). A full scan on a sharded space is not a big deal if it is not done often. We cannot avoid full scans completely in other cases; developers will always need to know how it works.
It is a big deal because it stops other things from accessing a database. And yes, we definitely can avoid full scans. We just won't allow them through the API.
If we don't allow full scans, they will not disappear from customers' tasks. The pain will just shift to another place. We need some kind of support for such tasks to be able to implement these things in connectors.
Your statement is demonstrably false. Aerospike and Redis can exist without such queries. They have the ability to iterate over the collection on the client the same way we propose to do with To count the items of a large collection you can create a separate space with counters. With interactive transactions, you can atomically update both of those spaces from the client. This will not require you to write any additional code. If you have only a few items to count, you can just select them all.
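The counter-space idea from the comment above can be sketched roughly as follows. This is a plain Python model, not Tarantool code: the `Store` class, its `items` and `counters` dictionaries, and the lock standing in for a transaction are all assumptions made for illustration.

```python
import threading


class Store:
    """Toy model: a data 'space' plus a separate 'space' of counters."""

    def __init__(self):
        self.items = {}      # primary key -> (category, payload)
        self.counters = {}   # category -> number of items in that category
        self._txn = threading.Lock()  # stands in for a transaction (assumption)

    def insert(self, key, category, payload):
        # Update the data and its counter together, so a reader never
        # observes one without the other.
        with self._txn:
            if key in self.items:
                raise KeyError(f"duplicate key: {key!r}")
            self.items[key] = (category, payload)
            self.counters[category] = self.counters.get(category, 0) + 1

    def delete(self, key):
        with self._txn:
            category, _ = self.items.pop(key)
            self.counters[category] -= 1

    def count(self, category):
        # A single lookup instead of a scan over items.
        with self._txn:
            return self.counters.get(category, 0)


store = Store()
store.insert(1, "sms", "hello")
store.insert(2, "sms", "bye")
store.insert(3, "email", "welcome")
store.delete(2)
print(store.count("sms"), store.count("email"))
```

The trade-off is that only groupings chosen in advance can be counted this way; an arbitrary condition still needs a scan or a select.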
But counters in special spaces look like an implementation detail. Why can't we have a count() method in the CRUD API that does all this boilerplate under the hood? As I see it, for every simple task like count that may involve scan complexity, we are going to push customers to reinvent the wheel. And connectors cannot help to avoid this because there is no DDL API for now. UPD: There is also the problem that the CRUD API doesn't rely on any DDL API at the moment.
Seems it's time to triage once more because we have the following use case: So I suggest making count_async:
To avoid any locks and slowdowns, implement a mutex that guarantees that storage_count_async runs no more than N times simultaneously on each storage.
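A minimal sketch of such a limit, modelled in Python with threads standing in for requests arriving at one storage. `MAX_CONCURRENT_COUNTS` and `storage_count` are hypothetical names, and the sleep only simulates scan time; this is not the crud implementation.

```python
import threading
import time

MAX_CONCURRENT_COUNTS = 2  # the "N" from the comment above (assumed value)

_slots = threading.BoundedSemaphore(MAX_CONCURRENT_COUNTS)
_stats_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of counts observed running at once


def storage_count(rows, predicate):
    """Count matching rows, with at most N counts running at once."""
    global _running, peak_running
    with _slots:  # blocks while N counts are already in progress
        with _stats_lock:
            _running += 1
            peak_running = max(peak_running, _running)
        try:
            time.sleep(0.01)  # simulate a scan that takes some time
            return sum(1 for row in rows if predicate(row))
        finally:
            with _stats_lock:
                _running -= 1


rows = list(range(100))
results = []
results_lock = threading.Lock()


def worker():
    n = storage_count(rows, lambda r: r % 2 == 0)
    with results_lock:
        results.append(n)


threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Here extra callers wait for a free slot; a variant could instead return an error immediately when all N slots are taken.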
@no1seman Because, for example, there is still a frequent task on the cluster to write a set of data on the storage in a transaction. And in this transaction on the storage, you need to perform many different operations. It is not necessary to send the procedure code through crud; you can simply teach it to call an already existing stored procedure.
Let's do something like
Where options:
count looks through space indexes and finds an index for
The same for bsize, pairs
@unera Why not use the same API as select/pairs? The main difference from select/pairs: count does not fetch the data, and it does its work with yields. So it seems we need the following options:
One more way to kill the whole cluster with one wrong query.
I agree :) I didn't think that the question and select/pairs are different. So, let's do it the same way as select/pairs. Drop my comment from 1 Oct.
`local objects, err = crud.count(space_name, conditions, opts)`
Syntax is the same, excluding options:
@unera batch_size may be used as the number of pairs cycles between yields, or there may be some other option.
What about this case: Can we, instead of inventing one more non-working 'killer feature', make a general map/reduce?
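For reference, a count can itself be phrased as a map/reduce over storages: each storage maps its local rows to a partial count, and the router reduces the partials by summing them. The sketch below models that in Python; `map_reduce`, the shard lists and the row layout are illustrative assumptions, not an existing crud API.

```python
from functools import reduce


def map_reduce(shards, map_fn, reduce_fn, initial):
    """Apply map_fn to every shard, then fold the partial results."""
    partials = [map_fn(shard) for shard in shards]  # would run on the storages
    return reduce(reduce_fn, partials, initial)     # would run on the router


# Three "storages", each holding rows of (id, catalog).
shards = [
    [(1, "a"), (2, "b")],
    [(3, "a")],
    [(4, "a"), (5, "c"), (6, "a")],
]


def count_catalog_a(shard):
    # Map step: partial count of rows whose catalog is "a".
    return sum(1 for _id, catalog in shard if catalog == "a")


total = map_reduce(shards, count_catalog_a, lambda x, y: x + y, 0)
print(total)
```

Only the partial counts cross the network in this model, never the rows themselves.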
For `count` implementation with the support of the query by conditions there is a need to use query plan and condition filters that has been already written for select. This commit separates common methods from select module and moves them in common folders. Part of #74
This commit introduces count method that:
* has arguments and options like `select()`/`pairs()`;
* counts number of rows in space with yield by `count_to_yield`;
* counts by any conditions.

Closes #74
This commit introduces count method that:
* has arguments and options like `select()`/`pairs()`;
* counts number of rows in space with yield by `yield_every`;
* counts by any conditions.

Closes #74
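The `yield_every` behaviour named in the commit message can be illustrated with a small model. This is plain Python, not the crud code: `count_with_yield` and the `on_yield` hook are hypothetical, the hook stands in for handing control to other fibers, and the sketch assumes the yield is triggered by the number of scanned rows.

```python
def count_with_yield(rows, predicate, yield_every, on_yield):
    """Count rows matching predicate, calling on_yield() after every
    yield_every scanned rows so that other work can run."""
    if yield_every <= 0:
        raise ValueError("yield_every must be positive")
    matched = 0
    scanned = 0
    for row in rows:
        if predicate(row):
            matched += 1
        scanned += 1
        if scanned % yield_every == 0:
            on_yield()  # assumption: yield is based on scanned rows
    return matched


yields = []
rows = range(1, 11)  # 10 rows: 1..10
result = count_with_yield(
    rows,
    predicate=lambda r: r > 3,
    yield_every=4,
    on_yield=lambda: yields.append("yield"),
)
print(result, len(yields))
```

A smaller `yield_every` gives other requests more chances to run at the cost of a slower count; a larger one does the opposite.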
Use cases:
a) A customer wants to see counts of templates in message template catalogs in the application UI. The templates and catalogs are stored in vshard and operated on via CRUD through a connector.
b) Pagination -- we need to know the total number of results to display it in the UI.
The count() method must accept conditions.
Proposed API variant: