-
Notifications
You must be signed in to change notification settings - Fork 712
feature: Log data sizes in load test benchmarks #1949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Log data sizes in load test benchmarks #1949
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
tests/load/test_s3_modin.py
Outdated
@pytest.fixture(scope="function") | ||
def df_s() -> pd.DataFrame: | ||
# Data frame with 100000 rows | ||
ray_ds = ray.data.read_parquet("s3://ursa-labs-taxi-data/2010/02/data.parquet") | ||
return ray_ds.to_modin() | ||
|
||
|
||
@pytest.fixture(scope="function") | ||
def big_modin_df() -> pd.DataFrame: | ||
pandas_refs = ray.data.range_table(100_000).to_pandas_refs() | ||
dataset = ray.data.from_pandas_refs(pandas_refs) | ||
|
||
frame = dataset.to_modin() | ||
frame["foo"] = frame.value * 2 | ||
frame["bar"] = frame.value % 2 | ||
|
||
return frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are defined in test_s3
as well, should we just move them into a fixture file?
def test_modin_s3_read_parquet_simple(benchmark_time: float, request: pytest.FixtureRequest) -> None: | ||
path = "s3://ursa-labs-taxi-data/2018/" | ||
with ExecutionTimer(request, data_paths=path) as timer: | ||
ray_ds = ray.data.read_parquet(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we stage data in a Ray dataset first when Modin already has dedicated methods?
The test I had in mind was simply: pd.read_parquet(path)
where pd is import modin.pandas as pd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reasons I replaced read Parquet with the Ray version is that Pandas was having a lot of trouble with the partition style (e.g. the partitions are 01
rather than month=01
).
Once we move the load test data into our bucket, we can make sure that the data structure is such that both Modin and AWS SDK for pandas can access it natively: #1962
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This reverts commit a314437.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
This comment was marked as outdated.
This comment was marked as outdated.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Feature or Bugfix
Detail
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.