
Added logic to include partition metadata for no-copy #51


subkanthi
Collaborator

closes: #46

@subkanthi
Collaborator Author

 insert btc.transactions_no_copy --no-copy --force-no-copy --s3-region=us-east-2 --s3-no-sign-request -p s3://aws-public-blockchain/v1.0/btc/transactions/date=2025-01-01/part-00000-33e8d075-2099-409b-a806-68dd17217d39-c000.snappy.parquet --partition='[{"column": "block_timestamp", "transform": "month"}]'
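The `--partition` value in this command asks for a `month` transform on `block_timestamp`. The Iceberg spec defines that transform as the number of months since 1970-01. A minimal stdlib-only sketch of the mapping, assuming the timestamp is evaluated in UTC; `MonthTransform` and `month` are hypothetical names, not part of ice or the Iceberg API:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class MonthTransform {
    /**
     * Months between 1970-01 and the month containing ts, evaluated in UTC.
     * Intended to mirror the Iceberg "month" transform for timestamp values;
     * months before 1970 come out negative (1969-12 -> -1).
     */
    static int month(Instant ts) {
        ZonedDateTime t = ts.atZone(ZoneOffset.UTC);
        return (t.getYear() - 1970) * 12 + (t.getMonthValue() - 1);
    }

    public static void main(String[] args) {
        // An example timestamp on 2025-01-01, the date in the S3 path above
        Instant ts = Instant.parse("2025-01-01T13:45:00Z");
        System.out.println(month(ts)); // prints 660 (55 years * 12 months)
    }
}
```

Every row of a file registered in this table has to map to the same month value for a single partition key to describe the file correctly.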

@subkanthi
Collaborator Author

To set the partition key on a DataFile, we need the partition values:

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.PartitionKey;

PartitionKey partitionKey = new PartitionKey(table.spec(), table.schema());
// If the S3 path already reflects the partitioning (e.g. s3://your-bucket/path/to/your/datafile_hour=10.parquet),
// the values could be parsed from it. Strictly, though, the key must describe the rows inside the file,
// so it should be derived from the rows themselves (each a StructLike): partitionKey.partition(row).

// Or set the partition values manually (position 0 = first field in the spec, e.g. hour):
partitionKey.set(0, hourValue);

// Attach the key to the DataFile (path, size and recordCount are placeholders):
DataFile dataFile = DataFiles.builder(table.spec()).withPath(path).withFormat(FileFormat.PARQUET)
    .withFileSizeInBytes(size).withRecordCount(recordCount).withPartition(partitionKey).build();
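The comments in this snippet suggest reading partition values from the object path. A minimal sketch of that idea, assuming a Hive-style `key=value` directory layout like the `date=2025-01-01` segment in the S3 URL of the earlier command. `HivePathPartitions` is a hypothetical helper; it inspects directory segments only (not the file name) and does not URL-decode values:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HivePathPartitions {
    /**
     * Collects key=value directory segments from an object path, e.g.
     * s3://bucket/table/date=2025-01-01/part-0.parquet -> {date=2025-01-01}.
     */
    static Map<String, String> parse(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        String[] segments = path.split("/");
        // Stop before the last segment: it is the file name, not a directory.
        for (int i = 0; i < segments.length - 1; i++) {
            String seg = segments[i];
            int eq = seg.indexOf('=');
            if (eq > 0) {
                values.put(seg.substring(0, eq), seg.substring(eq + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String path = "s3://aws-public-blockchain/v1.0/btc/transactions/date=2025-01-01/"
            + "part-00000-33e8d075-2099-409b-a806-68dd17217d39-c000.snappy.parquet";
        System.out.println(parse(path)); // prints {date=2025-01-01}
    }
}
```

Values recovered this way only describe the file's rows if the producer guaranteed the data is pre-partitioned; the path segment is a naming convention, not a constraint on the contents.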

subkanthi marked this pull request as ready for review August 12, 2025 02:48
@shyiko
Collaborator

shyiko commented Aug 18, 2025

Closing in favor of the logic included in 0.5.0, which infers the partition key from metadata instead of data. The approach at https://github.com/Altinity/ice/pull/51/files#diff-efe5f830dfd30841f29c50a9f843a6c295aafb4bfd3d60202bb22f8680272686R559 is also not a valid way to determine the partition key unless the data is guaranteed to be pre-partitioned.
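The thread does not say which metadata 0.5.0 uses. One plausible reading, which is an assumption here and not confirmed by the comment, is per-file column statistics such as the Parquet min/max of `block_timestamp`. Because the month transform is monotonic, if the minimum and maximum fall in the same month, every row does, and the file can carry that single partition value; otherwise the file spans partitions and no single key is correct. A hypothetical sketch of that check (`MonthFromStats` and `monthFromBounds` are illustrative names):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.OptionalInt;

class MonthFromStats {
    // Iceberg-style month transform: months since 1970-01, evaluated in UTC.
    static int month(Instant ts) {
        ZonedDateTime t = ts.atZone(ZoneOffset.UTC);
        return (t.getYear() - 1970) * 12 + (t.getMonthValue() - 1);
    }

    /**
     * Given the min and max of the partition source column for one file
     * (hypothetically read from its column statistics), returns the month
     * partition value if all rows must share it, or empty if they may not.
     */
    static OptionalInt monthFromBounds(Instant min, Instant max) {
        int lo = month(min);
        int hi = month(max);
        return lo == hi ? OptionalInt.of(lo) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Both bounds in January 2025: the file maps to one partition.
        System.out.println(monthFromBounds(
            Instant.parse("2025-01-01T00:10:00Z"), Instant.parse("2025-01-31T23:50:00Z")));
        // Bounds straddle a month boundary: no single partition key fits.
        System.out.println(monthFromBounds(
            Instant.parse("2025-01-31T23:00:00Z"), Instant.parse("2025-02-01T01:00:00Z")));
    }
}
```

Nulls in the source column and non-UTC timestamp semantics are not handled in this sketch.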

shyiko closed this Aug 18, 2025
Development

Successfully merging this pull request may close these issues.

When '--no-copy' is used for loading files into partitioned table, partition metadata is not populated
2 participants