ayushtkn
Member

What changes were proposed in this pull request?

Leverage the Iceberg V3 Column Defaults

Why are the changes needed?

To maintain column defaults in the Iceberg spec and to be able to use defaults for individual fields of a Struct.

Does this PR introduce any user-facing change?

Yes. Column defaults are now persisted in the Iceberg spec, and defaults can be specified for specific fields of a Struct type.

How was this patch tested?

Unit tests (UT).

@ayushtkn ayushtkn changed the title from "Iceberg: [V3] Add support for native default column type during create (WIP)" to "HIVE-29192: Iceberg: [V3] Add support for native default column type during create" on Sep 16, 2025
@apache apache deleted a comment from github-actions bot Sep 16, 2025
@apache apache deleted a comment from github-actions bot Sep 16, 2025
@apache apache deleted a comment from github-actions bot Sep 16, 2025

This comment was marked as outdated.

github-actions bot commented Sep 16, 2025

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (6)

calcualtion
Chrono
getenv
ntz
OOM
unsign

Previously acknowledged words that are now absent: www
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:ayushtkn/hive.git repository
on the HIVE-29192 branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://github.com/api/repos/apache/hive/issues/comments/3300130684" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text

If items relate to a ...

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

  • binary file.

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

Map<String, String> defaultValues) {
HiveSchemaConverter converter = new HiveSchemaConverter(autoConvert);
- return new Schema(converter.convertInternal(names, typeInfos, comments));
+ return new Schema(converter.convertInternal(names, typeInfos, comments, defaultValues));
Member

minor: comments probably should be the last arg


if (defaultValues.containsKey(columnName)) {
if (type.isPrimitiveType()) {
Object icebergDefaultValue = getDefaultValue(stripQuotes(defaultValues.get(columnName)), type);
Member

can we move stripQuotes inside of getDefaultValue
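
A minimal sketch of what this suggestion could look like, assuming a simplified type model. `DefaultValueSketch`, `PrimitiveKind`, and the parsing rules are hypothetical stand-ins for illustration; the real code converts against Iceberg `Type` objects and may handle quoting differently.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for the PR's default-value conversion, not the actual Hive/Iceberg code.
final class DefaultValueSketch {

  // Simplified set of primitive kinds (assumption); the real code uses Iceberg types.
  enum PrimitiveKind { BOOLEAN, INT, LONG, DOUBLE, DECIMAL, STRING }

  // Removes one pair of matching single or double quotes, if present.
  static String stripQuotes(String value) {
    if (value != null && value.length() >= 2) {
      char first = value.charAt(0);
      char last = value.charAt(value.length() - 1);
      if ((first == '\'' || first == '"') && first == last) {
        return value.substring(1, value.length() - 1);
      }
    }
    return value;
  }

  // Quote stripping happens here, so callers can pass the raw DEFAULT literal.
  static Object getDefaultValue(String rawDefault, PrimitiveKind kind) {
    String text = stripQuotes(rawDefault);
    if (text == null) {
      return null;
    }
    switch (kind) {
      case BOOLEAN: return Boolean.parseBoolean(text);
      case INT: return Integer.parseInt(text);
      case LONG: return Long.parseLong(text);
      case DOUBLE: return Double.parseDouble(text);
      case DECIMAL: return new BigDecimal(text);
      default: return text;
    }
  }

  public static void main(String[] args) {
    System.out.println(getDefaultValue("'42'", PrimitiveKind.INT));
    System.out.println(getDefaultValue("\"hello\"", PrimitiveKind.STRING));
  }
}
```

With this shape, the call site reduces to `getDefaultValue(defaultValues.get(columnName), type)` and no caller can forget to strip the quotes.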

}

private Map<String, String> getDefaultValuesMap(String defaultValue) {
if (defaultValue == null || defaultValue.isEmpty()) {
Member

could be simplified with StringUtils.isEmpty()

* @return An equivalent Iceberg Schema
*/
- public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert) {
+ public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert, Map<String, String> defaultValues) {
Member

autoConvert should be last arg

.orElse(Collections.emptySet());

Schema schema = schema(catalogProperties, hmsTable, identifierFields);
Map<String, String> defaultValues = Stream.ofNullable(request.getDefaultConstraints()).flatMap(Collection::stream)
Member

can we move this into schema method?
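
For reference, a self-contained sketch of the null-safe collection step that could live inside such a helper. `DefaultConstraint` and its getters are hypothetical stand-ins for the metastore constraint objects returned by `request.getDefaultConstraints()`; only `Stream.ofNullable` and `flatMap(Collection::stream)` come from the diff above, and the last-one-wins merge rule is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DefaultConstraintsSketch {

  // Hypothetical stand-in for a metastore default constraint: column name plus DEFAULT literal.
  static final class DefaultConstraint {
    private final String columnName;
    private final String defaultValue;

    DefaultConstraint(String columnName, String defaultValue) {
      this.columnName = columnName;
      this.defaultValue = defaultValue;
    }

    String getColumnName() {
      return columnName;
    }

    String getDefaultValue() {
      return defaultValue;
    }
  }

  // Turns a possibly-null constraint list into a column -> default literal map.
  // Stream.ofNullable (Java 9+) yields an empty stream for null, so no explicit null check is needed.
  // Note: Collectors.toMap rejects null values, so constraints without a literal would need filtering.
  static Map<String, String> defaultValues(List<DefaultConstraint> constraints) {
    return Stream.ofNullable(constraints)
        .flatMap(Collection::stream)
        .collect(Collectors.toMap(
            DefaultConstraint::getColumnName,
            DefaultConstraint::getDefaultValue,
            (first, second) -> second, // assumption: a later constraint on the same column wins
            LinkedHashMap::new));
  }

  public static void main(String[] args) {
    System.out.println(defaultValues(null));
    System.out.println(defaultValues(List.of(new DefaultConstraint("id", "0"))));
  }
}
```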

preAlterTableProperties.format = sd.getInputFormat();
- preAlterTableProperties.schema = schema(catalogProperties, hmsTable, Collections.emptySet());
+ preAlterTableProperties.schema =
+ schema(catalogProperties, hmsTable, Collections.emptySet(), Collections.emptyMap());
Member

is it ok to pass empty list here?

return Collections.emptyMap();
}
// For Struct, the default value is expected to be in key:value format
return Arrays.stream(stripQuotes(defaultValue).split(","))
Member

would it be better to use guava splitter here?

Splitter.on(',').trimResults().withKeyValueSeparator(':')
  .split(stripQuotes(defaultValue))
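
To make the expected `key:value` format concrete, here is a JDK-only sketch of the same parsing step. It is not the Guava call suggested above and may differ from it on edge cases such as duplicate keys or values that contain a colon. `StructDefaultParserSketch` is a hypothetical name, and the input is assumed to have had its surrounding quotes stripped already.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StructDefaultParserSketch {

  // Parses "field1:value1, field2:value2" into an ordered field -> default literal map.
  static Map<String, String> parse(String defaultValue) {
    Map<String, String> result = new LinkedHashMap<>();
    if (defaultValue == null || defaultValue.trim().isEmpty()) {
      return result;
    }
    for (String entry : defaultValue.split(",")) {
      int separator = entry.indexOf(':');
      if (separator < 0) {
        throw new IllegalArgumentException("Expected key:value but got: " + entry);
      }
      // Split on the first colon only and trim whitespace around both parts.
      result.put(entry.substring(0, separator).trim(), entry.substring(separator + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("a:1, b:2"));
  }
}
```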

}

@Override
public boolean supportsNativeColumnDefault(Map<String, String> tblProps) {
Member

@deniskuzZ deniskuzZ Sep 17, 2025

supportsDefaultColumnValues ?

super(table, newDataWriter(table, fileWriterFactory, dataFileFactory, context));

this.currentSpecId = table.spec().specId();
this.missingColumns = Optional.ofNullable(missingColumns)
Member

@deniskuzZ deniskuzZ Sep 17, 2025

why not pass it as a Set initially? Also, maybe set this in a Context

@Override
public void write(Writable row) throws IOException {
Record record = ((Container<Record>) row).get();
setDefault(specs.get(currentSpecId).schema().asStruct().fields(), record, missingColumns);
Member

@deniskuzZ deniskuzZ Sep 17, 2025

should it be applied to every row? can we optimize?

Member Author

Only one record is pushed to the writer at a time, and this is the place where we have the spec needed to extract the defaults from the Iceberg layer. Or did I misunderstand your question?
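
One possible reading of the optimization question, sketched under simplifying assumptions: resolve which defaults apply once, when the writer is created, so that `write` only copies precomputed values into each row. `RowDefaultsSketch` is hypothetical and models a row as a plain `Map`; the actual writer works with Iceberg `Record` and `Types.NestedField`, and this sketch ignores nested struct fields and spec changes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class RowDefaultsSketch {

  // Defaults for the columns the statement did not supply, resolved once up front.
  private final Map<String, Object> defaultsToApply;

  RowDefaultsSketch(Map<String, Object> schemaDefaults, Set<String> missingColumns) {
    Map<String, Object> resolved = new LinkedHashMap<>();
    for (String column : missingColumns) {
      Object value = schemaDefaults.get(column);
      if (value != null) {
        resolved.put(column, value);
      }
    }
    this.defaultsToApply = resolved;
  }

  // Per-row work is limited to copying the precomputed entries.
  void applyTo(Map<String, Object> row) {
    defaultsToApply.forEach(row::put);
  }

  public static void main(String[] args) {
    RowDefaultsSketch defaults = new RowDefaultsSketch(Map.of("a", 1, "b", "x"), Set.of("b"));
    Map<String, Object> row = new HashMap<>();
    row.put("a", 5);
    defaults.applyTo(row);
    System.out.println(row.get("a") + " " + row.get("b"));
  }
}
```

Whether this helps in practice depends on how expensive the per-row schema lookup is compared with the write itself, which the thread does not measure.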

writer.write(record, specs.get(currentSpecId), partition(record, currentSpecId));
}

private static void setDefault(List<Types.NestedField> fields, Record record, Set<String> missingColumns) {
Member

@deniskuzZ deniskuzZ Sep 17, 2025

I think these helper methods should be extracted from the writer

@deniskuzZ
Member

cc @kasakrisz for the compiler part.

@apache apache deleted a comment from github-actions bot Sep 17, 2025
github-actions bot commented Sep 17, 2025

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (6)

calcualtion
Chrono
getenv
ntz
OOM
unsign


3 participants