HIVE-29192: Iceberg: [V3] Add support for native default column type during create #6074

ayushtkn · 2025-09-11T23:09:08Z

What changes were proposed in this pull request?

Leverage the Iceberg V3 Column Defaults

Why are the changes needed?

To persist column defaults in the Iceberg spec and to allow defaults for individual fields of a struct.

Does this PR introduce any user-facing change?

Yes. Column defaults are now persisted in the Iceberg spec, and defaults can be specified for individual fields of a struct type.

How was this patch tested?

Unit tests (UT).

github-actions · 2025-09-16T19:47:14Z

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (6)

calcualtion
Chrono
getenv
ntz
OOM
unsign

Previously acknowledged words that are now absent

www

To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:ayushtkn/hive.git repository
on the HIVE-29192 branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://github.com/api/repos/apache/hive/issues/comments/3300130684" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u

If the flagged items do not appear to be text

If items relate to a ...

well-formed pattern: If you can write a pattern that would match it, try adding it to the patterns.txt file. Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines. Note that patterns can't match multiline strings.

binary file: Please add a file path to the excludes.txt file matching the containing file. File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files. ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

deniskuzZ · 2025-09-17T14:06:31Z

iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java

+      Map<String, String> defaultValues) {
    HiveSchemaConverter converter = new HiveSchemaConverter(autoConvert);
-    return new Schema(converter.convertInternal(names, typeInfos, comments));
+    return new Schema(converter.convertInternal(names, typeInfos, comments, defaultValues));


minor: comments probably should be the last arg

deniskuzZ · 2025-09-17T14:09:39Z

iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java

+
+      if (defaultValues.containsKey(columnName)) {
+        if (type.isPrimitiveType()) {
+          Object icebergDefaultValue = getDefaultValue(stripQuotes(defaultValues.get(columnName)), type);


Can we move stripQuotes inside getDefaultValue?

deniskuzZ · 2025-09-17T14:11:38Z

iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java

  }
+
+  private Map<String, String> getDefaultValuesMap(String defaultValue) {
+    if (defaultValue == null || defaultValue.isEmpty()) {


could be simplified with StringUtils.isEmpty()

deniskuzZ · 2025-09-17T14:21:40Z

iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java

   * @return An equivalent Iceberg Schema
   */
-  public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert) {
+  public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert, Map<String, String> defaultValues) {


autoConvert should be last arg

deniskuzZ · 2025-09-17T14:24:13Z

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/BaseHiveIcebergMetaHook.java

        .orElse(Collections.emptySet());

-    Schema schema = schema(catalogProperties, hmsTable, identifierFields);
+    Map<String, String> defaultValues = Stream.ofNullable(request.getDefaultConstraints()).flatMap(Collection::stream)


can we move this into schema method?
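
For context, a minimal sketch of the null-safe pattern in the hunk above, pulled into a helper so it could live inside a method such as schema. The `DefaultConstraint` class and the `defaultValues` helper are hypothetical stand-ins for illustration, not the PR's actual types, and the collector step is an assumption since the hunk is cut off before it.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DefaultValuesSketch {
  // Hypothetical stand-in for one default-constraint entry (column name plus default literal).
  static final class DefaultConstraint {
    final String columnName;
    final String defaultValue;

    DefaultConstraint(String columnName, String defaultValue) {
      this.columnName = columnName;
      this.defaultValue = defaultValue;
    }
  }

  // Turns a possibly-null constraint list into a column -> default map.
  // Stream.ofNullable yields an empty stream for null, so no explicit null check is needed.
  // Note: Collectors.toMap throws on duplicate column names and on null values.
  static Map<String, String> defaultValues(List<DefaultConstraint> constraints) {
    return Stream.ofNullable(constraints)
        .flatMap(Collection::stream)
        .collect(Collectors.toMap(c -> c.columnName, c -> c.defaultValue));
  }
}
```

With a helper of this shape, callers that have no constraints could pass null or an empty list and still get an empty map back.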

deniskuzZ · 2025-09-17T14:25:57Z

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java

      preAlterTableProperties.format = sd.getInputFormat();
-      preAlterTableProperties.schema = schema(catalogProperties, hmsTable, Collections.emptySet());
+      preAlterTableProperties.schema =
+          schema(catalogProperties, hmsTable, Collections.emptySet(), Collections.emptyMap());


is it ok to pass empty list here?

deniskuzZ · 2025-09-17T14:31:01Z

iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java

+      return Collections.emptyMap();
+    }
+    // For Struct, the default value is expected to be in key:value format
+    return Arrays.stream(stripQuotes(defaultValue).split(","))


would it be better to use guava splitter here?

Splitter.on(',').trimResults().withKeyValueSeparator(':').split(stripQuotes(defaultValue))
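
To make the expected key:value format concrete, here is a minimal plain-JDK sketch of the same parsing. It is a stand-in for either the stream-based code in the hunk or the Guava Splitter suggestion, not the PR's implementation; the class and method names are hypothetical, and quote stripping is assumed to have happened before the call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StructDefaultsSketch {
  // Parses "field1:value1, field2:value2" into an ordered field -> value map.
  // Limitation of this sketch: values that themselves contain ',' are not supported.
  static Map<String, String> parse(String defaultValue) {
    Map<String, String> result = new LinkedHashMap<>();
    if (defaultValue == null || defaultValue.isEmpty()) {
      return result;
    }
    for (String entry : defaultValue.split(",")) {
      // Split on the first ':' only, so a value may contain further colons.
      String[] keyValue = entry.split(":", 2);
      if (keyValue.length != 2) {
        throw new IllegalArgumentException("Expected key:value but got: " + entry);
      }
      result.put(keyValue[0].trim(), keyValue[1].trim());
    }
    return result;
  }
}
```

Unlike a hand-rolled split, the Guava variant would fail fast on malformed entries without extra code, which may be part of the motivation for the suggestion.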

deniskuzZ · 2025-09-17T14:35:59Z

iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java

  }

+  @Override
+  public boolean supportsNativeColumnDefault(Map<String, String> tblProps) {


supportsDefaultColumnValues ?

deniskuzZ · 2025-09-17T14:38:04Z

...iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergRecordWriter.java

    super(table, newDataWriter(table, fileWriterFactory, dataFileFactory, context));

    this.currentSpecId = table.spec().specId();
+    this.missingColumns = Optional.ofNullable(missingColumns)


why not pass it as a Set initially? also maybe set this in a Context

deniskuzZ · 2025-09-17T14:41:02Z

...iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergRecordWriter.java

  @Override
  public void write(Writable row) throws IOException {
    Record record = ((Container<Record>) row).get();
+    setDefault(specs.get(currentSpecId).schema().asStruct().fields(), record, missingColumns);


should it be applied to every row? can we optimize?

Only one record is pushed to us at a time, and this is the place where we have the spec needed to extract the defaults from the Iceberg layer. Or did I misunderstand your question?
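
One possible reading of the optimization question, sketched under assumptions: resolve the defaults for the missing columns once when the writer is constructed, so the per-row work is only a fill of absent values. The class below is hypothetical and uses a plain Map as a stand-in for an Iceberg Record; it is not the PR's code and it ignores nested struct fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class DefaultFillerSketch {
  // Defaults restricted to the columns missing from the incoming rows, resolved once.
  private final Map<String, Object> defaultsForMissing;

  DefaultFillerSketch(Map<String, Object> schemaDefaults, Set<String> missingColumns) {
    Map<String, Object> resolved = new HashMap<>();
    for (String column : missingColumns) {
      Object value = schemaDefaults.get(column);
      if (value != null) {
        resolved.put(column, value);
      }
    }
    this.defaultsForMissing = resolved;
  }

  // Per-row step: only fills columns that are absent or null in the row.
  void apply(Map<String, Object> row) {
    for (Map.Entry<String, Object> entry : defaultsForMissing.entrySet()) {
      row.putIfAbsent(entry.getKey(), entry.getValue());
    }
  }
}
```

The schema walk still has to happen where the spec is available, as the reply notes; the sketch only moves it out of the per-row path.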

deniskuzZ · 2025-09-17T14:42:35Z

...iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergRecordWriter.java

    writer.write(record, specs.get(currentSpecId), partition(record, currentSpecId));
  }

+  private static void setDefault(List<Types.NestedField> fields, Record record, Set<String> missingColumns) {


I think these helper methods should be extracted from the writer

deniskuzZ · 2025-09-17T14:52:33Z

cc @kasakrisz for the compiler part.

github-actions · 2025-09-17T20:35:16Z

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (6)

calcualtion
Chrono
getenv
ntz
OOM
unsign

Previously acknowledged words that are now absent

www

sonarqubecloud · 2025-09-18T00:32:28Z

Quality Gate passed

Issues
11 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

asf-ci-hive added tests pending tests failed and removed tests pending labels Sep 11, 2025

ayushtkn force-pushed the HIVE-29192 branch from 72d0223 to 06efe12 Compare September 12, 2025 12:08

asf-ci-hive added tests pending tests unstable tests passed and removed tests failed tests pending tests unstable labels Sep 12, 2025

ayushtkn changed the title ~~Iceberg: [V3] Add support for native default column type during create (WIP)~~ HIVE-29192: Iceberg: [V3] Add support for native default column type during create Sep 16, 2025

asf-ci-hive added tests pending and removed tests passed labels Sep 16, 2025

apache deleted a comment from github-actions bot Sep 16, 2025

asf-ci-hive added tests unstable and removed tests pending labels Sep 16, 2025

HIVE-29192: Iceberg: [V3] Add support for native default column type …

a8c6c85

…during create

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels Sep 16, 2025

deniskuzZ reviewed Sep 17, 2025

Add the missing tests

6fce249

asf-ci-hive added tests pending and removed tests passed labels Sep 17, 2025

apache deleted a comment from github-actions bot Sep 17, 2025

Address Review Comments

eebe960

asf-ci-hive added tests passed tests pending and removed tests pending tests passed labels Sep 17, 2025

asf-ci-hive added tests passed and removed tests pending labels Sep 18, 2025
