-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29192: Iceberg: [V3] Add support for native default column type during create #6074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
72d0223
to
06efe12
Compare
This comment was marked as outdated.
This comment was marked as outdated.
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (6)calcualtion Previously acknowledged words that are now absentwwwTo accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ayushtkn/hive.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
Map<String, String> defaultValues) { | ||
HiveSchemaConverter converter = new HiveSchemaConverter(autoConvert); | ||
return new Schema(converter.convertInternal(names, typeInfos, comments)); | ||
return new Schema(converter.convertInternal(names, typeInfos, comments, defaultValues)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: comments
probably should be the last arg
|
||
if (defaultValues.containsKey(columnName)) { | ||
if (type.isPrimitiveType()) { | ||
Object icebergDefaultValue = getDefaultValue(stripQuotes(defaultValues.get(columnName)), type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move stripQuotes
inside of getDefaultValue
} | ||
|
||
private Map<String, String> getDefaultValuesMap(String defaultValue) { | ||
if (defaultValue == null || defaultValue.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be simplified with StringUtils.isEmpty()
* @return An equivalent Iceberg Schema | ||
*/ | ||
public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert) { | ||
public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert, Map<String, String> defaultValues) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
autoConvert should be last arg
.orElse(Collections.emptySet()); | ||
|
||
Schema schema = schema(catalogProperties, hmsTable, identifierFields); | ||
Map<String, String> defaultValues = Stream.ofNullable(request.getDefaultConstraints()).flatMap(Collection::stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this into schema method?
preAlterTableProperties.format = sd.getInputFormat(); | ||
preAlterTableProperties.schema = schema(catalogProperties, hmsTable, Collections.emptySet()); | ||
preAlterTableProperties.schema = | ||
schema(catalogProperties, hmsTable, Collections.emptySet(), Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it ok to pass empty list here?
return Collections.emptyMap(); | ||
} | ||
// For Struct, the default value is expected to be in key:value format | ||
return Arrays.stream(stripQuotes(defaultValue).split(",")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better to use guava splitter here?
Splitter.on(',').trimResults().withKeyValueSeparator(':')
.split(stripQuotes(defaultValue))
} | ||
|
||
@Override | ||
public boolean supportsNativeColumnDefault(Map<String, String> tblProps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supportsDefaultColumnValues ?
super(table, newDataWriter(table, fileWriterFactory, dataFileFactory, context)); | ||
|
||
this.currentSpecId = table.spec().specId(); | ||
this.missingColumns = Optional.ofNullable(missingColumns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not pas as Set initially? also maybe set this in a Context
@Override | ||
public void write(Writable row) throws IOException { | ||
Record record = ((Container<Record>) row).get(); | ||
setDefault(specs.get(currentSpecId).schema().asStruct().fields(), record, missingColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be applied to every row? can we optimize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get one record only pushed & this is the place where we have the spec to extract the defaults from the Iceberg layer or did I catch your question wrong?
writer.write(record, specs.get(currentSpecId), partition(record, currentSpecId)); | ||
} | ||
|
||
private static void setDefault(List<Types.NestedField> fields, Record record, Set<String> missingColumns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this helper methods should be extracted from the writer
cc @kasakrisz for the compiler part. |
@check-spelling-bot Report🔴 Please reviewSee the files view or the action log for details. Unrecognized words (6)calcualtion Previously acknowledged words that are now absentwwwTo accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the [email protected]:ayushtkn/hive.git repository
If the flagged items do not appear to be textIf items relate to a ...
|
|
What changes were proposed in this pull request?
Leverage the Iceberg V3 Column Defaults
Why are the changes needed?
To maintain the column defaults in the Iceberg Spec & able to use defaults for fields of Struct
Does this PR introduce any user-facing change?
Yes, now column defaults are persisted in Iceberg spec. + We can specify defaults for specific fields of a Struct type
How was this patch tested?
UT