
Enable Log Aggregation for Spark #218

Closed
@maltesander

Description

See stackabletech/hbase-operator#291 and stackabletech/docker-images#283 for reference.

This is part of stackabletech/issues#288.

Implementation details

Additional Java options are added to the Spark submit job, the Spark driver, the Spark executor, and the Spark history server, e.g. to set the log4j configuration file or the classpath.

Spark submit job

The Java options of the Spark submit job are set with the environment variable SPARK_SUBMIT_OPTS. In cluster mode, this environment variable is the only way to set them. Unfortunately, it is not part of the public API, but the logging integration test will fail if this behavior ever changes. The user cannot override this environment variable in the cluster specification.
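A minimal sketch of how the environment of the submit job container could be assembled so that SPARK_SUBMIT_OPTS stays operator-managed. The function name `submit_job_env`, the merge policy (operator values silently replace user values), and the reuse of the log4j path from the executor command line are assumptions for illustration, not the operator's actual code.

```python
# Hypothetical sketch: env vars for the Spark submit job container.
# Assumption: the operator owns SPARK_SUBMIT_OPTS, so any user-supplied value is replaced.

# Assumed path, borrowed from the executor command line shown in this issue.
LOG4J_CONFIG = "/stackable/log_config/log4j2.properties"


def submit_job_env(user_env: dict[str, str]) -> dict[str, str]:
    """Return the container env, with operator-managed variables taking precedence."""
    operator_env = {
        # Java options for the spark-submit JVM; the only hook available in cluster mode.
        "SPARK_SUBMIT_OPTS": f"-Dlog4j.configurationFile={LOG4J_CONFIG}",
    }
    env = dict(user_env)      # start from the user's variables
    env.update(operator_env)  # operator values overwrite any user attempt
    return env
```

For example, `submit_job_env({"SPARK_SUBMIT_OPTS": "-Xmx1g", "FOO": "bar"})` keeps `FOO` but replaces the user's SPARK_SUBMIT_OPTS with the operator's log4j option.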

Spark driver

The Java options of the Spark driver are set with the configuration spark.driver.defaultJavaOptions. The user can add extra Java options with spark.driver.extraJavaOptions in the cluster specification at spec.sparkConf. The Spark configuration contains both options and is located in the file spark.properties in a generated ConfigMap named spark-drv-<hash>-conf-map. This configuration file is then used in the driver:

$ kubectl logs spark-cluster-<hash>-driver -c spark
...
+ exec /usr/bin/tini -s -- /stackable/spark/bin/spark-submit --conf spark.driver.bindAddress=... --deploy-mode client --properties-file /opt/spark/conf/spark.properties ...
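A minimal sketch of how the content of spark.properties might be rendered from the user's spec.sparkConf plus the operator's spark.driver.defaultJavaOptions. Per the Spark documentation, the default Java options are prepended to spark.driver.extraJavaOptions at launch. The function name `render_spark_properties`, the log4j path for the driver, and the rule that the operator's value wins over a user-supplied spark.driver.defaultJavaOptions are assumptions.

```python
# Hypothetical sketch: render the driver's spark.properties from spec.sparkConf
# plus operator-managed defaults. Names and precedence are illustrative assumptions.

# Assumed log config path for the driver (same as the executor's in this issue).
LOG4J_CONFIG = "/stackable/log_config/log4j2.properties"


def render_spark_properties(spark_conf: dict[str, str]) -> str:
    """Return spark.properties content as sorted key=value lines."""
    props = dict(spark_conf)  # user settings, e.g. spark.driver.extraJavaOptions
    # Operator-managed default Java options; Spark prepends these to extraJavaOptions.
    props["spark.driver.defaultJavaOptions"] = f"-Dlog4j.configurationFile={LOG4J_CONFIG}"
    # Simplification: backslashes and other special characters are not escaped.
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))
```

The resulting string would be stored under the key spark.properties in the generated ConfigMap and passed to spark-submit with --properties-file.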

Spark executor

The Java options of the Spark executor are set with the configuration spark.executor.defaultJavaOptions. The user can add extra Java options with spark.executor.extraJavaOptions in the cluster specification at spec.sparkConf. Both options are added to the command line of the executor:

$ kubectl logs spark<job>-<hash>-exec-1 -c spark
...
+ exec /usr/bin/tini -s -- /usr/lib/jvm/jre-11/bin/java ... -Dlog4j.configurationFile=/stackable/log_config/log4j2.properties <extraJavaOptions> ...
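A minimal sketch of the ordering visible in the executor command line: the default Java options come first, followed by the user's extra Java options, matching Spark's documented behavior of prepending spark.executor.defaultJavaOptions to spark.executor.extraJavaOptions. Spark tokenizes these strings with its own splitter; `shlex.split` is used here as an approximation, and the function name `executor_java_options` is hypothetical.

```python
import shlex


def executor_java_options(conf: dict[str, str]) -> list[str]:
    """Combine default and extra executor Java options in Spark's documented order."""
    default = conf.get("spark.executor.defaultJavaOptions", "")
    extra = conf.get("spark.executor.extraJavaOptions", "")
    # Approximation of Spark's tokenizer: shell-like splitting that honors quotes.
    # defaultJavaOptions are prepended, so user options appear later on the command line.
    return shlex.split(default) + shlex.split(extra)
```

Because the user's options come later, a JVM flag that honors the last occurrence (for example a repeated `-D` system property) lets the user's value take effect.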

Spark history server

The Java options of the Spark history server are set with the environment variable SPARK_HISTORY_OPTS. This environment variable cannot be overridden by the user.
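A minimal sketch of a history server container env, expressed as Kubernetes EnvVar entries (`name`/`value` pairs), where a user-supplied SPARK_HISTORY_OPTS is discarded so the operator's value is the only one present. The function name `history_server_env`, the drop-rather-than-merge policy, and the log4j path are assumptions for illustration.

```python
# Hypothetical sketch: history server env as Kubernetes EnvVar dicts.
# Assumption: a user-supplied SPARK_HISTORY_OPTS is dropped rather than merged.

PROTECTED = "SPARK_HISTORY_OPTS"
# Assumed log config path, borrowed from the executor command line in this issue.
LOG4J_CONFIG = "/stackable/log_config/log4j2.properties"


def history_server_env(user_env: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return EnvVar entries in which the operator's SPARK_HISTORY_OPTS is the only one."""
    kept = [e for e in user_env if e["name"] != PROTECTED]  # discard user overrides
    kept.append({"name": PROTECTED, "value": f"-Dlog4j.configurationFile={LOG4J_CONFIG}"})
    return kept
```

Dropping the user entry, instead of appending after it, avoids relying on Kubernetes' handling of duplicate env names.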

Metadata

Projects

Status

Done

Participants

@maltesander @siegfriedweber

      Enable Log Aggregation for Spark · Issue #218 · stackabletech/spark-k8s-operator