Closed
Labels: Bug, Help wanted, Integration: Apache Spark, Triaged
Description
Environment
How do you use Sentry?
Sentry SaaS (sentry.io)
Which SDK and version?
[email protected] using spark integration
Steps to Reproduce
This is essentially a minimal working example (MWE) of what our setup looks like:
from pyspark import SparkConf
from pyspark.sql import SparkSession

import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

# Driver-side Sentry setup with the Spark integration enabled.
sentry_sdk.init(SENTRY_DSN, integrations=[SparkIntegration()])


def get_spark_context(job_name):
    conf = SparkConf().setAppName(job_name)
    # Route PySpark worker startup through the Sentry daemon module so
    # executors also report to Sentry.
    conf = conf.set("spark.python.use.daemon", True)
    conf = conf.set("spark.python.daemon.module", "sentry_daemon")
    session = SparkSession.builder.config(conf=conf).getOrCreate()
    session.sparkContext.addPyFile(".../sentry_daemon.py")
    return session.sparkContext


# A single SparkContext is shared across all batches and only stopped
# after the whole loop finishes.
sc = get_spark_context("my_job")

for batch in batches:
    sc.textFile(batch.input_path).map(some_function).saveAsTextFile(batch.output_path)

sc.stop()
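The sentry_daemon.py shipped to executors via addPyFile is not shown above; the following is a sketch that assumes it follows the standard worker-side pattern from the Sentry Spark integration docs (SENTRY_DSN is a placeholder for the real DSN):

# sentry_daemon.py -- sketch, assuming the worker-side setup from the
# Sentry Spark integration docs.
import sentry_sdk
from sentry_sdk.integrations.spark import SparkWorkerIntegration

import pyspark.daemon as original_daemon

# SENTRY_DSN is a placeholder for the project's actual DSN.
sentry_sdk.init(SENTRY_DSN, integrations=[SparkWorkerIntegration()])

if __name__ == "__main__":
    original_daemon.manager()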
- I'm able to get Sentry to log exceptions properly using the Sentry daemon and the configuration above.
- However, each batch takes progressively longer: without the Spark integration, each batch takes ~3 hours; with the integration enabled, the first batch took 3 hours, the second 6, the third 9, and so on.
- I was able to work around the slowdown by creating and stopping the Spark context within each batch instead of keeping one for the entire loop (see the sketch after this list).
- However, the job now eventually fails with an out-of-memory error after a few batches, even though we have plenty of resources and have never hit this issue at this stage of our pipeline before.
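For reference, the per-batch workaround described above looks roughly like this (a sketch reusing get_spark_context, batches, and some_function from the example; the trade-off is paying Spark startup cost on every batch):

# Sketch of the workaround: create and stop a fresh SparkContext per batch
# instead of sharing one across the whole loop.
for batch in batches:
    sc = get_spark_context("my_job")
    sc.textFile(batch.input_path).map(some_function).saveAsTextFile(batch.output_path)
    sc.stop()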
Expected Result
The job runs normally with the Sentry Spark integration enabled.
Actual Result
The job either takes progressively longer per batch or eventually runs out of memory and fails.
This is the stdout of the EMR cluster:
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 13796"...
My understanding is that the Spark integration is not being actively maintained and is considered somewhat experimental. Any help here would be greatly appreciated, even if it is just a potential workaround rather than an actual fix.