@kukushking kukushking commented Jun 1, 2023

Feature or Bugfix

  • Feature

Scope

  • This PR adds basic capabilities to start a session and kick off a calculation. Any potential data integration (i.e., the ability to pass DataFrames and read results) will be addressed in follow-up PRs.

Detail

  • Add create_spark_session and run_spark_calculation with corresponding waiters
  • Add simple test case
  • Expand test case
  • Add tutorial / docstrings
  • Add test infra IAM role
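
As a rough illustration of how the two entry points listed above might be combined, the sketch below starts a session and submits one calculation to it. It assumes the functions are exposed as `wr.athena.create_spark_session` and `wr.athena.run_spark_calculation`, that both accept a `workgroup` argument, that the first returns a session ID, and that the second accepts `code` and an optional `session_id`. The helper `run_once` and its `athena` parameter are hypothetical; the module is passed in as an argument so the flow can be exercised without AWS credentials.

```python
from typing import Any, Dict


def run_once(athena: Any, code: str, workgroup: str) -> Dict[str, Any]:
    """Start a Spark session, then run one calculation in it.

    `athena` is expected to behave like `awswrangler.athena`.
    Assumption: create_spark_session returns the session ID and
    run_spark_calculation returns a dict describing the execution.
    """
    # Assumed to block until the session is usable (the PR mentions waiters).
    session_id = athena.create_spark_session(workgroup=workgroup)
    # Pass the session ID so the calculation reuses the session created above.
    return athena.run_spark_calculation(
        code=code,
        workgroup=workgroup,
        session_id=session_id,
    )
```

With the real library this would presumably be called as `run_once(wr.athena, "print(spark)", "my-spark-workgroup")`, where the workgroup name is a placeholder for a Spark-enabled Athena workgroup.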

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

)
_logger.info("Calculation execution info:\n%s", response)

return _get_calculation_execution_results(
Contributor

Can the return value of an Athena Spark execution be a DataFrame? Or will the output always just be written to an S3 location?

I'm mainly wondering whether there's a sensible way to make these API calls accept or return DataFrames.

Contributor Author

I am looking into ways to support that where applicable, but it depends very much on what Spark code you are running.

There is a JSON metadata file in the results path along with the stdout/stderr text files, but it is almost always empty, or at least it has been in my tests so far. The documentation is a bit lacking on this. I'll play around to see what we can make of it.
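
To make the results-path discussion concrete, one possible way to surface a calculation's output is to read its stdout file from S3. The sketch below assumes the calculation description carries a `Result` mapping with a `StdOutS3Uri` entry, as the boto3 Athena `get_calculation_execution` response does, and that `s3_client` behaves like a boto3 S3 client. `split_s3_uri` and `read_stdout` are hypothetical helpers, not part of this PR.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_stdout(s3_client: Any, calculation: Dict[str, Any]) -> str:
    """Return the calculation's stdout as text, or '' if no URI is present."""
    # Assumed response shape: {"Result": {"StdOutS3Uri": "s3://..."}}
    uri = calculation.get("Result", {}).get("StdOutS3Uri")
    if not uri:
        return ""
    bucket, key = split_s3_uri(uri)
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return obj["Body"].read().decode("utf-8")
```

If the Spark code prints a small result as CSV or JSON, the returned text could in principle be parsed into a DataFrame, though that depends entirely on what the code writes to stdout.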

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: b6f3244
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: b6f3244
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: ad10214
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: ad10214
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@kukushking kukushking changed the title from "feat: [DRAFT] Apache Spark on Amazon Athena" to "feat: Apache Spark on Amazon Athena" Jun 5, 2023
@kukushking kukushking marked this pull request as ready for review June 5, 2023 17:39
@kukushking kukushking removed the WIP Work in progress label Jun 5, 2023
@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: c48cfd9
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: c48cfd9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: c48cfd9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@kukushking kukushking changed the title from "feat: Apache Spark on Amazon Athena" to "feat: Apache Spark on Amazon Athena: wr.athena.create_spark_session & wr.athena.run_spark_calculation" Jun 6, 2023
@kukushking kukushking changed the title from "feat: Apache Spark on Amazon Athena: wr.athena.create_spark_session & wr.athena.run_spark_calculation" to "feat: Apache Spark on Amazon Athena - wr.athena.create_spark_session & wr.athena.run_spark_calculation" Jun 6, 2023
@kukushking kukushking added this to the 3.2.0 milestone Jun 6, 2023
Contributor

@jaidisido jaidisido left a comment

Looks great; just a couple of minor comments.

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: e062ab8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: e062ab8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)
