SPARK-1565, update examples to be used with spark-submit script. #552
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14484/
```
@@ -74,8 +74,8 @@ class SparkContext(config: SparkConf) extends Logging {
   * be generated using [[org.apache.spark.scheduler.InputFormatInfo.computePreferredLocations]]
   * from a list of input files or InputFormats for the application.
   */
  @DeveloperApi
  def this(config: SparkConf, preferredNodeLocationData: Map[String, Set[SplitInfo]]) = {
    @DeveloperApi
```
This indentation change seems wrong
I am not sure, but other methods are at this corrected indentation level. Is there some other reason it would be wrong? Like this: https://github.com/apache/spark/pull/552/files#diff-364713d7776956cb8b0a771e9b62f82dL90
Sorry, I meant that the body of the function should be indented only 2 spaces from the signature, not 4.
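For reference, a minimal sketch of the suggested layout (the constructor body shown here is assumed from context, not quoted from the diff):

```scala
@DeveloperApi
def this(config: SparkConf, preferredNodeLocationData: Map[String, Set[SplitInfo]]) = {
  this(config)  // body indented 2 spaces from the signature
  this.preferredNodeLocationData = preferredNodeLocationData
}
```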
This seems like a good start. Hey @ScrapCodes, we changed the format of spark-submit a bit so that it no longer uses `--arg`; application arguments now follow the application jar directly. I don't think it's necessary to prompt the users for arguments; I think just removing all the cases where there was a master argument is sufficient for now.
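As a sketch, an invocation in the newer format might look like this (class and arguments borrowed from the commit message below; the exact flags available at the time may have differed):

```
bin/spark-submit \
  --class org.apache.spark.examples.SparkALS \
  examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar \
  100 500 10 5 2
```

Note that the application arguments follow the jar directly instead of being passed via `--arg`.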
Hm, I will do so, but in many cases it can be really hard to guess the parameters without reading the code of the examples.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
```
@@ -30,22 +30,15 @@ import org.apache.spark.rdd.RDD

object WikipediaPageRankStandalone {
  def main(args: Array[String]) {
    if (args.length < 5) {
```
Don't we still want to have a usage here?
I suppose we should, and I think the same goes for all the examples that take mandatory arguments. Should I just go ahead and fix that?
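A sketch of what such a usage check might look like (the argument names are illustrative, not taken from the actual example):

```scala
def main(args: Array[String]) {
  if (args.length < 4) {
    System.err.println(
      "Usage: WikipediaPageRankStandalone <inputFile> <threshold> <numIterations> <usePartitioner>")
    System.exit(-1)
  }
  // ... rest of the example
}
```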
Build triggered.
Build started.
Do you want this change to go in for the streaming examples as well? For some of them it may not make sense. And then how do people stop them?
Build finished. All automated tests passed.
All automated tests passed.
@ScrapCodes yes, let's update the streaming examples too. If people run them in driver mode, this will be exactly the same as the current examples.
Build triggered.
Build started.
Build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14622/
Build triggered.
Build started.
Build finished. All automated tests passed.
All automated tests passed.
```
-    Integer slices = (args.length > 1) ? Integer.parseInt(args[1]): 2;
+    SparkConf sparkConf = new org.apache.spark.SparkConf().setAppName("JavaHdfsLR");
+    JavaSparkContext sc = new JavaSparkContext(sparkConf);
+    Integer slices = (args.length > 0) ? Integer.parseInt(args[0]): 2;
```
nit: space before colon
Do we put a space before the colon? I thought the convention was no space before the colon and a single space after it.
Also, don't put the full package name (org.apache.spark.SparkConf) here since you imported it above
Yeah, in Scala we have no space before a colon, but this is a common pattern in Java (e.g. `bool ? 1 : 2` is shorthand for `if (bool) { 1 } else { 2 }`). We actually do this elsewhere in other examples.
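For comparison, the Scala examples would express the same default with an `if` expression rather than a ternary (a sketch assuming the same `args` convention):

```scala
val slices = if (args.length > 0) args(0).toInt else 2
```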
Build started.
Build finished. All automated tests passed.
All automated tests passed.
@ScrapCodes you'll need to merge this with master - unfortunately there was another patch that renamed/moved some of the example files. |
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
@pwendell Done!
This LGTM. Thanks @ScrapCodes for all the effort! |
Thanks @ScrapCodes - sorry you had to up-merge this... good stuff :) |
Commit for initial feedback. Basically, I am curious whether we should prompt the user to provide args, especially when they are mandatory, and whether we can skip the prompt when they are not. Also, a few other things did not work, like:

`bin/spark-submit examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar --class org.apache.spark.examples.SparkALS --arg 100 500 10 5 2`

Not all the args get passed properly; maybe I have messed something up and will try to sort it out.

Author: Prashant Sharma <[email protected]>

Closes #552 from ScrapCodes/SPARK-1565/update-examples and squashes the following commits:

669dd23 [Prashant Sharma] Review comments
2727e70 [Prashant Sharma] SPARK-1565, update examples to be used with spark-submit script.

(cherry picked from commit 44dd57f)
Signed-off-by: Patrick Wendell <[email protected]>
A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for Python too. Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example:

```
bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
```

Author: Andrew Or <[email protected]>

Closes #802 from andrewor14/python-examples and squashes the following commits:

cf50b9f [Andrew Or] De-indent python comments (minor)
50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
c362f69 [Andrew Or] Update docs to use spark-submit for python applications
7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
427a5f0 [Andrew Or] Update docs
d32072c [Andrew Or] Remove <master> from examples + update usages