All solutions in this notebook are implemented using **PySpark** to ensure consistency.

- Access to your GitHub repository containing the LeetCode SQL questions. Users should be able to clone the repository, review the solutions, and contribute if permitted.

## **1. Setting up Databricks Premium (Paid Version)**

Databricks Premium is a paid plan that offers advanced features such as higher compute power, security options, and integrations.

### **Step 1: Sign Up for Databricks**

1. Go to the [Databricks website](https://databricks.com/).
2. Click **"Start your free trial"**, or **"Sign In"** if you already have an account.
3. Choose **AWS**, **Azure**, or **GCP** as your cloud provider.
4. Complete the registration process, providing details such as your email, company, and cloud provider credentials.

### **Step 2: Create a Databricks Workspace**

1. In the cloud provider console (AWS, Azure, or GCP), create a Databricks workspace.
2. Select the **Premium plan** during setup.
3. Configure networking and security settings as required.
4. Once created, launch the workspace from the cloud console.

### **Step 3: Create a Cluster**

1. Inside the Databricks workspace, go to **Compute**.
2. Click **Create Cluster**.
3. Choose a cluster name and select a runtime version (latest recommended).
4. Select the number of workers (scale as needed).
5. Click **Create Cluster**.

### **Step 4: Create a Notebook**

1. Navigate to **Workspace > Users > Your Name**.
2. Click **Create > Notebook**.
3. Name the notebook and select **Python** as the language.
4. Attach it to your running cluster (a quick sanity-check cell is sketched below).
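
To confirm the notebook is attached correctly, a small PySpark cell such as the following can serve as a sanity check. This is a minimal sketch: the rows and column names are illustrative, and on Databricks the `spark` session is already provided.

```python
from pyspark.sql import Row

# Illustrative rows; on Databricks, `spark` is predefined in each notebook.
df = spark.createDataFrame([
    Row(id=1, name="Alice", salary=90000),
    Row(id=2, name="Bob", salary=75000),
])

# If the notebook is attached to a running cluster, this prints both rows.
df.show()
```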
## **2. Setting up Databricks Community Edition (Free Version)**

Databricks Community Edition is a free, limited version ideal for learning PySpark.

### **Step 1: Sign Up for Community Edition**

1. Go to the [Databricks Community Edition signup page](https://community.cloud.databricks.com/).
2. Enter your email and complete the registration.
3. Check your email for the verification link and activate your account.
4. Log in to your Databricks Community Edition workspace.

### **Step 2: Create a Cluster**

1. Click **Compute** in the left panel.
2. Click **Create Cluster**.
3. Name your cluster.
4. Choose the latest runtime version.
5. Click **Create Cluster** (Community Edition supports only small clusters).

### **Step 3: Create a Notebook**

1. Go to **Workspace > Users > Your Name**.
2. Click **Create > Notebook**.
3. Name the notebook and select **Python**.
4. Attach it to the running cluster (a worked example follows below).
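
Once the notebook is attached, a LeetCode-style SQL question can be solved by registering a DataFrame as a temporary view and querying it with `spark.sql`. A minimal sketch, assuming an illustrative `Employee` table and the classic second-highest-salary question (neither is taken from this repository):

```python
# Illustrative Employee table; `spark` is predefined in Databricks notebooks.
data = [(1, "Alice", 90000), (2, "Bob", 75000), (3, "Cara", 90000)]
df = spark.createDataFrame(data, ["id", "name", "salary"])

# Register a temporary view so the question can be answered in plain SQL.
df.createOrReplaceTempView("Employee")

spark.sql("""
    SELECT MAX(salary) AS SecondHighestSalary
    FROM Employee
    WHERE salary < (SELECT MAX(salary) FROM Employee)
""").show()
```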
## **Key Differences Between Premium and Community Edition**

| Feature | Databricks Premium | Databricks Community Edition |
|---------|--------------------|------------------------------|
| Cost | Paid plan | Free |
| Compute | Scalable clusters with configurable workers | Small clusters only |
| Features | Advanced security options and integrations | Limited feature set, ideal for learning |

## **3. Running the Notebook Locally in Jupyter**

3. **Launch Jupyter Notebook:** From the repository directory, run `jupyter notebook` in your terminal. This command will open a new tab in your web browser with the Jupyter Notebook interface.
4. **Open and Run Your Notebook:**
   - In the Jupyter Notebook interface, navigate to the directory where your notebook file (`*.ipynb`) is located.
   - Click the notebook file to open it.
   - Once the notebook is open, run each cell by pressing `Shift + Enter` or using the **Run** button in the toolbar.
   - Ensure that Spark is correctly initialized and configured in your notebook. You may need to import the necessary libraries and set up the Spark session if this is not done automatically.
5. **Verify Spark Installation and Configuration:**
   - Check that Spark is installed and configured correctly by running a basic Spark operation in one of the notebook cells. For example:
   ```python
   from pyspark.sql import SparkSession

   # Initialize (or reuse) a Spark session
   spark = SparkSession.builder \
       .appName("MyApp") \
       .getOrCreate()

   # Verify the Spark session; evaluating it displays the session details
   spark
   ```
   - If Spark is configured correctly, you should see the Spark session information displayed without any errors.
6. **Execute and Test Your Notebook:**
   - Execute each cell in your notebook to ensure that all code runs as expected.
   - Validate the results of the LeetCode SQL solutions to confirm correctness and functionality with PySpark (one possible check is sketched below).
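   One way to validate a solution's output, sketched here with a hypothetical `Person` table and the duplicate-emails question (both illustrative, not taken from this repository), is to collect the result and compare it against the expected rows:

   ```python
   # Illustrative check: compare a solution's output with the expected rows.
   rows = [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]
   spark.createDataFrame(rows, ["id", "email"]).createOrReplaceTempView("Person")

   result = spark.sql("""
       SELECT email AS Email
       FROM Person
       GROUP BY email
       HAVING COUNT(*) > 1
   """)

   # Results come back as Row objects; convert to plain tuples to compare.
   expected = [("a@b.com",)]
   assert [tuple(r) for r in result.collect()] == expected, "Output mismatch"
   print("Solution validated")
   ```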
7. **Save Your Work:**
   - Once you have verified that everything works correctly, save your notebook with any changes you have made.
### Additional Tips:

- **Environment Management:** Consider using virtual environments or conda environments to manage dependencies and avoid conflicts between different projects.
- **Documentation:** It's helpful to include documentation within your notebook, such as explanations of the SQL solutions and any specific configurations required for Spark.
- **Version Control:** Regularly commit your changes to Git and push them to your GitHub repository to keep a versioned history of your work.
By following these steps, you should be able to import and run your LeetCode SQL solutions notebook using PySpark in Jupyter Notebook on your local machine.