Commit 6b31b5d (parent da36dad): Update README.md

README.md: 9 additions, 107 deletions
All solutions in this notebook are implemented using **PySpark** to ensure consistency.

- Access to your GitHub repository containing unsolved LeetCode SQL questions. Users should be able to clone the repository, review the solutions, and potentially contribute if allowed.

## **1. Setting up Databricks Premium (Paid Version)**

Databricks Premium is a paid plan that offers advanced features such as higher compute power, security options, and integrations.

### **Step 1: Sign Up for Databricks**

1. Go to the [Databricks website](https://databricks.com/).
2. Click **"Start your free trial"** (for a trial) or go to **"Sign In"** if you already have an account.
3. Choose **"AWS"**, **"Azure"**, or **"GCP"** as your cloud provider.
4. Follow the registration process, providing details like your email, company, and cloud provider credentials.

### **Step 2: Create a Databricks Workspace**

1. In the cloud provider console (AWS, Azure, or GCP), create a Databricks workspace.
2. Select the **Premium plan** during setup.
3. Configure networking and security settings as required.
4. Once created, launch the workspace from the cloud console.

### **Step 3: Create a Cluster**

1. Inside the Databricks workspace, go to **Compute**.
2. Click **Create Cluster**.
3. Choose a cluster name and select a runtime version (latest recommended).
4. Select the number of workers (scale as needed).
5. Click **Create Cluster**.

### **Step 4: Create a Notebook**

1. Navigate to **Workspace > Users > Your Name**.
2. Click **Create > Notebook**.
3. Name the notebook and select **Python** as the language.
4. Attach it to your running cluster.
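Once the notebook is attached, it's worth confirming that the cluster responds before writing any solutions. A minimal sanity check, assuming a Databricks notebook where the `SparkSession` is pre-defined as `spark`:

```python
# Databricks notebooks pre-create a SparkSession and expose it as `spark`.
# Run this in the first cell to confirm the notebook is attached to a live cluster.
print(spark.version)              # prints the cluster's Spark runtime version

# A tiny distributed job: generate 100 rows and count them.
print(spark.range(100).count())   # expected output: 100
```

If both lines print without errors, the notebook is talking to a healthy cluster.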
## **2. Setting up Databricks Community Edition (Free Version)**

Databricks Community Edition is a free, limited version ideal for learning PySpark.

### **Step 1: Sign Up for Community Edition**

1. Go to the [Databricks Community Edition signup page](https://community.cloud.databricks.com/).
2. Enter your email and complete the registration.
3. Check your email for the verification link and activate your account.
4. Log in to your Databricks Community workspace.

### **Step 2: Create a Cluster**

1. Click on **Compute** in the left panel.
2. Click **Create Cluster**.
3. Name your cluster.
4. Choose the latest runtime version.
5. Click **Create Cluster** (Community Edition supports only small clusters).

### **Step 3: Create a Notebook**

1. Go to **Workspace > Users > Your Name**.
2. Click **Create > Notebook**.
3. Name the notebook and select **Python**.
4. Attach it to the running cluster.

## **Key Differences Between Premium and Community Edition**

| Feature | Databricks Premium | Databricks Community Edition |
|---------|--------------------|------------------------------|
| Price | Paid | Free |
| Cloud Providers | AWS, Azure, GCP | Databricks Cloud |
| Cluster Scaling | Scalable | Limited (single node) |
| Security Features | Advanced | Basic |
| Collaboration | Multi-user | Single-user |
## **3. Step-by-Step Guide to Importing the LeetCode SQL Questions Notebook into Jupyter Notebook or Databricks**

1. **Clone Your GitHub Repository:**
   - First, ensure you have Git installed on your local machine. If not, download and install it from [Git's official website](https://git-scm.com/).
   - Open your terminal (command prompt) and navigate to the directory where you want to clone your repository.
   - Clone your GitHub repository using the command:
     ```
     git clone <repository_url>
     ```
   - Replace `<repository_url>` with the URL of your GitHub repository, for example `https://github.com/<username>/<repository-name>.git`. This will download the repository to your local machine.
   - Alternatively, you can clone your GitHub repository directly into Databricks using **Databricks Repos**:
     1. Log in to your Databricks workspace.
     2. In the left sidebar, click **Repos**.
     3. Click **Add Repo > Add Git Repository**.
     4. Paste the repository URL from GitHub:
        ```
        https://github.com/<username>/<repository-name>.git
        ```
     5. Click **Create**.
     > Databricks will now clone your GitHub repository into your workspace.
2. **Install Required Dependencies:**
   - Make sure you have Python installed on your machine. It's recommended to use Anaconda or Miniconda to manage your Python environments.
   - Install Jupyter Notebook and PySpark dependencies if you haven't already:
     ```
     pip install jupyter pyspark
     ```
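   After installing, a quick way to confirm that PySpark is importable is shown below. Note that running Spark locally also requires a supported Java runtime on your `PATH` (Java 8 or later, depending on the Spark version):

   ```python
   # Verify the PySpark installation from a plain Python shell.
   import pyspark

   print(pyspark.__version__)   # e.g. 3.5.1, depending on what pip installed
   ```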
3. **Launch Jupyter Notebook:**
   - Navigate to the directory where your Jupyter Notebook files are located. Typically, this would be the root directory of your cloned repository.
   - Start Jupyter Notebook by running the command:
     ```
     jupyter notebook
     ```
   - This command will open a new tab in your web browser with the Jupyter Notebook interface.
4. **Open and Run Your Notebook:**
   - In the Jupyter Notebook interface, navigate to the directory where your notebook file (`*.ipynb`) is located.
   - Click on the notebook file to open it.
   - Once the notebook is open, you can run each cell by pressing `Shift + Enter` or using the "Run" button in the toolbar.
   - Ensure that Spark is correctly initialized and configured in your notebook. You may need to import the necessary libraries and set up the Spark session if it's not done automatically.
5. **Verify Spark Installation and Configuration:**
   - Check that Spark is installed and configured correctly by running a basic Spark operation in one of the notebook cells. For example:
     ```python
     from pyspark.sql import SparkSession

     # Initialize (or retrieve) the Spark session
     spark = SparkSession.builder \
         .appName("MyApp") \
         .getOrCreate()

     # Verify the Spark session
     spark
     ```
   - If Spark is configured correctly, you should see the Spark session information printed without any errors.
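   Displaying the session object only confirms that the driver started. As a slightly stronger check, you could run a small job end to end; a sketch with made-up data:

   ```python
   # Run a tiny job to confirm that executors can actually process data.
   df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
   assert df.filter(df.id > 1).count() == 2   # two rows have id > 1
   print("Spark job ran successfully")
   ```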
6. **Execute and Test Your Notebook:**
   - Execute each cell in your notebook to ensure that all code runs as expected.
   - Validate the results of the LeetCode SQL question solutions to ensure correctness and functionality with PySpark; see the sketch after this list.

7. **Save Your Work:**
   - Once you have verified that everything is working correctly, save your notebook with any changes you have made.
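A typical validation pattern for one of the SQL questions looks like the following sketch. The `Employee` table, its columns, and the expected value are illustrative stand-ins, not taken from a specific LeetCode problem:

```python
# Illustrative check for a LeetCode-style SQL solution in PySpark.
# The schema and expected output below are made up for demonstration.
employees = spark.createDataFrame(
    [(1, "Alice", 85000), (2, "Bob", 72000), (3, "Carol", 85000)],
    ["id", "name", "salary"],
)
employees.createOrReplaceTempView("Employee")

# Example question: find the highest salary.
result = spark.sql("SELECT MAX(salary) AS highest FROM Employee").collect()

assert result[0]["highest"] == 85000, "solution does not match expected output"
print("solution verified")
```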
### Additional Tips:
- **Environment Management:** Consider using virtual environments or conda environments to manage dependencies and avoid conflicts between different projects; see the check after these tips.
- **Documentation:** It's helpful to include documentation within your notebook, such as explanations of the SQL solutions and any specific configurations required for Spark.
- **Version Control:** Regularly commit your changes to Git and push them to your GitHub repository to keep a versioned history of your work.
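When juggling multiple environments, it helps to confirm which interpreter the notebook kernel is actually using; a quick check:

```python
# Confirm which Python environment the notebook kernel is running in.
import sys

print(sys.executable)   # the path should point into your virtualenv/conda env
```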

By following these steps, you should be able to successfully import and run your LeetCode SQL questions notebook with PySpark, whether in Jupyter Notebook on your local machine or in Databricks.
