diff --git a/README.md b/README.md
index bb3105c..84fabfb 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,2 @@
-# Gradient Works Interview Repo
-
-Hello friends! If you're here, you're probably looking for an engineering job at Gradient Works and I'm excited you're here!
-
-This repo is a collection of exercises that we use to evaluate candidates for engineering roles at Gradient Works. We hope you find them interesting and challenging. We also hope you find them a good representation of the kind of work you'd be doing here.
-
-If you're looking for the Machine learning role, please check out the [machine learning exercises](./machine-learning/README.md).
-
-We're not currently looking to hire someone for a Salesforce engineering role, but if you have a morbid curiosity around that, you can check out the [Salesforce exercises](./salesforce/README.md).
\ No newline at end of file
+## RAG-based LLM
+### Model: GPT-3.5
diff --git a/machine-learning/README.md b/machine-learning/README.md
index a772d8b..8b13789 100644
--- a/machine-learning/README.md
+++ b/machine-learning/README.md
@@ -1,46 +1 @@
-# Gradient Works Machine Learning Interview Repo
-Hello friends! If you're reading this, you're probably interviewing for a machine learning position at Gradient Works. If not, GET OUT! SHOO! Go on now, git! Just kidding, you can hang out. But you're probably not going to get much out of this repo. Maybe you will, who knows? I'm not your mom.
-
-Anyway, if you're here for the interview, welcome! I'll outline the steps for the whole process here and if you have any questions, send an email to mando@gradient.works and he'll clear things up.
-
-## Step 1: Phone screen
-
-You've probably already done this, but if not, we'll schedule a 30-minute phone call so we can get to know each other. We'll chat about your background and experience, and we'll also talk about Gradient Works, the role and what you're looking for in your next job. If everything is a good fit, we'll move on to the next step.
-
-## Step 2: Technical exercise
-
-We'll send you a technical exercise to complete. It's a Jupyter notebook that you can run in your own environment or in Google Colab, plus a tiny CSV dataset. The exercise will test your ability to explore data, answer questions about that data, and build a small RAG system to answer arbitrary questions about the data.
-
-We expect the exercise to take 1-3 hours, depending on how far you take the RAG system. We're not looking for a perfect solution, but we are looking for a thoughtful approach and a good understanding of the approach (there's more detail in the notebook).
-
-You can find the exercise [here](./exercise.ipynb).
-
-To complete the exercise, you can either:
-
-1. fork the repo and submit a pull request with your solution, or
-2. import the notebook into your own Google Colab environment and share the notebook with mando@gradient.works
-
-If neither of those work, email mando@gradient.works and we'll figure something out.
-
-> [!NOTE]
->
-> If possible, we'd like to invite you into our Slack while you're working on the exercise. This way, you can ask questions and we can get to know each other a bit better. If you're interested, let us know and we'll send you an invite.
-
-> [!NOTE]
->
-> You might be asking yourself, "Doesn't this dummy know that I can just look at the notebook and get started early?". And yeah, I get it. But the whole point of this is for you to demonstrate your abilities and level of understanding. So, if you spend a bunch of time learning all the ins and outs of RAG systems ahead of time, well, I mean that's great for us :).
-
-Once you've completed the exercise, we'll review it and schedule a Zoom interview to discuss your solution.
-
-## Step 3: Zoom interview
-
-We'll schedule a 1.5-hour Zoom interview to discuss your solution to the technical exercise. We'll ask you to walk us through your approach, explain your code, how you might extend the solution, etc. We'll also make sure to leave plenty of time for you to ask us questions about the team, the company, and what we're building and how.
-
-After that, we'll make a decision and let you know as soon as possible.
-
-## Step 4: There's no Step 4
-
-Hopefully this all makes sense and you're excited to get started. Interviewing is hard and we get that. We hope that this process makes sense and is as stress-free as it can be.
-
-As always, feedback is a gift, so if you have any thoughts on how we can improve this process, please let us know.
\ No newline at end of file
diff --git a/machine-learning/exercise.ipynb b/machine-learning/exercise.ipynb
index 1c64854..cd15b06 100644
--- a/machine-learning/exercise.ipynb
+++ b/machine-learning/exercise.ipynb
@@ -1,136 +1,1432 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Welcome to our ML project!\n",
- "\n",
- "This is a quick exercise to help demonstrate your familiarity with RAG systems - one might say that this is a place where you can b**RAG** about your skills! 🤣\n",
- "\n",
- "In this exercise, you will be asked to build a simple RAG system that answer some provided questions using the dataset provided. We expect this exercise to take 1-3 hours TOPS so use that to temper your approach to building this. We're not looking for reusable or production-level code - we're expressly looking for you to show us that you:\n",
- "\n",
- "* can explore an unknown dataset\n",
- "* can use an LLM (in this case, OpenAI's GPT-3) to build a simple RAG system\n",
- "\n",
- "## The Dataset\n",
- "\n",
- "You'll find the dataset in `content.csv`. It is a set of content about companies that has been scraped from the web. It contains the following columns:\n",
- "\n",
- "* `company_id`: a unique identifier for the company (UUID)\n",
- "* `company_name`: the name of the company\n",
- "* `url`: the URL from which the content was scraped\n",
- "* `chunk`: a chunk of the content that was scraped from the `url`\n",
- "* `chunk_hash`: a hash of the chunk\n",
- "* `chunk_id`: a unique identifier for the chunk of content\n",
- "* `chunk_type`: the type of the chunk of content (e.g. `header`, `footer`)\n",
- "\n",
- "\n",
- "Here's an example:\n",
- "\n",
- "|company_id|company_name|url|chunk_type|chunk_hash|chunk|chunk_id|\n",
- "|---|---|---|---|---|---|---|\n",
- "|4c1fde18-8a40-4ee7-9c3c-19152c7d1ff8|Aboitiz Group|https://aboitiz.com/about-us/the-aboitiz-way/|head|d312f0c688076be80ee2e4af8a51c2f10cbb993a4a8de779cb4aa5545fe1051f|\"Aboitiz - The Aboitiz Way\"|be36e2f0-cd0b-42eb-b36d-c9403c2428be|\n",
- "\n",
- "## Step 1: Explore the dataset\n",
- "\n",
- "Here are some questions that we'd like you to answer about the dataset:\n",
- "\n",
- "1. How many companies are in the dataset?\n",
- "2. How many unique URLs are in the dataset?\n",
- "3. What is the most common chunk type?\n",
- "4. What is the distribution of chunk types by company?"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Requirement already satisfied: pandas in ./.venv/lib/python3.11/site-packages (2.2.1)\n",
- "Requirement already satisfied: numpy<2,>=1.23.2 in ./.venv/lib/python3.11/site-packages (from pandas) (1.26.4)\n",
- "Requirement already satisfied: python-dateutil>=2.8.2 in ./.venv/lib/python3.11/site-packages (from pandas) (2.9.0.post0)\n",
- "Requirement already satisfied: pytz>=2020.1 in ./.venv/lib/python3.11/site-packages (from pandas) (2024.1)\n",
- "Requirement already satisfied: tzdata>=2022.7 in ./.venv/lib/python3.11/site-packages (from pandas) (2024.1)\n",
- "Requirement already satisfied: six>=1.5 in ./.venv/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n",
- "Note: you may need to restart the kernel to use updated packages.\n",
- "Collecting matplotlib\n",
- " Downloading matplotlib-3.8.3-cp311-cp311-macosx_11_0_arm64.whl.metadata (5.8 kB)\n",
- "Collecting contourpy>=1.0.1 (from matplotlib)\n",
- " Using cached contourpy-1.2.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (5.8 kB)\n",
- "Collecting cycler>=0.10 (from matplotlib)\n",
- " Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)\n",
- "Collecting fonttools>=4.22.0 (from matplotlib)\n",
- " Using cached fonttools-4.50.0-cp311-cp311-macosx_10_9_universal2.whl.metadata (159 kB)\n",
- "Collecting kiwisolver>=1.3.1 (from matplotlib)\n",
- " Using cached kiwisolver-1.4.5-cp311-cp311-macosx_11_0_arm64.whl.metadata (6.4 kB)\n",
- "Requirement already satisfied: numpy<2,>=1.21 in ./.venv/lib/python3.11/site-packages (from matplotlib) (1.26.4)\n",
- "Requirement already satisfied: packaging>=20.0 in ./.venv/lib/python3.11/site-packages (from matplotlib) (24.0)\n",
- "Collecting pillow>=8 (from matplotlib)\n",
- " Using cached pillow-10.2.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (9.7 kB)\n",
- "Collecting pyparsing>=2.3.1 (from matplotlib)\n",
- " Using cached pyparsing-3.1.2-py3-none-any.whl.metadata (5.1 kB)\n",
- "Requirement already satisfied: python-dateutil>=2.7 in ./.venv/lib/python3.11/site-packages (from matplotlib) (2.9.0.post0)\n",
- "Requirement already satisfied: six>=1.5 in ./.venv/lib/python3.11/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)\n",
- "Downloading matplotlib-3.8.3-cp311-cp311-macosx_11_0_arm64.whl (7.5 MB)\n",
- "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.5/7.5 MB\u001b[0m \u001b[31m25.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n",
- "\u001b[?25hUsing cached contourpy-1.2.0-cp311-cp311-macosx_11_0_arm64.whl (243 kB)\n",
- "Using cached cycler-0.12.1-py3-none-any.whl (8.3 kB)\n",
- "Using cached fonttools-4.50.0-cp311-cp311-macosx_10_9_universal2.whl (2.8 MB)\n",
- "Using cached kiwisolver-1.4.5-cp311-cp311-macosx_11_0_arm64.whl (66 kB)\n",
- "Using cached pillow-10.2.0-cp311-cp311-macosx_11_0_arm64.whl (3.3 MB)\n",
- "Using cached pyparsing-3.1.2-py3-none-any.whl (103 kB)\n",
- "Installing collected packages: pyparsing, pillow, kiwisolver, fonttools, cycler, contourpy, matplotlib\n",
- "Successfully installed contourpy-1.2.0 cycler-0.12.1 fonttools-4.50.0 kiwisolver-1.4.5 matplotlib-3.8.3 pillow-10.2.0 pyparsing-3.1.2\n",
- "Note: you may need to restart the kernel to use updated packages.\n"
- ]
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "*Nicole Mathias*\n",
+ "\n",
+ "Instructions to run the notebook:\n",
+ "\n",
+ "\n",
+ "* Upload exercise.ipynb and content.csv to the top level of your Google Drive (MyDrive). There is no need to create a folder: the setup cell below moves both files into a `gradient_works_data` folder automatically."
+ ],
+ "metadata": {
+ "id": "mmjGtnEgU-LP"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Vql-4RvAUJVj"
+ },
+ "source": [
+ "# Welcome to our ML project!\n",
+ "\n",
+ "This is a quick exercise to help demonstrate your familiarity with RAG systems - one might say that this is a place where you can b**RAG** about your skills! 🤣\n",
+ "\n",
+ "In this exercise, you will be asked to build a simple RAG system that answers some provided questions using the dataset provided. We expect this exercise to take 1-3 hours TOPS so use that to temper your approach to building this. We're not looking for reusable or production-level code - we're expressly looking for you to show us that you:\n",
+ "\n",
+ "* can explore an unknown dataset\n",
+ "* can use an LLM (in this case, OpenAI's GPT-3) to build a simple RAG system\n",
+ "\n",
+ "## The Dataset\n",
+ "\n",
+ "You'll find the dataset in `content.csv`. It is a set of content about companies that has been scraped from the web. It contains the following columns:\n",
+ "\n",
+ "* `company_id`: a unique identifier for the company (UUID)\n",
+ "* `company_name`: the name of the company\n",
+ "* `url`: the URL from which the content was scraped\n",
+ "* `chunk`: a chunk of the content that was scraped from the `url`\n",
+ "* `chunk_hash`: a hash of the chunk\n",
+ "* `chunk_id`: a unique identifier for the chunk of content\n",
+ "* `chunk_type`: the type of the chunk of content (e.g. `header`, `footer`)\n",
+ "\n",
+ "\n",
+ "Here's an example:\n",
+ "\n",
+ "|company_id|company_name|url|chunk_type|chunk_hash|chunk|chunk_id|\n",
+ "|---|---|---|---|---|---|---|\n",
+ "|4c1fde18-8a40-4ee7-9c3c-19152c7d1ff8|Aboitiz Group|https://aboitiz.com/about-us/the-aboitiz-way/|head|d312f0c688076be80ee2e4af8a51c2f10cbb993a4a8de779cb4aa5545fe1051f|\"Aboitiz - The Aboitiz Way\"|be36e2f0-cd0b-42eb-b36d-c9403c2428be|\n",
+ "\n",
+ "## Step 1: Explore the dataset\n",
+ "\n",
+ "Here are some questions that we'd like you to answer about the dataset:\n",
+ "\n",
+ "1. How many companies are in the dataset?\n",
+ "2. How many unique URLs are in the dataset?\n",
+ "3. What is the most common chunk type?\n",
+ "4. What is the distribution of chunk types by company?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "AhUSjSgJUJVl",
+ "outputId": "dfe7807b-4677-413e-d451-9413efd07f38"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.0.3)\n",
+ "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)\n",
+ "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2023.4)\n",
+ "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2024.1)\n",
+ "Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas) (1.25.2)\n",
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n",
+ "Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (3.7.1)\n",
+ "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.2.0)\n",
+ "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (0.12.1)\n",
+ "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (4.50.0)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.4.5)\n",
+ "Requirement already satisfied: numpy>=1.20 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.25.2)\n",
+ "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (24.0)\n",
+ "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (9.4.0)\n",
+ "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (3.1.2)\n",
+ "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (2.8.2)\n",
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)\n",
+ "Requirement already satisfied: openai in /usr/local/lib/python3.10/dist-packages (1.16.2)\n",
+ "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai) (3.7.1)\n",
+ "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai) (1.7.0)\n",
+ "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai) (0.27.0)\n",
+ "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from openai) (2.6.4)\n",
+ "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai) (1.3.1)\n",
+ "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai) (4.66.2)\n",
+ "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai) (4.10.0)\n",
+ "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai) (3.6)\n",
+ "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai) (1.2.0)\n",
+ "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai) (2024.2.2)\n",
+ "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai) (1.0.5)\n",
+ "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai) (0.14.0)\n",
+ "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai) (0.6.0)\n",
+ "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai) (2.16.3)\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install pandas\n",
+ "%pip install matplotlib\n",
+ "%pip install openai"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# core libraries for data exploration and plotting\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd"
+ ],
+ "metadata": {
+ "id": "SyxR7w6bUZn-"
+ },
+ "execution_count": 1,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from google.colab import drive\n",
+ "drive.mount('/content/drive')"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ZnjVN4qjU0mR",
+ "outputId": "ab2d0b19-e5f4-487a-a6a0-3f9cea13d0a8"
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "import shutil\n",
+ "\n",
+ "# The code below moves content.csv and exercise.ipynb into a dedicated folder the first time it runs\n",
+ "# Just upload both files to the top level of your Drive (no need to create a folder, it's all handled below)\n",
+ "folder_path = \"/content/drive/MyDrive/gradient_works_data\"\n",
+ "csv_file_path = \"/content/drive/MyDrive/content.csv\"\n",
+ "py_file_path = \"/content/drive/MyDrive/exercise.ipynb\"\n",
+ "\n",
+ "if not os.path.exists(folder_path):\n",
+ "    os.makedirs(folder_path)\n",
+ "\n",
+ "    shutil.move(py_file_path, folder_path)\n",
+ "    shutil.move(csv_file_path, folder_path)\n",
+ "\n",
+ "\n",
+ "os.chdir(folder_path)"
+ ],
+ "metadata": {
+ "id": "N7v-R5TuU2Tt"
+ },
+ "execution_count": 3,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# reading data from csv\n",
+ "df = pd.read_csv('content.csv')"
+ ],
+ "metadata": {
+ "id": "dUryU8cOVovz"
+ },
+ "execution_count": 143,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### PART 1 Questions"
+ ],
+ "metadata": {
+ "id": "luU2ySX_V4mo"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# 1. How many companies are in the dataset?\n",
+ "no_companies = len(df['company_name'].unique())\n",
+ "print(\"No. of Companies:\",no_companies)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Q8Ii5lkgV7Cm",
+ "outputId": "0443a6b0-aa42-4c16-9d23-aecf79537c63"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "No. of Companies: 75\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# 2. How many unique URLs are in the dataset?\n",
+ "no_url = len(df['url'].unique())\n",
+ "print(\"No. of unique URLs:\", no_url)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "a5HsVjy7WFDX",
+ "outputId": "f641260c-fbb9-4e19-b4f3-b545f762ef5a"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "No. of unique URLs: 530\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# 3. What is the most common chunk type?\n",
+ "most_common_chunk = df['chunk_type'].mode().item()\n",
+ "print(\"Most common chunk:\",most_common_chunk)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "3VcXkmR3WJDF",
+ "outputId": "49714849-ec02-4724-e23e-7b93fc1d6b02"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Most common chunk: header\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# 4. What is the distribution of chunk types by company?\n",
+ "\n",
+ "# - group by companies and then for each company group get the count of chunk_types\n",
+ "chunk_types = df.groupby(\"company_name\")[\"chunk_type\"].value_counts()\n",
+ "\n",
+ "chunk_grp_count = chunk_types.unstack(fill_value=0)\n",
+ "chunk_grp_count"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 455
+ },
+ "id": "RBhmH1MUWMrn",
+ "outputId": "85a1395b-2e77-4a0f-83a2-3932bf3c8a53"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "chunk_type footer head header main\n",
+ "company_name \n",
+ "24Hrbookkeeper 0 1 0 1\n",
+ "365games.net 10 11 11 11\n",
+ "4wheeltravels.com 0 2 0 2\n",
+ "579twu.org 0 11 0 11\n",
+ "66corporation.com 4 4 4 4\n",
+ "... ... ... ... ...\n",
+ "bhatiagraphica.com 5 5 7 0\n",
+ "bijvoorbeeldzo 11 11 11 11\n",
+ "bikesandmunchies.com 9 9 9 0\n",
+ "bingotech.net 2 2 1 2\n",
+ "contact@acom.co.id 2 2 1 2\n",
+ "\n",
+ "[75 rows x 4 columns]"
+ ],
+ "text/html": [
+ "\n",
+ "\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "chunk_grp_count",
+ "summary": "{\n \"name\": \"chunk_grp_count\",\n \"rows\": 75,\n \"fields\": [\n {\n \"column\": \"company_name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 75,\n \"samples\": [\n \"66corporation.com\",\n \"accentronix.co.za\",\n \"ABBE Technology Solutions Inc\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"footer\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 0,\n \"max\": 18,\n \"num_unique_values\": 11,\n \"samples\": [\n 18,\n 0,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"head\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 1,\n \"max\": 11,\n \"num_unique_values\": 9,\n \"samples\": [\n 7,\n 11,\n 9\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"header\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 0,\n \"max\": 23,\n \"num_unique_values\": 15,\n \"samples\": [\n 1,\n 7,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"main\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 5,\n \"min\": 0,\n \"max\": 18,\n \"num_unique_values\": 17,\n \"samples\": [\n 1,\n 11,\n 9\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Creating a plot for Q.4\n",
+ "colors = ['teal', 'brown', 'orange', 'red']\n",
+ "\n",
+ "chunk_grp_count.plot(kind='bar', stacked=True, color=colors,\n",
+ " xlabel='Company name',\n",
+ " ylabel='Count by chunk type',\n",
+ " title='Distribution of Chunk Types by Company',\n",
+ " figsize=(20,10)\n",
+ " )"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 810
+ },
+ "id": "o0WBcinLW5oQ",
+ "outputId": "0f0b5c14-01af-48bf-bade-be37952537c2"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 20
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EapiCF4lUJVl"
+ },
+ "source": [
+ "## Step 2: RAGtime!\n",
+ "\n",
+ "Now that you're a little more familiar with the dataset, let's build a simple RAG system that uses OpenAI to help answer some questions about the dataset. To reiterate, we don't expect you to add anything else to the environment to build this system - for example, you don't need to set up a database or anything like that. You can add any libraries you need to the environment, but we'd like you to use OpenAI for any and all tasks that require a language model (we'll send you a key to use).\n",
+ "\n",
+ "That being said, we'd like you to show the specifics of how a RAG implementation works so please avoid using any libraries that provide end-to-end RAG implementations.\n",
+ "\n",
+ "Here is the question that we'd like you to answer via your RAG system:\n",
+ "\n",
+ "1. What does the company Caravan Health do?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Steps for preparing the model:\n",
+ "# 1) Getting data & cleaning it - cleaning can involve preprocessing such as removing stop words and other information-retrieval techniques.\n",
+ "# 2) Chunking logic - based on token size of the embedding and LLM model\n",
+ "# 3) Embedding the chunks\n",
+ "\n",
+ "# Steps for query\n",
+ "# 1) Getting the query\n",
+ "# 2) If query is too long - chunk it\n",
+ "# 3) Embed the query\n",
+ "\n",
+ "# Similarity and scoring mechanism\n",
+ "# 1) Find the similarity scores (cosine) between query embeddings and the chunk embeddings\n",
+ "# 2) Based on the scores, retrieve top k docs\n",
+ "\n",
+ "# Generating answers\n",
+ "# 1) Use the top k documents as context and provide it to the LLM model"
+ ],
+ "metadata": {
+ "id": "7G9DBsM7YXr-"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Data Scraping - Retrieving data from HTML tags\n",
+ "\n",
+ "- Using Beautiful Soup to parse the HTML in each chunk and extract its text.\n",
+ "- Other tools, such as Selenium, could be used as well."
+ ],
+ "metadata": {
+ "id": "PWhV_x0XY5eE"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%pip install beautifulsoup4"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "rwNSb5pvY9Tq",
+ "outputId": "4c742b52-1a36-43ac-cc1b-bfe4953dc2db"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (4.12.3)\n",
+ "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4) (2.5)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from bs4 import BeautifulSoup"
+ ],
+ "metadata": {
+ "id": "TZdtoSIXY_bA"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# function to parse the HTML in each chunk and extract its text\n",
+ "def text_cleaner():\n",
+ "    df['desc'] = ''\n",
+ "\n",
+ "    for idx in range(len(df)):\n",
+ "        chunk_content = df.loc[idx, 'chunk']\n",
+ "        company_name = df.loc[idx, 'company_name']\n",
+ "\n",
+ "        content = BeautifulSoup(chunk_content, 'lxml')\n",
+ "\n",
+ "        text_content = content.get_text(separator='\\n', strip=True)\n",
+ "        df.loc[idx, 'desc'] = company_name + \" \" + text_content  # .loc avoids chained assignment"
+ ],
+ "metadata": {
+ "id": "SsPDU_MkZjJQ"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "text_cleaner()\n",
+ "print(\"---Html data was cleaned---\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ijUpIeL2atXv",
+ "outputId": "91e35b33-3d24-45f0-a54a-b918594b0274"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "---Html data was cleaned---\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Chunking logic\n",
+ "\n",
+ "\n",
+ "* Chunking splits the text into lists of tokens that are small enough to be fed to the embedding model.\n",
+ "\n",
+ "* Embedding models have different maximum input sizes, so the chunking logic could be tailored to the specific embedding model and LLM. To keep it simple I use a chunk size of 512 tokens (counted here as whitespace-separated words), a limit that many embedding models share, so this code can be mapped to other embedding models if required. I would want to try more models and experiment before drawing a conclusion.\n",
+ "\n",
+ "* I chose this size because my analysis found chunks in languages other than English, which would call for a multilingual model, so 512 is a safe choice.\n",
+ "\n",
+ "\n",
+ "\n",
+ "**Additional text preprocessing steps can be performed:**\n",
+ "\n",
+ "\n",
+ "1. Removing stop words (common words like a, the), which may improve query-document matching during retrieval.\n",
+ "\n",
+ "2. Converting all the words to lowercase, which might improve the embeddings.\n",
+ "\n",
+ "3. Using a bag-of-words model (common in NLP), which is a simple way to build vector representations without a pretrained model.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "NIbC_hKGbNg1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import re"
+ ],
+ "metadata": {
+ "id": "UgT49-s9bTX2"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Data splitting logic:\n",
+ "\n",
+ "1. I split every data chunk sentence-wise. (Splitting paragraph-wise first and then breaking the paragraphs down would have preserved more context; I will look into this later.)\n",
+ "\n",
+ "2. For each sentence, I split it into tokens; if the number of tokens is <= 512, I keep the sentence intact.\n",
+ "\n",
+ "3. If the number of tokens is > 512, I break the token list into smaller sublists.\n",
+ "\n",
+ "4. Experiments with different chunk sizes and chunk overlaps would help determine which setting works best for the given task.\n",
+ "\n",
+ "*The above logic aims to preserve the context of the data, which is better than splitting at arbitrary positions.*\n"
+ ],
+ "metadata": {
+ "id": "ukSyogw9e47d"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def split_tokens(tokens, max_size):\n",
+ "\n",
+ "    if not tokens:\n",
+ "        return []\n",
+ "\n",
+ "    # If the tokens fit within max_size, return them as a single sublist\n",
+ "    if len(tokens) <= max_size:\n",
+ "        return [tokens]\n",
+ "\n",
+ "    # Split the tokens into two parts: the first max_size tokens and the rest\n",
+ "    first_part = tokens[:max_size]\n",
+ "    remaining_tokens = tokens[max_size:]\n",
+ "\n",
+ "    # Recursively split the remaining tokens\n",
+ "    remaining_parts = split_tokens(remaining_tokens, max_size)\n",
+ "\n",
+ "    # Prepend the first part to the recursively split remaining parts\n",
+ "    return [first_part] + remaining_parts"
+ ],
+ "metadata": {
+ "id": "78y_PMDQdqJU"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**I wanted to come back and implement the overlapping-context feature below, but creating the embeddings took more time than I expected.**\n",
+ "\n",
+ "*Advantage: this feature would be useful.\n",
+ "Remember, I split the data into sentences, but I observed sentences that exceeded the maximum of 512 tokens, so I broke those sentences into sub-sentences.\n",
+ "The problem is that these broken sentences lose context, and my idea for preserving it was to overlap some of the data between neighbouring pieces.*"
+ ],
+ "metadata": {
+ "id": "w8y8AIoOzt5N"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def create_chunks(max_size):\n",
+ "    # max_size: maximum number of whitespace-separated tokens per chunk\n",
+ "\n",
+ "    all_chunks = []\n",
+ "\n",
+ "    # no overlap with the previous chunk yet - this is the fixed baseline and can be experimented with later\n",
+ "    for idx in range(len(df)):\n",
+ "        # data_split = re.split(r'\\.|\\s+', df['desc'][idx]) # split the desc based on '.' or ' '\n",
+ "        data_split = re.split(r'\\.', df.loc[idx, 'desc'])\n",
+ "\n",
+ "        sentence_chunks = []\n",
+ "        for sent in data_split:\n",
+ "            # split_tokens returns token sublists of at most max_size tokens (nothing for empty sentences)\n",
+ "            for part in split_tokens(sent.split(), max_size):\n",
+ "                sentence_chunks.append(\" \".join(part))\n",
+ "\n",
+ "        all_chunks.append(sentence_chunks)\n",
+ "    df['chunked_data'] = pd.Series(all_chunks, index=df.index)  # one list of chunk strings per row\n"
+ ],
+ "metadata": {
+ "id": "4AMdYzHfd5b-"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "max_size = 512\n",
+ "create_chunks(max_size)\n",
+ "\n",
+ "print(\"---Data has been chunked---\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "I5rzTMoCgY5_",
+ "outputId": "40129fbc-3d3d-4de6-b801-34bda84dc058"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "---Data has been chunked---\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Saving the DataFrame as a backup (this step is optional).\n"
+ ],
+ "metadata": {
+ "id": "1DuYyd6F3Fx7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Using pickle, which preserves the DataFrame exactly; JSON is another good option\n",
+ "import pickle\n",
+ "with open('df_pickle.pkl', 'wb') as f:\n",
+ " pickle.dump(df, f)"
+ ],
+ "metadata": {
+ "id": "t-e1AoSSgjvr"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Embedding logic and retrieving the top k docs\n",
+ "\n",
+ "\n",
+ "* The API auth key is saved as a Colab secret named `auth_key`\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "0DqJYcZBhyD1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%pip install openai"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ubNkFAWUh1_v",
+ "outputId": "29b10621-5fa2-4fbb-fce0-451a19c55ff6"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: openai in /usr/local/lib/python3.10/dist-packages (1.16.2)\n",
+ "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai) (3.7.1)\n",
+ "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai) (1.7.0)\n",
+ "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai) (0.27.0)\n",
+ "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from openai) (2.6.4)\n",
+ "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai) (1.3.1)\n",
+ "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai) (4.66.2)\n",
+ "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai) (4.10.0)\n",
+ "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai) (3.6)\n",
+ "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai) (1.2.0)\n",
+ "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai) (2024.2.2)\n",
+ "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai) (1.0.5)\n",
+ "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai) (0.14.0)\n",
+ "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai) (0.6.0)\n",
+ "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai) (2.16.3)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import openai"
+ ],
+ "metadata": {
+ "id": "79L8lz5QiTPs"
+ },
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from google.colab import userdata\n",
+ "API_AUTH = userdata.get('auth_key')"
+ ],
+ "metadata": {
+ "id": "fFB7WlSXh6PX"
+ },
+ "execution_count": 5,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "openai.api_key = API_AUTH"
+ ],
+ "metadata": {
+ "id": "5kri4LtCiP9J"
+ },
+ "execution_count": 6,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Get the embedding for a single piece of text (a chunk or a query)\n",
+ "def get_embedding(text_to_embed):\n",
+ "    response = openai.embeddings.create(\n",
+ "        model=\"text-embedding-3-small\",\n",
+ "        input=text_to_embed\n",
+ "    )\n",
+ "\n",
+ "    # A single input string yields a single embedding at index 0\n",
+ "    return response.data[0].embedding"
+ ],
+ "metadata": {
+ "id": "jFGloqWNiSBs"
+ },
+ "execution_count": 7,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
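+ "# Collect one list of chunk embeddings per row, then assign the column in one step\n",
+ "# (avoids pandas chained assignment such as df['embeddings'][idx] = ...)\n",
+ "all_embeddings = []\n",
+ "for chunks in df['chunked_data']:\n",
+ "    all_embeddings.append([get_embedding(chunk) for chunk in chunks])\n",
+ "\n",
+ "df['embeddings'] = pd.Series(all_embeddings, index=df.index)\n"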
+ ],
+ "metadata": {
+ "id": "pHs4_Kb7tJl_"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**Since I am not creating a vector DB to store the embeddings, I will try to simulate it using simple file storage.**\n",
+ "\n",
+ "* Created a pickle file which contains the embeddings and the source information.\n",
+ "\n",
+ "\n",
+ "* My vector DB would contain pairs of (chunk_id as the unique key, embeddings)\n",
+ "\n",
+ "\n",
+ "\n",
+ "*Advantage: embeddings need to be created only once, as long as the embedding model does not change, so I can store the current embeddings and keep adding new ones as more data enters the system.*"
+ ],
+ "metadata": {
+ "id": "41t9_M17xvpO"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**Query Embeddings**"
+ ],
+ "metadata": {
+ "id": "ptiW30pR4eqc"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**Improvement:**\n",
+ "* Before deep learning models, the same task was performed by information retrieval systems (also used in search engines).\n",
+ "\n",
+ "* I propose a similar system: match the keywords of the query against the documents. This involves a separate set of techniques: bag-of-words representations, word normalization, building dictionaries, scoring, and retrieving the top k documents. I have worked on this task in the past; it works well (though not perfectly) and is the approach used in Elasticsearch.\n",
+ "\n",
+ "* Suggestion: run both the embedding-based and the keyword-based retrieval, check which one retrieved better documents, and pass those to the LLM to prepare the answer."
+ ],
+ "metadata": {
+ "id": "L9YJl3KF-L0p"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "query = \"What does the company Caravan Health do?\"\n",
+ "query_embedding = get_embedding(query)\n",
+ "# query_embedding"
+ ],
+ "metadata": {
+ "id": "vIt8pRXq-F_p"
+ },
+ "execution_count": 8,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**Retrieving top k documents based on similarity scores**\n",
+ "\n",
+ "* There are various scoring mechanisms we can experiment with.\n",
+ "* I used cosine similarity because it is quick to compute (a normalized dot product) and works reasonably well.\n",
+ "\n",
+ "**Improvement**: I would also re-rank the documents. Some chunks had no data (i.e. the HTML parser produced nothing), so I used the company name as the data in those cases. Those documents are not very useful, so re-ranking the results would surface better data."
+ ],
+ "metadata": {
+ "id": "NWgc1oHc-47Q"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%pip install scikit-learn"
+ ],
+ "metadata": {
+ "id": "i3qBG66pp558"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from sklearn.metrics.pairwise import cosine_similarity"
+ ],
+ "metadata": {
+ "id": "l2NRw3L-ppEr"
+ },
+ "execution_count": 9,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
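+ "import numpy as np\n",
+ "\n",
+ "# Score a document as the mean cosine similarity between the query and each of its chunk embeddings\n",
+ "def get_score(query, document):\n",
+ "    query_vec = np.array(query).reshape(1, -1)  # shape (1, dim)\n",
+ "    doc_vecs = np.array(document)  # shape (n_chunks, dim)\n",
+ "    if doc_vecs.size == 0:\n",
+ "        return 0.0  # no chunk embeddings to compare against\n",
+ "    doc_vecs = doc_vecs.reshape(-1, query_vec.shape[1])\n",
+ "\n",
+ "    similarity = cosine_similarity(query_vec, doc_vecs)  # shape (1, n_chunks)\n",
+ "    return float(np.mean(similarity))"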
+ ],
+ "metadata": {
+ "id": "2SYjnsg7p335"
+ },
+ "execution_count": 10,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def retrieve_top_docs(query_embedding, k):\n",
+ "    # Score every row against the query, keyed by chunk_id\n",
+ "    score_dict = {}\n",
+ "    for idx in range(len(df)):\n",
+ "        score_dict[df['chunk_id'][idx]] = get_score(query_embedding, df['embeddings'][idx])\n",
+ "\n",
+ "    # Sort by score, highest first, and keep only the k best\n",
+ "    sorted_scores = sorted(score_dict.items(), key=lambda item: item[1], reverse=True)\n",
+ "    return dict(sorted_scores[:k])"
+ ],
+ "metadata": {
+ "id": "wrnYKKHR_Hc8"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "All cosine scores are stored in the pickle file below, along with their chunk_id.\n",
+ "* I am loading them directly from a saved file because I accidentally overwrote my embeddings file earlier and then ran into a series of issues. Creating the embeddings again is very costly, so I left it as is for now."
+ ],
+ "metadata": {
+ "id": "7IbeavTx3Ijn"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# import pickle\n",
+ "with open('all_scores.pkl', 'rb') as f:\n",
+ " all_scores = pd.DataFrame(pickle.load(f))"
+ ],
+ "metadata": {
+ "id": "XmneToTayN5S"
+ },
+ "execution_count": 144,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "with open('df_pickle.pkl', 'rb') as f:\n",
+ " df = pd.DataFrame(pickle.load(f))"
+ ],
+ "metadata": {
+ "id": "P5iFlSUa1Bri"
+ },
+ "execution_count": 145,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def get_context(k, all_scores, df):\n",
+ "    # Take the k highest-scoring chunks and concatenate their descriptions\n",
+ "    top_scores = all_scores.nlargest(k, 'score')\n",
+ "\n",
+ "    context_parts = []\n",
+ "    for chunk_id in top_scores['chunk_id']:\n",
+ "        matches = df.loc[df['chunk_id'] == chunk_id, 'desc'].values\n",
+ "        if len(matches) > 0:\n",
+ "            context_parts.append(str(matches[0]))\n",
+ "\n",
+ "    return \" \".join(context_parts)"
+ ],
+ "metadata": {
+ "id": "rZtkEgQt0vSv"
+ },
+ "execution_count": 158,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Generating a response"
+ ],
+ "metadata": {
+ "id": "1P26TN7VNdo1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "context = get_context(10, all_scores, df)\n",
+ "query = \"What does the company Caravan Health do?\"\n",
+ "\n",
+ "\n",
+ "\n",
+ "response = openai.chat.completions.create(\n",
+ " messages = [\n",
+ " {\"role\": \"system\", \"content\": \"You will answer questions like a chat system. If you do not understand a question and if it doesnt match any context that is given to you, simply tell the user that you cannot answer it.\"},\n",
+ " {\"role\": \"user\", \"content\": query + \" \" + context}\n",
+ " ],\n",
+ " model = \"gpt-3.5-turbo-0125\",\n",
+ " temperature = 0.0\n",
+ ")\n",
+ "\n",
+ "# print(response.choices[0].message.content)"
+ ],
+ "metadata": {
+ "id": "s0Vi5PgWNiPT"
+ },
+ "execution_count": 159,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "response.choices[0].message.content\n"
+ ],
+ "metadata": {
+ "id": "2BIh1TOjRpRl",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 52
+ },
+ "outputId": "1ffb2c6d-1047-427d-ec88-9ab0090bd563"
+ },
+ "execution_count": 160,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Caravan Health is a company that partners with healthcare providers to help them transition to value-based care and improve patient outcomes. They offer solutions and support to help healthcare organizations succeed in accountable care and population health management.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 160
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**How would I scale this RAG based LLM?**\n",
+ "* I would use a cloud provider: Amazon Web Services (AWS)\n",
+ "* Move the entire codebase to SageMaker and deploy the model there\n",
+ "* Create a vector database to store the embeddings together with their source (unique identifier); vector DB: Pinecone\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "5OKZ2rTW6COD"
+ }
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.7"
+ },
+ "colab": {
+ "provenance": [],
+ "collapsed_sections": [
+ "luU2ySX_V4mo"
+ ]
}
- ],
- "source": [
- "%pip install pandas\n",
- "%pip install matplotlib\n",
- "%pip install openai"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Step 2: RAGtime!\n",
- "\n",
- "Now that you're a little more familar with the dataset, let's build a simple RAG system that uses OpenAI to help answer some questions about the dataset. To reiterate, we don't expect you to add anything else to the environment to build this system - for example, you don't need to set up a database or anything like that. You can add any libraries you need to the environment, but we'd like you to use OpenAI for any and all tasks that require a language model (we'll send you a key to use).\n",
- "\n",
- "That being said, we'd like you to show the specifics of how a RAG implementation works so please avoid using any libraries that provide end-to-end RAG implementations.\n",
- "\n",
- "Here is the question that we'd like you to answer via your RAG system:\n",
- "\n",
- "1. What does the company Caravan Health do?"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
},
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.7"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/salesforce/README.md b/salesforce/README.md
deleted file mode 100644
index af3e21c..0000000
--- a/salesforce/README.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Setup
-
-## Prerequisites
-1. Salesforce CLI - https://developer.salesforce.com/docs/atlas.en-us.sfdx_setup.meta/sfdx_setup/sfdx_setup_install_cli.htm
-2. Salesforce Developer Environment - https://developer.salesforce.com/signup
-4. A Github account - https://docs.github.com/en/get-started/quickstart/set-up-git#setting-up-git
-
-## Interview setup
-1. Log into your Salesforce Developer Environment using the Salesforce CLI:
- - sfdx force:auth:web:login -r orgUrl
- - orgUrl should be replaced with the url of your Salesforce Developer Enviornment (i.e. - https://gradientworks5-dev-ed.my.salesforce.com)
-2. [Fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo#prerequisites) and clone the following repo: https://github.com/Gradient-Works/interview
-
-You should be ready to start working through the [interview prompt](https://github.com/Gradient-Works/interview/blob/main/docs/movies_lwc_instructions.md) now.
diff --git a/salesforce/docs/ff_api.md b/salesforce/docs/ff_api.md
deleted file mode 100644
index f777a56..0000000
--- a/salesforce/docs/ff_api.md
+++ /dev/null
@@ -1,135 +0,0 @@
-# FF API
-
-Everything you ever needed to know about the Fast and Furious franchise
-
-URL: https://api.gradientworks.dev/ff
-
-## List Movies
-
-HTTP Method: GET
-
-URL: http://api.gradientworks.dev/ff/movies
-
-### Query params
-
-None
-
-### Results
-
-json document with list of movie info
-
-### Example
-
-```
-$ curl https://api.gradientworks.dev/ff/movies
-{
- "movies": [
- {
- "id": 1,
- "title": "Fast and Furious"
- },
- {
- "id": 2,
- "title": "Too Fast Too Furious"
- },
- ....
- ]
-}
-```
-
-## Get movie
-
-HTTP Method: GET
-
-URL: http://api.gradientworks.dev/ff/movies/{ID}
-
-### Query params
-
-None
-
-### Results
-
-json document with specific movie info
-
-### Example
-
-```
-$ curl https://api.gradientworks.dev/ff/movies/1
-{
- "movie": {
- "id": 1,
- "title": "Fast and Furious",
- ....
- }
-}
-
-$ curl https://api.gradientworks.dev/ff/movies/2
-{
- "movie": {
- "id": 2,
- "title": "Too Fast Too Furious",
- ....
- }
-}
-```
-
-## Get characters in a movie
-
-HTTP Method: GET
-URL: http://api.gradientworks.dev/ff/characters
-
-### Query params
-
-None
-
-### Results
-
-json document with list of actors for a given movie
-
-### Example
-
-```
-$ curl https://api.gradientworks.dev/ff/characters
-{
- "characters": [
- {
- "id": 1,
- "name": "Dominic Torretto",
- "movies": [1,3,4,5,6,7,8,9,10]
- ....
- },
- {
- "id": 2,
- "name": "Brian O'Conner",
- "movies": [1,2,4,5,6,7]
- ....
- },
-}
-```
-
-## Get character
-
-HTTP Method: GET
-URL: http://api.gradientworks.dev/ff/characters/{ID}
-
-### Query params
-
-None
-
-### Results
-
-json document of character info
-
-### Example
-
-```
-$ curl https://api.gradientworks.dev/ff/characters/1
-{
- "character": {
- "id": 1,
- "name": "Dominic Torretto",
- "movies": [1,3,4,5,6,7,8,9,10]
- ....
- }
-}
-```
\ No newline at end of file
diff --git a/salesforce/docs/movies_lwc_instructions.md b/salesforce/docs/movies_lwc_instructions.md
deleted file mode 100644
index 5c5284d..0000000
--- a/salesforce/docs/movies_lwc_instructions.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Build a Lightning Component to display the Fast and Furious movies
-
-## Prompt
-Using the Fast and Furious API, get the list of movies and display them in a datatable.
-
-Please use the provided Apex classes and lwc for your work and make sure to include tests for your Apex classes.
-
-Information about the API can be found [here](https://github.com/Gradient-Works/interview/blob/main/docs/ff_api.md)
-
-## Some Implementation Details
-1. Please use Apex to interact with the API.
-2. Using `lightning-datatable`, build a component containing the following:
- - Columns for: Id, Name, Release Date, Opening Revenue
- - Opening Revenue should be formatted as currency
- - Release Date should be formatted as MM DD, YYYY
-3. Display the table in a Lightning App page named "Movies"
-
-## Submission Instructions
-Once you're done with the assignment, make sure that all of your work is pulled down from your org and included in your forked repository. It should be organized such that we can run this command to successfuly deploy and test your code:
-
- sfdx force:mdapi:deploy -u orgUser -d gradient-works-ff/src/main/default -w 10
-
-Let us know once your code has been pushed up to GitHub so we can make sure we have access to it before you leave.
diff --git a/salesforce/gradient-works-ff/.eslintignore b/salesforce/gradient-works-ff/.eslintignore
deleted file mode 100644
index 5f7b681..0000000
--- a/salesforce/gradient-works-ff/.eslintignore
+++ /dev/null
@@ -1,16 +0,0 @@
-**/lwc/**/*.css
-**/lwc/**/*.html
-**/lwc/**/*.json
-**/lwc/**/*.svg
-**/lwc/**/*.xml
-**/aura/**/*.auradoc
-**/aura/**/*.cmp
-**/aura/**/*.css
-**/aura/**/*.design
-**/aura/**/*.evt
-**/aura/**/*.json
-**/aura/**/*.svg
-**/aura/**/*.tokens
-**/aura/**/*.xml
-**/aura/**/*.app
-.sfdx
diff --git a/salesforce/gradient-works-ff/.forceignore b/salesforce/gradient-works-ff/.forceignore
deleted file mode 100755
index 7b5b5a7..0000000
--- a/salesforce/gradient-works-ff/.forceignore
+++ /dev/null
@@ -1,12 +0,0 @@
-# List files or directories below to ignore them when running force:source:push, force:source:pull, and force:source:status
-# More information: https://developer.salesforce.com/docs/atlas.en-us.sfdx_dev.meta/sfdx_dev/sfdx_dev_exclude_source.htm
-#
-
-package.xml
-
-# LWC configuration files
-**/jsconfig.json
-**/.eslintrc.json
-
-# LWC Jest
-**/__tests__/**
\ No newline at end of file
diff --git a/salesforce/gradient-works-ff/.gitignore b/salesforce/gradient-works-ff/.gitignore
deleted file mode 100644
index f891339..0000000
--- a/salesforce/gradient-works-ff/.gitignore
+++ /dev/null
@@ -1,40 +0,0 @@
-# This file is used for Git repositories to specify intentionally untracked files that Git should ignore.
-# If you are not using git, you can delete this file. For more information see: https://git-scm.com/docs/gitignore
-# For useful gitignore templates see: https://github.com/github/gitignore
-
-# Salesforce cache
-.sf/
-.sfdx/
-.localdevserver/
-deploy-options.json
-
-# LWC VSCode autocomplete
-**/lwc/jsconfig.json
-
-# LWC Jest coverage reports
-coverage/
-
-# Logs
-logs
-*.log
-npm-debug.log*
-yarn-debug.log*
-yarn-error.log*
-
-# Dependency directories
-node_modules/
-
-# Eslint cache
-.eslintcache
-
-# MacOS system files
-.DS_Store
-
-# Windows system files
-Thumbs.db
-ehthumbs.db
-[Dd]esktop.ini
-$RECYCLE.BIN/
-
-# Local environment variables
-.env
\ No newline at end of file
diff --git a/salesforce/gradient-works-ff/.husky/pre-commit b/salesforce/gradient-works-ff/.husky/pre-commit
deleted file mode 100755
index feac116..0000000
--- a/salesforce/gradient-works-ff/.husky/pre-commit
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/bin/sh
-. "$(dirname "$0")/_/husky.sh"
-
-npm run precommit
\ No newline at end of file
diff --git a/salesforce/gradient-works-ff/.prettierignore b/salesforce/gradient-works-ff/.prettierignore
deleted file mode 100755
index f3720b2..0000000
--- a/salesforce/gradient-works-ff/.prettierignore
+++ /dev/null
@@ -1,10 +0,0 @@
-# List files or directories below to ignore them when running prettier
-# More information: https://prettier.io/docs/en/ignore.html
-#
-
-**/staticresources/**
-.localdevserver
-.sfdx
-.vscode
-
-coverage/
\ No newline at end of file
diff --git a/salesforce/gradient-works-ff/.prettierrc b/salesforce/gradient-works-ff/.prettierrc
deleted file mode 100755
index 15683b6..0000000
--- a/salesforce/gradient-works-ff/.prettierrc
+++ /dev/null
@@ -1,13 +0,0 @@
-{
- "trailingComma": "none",
- "overrides": [
- {
- "files": "**/lwc/**/*.html",
- "options": { "parser": "lwc" }
- },
- {
- "files": "*.{cmp,page,component}",
- "options": { "parser": "html" }
- }
- ]
-}
diff --git a/salesforce/gradient-works-ff/.vscode/extensions.json b/salesforce/gradient-works-ff/.vscode/extensions.json
deleted file mode 100644
index 7e6cb10..0000000
--- a/salesforce/gradient-works-ff/.vscode/extensions.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
- "recommendations": [
- "salesforce.salesforcedx-vscode",
- "redhat.vscode-xml",
- "dbaeumer.vscode-eslint",
- "esbenp.prettier-vscode",
- "financialforce.lana"
- ]
-}
diff --git a/salesforce/gradient-works-ff/.vscode/launch.json b/salesforce/gradient-works-ff/.vscode/launch.json
deleted file mode 100644
index e07e391..0000000
--- a/salesforce/gradient-works-ff/.vscode/launch.json
+++ /dev/null
@@ -1,16 +0,0 @@
-{
- // Use IntelliSense to learn about possible attributes.
- // Hover to view descriptions of existing attributes.
- // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
- "version": "0.2.0",
- "configurations": [
- {
- "name": "Launch Apex Replay Debugger",
- "type": "apex-replay",
- "request": "launch",
- "logFile": "${command:AskForLogFileName}",
- "stopOnEntry": true,
- "trace": true
- }
- ]
-}
diff --git a/salesforce/gradient-works-ff/.vscode/settings.json b/salesforce/gradient-works-ff/.vscode/settings.json
deleted file mode 100644
index 76decfb..0000000
--- a/salesforce/gradient-works-ff/.vscode/settings.json
+++ /dev/null
@@ -1,7 +0,0 @@
-{
- "search.exclude": {
- "**/node_modules": true,
- "**/bower_components": true,
- "**/.sfdx": true
- }
-}
diff --git a/salesforce/gradient-works-ff/README.md b/salesforce/gradient-works-ff/README.md
deleted file mode 100644
index afcda4a..0000000
--- a/salesforce/gradient-works-ff/README.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Salesforce DX Project: Next Steps
-
-Now that you’ve created a Salesforce DX project, what’s next? Here are some documentation resources to get you started.
-
-## How Do You Plan to Deploy Your Changes?
-
-Do you want to deploy a set of changes, or create a self-contained application? Choose a [development model](https://developer.salesforce.com/tools/vscode/en/user-guide/development-models).
-
-## Configure Your Salesforce DX Project
-
-The `sfdx-project.json` file contains useful configuration information for your project. See [Salesforce DX Project Configuration](https://developer.salesforce.com/docs/atlas.en-us.sfdx_dev.meta/sfdx_dev/sfdx_dev_ws_config.htm) in the _Salesforce DX Developer Guide_ for details about this file.
-
-## Read All About It
-
-- [Salesforce Extensions Documentation](https://developer.salesforce.com/tools/vscode/)
-- [Salesforce CLI Setup Guide](https://developer.salesforce.com/docs/atlas.en-us.sfdx_setup.meta/sfdx_setup/sfdx_setup_intro.htm)
-- [Salesforce DX Developer Guide](https://developer.salesforce.com/docs/atlas.en-us.sfdx_dev.meta/sfdx_dev/sfdx_dev_intro.htm)
-- [Salesforce CLI Command Reference](https://developer.salesforce.com/docs/atlas.en-us.sfdx_cli_reference.meta/sfdx_cli_reference/cli_reference.htm)
diff --git a/salesforce/gradient-works-ff/config/project-scratch-def.json b/salesforce/gradient-works-ff/config/project-scratch-def.json
deleted file mode 100644
index ee4158e..0000000
--- a/salesforce/gradient-works-ff/config/project-scratch-def.json
+++ /dev/null
@@ -1,13 +0,0 @@
-{
- "orgName": "chloecoon company",
- "edition": "Developer",
- "features": ["EnableSetPasswordInApi"],
- "settings": {
- "lightningExperienceSettings": {
- "enableS1DesktopEnabled": true
- },
- "mobileSettings": {
- "enableS1EncryptedStoragePref2": false
- }
- }
-}
diff --git a/salesforce/gradient-works-ff/jest.config.js b/salesforce/gradient-works-ff/jest.config.js
deleted file mode 100644
index f5a9fed..0000000
--- a/salesforce/gradient-works-ff/jest.config.js
+++ /dev/null
@@ -1,6 +0,0 @@
-const { jestConfig } = require('@salesforce/sfdx-lwc-jest/config');
-
-module.exports = {
- ...jestConfig,
- modulePathIgnorePatterns: ['/.localdevserver']
-};
diff --git a/salesforce/gradient-works-ff/package.json b/salesforce/gradient-works-ff/package.json
deleted file mode 100644
index c2df91a..0000000
--- a/salesforce/gradient-works-ff/package.json
+++ /dev/null
@@ -1,41 +0,0 @@
-{
- "name": "salesforce-app",
- "private": true,
- "version": "1.0.0",
- "description": "Salesforce App",
- "scripts": {
- "lint": "eslint **/{aura,lwc}/**",
- "test": "npm run test:unit",
- "test:unit": "sfdx-lwc-jest",
- "test:unit:watch": "sfdx-lwc-jest --watch",
- "test:unit:debug": "sfdx-lwc-jest --debug",
- "test:unit:coverage": "sfdx-lwc-jest --coverage",
- "prettier": "prettier --write \"**/*.{cls,cmp,component,css,html,js,json,md,page,trigger,xml,yaml,yml}\"",
- "prettier:verify": "prettier --list-different \"**/*.{cls,cmp,component,css,html,js,json,md,page,trigger,xml,yaml,yml}\"",
- "postinstall": "husky install",
- "precommit": "lint-staged"
- },
- "devDependencies": {
- "@lwc/eslint-plugin-lwc": "^1.0.1",
- "@prettier/plugin-xml": "^0.13.1",
- "@salesforce/eslint-config-lwc": "^2.0.0",
- "@salesforce/eslint-plugin-aura": "^2.0.0",
- "@salesforce/eslint-plugin-lightning": "^0.1.1",
- "@salesforce/sfdx-lwc-jest": "^0.13.0",
- "eslint": "^7.29.0",
- "eslint-plugin-import": "^2.23.4",
- "eslint-plugin-jest": "^24.3.6",
- "husky": "^7.0.0",
- "lint-staged": "^11.0.0",
- "prettier": "^2.3.2",
- "prettier-plugin-apex": "^1.10.0"
- },
- "lint-staged": {
- "**/*.{cls,cmp,component,css,html,js,json,md,page,trigger,xml,yaml,yml}": [
- "prettier --write"
- ],
- "**/{aura,lwc}/**": [
- "eslint"
- ]
- }
-}
diff --git a/salesforce/gradient-works-ff/scripts/apex/hello.apex b/salesforce/gradient-works-ff/scripts/apex/hello.apex
deleted file mode 100644
index 1fba732..0000000
--- a/salesforce/gradient-works-ff/scripts/apex/hello.apex
+++ /dev/null
@@ -1,10 +0,0 @@
-// Use .apex files to store anonymous Apex.
-// You can execute anonymous Apex in VS Code by selecting the
-// apex text and running the command:
-// SFDX: Execute Anonymous Apex with Currently Selected Text
-// You can also execute the entire file by running the command:
-// SFDX: Execute Anonymous Apex with Editor Contents
-
-string tempvar = 'Enter_your_name_here';
-System.debug('Hello World!');
-System.debug('My name is ' + tempvar);
\ No newline at end of file
diff --git a/salesforce/gradient-works-ff/scripts/soql/account.soql b/salesforce/gradient-works-ff/scripts/soql/account.soql
deleted file mode 100644
index 10d4b9c..0000000
--- a/salesforce/gradient-works-ff/scripts/soql/account.soql
+++ /dev/null
@@ -1,6 +0,0 @@
-// Use .soql files to store SOQL queries.
-// You can execute queries in VS Code by selecting the
-// query text and running the command:
-// SFDX: Execute SOQL Query with Currently Selected Text
-
-SELECT Id, Name FROM Account
diff --git a/salesforce/gradient-works-ff/sfdx-project.json b/salesforce/gradient-works-ff/sfdx-project.json
deleted file mode 100644
index 37979e3..0000000
--- a/salesforce/gradient-works-ff/sfdx-project.json
+++ /dev/null
@@ -1,12 +0,0 @@
-{
- "packageDirectories": [
- {
- "path": "src",
- "default": true
- }
- ],
- "name": "gradient-works-ff",
- "namespace": "",
- "sfdcLoginUrl": "https://login.salesforce.com",
- "sourceApiVersion": "52.0"
-}
diff --git a/salesforce/gradient-works-ff/src/main/default/package.xml b/salesforce/gradient-works-ff/src/main/default/package.xml
deleted file mode 100644
index b4a54dd..0000000
--- a/salesforce/gradient-works-ff/src/main/default/package.xml
+++ /dev/null
@@ -1,36 +0,0 @@
-
-
-
- *
- ApexClass
-
-
- *
- ApexComponent
-
-
- *
- ApexPage
-
-
- *
- ApexTestSuite
-
-
- *
- ApexTrigger
-
-
- *
- AuraDefinitionBundle
-
-
- *
- LightningComponentBundle
-
-
- *
- StaticResource
-
- 52.0
-
\ No newline at end of file