{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Credit Risk Analysis using IBM PowerAI Snap ML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example we train a logistic regression model on a customer credit history dataset, using both scikit-learn and snap-ml-local.\n",
"\n",
"Update the `device_ids` list passed to snap_ml's `LogisticRegression` to match the number of GPUs available to you.\n",
"\n",
"To avoid the 'kernel restart' problem, increase the CPU and memory allocated to the Jupyter environment (e.g. memory 10GB, CPU 100) and then restart it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the input dataset of customer information\n",
"\n",
"Two wget commands are given below for downloading the input dataset: one for the bigger (full) dataset and one for a reduced dataset. snapML training shows many times better performance with the bigger dataset.\n",
"\n",
"If you have already downloaded the file, you can comment out the download code below before re-running the cell."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File `credit_customer_history.csv' already there; not retrieving.\r\n"
]
}
],
"source": [
"# Download the full dataset (snapML training time is many times better than sklearn's on it)\n",
"!wget -O credit_customer_history.csv -nc https://ibm.box.com/shared/static/c84jns0hy2ty05t3c3a9c17ca1mxpe6s.csv\n",
"\n",
"# Download reduced dataset\n",
"#!wget -O credit_customer_history.csv https://ibm.box.com/shared/static/tr7cz4drh7bwa8kbyw0erjfjh7fort0y.csv"
]
},
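{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to stay in Python, the skip-if-already-downloaded behaviour of `wget -nc` can be sketched as below. This is only an illustration: `download_if_missing` and its `fetch` parameter are hypothetical names that are not part of the original notebook, and the standard library's `urlretrieve` stands in for wget.\n",
"\n",
"```python\n",
"import os\n",
"from urllib.request import urlretrieve\n",
"\n",
"def download_if_missing(url, path, fetch=urlretrieve):\n",
"    # Skip the transfer when the file is already on disk, like wget -nc.\n",
"    if os.path.exists(path):\n",
"        return False\n",
"    # fetch(url, path) must write the downloaded bytes to path.\n",
"    fetch(url, path)\n",
"    return True\n",
"```\n",
"\n",
"Calling it with one of the dataset URLs from the cell above and `'credit_customer_history.csv'` as `path` leaves an existing file untouched, so the cell can be re-run safely."
]
},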
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import numpy as np\n",
"import pandas as pd\n",
"pd.options.display.max_columns = 999\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import sklearn\n",
"from sklearn.model_selection import train_test_split, StratifiedKFold\n",
"from sklearn.preprocessing import MinMaxScaler, LabelEncoder, normalize\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import f1_score, accuracy_score, roc_curve, roc_auc_score\n",
"from scipy.stats import chi2_contingency,ttest_ind\n",
"from sklearn.utils import shuffle\n",
"import time\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"application/javascript": [
"IPython.OutputArea.prototype._should_scroll = function(lines) {\n",
" return false;\n",
"}"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%javascript\n",
"IPython.OutputArea.prototype._should_scroll = function(lines) {\n",
" return false;\n",
"}\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dataset Visualization\n",
"\n",
"\n",
"Let's take a quick look at the dataset.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 10000000 observations in the customer history dataset.\n",
"There are 19 variables in the dataset.\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>EMI_TENURE</th><th>CREDIT_HISTORY</th><th>TRANSACTION_CATEGORY</th><th>TRANSACTION_AMOUNT</th><th>ACCOUNT_TYPE</th><th>ACCOUNT_AGE</th><th>STATE</th><th>IS_URBAN</th><th>IS_STATE_BORDER</th><th>HAS_CO_APPLICANT</th><th>HAS_GUARANTOR</th><th>OWN_REAL_ESTATE</th><th>OTHER_INSTALMENT_PLAN</th><th>OWN_RESIDENCE</th><th>NUMBER_CREDITS</th><th>RFM_SCORE</th><th>OWN_CAR</th><th>SHIP_INTERNATIONAL</th><th>IS_DEFAULT</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>77</td><td>EXISTING CREDITS PAID BACK</td><td>EDUCATION</td><td>27630</td><td>UNKNOWN/NONE</td><td>above 7 YRS</td><td>CT</td><td>NO</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>YES</td><td>NO</td><td>0</td><td>4</td><td>NO</td><td>NO</td><td>No</td></tr>\n",
"    <tr><th>1</th><td>119</td><td>EXISTING CREDITS PAID BACK</td><td>ELECTRONICS</td><td>31314</td><td>above 1000 K USD</td><td>4 to 7 YRS</td><td>CT</td><td>YES</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>NO</td><td>YES</td><td>0</td><td>3</td><td>YES</td><td>YES</td><td>No</td></tr>\n",
"    <tr><th>2</th><td>84</td><td>EXISTING CREDITS PAID BACK</td><td>FURNITURE</td><td>27630</td><td>above 1000 K USD</td><td>4 to 7 YRS</td><td>PA</td><td>NO</td><td>NO</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>YES</td><td>0</td><td>3</td><td>YES</td><td>YES</td><td>No</td></tr>\n",
"    <tr><th>3</th><td>119</td><td>DELAY IN PAST</td><td>FURNITURE</td><td>33156</td><td>above 1000 K USD</td><td>up to 1 YR</td><td>PA</td><td>YES</td><td>NO</td><td>YES</td><td>NO</td><td>NO</td><td>NO</td><td>YES</td><td>0</td><td>3</td><td>NO</td><td>NO</td><td>Yes</td></tr>\n",
"    <tr><th>4</th><td>105</td><td>DELAY IN PAST</td><td>FURNITURE</td><td>23946</td><td>above 1000 K USD</td><td>up to 1 YR</td><td>CT</td><td>NO</td><td>YES</td><td>YES</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>0</td><td>3</td><td>YES</td><td>YES</td><td>No</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" EMI_TENURE CREDIT_HISTORY TRANSACTION_CATEGORY \\\n",
"0 77 EXISTING CREDITS PAID BACK EDUCATION \n",
"1 119 EXISTING CREDITS PAID BACK ELECTRONICS \n",
"2 84 EXISTING CREDITS PAID BACK FURNITURE \n",
"3 119 DELAY IN PAST FURNITURE \n",
"4 105 DELAY IN PAST FURNITURE \n",
"\n",
" TRANSACTION_AMOUNT ACCOUNT_TYPE ACCOUNT_AGE STATE IS_URBAN \\\n",
"0 27630 UNKNOWN/NONE above 7 YRS CT NO \n",
"1 31314 above 1000 K USD 4 to 7 YRS CT YES \n",
"2 27630 above 1000 K USD 4 to 7 YRS PA NO \n",
"3 33156 above 1000 K USD up to 1 YR PA YES \n",
"4 23946 above 1000 K USD up to 1 YR CT NO \n",
"\n",
" IS_STATE_BORDER HAS_CO_APPLICANT HAS_GUARANTOR OWN_REAL_ESTATE \\\n",
"0 YES YES YES NO \n",
"1 YES YES YES NO \n",
"2 NO YES YES YES \n",
"3 NO YES NO NO \n",
"4 YES YES YES YES \n",
"\n",
" OTHER_INSTALMENT_PLAN OWN_RESIDENCE NUMBER_CREDITS RFM_SCORE OWN_CAR \\\n",
"0 YES NO 0 4 NO \n",
"1 NO YES 0 3 YES \n",
"2 NO YES 0 3 YES \n",
"3 NO YES 0 3 NO \n",
"4 YES NO 0 3 YES \n",
"\n",
" SHIP_INTERNATIONAL IS_DEFAULT \n",
"0 NO No \n",
"1 YES No \n",
"2 YES No \n",
"3 NO Yes \n",
"4 YES No "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cust_pd_full = pd.read_csv('credit_customer_history.csv')\n",
"\n",
"rows=1000000\n",
"cust_pd = cust_pd_full.head(rows)\n",
"print(\"There are \" + str(len(cust_pd_full)) + \" observations in the customer history dataset.\")\n",
"print(\"There are \" + str(len(cust_pd_full.columns)) + \" variables in the dataset.\")\n",
"\n",
"cust_pd.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distribution of the output variable IS_DEFAULT"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#cust_pd.IS_DEFAULT.value_counts().plot(kind='pie',colormap='winter',autopct='%1.0f%%').legend(bbox_to_anchor=(1.2, 0.6))\n",
"cust_pd.IS_DEFAULT.value_counts().plot(kind='pie',autopct='%1.0f%%').legend(bbox_to_anchor=(1.2, 0.6))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Default by Credit Program"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"cust_pd.TRANSACTION_CATEGORY.value_counts().plot(kind='pie',autopct='%1.0f%%').legend(bbox_to_anchor=(1.2, 0.7))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The three credit programs with the most merchants are Electronics (28%), New Car (23.4%), and Furniture (18.1%)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"pixiedust": {
"displayParams": {
"aggregation": "COUNT",
"chartsize": "100",
"handlerId": "pieChart",
"keyFields": "IS_DEFAULT",
"rowCount": "1000"
}
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"defaults_by_credit_program = cust_pd.groupby(['TRANSACTION_CATEGORY','IS_DEFAULT']).size()\n",
"percentages = defaults_by_credit_program.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))\n",
"percentages.unstack().plot(kind='bar',stacked=True,color=['blue','red'],grid=False).legend(bbox_to_anchor=(1.2, 0.5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The three credit programs with the highest default rates are Education (44%), New Car (38%), and Retraining (35.1%)."
]
},
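{
"cell_type": "markdown",
"metadata": {},
"source": [
"The default rates quoted above come from dividing each (category, label) count by the total of its category, so that the stacked bars of every category add up to 100%. A dependency-free sketch of that arithmetic follows; `percent_within_group` is a hypothetical helper and the counts are made up for illustration, not taken from the dataset.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"def percent_within_group(counts):\n",
"    # counts maps (group, label) -> number of rows, like groupby(...).size().\n",
"    totals = defaultdict(int)\n",
"    for (group, _label), n in counts.items():\n",
"        totals[group] += n\n",
"    # Divide each count by its group total so every group sums to 100.\n",
"    return {key: 100.0 * n / totals[key[0]] for key, n in counts.items()}\n",
"\n",
"# Made-up counts, for illustration only.\n",
"example = {('A', 'No'): 3, ('A', 'Yes'): 1, ('B', 'No'): 1, ('B', 'Yes'): 1}\n",
"shares = percent_within_group(example)\n",
"```\n",
"\n",
"Normalising within each category rather than over the whole table is what makes the default rates of differently sized credit programs comparable."
]
},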
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Default by IS_STATE_BORDER\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"cust_pd.IS_STATE_BORDER.value_counts().plot(kind='pie',autopct='%1.0f%%').legend(bbox_to_anchor=(1.2, 0.5))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"defaults_by_xborder = cust_pd.groupby(['IS_STATE_BORDER','IS_DEFAULT']).size()\n",
"percentages = defaults_by_xborder.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))\n",
"percentages.unstack().plot(kind='bar',stacked=True, color=['blue','red'], grid=False).legend(bbox_to_anchor=(1.2, 0.5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HAS_CO_APPLICANT vs. IS_DEFAULT"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"defaults_by_co_applicant = cust_pd.groupby(['HAS_CO_APPLICANT','IS_DEFAULT']).size()\n",
"percentages = defaults_by_co_applicant.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))\n",
"percentages.unstack().plot(kind='bar',stacked=True, color=['blue','red'], grid=False).legend(bbox_to_anchor=(1.2, 0.5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This stacked bar chart compares the default rate of customers with and without a co-applicant."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CREDIT_HISTORY vs. IS_DEFAULT"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"cust_pd.CREDIT_HISTORY.value_counts().plot(kind='bar', title='CREDIT_HISTORY')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"defaults_by_history = cust_pd.groupby(['CREDIT_HISTORY','IS_DEFAULT']).size()\n",
"percentages = defaults_by_history.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))\n",
"percentages.unstack().plot(kind='bar',stacked=True,color=['blue','red'],grid=False).legend(bbox_to_anchor=(1.2, 0.5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Preparation"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>EMI_TENURE</th><th>CREDIT_HISTORY</th><th>TRANSACTION_CATEGORY</th><th>TRANSACTION_AMOUNT</th><th>ACCOUNT_TYPE</th><th>ACCOUNT_AGE</th><th>STATE</th><th>IS_URBAN</th><th>IS_STATE_BORDER</th><th>HAS_CO_APPLICANT</th><th>HAS_GUARANTOR</th><th>OWN_REAL_ESTATE</th><th>OTHER_INSTALMENT_PLAN</th><th>OWN_RESIDENCE</th><th>NUMBER_CREDITS</th><th>RFM_SCORE</th><th>OWN_CAR</th><th>SHIP_INTERNATIONAL</th><th>IS_DEFAULT</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>77</td><td>EXISTING CREDITS PAID BACK</td><td>EDUCATION</td><td>27630</td><td>UNKNOWN/NONE</td><td>above 7 YRS</td><td>CT</td><td>NO</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>YES</td><td>NO</td><td>0</td><td>4</td><td>NO</td><td>NO</td><td>No</td></tr>\n",
"    <tr><th>1</th><td>119</td><td>EXISTING CREDITS PAID BACK</td><td>ELECTRONICS</td><td>31314</td><td>above 1000 K USD</td><td>4 to 7 YRS</td><td>CT</td><td>YES</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>NO</td><td>YES</td><td>0</td><td>3</td><td>YES</td><td>YES</td><td>No</td></tr>\n",
"    <tr><th>2</th><td>84</td><td>EXISTING CREDITS PAID BACK</td><td>FURNITURE</td><td>27630</td><td>above 1000 K USD</td><td>4 to 7 YRS</td><td>PA</td><td>NO</td><td>NO</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>YES</td><td>0</td><td>3</td><td>YES</td><td>YES</td><td>No</td></tr>\n",
"    <tr><th>3</th><td>119</td><td>DELAY IN PAST</td><td>FURNITURE</td><td>33156</td><td>above 1000 K USD</td><td>up to 1 YR</td><td>PA</td><td>YES</td><td>NO</td><td>YES</td><td>NO</td><td>NO</td><td>NO</td><td>YES</td><td>0</td><td>3</td><td>NO</td><td>NO</td><td>Yes</td></tr>\n",
"    <tr><th>4</th><td>105</td><td>DELAY IN PAST</td><td>FURNITURE</td><td>23946</td><td>above 1000 K USD</td><td>up to 1 YR</td><td>CT</td><td>NO</td><td>YES</td><td>YES</td><td>YES</td><td>YES</td><td>YES</td><td>NO</td><td>0</td><td>3</td><td>YES</td><td>YES</td><td>No</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" EMI_TENURE CREDIT_HISTORY TRANSACTION_CATEGORY \\\n",
"0 77 EXISTING CREDITS PAID BACK EDUCATION \n",
"1 119 EXISTING CREDITS PAID BACK ELECTRONICS \n",
"2 84 EXISTING CREDITS PAID BACK FURNITURE \n",
"3 119 DELAY IN PAST FURNITURE \n",
"4 105 DELAY IN PAST FURNITURE \n",
"\n",
" TRANSACTION_AMOUNT ACCOUNT_TYPE ACCOUNT_AGE STATE IS_URBAN \\\n",
"0 27630 UNKNOWN/NONE above 7 YRS CT NO \n",
"1 31314 above 1000 K USD 4 to 7 YRS CT YES \n",
"2 27630 above 1000 K USD 4 to 7 YRS PA NO \n",
"3 33156 above 1000 K USD up to 1 YR PA YES \n",
"4 23946 above 1000 K USD up to 1 YR CT NO \n",
"\n",
" IS_STATE_BORDER HAS_CO_APPLICANT HAS_GUARANTOR OWN_REAL_ESTATE \\\n",
"0 YES YES YES NO \n",
"1 YES YES YES NO \n",
"2 NO YES YES YES \n",
"3 NO YES NO NO \n",
"4 YES YES YES YES \n",
"\n",
" OTHER_INSTALMENT_PLAN OWN_RESIDENCE NUMBER_CREDITS RFM_SCORE OWN_CAR \\\n",
"0 YES NO 0 4 NO \n",
"1 NO YES 0 3 YES \n",
"2 NO YES 0 3 YES \n",
"3 NO YES 0 3 NO \n",
"4 YES NO 0 3 YES \n",
"\n",
" SHIP_INTERNATIONAL IS_DEFAULT \n",
"0 NO No \n",
"1 YES No \n",
"2 YES No \n",
"3 NO Yes \n",
"4 YES No "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#cust_pd = cust_pd.sort_values(['IS_DEFAULT'],ascending=[False])\n",
"#cust_pd = shuffle(cust_pd)\n",
"cust_pd = cust_pd_full\n",
"cust_pd.head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Split Dataframe into Features and Label"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cust_pd_X.shape= (10000000, 18) cust_pd_Y.shape= (10000000, 1)\n"
]
}
],
"source": [
"cust_pd_Y = cust_pd[['IS_DEFAULT']].copy()  # explicit copy: the label column is re-encoded in place later\n",
"cust_pd_X = cust_pd.drop(['IS_DEFAULT'],axis=1)\n",
"\n",
"print('cust_pd_X.shape=', cust_pd_X.shape, 'cust_pd_Y.shape=', cust_pd_Y.shape)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transform Label"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>IS_DEFAULT</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>No</td></tr>\n",
"    <tr><th>1</th><td>No</td></tr>\n",
"    <tr><th>2</th><td>No</td></tr>\n",
"    <tr><th>3</th><td>Yes</td></tr>\n",
"    <tr><th>4</th><td>No</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" IS_DEFAULT\n",
"0 No\n",
"1 No\n",
"2 No\n",
"3 Yes\n",
"4 No"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cust_pd_Y.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>IS_DEFAULT</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td></tr>\n",
"    <tr><th>1</th><td>0</td></tr>\n",
"    <tr><th>2</th><td>0</td></tr>\n",
"    <tr><th>3</th><td>1</td></tr>\n",
"    <tr><th>4</th><td>0</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" IS_DEFAULT\n",
"0 0\n",
"1 0\n",
"2 0\n",
"3 1\n",
"4 0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"le = LabelEncoder()\n",
"cust_pd_Y['IS_DEFAULT'] = le.fit_transform(cust_pd_Y['IS_DEFAULT'])\n",
"cust_pd_Y.head()"
]
},
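{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the label transform above, the sketch below runs `LabelEncoder` on a made-up list of labels (`toy_labels` is a hypothetical stand-in, not the notebook's data). The encoder stores the distinct labels in sorted order and maps each one to its index, which is why `No` becomes 0 and `Yes` becomes 1 in the output above.\n",
"\n",
"```python\n",
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"# Toy labels (hypothetical), mirroring the Yes/No values of IS_DEFAULT\n",
"toy_labels = ['No', 'No', 'Yes', 'No']\n",
"\n",
"toy_le = LabelEncoder()\n",
"toy_encoded = toy_le.fit_transform(toy_labels)\n",
"\n",
"# classes_ holds the original labels; position i is the label encoded as i\n",
"print(list(toy_le.classes_))\n",
"print(toy_encoded.tolist())\n",
"```\n",
"\n",
"`toy_le.inverse_transform(toy_encoded)` maps the integers back to the original strings, which can be handy when reporting predictions."
]
},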
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transform Features"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"features df shape = (10000000, 18)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" EMI_TENURE | \n",
" CREDIT_HISTORY | \n",
" TRANSACTION_CATEGORY | \n",
" TRANSACTION_AMOUNT | \n",
" ACCOUNT_TYPE | \n",
" ACCOUNT_AGE | \n",
" STATE | \n",
" IS_URBAN | \n",
" IS_STATE_BORDER | \n",
" HAS_CO_APPLICANT | \n",
" HAS_GUARANTOR | \n",
" OWN_REAL_ESTATE | \n",
" OTHER_INSTALMENT_PLAN | \n",
" OWN_RESIDENCE | \n",
" NUMBER_CREDITS | \n",
" RFM_SCORE | \n",
" OWN_CAR | \n",
" SHIP_INTERNATIONAL | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 77 | \n",
" EXISTING CREDITS PAID BACK | \n",
" EDUCATION | \n",
" 27630 | \n",
" UNKNOWN/NONE | \n",
" above 7 YRS | \n",
" CT | \n",
" NO | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" NO | \n",
" YES | \n",
" NO | \n",
" 0 | \n",
" 4 | \n",
" NO | \n",
" NO | \n",
"
\n",
" \n",
" 1 | \n",
" 119 | \n",
" EXISTING CREDITS PAID BACK | \n",
" ELECTRONICS | \n",
" 31314 | \n",
" above 1000 K USD | \n",
" 4 to 7 YRS | \n",
" CT | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" NO | \n",
" NO | \n",
" YES | \n",
" 0 | \n",
" 3 | \n",
" YES | \n",
" YES | \n",
"
\n",
" \n",
" 2 | \n",
" 84 | \n",
" EXISTING CREDITS PAID BACK | \n",
" FURNITURE | \n",
" 27630 | \n",
" above 1000 K USD | \n",
" 4 to 7 YRS | \n",
" PA | \n",
" NO | \n",
" NO | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" NO | \n",
" YES | \n",
" 0 | \n",
" 3 | \n",
" YES | \n",
" YES | \n",
"
\n",
" \n",
" 3 | \n",
" 119 | \n",
" DELAY IN PAST | \n",
" FURNITURE | \n",
" 33156 | \n",
" above 1000 K USD | \n",
" up to 1 YR | \n",
" PA | \n",
" YES | \n",
" NO | \n",
" YES | \n",
" NO | \n",
" NO | \n",
" NO | \n",
" YES | \n",
" 0 | \n",
" 3 | \n",
" NO | \n",
" NO | \n",
"
\n",
" \n",
" 4 | \n",
" 105 | \n",
" DELAY IN PAST | \n",
" FURNITURE | \n",
" 23946 | \n",
" above 1000 K USD | \n",
" up to 1 YR | \n",
" CT | \n",
" NO | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" YES | \n",
" NO | \n",
" 0 | \n",
" 3 | \n",
" YES | \n",
" YES | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" EMI_TENURE CREDIT_HISTORY TRANSACTION_CATEGORY \\\n",
"0 77 EXISTING CREDITS PAID BACK EDUCATION \n",
"1 119 EXISTING CREDITS PAID BACK ELECTRONICS \n",
"2 84 EXISTING CREDITS PAID BACK FURNITURE \n",
"3 119 DELAY IN PAST FURNITURE \n",
"4 105 DELAY IN PAST FURNITURE \n",
"\n",
" TRANSACTION_AMOUNT ACCOUNT_TYPE ACCOUNT_AGE STATE IS_URBAN \\\n",
"0 27630 UNKNOWN/NONE above 7 YRS CT NO \n",
"1 31314 above 1000 K USD 4 to 7 YRS CT YES \n",
"2 27630 above 1000 K USD 4 to 7 YRS PA NO \n",
"3 33156 above 1000 K USD up to 1 YR PA YES \n",
"4 23946 above 1000 K USD up to 1 YR CT NO \n",
"\n",
" IS_STATE_BORDER HAS_CO_APPLICANT HAS_GUARANTOR OWN_REAL_ESTATE \\\n",
"0 YES YES YES NO \n",
"1 YES YES YES NO \n",
"2 NO YES YES YES \n",
"3 NO YES NO NO \n",
"4 YES YES YES YES \n",
"\n",
" OTHER_INSTALMENT_PLAN OWN_RESIDENCE NUMBER_CREDITS RFM_SCORE OWN_CAR \\\n",
"0 YES NO 0 4 NO \n",
"1 NO YES 0 3 YES \n",
"2 NO YES 0 3 YES \n",
"3 NO YES 0 3 NO \n",
"4 YES NO 0 3 YES \n",
"\n",
" SHIP_INTERNATIONAL \n",
"0 NO \n",
"1 YES \n",
"2 YES \n",
"3 NO \n",
"4 YES "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('features df shape = ', cust_pd_X.shape)\n",
"cust_pd_X.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### One hot encoding for categorical Columns"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" EMI_TENURE | \n",
" TRANSACTION_AMOUNT | \n",
" NUMBER_CREDITS | \n",
" CREDIT_HISTORY_ALL CREDITS PAID BACK | \n",
" CREDIT_HISTORY_CRITICAL ACCOUNT | \n",
" CREDIT_HISTORY_DELAY IN PAST | \n",
" CREDIT_HISTORY_EXISTING CREDITS PAID BACK | \n",
" CREDIT_HISTORY_NONE TAKEN | \n",
" TRANSACTION_CATEGORY_EDUCATION | \n",
" TRANSACTION_CATEGORY_ELECTRONICS | \n",
" TRANSACTION_CATEGORY_FURNITURE | \n",
" TRANSACTION_CATEGORY_NEW CAR | \n",
" TRANSACTION_CATEGORY_OTHER | \n",
" TRANSACTION_CATEGORY_RETRAINING | \n",
" TRANSACTION_CATEGORY_USED CAR | \n",
" ACCOUNT_TYPE_100 to 500 K USD | \n",
" ACCOUNT_TYPE_500 to 1000 K USD | \n",
" ACCOUNT_TYPE_UNKNOWN/NONE | \n",
" ACCOUNT_TYPE_above 1000 K USD | \n",
" ACCOUNT_TYPE_up to 100 K USD | \n",
" ACCOUNT_AGE_1 to 4 YRS | \n",
" ACCOUNT_AGE_4 to 7 YRS | \n",
" ACCOUNT_AGE_TBD | \n",
" ACCOUNT_AGE_above 7 YRS | \n",
" ACCOUNT_AGE_up to 1 YR | \n",
" STATE_CT | \n",
" STATE_NJ | \n",
" STATE_NY | \n",
" STATE_PA | \n",
" IS_URBAN_NO | \n",
" IS_URBAN_YES | \n",
" IS_STATE_BORDER_NO | \n",
" IS_STATE_BORDER_YES | \n",
" HAS_CO_APPLICANT_NO | \n",
" HAS_CO_APPLICANT_YES | \n",
" HAS_GUARANTOR_NO | \n",
" HAS_GUARANTOR_YES | \n",
" OWN_REAL_ESTATE_NO | \n",
" OWN_REAL_ESTATE_YES | \n",
" OTHER_INSTALMENT_PLAN_NO | \n",
" OTHER_INSTALMENT_PLAN_YES | \n",
" OWN_RESIDENCE_NO | \n",
" OWN_RESIDENCE_YES | \n",
" RFM_SCORE_1 | \n",
" RFM_SCORE_2 | \n",
" RFM_SCORE_3 | \n",
" RFM_SCORE_4 | \n",
" OWN_CAR_NO | \n",
" OWN_CAR_YES | \n",
" SHIP_INTERNATIONAL_NO | \n",
" SHIP_INTERNATIONAL_YES | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 77 | \n",
" 27630 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 119 | \n",
" 31314 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 84 | \n",
" 27630 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 119 | \n",
" 33156 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" 105 | \n",
" 23946 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" EMI_TENURE TRANSACTION_AMOUNT NUMBER_CREDITS \\\n",
"0 77 27630 0 \n",
"1 119 31314 0 \n",
"2 84 27630 0 \n",
"3 119 33156 0 \n",
"4 105 23946 0 \n",
"\n",
" CREDIT_HISTORY_ALL CREDITS PAID BACK CREDIT_HISTORY_CRITICAL ACCOUNT \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" CREDIT_HISTORY_DELAY IN PAST CREDIT_HISTORY_EXISTING CREDITS PAID BACK \\\n",
"0 0 1 \n",
"1 0 1 \n",
"2 0 1 \n",
"3 1 0 \n",
"4 1 0 \n",
"\n",
" CREDIT_HISTORY_NONE TAKEN TRANSACTION_CATEGORY_EDUCATION \\\n",
"0 0 1 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" TRANSACTION_CATEGORY_ELECTRONICS TRANSACTION_CATEGORY_FURNITURE \\\n",
"0 0 0 \n",
"1 1 0 \n",
"2 0 1 \n",
"3 0 1 \n",
"4 0 1 \n",
"\n",
" TRANSACTION_CATEGORY_NEW CAR TRANSACTION_CATEGORY_OTHER \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" TRANSACTION_CATEGORY_RETRAINING TRANSACTION_CATEGORY_USED CAR \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" ACCOUNT_TYPE_100 to 500 K USD ACCOUNT_TYPE_500 to 1000 K USD \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" ACCOUNT_TYPE_UNKNOWN/NONE ACCOUNT_TYPE_above 1000 K USD \\\n",
"0 1 0 \n",
"1 0 1 \n",
"2 0 1 \n",
"3 0 1 \n",
"4 0 1 \n",
"\n",
" ACCOUNT_TYPE_up to 100 K USD ACCOUNT_AGE_1 to 4 YRS \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" ACCOUNT_AGE_4 to 7 YRS ACCOUNT_AGE_TBD ACCOUNT_AGE_above 7 YRS \\\n",
"0 0 0 1 \n",
"1 1 0 0 \n",
"2 1 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"\n",
" ACCOUNT_AGE_up to 1 YR STATE_CT STATE_NJ STATE_NY STATE_PA \\\n",
"0 0 1 0 0 0 \n",
"1 0 1 0 0 0 \n",
"2 0 0 0 0 1 \n",
"3 1 0 0 0 1 \n",
"4 1 1 0 0 0 \n",
"\n",
" IS_URBAN_NO IS_URBAN_YES IS_STATE_BORDER_NO IS_STATE_BORDER_YES \\\n",
"0 1 0 0 1 \n",
"1 0 1 0 1 \n",
"2 1 0 1 0 \n",
"3 0 1 1 0 \n",
"4 1 0 0 1 \n",
"\n",
" HAS_CO_APPLICANT_NO HAS_CO_APPLICANT_YES HAS_GUARANTOR_NO \\\n",
"0 0 1 0 \n",
"1 0 1 0 \n",
"2 0 1 0 \n",
"3 0 1 1 \n",
"4 0 1 0 \n",
"\n",
" HAS_GUARANTOR_YES OWN_REAL_ESTATE_NO OWN_REAL_ESTATE_YES \\\n",
"0 1 1 0 \n",
"1 1 1 0 \n",
"2 1 0 1 \n",
"3 0 1 0 \n",
"4 1 0 1 \n",
"\n",
" OTHER_INSTALMENT_PLAN_NO OTHER_INSTALMENT_PLAN_YES OWN_RESIDENCE_NO \\\n",
"0 0 1 1 \n",
"1 1 0 0 \n",
"2 1 0 0 \n",
"3 1 0 0 \n",
"4 0 1 1 \n",
"\n",
" OWN_RESIDENCE_YES RFM_SCORE_1 RFM_SCORE_2 RFM_SCORE_3 RFM_SCORE_4 \\\n",
"0 0 0 0 0 1 \n",
"1 1 0 0 1 0 \n",
"2 1 0 0 1 0 \n",
"3 1 0 0 1 0 \n",
"4 0 0 0 1 0 \n",
"\n",
" OWN_CAR_NO OWN_CAR_YES SHIP_INTERNATIONAL_NO SHIP_INTERNATIONAL_YES \n",
"0 1 0 1 0 \n",
"1 0 1 0 1 \n",
"2 0 1 0 1 \n",
"3 1 0 1 0 \n",
"4 0 1 0 1 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"categoricalColumns = ['CREDIT_HISTORY', 'TRANSACTION_CATEGORY', 'ACCOUNT_TYPE', 'ACCOUNT_AGE',\n",
" 'STATE', 'IS_URBAN', 'IS_STATE_BORDER', 'HAS_CO_APPLICANT', 'HAS_GUARANTOR',\n",
" 'OWN_REAL_ESTATE', 'OTHER_INSTALMENT_PLAN',\n",
" 'OWN_RESIDENCE', 'RFM_SCORE', 'OWN_CAR', 'SHIP_INTERNATIONAL']\n",
"cust_pd_X = pd.get_dummies(cust_pd_X, columns=categoricalColumns)\n",
"\n",
"cust_pd_X.head()"
]
},
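{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of `pd.get_dummies` in the cell above can be seen on a tiny made-up frame (the `toy_df` values below are illustrative assumptions, not rows from the dataset). Each listed categorical column is replaced by one indicator column per distinct value, named `<column>_<value>`, while the other columns pass through unchanged. The `dtype=int` argument is added in this sketch only so that the indicators print as 0/1 regardless of pandas version; the notebook's own call does not pass it.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy frame (hypothetical): one numeric and one categorical column\n",
"toy_df = pd.DataFrame({'EMI_TENURE': [77, 119, 84],\n",
"                       'STATE': ['CT', 'CT', 'PA']})\n",
"\n",
"# Replace STATE with one 0/1 indicator column per distinct value\n",
"toy_encoded_df = pd.get_dummies(toy_df, columns=['STATE'], dtype=int)\n",
"\n",
"print(list(toy_encoded_df.columns))\n",
"print(toy_encoded_df)\n",
"```\n",
"\n",
"Applied to the 15 categorical columns listed above, the same expansion turns the 18 original feature columns into the 51 columns used for training."
]
},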
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normalize Features"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" EMI_TENURE | \n",
" TRANSACTION_AMOUNT | \n",
" NUMBER_CREDITS | \n",
" CREDIT_HISTORY_ALL CREDITS PAID BACK | \n",
" CREDIT_HISTORY_CRITICAL ACCOUNT | \n",
" CREDIT_HISTORY_DELAY IN PAST | \n",
" CREDIT_HISTORY_EXISTING CREDITS PAID BACK | \n",
" CREDIT_HISTORY_NONE TAKEN | \n",
" TRANSACTION_CATEGORY_EDUCATION | \n",
" TRANSACTION_CATEGORY_ELECTRONICS | \n",
" TRANSACTION_CATEGORY_FURNITURE | \n",
" TRANSACTION_CATEGORY_NEW CAR | \n",
" TRANSACTION_CATEGORY_OTHER | \n",
" TRANSACTION_CATEGORY_RETRAINING | \n",
" TRANSACTION_CATEGORY_USED CAR | \n",
" ACCOUNT_TYPE_100 to 500 K USD | \n",
" ACCOUNT_TYPE_500 to 1000 K USD | \n",
" ACCOUNT_TYPE_UNKNOWN/NONE | \n",
" ACCOUNT_TYPE_above 1000 K USD | \n",
" ACCOUNT_TYPE_up to 100 K USD | \n",
" ACCOUNT_AGE_1 to 4 YRS | \n",
" ACCOUNT_AGE_4 to 7 YRS | \n",
" ACCOUNT_AGE_TBD | \n",
" ACCOUNT_AGE_above 7 YRS | \n",
" ACCOUNT_AGE_up to 1 YR | \n",
" STATE_CT | \n",
" STATE_NJ | \n",
" STATE_NY | \n",
" STATE_PA | \n",
" IS_URBAN_NO | \n",
" IS_URBAN_YES | \n",
" IS_STATE_BORDER_NO | \n",
" IS_STATE_BORDER_YES | \n",
" HAS_CO_APPLICANT_NO | \n",
" HAS_CO_APPLICANT_YES | \n",
" HAS_GUARANTOR_NO | \n",
" HAS_GUARANTOR_YES | \n",
" OWN_REAL_ESTATE_NO | \n",
" OWN_REAL_ESTATE_YES | \n",
" OTHER_INSTALMENT_PLAN_NO | \n",
" OTHER_INSTALMENT_PLAN_YES | \n",
" OWN_RESIDENCE_NO | \n",
" OWN_RESIDENCE_YES | \n",
" RFM_SCORE_1 | \n",
" RFM_SCORE_2 | \n",
" RFM_SCORE_3 | \n",
" RFM_SCORE_4 | \n",
" OWN_CAR_NO | \n",
" OWN_CAR_YES | \n",
" SHIP_INTERNATIONAL_NO | \n",
" SHIP_INTERNATIONAL_YES | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.027542 | \n",
" 0.033533 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.0 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.0 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
" 0.062595 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1 | \n",
" 0.041751 | \n",
" 0.037277 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.061398 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.0 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
" 0.000000 | \n",
" 0.061398 | \n",
"
\n",
" \n",
" 2 | \n",
" 0.029971 | \n",
" 0.033449 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.062439 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.062439 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
" 0.000000 | \n",
" 0.062439 | \n",
"
\n",
" \n",
" 3 | \n",
" 0.041659 | \n",
" 0.039384 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.061264 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
" 0.061264 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 4 | \n",
" 0.037350 | \n",
" 0.028902 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.062250 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.0 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
" 0.000000 | \n",
" 0.062250 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" EMI_TENURE TRANSACTION_AMOUNT NUMBER_CREDITS \\\n",
"0 0.027542 0.033533 0.0 \n",
"1 0.041751 0.037277 0.0 \n",
"2 0.029971 0.033449 0.0 \n",
"3 0.041659 0.039384 0.0 \n",
"4 0.037350 0.028902 0.0 \n",
"\n",
" CREDIT_HISTORY_ALL CREDITS PAID BACK CREDIT_HISTORY_CRITICAL ACCOUNT \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"\n",
" CREDIT_HISTORY_DELAY IN PAST CREDIT_HISTORY_EXISTING CREDITS PAID BACK \\\n",
"0 0.000000 0.062595 \n",
"1 0.000000 0.061398 \n",
"2 0.000000 0.062439 \n",
"3 0.061264 0.000000 \n",
"4 0.062250 0.000000 \n",
"\n",
" CREDIT_HISTORY_NONE TAKEN TRANSACTION_CATEGORY_EDUCATION \\\n",
"0 0.0 0.062595 \n",
"1 0.0 0.000000 \n",
"2 0.0 0.000000 \n",
"3 0.0 0.000000 \n",
"4 0.0 0.000000 \n",
"\n",
" TRANSACTION_CATEGORY_ELECTRONICS TRANSACTION_CATEGORY_FURNITURE \\\n",
"0 0.000000 0.000000 \n",
"1 0.061398 0.000000 \n",
"2 0.000000 0.062439 \n",
"3 0.000000 0.061264 \n",
"4 0.000000 0.062250 \n",
"\n",
" TRANSACTION_CATEGORY_NEW CAR TRANSACTION_CATEGORY_OTHER \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"\n",
" TRANSACTION_CATEGORY_RETRAINING TRANSACTION_CATEGORY_USED CAR \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"\n",
" ACCOUNT_TYPE_100 to 500 K USD ACCOUNT_TYPE_500 to 1000 K USD \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"\n",
" ACCOUNT_TYPE_UNKNOWN/NONE ACCOUNT_TYPE_above 1000 K USD \\\n",
"0 0.062595 0.000000 \n",
"1 0.000000 0.061398 \n",
"2 0.000000 0.062439 \n",
"3 0.000000 0.061264 \n",
"4 0.000000 0.062250 \n",
"\n",
" ACCOUNT_TYPE_up to 100 K USD ACCOUNT_AGE_1 to 4 YRS \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"\n",
" ACCOUNT_AGE_4 to 7 YRS ACCOUNT_AGE_TBD ACCOUNT_AGE_above 7 YRS \\\n",
"0 0.000000 0.0 0.062595 \n",
"1 0.061398 0.0 0.000000 \n",
"2 0.062439 0.0 0.000000 \n",
"3 0.000000 0.0 0.000000 \n",
"4 0.000000 0.0 0.000000 \n",
"\n",
" ACCOUNT_AGE_up to 1 YR STATE_CT STATE_NJ STATE_NY STATE_PA \\\n",
"0 0.000000 0.062595 0.0 0.0 0.000000 \n",
"1 0.000000 0.061398 0.0 0.0 0.000000 \n",
"2 0.000000 0.000000 0.0 0.0 0.062439 \n",
"3 0.061264 0.000000 0.0 0.0 0.061264 \n",
"4 0.062250 0.062250 0.0 0.0 0.000000 \n",
"\n",
" IS_URBAN_NO IS_URBAN_YES IS_STATE_BORDER_NO IS_STATE_BORDER_YES \\\n",
"0 0.062595 0.000000 0.000000 0.062595 \n",
"1 0.000000 0.061398 0.000000 0.061398 \n",
"2 0.062439 0.000000 0.062439 0.000000 \n",
"3 0.000000 0.061264 0.061264 0.000000 \n",
"4 0.062250 0.000000 0.000000 0.062250 \n",
"\n",
" HAS_CO_APPLICANT_NO HAS_CO_APPLICANT_YES HAS_GUARANTOR_NO \\\n",
"0 0.0 0.062595 0.000000 \n",
"1 0.0 0.061398 0.000000 \n",
"2 0.0 0.062439 0.000000 \n",
"3 0.0 0.061264 0.061264 \n",
"4 0.0 0.062250 0.000000 \n",
"\n",
" HAS_GUARANTOR_YES OWN_REAL_ESTATE_NO OWN_REAL_ESTATE_YES \\\n",
"0 0.062595 0.062595 0.000000 \n",
"1 0.061398 0.061398 0.000000 \n",
"2 0.062439 0.000000 0.062439 \n",
"3 0.000000 0.061264 0.000000 \n",
"4 0.062250 0.000000 0.062250 \n",
"\n",
" OTHER_INSTALMENT_PLAN_NO OTHER_INSTALMENT_PLAN_YES OWN_RESIDENCE_NO \\\n",
"0 0.000000 0.062595 0.062595 \n",
"1 0.061398 0.000000 0.000000 \n",
"2 0.062439 0.000000 0.000000 \n",
"3 0.061264 0.000000 0.000000 \n",
"4 0.000000 0.062250 0.062250 \n",
"\n",
" OWN_RESIDENCE_YES RFM_SCORE_1 RFM_SCORE_2 RFM_SCORE_3 RFM_SCORE_4 \\\n",
"0 0.000000 0.0 0.0 0.000000 0.062595 \n",
"1 0.061398 0.0 0.0 0.061398 0.000000 \n",
"2 0.062439 0.0 0.0 0.062439 0.000000 \n",
"3 0.061264 0.0 0.0 0.061264 0.000000 \n",
"4 0.000000 0.0 0.0 0.062250 0.000000 \n",
"\n",
" OWN_CAR_NO OWN_CAR_YES SHIP_INTERNATIONAL_NO SHIP_INTERNATIONAL_YES \n",
"0 0.062595 0.000000 0.062595 0.000000 \n",
"1 0.000000 0.061398 0.000000 0.061398 \n",
"2 0.000000 0.062439 0.000000 0.062439 \n",
"3 0.061264 0.000000 0.061264 0.000000 \n",
"4 0.000000 0.062250 0.000000 0.062250 "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"min_max_scaler = MinMaxScaler()\n",
"features = min_max_scaler.fit_transform(cust_pd_X)\n",
"features = normalize(features, axis=1, norm='l1')\n",
"\n",
"cust_pd_X = pd.DataFrame(features,columns=cust_pd_X.columns)\n",
"cust_pd_X.head()"
]
},
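{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two-step normalization above concrete, here is a minimal sketch on a made-up 3x3 array (`toy_X` and the other `toy_` names are illustrative assumptions, not part of the dataset). `MinMaxScaler` rescales each column to the range [0, 1], and `normalize(..., axis=1, norm='l1')` then divides each row by the sum of its absolute values, so every row sums to 1. This is why a one-hot entry in the output above shows up as roughly 0.06 rather than 1.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.preprocessing import MinMaxScaler, normalize\n",
"\n",
"# Toy features (hypothetical): one numeric column and two 0/1 indicator columns\n",
"toy_X = np.array([[10.0, 0.0, 1.0],\n",
"                  [20.0, 1.0, 0.0],\n",
"                  [30.0, 1.0, 1.0]])\n",
"\n",
"# Step 1: rescale every column to the range [0, 1]\n",
"toy_scaled = MinMaxScaler().fit_transform(toy_X)\n",
"\n",
"# Step 2: divide every row by its L1 norm so that the row sums to 1\n",
"toy_normed = normalize(toy_scaled, axis=1, norm='l1')\n",
"\n",
"print(toy_scaled)\n",
"print(toy_normed)\n",
"print(toy_normed.sum(axis=1))\n",
"```\n",
"\n",
"Note that the row-wise step makes the value of each feature depend on how many other features are active in the same row, so it is a per-sample scaling rather than a per-feature one."
]
},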
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Split Train and Test Dataset"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train.shape= (7000000, 51) Y_train.shape= (7000000, 1)\n",
"X_test.shape= (3000000, 51) Y_test.shape= (3000000, 1)\n"
]
}
],
"source": [
"label = cust_pd_Y.values\n",
"features = cust_pd_X.values\n",
"\n",
"label = np.reshape(label,(-1,1))\n",
"X_train,X_test,y_train,y_test = \\\n",
" train_test_split(features, label, test_size=0.3, random_state=42, stratify=label)\n",
"print('X_train.shape=', X_train.shape, 'Y_train.shape=', y_train.shape)\n",
"print('X_test.shape=', X_test.shape, 'Y_test.shape=', y_test.shape)\n",
"\n",
"\n"
]
},
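{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `stratify=label` argument used above asks `train_test_split` to keep the class proportions the same in both splits. The sketch below shows this on a made-up label vector with 20% positives (all `toy_` names and values are illustrative assumptions).\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Toy data (hypothetical): 100 samples, 80 negatives and 20 positives\n",
"toy_y = np.array([0] * 80 + [1] * 20)\n",
"toy_X = np.arange(100).reshape(-1, 1)\n",
"\n",
"toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(\n",
"    toy_X, toy_y, test_size=0.3, random_state=42, stratify=toy_y)\n",
"\n",
"# Fraction of positives in each split; stratification keeps them aligned\n",
"print('train positives:', toy_y_train.mean())\n",
"print('test positives: ', toy_y_test.mean())\n",
"```\n",
"\n",
"Without `stratify`, the positive fraction in each split would vary with the random seed, which matters most when the positive class is rare."
]
},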
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Snapml Training"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# Import the LogisticRegression from snap.ml\n",
"from snap_ml import LogisticRegression\n",
"snapml_lr = LogisticRegression(use_gpu=True, device_ids=[0,1], \n",
" max_iter=10, num_threads=1024)\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[snap.ml] Training time (s): 3.23\n"
]
}
],
"source": [
"# Training\n",
"t0 = time.time()\n",
"snapml_lr.fit(X_train, y_train)\n",
"print(\"[snap.ml] Training time (s): {0:.2f}\".format(time.time()-t0))"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"snap ml accuracy score = 0.9598356666666666\n"
]
}
],
"source": [
"# Evaluate accuracy on test set\n",
"snapml_prediction = snapml_lr.predict(X_test)\n",
"print('snap ml accuracy score = ', accuracy_score(y_test, snapml_prediction))\n",
"\n",
"# proba_test = snapml_lr.predict_proba(X_test)\n",
"# from sklearn.metrics import log_loss\n",
"# logloss_snap = log_loss(y_test, proba_test)\n",
"# print(\"[snap.ml] Logarithmic loss: {0:.4f}\".format(logloss_snap))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# sklearn Train"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Import\n",
"from sklearn.linear_model import LogisticRegression\n",
"sklearn_lr = LogisticRegression(verbose=1)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[LibLinear][sklearn] Training time (s): 99.79876\n"
]
}
],
"source": [
"# TRAIN\n",
"t0 = time.time()\n",
"sklearn_lr.fit(X_train, y_train)\n",
"print(\"[sklearn] Training time (s): {0:.5f}\".format(time.time()-t0))\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sklearn ml accuracy score = 0.9598296666666667\n"
]
}
],
"source": [
"# Evaluate log-loss on test set\n",
"# proba_test = sklearn_lr.predict_proba(X_test)\n",
"# from sklearn.metrics import log_loss\n",
"# logloss_sklearn = log_loss(y_test, proba_test)\n",
"# print(\"[sklearn] Logarithmic loss: {0:.4f}\".format(logloss_sklearn))\n",
"sklearn_prediction = sklearn_lr.predict(X_test)\n",
"print('sklearn ml accuracy score = ', accuracy_score(y_test, sklearn_prediction))"
]
},
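{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two training times printed above can be turned into a single speedup figure. The sketch below simply copies the values from the stored outputs of this notebook (3.23 s for Snap ML and 99.79876 s for scikit-learn); they come from one run on the original author's hardware, so the ratio should be read as indicative only and will differ on other machines.\n",
"\n",
"```python\n",
"# Training times copied from the cell outputs above (single run)\n",
"snapml_seconds = 3.23\n",
"sklearn_seconds = 99.79876\n",
"\n",
"# Ratio of scikit-learn training time to Snap ML training time\n",
"speedup = sklearn_seconds / snapml_seconds\n",
"print('Snap ML speedup over scikit-learn: {0:.1f}x'.format(speedup))\n",
"```\n",
"\n",
"Both models reach about 0.9598 test accuracy in the outputs above, so in this run the difference between them is in training time rather than in accuracy."
]
},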
{
"cell_type": "markdown",
"metadata": {},
"source": [
"© Copyright IBM Corporation 2018, 2020"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}