Analyze e-commerce transactional data to identify high-value customers, detect trends, and optimize targeted marketing through effective customer segmentation Using Power Bi dashboard and python.
Source: [Kaggle - E-commerce Dataset]
Size: (541909, 8)
Timeframe: 01/12/2010 to 09/12/2011
Key Variables:
InvoiceNo: Unique transaction ID
StockCode: Unique product identifier
Description: Product description
Quantity: Number of items purchased
InvoiceDate: Timestamp of purchase
UnitPrice: Price per unit in GBP
CustomerID: Unique customer identifier
Country: Location of the customer
1.Handled Missing Values - The percentage of missing values in the CustomerID column is 24.93%.
Since the analysis will revolve around investigating customers and clustering them into categories, the missing values in the CustomerIDs were removed.
2.Removed Duplicates - The number of duplicate rows in the dataset is 5525.
These rows were removed from the dataset.
3.Removed Cancelled Orders - There are 8872 rows for which the quantity is negative which can be either due to data-entry errors or return orders or cancelled orders.
If we look at the InvoiceNo for all these cases, they start with the letter ‘C’ which indicates they are cancelled orders. Thus these rows were removed from the dataset
4.Removing non-product Stock-Codes - There are certain StockCodes which do not belong to any products. All the rows containing such StockCodes were removed.

26% -- customers contribute to 80% of the revenue
21% -- products contribute to 80% of the revenue
What is RFM? Recency (R): Days since last purchase Frequency (F): Number of purchases Monetary (M): Total spending
Customers are segmented into five equal buckets based on Recency, Frequency, and Monetary values. Each customer is ranked for each metric, assigned a score from 1 to 5, and their scores are summed to derive an overall RFM score for analysis
customer segments based on rfm score
1- At-risk customers 2- High value customers 3- loyal customers 4- Dormant customers