Supply Chain Analytics: Construction Company Internship
A detailed breakdown of my summer experience with a multinational construction tool company. From database management and machine learning to sales analytics dashboards, here are the skills I put into practice in this large-scale consulting project.
ANALYTICS
8/9/2024 · 9 min read
My summer internship kicked off slowly. The first two weeks were dedicated to meeting people at the company, from fellow summer interns to the North America CEO, while I awaited the required permissions from I.T. Naturally, a construction company of this size, operating in over 120 countries, has numerous hurdles for even the simplest I.T. requests. My manager and I took advantage of this idle time to thoroughly discuss how my skills in business analytics aligned with the business problem. The company had been struggling to quantify the financial impact of returned items and to implement a process to mitigate losses, so I took on the challenge of finding feasible solutions to this ongoing issue.
Before diving into the data, I made sure to grasp a solid understanding of the business context and the company strategy. Any potential solutions had to align with the strategy or they would be rendered useless. I condensed the business strategy to three main points:
The business aims to continuously improve processes
The business aims to continuously improve the customer experience
The business aims to improve productivity to stay competitive
I quickly understood that capturing the financial impact of returned tools was imperative and that I would have to tie this in with the aforementioned strategy. After discussing ideas with others in the operations team, it occurred to me that I should model the total costs by incorporating transportation, warehousing, and overhead costs as well as unrealized revenue from the returned material.
First, to better understand the company's transportation costs, I requested access to Parcel BI, a reporting tool that shows relevant financial data for every parcel shipped. There I found useful metrics such as the average cost per pound for every state, the total number of shipments, and the cost per carrier. I collected data covering only the past nine months.
To model warehousing costs, I used an in-house PowerBI dashboard where I observed similar metrics. I found the average warehousing cost per line for Canada and the U.S. and assumed that this level of granularity would be sufficient for my analysis given the time constraints. Here is a potential area for improvement in my model: I could have done a more granular analysis, by state or even county. However, I only had six weeks left to put together a presentation at this point, so I decided to keep it simpler.
For administrative and overhead costs there was already a calculation in place. It accounted for the average time sales representatives spent on the phone or over email filing a return, weighted by the cost of labor. Other administrative costs were then added to arrive at the final number. I assumed this calculation was reliable even though it was computed in 2020; this is another area where I could have improved my model.
Finally, for unrealized sales revenue I pulled data directly from SAP. The company had data for every single item returned along with the returned sales value. This was fairly easy to import into PowerBI from the data lake using SQL, although importing an output of 800,000+ lines occasionally caused the application to crash.
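The SAP pull was a straightforward SQL query against the data lake. As a rough illustration, the pattern looks like this in Python, using an in-memory SQLite database and made-up table and column names as stand-ins for the actual data lake connection:

```python
import sqlite3
import pandas as pd

# Stand-in for the data lake: an in-memory SQLite DB with a hypothetical
# "returns" table (the real connection would use a proper database driver).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE returns (material TEXT, sales_value REAL, return_date TEXT)"
)
conn.executemany(
    "INSERT INTO returns VALUES (?, ?, ?)",
    [("DRILL-01", 199.0, "2024-03-02"), ("SAW-02", 349.0, "2024-04-11")],
)

# Pull only the columns and the nine-month window actually needed
query = """
    SELECT material, sales_value, return_date
    FROM returns
    WHERE return_date >= '2023-11-01'
"""
df = pd.read_sql(query, conn)
print(df.shape)  # (2, 3)
```

Restricting the query to the needed columns and date window keeps the extract small; pulling the entire 800,000+ line table at once is what strained PowerBI.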
In this image we can observe how I connected all of the data in PowerBI. The model followed a star schema, with the "main" sales revenue table at the center. I joined every other, smaller table containing the data described above to the main one using one-to-many and one-to-one relationships.
After importing the raw data, I cleaned and transformed it as required. For instance, I standardized the date format for clarity, and I made sure that financial data for both the U.S. and Canada was expressed in U.S. dollars, using the current USD/CAD exchange rate, so that both countries shared a homogeneous measure of comparison.
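The cleaning was done in PowerBI, but the same logic can be sketched in pandas. The sample rows and the USD/CAD rate below are purely illustrative:

```python
import pandas as pd

# Hypothetical sample rows; the USD/CAD rate is illustrative, not the
# rate used in the project.
USD_PER_CAD = 0.73

raw = pd.DataFrame({
    "return_date": ["03/02/2024", "04/11/2024"],
    "country": ["US", "CA"],
    "sales_value": [199.0, 100.0],
})

# Standardize the date format for clarity
raw["return_date"] = pd.to_datetime(raw["return_date"], format="%m/%d/%Y")

# Express Canadian figures in USD so both countries share one measure
raw["sales_value_usd"] = raw["sales_value"].where(
    raw["country"] == "US", raw["sales_value"] * USD_PER_CAD
)
print(raw["sales_value_usd"].tolist())  # [199.0, 73.0]
```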
Only after cleaning and transforming raw data are we able to begin creating impactful visualizations.
First things first: what am I working with?
I began with exploratory data analysis, creating visuals that would:
Reinforce current theories about what items are returned and why
Destroy current theories about what items are returned and why
Surprise me with an unforeseen pattern
We can observe general revenue, profit, and loss trends: all have increased over the past nine months. There is no significant difference in the quantity of standard versus combo tools returned, although unrealized revenue for returned standard items is slightly higher than for combo items. Lastly, some items are returned disproportionately as individual items, while others are more likely to be returned as part of a combo.
This preliminary analysis only led me to many more questions: What is the proportion of standard versus combo items returned? Which tool categories are more likely to be returned? And most importantly, what are the leading factors causing tools to be returned?
As a continuation of exploratory data analysis, I mapped the returned material quantities by state and included buttons to toggle between Canada and the U.S. Again, this only helps us understand general patterns so we can form hypotheses to test later. We can observe the states with the most items returned and a list of returned materials sorted by total loss, which is simply the sum of all the costs. This way we can also analyze which cost categories weigh more heavily for the top returned materials.
After rounding up several interesting hypotheses and suspicions in the exploratory data analysis phase, we must move to descriptive analytics. Here I began creating meaningful measures such as returned percentages in terms of quantities and in terms of sales revenue returned.
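The actual measures were written in PowerBI; a pandas sketch with hypothetical order lines shows the underlying logic of the two returned-percentage measures:

```python
import pandas as pd

# Hypothetical order lines: each row is a sale, flagged if it was returned
orders = pd.DataFrame({
    "business_unit": ["Power Tools", "Power Tools", "Hand Tools", "Hand Tools"],
    "qty": [10, 5, 8, 2],
    "revenue": [1000.0, 500.0, 300.0, 100.0],
    "returned": [True, False, False, True],
})

totals = orders.groupby("business_unit")[["qty", "revenue"]].sum()
returned = orders[orders["returned"]].groupby("business_unit")[["qty", "revenue"]].sum()

# Returned % by quantity and by sales revenue, per business unit
pct = (returned / totals).rename(
    columns={"qty": "returned_pct_qty", "revenue": "returned_pct_rev"}
)
print(pct)
```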
The benefit of an interactive dashboard is that we can zoom into areas of interest. By just clicking a point in a graph, a filter will be applied on the entire dashboard. For instance, in the top left graph we have a line for total loss and another line for returned percentage by business unit. The power tool category was immediately alarming because of a disproportionately high loss even though the returned percentage is quite average. After selecting this business unit, we can zoom in to see what sales channels and item groups account for most of the losses. Field sales are responsible for a staggering 77% of returns, while standard items account for 55% of them. We now have a strong reason to suspect that field sales are a particularly risky sales channel, although we currently do not understand why. Finally, we can see on the right side that 2 items have disproportionately high return ratios, leading us to investigate further.
Before heading into root causes for returns, I was curious which returned tools yield the highest operational expenses. Operational expense, or OPEX, in this context is the sum of administrative, transportation, and warehousing costs, which I believe the company has more control over. Additionally, I wanted to know what proportion of the top 15 materials (by total loss) was sold in a combo, and whether there is any discrepancy between sales revenue returned and quantity returned when I select an item. Lastly, I wanted to visualize the number of customers who purchased a specific material over the last nine months and in which states it was sold.
Clearly, the objective of this dashboard is informative. Now we can move on to diagnostic analytics.
Drilling down on the Power Tools business unit, we can continue our investigation. This dashboard shows us new metrics, including the total loss, operational expense (OPEX), and customer trends. I want to clarify that the total loss is the sum of operational expenditure (OPEX) and unrealized sales revenue. Furthermore, we have a card on the very bottom showing the percentage of unrealized sales revenue coming from the selected business unit. Power Tools make up just over 50% of the returned revenue!
However, the most interesting finding is in the top-right donut chart. We finally get to see what ratio of materials gets returned by our customers. Surprisingly, the customer service channel (orders placed by phone) has the highest returns per item sold. In-store purchases are a distant second, and field return ratios are below average! What an interesting finding. Here we have started using diagnostic analytics in an attempt to identify root causes.
Here is another dashboard where we have two main insights. The first insight is the Pareto distribution. It shows us that the top 15 states with highest losses account for almost 70% of the entire losses! The actionable insight? Focus on reducing returns in these states for the highest impact.
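The Pareto view boils down to a cumulative-share computation. The state figures below are invented for illustration:

```python
import pandas as pd

# Hypothetical state-level losses; the real figures come from the dashboard
losses = pd.Series({
    "TX": 500.0, "CA": 400.0, "FL": 250.0, "NY": 150.0,
    "OH": 100.0, "WA": 50.0, "CO": 30.0, "AZ": 20.0,
})

# Sort descending, then compute each state's cumulative share of total losses
cum_share = losses.sort_values(ascending=False).cumsum() / losses.sum()
top4_share = cum_share.iloc[3]
print(f"Top 4 states account for {top4_share:.0%} of losses")  # 87%
```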
Secondly, we created a couple of measures that capture the average quantity of items per return and the average cost per return. How is this helpful? We need to capture the financial impact to charge customers when they request a return!
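These two per-return measures are simple aggregations. A pandas sketch with hypothetical return records:

```python
import pandas as pd

# Hypothetical return records: one row per returned line item
returns = pd.DataFrame({
    "return_id": ["R1", "R1", "R2", "R3"],
    "qty": [2, 1, 4, 3],
    "total_cost": [40.0, 15.0, 80.0, 45.0],
})

# Roll line items up to one row per return, then average across returns
per_return = returns.groupby("return_id").agg(
    qty=("qty", "sum"), cost=("total_cost", "sum")
)
avg_qty_per_return = per_return["qty"].mean()    # (3 + 4 + 3) / 3
avg_cost_per_return = per_return["cost"].mean()  # (55 + 80 + 45) / 3 = 60.0
print(avg_qty_per_return, avg_cost_per_return)
```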
Some dashboards, like this one, are more useful in the field. Account Managers can quickly pull it up to determine whether a material they are selling has a high probability of being returned. This aids decision-making where it matters most.
Finally, we have a time series displaying how the average cost per return has changed over this nine-month period. The insight is crystal clear: if we do not take critical measures, including implementing a charging process and reducing oversold material, the company will incur higher losses over time!
Note: CCN is just another term for a returned order.
METHOD 1: CLUSTERING
To identify how to reduce oversold material, we move on to predictive analytics. I started with an unsupervised machine learning technique, k-means clustering, to better understand the shared characteristics of low- and high-return customer groups. Using the elbow method to find the optimal number of clusters, we concluded that there are three clearly distinct groups in our data. Based on the average total losses in each cluster, I labeled these three groups: Low-losses, Mid-losses, and High-losses.
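A minimal sketch of this workflow in scikit-learn, using synthetic two-feature customer data in place of the real dataset (the feature names and cluster separations are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic customer features [total_loss, items_returned] built to mimic
# three groups; the real features and separations are not public.
X = np.vstack([
    rng.normal([100, 5], [20, 1], size=(50, 2)),     # low losses
    rng.normal([500, 20], [50, 3], size=(50, 2)),    # mid losses
    rng.normal([2000, 60], [200, 5], size=(50, 2)),  # high losses
])
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: inertia falls steeply up to k = 3, then flattens
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_
    for k in range(1, 7)
}

# Fit the chosen k = 3 model and label each customer with a cluster
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(np.bincount(labels))  # three roughly equal clusters
```

Plotting `inertias` against k and looking for the "elbow" where the curve flattens is what the elbow method refers to; the cluster labels can then be profiled by averaging customer attributes within each group, as described above.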
The main findings from this method are the following:
High-loss customers have a higher chance of being Large size
High-loss customers have a higher chance of being Mega size
High-loss customers have a higher chance of purchasing on the Field
Highest losses are driven by volume and not average return cost
METHOD 2: NEURAL NETWORK
To validate the findings from the first method, we built a neural network model to classify customers as low- or high-loss, a supervised machine learning method that serves to back up our earlier results. You can observe a snippet of the code above, where I created the network, added layers, fit the model, and made predictions after splitting my data into training and test sets. After hours of fine-tuning, I was able to achieve 95% model accuracy!
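As a self-contained sketch of this classify-then-validate step, scikit-learn's MLPClassifier stands in for the original network, trained on synthetic customer features (the features, labeling rule, and hyperparameters are illustrative, not the project's):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic features [pct_standard_items, pct_field_purchases, size_score]
# and a made-up rule labeling high-loss customers (1) vs low-loss (0)
n = 400
X = rng.uniform(0.0, 1.0, size=(n, 3))
y = ((0.5 * X[:, 0] + 0.4 * X[:, 1] + 0.3 * X[:, 2]) > 0.6).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two hidden layers; scale inputs first so training behaves well
scaler = StandardScaler().fit(X_train)
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_train), y_train)

accuracy = model.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Holding out a test set, as here, is what makes the reported accuracy meaningful: the model is scored on customers it never saw during training.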
For the predicted group labels, I computed averages for the most relevant data points and organized them into a table. These are the most notable findings:
High-loss customers have a higher chance of purchasing Standard items
High-loss customers have a higher chance of purchasing items on the Field
High-loss customers have a higher chance of being Large size
Notice a trend??
After discovering new insights and substantiating some of our initial hypotheses, we arrived at a deeper understanding of the factors driving financial losses from returned materials. Now we must align these findings with the business strategy to prescribe feasible solutions. This fundamental step is part of the prescriptive analytics stage. Oftentimes, this final step is an iterative process in which outcomes are measured continuously to evaluate improvements.
To summarize the recommendations above, the company's data backs up the following:
Assess Productivity Analyses for high-loss (high-risk) customers. Those are customers that share these characteristics: Large size, Standard purchases, Field purchases.
Continue using Predictive Analytics to better understand the characteristics of high-loss customers. Perform A/B testing to quantify the financial impact of special pricing and promotions in this group.
As part of a Kaizen strategy, implement a charging process that aligns with the model findings; that is, charge a restocking fee that at least covers operational expenses (shipping, warehousing, and overhead costs).
Audit Productivity Analyses and Sales at End of Quarter to cut back on overselling tactics.
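The proposed restocking fee amounts to a simple floor on operational expenses. A minimal sketch, where the cost figures and the optional margin parameter are illustrative:

```python
# A floor on the restocking fee: at minimum, recover the OPEX the return
# incurs. All figures and the optional margin are illustrative.
def restocking_fee(shipping: float, warehousing: float, overhead: float,
                   margin: float = 0.0) -> float:
    """Fee covering OPEX, optionally with a safety margin (0.10 = 10%)."""
    opex = shipping + warehousing + overhead
    return round(opex * (1 + margin), 2)

print(restocking_fee(12.50, 3.25, 6.00))               # 21.75
print(restocking_fee(12.50, 3.25, 6.00, margin=0.10))
```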
Although I unfortunately was not able to see these recommendations through to implementation, I delivered impactful, data-driven recommendations. Just another day in the office for a Business Analyst.