The buzz around data science and machine learning keeps growing. A few early adopters have demonstrated how these technologies can enhance user experience and drive business growth. Uber has used machine learning to predict destinations and to determine optimal pick-up and drop-off points. Similarly, Amazon has grown its revenue through its recommendation engine. While the pioneers have shown the world how useful data science and machine learning can be, many others are struggling to adopt them. I have come across several such cases in discussions with clients. Their struggle can be summed up in statements like:
‘We would like to use AI/ML to solve our problems in this space’
‘We have kept a budget earmarked for adopting advanced analytics this year’
They have the awareness, intent, and budget for such initiatives, but they do not have a ‘use case’ to start with.
Now, is finding an appropriate use case such a big deal?
Apparently, it is! Machine learning is a powerful capability, but you must choose the pilot project carefully. Here is my attempt at listing some simple steps that could help you identify a pilot use case:
1. The use case should be implementable as a simple proof-of-concept (PoC), and it should help the team do its job better by solving one of its problems, or at least by assisting in solving it. How do you find such a problem? Ask your team!
Ask them to list all the problems they face in their day-to-day work, such as frequent stock-outs, delayed shipments, and unused inventory.
2. Next, ask them whether there is an unknown related to each of these problems: a value, a status, or a sequence of steps which, if known beforehand, would help ease that problem. For stock-outs, an example is the daily demand for an item on each day of the next week. There could be many similar unknowns for different industries and departments.
Drop all the problems from the initial list for which an unknown could not be identified.
3. For the shortened list of problems and unknowns, ask them to list the data items and figures they would rely on if they had to make an educated guess about each unknown. Say the human resources team had to guess whether an employee is going to leave: what data items or indicators would they use to make that guess? Ask the team to list as many indicators as they can for each problem-and-unknown combination. Let the team deliberate on the list of indicators before it is finalized.
Once a laundry list of such indicators is ready, verify if the history of those indicators is maintained in your database/data warehouse. Retain only those indicators for which historical data is maintained.
Keep only the top 10 problems by number of indicators with historical data.
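The filtering in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Candidate` class, the `shortlist` function, and every indicator name below are hypothetical, and the set of indicators with history would in practice come from checking your own database or data warehouse.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Candidate:
    problem: str
    unknown: Optional[str] = None  # what the team wishes it knew beforehand
    indicators: List[str] = field(default_factory=list)


def shortlist(candidates: List[Candidate],
              indicators_with_history: Set[str],
              limit: int = 10) -> List[Candidate]:
    """Drop problems without an unknown, keep only indicators that have
    historical data, then keep the top `limit` problems by indicator count."""
    kept = []
    for c in candidates:
        if not c.unknown:
            continue  # step 2: no unknown identified, so drop the problem
        usable = [i for i in c.indicators if i in indicators_with_history]
        kept.append(Candidate(c.problem, c.unknown, usable))
    # step 3: rank by number of indicators that have history
    kept.sort(key=lambda c: len(c.indicators), reverse=True)
    return kept[:limit]


# Hypothetical input; indicator names are made up for illustration.
candidates = [
    Candidate("frequent stock-outs", "daily demand per item for next week",
              ["past_daily_sales", "promotion_calendar", "weather"]),
    Candidate("delayed shipments", None, ["carrier"]),
    Candidate("employee attrition", "will this employee leave?",
              ["tenure", "last_appraisal_rating"]),
]
history = {"past_daily_sales", "promotion_calendar", "tenure"}
result = shortlist(candidates, history)
```

Here "delayed shipments" drops out because no unknown was identified, and the remaining problems are ranked by how many of their indicators have history.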
4. You now have 10 use cases. Assess them on the following parameters:
a. On a scale of 1 to 10, how beneficial would a successful implementation of the PoC be to the business?
b. The volume of historical data available for the indicators or data items identified in step 3. What is the volume of such data for each use case? Judge volume only by the number of records. Assign a score to each use case based on the following reference table:
|History data volume (No. of rows)|Score (on a scale of 10)|
|---|---|
|20000 <= rows <= 50000|3|
c. The ease with which the impact of a successful implementation can be quantified and measured. Let us call it the ease of quantification (EQ) index. The EQ index should again be on a scale of 1 to 10.
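The volume score in parameter (b) is a simple band lookup. The sketch below assumes bands are supplied as `(low, high, score)` tuples; the names `volume_score` and `BANDS` are hypothetical. Only the 20,000 to 50,000 rows band (score 3) comes from the reference table above, so the function returns `None` for any row count outside the bands it is given rather than guessing a score.

```python
from typing import List, Optional, Tuple

Band = Tuple[int, int, int]  # (low, high, score), both bounds inclusive


def volume_score(rows: int, bands: List[Band]) -> Optional[int]:
    """Return the score of the band containing `rows`, or None if no band matches."""
    for low, high, score in bands:
        if low <= rows <= high:
            return score
    return None


# Only this band is taken from the reference table; add your own bands as needed.
BANDS: List[Band] = [(20_000, 50_000, 3)]

score_for_30k = volume_score(30_000, BANDS)
score_for_small = volume_score(500, BANDS)
```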
5. The 3 key parameters should have the following weightage:
|History data volume|0.3|
6. List all the use cases and rate them on the key parameters. Once the rating is complete, calculate the final scores using the weightages listed in step 5.
|Use Case|Business Benefits (1-10)|History Data Volume (1-10)|EQ Index (1-10)|Weighted Score|
|---|---|---|---|---|
|Use case 1|9|5|7|7.2|
|Use case 2|7|3|10|6.7|
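The weighted score is a weighted sum of the three ratings. In the sketch below, the 0.3 weight for history data volume is the one given in step 5. The weights for business benefits (0.4) and the EQ index (0.3) are an inference, not stated values: they are the pair that reproduces both example rows in the table above. The names `WEIGHTS` and `weighted_score` are hypothetical.

```python
# Weights: 0.3 for history data volume is from step 5; 0.4 and 0.3 for the
# other two are inferred from the example rows and should be treated as assumptions.
WEIGHTS = {
    "business_benefits": 0.4,
    "history_data_volume": 0.3,
    "eq_index": 0.3,
}


def weighted_score(ratings: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the ratings, rounded to one decimal place."""
    return round(sum(weights[k] * ratings[k] for k in weights), 1)


use_cases = {
    "Use case 1": {"business_benefits": 9, "history_data_volume": 5, "eq_index": 7},
    "Use case 2": {"business_benefits": 7, "history_data_volume": 3, "eq_index": 10},
}

scores = {name: weighted_score(r) for name, r in use_cases.items()}
# Use case 1: 0.4*9 + 0.3*5 + 0.3*7  = 7.2
# Use case 2: 0.4*7 + 0.3*3 + 0.3*10 = 6.7
```

If you change the weights, keep them summing to 1 so the final score stays on the same 1 to 10 scale as the ratings.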
Once all the cases have been listed and scored, keep the top 3 use cases and involve the data scientists or the IT team to understand the implementation effort and time required for each of them. Choose the one that fits your budget and timeline.
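This final selection can also be sketched in code. The function below is one possible reading, not a prescribed rule: it takes the top 3 use cases by weighted score and returns the highest-scoring one whose estimated cost and duration fit. The function name `pick_pilot`, the estimate figures, and the budget and timeline limits are all hypothetical; real estimates would come from your data scientists or IT team.

```python
from typing import Dict, Optional, Tuple


def pick_pilot(scores: Dict[str, float],
               estimates: Dict[str, Tuple[float, int]],
               budget: float,
               max_weeks: int,
               top_n: int = 3) -> Optional[str]:
    """Return the best-scoring use case among the top `top_n` that fits
    both the budget and the timeline, or None if none of them fits."""
    top = sorted(scores, key=scores.get, reverse=True)[:top_n]
    for name in top:
        cost, weeks = estimates[name]
        if cost <= budget and weeks <= max_weeks:
            return name
    return None


# Scores are from the example table; the estimates are made up for illustration.
example_scores = {"Use case 1": 7.2, "Use case 2": 6.7}
example_estimates = {
    "Use case 1": (80_000, 16),  # (estimated cost, estimated weeks)
    "Use case 2": (40_000, 8),
}
pilot = pick_pilot(example_scores, example_estimates, budget=50_000, max_weeks=12)
```

In this made-up example, the higher-scoring use case exceeds both limits, so the second one is chosen.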
This is just an attempt to simplify a complicated process and there can be several other ways to approach the given situation. While I have tried to cover all the key aspects and parameters, some may have been overlooked. If you think this process can be improved further, please share your suggestions.
If you are a data leader and can relate to the scenario described in the blog above, we can help you with the first and the most important step of your data science journey.