In the previous blog of the series (link), I suggested a framework for identifying your first data science use-case. I hope the framework was of some help to you. I assume you now have a use-case and are ready to embark on the advanced analytics journey.
Global surveys on advanced analytics adoption by enterprises suggest that merely 13% of data science projects make it to production. A look at the few success stories makes it evident that the data scientists worked directly with business stakeholders. On the other hand, data science teams that worked in silos were unable to achieve the desired outcome. In several other cases, the data scientists felt they were not trusted enough by business stakeholders and supervisors.
The findings are not surprising at all. Isolated operation and lack of trust are the two key reasons most data science projects fail. It is common practice to treat data science as a supporting function that is expected to work in isolation: the business comes up with a problem statement, provides a dataset, and leaves the data scientists to wrangle with it and come up with results. That approach, however, is a recipe for failure, and failure is often the outcome.
Here are four ways in which business stakeholders can add value throughout a data science implementation:
- Identifying the dependent variable: The importance of the dependent variable in any data science implementation cannot be emphasized enough, and it takes a considerable amount of discussion and brainstorming to identify it. Consider the example of predicting churn propensity. The definition of churn differs from business to business. For a retailer, churn might mean a customer has not shopped with them for a month, or for 90 days. For a subscription-based business, it might be the moment the customer cancels the subscription. Even a minor difference in the definition can cause a significant change in the outcome. Only SMEs with considerable experience can provide a valid definition of churn that aligns well with the problem at hand, so it is imperative to involve them in that decision.
- Identifying the right features/data to be captured: Most often, data scientists are handed whatever data happens to be recorded at the time and are asked to work with that. The team then builds the best possible model with the data at hand, and many such models fail to make it to production. SMEs and domain experts should be consulted to check whether all the required columns are in the dataset before the critical task of model-building begins. It is entirely possible that a critical independent variable is missing from the training dataset, or is not being captured at all.
- Handling missing values in the datasets: Business users and SMEs are best placed to help you handle missing values. While standard practice may dictate replacing missing values in a numerical column with the median, mean, or mode of that column, that may not hold true in many cases, and doing so can hurt the accuracy of the model. A missing value may itself carry business meaning, such as a customer who has never transacted, rather than being random noise.
- Validating the solution: Validating any data science solution has two parts: technical validation and business validation. While there are several metrics for technical validation, such as accuracy, precision, recall, and AUC, business validation should be the prerogative of the SMEs alone.
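To make the churn-definition point above concrete, here is a minimal sketch in pandas. The customer data, column names, and 30/90-day thresholds are all illustrative assumptions, not from any real dataset; the point is only that two plausible definitions of the same dependent variable label the same customers very differently.

```python
import pandas as pd

# Hypothetical last-purchase dates for five retail customers (illustrative data)
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "last_purchase": pd.to_datetime(
        ["2024-05-20", "2024-04-02", "2024-02-10", "2024-05-01", "2024-03-15"]
    ),
})

# Reference date for computing inactivity
as_of = pd.Timestamp("2024-06-01")
days_inactive = (as_of - df["last_purchase"]).dt.days

# Two equally plausible churn definitions from the text
df["churn_30d"] = days_inactive > 30   # "no purchase in a month"
df["churn_90d"] = days_inactive > 90   # "no purchase in 90 days"

# On this data, churn_30d flags 4 of 5 customers as churned,
# while churn_90d flags only 1 — same customers, very different target.
print(df)
```

A model trained on the first label answers a different business question than one trained on the second, which is exactly why the SMEs, not the data scientists alone, should settle the definition.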
While I have listed only four instances here, there are many other ways in which constant communication with business SMEs can help a data science implementation succeed. It is up to you as a data leader to develop a culture where data science is not kept in isolation. Rather, it should be integrated with the business functions, and the data science and business teams should work together to derive the best return on your data investments.