Key highlights
Expanded Risk Parameters: Identified 60+ additional parameters, such as IP address, and incorporated them into the overall risk scoring.
ML-Based Risk Scoring: Real-time fraud probability scoring using a model trained on holistic claim features.
Feature Importance Analytics: Surfaces the key indicators behind each score, sharpening reviewer decision-making.
Ethical AI: No demographic features used, ensuring fairness.
Scalable Architecture: Handles 15K claims/day, scalable to 100K/day.
MLOps Integration: Drift monitoring and model refresh alerts.
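To make the drift-monitoring highlight concrete, here is a minimal Python sketch using the Population Stability Index (PSI), a common drift statistic. The synthetic score arrays and the 0.2 alert threshold are illustrative assumptions, not the production configuration.

```python
# A minimal drift-monitoring sketch using the Population Stability Index (PSI).
# The synthetic score arrays and the 0.2 alert threshold are illustrative
# assumptions, not the production configuration.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a current sample."""
    # Bin edges come from the baseline distribution; open-ended outer bins
    # catch values outside the baseline's range.
    edges = np.percentile(baseline, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    base_pct = np.histogram(baseline, edges)[0] / len(baseline)
    curr_pct = np.histogram(current, edges)[0] / len(current)
    # Floor the proportions to avoid log(0) and division by zero.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 8, 50_000)  # stand-in for last month's risk scores
current_scores = rng.beta(2, 6, 10_000)   # stand-in for this week's risk scores

drift = psi(baseline_scores, current_scores)
if drift > 0.2:  # a common rule-of-thumb threshold for significant drift
    print(f"PSI = {drift:.3f}: raise a model refresh alert")
```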
Overview
The client processes over 350K lifestyle claims monthly in the Connected Living segment. Their existing system flags only 6.2% of claims as risky, leaving a large volume of potentially fraudulent claims undetected.
Challenges
1. Low Detection Rate: Nearly 94% of claims go unflagged, risking undetected fraud.
2. High False Positives: 74% of flagged claims are actually non-suspicious, burdening the triage team.
3. Manual Review Overload: Excessive scrutiny of legitimate customers hurts CSAT and operational efficiency.
Solution
1. Deep business understanding to identify and shortlist additional fraud parameters.
2. Real-time ML fraud scoring enables instant claim triage and decision-making (a minimal scoring sketch follows this list).
3. Explainable AI highlights key fraud indicators, guiding reviewer actions.
4. Scalable Azure architecture processes 15K claims/day, expandable to 100K/day using Cosmos DB, MLflow, and serverless apps.
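The sketch below shows what the real-time triage step in item 2 can look like, assuming the trained model is registered in the MLflow Model Registry under the hypothetical name fraud_risk_scorer; the feature names and routing thresholds are likewise illustrative, not the client's actual policy.

```python
# A minimal sketch of real-time claim triage, assuming the trained model is
# registered in the MLflow Model Registry under the hypothetical name
# "fraud_risk_scorer". Feature names and routing thresholds are illustrative.
import mlflow.sklearn
import pandas as pd

model = mlflow.sklearn.load_model("models:/fraud_risk_scorer/Production")

def triage(claim: dict) -> str:
    """Score one incoming claim and route it to a queue."""
    score = float(model.predict_proba(pd.DataFrame([claim]))[0, 1])
    if score >= 0.80:
        return "investigate"    # high fraud probability: escalate
    if score >= 0.40:
        return "manual_review"  # borderline: human triage
    return "auto_approve"       # low risk: straight-through processing

# Example call with a few engineered features (names hypothetical).
print(triage({"claim_amount": 412.0, "device_age_days": 37,
              "claims_last_90d": 2, "ip_risk_score": 0.61}))
```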
Databricks solution enablers
Feature Engineering & Model Training: Databricks notebooks provided an interactive and collaborative space for developing the model training pipeline, including feature preparation, data enrichment, and hyperparameter tuning.
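A sketch of what this training pipeline can look like in a Databricks notebook: synthetic stand-in data, a small hyperparameter sweep, and MLflow tracking. The 60-feature dataset, the grid, and the metric choices are illustrative assumptions.

```python
# A training-pipeline sketch: synthetic stand-in data, a small hyperparameter
# sweep, and MLflow tracking. Dataset, grid, and metrics are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data standing in for enriched claim features.
X, y = make_classification(n_samples=20_000, n_features=60, weights=[0.95],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

for depth in (3, 5, 7):  # tiny hyperparameter sweep
    with mlflow.start_run(run_name=f"gbt_depth_{depth}"):
        clf = GradientBoostingClassifier(max_depth=depth, random_state=42)
        clf.fit(X_tr, y_tr)
        preds = clf.predict(X_te)
        mlflow.log_param("max_depth", depth)
        mlflow.log_metric("precision", precision_score(y_te, preds))
        mlflow.log_metric("recall", recall_score(y_te, preds))
        # Feature-importance analytics for reviewers (see "Explainable AI").
        top5 = sorted(enumerate(clf.feature_importances_),
                      key=lambda t: -t[1])[:5]
        mlflow.log_dict({f"feature_{i}": float(w) for i, w in top5},
                        "top_features.json")
        mlflow.sklearn.log_model(clf, "model")
```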
Collaborative Environment: The platform’s integrated workspace allowed data scientists and engineers to work together seamlessly, accelerating experimentation and deployment.
Performance & Reliability: With Databricks’ distributed computing and Delta Lake capabilities, we ensured high data quality and faster processing—critical for real-time fraud detection.
Data Engineering at Scale: Databricks offers a powerful, cloud-native data engineering environment that enables ingestion, transformation, and orchestration of massive datasets. Using Apache Spark under the hood, we efficiently processed structured and unstructured data from the data lake.
Created Delta Lake tables with schema enforcement, ensuring data reliability and consistency (see the sketch after this list).
Leveraged Databricks Workflows for automated ETL pipelines and scheduling, reducing manual overhead.
Enabled incremental data processing and optimized storage for faster query performance.
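A sketch of the Delta Lake patterns above, using a hypothetical claims_silver table and data lake path: a table with an enforced schema, an incremental MERGE upsert, and a storage optimization pass. (Workflows scheduling is configured outside the notebook and is not shown.)

```python
# A sketch of the Delta Lake patterns above, using a hypothetical claims_silver
# table and landing path. `spark` is the session Databricks provides in notebooks.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# 1) Create the table once; Delta then enforces this schema on every write.
spark.sql("""
    CREATE TABLE IF NOT EXISTS claims_silver (
        claim_id     STRING,
        claim_amount DOUBLE,
        ip_address   STRING,
        updated_at   TIMESTAMP
    ) USING DELTA
""")

# 2) Incremental processing: upsert only the latest batch of claims.
batch = (spark.read.format("json")
         .load("/mnt/datalake/claims/incoming/")  # hypothetical landing zone
         .withColumn("updated_at", F.current_timestamp()))

(DeltaTable.forName(spark, "claims_silver").alias("t")
 .merge(batch.alias("s"), "t.claim_id = s.claim_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# 3) Optimize the storage layout for faster lookups by claim_id.
spark.sql("OPTIMIZE claims_silver ZORDER BY (claim_id)")
```

The MERGE keeps reprocessing incremental: only new or changed claims touch the table on each run, rather than rewriting the full dataset.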
Impact
1. Precision improved from 12% to 57%; recall improved from 55% to 61%.
2. False positives reduced by up to 90%.
3. Review load cut by 50%, freeing triage team capacity.
4. CSAT boosted by minimizing scrutiny of legitimate customers.