Credit risk assessment is the foundational function of lending. Every loan decision — whether to extend credit, at what terms, with what limits — rests on an assessment of the borrower's probability of repayment. Get this assessment systematically wrong in one direction and the lender accumulates non-performing assets. Get it wrong in the other direction and the lender denies credit to viable borrowers, losing revenue and restricting financial access.
For decades, this assessment has relied on credit scorecards: logistic regression models that combine a small number of variables — repayment history, outstanding debt, credit utilisation, account age, recent inquiries — into a numerical score. The FICO model, introduced in 1989, established the template. India's CIBIL score follows a similar methodology. These models are transparent, stable, and well-understood by regulators. They are also products of their era: designed when data was scarce, computing was expensive, and the credit market served a relatively homogeneous population.
India's credit market has changed fundamentally. The JAM trinity (Jan Dhan accounts, Aadhaar, Mobile) brought hundreds of millions of previously unbanked individuals into the formal financial system. Digital lending platforms process applications in minutes rather than weeks. The volume of available data — transaction records, mobile usage patterns, GST filings, e-commerce activity — dwarfs what was available when scorecard models were designed. The credit market these models serve looks nothing like the market they were built for.
The Limitations of Scorecard Models
Scorecard models have specific, well-documented limitations that become more consequential as the credit market evolves. The first limitation is feature rigidity. Logistic regression models use a small number of hand-engineered features, typically 10-20 variables. Each variable is binned, assigned a weight, and combined linearly. This design choice — made for computational tractability in the 1980s — means the model cannot capture complex interactions between variables. A borrower's creditworthiness is not a linear combination of independent factors. Income stability, employment type, spending patterns, and savings behaviour interact in ways that linear models cannot represent.
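To make the structure concrete, the following is a minimal sketch of how a binned, linearly combined scorecard arrives at a default probability. The variable names, bin boundaries, weights, and intercept are all invented for illustration and are not taken from FICO, CIBIL, or any real scorecard.

```python
import math

# Hypothetical scorecard: each variable is binned, and each bin carries a
# fixed weight (its contribution to the log-odds of default). All numbers
# are illustrative assumptions, not values from any real model.
INTERCEPT = -2.0
SCORECARD = {
    # variable: list of (upper bound of bin, weight), in ascending order
    "utilisation": [(0.30, -0.6), (0.70, 0.0), (float("inf"), 0.9)],
    "account_age_years": [(2, 0.7), (7, 0.0), (float("inf"), -0.5)],
    "inquiries_6m": [(1, -0.2), (3, 0.1), (float("inf"), 0.8)],
}


def bin_weight(bins, value):
    """Return the weight of the first bin whose upper bound covers value."""
    for upper_bound, weight in bins:
        if value <= upper_bound:
            return weight
    raise ValueError(f"value {value!r} is not covered by any bin")


def log_odds(applicant):
    """Add the intercept and one bin weight per variable (purely linear)."""
    return INTERCEPT + sum(
        bin_weight(bins, applicant[name]) for name, bins in SCORECARD.items()
    )


def default_probability(applicant):
    """Map the log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(applicant)))


low_risk = {"utilisation": 0.20, "account_age_years": 10, "inquiries_6m": 0}
high_risk = {"utilisation": 0.85, "account_age_years": 1, "inquiries_6m": 5}

p_low = default_probability(low_risk)
p_high = default_probability(high_risk)
```

Because the contributions are simply added, moving an applicant from low to high utilisation shifts the log-odds by the same amount whether the account is one year old or ten years old. That additivity is the rigidity described above: the sketch has no way to express that high utilisation might matter more for a new account than for an established one.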
The second limitation is population bias. Scorecard models are trained on historical lending decisions and outcomes. If historical lending practices excluded certain demographic groups, the model inherits that exclusion. A borrower from a population segment with limited credit history receives a lower score — not because they are a poor credit risk, but because the model has insufficient data to assess them. This creates a circular problem: denied credit, the borrower cannot build credit history, perpetuating the data gap.
The third limitation is temporal rigidity. Scorecard models are calibrated on historical data and updated periodically — typically annually or less frequently. Between updates, the model's assumptions about the relationship between variables and default probability remain fixed. Economic conditions, consumer behaviour patterns, and market dynamics change continuously. A model calibrated during a growth period may misjudge risk during a contraction, and vice versa.
The fourth limitation is the inability to distinguish correlation from causation. A scorecard model identifies that borrowers who have more than three credit inquiries in six months default at higher rates. It treats this as a risk signal. But the causal structure may be more complex: borrowers facing financial stress apply for multiple credit lines simultaneously. The inquiries are a symptom, not a cause. A model that treats symptoms as causes misallocates risk weights and can be gamed — a borrower who understands the model can avoid triggering certain signals without actually reducing their default risk.
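The inquiry example can be worked through numerically. The sketch below assumes a deliberately simple structure in which financial stress drives both inquiries and default, and inquiries have no direct effect on default at all. Every probability in it is an invented illustration, not an empirical estimate.

```python
# Hypothetical structural model, all probabilities invented for illustration:
# financial stress raises both the chance of many credit inquiries and the
# chance of default. Inquiries have NO direct effect on default here.
P_STRESS = 0.20
P_MANY_INQUIRIES = {True: 0.80, False: 0.10}  # keyed by stress
P_DEFAULT = {True: 0.30, False: 0.03}         # keyed by stress


def p_stress(stress):
    """Probability of being in the given stress state."""
    return P_STRESS if stress else 1.0 - P_STRESS


def p_default_given_inquiries(many_inquiries):
    """Observed default rate among borrowers with or without many inquiries."""
    joint_total = 0.0
    joint_default = 0.0
    for stress in (True, False):
        p_inq = P_MANY_INQUIRIES[stress]
        if not many_inquiries:
            p_inq = 1.0 - p_inq
        joint_total += p_stress(stress) * p_inq
        joint_default += p_stress(stress) * p_inq * P_DEFAULT[stress]
    return joint_default / joint_total


def p_default_if_inquiries_forced(many_inquiries):
    """Default rate if the inquiry count is set externally (an intervention).

    Forcing the inquiry count does not change stress, and default depends
    only on stress, so the argument has no effect on the result.
    """
    return sum(p_stress(s) * P_DEFAULT[s] for s in (True, False))


observed_many = p_default_given_inquiries(True)    # about 0.21
observed_few = p_default_given_inquiries(False)    # about 0.044
forced_few = p_default_if_inquiries_forced(False)  # about 0.084
```

Under these assumptions the observed default rate is roughly five times higher for borrowers with many inquiries, yet suppressing inquiries across the whole population would leave the default rate unchanged at about 8.4%. A stressed borrower who avoids inquiries would be scored like the low-risk group while still carrying a 30% default probability.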
Decision Intelligence: Causal Models for Credit Assessment
Decision intelligence addresses these limitations by applying causal modeling, machine learning, and scenario simulation to credit risk assessment. The approach differs from simply replacing logistic regression with a more complex model (gradient boosting, neural networks). Added complexity may improve predictive accuracy, but on its own it does not address the fundamental limitations of correlational reasoning.
Causal credit models construct a directed acyclic graph (DAG) that represents the hypothesised causal relationships between borrower characteristics, economic conditions, and repayment outcomes. Income stability causes repayment capacity. Employment sector determines income volatility. Macroeconomic conditions affect sector-specific employment. The model distinguishes between variables that directly affect repayment (income, expenses, existing obligations) and variables that are correlated with repayment only because they act as proxies for those drivers (postcode, phone model, browsing history).
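A hypothesised graph of this kind can be written down and checked in a few lines. The sketch below uses Python's standard-library graphlib module (Python 3.9 or later). The node names are illustrative, and two sets of edges are assumptions added to complete the example: the link from income volatility to income stability, and the links that make postcode and phone model downstream of income.

```python
from graphlib import TopologicalSorter

# Hypothesised causal graph: each key maps to its direct causes (parents).
# Edges paraphrase the examples in the text. The income_volatility ->
# income_stability edge and the two proxy edges are illustrative assumptions.
CAUSES = {
    "sector_employment": {"macro_conditions"},
    "income_volatility": {"employment_sector"},
    "income_stability": {"sector_employment", "income_volatility"},
    "repayment_capacity": {"income_stability", "expenses", "existing_obligations"},
    "repayment": {"repayment_capacity"},
    # Proxies: influenced by income, with no causal path into repayment.
    "postcode": {"income_stability"},
    "phone_model": {"income_stability"},
}


def ancestors(node):
    """Return every variable with a directed path into node."""
    found = set()
    stack = list(CAUSES.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(CAUSES.get(current, ()))
    return found


# static_order() raises graphlib.CycleError if the graph is not acyclic,
# so building the order doubles as a validity check on the DAG.
order = list(TopologicalSorter(CAUSES).static_order())

causal_drivers = ancestors("repayment")
proxies = set(order) - causal_drivers - {"repayment"}
```

Here `causal_drivers` contains every variable on a causal path to repayment, while `proxies` is left holding postcode and phone model: variables that would carry predictive signal in a correlational model but that the graph does not treat as causes.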
This distinction matters for two reasons. First, causal models tend to be more stable across population shifts and economic cycles, provided the hypothesised graph is approximately correct, because they model the mechanism of default rather than its surface-level correlates. When economic conditions change, the model adjusts through the causal pathway (recession affects sector employment, which affects income stability, which affects repayment capacity) rather than requiring recalibration of correlational weights. Second, causal models produce explanations that help satisfy regulatory requirements for decision transparency: they can articulate why a borrower was assessed as high risk in terms of specific causal factors, not just statistical associations.
Alternative Data and Financial Inclusion
India's financial inclusion challenge makes alternative data sources particularly relevant. An estimated 400-500 million adults have thin or no credit files — insufficient formal credit history for traditional scorecard assessment. These individuals are not necessarily poor credit risks. Many have stable income (from informal employment, agriculture, or small business), consistent bill payment patterns, and responsible financial behaviour that simply is not captured in credit bureau records.
Alternative data sources can fill this gap. Bank account transaction data (inflows, outflows, balance patterns, transaction regularity) provides direct evidence of cash flow behaviour. Mobile phone usage patterns, such as prepaid recharge frequency, data consumption stability, and communication patterns, have been reported to correlate with economic stability in studies from several markets. GST filing records for small businesses provide revenue and compliance data. Utility payment history demonstrates regular financial commitment.
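As an illustration of how transaction data becomes model input, the sketch below derives a few cash flow features from six months of account totals. The figures and the feature names are hypothetical; a production pipeline would work from consented, transaction-level data and would need far more careful handling of seasonality and missing months.

```python
from statistics import mean, pstdev

# Hypothetical monthly account summary: (total inflow, total outflow) in rupees.
MONTHS = [
    (32000, 29000), (30000, 31000), (34000, 30000),
    (31000, 28000), (33000, 30500), (30000, 29500),
]


def cash_flow_features(months):
    """Summarise inflow level, inflow regularity, and surplus behaviour."""
    inflows = [inflow for inflow, _ in months]
    surpluses = [inflow - outflow for inflow, outflow in months]
    avg_inflow = mean(inflows)
    return {
        "avg_inflow": avg_inflow,
        # Coefficient of variation: lower means more regular income.
        "inflow_cv": pstdev(inflows) / avg_inflow,
        "avg_net_surplus": mean(surpluses),
        # Share of months in which spending exceeded income.
        "deficit_month_share": sum(s < 0 for s in surpluses) / len(months),
    }


features = cash_flow_features(MONTHS)
```

For this invented account the inflow varies by under 5% around its mean and only one month in six ends in deficit, the kind of regularity that a bureau record would not show for a borrower with no formal credit history.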
The challenge with alternative data is responsible use. Not all correlations are appropriate for credit decisions. Phone brand preference may correlate with income, but using it risks encoding demographic bias. Social media activity may contain predictive signals, but raises privacy concerns and regulatory risk. The Predictive Analytics Platform architecture supports the governance frameworks required for responsible alternative data use — tracking which features are used, monitoring for discriminatory impact, and maintaining audit trails for regulatory review.
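One simple form that monitoring for discriminatory impact can take is a comparison of approval rates across groups. The sketch below is an assumption about how such a check might look, not a description of any particular platform. The group labels and decisions are invented, and the 0.8 alert threshold is an arbitrary placeholder rather than a regulatory requirement.

```python
from collections import defaultdict

ALERT_THRESHOLD = 0.8  # placeholder value, not a regulatory figure


def approval_rates(decisions):
    """Compute the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        approvals[group] += int(approved)
    return {group: approvals[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest group approval rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no approvals in any group")
    return min(rates.values()) / highest


# Hypothetical decision log for two applicant groups.
DECISIONS = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 42 + [("B", False)] * 58
)

rates = approval_rates(DECISIONS)
ratio = impact_ratio(rates)
needs_review = ratio < ALERT_THRESHOLD
```

A low ratio does not by itself show that a feature is discriminatory, since the groups may differ in genuine repayment capacity. It is a trigger for review of which features are driving the gap, with the result recorded in the audit trail.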
The Reserve Bank of India (RBI) has progressively expanded the scope of permissible data for credit assessment. The Account Aggregator framework enables consent-based sharing of financial data across institutions. The Digital Personal Data Protection Act establishes the privacy framework within which alternative data must be processed. These regulatory developments create both the opportunity and the guardrails for AI-based credit assessment using non-traditional data.
Model Explainability and Regulatory Compliance
Credit decisions are among the most heavily regulated applications of AI. The RBI requires lenders to provide reasons for credit denials. The Fair Practices Code mandates transparency in lending terms. Basel III and IV frameworks require banks to validate their credit risk models and demonstrate they perform within specified parameters. Any AI model used for credit decisions must satisfy these requirements.
The explainability challenge is not trivial. Complex machine learning models — ensemble methods, deep learning — achieve higher predictive accuracy than logistic regression but produce predictions that are difficult to decompose into individual factor contributions. A gradient boosting model might correctly predict that a borrower has a 12% probability of default, but articulating which specific factors drove that prediction — in language that satisfies a regulator and that a borrower can understand — requires additional interpretability infrastructure.
Several approaches address this. SHAP (SHapley Additive exPlanations) values decompose model predictions into individual feature contributions, providing local explanations for each decision. Counterfactual explanations identify the smallest change in input features that would alter the decision — telling a denied applicant specifically what would need to change for approval. Causal explanations, derived from the DAG structure, articulate the reasoning pathway rather than just the statistical contribution.
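The idea behind Shapley-based attribution can be shown without any library by enumerating feature coalitions directly. The sketch below is a brute-force illustration, not the algorithm used by the shap package, which relies on faster exact or approximate methods. The risk model, the applicant, and the single baseline used to stand in for "feature absent" are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are set to their baseline value. Cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_feature = value(set(subset))
                with_feature = value(set(subset) | {name})
                total += weight * (with_feature - without_feature)
        contributions[name] = total
    return contributions


def toy_risk_model(x):
    """Hypothetical risk score with an interaction term; not a real model."""
    return (
        0.05
        + 0.20 * x["utilisation"]
        + 0.02 * x["inquiries"]
        + 0.10 * x["utilisation"] * x["income_volatility"]
    )


applicant = {"utilisation": 0.9, "inquiries": 4, "income_volatility": 0.5}
baseline = {"utilisation": 0.3, "inquiries": 1, "income_volatility": 0.2}

phi = shapley_values(toy_risk_model, applicant, baseline)
```

The contributions in `phi` sum to the difference between the applicant's score and the baseline score, which is what makes them usable as a per-decision breakdown. The effect of the interaction term is shared between utilisation and income volatility, something a single linear weight could not express.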
For financial services organisations in India, the practical requirement is a layered explainability system: technical explanations for model validation teams and regulators, business explanations for credit officers and risk managers, and consumer-facing explanations for applicants. Each audience requires different levels of detail and different framing, but all must be consistent and derived from the same underlying model.
India's Credit Market Opportunity
India's credit-to-GDP ratio — approximately 55% — is substantially below the global average of 130% and well below China's 180%. This gap represents both the scale of the untapped credit market and the urgency of developing assessment methods that can serve it responsibly. The traditional approach — requiring formal credit history as a prerequisite for credit access — cannot close this gap because the populations most excluded from credit are precisely those without formal credit histories.
Decision intelligence for credit risk offers a path to expand credit access while maintaining risk discipline. Causal models that incorporate alternative data can assess borrowers who are invisible to traditional scorecards. Dynamic models that adapt to changing economic conditions can maintain accuracy through market cycles. Explainable models that satisfy regulatory requirements can be deployed at scale without regulatory friction.
The practical deployment path starts with the existing credit portfolio. Apply decision intelligence models alongside traditional scorecards to the same applicant pool. Measure whether the new model identifies creditworthy borrowers that the scorecard rejects (the scorecard's false negatives, taking approval as the positive class) and risky borrowers that the scorecard approves (its false positives). Champion-challenger testing over 12-18 months builds the evidence base for regulatory approval and internal confidence.
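The comparison starts with a tabulation of where the two models disagree. The sketch below groups applicants by the pair of decisions they would receive. The applicant records, field names, and the 8% cutoff are invented for illustration. Note that repayment outcomes are only observed for loans that were actually booked, so judging the applicants the scorecard rejected requires either a controlled test allocation or some form of reject inference.

```python
from collections import defaultdict

CUTOFF = 0.08  # hypothetical maximum acceptable probability of default


def decide(pd_estimate, cutoff=CUTOFF):
    """Approve when the estimated probability of default is within the cutoff."""
    return "approve" if pd_estimate <= cutoff else "reject"


def swap_sets(applicants):
    """Group applicant ids by (champion decision, challenger decision)."""
    groups = defaultdict(list)
    for applicant in applicants:
        key = (decide(applicant["scorecard_pd"]), decide(applicant["challenger_pd"]))
        groups[key].append(applicant["id"])
    return dict(groups)


# Hypothetical applicants scored by both models.
APPLICANTS = [
    {"id": 1, "scorecard_pd": 0.03, "challenger_pd": 0.02},
    {"id": 2, "scorecard_pd": 0.12, "challenger_pd": 0.04},
    {"id": 3, "scorecard_pd": 0.04, "challenger_pd": 0.15},
    {"id": 4, "scorecard_pd": 0.20, "challenger_pd": 0.18},
    {"id": 5, "scorecard_pd": 0.10, "challenger_pd": 0.05},
]

groups = swap_sets(APPLICANTS)
swap_in = groups.get(("reject", "approve"), [])   # challenger would add these
swap_out = groups.get(("approve", "reject"), [])  # challenger would drop these
```

The two off-diagonal groups are the ones that matter: the swap-in set is where the challenger claims to find creditworthy borrowers the scorecard misses, and the swap-out set is where it claims to catch risk the scorecard accepts. Tracking realised defaults in each set over the test period is what turns those claims into evidence.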
For the new-to-credit population, deploy alternative-data models with conservative initial credit limits and close monitoring. As repayment data accumulates, the model calibrates and credit limits expand. This crawl-walk-run approach manages risk while building the data assets that improve model accuracy over time.
The financial services organisations that invest in decision intelligence for credit risk now are not just improving their underwriting accuracy. They are building the capability to serve India's next 300 million credit consumers — a market opportunity measured in trillions of rupees that traditional scorecards simply cannot reach. The Decision Intelligence Engine provides the causal modeling, alternative data integration, and regulatory-grade explainability infrastructure required for this transition.
Rahul Verma
Chief Technology Analyst
Building production AI systems for enterprise and government organizations.
