Churn and Default

Introduction

For the final component of the research, we shift our attention to customer enrollment in current community solar farms. By analyzing tenure length and whether individuals depart community solar farms, our research seeks to provide quantitative performance data that adds depth to the previously collected survey data.

Given the incipient nature of community solar, this analysis provides an opportunity to quantify the rates of churn or default while controlling for household or demographic attributes. The primary goal of the analysis is to describe the characteristics of participating residents, the prevalence of default and churn, and any statistically significant differences in default or churn rates between groups. Secondarily, the analysis seeks to determine whether enrollment in the community solar farm is associated with any measurable change in credit scores.

Customers in existing community solar projects are evaluated monthly for their subscription and payment status. Participation in community solar farms is voluntary, so customers may exit the program at any point, often subject to a cancellation fee set out in the contract. Customers also sometimes fail to pay the monthly solar farm subscription fee. Churn refers to a customer exiting the solar farm, whereas default refers to a customer failing to pay the monthly subscription fee.

Methods

Monthly account-level data were collected from two community solar projects from January 2020 to April 2022. The account-level information includes monthly payment performance (from which churn and default triggers are captured), the solar kWh attributed to each account, a tag identifying which farm the account belongs to, and a tag indicating whether the payment method was credit card or direct deposit (ACH).

The monthly, account-level performance data are supplemented with Experian data, including Experian’s Income Insight Score and VantageScore (a proxy for FICO scores with the same range of 300 to 850), along with demographic data such as gender, education, occupation and homeownership status.

For the primary analysis, logit models are used to estimate the likelihood of churn and of default, taking into account each account’s length of tenure and the available demographic and socioeconomic data.

The secondary analysis measures whether customers’ credit scores changed by a statistically significant amount over the course of their enrollment in the farm. Credit scores for each customer are measured in December 2020 and April 2022, offering approximate pre- and post-measurements. A difference-of-means test (Welch’s t-test) is used to compare the two.

Data Cleaning

Data from the solar farms consists of 32,384 monthly observations over 812 utility accounts and 620 unique users. Multiple accounts may be tied to a single user.

Data cleaning mainly involves removing observations for one farm over a particular time period. Data were collected from two sites, referred to as Farm A and Farm B for privacy. Because of a utility billing issue at Farm A, payment performance was unavailable from October 2021 until April 2022; hence, Farm A’s values for October, November and December 2021 are removed. The narrowed dataset consists of 31,703 observations (681 fewer than before); the number of utility accounts remains 812 and the number of unique users remains 620.
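The removal step amounts to a simple filter on farm and month. The sketch below is a minimal illustration, not the project’s cleaning code: it assumes each monthly observation is a record with hypothetical `account`, `farm` and `month` fields, and the three rows are toy data (account numbers borrowed from Table 2).

```python
from datetime import date

# Hypothetical record layout: each monthly observation is a dict with
# "account", "farm" and "month" keys (month = first day of the month).
GAP_FARM = "Farm A"
GAP_MONTHS = {date(2021, 10, 1), date(2021, 11, 1), date(2021, 12, 1)}


def drop_billing_gap(rows):
    """Return the rows minus Farm A's October-December 2021 observations."""
    return [
        row
        for row in rows
        if not (row["farm"] == GAP_FARM and row["month"] in GAP_MONTHS)
    ]


rows = [
    {"account": 1262, "farm": "Farm A", "month": date(2021, 9, 1)},
    {"account": 1262, "farm": "Farm A", "month": date(2021, 10, 1)},
    {"account": 1268, "farm": "Farm B", "month": date(2021, 10, 1)},
]
kept = drop_billing_gap(rows)
print(len(kept))  # 2: only Farm A's October 2021 row is removed
```

Farm B’s rows for the same months are kept, since the billing issue affected only Farm A.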

Generate Churn and Default Tags

For the accounts that left a solar farm at any point, we generate Churn and Default ‘triggers’. Table 1 shows the reasons recorded for accounts leaving either solar farm during the observed period. Most accounts (74 of 118) left because they moved out of the service territory or were no longer interested in the solar farm.

The Default trigger is set when the leave reason is ‘Defaulted Payment’; every other leave reason is marked as churn, capturing a wide range of outcomes, from moving to health issues. Table 1 shows how each leave reason was mapped to the Churn and Default triggers. For simplicity, every default is also counted as a churn event; however, not every churn event is a default.

Table 1: Churn and Default Triggers, Grouped by Utility Accounts
Reason Observed Trigger
Moving out of service territory 39 Churn
No Longer Interested 35 Churn
Billing Complaint 12 Churn
Ineligible Meter 7 Churn
Defaulted Payment 6 Default
Duplicate 6 Churn
Churn 5 Churn
Got Rooftop 5 Churn
Health Issues 3 Churn
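The mapping in Table 1 can be sketched as a small function. This is an illustrative sketch, assuming a `leave_reason` of 0 (as in Table 2) or an empty value means the account did not leave; the function name is hypothetical. Applying it to the Table 1 counts reproduces the 118 churn and 6 default events reported in Table 4.

```python
DEFAULT_REASON = "Defaulted Payment"


def churn_default_flags(leave_reason):
    """Map an account's leave_reason to (Churn, Default) dummies.

    0, "0", None or "" is treated as 'did not leave' (an assumption based
    on the 0 placeholder in Table 2's leave_reason column).
    """
    if leave_reason in (0, "0", None, ""):
        return 0, 0
    default = 1 if leave_reason == DEFAULT_REASON else 0
    # Every exit, including a default, counts as churn.
    return 1, default


# Leave-reason counts from Table 1.
TABLE_1_COUNTS = {
    "Moving out of service territory": 39,
    "No Longer Interested": 35,
    "Billing Complaint": 12,
    "Ineligible Meter": 7,
    "Defaulted Payment": 6,
    "Duplicate": 6,
    "Churn": 5,
    "Got Rooftop": 5,
    "Health Issues": 3,
}
churn_total = sum(n * churn_default_flags(r)[0] for r, n in TABLE_1_COUNTS.items())
default_total = sum(n * churn_default_flags(r)[1] for r, n in TABLE_1_COUNTS.items())
print(churn_total, default_total)  # 118 6
```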

Filtering Payment Methods

For some accounts, both credit card and direct deposit (ACH) payment methods were recorded for the same month. Apart from the payment method, these duplicate rows are identical, so one row of each pair can be filtered out. To remove these entries, we group the data by utility account ID, isolate the duplicated months and keep only the direct deposit row; keeping the credit card row instead would give the same result, since the underlying performance and attribute data do not differ.
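The de-duplication can be sketched as keeping one row per account and month, preferring the ACH row. This is a minimal sketch under stated assumptions: rows are dicts with hypothetical `account`, `month` and `payment_method` keys, and duplicates differ only in the payment method, as the text describes.

```python
def drop_duplicate_payment_rows(rows):
    """Keep one row per (account, month), preferring the ACH row.

    Hypothetical layout: each row is a dict with "account", "month" and
    "payment_method" ("ach" or "card") keys; duplicate rows are assumed
    to differ only in payment_method.
    """
    chosen = {}
    for row in rows:
        key = (row["account"], row["month"])
        current = chosen.get(key)
        # Keep the first row seen, but let an ACH row replace a card row.
        if current is None or (
            row["payment_method"] == "ach" and current["payment_method"] != "ach"
        ):
            chosen[key] = row
    return list(chosen.values())


rows = [
    {"account": 1271, "month": "2021-05", "payment_method": "card"},
    {"account": 1271, "month": "2021-05", "payment_method": "ach"},
    {"account": 1277, "month": "2021-05", "payment_method": "ach"},
]
deduped = drop_duplicate_payment_rows(rows)
print([(r["account"], r["payment_method"]) for r in deduped])
# [(1271, 'ach'), (1277, 'ach')]
```

Because the duplicates carry identical performance data, preferring "card" instead would only change the payment_method label, which matches the point made in the text.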

Append Experian Data

The Solstice data are grouped by utility account ID. As shown in Table 2, the information collected by Solstice includes:

  • utility account number
  • Participant ID
  • Tenure (in months) of account
  • payment method
  • leave_reason, as described above
  • kWh allotted to each utility account (calculated by dividing the raw reported kWh by 1,229, a project-specific conversion rate)
  • Churn and Default dummy variables
  • solar farm identification

Table 2: Sample of Data Collected by Solstice

utility_acct_number ParticipantID tenure payment_method leave_reason kwh_solar Churn Default solar_farm
1262 311 22 card 0 3.609 0 0 Farm A
1264 386 22 card 0 7.585 0 0 Farm A
1265 430 22 card 0 6.178 0 0 Farm A
1267 473 22 ach 0 15.048 0 0 Farm A
1268 105 14 card Got Rooftop 20.473 1 0 Farm B
1269 554 20 card 0 8.747 0 0 Farm A
1270 495 22 card 0 3.762 0 0 Farm A
1271 509 21 ach 0 9.879 0 0 Farm A
1273 547 17 card Moving out of service territory 6.292 1 0 Farm B
1276 376 22 ach 0 6.239 0 0 Farm A
1277 209 22 ach 0 4.894 0 0 Farm A
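Collapsing monthly rows into the account-level records of Table 2 could look like the sketch below. It is hypothetical throughout: the field names are assumptions, tenure is approximated as the number of distinct months an account appears (the report does not say how tenure is derived), and the raw kWh value is back-calculated from account 1262’s 3.609 purely for illustration. Only the 1,229 divisor comes from the text.

```python
KWH_CONVERSION = 1229  # project-specific divisor quoted in the text


def to_account_level(monthly_rows):
    """Collapse monthly rows into one record per utility account.

    Hypothetical layout: each row has "account", "month" and "raw_kwh" keys,
    with raw_kwh constant for an account. Tenure is approximated as the
    number of distinct months observed; this is an assumption, since the
    report does not state how tenure is derived.
    """
    accounts = {}
    for row in monthly_rows:
        rec = accounts.setdefault(
            row["account"], {"months": set(), "raw_kwh": row["raw_kwh"]}
        )
        rec["months"].add(row["month"])
    return {
        acct: {
            "tenure": len(rec["months"]),
            "kwh_solar": round(rec["raw_kwh"] / KWH_CONVERSION, 3),
        }
        for acct, rec in accounts.items()
    }


monthly = [
    {"account": 1262, "month": m, "raw_kwh": 4435.461}
    for m in ("2022-02", "2022-03", "2022-04")
]
print(to_account_level(monthly))  # {1262: {'tenure': 3, 'kwh_solar': 3.609}}
```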

To enrich the analysis, we contracted with Experian to provide additional demographic and socioeconomic data. These data were provided at the participant level; recall that a single participant may hold multiple utility accounts. The Experian data include a ‘ParticipantID’ tag that matches the one in the Solstice data set.

Note: Not all participants and utility accounts could be appended, because of limited data availability from Experian. The ‘pin rate’ is the rate at which data are successfully retrieved for a record. Of the 620 ParticipantIDs, 506 received data, a pin rate of 81.6%. Even successfully pinned records did not always receive complete data, and some fields returned an ‘Unknown’ response. These gaps are reflected in the differing group totals in Tables 4 and 5.
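The participant-level join and the pin rate can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming accounts are dicts carrying a `ParticipantID` and the Experian data is keyed by the same ID. The example rows mirror Table 3, where participant 5 holds two accounts and participant 3 received no Experian data.

```python
def append_experian(accounts, experian_by_participant):
    """Left-join participant-level Experian fields onto account rows.

    accounts: list of dicts, each with a "ParticipantID" key; several
    accounts may share one participant. experian_by_participant: dict
    mapping ParticipantID -> dict of Experian fields. Unmatched accounts
    keep only their own fields (the NA rows in Table 3).
    """
    return [
        {**acct, **experian_by_participant.get(acct["ParticipantID"], {})}
        for acct in accounts
    ]


def pin_rate(participant_ids, experian_by_participant):
    """Share of unique participants for whom Experian returned a record."""
    unique_ids = set(participant_ids)
    pinned = sum(1 for pid in unique_ids if pid in experian_by_participant)
    return pinned / len(unique_ids)


accounts = [
    {"ParticipantID": 5, "utility_acct_number": 1773},
    {"ParticipantID": 5, "utility_acct_number": 1761},
    {"ParticipantID": 3, "utility_acct_number": 1010},
]
experian = {5: {"GENDER": "Male"}}
merged = append_experian(accounts, experian)
print(f"{pin_rate([a['ParticipantID'] for a in accounts], experian):.1%}")  # 50.0%
```

The rate is computed over unique participants rather than accounts, so 506 pinned out of 620 participants gives the 81.6% quoted above.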

Data from Experian were provided for two time periods, December 2020 and April 2022. Having two time periods enables the secondary analysis, in which the change in credit scores is calculated and a difference-of-means test is conducted to determine whether tenure in the solar farms is associated with any measurable change in credit score. The average of the two VantageScores is also used in the primary analysis.

For the primary analysis, all non-credit-score Experian data are appended to the Solstice data set as of December 2020, for two reasons. First, little change is observed or expected in demographic data such as homeownership status, education, marital status and gender, so the choice of snapshot makes little difference to model construction. Second, and more importantly, the December 2020 data are slightly more complete, with fewer missing values.

Table 3 below shows a sample of the appended data. VANTAGE_V4_SCORE.x is the VantageScore as of April 2022 and VANTAGE_V4_SCORE.y is the VantageScore as of December 2020; Vantage_Diff is the April 2022 score minus the December 2020 score, and Vantage_Avg is the mean of the two.

Table 3: Sample of Appended Data

ParticipantID utility_acct_number tenure payment_method leave_reason kwh_solar Churn Default solar_farm VANTAGE_V4_SCORE.x VANTAGE_V4_SCORE.y Vantage_Diff INCOME_INSIGHT_SCORE Vantage_Avg GENDER MARRIAGE HOMEOWNER RENTER EDUCATION OCCUPATION
1 1193 23 ach 0 2.108 0 0 Farm B 817 799 18 186 808.0 Female Unknown Yes Unknown Completed College NA
2 1791 3 card No Longer Interested 5.570 1 0 Farm B 805 782 23 117 793.5 Female Unknown Yes Unknown High School or Some College NA
3 1010 21 card Got Rooftop 16.890 1 0 Farm B NA NA NA NA NA NA NA NA NA NA NA
4 1312 22 card 0 7.188 0 0 Farm A 794 811 -17 217 802.5 Male Unknown Yes Unknown Graduate Degree NA
5 1773 15 card Moving out of service territory 4.817 1 0 Farm B 805 805 0 111 805.0 Male Unknown Unknown Unknown Graduate Degree Healthcare/Education Services
5 1761 17 card Moving out of service territory 2.288 1 0 Farm B 805 805 0 111 805.0 Male Unknown Unknown Unknown Graduate Degree Healthcare/Education Services
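The two derived credit-score columns in Table 3 follow directly from the paired snapshots. The sketch below reproduces them; the function name is hypothetical, and returning missing values when either score is missing is an assumption (Table 3 only shows rows where both are present or both are NA).

```python
def vantage_features(score_apr_2022, score_dec_2020):
    """Return (Vantage_Diff, Vantage_Avg) for one participant.

    Vantage_Diff = April 2022 score minus December 2020 score;
    Vantage_Avg = mean of the two. If either score is missing (None),
    both outputs are None (an assumption about how NA is handled).
    """
    if score_apr_2022 is None or score_dec_2020 is None:
        return None, None
    diff = score_apr_2022 - score_dec_2020
    avg = (score_apr_2022 + score_dec_2020) / 2
    return diff, avg


print(vantage_features(817, 799))  # (18, 808.0)
print(vantage_features(794, 811))  # (-17, 802.5)
```

These match participants 1 and 4 in Table 3. The primary analysis then uses the logarithm of Vantage_Avg (log_Vantage_Avg in Table 6).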

Descriptive Statistics

Account-level summary statistics by solar farm are provided in Table 4. Only 6 defaults were observed (0.7% of accounts), whereas 118 of the 812 accounts, just over 14.5%, experienced churn.

Table 4: Summary Statistics by Solar Farm

Characteristic Overall, N = 812¹ Farm A, N = 454¹ Farm B, N = 358¹
Churn 118 (15%) 59 (13%) 59 (16%)
Default 6 (0.7%) 1 (0.2%) 5 (1.4%)
Average Tenure 22 (20, 23) 22 (20, 22) 23 (22, 24)
Average kWh 6.1 (3.8, 9.6) 5.9 (3.5, 9.0) 6.3 (4.2, 10.3)
Payment Method
ach 239 (30%) 125 (28%) 114 (32%)
card 565 (70%) 322 (72%) 243 (68%)
Gender
Female 182 (49%) 117 (51%) 65 (44%)
Male 193 (51%) 111 (49%) 82 (56%)
Marital Status
Married 228 (60%) 134 (58%) 94 (64%)
Single 33 (8.8%) 27 (12%) 6 (4.1%)
Unknown 116 (31%) 69 (30%) 47 (32%)
Occupation
Healthcare/Education Services 25 (37%) 16 (44%) 9 (28%)
Management/Technical 6 (8.8%) 0 (0%) 6 (19%)
Self Employed/Other 23 (34%) 14 (39%) 9 (28%)
Services 14 (21%) 6 (17%) 8 (25%)
Education
Completed College 99 (26%) 61 (27%) 38 (26%)
Graduate Degree 114 (30%) 63 (27%) 51 (35%)
High School or Some College 150 (40%) 95 (41%) 55 (37%)
Other 14 (3.7%) 11 (4.8%) 3 (2.0%)
Renter
Unknown 355 (94%) 216 (94%) 139 (95%)
Yes 22 (5.8%) 14 (6.1%) 8 (5.4%)
Homeowner
Unknown 104 (28%) 69 (30%) 35 (24%)
Yes 273 (72%) 161 (70%) 112 (76%)
INCOME_INSIGHT_SCORE 122 (96, 206) 114 (92, 184) 167 (102, 233)
Average VantageScore 804 (778, 819) 803 (774, 818) 806 (788, 820)
VantageScore April 2022 805 (778, 820) 804 (774, 819) 807 (789, 820)
VantageScore December 2020 806 (778, 823) 805 (777, 822) 806 (785, 823)
VantageScore Change 0 (-11, 13) 0 (-14, 12) 0 (-8, 14)
¹ n (%); Median (IQR)
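The “n (%); Median (IQR)” notation in Tables 4 and 5 can be reproduced with the standard library. This is a sketch, not the software that produced the tables: the percentage rounding rule (one decimal below 10%, whole percent otherwise, “0%” for zero) is inferred from the tables, and the quartile convention used here (linear interpolation) may differ slightly from the one used to build them.

```python
import statistics


def n_pct(count, total):
    """Format 'n (%)': one decimal below 10%, whole percent otherwise."""
    if count == 0:
        return f"{count} (0%)"
    pct = 100 * count / total
    if pct < 10:
        return f"{count} ({pct:.1f}%)"
    return f"{count} ({pct:.0f}%)"


def median_iqr(values):
    """Format 'Median (Q1, Q3)' using linearly interpolated quartiles."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{med:g} ({q1:g}, {q3:g})"


print(n_pct(118, 812))  # 118 (15%)
print(n_pct(6, 812))  # 6 (0.7%)
print(median_iqr([1, 2, 3, 4, 5]))  # 3 (2, 4)
```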

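Table 5 provides descriptive statistics grouped by tenure length. Median tenure for the entire dataset is just under two years (22 months). Churn rates are highest among accounts with 6 to 12 months of tenure: 22 of the 27 accounts in that group (81%) churned, compared with 41% of accounts under 6 months, 12% at 12 to 24 months and 1.7% over 24 months. In absolute terms, most observed churn (74 of 110 churned accounts) falls in the 12 to 24 month group.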

Table 5: Summary Statistics by Tenure Length

Characteristic Overall, N = 788¹ < 6 Months, N = 29¹ 6-12 Months, N = 27¹ 12-24 Months, N = 611¹ Over 24 Months, N = 121¹
Churn 110 (14%) 12 (41%) 22 (81%) 74 (12%) 2 (1.7%)
Default 5 (0.6%) 0 (0%) 1 (3.7%) 3 (0.5%) 1 (0.8%)
Average Tenure 22 (20, 23) 4 (3, 5) 10 (8, 10) 22 (21, 22) 25 (24, 25)
Average kWh 6.2 (3.8, 9.6) 3.7 (1.9, 5.6) 6.4 (3.6, 12.6) 6.1 (3.8, 9.2) 7.3 (4.7, 10.8)
Payment Method
ach 239 (31%) 3 (14%) 8 (30%) 199 (33%) 29 (24%)
card 541 (69%) 18 (86%) 19 (70%) 412 (67%) 92 (76%)
Gender
Female 171 (48%) 7 (44%) 4 (33%) 140 (49%) 20 (49%)
Male 186 (52%) 9 (56%) 8 (67%) 148 (51%) 21 (51%)
Marital Status
Married 217 (60%) 9 (56%) 8 (67%) 174 (60%) 26 (63%)
Single 30 (8.4%) 2 (12%) 1 (8.3%) 23 (7.9%) 4 (9.8%)
Unknown 112 (31%) 5 (31%) 3 (25%) 93 (32%) 11 (27%)
Occupation
Healthcare/Education Services 24 (38%) 0 (0%) 0 (0%) 24 (47%) 0 (0%)
Management/Technical 6 (9.4%) 2 (67%) 3 (100%) 0 (0%) 1 (14%)
Self Employed/Other 20 (31%) 1 (33%) 0 (0%) 15 (29%) 4 (57%)
Services 14 (22%) 0 (0%) 0 (0%) 12 (24%) 2 (29%)
Education
Completed College 91 (25%) 4 (25%) 3 (25%) 75 (26%) 9 (22%)
Graduate Degree 111 (31%) 2 (12%) 5 (42%) 94 (32%) 10 (24%)
High School or Some College 143 (40%) 8 (50%) 4 (33%) 112 (39%) 19 (46%)
Other 14 (3.9%) 2 (12%) 0 (0%) 9 (3.1%) 3 (7.3%)
Renter
Unknown 338 (94%) 14 (88%) 11 (92%) 276 (95%) 37 (90%)
Yes 21 (5.8%) 2 (12%) 1 (8.3%) 14 (4.8%) 4 (9.8%)
Homeowner
Unknown 94 (26%) 6 (38%) 2 (17%) 74 (26%) 12 (29%)
Yes 265 (74%) 10 (62%) 10 (83%) 216 (74%) 29 (71%)
INCOME_INSIGHT_SCORE 122 (97, 208) 112 (78, 126) 110 (90, 232) 123 (97, 203) 146 (98, 235)
Average VantageScore 804 (780, 819) 784 (740, 798) 788 (751, 809) 805 (781, 820) 809 (790, 819)
VantageScore April 2022 806 (779, 821) 793 (758, 808) 788 (704, 800) 807 (779, 821) 808 (794, 821)
VantageScore December 2020 806 (779, 823) 781 (739, 800) 795 (778, 815) 806 (784, 824) 810 (785, 826)
VantageScore Change 0 (-11, 13) -3 (-9, 20) -4 (-29, 7) 0 (-13, 12) 2 (-5, 13)
¹ n (%); Median (IQR)

Primary Analysis

Model

To estimate the probability that an account will experience churn or default, logistic regression is applied to the available demographic and socioeconomic data. Separate models are fitted for churn and for default, as specified below:

\[\begin{equation} \begin{split} \log\left(\frac{P(Churn)}{1 - P(Churn)}\right) = {} & \beta_{0} + \beta_{1}Tenure + \beta_{2}kWh + \beta_{3}Gender + \beta_{4}\log(VantageScore) \\ & + \beta_{5}Occupation + \beta_{6}Homeowner + \beta_{7}\log(IncomeInsight) + \beta_{8}Marriage + \beta_{9}Education \end{split} \end{equation}\]

\[\begin{equation} \begin{split} \log\left(\frac{P(Default)}{1 - P(Default)}\right) = {} & \beta_{0} + \beta_{1}Tenure + \beta_{2}kWh + \beta_{3}Gender + \beta_{4}\log(VantageScore) \\ & + \beta_{5}Occupation + \beta_{6}Homeowner + \beta_{7}\log(IncomeInsight) + \beta_{8}Marriage + \beta_{9}Education \end{split} \end{equation}\]

Log transformations are applied to two continuous variables, VantageScore and the Income Insight Score. Several models are estimated, each with a different combination of input variables. Results for the churn models are shown in Table 6 and for the default models in Table 7.
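As a hedged illustration of what a logit fit computes, the sketch below estimates an intercept and a single slope on the log-odds scale by maximum likelihood, using Newton-Raphson. It is not the code behind Tables 6 and 7: the function name `fit_logit` and the toy tenure/churn values are hypothetical, and the sketch omits standard errors and convergence checks.

```python
import math


def fit_logit(x, y, iterations=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson.

    x: list of floats (e.g., tenure in months); y: list of 0/1 outcomes.
    Returns (b0, b1). Sketch only: fixed iteration count, no standard
    errors, and no handling of perfectly separated data.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score (gradient of the log likelihood) and information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, with a 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


tenure = [1, 2, 3, 4, 5, 6, 7, 8]
churn = [1, 1, 0, 1, 0, 1, 0, 0]
b0, b1 = fit_logit(tenure, churn)
print(b1 < 0)  # True: in this toy data, churn odds fall as tenure rises
```

When the outcomes are perfectly separated by the predictors, the likelihood has no finite maximum and the coefficients drift without bound, which is consistent with the enormous estimates and standard errors in the default models.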

Table 6: Churn Models

Dependent variable:
Churn
(1) (2) (3) (4) (5)
tenure -0.195*** -0.141*** -0.152*** -0.150*** -0.185***
(0.021) (0.023) (0.024) (0.024) (0.070)
kwh_solar -0.045 -0.008 -0.025
(0.032) (0.028) (0.034)
GENDERFemale -0.554* -0.019
(0.307) (0.658)
log_Vantage_Avg 0.164 0.069 -0.083
(0.180) (0.175) (0.267)
OCCUPATIONManagement/Technical 0.067
(1.397)
OCCUPATIONSelf Employed/Other 0.191
(0.809)
OCCUPATIONServices -0.657
(0.965)
HOMEOWNERYes 0.088 0.246 1.315
(0.334) (0.355) (0.907)
log_INCOME_INSIGHT_SCORE 0.045 0.068
(0.343) (0.340)
MARRIAGEMarried 0.883
(0.703)
MARRIAGEUnknown 0.767
(0.728)
EDUCATIONGraduate Degree 0.491 0.447
(0.382) (0.386)
EDUCATIONHigh School or Some College -0.201 -0.165
(0.399) (0.402)
EDUCATIONOther -1.261 -0.878
(1.166) (1.148)
Constant 1.214 0.776 0.745 0.126 2.068
(1.211) (1.208) (1.691) (1.761) (2.250)
Observations 547 375 373 373 67
Log Likelihood -200.005 -149.166 -145.738 -144.801 -31.512
Akaike Inf. Crit. 408.010 310.333 305.476 307.601 79.024
Note: *p<0.1; **p<0.05; ***p<0.01

Table 6 shows that tenure is statistically significant in every churn model, even when controlling for various combinations of demographic and socioeconomic variables. Longer tenure is associated with a lower probability of churn. For example, in Model 4, a one-month increase in tenure is associated with a 0.15 decrease in the log odds of churn. Exponentiating this coefficient gives an odds ratio of about 0.86 (exp(-0.150) ≈ 0.861): controlling for solar kWh, education, income and marital status, a one-month increase in tenure is associated with a 14% decrease in the odds of churning. The logit curve in the Discussion section illustrates this relationship.

Tenure is one of the few statistically significant results in the analysis; the only other significant variable is gender. In Model 2, female account holders are associated with a 43% decrease in the odds of churn (exp(-0.554) ≈ 0.57, significant at the 10% level), controlling for tenure, VantageScore and homeownership.

No other variables were associated with statistically significant effects on probability of churn.
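The conversion from a logit coefficient to a percent change in odds, used in the two paragraphs above, is a one-line calculation. The sketch below (hypothetical function name) applies it to the Model 4 tenure coefficient and the Model 2 GENDERFemale coefficient from Table 6; the `delta` parameter shows how a multi-month change compounds.

```python
import math


def odds_pct_change(coef, delta=1.0):
    """Percent change in the odds for a `delta`-unit increase in a predictor."""
    return 100 * (math.exp(coef * delta) - 1)


print(round(odds_pct_change(-0.150), 1))  # -13.9 (tenure, Model 4, one month)
print(round(odds_pct_change(-0.554), 1))  # -42.5 (GENDERFemale, Model 2)
print(round(odds_pct_change(-0.150, 12), 1))  # -83.5 (tenure, twelve months)
```

Because the effect is multiplicative on the odds, twelve extra months is not 12 × 14%; under the same Model 4 estimate it corresponds to roughly an 83% reduction in the odds.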

Table 7: Default Models

Dependent variable:
Default
(1) (2) (3) (4) (5)
tenure -0.104 0.012 -0.012 0.111 -0.000
(0.076) (0.209) (0.146) (0.218) (9,034.631)
kwh_solar -0.263 -0.257 -0.220
(0.258) (0.539) (0.449)
GENDERFemale -56.242 -0.000
(7,545.990) (92,400.530)
log_Vantage_Avg -0.133 -6.274 0.000
(0.476) (5.208) (41,042.950)
OCCUPATIONManagement/Technical -0.000
(198,057.400)
OCCUPATIONSelf Employed/Other -0.000
(112,181.000)
OCCUPATIONServices -0.000
(122,774.100)
HOMEOWNERYes 55.105 18.086 -0.000
(6,628.545) (6,529.975) (106,883.400)
log_INCOME_INSIGHT_SCORE -3.214 -5.023
(2.577) (4.802)
MARRIAGEMarried 20.566
(17,707.200)
MARRIAGEUnknown 0.942
(20,060.560)
EDUCATIONGraduate Degree 0.689 0.020
(9,597.644) (14,127.030)
EDUCATIONHigh School or Some College 17.956 18.805
(6,976.457) (10,230.370)
EDUCATIONOther 0.139 -0.253
(19,802.490) (27,799.450)
Constant -1.266 -17.418 -25.866 -22.131 -26.566
(3.233) (6,628.459) (9,555.711) (20,450.070) (328,904.900)
Observations 547 375 373 373 67
Log Likelihood -16.727 -5.079 -4.662 -4.161 -0.000
Akaike Inf. Crit. 41.454 22.157 23.324 26.322 16.000
Note: *p<0.1; **p<0.05; ***p<0.01

Secondary Analysis

The secondary analysis examines whether participants’ credit scores (VantageScore) changed measurably during enrollment.

The graph below shows boxplots of each VantageScore collected, at December 2020 and April 2022.

The histogram below plots the distribution of scores in each time period, on both the original and the log scale.

Using Welch’s t-test, we compare the mean VantageScore in the two periods. As the output below shows, the difference in means (781.0 in April 2022 versus 786.3 in December 2020) is not statistically significant (t = -0.93, df ≈ 1052, p = 0.355); therefore we cannot claim that the credit scores of solar farm participants changed.

## 
##  Welch Two Sample t-test
## 
## data:  value by variable
## t = -0.92527, df = 1051.9, p-value = 0.355
## alternative hypothesis: true difference in means between group April 2022 and group December 2020 is not equal to 0
## 95 percent confidence interval:
##  -16.276750   5.845305
## sample estimates:
##    mean in group April 2022 mean in group December 2020 
##                    781.0366                    786.2523
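The Welch statistic above can be sketched from two samples with the standard library. The function names are hypothetical, and the p-value uses a normal approximation rather than the t distribution, which is a close match at roughly 1,052 degrees of freedom. Welch’s test treats the two snapshots as independent samples; because most participants appear in both, a paired comparison of Vantage_Diff would be a natural complement.

```python
import math
import statistics


def welch_t_test(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value under a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


t, df = welch_t_test([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))  # -2.251 5.52
print(round(two_sided_p_normal(-0.92527), 3))  # 0.355
```

The toy samples only exercise the formula; the last line plugs in the reported t statistic and recovers the reported p-value.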

Discussion

From the primary analysis, we saw that longer tenure was consistently associated with a lower probability of churning. This held across all five models, suggesting that, after controlling for demographic and socioeconomic variables such as education and income, longer tenure corresponds to higher retention in solar farms. This relationship is visualized in Graph A below.

Second, female account holders were less likely to churn than male account holders in Model 2, controlling for homeownership status and VantageScore. This effect is significant only at the 10% level and is not significant in Model 5.

Finally, no other variables had a statistically significant effect on the likelihood of churn. Notably, this includes income and VantageScore: different levels of income or credit score were not associated with statistically different rates of churn. Further research on low- and moderate-income (LMI) retention in solar farms is needed, but in our data we did not see a significant difference in the probability of churn by either income or credit score.

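In the default models, as expected, the very small number of default events (six across all 812 accounts) does not allow any significant effects to be identified. The extremely large standard errors in Table 7 (for example, 7,545.990 on GENDERFemale) and the zero log likelihood of Model 5 are symptoms of (quasi-)complete separation, so these estimates should not be interpreted.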

The logit curves below plot the probability of churning against tenure and against days late. The first curve shows that the likelihood of churn falls as tenure increases; the second shows that days of late payment are generally not associated with the likelihood of churn.

Logit Curves
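A logit curve of this kind is obtained by passing the linear predictor through the inverse-logit function. The sketch below assumes a single-predictor model with hypothetical coefficients chosen for illustration only (they are not taken from Table 6); with the full models, the other covariates would have to be held at chosen values.

```python
import math


def churn_probability(tenure, intercept, slope):
    """Inverse logit: P(churn) = 1 / (1 + exp(-(intercept + slope * tenure)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * tenure)))


# Hypothetical coefficients for illustration only (not taken from Table 6).
B0, B1 = 1.0, -0.15
curve = [(months, churn_probability(months, B0, B1)) for months in range(0, 31, 6)]
for months, prob in curve:
    print(months, round(prob, 3))
```

With a negative slope, the predicted probability declines monotonically with tenure, which is the shape described for the first curve.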

The goal of the secondary analysis was to determine whether enrolled customers’ VantageScores changed significantly between December 2020 and April 2022. No statistically significant change in credit scores was observed over this period of enrollment in the solar farms.