Introduction

  • Employees Federal Credit Union (EFCU) is the credit union for a Fortune 500 firm.
  • This case illustrates:
    • The entire research process
    • How a presentation can be prepared with enhanced reproducibility
  • This case can be used for
    • Introductory marketing research course (R1 - R6)
    • Upper level marketing research course (H1 - H5)

Problem Definition

  • The CU has a large amount of surplus funds.
  • It has also experienced a lower loan/share ratio than other credit unions of similar size.
  • Average earnings on its investments have declined, and profit margins are squeezed.

Research Objectives

  1. To determine why its members are not borrowing money from the credit union.
  2. To determine what the members’ attitudes are toward the overall management and operations of the credit union.

Significance of Research

  • The findings from the research will provide critical information that management can use to increase the loan/share ratio and the profit margin.
  • Thus, the research is critical in turning around the credit union’s struggling financial status.

Background and Research Questions/Hypotheses

Information gathering

  • To address the research objectives, a research firm was hired.
  • The researchers from the consulting company interviewed the management and some customers.
  • Based on the qualitative research and the researchers’ own experience, they came up with some research questions and hypotheses.

Reasons for Consumers’ Use of Credit Union

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the research question.
  • You are labeling it “research question” because you do not have enough evidence to state the relationship with confidence.

R1: Why do people join the Credit Union?

Using Services from Other Financial Institutions

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the research question.
  • You are labeling it “research question” because you do not have enough evidence to state the relationship with confidence.

R2: Why do members use other financial institutions when they need to borrow funds?

Attitudes and Beliefs about Employee Proficiency

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the research question.
  • You are labeling it “research question” because you do not have enough evidence to state the relationship with confidence.

R3: What are the members’ attitudes and beliefs about the proficiencies of credit union employees?

Differences in Headquarters vs. Elsewhere

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the research question.
  • You are labeling it “research question” because you do not have enough evidence to state the relationship with confidence.

R4: Are there any perceived differences on attitude towards employee proficiencies, awareness of services, and operational effectiveness between members who live in the area of the firm’s headquarters and members who live elsewhere?

Awareness of Services

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the research question.
  • You are labeling it “research question” because you do not have enough evidence to state the relationship with confidence.

R5: How aware are the members of the services offered by the credit union?

CU’s Operational Effectiveness

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the research question.
  • You are labeling it “research question” because you do not have enough evidence to state the relationship with confidence.

R6: What are the members’ attitudes and beliefs about how effectively the credit union is operated?

Adequacy of Financial Services

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the hypothesis.
  • You are labeling it “Hypothesis” when you have enough evidence to state the relationship with confidence.

H1: Members’ awareness of the regular share accounts will influence their opinions on the adequacy of financial services in meeting members’ needs.

CU Members’ Opinion on CU’s Loan Rate

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the hypothesis.
  • You are labeling it “Hypothesis” when you have enough evidence to state the relationship with confidence.

H2: Members’ opinion of the CU’s loan rate (Q8r) will not differ from the degree to which they agree that the CU’s loan rate is lower than competitors’ (Q30).

Impact of Current Loan Use and Residence Location on Financial Services Adequacy

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the hypothesis.
  • You are labeling it “Hypothesis” when you have enough evidence to state the relationship with confidence.

H3a: There will be an interaction effect between current loan use and location of residence on the perceived adequacy of current financial services: in the headquarters area, members who have a loan will agree more than those who do not that financial services meet members’ needs, whereas outside the headquarters area, loan status will not influence the perceived adequacy of financial services in meeting member needs.

H3b: Test H3a controlling for the impact of attitudes toward employee proficiency.

Loan Status and CU Loan Rate

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the hypothesis.
  • You are labeling it “Hypothesis” when you have enough evidence to state the relationship with confidence.

H4: Compared with members who currently have a loan with the Credit Union, members who do not will feel that the CU charges higher rates and will agree less with the statement that the CU’s loan rates are lower than those offered by other institutions.

Factors Affecting Attitudes towards Credit Union Operational Efficiency

  • What to write on this sub-section?
    • You can use information you learned from academic research
    • You may use white paper available online
    • You can use information you obtained from managers
  • Weave the information from various sources to form a logic to support the relationships among the variables in the hypothesis.
  • You are labeling it “Hypothesis” when you have enough evidence to state the relationship with confidence.

H5a: Members’ belief that the CU keeps personal financial information confidential will positively influence the overall attitude toward the credit union’s operational efficiency.

H5b: Loan application processing promptness will positively influence the overall attitude toward credit union’s operational efficiency.

H5c: Member satisfaction with level of financial services in meeting needs will positively influence the overall attitude toward credit union’s operational efficiency.

H5d: Loan application simplicity and ease will positively influence the overall attitude toward credit union’s operational efficiency.

Methods

Research Design

  • A mail survey was used due to the following reasons:
    • CU membership is widely dispersed geographically
    • Low cost
    • Sensitive questions need anonymity to induce candid responses
    • The long time needed for a mail survey is not an issue for the board
  • Questions were mostly structured, with several key questions asked in an unstructured format.
  • Likert-type scales were used for attitudes measurements.

Population and Sampling

  • Population: All current members of the EFCU.
  • Sampling frame:
    • A list of members and their addresses is available.
    • 3,531 members as of January 31 for trial balance listing of the EFCU membership
  • Sampling method: Simple random sampling
    • Sample size: 300
      • Calculation: done using an estimated population standard deviation based on the responses of 15 members to Q37.
      • 300 random numbers were generated within the range of 1 to 3,531 members.
      • Each random number was matched with the corresponding number in the sampling frame.
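The draw described above can be sketched in R as follows. This is only an illustration, not the researchers' actual code: the seed and the object names (`frame_size`, `sample_ids`) are assumptions, and the sketch draws without replacement, which the source does not state explicitly.

```r
# Simple random sampling sketch (seed and object names are hypothetical)
frame_size  <- 3531   # members in the sampling frame
sample_size <- 300    # sample size stated in the text

set.seed(31)          # assumed seed, used only to make the draw repeatable

# Draw 300 distinct positions between 1 and 3,531
sample_ids <- sort(sample.int(frame_size, size = sample_size))

# Each ID would then be matched to the member at that position in the list
head(sample_ids)
```

Drawing without replacement guarantees that no member receives two questionnaires.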

Data Collection and coding

  • Preparation and mailing of all copies of the survey were handled by the CU’s staff members.
  • The structured questions in the survey were coded based on the classifications established by the researchers.
  • Questionnaire responses were examined for integrity, and a few low-quality responses were excluded.

Data Preparation

  • Used R/RStudio for wrangling and visualization
  • Data wrangling and preparation were done separately. Specifically, we did the following:
    • Reverse coded selected items (stored with an r suffix, e.g., q7r)
    • Removed non-responses (NA)
    • Removed Not Applicable responses from otherwise Likert-type interval scales to make them suitable for continuous variables (e.g., 1-5 point scale)
    • Renamed variable names
    • Relabeled variable labels and value labels if needed.
    • The prepared data is saved as Case 31 Data_efcu_Prepped.sav, which will be used for data analysis.
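As an illustration of the reverse-coding and "No Opinion" steps, the sketch below recodes a toy vector that follows the q7 coding listed in the variable table (1 = Very High ... 5 = Very Low, 6 = No Opinion) into the q7r coding (1 = Very Low ... 5 = Very High). The toy responses are invented, and the original preparation script may have used different functions.

```r
# Toy responses on the original q7 coding (invented for illustration)
q7 <- c(1, 3, 6, 2, 5, NA, 4)

# Treat "No Opinion" (6) as missing so the item behaves as a 1-5 scale
q7_valid <- ifelse(q7 == 6, NA, q7)

# Flip the scale so that a higher number means a higher perceived rate
q7r <- 6 - q7_valid

data.frame(q7, q7r)
```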

Variables in the data

  • Number of variables: 54
Data frame: case31
ID | Name | Label | Values and Value Labels
1 | q3 | Employees are Courteous | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
2 | q4 | Employees are helpful | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
3 | q5 | Employees are professional | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
4 | q6 | Employees are available | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
5 | q7 | savings rates | 1 = Very High; 2 = High; 3 = Average; 4 = Low; 5 = Very Low; 6 = No Opinion
6 | q8 | statement frequency | 1 = Very High; 2 = High; 3 = Average; 4 = Low; 5 = Very Low; 6 = No Opinion
7 | q9 | statement frequency | 1 = Too Often; 2 = Very Often; 3 = About Right; 4 = Not Often Enough; 5 = Never
8 | q10 | statement accuracy | 1 = Excellent; 2 = Good; 3 = Fair; 4 = Poor
9 | q11 | statement understandable | 1 = Yes; 2 = No
10 | q12 | Account Confidential | 1 = Yes; 2 = No
11 | q13 | regular share account | 1 = Aware and Have Used; 2 = Aware but Have Not Used
12 | q14 | special subaccounts | 1 = Aware and Have Used; 2 = Aware but Have Not Used
13 | q15 | Christmas club account | 1 = Aware and Have Used; 2 = Aware but Have Not Used
14 | q16 | IRA | 1 = Aware and Have Used; 2 = Aware but Have Not Used
15 | q17 | Master credit card | 1 = Aware and Have Used; 2 = Aware but Have Not Used
16 | q18 | signature loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
17 | q19 | new car loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
18 | q20 | late model car loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
19 | q21 | older model car loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
20 | q22 | household goods loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
21 | q23 | recreational loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
22 | q24 | share collateralized loan | 1 = Aware and Have Used; 2 = Aware but Have Not Used
23 | q25 | IRA loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
24 | q26 | line of credit loans | 1 = Aware and Have Used; 2 = Aware but Have Not Used
25 | q27 | currently have loan | 1 = Yes; 2 = No
26 | q30 | CU rates lower | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
27 | q31 | CU confidential | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
28 | q32 | CU is prompt | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
29 | q33 | CU meets needs of members | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
30 | q34 | CU loan applications simple | 1 = Strongly Disagree; 2 = Disagree; 3 = Uncertain; 4 = Agree; 5 = Strongly Agree
31 | q37 | overall rating | 1 = Excellent; 2 = Good; 3 = Average; 4 = Poor; 5 = Very Poor; 6 = No Opinion
32 | overall_employee_performace | | range: 2.8-5.0
33 | q38 | headquarter | 1 = yes; 2 = no
34 | employeeperformance | | range: 2.8-5.0
35 | at_emp | | range: 2.8-5.0
36 | q7r | Savings rates | 1 = Very Low; 2 = Low; 3 = Average; 4 = High; 5 = Very High
37 | q8r | Fund borrowing rates | 1 = Very Low; 2 = Low; 3 = Average; 4 = High; 5 = Very High
38 | q9r | Statement Frequency | 1 = Never; 2 = Not Often Enough; 3 = About Right; 4 = Very Often; 5 = Too Often
39 | q10r | Statement Accuracy | 1 = Poor; 2 = Fair; 3 = Good; 4 = Excellent
40 | q13r | Regular share account | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
41 | q14r | Special subaccounts | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
42 | q15r | Christmas club account | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
43 | q16r | IRA | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
44 | q17r | Master credit card | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
45 | q18r | Signature loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
46 | q19r | New car loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
47 | q20r | Late model car loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
48 | q21r | Older model car loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
49 | q22r | Household goods loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
50 | q23r | Recreational loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
51 | q24r | Share collateralized loan | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
52 | q25r | IRA loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
53 | q26r | Line of credit loans | 1 = Unaware; 2 = Aware but Have Not Used; 3 = Aware and Have Used
54 | q37r | | 1 = Very Poor; 2 = Poor; 3 = Average; 4 = Good; 5 = Excellent

Sample Characteristics

  • Number of participants: 128
  • % of the participants who live in the headquarter area vs. non-headquarter area: 44.5% vs. 55.5%.
  • No demographic variables are available in the survey data, including:
    • Gender
    • Ethnicity/Race
    • Income
    • Education
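The percentages above follow from simple counts. The sketch below reproduces them, assuming the group sizes of 57 (headquarters area) and 71 (elsewhere) that appear in the later summary tables, and treating the 300 sampled members as the denominator for a usable response rate; that rate is derived here and is not reported in the source.

```r
n_sampled   <- 300  # members drawn into the sample
n_usable    <- 128  # participants retained for analysis
n_hq        <- 57   # live in the headquarters area
n_elsewhere <- 71   # live elsewhere

# Usable response rate, in percent
response_rate <- round(100 * n_usable / n_sampled, 1)

# Share of participants by residence, in percent
residence_pct <- round(100 * c(hq = n_hq, elsewhere = n_elsewhere) / n_usable, 1)

response_rate
residence_pct
```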

Measures

  • Refer to the Construct and Measurement Table.
  • All constructs, operationalization, and scales will be provided later.

Analysis and Results

  • Used R for wrangling and modeling

R1: Overview - One-Sample t-test

R1: Why do people join the Credit Union?

  • Constructs and roles: reasons for why people join the Credit Union
  • Operationalization: q7r (Savings rates)
  • Scales:
    • 1-5 point, Likert-type scale
  • Statistics to be used:
    • Descriptive statistics, followed by one sample t-test

Descriptive Statistics

Frequency Distribution

Savings rates n percent
Very Low 1 0.0078125
Low 12 0.0937500
Average 83 0.6484375
High 22 0.1718750
NA 10 0.0781250
Figure 1: Response Frequency of Savings Rate (q7r)

Descriptive Stat

vars n mean sd median trimmed mad min max range skew kurtosis se
1 118 3.067797 0.5658186 3 3.067797 0 1 4 3 -0.2645695 1.175269 0.05208782
Figure 2: Descriptive Statistics for Savings Rate (Q7r)

Figure 3: Histogram for Savings Rate (Q7r)

One-sample t-test

  • \(H_0: \mu = 3\)
  • \(H_A: \mu \neq 3\)
# Sample mean of q7r, rounded to two decimals
mean_q7r <- case31 |> 
  summarise(mean = mean(q7r, na.rm = TRUE)) |> 
  round(2)

# One-sample t-test of H0: mu = 3 ("Average")
case31 |> 
  t.test(q7r ~ 1, mu = 3, data = _)

    One Sample t-test

data:  q7r
t = 1.3016, df = 117, p-value = 0.1956
alternative hypothesis: true mean is not equal to 3
95 percent confidence interval:
 2.964639 3.170954
sample estimates:
mean of x 
 3.067797 
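To see where the test statistic comes from, the sketch below recomputes it from the reported descriptive statistics using \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\). It uses only the summary numbers shown above, not the raw data, so small rounding differences are possible; the object names are illustrative.

```r
# Summary statistics for q7r as reported in the descriptive output
x_bar <- 3.067797   # sample mean
s     <- 0.5658186  # sample standard deviation
n     <- 118        # non-missing responses
mu0   <- 3          # hypothesized mean ("Average")

se     <- s / sqrt(n)                        # standard error of the mean
t_stat <- (x_bar - mu0) / se                 # one-sample t statistic
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

round(c(se = se, t = t_stat, p = p_val), 4)
```

The standard error matches the `se` column of the descriptive table, and the statistic and p-value agree with the `t.test()` output to the precision shown.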

Insights about R1

  • The majority of the survey participants (65%) think that the savings rate the CU offers is average.
  • The sample mean is 3.07, which is not statistically different from 3 (“average”) (p > .10).
  • Thus, we can conclude that CU members perceive that the savings rate CU applies to them is about average.

R2: Overview

R2: Why do members use other financial institutions when they need to borrow funds?

  • Constructs and roles: Reasons to use loans from other financial institutions
  • Operationalization:
    • q8r (Fund borrowing rates)
    • q30 (CU rates lower)
  • Scales:
    • 1-5 point, Likert-type scales
  • Statistics to be used:
    • Descriptive statistics, followed by one sample t-test

Descriptive Statistics

One-sample t-test

R3: Overview - One-sample t-test

R3: What are the members’ attitudes and beliefs about the proficiency of credit union employees?

  • Constructs and roles: Attitudes toward employee proficiency
  • Operationalization:
    • q3, q4, q5, q6 –> at_emp
  • Scale: interval scale: 1-5 point, Likert-type scale
  • Statistics:
    • Descriptive statistics
    • Reliability/factor analysis
    • one-sample t-test

Descriptive Statistics

Means & SD

describe(case31$at_emp)
   vars   n mean   sd median trimmed  mad  min max range  skew kurtosis   se
X1    1 128 4.21 0.59      4    4.25 0.74 2.75   5  2.25 -0.32    -0.69 0.05

Histogram

Figure 4: Histogram of Employee Proficiencies (at_emp) with a Mean line

Reliability/factor analysis

  • We already examined reliability and unidimensionality in the data preparation stage.
  • The four items are highly representative of the latent construct.
  • Therefore, we created a composite index (at_emp) and used it in the analysis.
One-sample t-test

  • \(H_0: \mu = 4\)
  • \(H_A: \mu \neq 4\)
case31 |> 
  t.test(at_emp ~ 1, mu = 4, data = _)

    One Sample t-test

data:  at_emp
t = 3.9063, df = 127, p-value = 0.0001515
alternative hypothesis: true mean is not equal to 4
95 percent confidence interval:
 4.101192 4.308965
sample estimates:
mean of x 
 4.205078 

Summary insights about R3

Participants’ perceived employee proficiency (M = 4.21 out of 5.0) is statistically significantly greater than 4.0 ("Agree") (p < .001).

R4 Overview

R4: Are there any perceived differences on (a) attitude towards employee proficiency, (b) awareness of services, and (c) operational effectiveness between members who live in the area of the firm’s headquarters and members who live elsewhere?

  • R4a: Two Samples t-Test
  • R4b: Two-way cross-tabs, followed by Chi-square independence test
  • R4c: Two tests possible depending on variables to use.
    • 2-groups t-test
    • Chi-square independence test

R4a Analysis Overview (Two samples t-test)

R4a: Are there any perceived differences on attitude towards employee proficiency between members who live in the area of the firm’s headquarters and members who live elsewhere?

  • Constructs and roles:
    • Member residence: IV
    • Attitudes toward employee proficiency: DV
  • Operationalization:
    • q38
    • at_emp
  • Scale:
    • Nominal scale (q38)
    • Interval scale (at_emp): 1-5 point, Likert-type scale
  • Statistics:
    • Descriptive statistics, followed by independent samples t-test

Descriptives

Boxplot

Figure 5: Attitudes toward employee proficiency by residence

Barplot

Figure 6: Attitudes toward employee proficiency by residence

Two independent samples t-test

library(car)

case31 |> 
  mutate(q38 = haven::as_factor(q38)) |> 
  leveneTest(at_emp ~ q38, data = _)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   1   0.041 0.8399
      126               
case31 |> 
  mutate(q38 = haven::as_factor(q38)) |> 
  var.test(at_emp ~ q38, data = _)

    F test to compare two variances

data:  at_emp by q38
F = 0.89996, num df = 56, denom df = 70, p-value = 0.6864
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.548864 1.496643
sample estimates:
ratio of variances 
         0.8999608 
# Two-sample t-test assuming equal variances (supported by the Levene and F tests above)
case31 |> 
  t.test(at_emp ~ q38, data = _, var.equal = TRUE) 

    Two Sample t-test

data:  at_emp by q38
t = 1.2941, df = 126, p-value = 0.198
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.0721542  0.3448253
sample estimates:
mean in group 1 mean in group 2 
       4.280702        4.144366 
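The pooled-variance statistic can be approximated by hand from group summaries. The sketch below uses the group means from the output above and the group sizes and standard deviations reported in Table 1 (57 and 71; 0.57 and 0.61). Because those SDs are rounded to two decimals, the result only approximates the reported t = 1.2941; the object names are illustrative.

```r
# Group summaries: 1 = headquarters area, 2 = elsewhere
m1 <- 4.280702; n1 <- 57; s1 <- 0.57   # SD rounded, from Table 1
m2 <- 4.144366; n2 <- 71; s2 <- 0.61   # SD rounded, from Table 1

df      <- n1 + n2 - 2
sp2     <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df   # pooled variance
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))              # SE of the mean difference
t_stat  <- (m1 - m2) / se_diff
p_val   <- 2 * pt(-abs(t_stat), df = df)

round(c(diff = m1 - m2, t = t_stat, df = df, p = p_val), 3)
```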

Summary Table

case31 |> 
  mutate(q38 = haven::as_factor(q38)) |> 
  tbl_summary(
    include = c(q38, at_emp),
    by = q38,
    type = at_emp ~ "continuous2",
    statistic = list(all_continuous() ~ c("{mean} ({sd})",
                                          "{min}, {max}"))
  ) |> 
  add_difference() |> 
  add_overall() |> 
  add_n() |> 
  modify_caption("**Test of Differences in Attitudes toward Employee Proficiency between customers who live in Headquarter area and those who live in non-headquarter area**") |> 
  modify_spanning_header(
    c(stat_1, stat_2) ~ "**Live in Headquarter Area?**") 
Table 1: Do Customers Differ in their Attitudes toward Employee Proficiency depending on where they live?
Test of Differences in Attitudes toward Employee Proficiency between customers who live in Headquarter area and those who live in non-headquarter area
Characteristic | N | Overall (N = 128) | Live in Headquarter Area? yes (N = 57) | Live in Headquarter Area? no (N = 71) | Difference [1] | 95% CI [1, 2] | p-value [1]
at_emp | 128 | | | | 0.14 | -0.07, 0.34 | 0.2
    Mean (SD) | | 4.21 (0.59) | 4.28 (0.57) | 4.14 (0.61) | | |
    Min, Max | | 2.75, 5.00 | 2.75, 5.00 | 3.00, 5.00 | | |
[1] Welch Two Sample t-test
[2] CI = Confidence Interval

Summary insights about R4a

  • The mean employee proficiency ratings of customers who live in the HQ area (M = 4.28) and those who live elsewhere (M = 4.14) are not statistically different (p > 0.10).

  • Thus, we conclude that CU members’ perception of CU employee proficiency does not differ by whether they live in the headquarters area.

R4b Analysis Overview (\(\chi^2\) Independence Test)

R4b: Are there any perceived differences on awareness of services between members who live in the area of the firm’s headquarters and members who live elsewhere?

  • Constructs and roles:
    • Member residence: IV
    • service awareness: DV
  • Operationalization:
    • q38
    • q13r - q26r
  • Scale:
    • Nominal scale
    • Ordinal scale
  • Statistics:
    • Two-way cross-tabs, followed by Chi-square independence test

q13r (Share account) by q38

case31 <- case31 |> 
  mutate(q13r = haven::as_factor(q13r),
         q38 = haven::as_factor(q38))
table(case31$q13r, case31$q38)
                         
                          yes no
  Unaware                  34 49
  Aware but Have Not Used  16 16
  Aware and Have Used       7  6
# cross-tab: %
table(case31$q13r, case31$q38) |> 
  prop.table(margin = 2) * 100 
                         
                                yes        no
  Unaware                 59.649123 69.014085
  Aware but Have Not Used 28.070175 22.535211
  Aware and Have Used     12.280702  8.450704
Figure 7: Association of the awareness of regular share account by residence
  • \(H_0\): Awareness of regular share account is independent of members’ residence
  • \(H_A\): Awareness of regular share account is dependent on members’ residence
table(case31$q13r, case31$q38) |>
  chisq.test() 

    Pearson's Chi-squared test

data:  table(case31$q13r, case31$q38)
X-squared = 1.2717, df = 2, p-value = 0.5295
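The statistic reported by `chisq.test()` can be rebuilt from the cross-tab: expected counts are (row total × column total) / n, and \(\chi^2 = \sum (O - E)^2 / E\). The sketch below re-enters the q13r counts shown above by hand; the object names are illustrative.

```r
# Observed counts for q13r by q38, copied from the cross-tab above
observed <- matrix(
  c(34, 16, 7,    # yes (headquarters area)
    49, 16, 6),   # no  (elsewhere)
  nrow = 3,
  dimnames = list(
    awareness = c("Unaware", "Aware but Have Not Used", "Aware and Have Used"),
    hq        = c("yes", "no")
  )
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_val  <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(chi_sq = chi_sq, df = df, p = p_val), 4)
```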

Insights

  • Members’ awareness of regular share account is not associated with the place of members’ residence.
  • In other words, members’ awareness of regular share account is independent of where they live.

q14r (Special subaccounts) by q38

case31 <- case31 |> 
  mutate(q14r = haven::as_factor(q14r))
# cross-tabulation: count

table(case31$q14r, case31$q38)
                         
                          yes no
  Unaware                  22 24
  Aware but Have Not Used  31 39
  Aware and Have Used       4  8
# cross-tab: %
table(case31$q14r, case31$q38) |> 
  prop.table(margin = 2) * 100 
                         
                                yes        no
  Unaware                 38.596491 33.802817
  Aware but Have Not Used 54.385965 54.929577
  Aware and Have Used      7.017544 11.267606

table(case31$q14r, case31$q38) |>
  chisq.test() 

    Pearson's Chi-squared test

data:  table(case31$q14r, case31$q38)
X-squared = 0.81305, df = 2, p-value = 0.666

q15r by q38 with gtsummary table

  • q15r (Christmas club account)
case31 <- case31 |> 
  mutate(q15r = haven::as_factor(q15r))

case31 |> 
  tbl_summary(
    include = c(q15r, q38),
    by = q38
  ) |> 
  add_p() |> 
  modify_spanning_header(all_stat_cols() ~ "**Live in HQ Area?**") |> 
  add_n() |> 
  italicize_labels() |> 
  #show_header_names()
  modify_header(label ~ "**Awareness**") |> 
  as_gt() |> 
  tab_header(
    title = md("**Chi-Square Test of Independence**"),
    subtitle = md("*Christmas Club Account Awareness by Where Customers Live*")
  )
Table 2: Christmas Club Account Awareness by Location of Residence
Chi-Square Test of Independence
Christmas Club Account Awareness by Where Customers Live
Awareness | N | Live in HQ Area? yes (N = 57) [1] | Live in HQ Area? no (N = 71) [1] | p-value [2]
Christmas club account | 128 | | | 0.9
    Unaware | | 17 (30%) | 25 (35%) |
    Aware but Have Not Used | | 35 (61%) | 40 (56%) |
    Aware and Have Used | | 5 (8.8%) | 6 (8.5%) |
[1] n (%)
[2] Fisher’s exact test
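Note that the table footnote reports Fisher's exact test rather than a chi-square test. A likely reason, stated here as an assumption about the default behaviour of `add_p()`, is that one expected cell count falls below 5. The sketch below checks the expected counts for the q15r cross-tab and runs both tests in base R on the counts shown in Table 2.

```r
# Observed counts for q15r by q38, copied from Table 2
observed <- matrix(
  c(17, 35, 5,    # yes (headquarters area)
    25, 40, 6),   # no  (elsewhere)
  nrow = 3,
  dimnames = list(
    awareness = c("Unaware", "Aware but Have Not Used", "Aware and Have Used"),
    hq        = c("yes", "no")
  )
)

# Expected counts under independence; small cells weaken the chi-square approximation
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
min_expected <- min(expected)

# Exact test, plus the chi-square test for comparison (warning suppressed on purpose)
fisher_p <- fisher.test(observed)$p.value
chisq_p  <- suppressWarnings(chisq.test(observed))$p.value

round(c(min_expected = min_expected, fisher_p = fisher_p, chisq_p = chisq_p), 3)
```

Both tests lead to the same conclusion here: awareness of the Christmas club account does not appear to depend on residence.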

Further cleaning

case31_clean <- case31 |> 
  mutate(q13r = haven::as_factor(q13r),
         q14r = haven::as_factor(q14r),
         q15r = haven::as_factor(q15r),
         q16r = haven::as_factor(q16r),
         q17r = haven::as_factor(q17r),
         q18r = haven::as_factor(q18r),
         q19r = haven::as_factor(q19r),
         q20r = haven::as_factor(q20r),
         q21r = haven::as_factor(q21r),
         q22r = haven::as_factor(q22r),
         q23r = haven::as_factor(q23r),
         q24r = haven::as_factor(q24r),
         q25r = haven::as_factor(q25r),
         q26r = haven::as_factor(q26r)
         )

Creating a function for chi-square independence test

chisq_ind_test <- function(data, var) {
  
  library(dplyr)
  library(rlang)
  library(glue)
  
  var <- enquo(var)  # Capture the column as a quosure
  
  # Remove NA's
  data <- data |> 
    filter(!is.na(!!var), !is.na(q38))
  
  # Remove unused levels to avoid errors in chi-square test
  data <- data |> 
  mutate(!!var := droplevels(!!var)) # dynamic column name
  
  # Cross-tab
  tab <- table(pull(data, !!var), data$q38)

  # Cross-tab: %
  prop_tab <- prop.table(tab, margin = 2) * 100

  # Mosaic Plot
  tab_c <- table(data$q38, pull(data, !!var))
  
  mosaicplot(
    tab_c,
    main = glue::glue("Awareness of {quo_name(var)} by Residence"),
    color = TRUE, las = 1
  )

  # Chi-square test of independence
  chi_sq_result <- chisq.test(tab)

  # Return results as a list
  list(
    crosstab = tab,
    proportions = prop_tab,
    chi_square = chi_sq_result
  )
}

Testing all

awareness <- vars(q13r, q14r, q15r, q16r, q17r, q18r, q19r, q20r, q21r, q22r, q23r, q24r, q25r, q26r) # Quote the variables

awareness |> 
  map(~ chisq_ind_test(case31_clean, !!.x))
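Because the call above prints one long list per variable, it can help to condense the tests into a single table. The sketch below does this in base R for three of the cross-tabs, with counts re-entered by hand from the printed output; the object names are illustrative, and with the list returned by `chisq_ind_test()` the same values could instead be read from each element's `$chi_square` component.

```r
# Cross-tabs copied from the printed output (columns: yes, no)
tabs <- list(
  q13r = matrix(c(34, 16, 7, 49, 16, 6), nrow = 3),
  q14r = matrix(c(22, 31, 4, 24, 39, 8), nrow = 3),
  q15r = matrix(c(17, 35, 5, 25, 40, 6), nrow = 3)
)

# One row per variable: chi-square statistic, degrees of freedom, p-value
results <- do.call(rbind, lapply(names(tabs), function(v) {
  test <- suppressWarnings(chisq.test(tabs[[v]]))  # small-cell warning silenced
  data.frame(
    variable  = v,
    statistic = unname(test$statistic),
    df        = unname(test$parameter),
    p_value   = test$p.value
  )
}))

results
```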

Testing q26r by q38

case31_clean |> 
  chisq_ind_test(q26r)
$crosstab
                         
                          yes no
  Aware but Have Not Used  37 43
  Aware and Have Used      20 28

$proportions
                         
                               yes       no
  Aware but Have Not Used 64.91228 60.56338
  Aware and Have Used     35.08772 39.43662

$chi_square

    Pearson's Chi-squared test with Yates' continuity correction

data:  tab
X-squared = 0.10332, df = 1, p-value = 0.7479
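After unused levels are dropped, q26r has only two categories, so the table is 2 × 2 and `chisq.test()` applies Yates' continuity correction by default. The sketch below re-enters the counts shown above and compares the corrected and uncorrected statistics; the object names are illustrative.

```r
# 2 x 2 cross-tab for q26r by q38, copied from the output above
tab <- matrix(
  c(37, 20,    # yes (headquarters area)
    43, 28),   # no  (elsewhere)
  nrow = 2,
  dimnames = list(
    awareness = c("Aware but Have Not Used", "Aware and Have Used"),
    hq        = c("yes", "no")
  )
)

with_yates    <- chisq.test(tab)                   # default for 2 x 2 tables
without_yates <- chisq.test(tab, correct = FALSE)  # plain Pearson statistic

round(c(
  yates_stat   = unname(with_yates$statistic),
  yates_p      = with_yates$p.value,
  pearson_stat = unname(without_yates$statistic),
  pearson_p    = without_yates$p.value
), 4)
```

The correction shrinks the statistic, so the corrected test is the more conservative of the two; neither version suggests an association here.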

[[1]]
[[1]]$crosstab
                         
                          yes no
  Unaware                  34 49
  Aware but Have Not Used  16 16
  Aware and Have Used       7  6

[[1]]$proportions
                         
                                yes        no
  Unaware                 59.649123 69.014085
  Aware but Have Not Used 28.070175 22.535211
  Aware and Have Used     12.280702  8.450704

[[1]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 1.2717, df = 2, p-value = 0.5295



[[2]]
[[2]]$crosstab
                         
                          yes no
  Unaware                  22 24
  Aware but Have Not Used  31 39
  Aware and Have Used       4  8

[[2]]$proportions
                         
                                yes        no
  Unaware                 38.596491 33.802817
  Aware but Have Not Used 54.385965 54.929577
  Aware and Have Used      7.017544 11.267606

[[2]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 0.81305, df = 2, p-value = 0.666



[[3]]
[[3]]$crosstab
                         
                          yes no
  Unaware                  17 25
  Aware but Have Not Used  35 40
  Aware and Have Used       5  6

[[3]]$proportions
                         
                                yes        no
  Unaware                 29.824561 35.211268
  Aware but Have Not Used 61.403509 56.338028
  Aware and Have Used      8.771930  8.450704

[[3]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 0.42185, df = 2, p-value = 0.8098



[[4]]
[[4]]$crosstab
                         
                          yes no
  Unaware                  16 19
  Aware but Have Not Used  27 37
  Aware and Have Used      14 15

[[4]]$proportions
                         
                               yes       no
  Unaware                 28.07018 26.76056
  Aware but Have Not Used 47.36842 52.11268
  Aware and Have Used     24.56140 21.12676

[[4]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 0.32678, df = 2, p-value = 0.8493



[[5]]
[[5]]$crosstab
                         
                          yes no
  Unaware                  10  5
  Aware but Have Not Used  22 35
  Aware and Have Used      25 31

[[5]]$proportions
                         
                                yes        no
  Unaware                 17.543860  7.042254
  Aware but Have Not Used 38.596491 49.295775
  Aware and Have Used     43.859649 43.661972

[[5]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 3.7885, df = 2, p-value = 0.1504



[[6]]
[[6]]$crosstab
                         
                          yes no
  Unaware                   4  1
  Aware but Have Not Used  45 59
  Aware and Have Used       8 11

[[6]]$proportions
                         
                                yes        no
  Unaware                  7.017544  1.408451
  Aware but Have Not Used 78.947368 83.098592
  Aware and Have Used     14.035088 15.492958

[[6]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 2.6589, df = 2, p-value = 0.2646



[[7]]
[[7]]$crosstab
                         
                          yes no
  Unaware                   2  3
  Aware but Have Not Used  45 54
  Aware and Have Used      10 14

[[7]]$proportions
                         
                                yes        no
  Unaware                  3.508772  4.225352
  Aware but Have Not Used 78.947368 76.056338
  Aware and Have Used     17.543860 19.718310

[[7]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 0.15546, df = 2, p-value = 0.9252



[[8]]
[[8]]$crosstab
                         
                          yes no
  Unaware                   8  4
  Aware but Have Not Used  43 59
  Aware and Have Used       6  8

[[8]]$proportions
                         
                                yes        no
  Unaware                 14.035088  5.633803
  Aware but Have Not Used 75.438596 83.098592
  Aware and Have Used     10.526316 11.267606

[[8]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 2.6291, df = 2, p-value = 0.2686



[[9]]
[[9]]$crosstab
                         
                          yes no
  Unaware                  15 18
  Aware but Have Not Used  38 45
  Aware and Have Used       4  8

[[9]]$proportions
                         
                                yes        no
  Unaware                 26.315789 25.352113
  Aware but Have Not Used 66.666667 63.380282
  Aware and Have Used      7.017544 11.267606

[[9]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 0.67323, df = 2, p-value = 0.7142



[[10]]
[[10]]$crosstab
                         
                          yes no
  Unaware                  19 18
  Aware but Have Not Used  33 47
  Aware and Have Used       5  6

[[10]]$proportions
                         
                                yes        no
  Unaware                 33.333333 25.352113
  Aware but Have Not Used 57.894737 66.197183
  Aware and Have Used      8.771930  8.450704

[[10]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 1.0492, df = 2, p-value = 0.5918



[[11]]
[[11]]$crosstab
                         
                          yes no
  Unaware                  22 33
  Aware but Have Not Used  30 30
  Aware and Have Used       5  8

[[11]]$proportions
                         
                               yes       no
  Unaware                 38.59649 46.47887
  Aware but Have Not Used 52.63158 42.25352
  Aware and Have Used      8.77193 11.26761

[[11]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 1.3775, df = 2, p-value = 0.5022



[[12]]
[[12]]$crosstab
                         
                          yes no
  Unaware                  31 38
  Aware but Have Not Used  25 32
  Aware and Have Used       1  1

[[12]]$proportions
                         
                                yes        no
  Unaware                 54.385965 53.521127
  Aware but Have Not Used 43.859649 45.070423
  Aware and Have Used      1.754386  1.408451

[[12]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 0.039011, df = 2, p-value = 0.9807



[[13]]
[[13]]$crosstab
                         
                          yes no
  Unaware                  36 42
  Aware but Have Not Used  21 25
  Aware and Have Used       0  4

[[13]]$proportions
                         
                                yes        no
  Unaware                 63.157895 59.154930
  Aware but Have Not Used 36.842105 35.211268
  Aware and Have Used      0.000000  5.633803

[[13]]$chi_square

    Pearson's Chi-squared test

data:  tab
X-squared = 3.3178, df = 2, p-value = 0.1903



[[14]]
[[14]]$crosstab
                         
                          yes no
  Aware but Have Not Used  37 43
  Aware and Have Used      20 28

[[14]]$proportions
                         
                               yes       no
  Aware but Have Not Used 64.91228 60.56338
  Aware and Have Used     35.08772 39.43662

[[14]]$chi_square

    Pearson's Chi-squared test with Yates' continuity correction

data:  tab
X-squared = 0.10332, df = 1, p-value = 0.7479
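
Each list element above bundles a crosstab, column percentages, and a Pearson chi-square test. The sketch below shows one way such an element could be assembled. `crosstab_report()` is a hypothetical helper, not a function from the original analysis, and the counts are typed in by hand from element [[5]] rather than read from `case31_clean`.

```r
# Hypothetical helper: returns the three pieces shown in the output above
crosstab_report <- function(tab) {
  list(
    crosstab    = tab,
    proportions = prop.table(tab, margin = 2) * 100,  # percentages within each column
    chi_square  = chisq.test(tab)                     # Pearson chi-square test of independence
  )
}

# Counts copied by hand from element [[5]] above
tab5 <- as.table(matrix(
  c(10, 22, 25,    # "yes" column
     5, 35, 31),   # "no" column
  ncol = 2,
  dimnames = list(
    c("Unaware", "Aware but Have Not Used", "Aware and Have Used"),
    c("yes", "no")
  )
))

res5 <- crosstab_report(tab5)
res5$proportions
res5$chi_square
```

With these counts the statistic and the column percentages should agree with the values printed for [[5]].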

R4c Analysis Overview

  • R4c: Do members who live in the area of the firm’s headquarters and members who live elsewhere differ in their perceptions of operational effectiveness?
  • Constructs and roles:
    • Member residence: IV
    • Operational effectiveness: DV
  • Operationalization:
    • q38
    • q9r, q10r
    • q11-q12r
    • q31-q34
    • q37r
  • Scale:
    • Nominal scale
    • interval scales
    • Nominal
    • interval
    • interval
  • Statistics:
    • Descriptive statistics, followed by
      • 2-groups t-test
      • Chi-square independence test
      • 2-groups t-test
      • 2-groups t-test
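
As a rough illustration of the "descriptive statistics, followed by 2-groups t-test" step, the sketch below uses simulated data only. The variable names `headquarter` and `rating` are hypothetical stand-ins for q38 and one interval-scaled item, and the group sizes (57 and 71) are assumed from the yes/no column totals in the crosstabs above.

```r
set.seed(1)  # simulated data only; not the survey responses

# Hypothetical stand-ins: `headquarter` plays the role of q38 and
# `rating` the role of one interval-scaled item such as q31
sim <- data.frame(
  headquarter = factor(rep(c("yes", "no"), times = c(57, 71))),
  rating      = sample(1:5, size = 128, replace = TRUE)
)

# Descriptive statistics by group
aggregate(rating ~ headquarter, data = sim, FUN = mean)

# Two-group t-test (Welch's version, which does not assume equal variances)
tt <- t.test(rating ~ headquarter, data = sim)
tt
```

`t.test()` defaults to Welch's test; passing `var.equal = TRUE` would give the pooled-variance version instead.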

R5 (\(\chi^2\) Goodness of Fit)

R5: How well are the members aware of the services offered by the credit union?

  • Constructs and roles:
    • Awareness of services:
  • Operationalization:
    • q13r-q26r
  • Scale:
    • Ordinal scale:
  • Statistics:
    • Chi-square goodness-of-fit test

q13r

Summary table

case31_clean |> 
  tbl_summary(
    include = q13r,
    type = all_categorical() ~ "categorical"
  ) |> 
  italicize_labels() |> 
  #show_header_names()
  modify_header(label ~ "**Awareness**") |> 
  modify_caption("**Chi-Square Goodness of Fit Test**:<br>*Regular Share Account Awareness*")
Table 3: Regular Share Account Awareness Frequency
Chi-Square Goodness of Fit Test: Regular Share Account Awareness

Awareness                      N = 128 [1]
Regular share account
    Unaware                    83 (65%)
    Aware but Have Not Used    32 (25%)
    Aware and Have Used        13 (10%)
[1] n (%)

Barplot

case31_clean |>
  ggplot(aes(q13r, fill = q13r)) +
  geom_bar() +
  theme(legend.position = "none")

Chi-square Goodness-of-fit test

  • \(H_0:\) The observed data follow the expected distribution (equal proportions across the three awareness levels)
  • \(H_A:\) The observed data do NOT follow the expected distribution
expected = c(1/3, 1/3, 1/3)
table(case31_clean$q13r) |> 
  chisq.test(p = expected)

    Chi-squared test for given probabilities

data:  table(case31_clean$q13r)
X-squared = 61.422, df = 2, p-value = 4.596e-14

Insights

  • There is statistically significant evidence that the observed data do not follow the theoretical (equal-proportion) distribution.
  • Members’ awareness of the regular share account differs across the three levels, with “Unaware” being the most frequent.
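
The test statistic can be reproduced by hand from the counts in Table 3 using \(\chi^2 = \sum_i (O_i - E_i)^2 / E_i\) with \(k - 1\) degrees of freedom. The sketch below is a worked check, with the counts typed in rather than read from `case31_clean`.

```r
# Observed counts for q13r, copied from Table 3
observed <- c(Unaware = 83, `Aware but Have Not Used` = 32, `Aware and Have Used` = 13)
p0 <- rep(1/3, 3)                        # hypothesised proportions

expected_n <- sum(observed) * p0          # expected count per level
x2 <- sum((observed - expected_n)^2 / expected_n)
df <- length(observed) - 1
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p_value)
```

The result should agree with the `chisq.test()` output above (X-squared = 61.422, df = 2).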

More efficient testing with a function

# data

awareness_long <- case31_clean |> 
  select(q13r:q26r) |> 
  pivot_longer(cols = everything(),
               names_to = "services",
               values_to = "aware_level")
awareness_long
# A tibble: 1,792 × 2
   services aware_level            
   <chr>    <fct>                  
 1 q13r     Aware and Have Used    
 2 q14r     Aware but Have Not Used
 3 q15r     Aware but Have Not Used
 4 q16r     Aware but Have Not Used
 5 q17r     Aware but Have Not Used
 6 q18r     Aware but Have Not Used
 7 q19r     Aware and Have Used    
 8 q20r     Aware and Have Used    
 9 q21r     Aware but Have Not Used
10 q22r     Aware but Have Not Used
# ℹ 1,782 more rows
# Expected proportions (e.g., equal distribution across levels)
exp_p <- c(`Unaware` = 1/3, 
           `Aware but Have Not Used` = 1/3, 
           `Aware and Have Used` = 1/3) 

# Perform a chi-square goodness-of-fit test for each service
awareness_long |> 
  group_by(services) |> 
  count(aware_level) |> 
  complete(aware_level = names(exp_p), fill = list(n = 0)) |> # tidyr: add zero-count levels
  group_by(services) |> 
  summarise(
    chisq_test = list(chisq.test(
      x = n,
      p = exp_p[as.character(aware_level)] # match each probability to the row order of n
  ))
  ) |> 
  mutate(tidy_results = map(chisq_test, broom::tidy)) |> 
  unnest(tidy_results)
# A tibble: 14 × 6
   services chisq_test statistic  p.value parameter method                      
   <chr>    <list>         <dbl>    <dbl>     <dbl> <chr>                       
 1 q13r     <htest>         61.4 4.60e-14         2 Chi-squared test for given …
 2 q14r     <htest>         39.8 2.26e- 9         2 Chi-squared test for given …
 3 q15r     <htest>         48.0 3.75e-11         2 Chi-squared test for given …
 4 q16r     <htest>         16.4 2.72e- 4         2 Chi-squared test for given …
 5 q17r     <htest>         26.9 1.43e- 6         2 Chi-squared test for given …
 6 q18r     <htest>        135.  6.07e-30         2 Chi-squared test for given …
 7 q19r     <htest>        116.  7.16e-26         2 Chi-squared test for given …
 8 q20r     <htest>        124.  1.30e-27         2 Chi-squared test for given …
 9 q21r     <htest>         62.4 2.88e-14         2 Chi-squared test for given …
10 q22r     <htest>         56.9 4.36e-13         2 Chi-squared test for given …
11 q23r     <htest>         31.2 1.65e- 7         2 Chi-squared test for given …
12 q24r     <htest>         59.8 1.02e-13         2 Chi-squared test for given …
13 q25r     <htest>         64.6 9.56e-15         2 Chi-squared test for given …
14 q26r     <htest>         76   3.14e-17         2 Chi-squared test for given …

Insights about R5

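  • The chi-square goodness-of-fit tests show that, for every service, the frequency distribution of awareness levels differs significantly from an equal (one-third each) distribution (all p < .001).

  • The dominant level is not the same for every service: “Unaware” is the largest category for the regular share account (q13r), whereas the crosstab counts above show “Aware but Have Not Used” as the largest category for many other services.

  • The credit union should increase members’ awareness of its services.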
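
A significant goodness-of-fit test does not say which level drives the departure from the expected distribution. One possible follow-up is to inspect the standardized residuals that `chisq.test()` returns. The sketch below uses the q13r counts from Table 3, typed in by hand.

```r
# Observed counts for q13r, copied from Table 3
observed <- c(Unaware = 83, `Aware but Have Not Used` = 32, `Aware and Have Used` = 13)
gof <- chisq.test(observed, p = rep(1/3, 3))

# Standardized residuals: large positive values mark over-represented levels,
# large negative values mark under-represented levels
gof$stdres

# Level that exceeds its expected count by the most
names(which.max(gof$stdres))
```

Applying the same check to each service would show whether “Unaware” or “Aware but Have Not Used” is the over-represented level for that service.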

R6 Overview (\(\chi^2\) & t-test)

R6: What are the members’ attitudes and beliefs about how effectively the credit union is operated?

  • Constructs and roles:
    • Attitude toward operational effectiveness
  • Operationalization:
    • Q9r–Q10r
    • Q11–Q12rc
    • Q31–Q34
    • Q37r
  • Scale:
    • Ordinal/interval scale
    • Nominal scale
    • Interval
    • Interval
  • Statistics:
    • DS / Chi-square goodness-of-fit test
    • DS / Chi-square goodness-of-fit test
    • DS / One group t-test
    • DS / One group t-test
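
A minimal sketch of the "DS / One group t-test" step follows, using simulated ratings only. The benchmark value of 3 (the midpoint of a 5-point scale) is an assumption made for illustration; the original does not state which test value is used.

```r
set.seed(2)  # simulated responses; not the survey data

# Hypothetical 5-point agreement ratings standing in for an item such as q31
ratings <- sample(1:5, size = 128, replace = TRUE,
                  prob = c(0.05, 0.10, 0.25, 0.40, 0.20))

# Descriptive statistics
c(mean = mean(ratings), sd = sd(ratings))

# One-group t-test against the assumed benchmark of 3 (scale midpoint)
ot <- t.test(ratings, mu = 3)
ot
```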

H1 Overview - One-way ANOVA

H1: Members’ awareness of the regular share accounts will influence their opinions on the adequacy of financial services in meeting members’ needs.

  • Constructs and roles:
    • Members’ awareness of the regular share account (IV)
    • Adequacy of financial services in meeting member needs (DV)
  • Operationalization:
    • q13r
    • q33
  • Scale:
    • Ordinal scale
    • Interval
  • Statistics:
    • DS / One-way ANOVA

Visualization

Frequency by groups

Figure 8: Frequency Distribution of Adequacy of Financial Services in Meeting Members’ Needs (q33) by Their Awareness of Regular Share Accounts (q13r)

Mean value by groups

Figure 9: Mean Value of Adequacy of Financial Services in Meeting Members’ Needs (q33) by Their Awareness of Regular Share Accounts (q13r)

ANOVA Test Procedure

1. Check normality assumption

  • The dependent variable should be approximately normally distributed within each group.
  • \(H_0:\) The data are normally distributed
  • \(H_1:\) The data are NOT normally distributed

Overall test for q33

# Shapiro-Wilk normality test 

case31_clean |> 
  pull(q33) |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  pull(case31_clean, q33)
W = 0.80776, p-value = 1.176e-11

Test by awareness group on q33

case31_clean |> 
  group_by(q13r) |> 
  summarise(p_value = shapiro.test(q33)$p.value)  # q33 is evaluated within each group
# A tibble: 3 × 2
  q13r                          p_value
  <fct>                           <dbl>
1 Unaware                 0.00000000133
2 Aware but Have Not Used 0.0000121    
3 Aware and Have Used     0.0136       

Test of Normality on Residuals

h1_aov_model <- case31_clean |> 
  aov(q33 ~ q13r, data = _) 

residuals(h1_aov_model) |> 
  shapiro.test()

    Shapiro-Wilk normality test

data:  residuals(h1_aov_model)
W = 0.87843, p-value = 7.927e-09

2. Homogeneity of variance

  • \(H_0:\) Variance is the same across the groups
  • \(H_1:\) Variance is NOT the same across the groups (at least one group differs)
library(car)
case31_clean |> 
  leveneTest(q33 ~ q15r, data = _)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  1.0002 0.3707
      125               
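
To make the logic of Levene's test concrete: with `center = median`, it is a one-way ANOVA on each observation's absolute deviation from its group median. The sketch below reproduces that idea in base R on the built-in `PlantGrowth` data, which is used only as an example and is unrelated to the survey.

```r
# Built-in example data: plant weight under three treatment groups
data(PlantGrowth)

# Absolute deviation of each observation from its own group median
pg <- transform(
  PlantGrowth,
  abs_dev = abs(weight - ave(weight, group, FUN = median))
)

# One-way ANOVA on the absolute deviations (Levene's test, center = median)
levene_by_hand <- anova(lm(abs_dev ~ group, data = pg))
levene_by_hand
```

A large p-value in the `group` row indicates no evidence against equal variances.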

3. ANOVA Test

  • \(H_0:\) All group means are equal (\(\mu_1 = \mu_2 = \mu_3 = ... = \mu_k\))
  • \(H_1:\) At least one pair of group means is different

Under homoskedasticity

case31_clean |> 
  aov(q33 ~ q13r, data = _) |> 
  summary()
             Df Sum Sq Mean Sq F value Pr(>F)  
q13r          2   4.06  2.0307   4.631 0.0115 *
Residuals   125  54.81  0.4385                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
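
The F statistic in the table is the ratio of the between-group to the within-group mean square, \(F = MS_{between} / MS_{within}\). The sketch below recomputes it from the printed sums of squares and adds eta-squared as an effect size, which the original output does not report. Because the inputs are rounded, the result matches the printed F only approximately.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_between <- 4.06;  df_between <- 2
ss_within  <- 54.81; df_within  <- 125

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Share of variance in q33 accounted for by awareness group (eta-squared)
eta_sq <- ss_between / (ss_between + ss_within)

round(c(F = f_value, p = p_value, eta_sq = eta_sq), 4)
```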

Since residuals are non-normal

  • Kruskal-Wallis test
    • Non-parametric
    • Compares the rank distributions of three or more independent groups (often read as a comparison of medians) to determine whether they differ significantly.
case31_clean |> 
  kruskal.test(q33 ~ q13r, data = _)

    Kruskal-Wallis rank sum test

data:  q33 by q13r
Kruskal-Wallis chi-squared = 7.1098, df = 2, p-value = 0.02858

4. Summary Table

anova_model <- case31_clean |> 
  aov(q33 ~ q13r, data = _) 

anova_summary <- summary(anova_model)

anova_pvalue <- anova_summary[[1]][["Pr(>F)"]][1]

case31_clean |> 
  mutate(q33 = as.numeric(q33)) |> 
  tbl_summary(
    include = c(q13r, q33),
    by = q13r,
    type = q33 ~ "continuous2",
    statistic = list(all_continuous() ~ c("{mean} ({sd})",
                                          "{min}, {max}")),
    label = q33 ~ "Adequacy of Financial Services"
  ) |> 
  modify_spanning_header(
    all_stat_cols() ~ "**Regular Share Account Awareness Levels**") |> 
  add_overall() |> 
  modify_caption("**Test of Differences in Perceived Adequacy of Financial Services in Meeting Members' Needs:**<br>_Differential Effect by Awareness of Regular Share Account (q13r)_") |> 
  italicize_labels() |> 
  modify_footnote(
    c(stat_1, stat_2, stat_3) ~ glue::glue("One-way ANOVA with equal variance assumed p-value: {format.pval(anova_pvalue, digits = 3)}")) |> 
  #show_header_names()
  modify_header(label ~ "**Variable**")
Table 4: Does Awareness of Regular Share Account Affect Members’ Perceived Adequacy of Financial Services in Meeting Their Needs?
Test of Differences in Perceived Adequacy of Financial Services in Meeting Members' Needs:
Differential Effect by Awareness of Regular Share Account (q13r)

                                                Regular Share Account Awareness Levels
Variable                         Overall        Unaware        Aware but Have Not Used   Aware and Have Used
                                 N = 128        N = 83 [1]     N = 32 [1]                N = 13 [1]
Adequacy of Financial Services
    Mean (SD)                    3.59 (0.68)    3.63 (0.66)    3.72 (0.63)               3.08 (0.76)
    Min, Max                     2.00, 5.00     2.00, 5.00     3.00, 5.00                2.00, 4.00
[1] One-way ANOVA with equal variance assumed p-value: 0.0115

5. Post-hoc Analysis if not all group means are the same

When ANOVA is significant

case31_clean |> 
  aov(q33 ~ q13r, data = _) |> 
  TukeyHSD()
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = q33 ~ q13r, data = case31_clean)

$q13r
                                                   diff        lwr         upr
Aware but Have Not Used-Unaware              0.09224398 -0.2345927  0.41908065
Aware and Have Used-Unaware                 -0.54958295 -1.0180950 -0.08107087
Aware and Have Used-Aware but Have Not Used -0.64182692 -1.1584282 -0.12522563
                                                p adj
Aware but Have Not Used-Unaware             0.7816413
Aware and Have Used-Unaware                 0.0170339
Aware and Have Used-Aware but Have Not Used 0.0106252

When Kruskal-Wallis is significant

library(dunn.test)

# Post-hoc pairwise comparisons
dunn.test(as.numeric(case31_clean$q33), case31_clean$q13r, method = "bonferroni")
  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 7.1098, df = 2, p-value = 0.03

                           Comparison of x by group                            
                                 (Bonferroni)                                  
Col Mean-|
Row Mean |   Aware an   Aware bu
---------+----------------------
Aware bu |  -2.534359
         |    0.0169*
         |
 Unaware |  -2.501555   0.419916
         |    0.0185*     1.0000

alpha = 0.05
Reject Ho if p <= alpha/2

Insights about H1

  • The normality assumption for the ANOVA residuals was not met.
  • However, the homoskedasticity assumption holds.
  • Both the ANOVA and the Kruskal-Wallis test are significant; thus, post-hoc analysis was conducted.
  • The post-hoc results lead to the same conclusions regardless of the method.
  • Members’ perception of the adequacy of financial services is most negative among those with the highest level of awareness of the regular share account (aware and have used it), while it does not differ between the two lower awareness groups (aware but have not used it vs. unaware).

H2: Overview - Paired Samples t-Test

H2: Members will not be different in their opinion on CU’s loan rate (Q8r) and the degree to which they agree that CU’s loan rate is lower than competitors (Q30).

H3a: Overview - Two-way ANOVA

H3a: There will be an interaction effect between current loan use and location of residence on the adequacy of current financial services: in the headquarters area, members who have a loan will agree more than those who do not that financial services meet members’ needs, whereas outside the headquarters area, loan status will not influence the perceived adequacy of financial services in meeting members’ needs.

H3b: Overview - Two-way ANCOVA

H3b: Test H3a controlling for the impact of attitudes toward employee proficiency.

H4: Overview - MANOVA

H4: Compared with members who have a loan, members who do not currently have a loan with the Credit Union will feel that the CU charges them higher rates and will agree less with the statement that the CU’s loan rates are lower than those offered by other institutions.

H5: Overview - Multiple Regression

H5a: Members’ belief that the CU keeps personal financial information confidential will positively influence the overall attitude toward the credit union’s operational efficiency.

H5b: Loan application processing promptness will positively influence the overall attitude toward credit union’s operational efficiency.

H5c: Member satisfaction with level of financial services in meeting needs will positively influence the overall attitude toward credit union’s operational efficiency.

H5d: Loan application simplicity and easiness will positively influence the overall attitude toward credit union’s operational efficiency.

  • Constructs and roles:
    1. Overall managerial and operational efficiency (DV)
    2. Confidentiality of personal financial information (IV)
    3. Loan application processing promptness (IV)
    4. Adequacy of financial services in meeting needs (IV)
    5. Simplicity and easiness of loan application (IV)
  • Operationalization:
    1. q37r
    2. q31
    3. q32
    4. q33
    5. q34
  • Scale:
    • all of them are interval
  • Statistics:
    • separate visualization, followed by multiple regression

Checking Assumptions of Multiple Regression

1. Linearity Check

# Scatterplot for each predictor against the outcome
ggplot(case31_clean, aes(x = q31, y = q37r)) + 
  geom_point() + 
  geom_smooth(col = "blue") +
  labs(title = "Linearity Check: q31 vs q37r")

# Scatterplot for each predictor against the outcome
ggplot(case31_clean, aes(x = q32, y = q37r)) + 
  geom_point() + 
  geom_smooth(col = "blue") +
  labs(title = "Linearity Check: q32 vs q37r")

# Scatterplot for each predictor against the outcome
ggplot(case31_clean, aes(x = q33, y = q37r)) + 
  geom_point() + 
  geom_smooth(col = "blue") +
  labs(title = "Linearity Check: q33 vs q37r")

# Scatterplot for each predictor against the outcome
ggplot(case31_clean, aes(x = q34, y = q37r)) + 
  geom_point() + 
  geom_smooth(col = "blue") +
  labs(title = "Linearity Check: q34 vs q37r")

  • Component-plus-residual (partial residual) plots visualize the relationship between each predictor and the response variable, accounting for the effects of the other predictors in the model.
  • If the relationship between the predictor and the response is linear, the points in the plot should roughly follow a straight line.
library(car)
lm_model <- case31_clean |> 
  filter(!is.na(q37r)) |> 
  lm(q37r ~ q31 + q32 + q33 + q34, data = _)
crPlots(lm_model)

2. Independence Check

  • \(H_0:\) There is no first-order autocorrelation in the residuals (errors are independent).
  • \(H_1:\) There is positive first-order autocorrelation in the residuals (dwtest() tests this one-sided alternative by default).
  • Test statistics (d)
    • \(d \approx 2\): No autocorrelation
    • \(d < 2\): Positive autocorrelation
    • \(d > 2\): Negative autocorrelation
library(lmtest)
dwtest(lm_model)  # Durbin-Watson test for autocorrelation

    Durbin-Watson test

data:  lm_model
DW = 2.2276, p-value = 0.8995
alternative hypothesis: true autocorrelation is greater than 0
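
The Durbin-Watson statistic itself is easy to compute by hand: \(d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2\), where \(e_t\) are the residuals in row order. The sketch below does so in base R on the built-in `mtcars` data, with an illustrative model rather than the survey regression.

```r
# Built-in example data; the model is illustrative only
fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- residuals(fit)

# Durbin-Watson statistic: squared successive differences over squared residuals
d <- sum(diff(e)^2) / sum(e^2)
d
```

Because the statistic depends on the order of the rows, it is most meaningful when observations have a natural sequence, such as time.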

3. Homoscedasticity Check

plot(lm_model, which = 1)  # Built-in residuals vs. fitted plot

  • \(H_0:\) The errors have constant variance (homoscedasticity).
  • \(H_1:\) The errors exhibit heteroscedasticity (i.e., non-constant variance).
library(lmtest)
bptest(lm_model)  # Breusch-Pagan test for heteroscedasticity

    studentized Breusch-Pagan test

data:  lm_model
BP = 4.5897, df = 4, p-value = 0.332

4. Normality of Residuals

residuals <- residuals(lm_model)
hist(residuals, breaks = 20, main = "Histogram of Residuals")

qqnorm(residuals)
qqline(residuals, col = "blue")

  • \(H_0:\) The residuals are normally distributed.
  • \(H_1:\) The residuals are not normally distributed.
shapiro.test(residuals)  # Shapiro-Wilk test

    Shapiro-Wilk normality test

data:  residuals
W = 0.95406, p-value = 0.0004082

5. Multicollinearity Check

  • VIF > 5: Possible multicollinearity.
  • VIF > 10: Strong multicollinearity, consider dropping or combining predictors.
library(car)
vif(lm_model)  # VIF values
     q31      q32      q33      q34 
1.148240 1.396267 1.065648 1.214352 
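
For intuition, the VIF of predictor \(j\) is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the other predictors. The sketch below computes it by hand in base R on the built-in `mtcars` data; the predictors are chosen only for illustration.

```r
# Built-in example data; predictors chosen only for illustration
predictors <- c("wt", "hp", "disp")

vif_by_hand <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Regress predictor v on the remaining predictors
  r2 <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 / (1 - r2)
})
vif_by_hand
```

Values close to 1, like those reported above for q31 to q34, mean a predictor shares little variance with the others.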

6. Outliers and Influential Observations

plot(lm_model, which = 4)  # Built-in Cook's distance plot

# Leverage Plot
hatvalues <- hatvalues(lm_model)
plot(hatvalues, main = "Leverage Values", ylab = "Leverage")

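# DFBETA: change in each coefficient when an observation is dropped
# (dfbeta() is unstandardized; dfbetas() would give the standardized version)
dfbeta_values <- dfbeta(lm_model)
head(dfbeta_values)  # Shows influence of each observation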
   (Intercept)           q31           q32          q33          q34
1  0.009403795 -0.0001015945  0.0001734095 -0.001583996 -0.001698192
2  0.018803471 -0.0047139437  0.0022874976 -0.002830564  0.001025494
3 -0.064176774  0.0078605827  0.0010204438  0.004270726  0.002766490
4  0.009403795 -0.0001015945  0.0001734095 -0.001583996 -0.001698192
5  0.049299034 -0.0005326053  0.0009090924 -0.008304040 -0.008902708
6 -0.059692744  0.0126198994 -0.0041741897  0.002834174  0.005853614

7. Cross-validation

Define control for cross-validation

library(caret)
set.seed(123)

train_control <- trainControl(method = "cv", number = 10)

case31_clean <- case31_clean |> 
  filter(!is.na(q37r)) |> 
  mutate(q37r = as.numeric(q37r))

Train the model

cv_model <- train(
  q37r ~ q31 + q32 + q33 + q34,
  data = case31_clean,
  method = "lm",
  trControl = train_control)

cv_model  # Check cross-validation results
Linear Regression 

121 samples
  4 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 108, 109, 109, 109, 109, 109, ... 
Resampling results:

  RMSE       Rsquared   MAE     
  0.6080421  0.3155103  0.483605

Tuning parameter 'intercept' was held constant at a value of TRUE
cv_model$resample
        RMSE   Rsquared       MAE Resample
1  0.3845295 0.61980439 0.3360649   Fold01
2  1.0946785 0.10531564 0.8239592   Fold02
3  0.4962708 0.03110875 0.4335815   Fold03
4  0.6498566 0.02311003 0.4974842   Fold04
5  0.4871213 0.46218519 0.3913416   Fold05
6  0.6067156 0.26438525 0.4945987   Fold06
7  0.5833334 0.60333089 0.4652271   Fold07
8  0.5420575 0.54103610 0.4160615   Fold08
9  0.4332916 0.44055235 0.3583005   Fold09
10 0.8025663 0.06427416 0.6194310   Fold10
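
To show what 10-fold cross-validation does conceptually, the sketch below runs it by hand in base R. It uses the built-in `mtcars` data and an illustrative model, so the numbers are unrelated to the survey results; it is not a reproduction of caret's internals.

```r
set.seed(123)
k <- 10
n <- nrow(mtcars)  # built-in example data, not the survey

# Randomly assign each row to one of k folds of (nearly) equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[fold_id != i, ]   # fit on the other k - 1 folds
  test  <- mtcars[fold_id == i, ]   # evaluate on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

fold_rmse        # one RMSE per held-out fold
mean(fold_rmse)  # cross-validated RMSE
```

The overall RMSE printed by `cv_model` above is likewise the average of the per-fold RMSE values listed in `cv_model$resample`.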

Model Parameters

summary(cv_model) # summary function returns final model fit based on entire data

Call:
lm(formula = .outcome ~ ., data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.70193 -0.24939  0.00387  0.38004  1.29807 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.34721    0.48011   2.806 0.005884 ** 
q31          0.18835    0.08901   2.116 0.036472 *  
q32          0.30787    0.08302   3.709 0.000321 ***
q33          0.29420    0.08650   3.401 0.000921 ***
q34         -0.06829    0.07917  -0.863 0.390138    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6186 on 116 degrees of freedom
Multiple R-squared:  0.3006,    Adjusted R-squared:  0.2764 
F-statistic: 12.46 on 4 and 116 DF,  p-value: 1.823e-08

case31_clean |> 
  lm(q37r ~ q31 + q32 + q33 + q34, data = _)  |> 
  summary()

Call:
lm(formula = q37r ~ q31 + q32 + q33 + q34, data = case31_clean)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.70193 -0.24939  0.00387  0.38004  1.29807 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.34721    0.48011   2.806 0.005884 ** 
q31          0.18835    0.08901   2.116 0.036472 *  
q32          0.30787    0.08302   3.709 0.000321 ***
q33          0.29420    0.08650   3.401 0.000921 ***
q34         -0.06829    0.07917  -0.863 0.390138    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6186 on 116 degrees of freedom
Multiple R-squared:  0.3006,    Adjusted R-squared:  0.2764 
F-statistic: 12.46 on 4 and 116 DF,  p-value: 1.823e-08

Summary Table

Table 5: Regression Analysis
Factors that Affect Overall Efficiency

Variables                      Beta     95% CI [1]      p-value   VIF [1]   q-value [2]
CU confidential                0.19     0.01, 0.36      0.036     1.1       0.049
CU is prompt                   0.31     0.14, 0.47      <0.001    1.4       0.001
CU meets needs of members      0.29     0.12, 0.47      <0.001    1.1       0.002
CU loan applications simple    -0.07    -0.23, 0.09     0.4       1.2       0.4
[1] CI = Confidence Interval, VIF = Variance Inflation Factor
[2] False discovery rate correction for multiple testing
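
The q-values in Table 5 can be checked against the regression output: applying a Benjamini-Hochberg false discovery rate adjustment to the four predictor p-values gives the same figures. The sketch below also recomputes the 95% confidence interval for q31. All numbers are typed in from the `summary()` output above, and the variable names are illustrative.

```r
# p-values copied from the summary(lm) output above
p_raw <- c(q31 = 0.036472, q32 = 0.000321, q33 = 0.000921, q34 = 0.390138)

# Benjamini-Hochberg false discovery rate adjustment
q_val <- p.adjust(p_raw, method = "fdr")
round(q_val, 3)

# 95% confidence interval for q31 from its estimate and standard error
est <- 0.18835; se <- 0.08901; df_resid <- 116
ci_q31 <- est + c(-1, 1) * qt(0.975, df_resid) * se
round(ci_q31, 2)
```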
Table 6: Regression Analysis with Customer Location Controlled
Factors that Affect Overall Efficiency
Controlling for the location of residency

Variables                      Beta     95% CI [1]      p-value   VIF [1]   q-value [2]
headquarter                                                       1.1
    yes
    no                         0.13     -0.10, 0.36     0.3                 0.3
CU confidential                0.20     0.03, 0.38      0.025     1.2       0.041
CU is prompt                   0.31     0.14, 0.47      <0.001    1.4       0.002
CU meets needs of members      0.29     0.12, 0.46      0.001     1.1       0.003
CU loan applications simple    -0.08    -0.24, 0.08     0.3       1.2       0.3
[1] CI = Confidence Interval, VIF = Variance Inflation Factor
[2] False discovery rate correction for multiple testing

Insights about H5

  • Most, but not all, assumptions of multiple regression are supported.
    • The linearity assumption holds in general.
    • The homoscedasticity assumption is supported (Breusch-Pagan p = .332).
    • Multicollinearity is not an issue (all VIFs are below 1.5).
    • The normality assumption for the residuals is not supported (Shapiro-Wilk p < .001).
  • As expected, confidentiality (p < .05), loan application processing promptness (p < .001), and adequacy of financial services (p < .001) are positively related to overall efficiency.
  • Simplicity and ease of the loan application is not significantly related to overall efficiency (p > .10).

References

Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the gtsummary Package.” The R Journal 13 (1): 570. https://doi.org/10.32614/rj-2021-053.