Introduction
LendingClub is a go-to organization for individuals and organizations seeking loans, which makes it essential for the company to collect activity history. Interested stakeholders can examine the collected data and draw their own conclusions. For a data analyst, data validation is useful because it allows the accuracy and helpfulness of the data to be gauged. Data analysis, which depends first on the gathered data, helps organizations examine problems and visualize relationships, both of which are vital in business decision-making (Bowerman, Murphree & O’Connell, 2015).
Q1. Do your numbers match the numbers provided by LendingClub? What explains the discrepancy, if any?
In the Excel summary, selecting the column “loan_amnt” gives a “Count” of 235,630 approved loans, whereas the LendingClub summary reports 235,629. For funded loans, however, the Excel total, which is a “Sum” rather than a count, matches the figure given by LendingClub, i.e., $3,503,840,175. The figure below shows the Excel summary.
The discrepancy arises from how the “Count” tool operates: it counts every selected cell that is not empty. The “Count” therefore includes all selected items, including the header cell containing the word “loan_amnt”, which accounts for the one extra entry. For an accountant, financial analyst, or other interested stakeholder, relying on this summary without properly evaluating the results risks poor conclusions. In data analysis, validity and reliability are crucial, so errors within the dataset put the conclusions at risk (Bowerman, Murphree & O’Connell, 2015). Relying on the “Count” tool alone leaves too much room for error; e.g., a cell containing a letter is also counted and might give misleading results.
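The difference between the two counts can be illustrated outside Excel with a minimal Python sketch. The function names and the five-cell column are hypothetical and stand in for the real “loan_amnt” column; the sketch only mimics the counting behavior described above and is not Excel's own implementation.

```python
def count_non_empty(cells):
    """Count every cell that holds something, mimicking the "Count" summary."""
    return sum(1 for c in cells if c is not None and str(c).strip() != "")


def count_numeric(cells):
    """Count only cells that hold a number, mimicking "Numerical count"."""
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )


# Hypothetical miniature column: the header cell plus four loan amounts.
column = ["loan_amnt", 10000, 13000, 35000, 1000]

print(count_non_empty(column))  # header is counted along with the data
print(count_numeric(column))    # header is ignored
```

On this toy column the first count is one higher than the second, the same off-by-one pattern as 235,630 versus 235,629.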
Q2. Does the Numerical Count provide a more useful/accurate value for validating your data? Why or why not do you think that is the case?
When the numerical count was applied, the value obtained in Excel tallied with the value presented by LendingClub. Unlike “Count,” the “Numerical count” tool in Excel counts only cells that hold numerical values. It is therefore the better tool for validation, since “Count” includes any item contained in the cells. The figure below shows the Excel summary.
In accounting and data analysis, the accuracy of data and results is pivotal, as it affects the subsequent decision-making process in organizations (Zhang, Yang & Appelbaum, 2015). Thus, for a financial analyst or accountant, using the right summary value helps present a conclusion that is fact-based.
Q3. What other summary values might be useful for validating your data?
Funded Amount vs. Loan Amount:
The first summary value that can be used to validate the LendingClub data is the “funded amount.” Because it serves as a comparison column, any difference between it and the loan amount can pinpoint errors within the dataset. In the collected LendingClub data, the values tally, i.e., both total $3,503,840,175.
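The column comparison described above could be sketched in Python as follows. The function name `compare_columns` and the three-row sample are hypothetical; real data would come from the “loan_amnt” and “funded_amnt” columns of the LendingClub file.

```python
def compare_columns(loan_amnt, funded_amnt):
    """Return both column totals and the row positions where the values differ."""
    if len(loan_amnt) != len(funded_amnt):
        raise ValueError("columns must have the same number of rows")
    mismatched_rows = [
        i for i, (loan, funded) in enumerate(zip(loan_amnt, funded_amnt))
        if loan != funded
    ]
    return sum(loan_amnt), sum(funded_amnt), mismatched_rows


# Hypothetical sample in which the second loan was only partly funded.
loan = [10000, 13000, 35000]
funded = [10000, 12500, 35000]

total_loan, total_funded, rows = compare_columns(loan, funded)
print(total_loan, total_funded, rows)
```

Equal totals with an empty mismatch list would correspond to the situation reported for the LendingClub data, where the two columns tally.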
Measures of Central Tendency Values
Other summary values useful for validating the LendingClub dataset are the measures of central tendency. Because errors are always possible, computing the mode, the median, and the mean loan amount offers a wider pool of values to use in the validation process. For example, Table 1 below, generated in Excel, presents various descriptive statistics.
Table 1: Loan Amount Descriptive Statistics
| Statistic | loan_amnt | funded_amnt |
|---|---|---|
| Mean | 14870.15679 | 14870.15679 |
| Standard Error | 17.38367235 | 17.38367235 |
| Median | 13000 | 13000 |
| Mode | 10000 | 10000 |
| Standard Deviation | 8438.318193 | 8438.318193 |
| Sample Variance | 71205213.93 | 71205213.93 |
| Kurtosis | -0.241793918 | -0.241793918 |
| Skewness | 0.700520304 | 0.700520304 |
| Range | 34000 | 34000 |
| Minimum | 1000 | 1000 |
| Maximum | 35000 | 35000 |
| Sum | 3503840175 | 3503840175 |
| Count | 235629 | 235629 |
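The figures in Table 1 can also be cross-checked against one another, since several of them are linked by simple identities (mean = sum / count, standard error = standard deviation / √count, variance = standard deviation squared, range = maximum − minimum). The sketch below applies these identities to the values reported in Table 1; the variable names and the tolerance of 1e-6 are assumptions chosen for illustration.

```python
import math

# Values copied from Table 1 (identical for loan_amnt and funded_amnt).
total = 3_503_840_175
count = 235_629
mean = 14870.15679
std_dev = 8438.318193
std_error = 17.38367235
variance = 71205213.93
minimum, maximum, value_range = 1000, 35000, 34000

# Each entry is True when the reported figures agree with the identity.
checks = {
    "mean = sum / count": math.isclose(total / count, mean, rel_tol=1e-6),
    "std error = sd / sqrt(n)": math.isclose(
        std_dev / math.sqrt(count), std_error, rel_tol=1e-6
    ),
    "variance = sd squared": math.isclose(std_dev ** 2, variance, rel_tol=1e-6),
    "range = max - min": maximum - minimum == value_range,
}

for name, passed in checks.items():
    print(f"{name}: {'consistent' if passed else 'INCONSISTENT'}")
```

A failed identity would not say which figure is wrong, only that the summary values do not agree and the underlying data deserve a closer look.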
LendingClub could provide additional information, e.g., the mean loan amount and the range (minimum and maximum), against which outside stakeholders can compare their own figures during validation. As explained by Cockcroft and Russell (2018), data analytics is pivotal in finance, especially in undertaking risk and predictive analysis. Proper data management and the provision of valid, quality data are therefore fundamental to LendingClub. Publishing values such as the average and median loan amounts would give stakeholders more points of comparison in the validation process.
Conclusion
Applying Excel techniques as one of the tools of data validation is essential for the LendingClub dataset. Data cleaning, part of which entails validation, is a crucial phase that affects the quality of the conclusions later drawn from the collected and analyzed data. In the LendingClub case, the given validation summaries (i.e., funded loans and approved loans) tallied with the Excel outcomes when the “Numerical count” function was used, but not when “Count” was used. For accountants and other financial analysts, efficient validation is vital in data analytics because it helps businesses draw sound insights from collected financial data and operate more efficiently.
References
Bowerman, B. L., Murphree, E. S., & O’Connell, R. T. (2015). Essentials of business statistics. New York: McGraw-Hill/Irwin.
Cockcroft, S., & Russell, M. (2018). Big data opportunities for accounting and finance practice and research. Australian Accounting Review, 28(3), 323-333.
LendingClub (2020). Loan Data. Retrieved from https://www.lendingclub.com/info/download-data.action
Zhang, J., Yang, X., & Appelbaum, D. (2015). Toward effective Big Data analysis in continuous auditing. Accounting Horizons, 29(2), 469-476.