30 Nov 2022


Data Sets - The Best Data Sets for Machine Learning


Import the file entitled 101 Dalmations. This is a data set that indicates how friendly people think their dog is on a scale from 1 to 5. Once you have done this, find the mean, median, mode, and normality for the set. Paste the output file below. Interpret the Shapiro-Wilk p-value and indicate whether we can consider the data set to be normal. 

Statistics: FriendlinessOfDogs

| Statistic | Value |
|---|---|
| N (valid) | 101 |
| N (missing) | 28 |
| Mean | 3.5043 |
| Median | 3.5200 |
| Mode | 3.55 |


Case Processing Summary

| Variable | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| FriendlinessOfDogs | 101 | 78.3% | 28 | 21.7% | 129 | 100.0% |

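Test for normality

Descriptives: FriendlinessOfDogs

| Measure | Statistic | Std. Error |
|---|---|---|
| Mean | 3.5043 | .05351 |
| 95% Confidence Interval for Mean, Lower Bound | 3.3981 | |
| 95% Confidence Interval for Mean, Upper Bound | 3.6104 | |
| 5% Trimmed Mean | 3.5101 | |
| Median | 3.5200 | |
| Variance | .289 | |
| Std. Deviation | .53781 | |
| Minimum | 1.00 | |
| Maximum | 5.00 | |
| Range | 4.00 | |
| Interquartile Range | .68 | |
| Skewness | -.714 | .240 |
| Kurtosis | 4.236 | .476 |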

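Tests of Normality

| Variable | Kolmogorov-Smirnov(a) Statistic | Kolmogorov-Smirnov(a) df | Kolmogorov-Smirnov(a) Sig. | Shapiro-Wilk Statistic | Shapiro-Wilk df | Shapiro-Wilk Sig. |
|---|---|---|---|---|---|---|
| FriendlinessOfDogs | .085 | 101 | .069 | .945 | 101 | .000 |

a. Lilliefors Significance Correction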

H0: The data is normally distributed

H1: The data is not normally distributed

For the Shapiro-Wilk test, we reject the null hypothesis when the p-value (the Sig. value) is less than 0.05. Here the Shapiro-Wilk p-value is reported as .000, that is, p < .001. Since this is less than 0.05, we reject the null hypothesis and conclude that the data is not normally distributed. Hence, we cannot consider the data set to be normal.
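The decision rule can be written as a small function. This is a minimal sketch in Python: the function name `normality_decision` and the default `alpha` of 0.05 are illustrative assumptions, and the p-value is passed in as reported by the software rather than recomputed. Because a Sig. value printed as .000 means p < .001, the call below uses 0.001 as an upper bound.

```python
def normality_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the usual decision rule for a normality test.

    H0: the data is normally distributed.
    H1: the data is not normally distributed.
    """
    if p_value < alpha:
        return "reject H0: the data is not normally distributed"
    return "fail to reject H0: no evidence against normality"


# Shapiro-Wilk Sig. for FriendlinessOfDogs is printed as .000 (p < .001);
# 0.001 is an upper bound used here for illustration only.
print(normality_decision(0.001))
```

Any p-value at or above `alpha` takes the second branch, which is worded as "fail to reject" because a non-significant test does not prove that the data is normal.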

Question 2 

Now, plot a histogram for the dog data. Put a copy below and also identify whether the graph 1) looks normal, and 2) whether you suspect that there might be outliers 

FriendlinessOfDogs 

A visual inspection of the histogram indicates that the friendliness scores are not approximately normally distributed, with a skewness of -0.714 (SE = 0.240) and a kurtosis of 4.236 (SE = 0.476). There are outliers in the histogram: the two points on the far left, at x-axis positions 1.00 and 2.00, and the point on the far right, at x-axis position 5.00. These points count as outliers because they fall more than 1.5 times the interquartile range below the first quartile or above the third quartile.
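The 1.5 times the interquartile range rule can be sketched in Python as follows. The helper names `tukey_fences` and `find_outliers` are illustrative, and the `ratings` list is made-up data, not the values from the assignment file. Statistical packages may compute quartiles with a slightly different method than `statistics.quantiles`, so the fences from a package can differ a little from these.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that lie outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical ratings on the 1-to-5 scale, NOT the assignment data.
ratings = [1.0, 2.0, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.7, 3.8, 5.0]
print(find_outliers(ratings))  # [1.0, 2.0, 5.0]
```

For this made-up sample the quartiles are 3.2 and 3.7, so the fences sit at about 2.45 and 4.45, and the ratings of 1.0, 2.0 and 5.0 fall outside them.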

Question 3 

Now, create a boxplot for the data and paste it below. If you spot any outliers, list them. 

Outliers on a box plot are usually marked with an asterisk; this box plot marks an outlier at point 99.

Question 4 

Now, open the “Weights of Cats” file in jamovi. This is a file listing the weights of 201 cats in pounds. Provide the output of the descriptives and tell us the mean, median, mode, standard deviation, and normality. Offer a written interpretation of what you see. Then, provide a frequency histogram and boxplot. Are there outliers? If so, what are they? (Are there “fat cats” or “scrawny cats”? If so, how many and what do they weigh?) 

Descriptives

| | WeightsOfCats |
|---|---|
| N | 100 |
| Missing | |
| Mean | 7.44 |
| Median | 7.35 |
| Mode | 7.14ᵃ |
| Standard deviation | 1.05 |
| Minimum | 6.38 |
| Maximum | 17.0 |
| Skewness | 7.80 |
| Std. error skewness | 0.241 |
| Kurtosis | 71.6 |
| Std. error kurtosis | 0.478 |
| Shapiro-Wilk p | < .001 |

ᵃ More than one mode exists, only the first is reported

From the descriptives output, the mean weight of the cats is 7.44 pounds, the median is 7.35, the first of several modes is 7.14, and the standard deviation is 1.05. The standard deviation indicates that, with an average weight of approximately 7.44 pounds, most cats in our sample weigh between 6.39 and 8.49 pounds. One standard deviation above and below the mean shows where most cats tend to fall. The data set for the weights of the cats is not normally distributed, with a skewness of 7.80 (SE = 0.241) and a kurtosis of 71.6 (SE = 0.478). Using the Shapiro-Wilk test to assess normality, the hypotheses are:

H0: The data is normally distributed

H1: The data is not normally distributed

The Shapiro-Wilk p-value is less than .001, which is less than 0.05; therefore, we reject the null hypothesis and conclude that the data is not normally distributed.
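The one-standard-deviation band and the distance of the heaviest cat from the mean can be checked with a few lines of arithmetic. This is a sketch using only the summary values from the descriptives output; the variable names are illustrative.

```python
mean = 7.44      # mean weight in pounds, from the descriptives output
sd = 1.05        # standard deviation, from the descriptives output
maximum = 17.0   # heaviest cat, from the descriptives output

# Band of one standard deviation either side of the mean.
low, high = mean - sd, mean + sd

# How many standard deviations the heaviest cat lies above the mean.
z_max = (maximum - mean) / sd

print(f"one-SD band: {low:.2f} to {high:.2f} pounds")   # 6.39 to 8.49
print(f"z-score of the heaviest cat: {z_max:.2f}")      # 9.10
```

A value about nine standard deviations above the mean is far outside the band, which fits the very large skewness and kurtosis reported for this variable.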

Plots 

Histogram 

Weight of cats 

A visual inspection of the histogram shows that there is an outlier: the point at the far right of the graph. This point falls more than 1.5 times the interquartile range above the third quartile, outside the range we expect.

Box plot 

The box plot also indicates an outlier. The outlier is the point located above 15 on the weight scale, which matches the maximum weight of 17.0 pounds reported in the descriptives.

Question 5 

Now, open the “Dog Drool” file in jamovi. This is a file listing the amount dogs drool when they see a bone. Provide the output of the descriptives and tell us the mean, median, mode, standard deviation, and normality. Offer a written interpretation of what you see. Then, provide a frequency histogram and boxplot. Are there outliers? If so, what are they? (Which dogs are too slobbery? Which ones have “drymouth”?) 

Descriptives

| | DogDrool |
|---|---|
| N | 95 |
| Missing | |
| Mean | 18.8 |
| Median | 17.9 |
| Mode | 14.3ᵃ |
| Standard deviation | 4.29 |
| Minimum | 7.82 |
| Maximum | 28.0 |
| Skewness | 0.187 |
| Std. error skewness | 0.247 |
| Kurtosis | -0.311 |
| Std. error kurtosis | 0.490 |
| Shapiro-Wilk p | 0.081 |

ᵃ More than one mode exists, only the first is reported

 

The mean amount of drool when the dogs see a bone is 18.8, the median is 17.9, the first of several modes is 14.3, and the standard deviation is 4.29. The standard deviation indicates that, with an average of approximately 19, most dogs in the sample drool between about 15 and 23. One standard deviation above and below the mean shows where most of the sample lies. The dog drool data set is approximately normally distributed, with a skewness of 0.187 (SE = 0.247) and a kurtosis of -0.311 (SE = 0.490). For the Shapiro-Wilk test, the hypotheses are:

H0: The data is normally distributed

H1: The data is not normally distributed

The Shapiro-Wilk p-value is 0.081, which is greater than 0.05; hence, we fail to reject the null hypothesis and treat the data as approximately normally distributed.
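The standard errors quoted with the skewness and kurtosis depend only on the sample size, so they can be reproduced from N = 95. The sketch below uses the standard formulas for the standard error of sample skewness and excess kurtosis; that the software uses exactly these formulas is an assumption, although the results agree with the printed values. Comparing each statistic with 1.96 standard errors is a common rule of thumb and is not part of the original answer.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 95                                # N for DogDrool in the descriptives
skewness, kurtosis = 0.187, -0.311    # values from the descriptives

# Ratio of each statistic to its standard error.
z_skew = skewness / se_skewness(n)
z_kurt = kurtosis / se_kurtosis(n)

print(f"SE skewness = {se_skewness(n):.3f}, ratio = {z_skew:.2f}")  # 0.247, 0.76
print(f"SE kurtosis = {se_kurtosis(n):.3f}, ratio = {z_kurt:.2f}")  # 0.490, -0.63
```

Both ratios are well inside plus or minus 1.96, which is consistent with describing the drool data as approximately normal.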

DogDrool 

Histogram 

The histogram suggests possible outliers at the far left of the graph.

Box plot 

The boxplot does not indicate the presence of any outliers.


Reference

StudyBounty. (2023, September 14). Data Sets - The Best Data Sets for Machine Learning.
https://studybounty.com/data-sets-the-best-data-sets-for-machine-learning-coursework
