Week 3 Labs
Description
Lab: Statistical Analysis
On the first sheet which is labeled DATA, of the file StateCrimeDataSet.xlsx are urban crime data for each of the 50 states in columns named URBAN (percent living in urban areas), MURDER (murders per million residents), ROBBERY (robberies per million residents), POLICE (police employees per 100,000 population), and REGION.
You will work with these data and find variance, standard deviation, skewness, kurtosis, and create a frequency distribution for univariate (single variable) statistics. You will then find the correlation coefficient and create a scattergram for bivariate (two variable) data.
Variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.
Standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values] A low standard deviation indicates that the data points tend to be very close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values. Standard deviation is the square root of the variance.
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined.
Kurtosis is any measure of the “peakedness” of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution.
A Correlation Coefficient is a measure of the linear dependence between two variables X and Y, giving a value between +1 and ?1 inclusive, where 1 is total positive correlation, 0 is no correlation, and ?1 is total negative correlation.
Variance and Standard Deviation
Copy the data in A1:B52 (STATE and MURDER) onto the same cells on Sheet2.
Label the sheet UNIVARIATE.
Calculate the variance for MURDER by breaking down the formula where (Xbar) is the mean, (Xi) is each of the samples, and is the number of samples. Enter the following labels in the labeled cells.
A54
N =
A55
Mean =
A56
Variance=
A57
Std Dev=
C54
Sums:
C2
(Xi-Xbar)
D2
(Xi-Xbar)^2
Into the indicated cells, enter the following formulas.
B54
=count(b3: b52)
B55
=sum(b3: b52)/b54
Enter the following formula into C3.
C3
=b3-$b$55
Copy the formula in cell C3 into cells C4 to C52.
Enter the following formula.
D3
=c3^2
Copy the formula in cell D3 into cells D4 to D52.
Enter the following formula.
D54
=sum(d3:d52)
This is the sum of the squared deviations, or the numerator for the variance equation in step 3 above. (Note: Steps 4 to 7 could also be accomplished with the function:
D55
=sumproduct(c3:c52,c3:c52)
Do the two cells contain the same values?
Enter the following formula to calculate the variance for the variable MURDER.
B56
=d54/(b54-1)
The standard deviation is simply the square root of the variance which can be calculated with the following formula.
B57
=sqrt(b56)
Skewness and Kurtosis
Enter the following labels into the cells indicated below
A58
Skewness=
A59
Kurtosis=
E2
(Xi-Xbar)^3
F2
(Xi-Xbar)^4
Enter the following formula.
E3
=c3^3
Copy the formula in cell E3 into cells E4 to E52.
Enter the following formula.
F3
=c3^4
Copy the formula in cell F3 into cells F4 to F52.
Enter the following formula.
E54
=sum(e3:e52)
F54
=sum(f3:f52)
Enter the following formulas.
B58
=(b54/((b54-1)*(b54-2)))*(e54/(b57^3))
B59
=(((b54*(b54+1))/((b54-1)*(b54-2)*(b54-3)))*(f54/(b57^4)))-((3*(b54-1)*(b54-1))/((b54-2)*(b54-3)))
Descriptive Statistics Shortcut:
Excel has built in functions to accomplish much of what was done in the previous section.
Enter the following labels.
A63
Shortcuts:
A64
Min. =
A65
Max. =
A66
Mode =
A67
Median =
A68
Mean =
A69
Variance=
A70
Std Dev.=
A71
Skewness=
A72
Kurtosis=
Enter the following formulas.
B64
=min(b3:b52)
B65
=max(b3:b52)
B66
=mode(b3:b52)
B67
=median(b3:b52)
B68
=average(b3:b52)
B69
=var.s(b3:b52)
B70
=stdev.s(b3:b52)
B71
=skew(b3:b52)
B72
=kurt(b3:b52)
Do these values agree with the values found earlier?
Frequency Distribution
Create a frequency distribution for the variable MURDER. Enter the following into the cells indicated below.
H3
50
H4
100
H5
150
H6
200
I3
0-50
I4
51-100
I5
101-150
I6
151-200
I7
Total
J2
Frequency
To create a frequency distribution you must enter an array formula (a formula that is entered into an array of cells). The intervals for the distribution are in cells I3 to I6. The upper value in the range for each interval is listed in cells H3 to H6. the format of the frequency function is: =FREQUENCY(data_range,interval_range)
To enter an array formula, highlight the cells into which the frequency distribution will be located. In this case highlight cells J3 to J6. Type the following formula (but do not press ENTER when you finish).
=frequency(b3:b52,h3:h6)
After you finish typing in the formula press the following three keys simultaneously: CONTROL, SHIFT, and ENTER. The frequency for each interval should appear in the cells J3 to J6.
To complete the frequency distribution enter the following formula.
J7
=sum(j3:j6)
Bivariate Statistics
Return to the DATA sheet and copy cells A1:A52 into cells A1:A52 of Sheet3
Copy cells D1:E52 from DATA and paste this data for the variables B1:C52 on Sheet3.
Label Sheet3 as BIVARIATE.
Calculate a Pearsonænbsp;product moment correlation coefficient () for the variables ROBBERY and POLICE by breaking down the formula for Pearsonænbsp;which is
Enter the following labels into the indicated cells
A54
N =
A55
Mean =
A56
Std Dev =
A57
Covar XY=
A58
Var X=
A59
Var Y=
A60
Pearson r=
D2
(Xi-Xbar)
E2
(Yi-Ybar)
F2
(Yi-Ybar)(Xi-Xbar)
G2
(Xi-Xbar)^2
H2
(Yi-Ybar)^2
Enter the following formulas.
B54
=count(b3:b52)
B55
=average(b3:b52)
B56
=stdev.s(b3:b52)
C54
=count(c3:c52)
C55
=average(c3:c52)
C56
=stdev.s(c3:c52)
In cell D3 enter the formula
D3
=b3-$b$55
Copy the formula in cell D3 to cells D4 to D52.
Enter the following formula into E3
E3
=c3-$c$55
Copy the formula in cell E3 to cells E4 to E52.
Enter the following formula into F3.
F3
=d3*e3
Copy the formula in cell F3 to cells F4 to F52.
Enter the following formula into G3.
G3
=d3^2
Copy the formula in cell G3 to cells G4 to G52.
Enter the following formula.
H3
=e3^2
Copy the formula in cell H3 to cells H4 to H52.
Enter the following formulas.
B57
=sum(f3:f52)/(b54-1)
B58
=sum(g3:g52)/(b54-1)
B59
=sum(h3:h54)/(b54-1)
B60
=b57/sqrt(b58*b59)
Question 2:A much simpler method of calculating this correlation coefficient is to use the PEARSON function in Excel. Enter the following formula into cell B61
B61
=pearson(b3:b52,c3:c52)
Does the value is B61 agree with the value in B60?
Scatterplot
Create a scatterplot for ROBBERY (as Y) and POLICE (as X). Open the INSERT menu and select the CHART option.
Select the data to plot B2:C52
Select a Scatter chart
Format the chart using the Format tool and set: Chart title: Scattergram of Robbery and Police; X-Axis title: Police; Y-Axis title: Robbery
Turn off the Legend
Once completed please upload your completed excel spreadsheet.
Unformatted Attachment Preview
STATE
MURDER URBAN
POLICE
ROBBERY REGION
AL
132
60
230
1321
3
AK
97
64.5
350
900
4
AZ
103
83.8
310
1936
4
AR
92
51.6
190
809
3
CA
143
91.3
270
3842
4
CO
69
80.6
270
1601
4
CT
47
78.8
260
2180
1
DE
69
70.7
280
1370
3
FL
145
84.3
300
3555
3
GA
138
62.3
240
1976
3
HI
87
86.5
290
1902
4
ID
31
54
240
468
4
IL
106
83
320
2170
2
IN
89
64.2
210
1414
2
IA
22
58.6
200
549
2
KS
69
66.7
230
1131
2
KY
88
50.8
200
952
3
LA
157
68.6
290
1970
3
ME
28
47.5
200
308
1
MD
95
80.3
310
3927
3
MA
41
83.8
290
2355
1
MI
102
70.7
250
2440
2
MN
26
66.8
190
991
2
MS
145
47.3
200
810
3
MO
111
68.1
280
2236
2
MT
40
52.9
240
340
4
NE
44
62.7
220
822
2
NV
20
85.3
360
4606
4
NH
25
52.2
240
420
1
NJ
69
89
350
3037
1
NM
131
72.2
280
1279
4
NY
127
84.6
370
6413
1
NC
106
48
220
823
3
ND
12
48.8
180
77
2
OH
81
73.3
210
2237
2
OK
51
67.3
230
1049
3
OR
51
67.9
240
1524
4
PA
68
69.3
240
1779
1
RI
44
87
280
1186
1
SC
114
54.1
240
1181
3
SD
7
46.4
200
201
2
TN
108
60.4
210
1806
3
TX
169
79.6
240
2085
3
UT
38
84.4
240
802
4
VT
22
33.8
200
389
1
VA
WA
WV
WI
WY
86
55
71
29
62
66
73.6
36.2
64.2
62.8
230
210
180
240
310
1201
1351
485
707
444
3
4
3
2
4
Purchase answer to see full
attachment
Have a similar assignment? "Place an order for your assignment and have exceptional work written by our team of experts, guaranteeing you A results."