Answer Project
Description
You need to use data from https://www.kaggle.com/.
Is it possible to get a dataset that meets the following requirement? ‘Data Collection: Students will collect two sets of data from the real world. Set 1 will be collected from a large number of observations (at least 100) for a continuous random variable from a population that is suspected to be Normally distributed. Examples of such data include the body weight of adult males, the circumferences of oranges, the extension length of rubber bands at the point at which they burst, etc. Set 2 will be the inter-arrival time of a sequence of 100 or more events. First, record the actual clock time (to the nearest second, e.g. 2:43:18pm) of each of at least 100 consecutive events, such as the actual time that a customer enters the post office. Then, determine the interval between occurrences by taking the difference between successive event times. Consequently, Set 2 will comprise at least 99 inter-arrival times. You may use ‘second’ as a unit of time.’ (Task 1)
Task 2
Chi-Square Goodness-of-Fit Test: Using a Chi-Square Goodness of Fit Test with a significance level of 0.05, test the hypothesis that Set 1 is sampled from a Normal Distribution with a population mean equal to the sample mean and a population standard deviation equal to the sample standard deviation. Similarly, test the hypothesis with a significance level of 0.05 that Set 2 is sampled from an Exponential Distribution with a population mean equal to the sample mean. For each test, start with the data classes from your histogram and merge them to ensure each class has a sufficient number of observations. Then, for each data class, calculate the following:
- Number of observations in the data.
- Class probability.
- Class expected value.
- Chi-square component values.
Fall 2022
Due: Friday, Nov 25th by 11:59pm via Canvas
Project submission is individual and should not be shared with other classmates. Any form of
copying and pasting from other sources and projects will be reported to the UT Arlington
Office of Student Conduct.
Aim: The overall aim of these projects is to analyze real-world data. The specific objectives are:
1. To sample two sets of data from the real world.
2. To summarize each set of data statistically.
3. To perform statistical chi-square tests on each set of data.
4. To describe the above steps, data, and results in a report.
On the cover of each Project Part report, please transcribe the following statement:
"_________________ did not give or receive any assistance on this project, and
the report submitted is wholly my own."
Write your name in the blank and sign below it. You may use an electronic signature, such as
Adobe Sign.
Tasks for Part I
Data Collection: Students will collect two sets of data from the real world. Set 1 will be
collected from a large number of observations (at least 100) for a continuous random variable
from a population that is suspected to be Normally distributed. Examples of such data include the
body weight of adult males, the circumferences of oranges, the extension length of rubber bands
at the point at which they burst, etc. Set 2 will be the inter-arrival time of a sequence of 100 or
more events. First, record the actual clock time (to the nearest second, e.g. 2:43:18pm) of each
of at least 100 consecutive events, such as the actual time that a customer enters the post office.
Then, determine the interval between occurrences by taking the difference between successive
event times. Consequently, Set 2 will comprise at least 99 inter-arrival times. You may use
‘second’ as a unit of time.
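The differencing step for Set 2 can be sketched in Python as follows. This is a minimal illustration, not part of the assignment: the function name `inter_arrival_seconds`, the time format string, and the four sample times are assumptions made up for the example, and it assumes all events were recorded in order on the same day.

```python
from datetime import datetime


def inter_arrival_seconds(clock_times, fmt="%I:%M:%S%p"):
    """Return the gaps, in seconds, between successive clock times.

    Assumes every time falls on the same day and the list is already
    in chronological order (both are assumptions of this sketch).
    """
    parsed = [datetime.strptime(t.strip().upper(), fmt) for t in clock_times]
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(parsed, parsed[1:])
    ]


# Hypothetical arrival times, for illustration only (not collected data).
example_times = ["2:43:18pm", "2:43:51pm", "2:45:02pm", "2:45:10pm"]
example_gaps = inter_arrival_seconds(example_times)
print(example_gaps)  # -> [33.0, 71.0, 8.0]
```

Four recorded times give three intervals, which is why at least 100 events yield at least 99 inter-arrival times.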
Descriptive Statistics: For both Sets 1 and 2, use software to do the following:
- Calculate the sample mean and sample standard deviation.
- Calculate the quartiles Q1, Q2, and Q3.
- Construct a box-and-whisker plot.
- Construct a frequency table.
- Construct a frequency histogram.
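The numerical parts of this list could be computed along the following lines. This is a sketch using only the Python standard library; the function name `describe`, the choice of equal-width classes, and the default of 8 classes are assumptions for illustration. Quartile conventions differ between tools, so the values may not match another package exactly. The box plot and histogram would be drawn from these numbers with a plotting tool of your choice.

```python
import statistics


def describe(data, n_classes=8):
    """Summarize a sample: mean, sample SD, quartiles, frequency table.

    Assumes the data are not all identical (the class width must be > 0).
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method

    # Equal-width classes spanning the observed range (an assumed layout).
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for x in data:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((x - lo) / width), n_classes - 1)
        counts[idx] += 1
    table = [(lo + i * width, lo + (i + 1) * width, c)
             for i, c in enumerate(counts)]

    return {"mean": mean, "sd": sd, "quartiles": (q1, q2, q3),
            "frequency_table": table}


# Hypothetical values, for illustration only (not collected data).
summary = describe([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n_classes=3)
print(summary["mean"], summary["quartiles"])
for lower, upper, count in summary["frequency_table"]:
    print(f"{lower:.1f} to {upper:.1f}: {count}")
```

The frequency table returned here holds the class counts that a frequency histogram would display as bar heights.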
Report: The project report is to be typewritten in clear English with complete sentences. Be sure
to define all notations and include descriptions of all tables and figures in the text. To improve
your writing, you should consider taking your report to the UTA Writing Center. Your report
should include a cover page, the following sections, and two appendices:
I. Data. Describe the data collection process for Sets 1 and 2 with enough detail that the reader
could replicate the process. Appendices I and II should include tables of your raw data for
Sets 1 and 2, respectively. The raw data for Set 2 should consist of the recorded actual clock
times.
II. Descriptive Statistics: Include and explain your descriptive statistics analysis. Interpret the
results of the analysis using your data application topic. Does Set 1 appear to follow a
Normal Distribution? Does Set 2 appear to follow an Exponential Distribution?
Tasks for Part II
Chi-Square Goodness-of-Fit Test: Using a Chi-Square Goodness of Fit Test with a significance
level of 0.05, test the hypothesis that Set 1 is sampled from a Normal Distribution with a
population mean equal to the sample mean and a population standard deviation equal to the
sample standard deviation. Similarly, test the hypothesis with a significance level of 0.05 that Set
2 is sampled from an Exponential Distribution with a population mean equal to the sample mean.
For each test, start with the data classes from your histogram and merge them to ensure each
class has a sufficient number of observations. Then, for each data class, calculate the following:
- Number of observations in the data.
- Class probability.
- Class expected value.
- Chi-square component values.
Finally, for each test, calculate the chi-square value, describe the degrees of freedom, and explain
your conclusion.
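The test procedure described above might be sketched as follows. Several details are assumptions rather than requirements from the handout: the merge rule (combine neighbouring classes until each expected frequency is at least 5), the degrees of freedom taken as the number of classes minus 1 minus the number of parameters estimated from the sample (2 for the Normal fit, 1 for the Exponential fit), and all function names. The critical values are the standard upper 5% points of the chi-square distribution.

```python
import math
from statistics import NormalDist, mean, stdev

# Upper 5% critical values of the chi-square distribution, keyed by df.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def fitted_normal_cdf(data):
    """CDF of a Normal with the sample mean and sample SD."""
    return NormalDist(mean(data), stdev(data)).cdf


def fitted_exponential_cdf(data):
    """CDF of an Exponential whose mean equals the sample mean."""
    m = mean(data)

    def cdf(x):
        return 1.0 - math.exp(-x / m) if x > 0 else 0.0
    return cdf


def chi_square_gof(data, edges, cdf, n_estimated, min_expected=5.0):
    """Return (table, chi2, df) for classes split at the given edges.

    Classes are (-inf, e0], (e0, e1], ..., (ek, +inf). Each table row is
    (lower, upper, observed, probability, expected, component).
    """
    n = len(data)
    bounds = [-math.inf, *edges, math.inf]
    classes = []
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in data if lo < x <= hi)
        p_lo = 0.0 if lo == -math.inf else cdf(lo)
        p_hi = 1.0 if hi == math.inf else cdf(hi)
        classes.append([lo, hi, observed, p_hi - p_lo])

    # Merge left to right until every class reaches min_expected.
    merged = []
    for cls in classes:
        if merged and n * merged[-1][3] < min_expected:
            merged[-1][1] = cls[1]
            merged[-1][2] += cls[2]
            merged[-1][3] += cls[3]
        else:
            merged.append(cls)
    # A too-small final class is folded back into its left neighbour.
    if len(merged) > 1 and n * merged[-1][3] < min_expected:
        last = merged.pop()
        merged[-1][1] = last[1]
        merged[-1][2] += last[2]
        merged[-1][3] += last[3]

    table = []
    for lo, hi, o, p in merged:
        e = n * p
        table.append((lo, hi, o, p, e, (o - e) ** 2 / e))
    chi2 = sum(row[5] for row in table)
    df = len(table) - 1 - n_estimated
    return table, chi2, df


# Synthetic, evenly spaced Normal quantiles (not collected data).
demo = [NormalDist(50, 10).inv_cdf((i + 0.5) / 200) for i in range(200)]
table, chi2, df = chi_square_gof(demo, [35, 40, 45, 50, 55, 60, 65],
                                 fitted_normal_cdf(demo), n_estimated=2)
verdict = "reject H0" if chi2 > CHI2_CRIT_05[df] else "fail to reject H0"
print(f"chi2 = {chi2:.3f}, df = {df}, conclusion: {verdict}")
```

For Set 2, the same function would be called with `fitted_exponential_cdf(data)` and `n_estimated=1`. The class edges should come from your own histogram rather than the values used in this demo.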
EXAMPLE SETUP
Class | Observed Frequency (o_i) | Class Probability | Expected Frequency (e_i) | χ² Class Component
X ≤ 2; 2 < X ≤ 12 | Count observations based on your collected data. | Calculate using the assumed probability distribution. | For each class, take its probability and multiply by n. | (o_i − e_i)² / e_i
Total | n | 1.0 | n | χ² statistic
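As a sketch of how the Class Probability column could be filled in, the probability of a class is the difference of the assumed cumulative distribution function at its two boundaries. The symbols here are assumptions introduced for illustration: a_i and b_i are the lower and upper boundaries of class i, F is the assumed cumulative distribution function, Φ is the standard Normal cumulative distribution function, and x̄ and s are the sample mean and sample standard deviation.

```latex
\begin{align*}
p_i &= F(b_i) - F(a_i), \qquad e_i = n\,p_i \\
\text{Set 1 (Normal):}\quad F(x) &= \Phi\!\left(\frac{x - \bar{x}}{s}\right) \\
\text{Set 2 (Exponential):}\quad F(x) &= 1 - e^{-x/\bar{x}}, \quad x \ge 0
\end{align*}
```

For the first class the lower boundary is treated as the start of the distribution's range, so F(a_i) = 0, and for the last class F(b_i) = 1. This makes the class probabilities sum to 1.0, as in the Total row.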
Report: The project report is to be typewritten in clear English with complete sentences. Be sure
to define all notations and include descriptions of all tables and figures in the text. To improve
your writing, you should consider taking your report to the UTA Writing Center. Your report
should include a cover page and the following additional section:
Goodness-of-Fit Tests: Describe the chi-square tests with tables for the calculated values and
clearly stated conclusions. Show the Excel formulas for your table calculations in an Appendix.
Sample Format
PROJECT PART I (INTER-ARRIVAL TIMES)
RAW DATA
Data Values: Arrival time of customers at a Bank
Interval (Inter-arrival time): Time difference between the
arrival times of two successive customers (in minutes or
seconds)
Data Set:
Time | Interval | Time | Interval | Time | Interval
.    | .        | .    | .        | .    | .
Mean:
Standard Deviation:
Frequency Table
Histogram
BOX PLOT Chart
Quartile Values of X :
Q1:
Q2:
Q3:
(SAMPLE FORMAT)
PROJECT PART II
Expanded Frequency Table:
INTERVAL | OBSERVED FREQUENCY | CLASS PROBABILITY
The class probabilities are:
Sample Mean: and Sample Standard Deviation:
FORMULAS TO CALCULATE P(X