New England College Data Mining and Statistical Modeling Case Study Project
Description
Data for this project can be found here:
https://www.epa.gov/compliance-and-fuel-economy-data/data-cars-used-testing-fuel-economy
Please use the files from 2014 to the present. You will have to merge the files yourself.
Definitions:
https://www.epa.gov/sites/default/files/2016-07/documents/test-car-list-definitions.pdf
You should begin the Final Case Analysis this week. It is due in Week 13, Sunday 11:59 PM EST.
Introduction:
The Environmental Protection Agency (EPA) is responsible for regulating pollutant emissions from all automobiles that run on American roads. You are asked to analyze the data released by the EPA over more than a decade, specifically for three time periods: 2010-12, 2014-16, and 2018-20. This case analysis has several objectives, one of which is to test for and learn about possible changes over time in the amount of pollution emitted by vehicles. You are also asked to analyze similarities between vehicles across the three time periods and to determine empirically whether certain vehicles became more (or less) polluting over the period of study.
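As a starting point for the period comparison described above, the sketch below labels each model year with one of the three study periods. The function name `assign_period` is hypothetical, and it takes a plain numeric vector of years because the name of the year column in the EPA files is not given here. Years outside the three periods (2013 and 2017, for example) are returned as NA rather than forced into a period.

```r
# Hypothetical helper: map model years to the three study periods.
# Years that fall outside all three periods become NA.
assign_period <- function(year) {
  period <- rep(NA_character_, length(year))
  period[year %in% 2010:2012] <- "2010-12"
  period[year %in% 2014:2016] <- "2014-16"
  period[year %in% 2018:2020] <- "2018-20"
  # An ordered set of levels keeps the periods in time order in tables and plots.
  factor(period, levels = c("2010-12", "2014-16", "2018-20"))
}

# Example with made-up years, not EPA data.
table(assign_period(c(2010, 2011, 2013, 2015, 2020)), useNA = "ifany")
```

The resulting factor can then be used as a grouping variable, for instance in `aggregate()` or `tapply()`, when comparing emissions across periods.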
You will analyze various aspects of vehicle-induced pollution using R programming. You are expected to submit your findings in report format. The report must be at least 20 pages long, with written descriptions and explanations of your findings for the questions asked below. Make sure to run all code using R Markdown and create a formal report with your remarks, comments, or explanations embedded within the document.
You are given nine years of individual EPA data in CSV format. The data files are not very large (each file is approx. 1 MB). Each yearly file contains thousands of vehicles along with their vital information and pollution testing records. Each file contains 42 columns, the details of which are given in the Data Dictionary document. Please note that the original data had more columns, some of which were removed for consistency. The removed columns still appear in the data dictionary; ignore them when referring to it.
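Since the yearly files have to be merged by hand, here is a minimal sketch of one way to stack them in base R. The function name `merge_epa_files`, the added `source_file` column, and the folder path in the usage line are all assumptions made for illustration. The sketch also assumes every file has identical column names, as the brief suggests, and stops with an error if that is not the case.

```r
# Hypothetical helper: read every CSV in `data_dir` and stack the rows
# into a single data frame. Assumes all files share the same columns.
merge_epa_files <- function(data_dir, pattern = "\\.csv$") {
  paths <- list.files(data_dir, pattern = pattern, full.names = TRUE)
  if (length(paths) == 0) stop("No CSV files found in ", data_dir)

  yearly <- lapply(paths, function(p) {
    df <- read.csv(p, stringsAsFactors = FALSE, check.names = FALSE)
    df$source_file <- basename(p)  # record which file each row came from
    df
  })

  # Fail early if any file's column names differ from the first file's.
  ref_cols <- names(yearly[[1]])
  same <- vapply(yearly, function(df) identical(names(df), ref_cols), logical(1))
  if (!all(same)) {
    stop("Column names differ in: ", paste(basename(paths[!same]), collapse = ", "))
  }

  do.call(rbind, yearly)
}
```

A call such as `all_years <- merge_epa_files("data/epa")` (the path is a placeholder) would return one data frame; checking `dim(all_years)` and `table(all_years$source_file)` afterwards is a quick way to confirm that every yearly file was read.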
There are three sections to this case study: Merging and cleaning (20 points), Data Analysis (60 points), and Visualization (20 points), totaling 100 points.
Please note that all code assignments must be submitted as a screenshot with a slice of your desktop showing the timestamp.