Florida International University R Worksheet

Description

As with all submissions, this needs to be submitted as an HTML file, and not as a .RMD file.

Part 1 – Decision Trees

Use the Titanic dataset for these questions. Data is from: https://www.kaggle.com/c/titanic/data?select=train.csv. Metadata is available below:

Data Dictionary

| Variable | Definition | Key |
|----------|------------|-----|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

Question #1

Produce one super awesome visual with this dataset. Explain what this visual shows in 1-2 sentences. Your visual must include a caption and subtitle, in addition to the standard labels.

Question #2

Create a model that can be used to predict the survival of a passenger based on attributes. This is a classification activity. 0 = Did not survive, 1 = Survived

Convert your 0 and 1 to a factor data type to facilitate the creation of a classification model. You probably want to change this to say “Survived” and “Did Not Survive” so that it is easier to read.

Drop the Name, PassengerID, and Ticket columns before building the model (otherwise your model creation will either crash your computer or produce something that doesn’t make sense).
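The two preparation steps above could be sketched in base R as follows. This is only an illustration on a tiny made-up data frame, not the real Kaggle file; the column names (`PassengerId`, `Survived`, `Name`, `Ticket`, and so on) are assumed to match the headers in train.csv, so adjust them if your file differs.

```r
# Toy stand-in for train.csv (made-up rows, NOT real passenger data).
titanic <- data.frame(
  PassengerId = 1:6,
  Survived    = c(0, 1, 1, 0, 1, 0),
  Pclass      = c(3, 1, 3, 1, 2, 3),
  Name        = paste("Passenger", 1:6),
  Sex         = c("male", "female", "female", "male", "female", "male"),
  Age         = c(22, 38, 26, 54, 14, 35),
  Ticket      = paste0("T", 101:106),
  stringsAsFactors = FALSE
)

# Recode 0/1 as a labelled factor so modelling functions treat it as a class.
titanic$Survived <- factor(
  titanic$Survived,
  levels = c(0, 1),
  labels = c("Did Not Survive", "Survived")
)

# Drop identifier-like columns before modelling.
drop_cols <- c("Name", "PassengerId", "Ticket")
titanic <- titanic[, !(names(titanic) %in% drop_cols)]

str(titanic)
```

With the real data you would replace the `data.frame(...)` call with something like `read.csv("train.csv")` and keep the remaining steps unchanged.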

Question #3

Produce a confusion matrix. What does it tell you? 3-5 sentences.

Part 2 – Logistic Regression

Use the same Titanic dataset from the decision trees part.

1. Build a logistic regression model to predict survival for the Titanic dataset.

2. Based on your logistic regression model, which variables do you think are most important for survival?

3. Produce a confusion matrix and explain your findings related to your model. 2-3 sentences (but it’s always OK if you go over).
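One possible shape for the three steps above, using only base R, is sketched below. The data are simulated purely for illustration (the survival process, the coefficients, and the 0.5 cutoff are all assumptions, not properties of the Titanic data), and the confusion matrix is computed on the same rows the model was fitted to.

```r
set.seed(42)
n <- 200

# Simulated stand-in for the cleaned Titanic data (made-up values).
toy <- data.frame(
  Sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  Pclass = factor(sample(1:3, n, replace = TRUE)),
  Age    = round(runif(n, 1, 70))
)

# Made-up survival process, used only to generate example labels.
lin <- 1.5 - 2.5 * (toy$Sex == "male") -
  0.8 * (as.integer(toy$Pclass) - 1) - 0.01 * toy$Age
toy$Survived <- factor(
  rbinom(n, 1, plogis(lin)),
  levels = c(0, 1),
  labels = c("Did Not Survive", "Survived")
)

# 1. Fit the logistic regression (the second factor level is the "success").
fit <- glm(Survived ~ Sex + Pclass + Age, data = toy, family = binomial)

# 2. Inspect coefficient estimates and p-values to judge variable importance.
print(summary(fit)$coefficients)

# 3. Turn predicted probabilities into classes and cross-tabulate.
prob <- predict(fit, type = "response")  # estimated P(Survived)
pred <- factor(
  ifelse(prob > 0.5, "Survived", "Did Not Survive"),
  levels = levels(toy$Survived)
)
cm <- table(Predicted = pred, Actual = toy$Survived)
print(cm)

# Share of rows on the diagonal, i.e. correctly classified.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The diagonal of `cm` counts correct predictions and the off-diagonal cells count the two kinds of error, which is the material a written interpretation would draw on.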

Part 3 – Clustering

Data = Mall_Customers.csv

Metadata = https://www.kaggle.com/shwetabh123/mall-customers

Question #1

Conduct basic exploratory data analysis with the Mall_Customers.csv data set. Create 3 graphs of your choosing. For each, provide a 1-2 sentence summary of what you see.

Question #2

Create clusters that look at both the annual income and spending score (your clustering should only look at these two columns).

Create an elbow plot and write a brief interpretation of 2-3 sentences for it. The explanation should have to do with why you chose a certain value of k.

Make a recommendation for the number of clusters that should be used for this data set.
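An elbow plot of the kind described above can be produced with base R's `kmeans()`. The sketch below runs on synthetic two-column data; the column names `income` and `score` and the three made-up blobs are assumptions for illustration, so swap in the two real columns from Mall_Customers.csv.

```r
set.seed(1)

# Synthetic stand-in for the income and spending-score columns (made-up data).
blob_centers <- data.frame(income = c(25, 60, 90), score = c(20, 50, 80))
toy <- do.call(rbind, lapply(1:3, function(i) {
  data.frame(
    income = rnorm(50, mean = blob_centers$income[i], sd = 5),
    score  = rnorm(50, mean = blob_centers$score[i], sd = 5)
  )
}))

# Total within-cluster sum of squares for each candidate k.
ks  <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot (toy data)")
```

The value of k to recommend is the point where adding another cluster stops producing a noticeable drop in the within-cluster sum of squares.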

Question #3

Plot your best k-means model as a scatter plot with the centroids displayed. Refer to the notes that I provided on this to see how to do it with ggplot2.

Question #4

Write 2-3 sentences explaining what can be done with this new insight if you were in charge of the marketing and sales operation of the mall.

Part 4: 0.10 points of extra credit (Principal Component Analysis)

Data: pokemon-3.csv (pokemon-3-1.csv)

Scenario: you’ve been hired by GameFreak, the makers of the Pokemon games, and they want you to simplify the stats for Pokemon so that younger customers do not need to worry about things like “Attack”, “Speed”, “Special Defense” and “Special Attack”.

GameFreak wants to turn those 5 columns into one stat that captures the majority of the patterns of those 5 columns. As a Business Analyst, you can help them with this using PCA, a dimension reduction technique.
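As a rough sketch of the idea, base R's `prcomp()` can collapse several correlated stat columns into a single first principal component. The data frame below is simulated and its column names are hypothetical; it uses only the stats named in the scenario, so substitute the actual columns from the Pokemon file.

```r
set.seed(7)
n <- 100

# Simulated stats driven by one hidden "overall strength" factor (made-up data).
strength <- rnorm(n)
stats <- data.frame(
  Attack = 70 + 20 * strength + rnorm(n, sd = 8),
  Speed  = 65 + 15 * strength + rnorm(n, sd = 10),
  Sp_Atk = 70 + 18 * strength + rnorm(n, sd = 9),
  Sp_Def = 70 + 16 * strength + rnorm(n, sd = 9)
)

# Center and scale so that no single stat dominates because of its units.
pca <- prcomp(stats, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# The first component's scores serve as the single combined stat.
combined_stat <- pca$x[, 1]
head(combined_stat)
```

The sign of a principal component is arbitrary, so it is worth checking `pca$rotation[, 1]` and flipping the score if higher values should mean a stronger Pokemon.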
