Kaggle Competition Using Regression Models in R Worksheet
Description
What makes some songs become popular? The dataset describes popular songs based on auditory features such as loudness and tempo.
Goal
Construct a model from a dataset of popular songs that predicts ratings from the auditory features of the songs in scoringData.csv. (You may use linear regression, logistic regression, feature selection such as Lasso, decision trees, or more advanced tree-based models.)
Metric
Submissions will be evaluated on RMSE (root mean squared error). The lower the RMSE, the better the model.
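As a minimal sketch of the metric, RMSE is the square root of the mean squared difference between actual and predicted ratings. The helper name `rmse` and the toy vectors below are illustrative assumptions and are not part of the competition files.

```r
# Hypothetical helper: root mean squared error between actual and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy values for illustration only (not competition data)
actual    <- c(30, 40, 50)
predicted <- c(32, 38, 50)
rmse(actual, predicted)  # equals sqrt((4 + 4 + 0) / 3)
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.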
Submission File
The submission file should be in text format (.csv) with only two columns, id and rating. The rating column must contain the predicted rating. The number of decimal places to use is up to you. The file should contain a header and have the following format:
"id","rating"
50400,37.3065
96747,37.1732
1824,36.9784
67597,36.9780
86944,36.8176
85423,37.0173
An example submission file (example_submission.csv) is shared with the set of files under Data.
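The format described above can be checked in R before uploading. The following is a rough sketch that writes a toy submission to a temporary file and reads it back. The toy rows are copied from the example above, and the uniqueness check on id is an assumption rather than a stated rule.

```r
# Toy submission built from rows shown in the example (illustration only)
submission <- data.frame(id = c(50400, 96747, 1824),
                         rating = c(37.3065, 37.1732, 36.9784))
path <- tempfile(fileext = ".csv")
write.csv(submission, path, row.names = FALSE)

# Read the file back and check the expected layout
check <- read.csv(path)
stopifnot(
  identical(names(check), c("id", "rating")),  # exactly two columns, in this order
  is.numeric(check$rating),                    # ratings are numeric
  !anyNA(check$rating),                        # no missing predictions
  !anyDuplicated(check$id)                     # assumption: each id appears once
)
header <- readLines(path, n = 1)  # header line as written to disk
header
```

Passing `row.names = FALSE` matters here: without it, write.csv adds an unnamed first column of row names, which breaks the two-column layout.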
Sample Code
Here is an illustration in R of how you can create a model and apply it to scoringData.csv to prepare a submission file.
# ensure analysisData.csv and scoringData.csv are in your working directory
# the following code reads the data and constructs a simple model
songs = read.csv('analysisData.csv')
model = lm(rating ~ tempo + time_signature, data = songs)
# read in scoring data and apply model to generate predictions
scoringData = read.csv('scoringData.csv')
pred = predict(model, newdata = scoringData)
# construct submission from predictions
submissionFile = data.frame(id = scoringData$id, rating = pred)
# write FALSE in full: F is an ordinary variable that can be overwritten
write.csv(submissionFile, 'sample_submission.csv', row.names = FALSE)
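Since scoringData.csv has no ratings to compare against, one way to estimate RMSE before submitting is to hold out part of the analysis data. The sketch below uses simulated data as a stand-in for analysisData.csv, so the column values, the 70/30 split, and the seed are all assumptions made for illustration.

```r
set.seed(1)
# Simulated stand-in for analysisData.csv (values are made up)
n <- 200
songs <- data.frame(
  tempo = runif(n, 60, 180),
  time_signature = sample(3:5, n, replace = TRUE)
)
songs$rating <- 35 + 0.01 * songs$tempo + rnorm(n, sd = 2)

# Hold out 30% of rows to estimate RMSE on data the model has not seen
test_idx <- sample(n, size = round(0.3 * n))
train <- songs[-test_idx, ]
test  <- songs[test_idx, ]

# Fit on the training rows only, then score the held-out rows
model <- lm(rating ~ tempo + time_signature, data = train)
pred  <- predict(model, newdata = test)
holdout_rmse <- sqrt(mean((test$rating - pred)^2))
holdout_rmse
```

With the real files, `songs` would come from read.csv('analysisData.csv'). The holdout RMSE is only an estimate of the score a submission would receive.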
* Disclaimer: This data is to be used solely for the purpose of the Project for this course. It is not recommended for any use outside of this competition.
Submission Count:
By the end of the competition, you must have at least 3 submissions. At least one must use a forest ranger model.
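Assuming "forest ranger model" refers to a random forest fit with R's ranger package (an interpretation, not something the worksheet confirms), a minimal sketch could look like the following. The data are simulated stand-ins for the competition files, and the column names and `num.trees` value are illustrative choices.

```r
library(ranger)  # assumes the ranger package is installed

set.seed(1)
# Simulated stand-ins for analysisData.csv and scoringData.csv (made-up values)
n <- 200
analysis <- data.frame(tempo = runif(n, 60, 180),
                       loudness = runif(n, -30, 0))
analysis$rating <- 35 + 0.01 * analysis$tempo + 0.1 * analysis$loudness + rnorm(n)
scoring <- data.frame(id = 1:10,
                      tempo = runif(10, 60, 180),
                      loudness = runif(10, -30, 0))

# Fit a random forest; num.trees = 500 is an arbitrary starting value
forest <- ranger(rating ~ tempo + loudness, data = analysis,
                 num.trees = 500, seed = 1)

# predict() on a ranger fit returns an object; predictions are in $predictions
pred <- predict(forest, data = scoring)$predictions
submission <- data.frame(id = scoring$id, rating = pred)
head(submission)
```

Note that predict() for a ranger fit takes the new data through the `data` argument, not `newdata` as with lm.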
Attached File Descriptions:
- analysisData.csv: Data for building a model
- scoringData.csv: Use for applying predictions or scoring
- example_submission.csv: Sample submission file in the desired format