Crate and Barrel
Description
We are going to use the datasets from the Kaggle Competition Otto Group Product Classification Challenge. You will need to register with Kaggle to get the data. Download both the training and test sets.
The following is the text from that page:
The Otto Group is one of the world's biggest e-commerce companies, with subsidiaries in more than 20 countries, including Crate & Barrel (USA), Otto.de (Germany) and 3 Suisses (France). We are selling millions of products worldwide every day, with several thousand products being added to our product line.
A consistent analysis of the performance of our products is crucial. However, due to our diverse global infrastructure, many identical products get classified differently. Therefore, the quality of our product analysis depends heavily on the ability to accurately cluster similar products. The better the classification, the more insights we can generate about our product range.
For this competition, we have provided a dataset with 93 features for more than 200,000 products. The objective is to build a predictive model which is able to distinguish between our main product categories. The winning models will be open sourced.
Submission
You will submit your solution to Kaggle using the Late Submission button. In your GitHub repository, place your code and your Kaggle results (an image is fine). If your code is a Python Jupyter Notebook, make sure it includes a writeup explaining your solution. If your code is just a source code file, add a README file outlining and discussing your solution.
XP
XP is based on having a solution that works, how you place in Kaggle, and how you place relative to others in the class.
Projects allow you to explore a dataset and experiment with the techniques covered in
the labs: can you apply what you learned to a new dataset? For all projects you are to
create a Jupyter Notebook containing both your code and your commentary.
The commentary is more than just comments on the code. When I normally code, I only
comment what is unclear, meaning I would never have comments like:
# import the pandas library
import pandas as pd
# read in the data file
bach = pd.read_csv('../data/bach.csv')
# get the list of columns
columns = list(bach.columns)
but I probably would comment something like:
# convert the categorical columns to boolean
for note in notes:
    bach[note] = bach[note].apply(lambda x: x == 'YES')
bach
However, for these project commentaries I want the redundancy shown above. Say what
you are doing in markdown then do it in code. I think this is a good pedagogical
approach and will help you step logically through a problem.
A good starting point for what a notebook could look like is Nadin Tamer’s
notebook Titanic Survival Predictions (Beginner). She even gives a nice outline
explaining the steps she will take.
Notice that part of her notebook involves looking at the data. This is a crucial step. Are
there missing values? Are there categorical columns that need to be one-hot encoded?
The thought processes in your head should be recorded in the markdown sections of
the notebook.
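As one possible way to put those two questions to a dataset, here is a minimal sketch using pandas. The helper names summarize_columns and non_numeric_columns and the toy frame are illustrative assumptions, not part of the assignment; swap in your own DataFrame.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Columns that are neither numeric nor boolean, i.e. candidates for one-hot encoding."""
    return df.select_dtypes(exclude=["number", "bool"]).columns.tolist()


# Toy data standing in for a real training file (hypothetical values).
toy = pd.DataFrame({
    "feat_1": [1, 0, 3, None],
    "feat_2": [0, 2, 2, 5],
    "color": ["red", "blue", None, "red"],
})

summary = summarize_columns(toy)
print(summary)

# One-hot encode whatever turned out to be categorical.
cat_cols = non_numeric_columns(toy)
encoded = pd.get_dummies(toy, columns=cat_cols)
print(encoded.columns.tolist())
```

In a notebook, the markdown cell before this would say what you expect to find, and the cell after it would record what you actually found and what you decided to do about it.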
Once you do all this preprocessing, you may want to just build a basic classifier and get
an initial accuracy measure on the test set. This just verifies that the data format is good
and gives you a baseline accuracy before you start investigating ways of making
improvements. Your work does not stop when you make the first classifier.
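A baseline of this kind might look like the sketch below. It assumes scikit-learn and uses synthetic data from make_classification in place of your preprocessed features; the choice of logistic regression is only an example, not a recommended model. Because a Kaggle test file typically comes without labels, the sketch holds out part of the labeled data to measure accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a preprocessed feature matrix X and label vector y.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hold out a quarter of the labeled rows, keeping class proportions the same.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Floor: always predict the most frequent class seen in training.
floor = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
floor_acc = accuracy_score(y_valid, floor.predict(X_valid))

# First real model: plain logistic regression with default regularization.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_acc = accuracy_score(y_valid, model.predict(X_valid))

print(f"majority-class accuracy:      {floor_acc:.3f}")
print(f"logistic regression accuracy: {model_acc:.3f}")
```

Comparing against the majority-class floor shows whether the first model has learned anything at all, and every later experiment can be judged against both numbers.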
Projects are intentionally vague. I want you to engage your brain and not simply convert
a set of instructions I write into code. Again, this is your chance to explore a dataset and
experiment with techniques from the labs.