Spark Machine Learning Worksheet
Use the Spark Machine Learning library or scikit-learn (sklearn) to complete this homework.
1. [1 pt] Find a dataset on Kaggle or use one of the following datasets:
https://www.kaggle.com/datasets/dongeorge/seed-from-uci
https://www.kaggle.com/datasets/uciml/iris
2. [1 pt] Write a detailed description of the dataset.
3. [4 pts] Preprocess the dataset.
4. [8 pts] Use the K-means algorithm to cluster the dataset.
5. [5 pts] Use the Elbow method and the Silhouette method to find the optimal K.
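Tasks 3 to 5 above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not a model answer. It assumes the Iris dataset, loaded from scikit-learn's bundled copy rather than the Kaggle CSV, and it assumes standardization is the only preprocessing needed. The helper name evaluate_k, the range of K values, and the random seed are choices made for this example only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Iris copy stands in for the Kaggle CSV.
X = load_iris().data  # 150 rows, 4 numeric features; class labels are not used

# Preprocessing: K-means relies on Euclidean distance, so put every
# feature on the same scale (zero mean, unit variance).
X_scaled = StandardScaler().fit_transform(X)


def evaluate_k(X, k_values, seed=42):
    """Fit K-means for each k; collect inertia (Elbow) and silhouette scores."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        results[k] = {
            "inertia": model.inertia_,  # within-cluster sum of squared distances
            "silhouette": silhouette_score(X, labels),  # mean over all samples
        }
    return results


# The silhouette score needs at least 2 clusters, so start at k=2.
results = evaluate_k(X_scaled, range(2, 9))
for k, scores in results.items():
    print(f"k={k}  inertia={scores['inertia']:.2f}  "
          f"silhouette={scores['silhouette']:.3f}")

# Silhouette method: pick the k with the highest mean silhouette score.
best_k = max(results, key=lambda k: results[k]["silhouette"])
print("k with the highest silhouette score:", best_k)
```

For the Elbow method, plot the inertia values against K and look for the point where the curve stops dropping sharply. The two methods do not always agree, so it is worth reporting both and justifying the final choice of K.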
References:
https://www.youtube.com/watch?v=9SfO9Khjklk
https://www.youtube.com/watch?v=EItlUEPCIzM
https://www.youtube.com/watch?v=QXOkPvFM6NU&list=PLs8w1CdizvZGyT2Rt0ieA0G6xGUqn3Xw&index=2
https://www.youtube.com/watch?v=d7NJGLevmwA
Deliverables:
One PDF file which contains:
o [1 pt] A cover page with names, IDs, title, and date of submission.
o The solution to each of the above questions:
   - code
   - output
o [-2 pts] If the submission is not tidy.