Locations clustering proof-of-concept test task.
We have a dataset of around 60k location addresses, each with city, road, house_number, postcode and state fields: Download the Locations dataset
The candidate is expected to create a proof-of-concept solution in Python that clusters (or simply groups) the addresses that refer to the same place (the same address). The candidate is free to use any machine learning model or hand-crafted approach to show how they would solve this problem.
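As a rough illustration of the hand-crafted end of that spectrum, the sketch below normalizes each field and groups rows whose normalized fields match exactly. It is only one possible baseline, not the expected answer. The field names come from the dataset description above; the `ABBREVIATIONS` table and the function names are hypothetical and would need tuning against the real data.

```python
import re
from collections import defaultdict

# Field names as listed in the dataset description.
FIELDS = ("house_number", "road", "city", "state", "postcode")

# Hypothetical abbreviation table (an assumption, not derived from the data).
# Note that "st" can also mean "saint", so check this against real rows.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(value):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    text = re.sub(r"[^\w\s]", " ", (value or "").lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def address_key(row):
    """Build a comparable key from the normalized address fields of one row."""
    return tuple(normalize(row.get(field)) for field in FIELDS)


def group_addresses(rows):
    """Return lists of row indices; each list holds rows sharing the same key."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[address_key(row)].append(index)
    return list(groups.values())
```

Exact matching on a normalized key will miss typos and reordered tokens; a fuzzy string comparison or a learned model could replace `address_key` where that matters.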
Instructions:
- Fork this repo or create a new public GitHub repo for the solution.
- The solution should read the location data CSV file and write the grouped/clustered result to an output CSV file.
- There's no restriction on how the candidate approaches the problem.
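The CSV-in, CSV-out requirement could be met with something like the following minimal sketch, which uses only the standard library. The `cluster_id` output column, the function names, and the file names in the usage note are assumptions made for illustration; the task does not prescribe an output layout. The grouping rule is passed in as `key_fn`, so any approach can be plugged in.

```python
import csv


def simple_key(row):
    """Placeholder grouping rule: exact match on trimmed, lowercased fields."""
    fields = ("house_number", "road", "city", "state", "postcode")
    return tuple((row.get(field) or "").strip().lower() for field in fields)


def cluster_csv(input_path, output_path, key_fn=simple_key):
    """Copy every input row to output_path, adding a cluster_id column.

    Rows for which key_fn returns the same value receive the same cluster_id.
    Returns the number of distinct clusters found.
    """
    cluster_ids = {}
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["cluster_id"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            # A new key gets the next unused integer id.
            row["cluster_id"] = cluster_ids.setdefault(key_fn(row), len(cluster_ids))
            writer.writerow(row)
    return len(cluster_ids)
```

A call such as `cluster_csv("locations.csv", "clusters.csv")` (both file names hypothetical) would stream the input once, which is comfortable for a file of around 60k rows.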
Please email long@shake.io if you have any questions.
Good luck!