
Ms 2

The document outlines the IPC 144 Project - Milestone 2, detailing three problems related to Olympic medal data analysis. Each problem includes a description, example parameters, expected results, and the methodology used to derive the solutions. Additionally, it lists common tasks across the problems, data storage declarations, and function prototypes needed for the program.

Uploaded by

seboxe1725
IPC 144 Project - Milestone 2

Name: Bansi Prajapati


Student number: 162011233
Seneca Username: bhprajapati

Project Problems:

1. Get your problems by logging onto matrix (if you did not do MS 1)
2. Then run the program: ~catherine.leung/getproject
Note that this is completely individualized to you. Your classmates will
have a different set. Your profs will check to make sure you are doing
the set of problems assigned to you

Copy and paste the output of getproject here:

Look at the files provided to you in the olympic.zip file (opening the
files in Excel is highly recommended) and answer the following questions.
See milestone 1 specs for clarification.

Problem 1 (Easy level):

a) What is the problem you are doing for this part (copy the text of
the problem, not just the problem number from the specs)?

Given a year, and whether it was a winter or summer olympics, which
country won the most silver medals. Note: ties are possible, if there is a tie,
all countries that tie are part of the answer

b) Example for your problem. Provide the necessary parameters and
what the expected result is

• Year: 2004
• Type: Summer Olympics
• Expected Result: The United States won the most silver medals, with
a total of 39.

c) How did you work out what the solution is? Describe your process
1. Opened the dataset Olympic_Games_Medal_Tally.csv to analyse the
medal distribution.
2. Filtered the dataset for the year 2004 and for entries corresponding to
the Summer Olympics.
3. Identified the maximum number of silver medals won by any country in
this filtered dataset.
4. Extracted the country or countries associated with this maximum
value.
5. Verified the result to ensure the country with the highest silver medal
count is accurately identified.
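
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are plain dictionaries, the key names (year, season, country, silver) are hypothetical rather than the real column names of Olympic_Games_Medal_Tally.csv, and most_silver is a made-up helper name. The sample rows are copied from the 2004 Summer example in Problem 3, not from the full dataset.

```python
def most_silver(rows, year, season):
    """Return (max_silver, [countries]) for one Games; all tied countries are kept."""
    # Step 2: keep only the rows for the requested year and Olympics type.
    games = [r for r in rows if r["year"] == year and r["season"] == season]
    if not games:
        return 0, []
    # Step 3: find the largest silver count in the filtered rows.
    best = max(r["silver"] for r in games)
    # Step 4: collect every country that reaches that count (handles ties).
    winners = sorted(r["country"] for r in games if r["silver"] == best)
    return best, winners


# Assumed key names; values copied from the 2004 Summer example in Problem 3.
sample = [
    {"year": 2004, "season": "Summer", "country": "United States", "silver": 39},
    {"year": 2004, "season": "Summer", "country": "Russian Federation", "silver": 26},
    {"year": 2004, "season": "Summer", "country": "Australia", "silver": 16},
    {"year": 2004, "season": "Summer", "country": "Germany", "silver": 16},
]

print(most_silver(sample, 2004, "Summer"))  # (39, ['United States'])
```

Returning a list of countries rather than a single name is what lets the same function report ties, as the problem statement requires.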

Problem 2 (Intermediate level):

a) What is the problem you are doing for this part (copy the text of
the problem, not just the problem number from the specs)?

Given a name string, how many unique athletes with name string in
their name won a medal in the olympics. Note the name string isn't a
full name. For example, name string might be "Tan" and anyone who
has Tan as a first, middle or last name should be counted. List the
athletes.

b) Example for your problem. Provide the necessary parameters and
what the expected result is
• Name string: "an"
• Expected Result: 8,679 unique athletes whose names contain the string
"an" won a medal. Examples of such athletes include:
1. Dan Carroll
2. Mannie McArthur
3. Frank Smith
4. Eddie Mandible
5. Paddy Moran

c) How did you work out what the solution is? Describe your process
1. I inspected the dataset to identify relevant columns. The dataset
includes an athlete column with full names and a medal column
indicating whether a medal was won.
2. I filtered rows where:
• The athlete name contains the string "an" (case insensitive).
• The medal column is not empty (indicating a medal was won).
3. I identified unique values in the athlete column from the filtered data
and counted them.
4. The final output includes the count of 8,679 unique athletes and a
sample list of names.
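
A minimal sketch of this process in Python follows. It assumes each row is a dictionary with hypothetical keys athlete and medal; the real column names may differ. The first two names come from the example list above, while "Ann Example" and "Joe Bloggs" are invented rows added only to show the filters working.

```python
def find_athletes(rows, name_string):
    """Return (count, sorted names) of medal winners whose name contains name_string."""
    needle = name_string.casefold()  # case-insensitive comparison
    matches = {
        r["athlete"]
        for r in rows
        # Keep rows with a non-empty medal and a name containing the substring.
        if (r.get("medal") or "").strip() and needle in r["athlete"].casefold()
    }
    names = sorted(matches)  # the set removes repeated athletes
    return len(names), names


sample = [
    {"athlete": "Dan Carroll", "medal": "Gold"},
    {"athlete": "Dan Carroll", "medal": "Silver"},  # same athlete, counted once
    {"athlete": "Paddy Moran", "medal": "Bronze"},
    {"athlete": "Ann Example", "medal": ""},        # invented row: no medal won
    {"athlete": "Joe Bloggs", "medal": "Gold"},     # invented row: no "an" in name
]

print(find_athletes(sample, "AN"))  # (2, ['Dan Carroll', 'Paddy Moran'])
```

One caveat with this approach: uniqueness is decided by the name text alone, so two different athletes who share a name would be counted once. If the dataset has an athlete ID column, that would be a safer key.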

Problem 3 (Hard level):

a) What is the problem you are doing for this part (copy the text of
the problem, not just the problem number from the specs)?

Given a year and whether it was summer or winter olympics produce a
histogram of the 10 top ranked countries based on the number of gold
medals won. Break ties with number of silvers, then number of bronze

b) Example for your problem. Provide the necessary parameters
and what the expected result is

Parameters:
• Year: 2004
• Olympics Type: Summer
Expected Result:

Rank  Country                      Gold  Silver  Bronze
1     United States                36    39      26
2     People's Republic of China   32    17      14
3     Russian Federation           28    26      36
4     Australia                    17    16      17
5     Japan                        16    9       12
6     Germany                      13    16      20
7     France                       11    9       13
8     Italy                        10    11      11
9     Republic of Korea            9     12      9
10    Great Britain                9     9       12

c) How did you work out what the solution is? Describe your process
1. Filter the dataset to include only records from the 2004 Summer
Olympics.
2. Sort the countries in descending order based on the number of gold
medals. If there are ties, use silver medals, followed by bronze medals,
as the tiebreakers.
3. Select the top 10 ranked countries.
4. Create a histogram to show the distribution of gold medals among
these top 10 countries.
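
The ranking and histogram steps could look roughly like the sketch below. It assumes the rows have already been filtered to one Games, uses hypothetical dictionary keys (country, gold, silver, bronze), and draws the histogram as text with one asterisk per gold medal, which is only one possible presentation. The helper names top_countries and text_histogram are made up for this example. The data is the 2004 Summer table above, deliberately out of order.

```python
def top_countries(rows, top_n=10):
    """Rank by gold, then silver, then bronze (all descending); keep the top_n."""
    ranked = sorted(
        rows,
        key=lambda r: (r["gold"], r["silver"], r["bronze"]),
        reverse=True,
    )
    return ranked[:top_n]


def text_histogram(rows, title):
    """Build a text histogram: one line per country, one '*' per gold medal."""
    lines = [title]
    for rank, r in enumerate(rows, start=1):
        bar = "*" * r["gold"]
        lines.append(f"{rank:>2} {r['country']:<28} {bar} {r['gold']}")
    return "\n".join(lines)


# (country, gold, silver, bronze) from the 2004 Summer table, shuffled on purpose.
table = [
    ("Great Britain", 9, 9, 12),
    ("Japan", 16, 9, 12),
    ("United States", 36, 39, 26),
    ("Republic of Korea", 9, 12, 9),
    ("Italy", 10, 11, 11),
    ("People's Republic of China", 32, 17, 14),
    ("France", 11, 9, 13),
    ("Germany", 13, 16, 20),
    ("Australia", 17, 16, 17),
    ("Russian Federation", 28, 26, 36),
]
sample = [
    {"country": c, "gold": g, "silver": s, "bronze": b} for c, g, s, b in table
]

ranked = top_countries(sample)
print(text_histogram(ranked, "2004 Summer Olympics: gold medals, top 10"))
```

Sorting on the tuple (gold, silver, bronze) applies the tie-breaks in one pass: Republic of Korea and Great Britain both have 9 golds, and Korea ranks higher because of its 12 silvers.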

Finding Commonalities:

What are the common tasks that you might need to do for your 3
problems?

1. Filtering Data
2. Sorting and Ranking
3. Aggregating and Counting
4. Output Generation

Declaration:

Provide a declaration for how you will store the data that you need
for your problems.

• Use a structured dataset containing columns such as:
Year, Type of Olympics, Country, Medal Counts (Gold, Silver,
Bronze), Athlete Names, etc.
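
One way such a declaration might be written in Python is sketched below, using a dataclass per medal-tally row instead of a DataFrame. The field names, the CSV header, and the helper load_tallies are all assumptions made for illustration; the real files in olympic.zip may use different column names. The third CSV row ("Exampleland") is invented so that the filter has something to exclude. filter_data mirrors the prototype of the same name listed under the function prototypes.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class MedalTally:
    """One country's medal counts at one Games (assumed fields)."""
    year: int
    season: str   # "Summer" or "Winter"
    country: str
    gold: int
    silver: int
    bronze: int


def load_tallies(csv_file):
    """Read rows from an open CSV file that has the assumed header."""
    reader = csv.DictReader(csv_file)
    return [
        MedalTally(
            int(row["year"]), row["season"], row["country"],
            int(row["gold"]), int(row["silver"]), int(row["bronze"]),
        )
        for row in reader
    ]


def filter_data(tallies, year, olympic_type):
    """Keep only the rows for the given year and Olympics type."""
    return [t for t in tallies if t.year == year and t.season == olympic_type]


# In-memory stand-in for a file; the last row is invented for the demo.
demo_csv = io.StringIO(
    "year,season,country,gold,silver,bronze\n"
    "2004,Summer,United States,36,39,26\n"
    "2004,Summer,Japan,16,9,12\n"
    "2002,Winter,Exampleland,1,2,3\n"
)
tallies = load_tallies(demo_csv)
summer_2004 = filter_data(tallies, 2004, "Summer")
print([t.country for t in summer_2004])  # ['United States', 'Japan']
```

Athlete names would likely need a second record type of their own, since an athlete row and a country tally row do not share the same columns.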

Provide the prototypes of the set of functions that you will need for
your program. For each function clearly state:
• what your parameters are and what they will be used for
• what your return type is
• what your function will do

1. Function: filter_data
• Parameters:
  o year (int): The year to filter the dataset.
  o olympic_type (str): "Summer" or "Winter" Olympics.
• Return Type: Filtered dataset
• Description: Filters the dataset based on year and type of Olympics.
2. Function: get_top_countries
• Parameters:
  o data (DataFrame): Filtered dataset.
  o medal_type (str): Type of medal to rank.
  o top_n (int): Number of top countries to return.
• Return Type: List of dictionaries
• Description: Sorts and ranks countries by medal count, breaking ties
using other medal types.

3. Function: find_athletes
• Parameters:
  o data (DataFrame): Full dataset.
  o name_string (str): Substring to match in athlete names.
• Return Type: int, List of strings.
• Description: Counts and lists unique athletes whose names contain the
given substring.

4. Function: generate_histogram
• Parameters:
  o data (DataFrame): Dataset of top-ranked countries.
  o title (str): Title for the histogram.
• Return Type: None (visual output).
• Description: Generates a histogram of medal counts for the top-ranked
countries.
