
Data pooled across years #51

@katemjstorey

Description


Some papers entered, such as Paterson et al., 2009, recorded a single value across many years. During data entry, I handled this by keeping the same sample id but recording the data under multiple years, like this:

| source id | sample id | year | measurement |
|-----------|-----------|------|-------------|
| 1         | 1         | 2013 | 12300       |
| 1         | 1         | 2014 | 12300       |
| 1         | 1         | 2015 | 12300       |

You can see this in table_cal at Cal id 33319 to 33328, where the value 12300 joules/gram dry weight is duplicated many times. When you play around with the summary statistics and break them down by year, the duplicates sort themselves out: for Paterson et al., 2009, the summary correctly shows 9 Siscowet samples from Lake Ontario collected in 1995. If you don't include year in the summary, however, it reports many more references than there actually are (for the Paterson Lake Ontario example, 83 as opposed to 18). I checked whether this affects the mean, and it does: the mean is biased towards the samples that had to be duplicated because the reported value covered multiple years. In Paterson et al., values for age 5 fish were given for each year, while data for the other year classes were given over a range of years.
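A small toy illustration of the counting and mean problem described above, sketched in Python. Only the 12300 value comes from the issue; the second sample and its 9000 value are invented, and the column order (source_id, sample_id, year, value) is an assumption rather than the actual table_cal schema.

```python
# Hypothetical toy rows: (source_id, sample_id, year, value).
# Sample 1 is one pooled value entered once per year it covers;
# sample 2 is a single-year measurement (its value is made up).
rows = [
    (1, 1, 2013, 12300),
    (1, 1, 2014, 12300),
    (1, 1, 2015, 12300),
    (1, 2, 2014, 9000),
]

# Naive summary: every duplicated year counts as a separate record.
naive_n = len(rows)
naive_mean = sum(r[3] for r in rows) / naive_n

# Distinct (source_id, sample_id) pairs give the true sample count;
# keeping one value per pair gives the unbiased mean.
first_value = {}
for src, smp, _year, value in rows:
    first_value.setdefault((src, smp), value)
distinct_n = len(first_value)
dedup_mean = sum(first_value.values()) / distinct_n

print(naive_n, distinct_n)      # 4 2
print(naive_mean, dedup_mean)   # 11475.0 10650.0
```

The pooled sample is counted three times in the naive version, which pulls the mean towards 12300, the same kind of bias seen with the Paterson et al. data.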

Combining mean values with individual samples obviously already creates some bias, as individual data points end up carrying much more weight. However, it would be nice to find a way to fix the year issue: when rows share the same source id and sample id, the summary should report only one value, while the sample still shows up under each of the years it covers.
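One possible approach to the fix described above, sketched in Python under stated assumptions: each (source_id, sample_id) pair carries exactly one measurement value, and the dict keys (source_id, sample_id, year, value) are hypothetical stand-ins for the real table columns. `collapse_pooled` and `samples_in_year` are illustrative helpers, not existing project code, and the example rows are invented apart from 12300.

```python
def collapse_pooled(rows):
    """Merge rows that share (source_id, sample_id) into one record.

    Assumes each sample has a single value; the years it was entered
    under are kept in a sorted 'years' list.
    """
    grouped = {}  # dicts keep insertion order (Python 3.7+)
    for r in rows:
        key = (r["source_id"], r["sample_id"])
        rec = grouped.get(key)
        if rec is None:
            grouped[key] = {"source_id": r["source_id"], "sample_id": r["sample_id"],
                            "value": r["value"], "years": [r["year"]]}
        elif rec["value"] != r["value"]:
            # Same ids but different values: not a simple pooled-year duplicate.
            raise ValueError(f"conflicting values for source/sample {key}")
        else:
            rec["years"].append(r["year"])
    for rec in grouped.values():
        rec["years"].sort()
    return list(grouped.values())


def samples_in_year(collapsed, year):
    """Return the collapsed records whose value applies to the given year."""
    return [rec for rec in collapsed if year in rec["years"]]


# Hypothetical example rows (the 9000 value is invented).
rows = [
    {"source_id": 1, "sample_id": 1, "year": 2013, "value": 12300},
    {"source_id": 1, "sample_id": 1, "year": 2014, "value": 12300},
    {"source_id": 1, "sample_id": 1, "year": 2015, "value": 12300},
    {"source_id": 1, "sample_id": 2, "year": 2014, "value": 9000},
]

collapsed = collapse_pooled(rows)
overall_mean = sum(rec["value"] for rec in collapsed) / len(collapsed)
print(len(collapsed), overall_mean)            # 2 10650.0
print(len(samples_in_year(collapsed, 2014)))   # 2
```

Overall counts and means are computed on the collapsed records, so each sample contributes once, while per-year breakdowns use the `years` list so the pooled sample still appears in 2013, 2014 and 2015. Raising on conflicting values is a deliberate choice: it flags rows that share ids but are not year duplicates, rather than silently dropping data.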

Metadata



Labels

bug (Something isn't working)

