Data pooled across years

Some papers entered, such as Paterson et al., 2009, recorded one value across many years. During entry, I handled this by keeping the same sample id but recording the data under each of those years, like this:

source id | sample id | year | measurement
-- | -- | -- | --
1 | 1 | 2013 | 12300
1 | 1 | 2014 | 12300
1 | 1 | 2015 | 12300

You can see this in Cal id 33319 to 33328 in table_cal, where there are many duplicates of the value 12300 joules/gram dry weight. When you play around with the summary statistics and sort by year, the duplications sort themselves out: for Paterson et al., 2009, it shows 9 Siscowet samples from Lake Ontario collected in 1995, which is correct. If you don't add year as a summary statistic, however, it reports many more references than there actually are (for the Paterson Lake Ontario example, 83 as opposed to 18). I checked whether this affects the mean statistic, and it does: the mean is biased towards the samples that had to be duplicated because the given value covered multiple years. For Paterson et al., values for age 5 fish were given for each year, while data for other year classes were given over a range.
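To make the bias concrete, here is a small sketch in plain Python. The three rows for sample 1 come from the table above; the row for sample 2 and its value of 9000 are made up for illustration, and the tuple layout is an assumption, not the actual table_cal schema.

```python
from statistics import mean

# Rows as (source_id, sample_id, year, measurement).
# Sample 1 is the pooled value from the table above, entered once per year.
# Sample 2 is a hypothetical single-year sample added for comparison.
rows = [
    (1, 1, 2013, 12300),
    (1, 1, 2014, 12300),
    (1, 1, 2015, 12300),
    (1, 2, 2015, 9000),
]

# Summarising without year: every row counts as its own sample.
naive_n = len(rows)
naive_mean = mean(r[3] for r in rows)

# Counting each (source_id, sample_id) pair once instead.
distinct = {(r[0], r[1]): r[3] for r in rows}
distinct_n = len(distinct)
distinct_mean = mean(distinct.values())

print(naive_n, naive_mean)        # 4 rows, mean pulled towards 12300
print(distinct_n, distinct_mean)  # 2 samples, each weighted once
```

With these numbers the row-level mean is 11475, while the mean over distinct samples is 10650, so the pooled sample is effectively counted three times.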

Combining mean values with individual samples obviously already creates some biases, as individual data points are going to carry much more weight. However, it would be nice to find a way to fix the year issue, so that rows sharing the same sample id and source id are reported as one value even though they show up over many years.
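One possible approach, sketched in plain Python under some assumptions: rows that share source id, sample id and measurement are treated as a single pooled value, and the years are kept as a range. The function name `collapse_pooled_years` and the output field names are hypothetical and not part of the existing database or code.

```python
from collections import defaultdict


def collapse_pooled_years(rows):
    """Collapse rows that repeat one value across several years.

    rows: iterable of (source_id, sample_id, year, measurement) tuples.
    Rows sharing source_id, sample_id and measurement become one record
    that keeps the first and last year instead of one row per year.
    """
    years_by_key = defaultdict(list)
    for source_id, sample_id, year, measurement in rows:
        years_by_key[(source_id, sample_id, measurement)].append(year)

    return [
        {
            "source_id": source_id,
            "sample_id": sample_id,
            "measurement": measurement,
            "year_min": min(years),
            "year_max": max(years),
        }
        for (source_id, sample_id, measurement), years in years_by_key.items()
    ]


# The three duplicated rows from the table above collapse to one record.
pooled = collapse_pooled_years([
    (1, 1, 2013, 12300),
    (1, 1, 2014, 12300),
    (1, 1, 2015, 12300),
])
print(pooled)
```

Including the measurement in the grouping key is a deliberate choice in this sketch: if a sample id ever carries different values in different years, those rows stay separate rather than being silently merged.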



Data pooled across years #51
