Crowdlab algorithm question #871

vahuja4 · 2023-10-12T14:41:02Z

vahuja4
Oct 12, 2023

From the crowdlab paper: "Illustrating how many real-world multiannotator datasets look, Figure 1 shows a disparity in annotator quality as well as many examples whose consensus label will be incorrect if we rely on majority vote (nonetheless often done in practice due to its straightforward appeal). Unsurprisingly, consensus labels are more likely to be incorrect for those examples with fewer annotations. An effective method to estimate consensus label quality should properly account for the number of annotations an example has received, as well as the quality of the annotators who selected these labels".
So, if we have 3 annotators annotating all the data, does the CROWDLAB algorithm still help with creating the consensus label, as opposed to majority vote? I ask because every example in my dataset has the same number of annotations, so there is no notion of "fewer annotations".

huiwengoh · 2023-10-12T17:09:49Z

huiwengoh
Oct 12, 2023
Maintainer

Hi @vahuja4, thanks for the question!

Even if you had 3 annotators (or any constant number) annotating all the data, CROWDLAB will still be helpful in providing more robust consensus labels for your dataset.

This is because in addition to the annotations from your annotators, CROWDLAB utilizes information (predictions) from a trained classifier to determine the best consensus label for each example. The CROWDLAB algorithm will automatically weigh the trustworthiness of the annotators and classifier and ensemble their annotations/predictions to provide you a consensus label, alongside a score for each example indicating the confidence level of that consensus label.

This process is similar regardless of whether you have a constant or varying number of annotators for each example. In fact, Appendix C of the CROWDLAB paper showcases CROWDLAB on a dataset where each example was annotated by 50 annotators!

Lastly, you can use any trained classifier with CROWDLAB, but as with most of the cleanlab package, a better classifier would likely yield better results 🙂
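
To make the contrast with majority vote concrete, here is a minimal pure-Python sketch of the general idea of ensembling annotator votes with a classifier's predicted probabilities. It is a simplified illustration, not the CROWDLAB algorithm itself: the function names are hypothetical, each annotator's vote is treated as a one-hot distribution, and the weights are set by hand, whereas the real algorithm (implemented in cleanlab's `cleanlab.multiannotator` module) estimates them from the data.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties resolve to the smallest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def weighted_consensus(labels, pred_probs, annotator_weights, model_weight):
    """Blend classifier probabilities with weighted one-hot annotator votes.

    Simplified illustration only: the weights are passed in by hand
    (an assumption), not estimated from the data as CROWDLAB does.
    Returns (consensus_label, score), where score is the blended
    probability assigned to the consensus label.
    """
    scores = [model_weight * p for p in pred_probs]
    for label, weight in zip(labels, annotator_weights):
        scores[label] += weight
    total = model_weight + sum(annotator_weights)
    blended = [s / total for s in scores]
    consensus = max(range(len(blended)), key=blended.__getitem__)
    return consensus, blended[consensus]


# One example, 3 annotators, 3 classes. Annotator 1 is assumed to be more
# trustworthy than the other two, and the classifier favors class 0.
labels = [0, 1, 1]
print(majority_vote(labels))
print(weighted_consensus(labels, [0.7, 0.2, 0.1], [0.9, 0.3, 0.3], 1.0))
```

With these hand-picked numbers, majority vote follows the two less trusted annotators, while the weighted blend sides with the trusted annotator and the classifier. The same thing can happen with exactly 3 annotators on every example, which is why a constant annotation count does not remove the benefit.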

2 replies

vahuja4 Oct 13, 2023
Author

@huiwengoh - thank you for your reply! I have one more question. For the CROWDLAB algorithm to work, we need the following:

1. A trained classifier (M).
2. A dataset with the following features X (assuming 3 annotators): a1's label, a2's label, a3's label, and M's prediction. Here, Y is the ground truth. We run the CROWDLAB algorithm on this dataset to figure out the ensemble weighting, etc.
3. Finally, a hold-out set that has the same input features X but no ground truth, on which the CROWDLAB algorithm gives us the consensus label along with the score.

Can you please confirm?

huiwengoh Oct 13, 2023
Maintainer

You are mostly on the right track! But one key detail is that CROWDLAB does not require ground truth labels. In fact, in most applications the ground truth labels will not be available to you.

Rather, you can obtain out-of-sample predictions from your classifier for all your datapoints by performing cross-validation, and then CROWDLAB can provide you with consensus labels and scores for all of those datapoints as well.

Here is a quick tutorial that might make that process clearer. Hope it helps, and please let us know if you have any other questions!
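
As a rough illustration of what "out-of-sample predictions via cross-validation" means, the sketch below uses only the Python standard library. Everything in it is an assumption made for illustration: the toy nearest-centroid classifier, the function names, and the data. The training labels stand in for a preliminary consensus such as a majority vote, not for ground truth. In practice you would swap in your own classifier and a library cross-validation routine.

```python
import math


def fit_centroids(xs, ys, num_classes):
    """Toy stand-in for a real classifier: per-class mean of a 1-D feature."""
    overall_mean = sum(xs) / len(xs)
    centroids = []
    for c in range(num_classes):
        members = [x for x, y in zip(xs, ys) if y == c]
        # Fall back to the overall mean if a class is absent from this fold.
        centroids.append(sum(members) / len(members) if members else overall_mean)
    return centroids


def predict_proba(centroids, x):
    """Softmax over negative distances to each class centroid."""
    logits = [-abs(x - c) for c in centroids]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def out_of_sample_pred_probs(xs, ys, num_classes, k=3):
    """K-fold cross-validation: each point is scored by a model that never saw it."""
    n = len(xs)
    pred_probs = [None] * n
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train], num_classes)
        for i in held_out:
            pred_probs[i] = predict_proba(model, xs[i])
    return pred_probs


# Hypothetical data: one feature per example; labels are a preliminary
# consensus (e.g. majority vote over the annotators), not ground truth.
xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
ys = [0, 0, 0, 1, 1, 1]
pred_probs = out_of_sample_pred_probs(xs, ys, num_classes=2, k=3)
for x, probs in zip(xs, pred_probs):
    print(x, [round(p, 3) for p in probs])
```

The result has one row of class probabilities per datapoint, so no separate hold-out set is needed: every example gets a prediction from a model trained without it, and those probabilities are what gets combined with the annotators' labels.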
