Learning when to skim and when to read
AR Johansen, R Socher - arXiv preprint arXiv:1712.05483, 2017 - arxiv.org
Many recent advances in deep learning for natural language processing have come at increasing computational cost, but the power of these state-of-the-art models is not needed for every example in a dataset. We demonstrate two approaches to reducing unnecessary computation in cases where a fast but weak baseline classifier and a stronger, slower model are both available. Applying an AUC-based metric to the task of sentiment classification, we find significant efficiency gains with both a probability-threshold method for reducing computational cost and one that uses a secondary decision network.
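The abstract names a probability-threshold method but does not spell out its details. One plausible reading is sketched below in Python, on the assumption that the fast classifier's highest class probability serves as its confidence and that an example goes to the slow model only when that confidence falls below a threshold. The names `cascade_predict`, `toy_fast`, and `toy_slow` are hypothetical, and the toy models are keyword stand-ins, not the paper's classifiers.

```python
from typing import Callable, List, Sequence, Tuple

Probs = Sequence[float]


def cascade_predict(
    examples: Sequence[str],
    fast_model: Callable[[str], Probs],
    slow_model: Callable[[str], Probs],
    threshold: float,
) -> Tuple[List[int], float]:
    """Return predicted labels and the fraction of examples sent to slow_model.

    Assumption: confidence is the fast model's maximum class probability.
    """
    labels: List[int] = []
    slow_calls = 0
    for text in examples:
        probs = fast_model(text)
        if max(probs) < threshold:
            # The fast model is unsure, so pay for the stronger model.
            probs = slow_model(text)
            slow_calls += 1
        # Arg-max over class probabilities.
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    fraction = slow_calls / len(examples) if examples else 0.0
    return labels, fraction


# Hypothetical stand-ins for a bag-of-words baseline and a slower model.
def toy_fast(text: str) -> Probs:
    if "great" in text:
        return [0.05, 0.95]  # confident positive
    if "awful" in text:
        return [0.90, 0.10]  # confident negative
    return [0.50, 0.50]      # no strong cue, low confidence


def toy_slow(text: str) -> Probs:
    return [0.2, 0.8] if "not bad" in text else [0.7, 0.3]


reviews = ["great movie", "awful plot", "not bad at all"]
labels, slow_fraction = cascade_predict(reviews, toy_fast, toy_slow, threshold=0.8)
```

Sweeping `threshold` from 0 to 1 trades accuracy against the fraction of examples that reach the slow model. That is the kind of curve an AUC-style summary could be computed over, though the exact metric used in the paper is not given here.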