Deep Reinforcement Learning for Two-sided Online Bipartite Matching in Collaborative Order Picking

Luca Begnardi; Hendrik Baier; Willem van Jaarsveld; Yingqian Zhang

Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:121-136, 2024.

Abstract

As a growing number of warehouse operators are moving from human-only to Collaborative human-robot Order Picking solutions, more efficient picker routing policies are needed, since the complexity of coordinating multiple actors in the system increases significantly. The objective of these policies is to match human pickers and robot carriers to fulfill picking tasks, optimizing pick-rate and total tardiness of the orders. In this paper, we propose to formulate the order picking routing problem as a more general combinatorial optimization problem known as Two-sided Online Bipartite Matching. We present an end-to-end Deep Reinforcement Learning approach to optimize a combination of pick-rate and order tardiness, and to deal with the uncertainty of real-world warehouse environments. To extract and exploit spatial information from the environment, we devise three different Graph Neural Network architectures and empirically evaluate them on several scenarios of growing complexity in a simulation environment we developed. We show that all proposed methods significantly outperform greedy and more sophisticated heuristics, as well as non-GNN-based DRL approaches. Moreover, our methods exhibit good transferability properties, even when scaling up test problem instances to more than forty times the size of the ones the models were trained on. Code is available at: https://github.com/lbegnardi/DRL-TOBM-CPR.

Cite this Paper

BibTeX


@InProceedings{pmlr-v222-begnardi24a,
  title     = {Deep Reinforcement Learning for Two-sided Online Bipartite Matching in Collaborative Order Picking},
  author    = {Begnardi, Luca and Baier, Hendrik and van Jaarsveld, Willem and Zhang, Yingqian},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages     = {121--136},
  year      = {2024},
  editor    = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume    = {222},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v222/begnardi24a/begnardi24a.pdf},
  url       = {https://proceedings.mlr.press/v222/begnardi24a.html},
  abstract  = {As a growing number of warehouse operators are moving from human-only to Collaborative human-robot Order Picking solutions, more efficient picker routing policies are needed, since the complexity of coordinating multiple actors in the system increases significantly. The objective of these policies is to match human pickers and robot carriers to fulfill picking tasks, optimizing pick-rate and total tardiness of the orders. In this paper, we propose to formulate the order picking routing problem as a more general combinatorial optimization problem known as Two-sided Online Bipartite Matching. We present an end-to-end Deep Reinforcement Learning approach to optimize a combination of pick-rate and order tardiness, and to deal with the uncertainty of real-world warehouse environments. To extract and exploit spatial information from the environment, we devise three different Graph Neural Network architectures and empirically evaluate them on several scenarios of growing complexity in a simulation environment we developed. We show that all proposed methods significantly outperform greedy and more sophisticated heuristics, as well as non-GNN-based DRL approaches. Moreover, our methods exhibit good transferability properties, even when scaling up test problem instances to more than forty times the size of the ones the models were trained on. Code is available at: \url{https://github.com/lbegnardi/DRL-TOBM-CPR}.}
}

Endnote

%0 Conference Paper
%T Deep Reinforcement Learning for Two-sided Online Bipartite Matching in Collaborative Order Picking
%A Luca Begnardi
%A Hendrik Baier
%A Willem van Jaarsveld
%A Yingqian Zhang
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-begnardi24a
%I PMLR
%P 121--136
%U https://proceedings.mlr.press/v222/begnardi24a.html
%V 222
%X As a growing number of warehouse operators are moving from human-only to Collaborative human-robot Order Picking solutions, more efficient picker routing policies are needed, since the complexity of coordinating multiple actors in the system increases significantly. The objective of these policies is to match human pickers and robot carriers to fulfill picking tasks, optimizing pick-rate and total tardiness of the orders. In this paper, we propose to formulate the order picking routing problem as a more general combinatorial optimization problem known as Two-sided Online Bipartite Matching. We present an end-to-end Deep Reinforcement Learning approach to optimize a combination of pick-rate and order tardiness, and to deal with the uncertainty of real-world warehouse environments. To extract and exploit spatial information from the environment, we devise three different Graph Neural Network architectures and empirically evaluate them on several scenarios of growing complexity in a simulation environment we developed. We show that all proposed methods significantly outperform greedy and more sophisticated heuristics, as well as non-GNN-based DRL approaches. Moreover, our methods exhibit good transferability properties, even when scaling up test problem instances to more than forty times the size of the ones the models were trained on. Code is available at: https://github.com/lbegnardi/DRL-TOBM-CPR.

APA


Begnardi, L., Baier, H., van Jaarsveld, W. & Zhang, Y. (2024). Deep Reinforcement Learning for Two-sided Online Bipartite Matching in Collaborative Order Picking. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:121-136. Available from https://proceedings.mlr.press/v222/begnardi24a.html.
