User profiles for Paul Weng
Paul Weng, Duke Kunshan University. Verified email at duke.edu. Cited by 3608
A survey of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL)
that learns from human feedback instead of relying on an engineered reward function. …
Analytics and machine learning in vehicle routing research
…, J Jin, G Kendall, J Li, Z Lu, J Ren, P Weng… - … Journal of Production …, 2023 - Taylor & Francis
The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial
optimisation problems for which numerous models and algorithms have been proposed. To …
A survey on interpretable reinforcement learning
Although deep reinforcement learning has become a promising machine learning approach
for sequential decision-making problems, it is still not mature enough for high-stake …
Dual graph attention networks for deep latent representation of multifaceted social effects in recommender systems
Social recommendation leverages social information to solve data sparsity and cold-start
problems in traditional collaborative filtering methods. However, most existing models assume …
Invit: A generalizable routing problem solver with invariant nested view transformer
Recently, deep reinforcement learning has shown promising results for learning fast heuristics
to solve routing problems. Meanwhile, most of the solvers suffer from generalizing to an …
Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards
As the operations of autonomous systems generally affect simultaneously several users, it is
crucial that their designs account for fairness considerations. In contrast to standard (deep) …
Teacher-student framework: a reinforcement learning approach
We propose a reinforcement learning approach to learning to teach. Following Torrey and
Taylor’s framework [18], an agent (the “teacher”) advises another one (the “student”) by …
Learning fair policies in decentralized cooperative multi-agent reinforcement learning
We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement
learning (MARL). We formalize it in a principled way as the problem of optimizing a …
Top-k selection based on adaptive sampling of noisy preferences
We consider the problem of reliably selecting an optimal subset of fixed size from a given
set of choice alternatives, based on noisy information about the quality of these alternatives. …
Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
We introduce a novel approach to preference-based reinforcement learning, namely a
preference-based variant of a direct policy search method based on evolutionary optimization. …