Google Scholar

User profiles for Paul Weng

Paul Weng

Duke Kunshan University

Verified email at duke.edu

Cited by 3608

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL)
that learns from human feedback instead of relying on an engineered reward function. …

Cited by 481

Analytics and machine learning in vehicle routing research

…, J Jin, G Kendall, J Li, Z Lu, J Ren, P Weng… - … Journal of Production …, 2023 - Taylor & Francis

The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial
optimisation problems for which numerous models and algorithms have been proposed. To …

Cited by 185

A survey on interpretable reinforcement learning

C Glanois, P Weng, M Zimmer, D Li, T Yang, J Hao… - Machine Learning, 2024 - Springer

Although deep reinforcement learning has become a promising machine learning approach
for sequential decision-making problems, it is still not mature enough for high-stake …

Cited by 253

Dual graph attention networks for deep latent representation of multifaceted social effects in recommender systems

Q Wu, H Zhang, X Gao, P He, P Weng, H Gao… - The world wide web …, 2019 - dl.acm.org

Social recommendation leverages social information to solve data sparsity and cold-start
problems in traditional collaborative filtering methods. However, most existing models assume …

Cited by 445

INViT: A generalizable routing problem solver with invariant nested view transformer

H Fang, Z Song, P Weng, Y Ban - arXiv preprint arXiv:2402.02317, 2024 - arxiv.org

Recently, deep reinforcement learning has shown promising results for learning fast heuristics
to solve routing problems. Meanwhile, most of the solvers suffer from generalizing to an …

Cited by 71

Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards

U Siddique, P Weng, M Zimmer - … Conference on Machine …, 2020 - proceedings.mlr.press

As the operations of autonomous systems generally affect simultaneously several users, it is
crucial that their designs account for fairness considerations. In contrast to standard (deep) …

Cited by 166

Teacher-student framework: a reinforcement learning approach

M Zimmer, P Viappiani, P Weng - AAMAS Workshop autonomous …, 2014 - hal.science

We propose a reinforcement learning approach to learning to teach. Following Torrey and
Taylor’s framework [18], an agent (the “teacher”) advises another one (the “student”) by …

Cited by 101

Learning fair policies in decentralized cooperative multi-agent reinforcement learning

…, C Glanois, U Siddique, P Weng - … conference on machine …, 2021 - proceedings.mlr.press

We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement
learning (MARL). We formalize it in a principled way as the problem of optimizing a …

Cited by 103

Top-k selection based on adaptive sampling of noisy preferences

…, B Szorenyi, W Cheng, P Weng… - International …, 2013 - proceedings.mlr.press

We consider the problem of reliably selecting an optimal subset of fixed size from a given
set of choice alternatives, based on noisy information about the quality of these alternatives. …

Cited by 103

Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm

R Busa-Fekete, B Szörényi, P Weng, W Cheng… - Machine learning, 2014 - Springer

We introduce a novel approach to preference-based reinforcement learning, namely a
preference-based variant of a direct policy search method based on evolutionary optimization. …

Cited by 97
