{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:01:51Z","timestamp":1766066511498,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,7,27]],"date-time":"2015-07-27T00:00:00Z","timestamp":1437955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2015,7,27]]},"abstract":"<jats:p>The locomotion skills developed for physics-based characters most often target flat terrain. However, much of their potential lies with the creation of dynamic, momentum-based motions across more complex terrains. In this paper, we learn controllers that allow simulated characters to traverse terrains with gaps, steps, and walls using highly dynamic gaits. This is achieved using reinforcement learning, with careful attention given to the action representation, non-parametric approximation of both the value function and the policy; epsilon-greedy exploration; and the learning of a good state distance metric. The methods enable a 21-link planar dog and a 7-link planar biped to navigate challenging sequences of terrain using bounding and running gaits. 
We evaluate the impact of the key features of our skill learning pipeline on the resulting performance.<\/jats:p>","DOI":"10.1145\/2766910","type":"journal-article","created":{"date-parts":[[2015,7,28]],"date-time":"2015-07-28T12:26:38Z","timestamp":1438086398000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":52,"title":["Dynamic terrain traversal skills using reinforcement learning"],"prefix":"10.1145","volume":"34","author":[{"given":"Xue Bin","family":"Peng","sequence":"first","affiliation":[{"name":"University of British Columbia"}]},{"given":"Glen","family":"Berseth","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Michiel","family":"van de Panne","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]}],"member":"320","published-online":{"date-parts":[[2015,7,27]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Box2D 2015. Box2d: A 2d physics engine for games Jan. http:\/\/box2d.org."},{"key":"e_1_2_2_2_1","unstructured":"Busoniu L. Babuska R. De Schutter B. and Ernst D. 2010. Reinforcement learning and dynamic programming using function approximators. CRC press."},
{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409060.1409066"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618516"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781156"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964954"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781157"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102377"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-012-1248-5"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2012.03189.x"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2682626"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366174"},{"key":"e_1_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Hansen N. 2006. The CMA evolution strategy: A comparing review. In Towards a New Evolutionary Computation 75--102.","DOI":"10.1007\/3-540-32494-1_4"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1477926.1477936"},{"volume-title":"Proc. of Symposium on Computer Animation, 129--138","author":"Kwon T.","key":"e_1_2_2_15_1"},{"key":"e_1_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Lange S. Gabel T. and Riedmiller M. 2012. Batch reinforcement learning. In Reinforcement Learning. Springer 45--73.","DOI":"10.1007\/978-3-642-27645-3_2"},
{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.gmod.2005.03.004"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618515"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882261.1866160"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781155"},{"volume-title":"Proc. ICML","year":"2014","author":"Levine S.","key":"e_1_2_2_21_1"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185524"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366173"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531386"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276385"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778808"},{"key":"e_1_2_2_27_1","unstructured":"Muja M. and Lowe D. G. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP (1) 331--340."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1017928328829"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/127719.122755"},{"key":"e_1_2_2_30_1","first-page":"627","article-title":"A reduction of imitation learning and structured prediction to no-regret online learning","volume":"15","author":"Ross S.","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_2_31_1","doi-asserted-by":"crossref","unstructured":"Stewart A. J. and Cremer J. F. 1992. Beyond keyframing: an algorithmic approach to animation. In Graphics Interface 273--281.","DOI":"10.21236\/ADA241337"},
{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601121"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276386"},{"volume-title":"Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on, IEEE, 272--279","author":"van Hasselt H.","key":"e_1_2_2_34_1"},{"volume-title":"Reinforcement Learning","author":"van Hasselt H.","key":"e_1_2_2_35_1"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618514"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1966394.1966398"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778811"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276509"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360680"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/545261.545276"}],"container-title":["ACM Transactions on 
Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2766910","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2766910","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:56:02Z","timestamp":1750272962000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2766910"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,7,27]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,7,27]]}},"alternative-id":["10.1145\/2766910"],"URL":"https:\/\/doi.org\/10.1145\/2766910","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2015,7,27]]},"assertion":[{"value":"2015-07-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}