The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations (e.g. some numerals are rendered as "XNUMX").
Copyrights notice
The step size is a parameter of fundamental importance in learning algorithms, particularly for natural policy gradient (NPG) methods. We derive an upper bound for the step size in an incremental NPG estimation, and propose an adaptive step size to implement the derived upper bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importance. We also provide tight upper and lower bounds for the step size, though they are not suitable for incremental learning. We confirm the usefulness of the proposed step size using classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.
Ryo IWAKI
Osaka University
Hiroki YOKOYAMA
Tamagawa University
Minoru ASADA
Osaka University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ryo IWAKI, Hiroki YOKOYAMA, Minoru ASADA, "Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting" in IEICE TRANSACTIONS on Information and Systems,
vol. E101-D, no. 9, pp. 2346-2355, September 2018, doi: 10.1587/transinf.2017EDP7363.
Abstract: The step size is a parameter of fundamental importance in learning algorithms, particularly for the natural policy gradient (NPG) methods. We derive an upper bound for the step size in an incremental NPG estimation, and propose an adaptive step size to implement the derived upper bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importances. We also provide tight upper and lower bounds for the step size, though they are not suitable for the incremental learning. We confirm the usefulness of the proposed step size using the classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.
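The abstract's central idea, an adaptive step size that guarantees the updated parameter does not overshoot its target, can be illustrated with a minimal sketch. Note this is not the paper's NPG derivation or its relative importance weighting; the function name, the capping rule, and the parameters here are illustrative assumptions only:

```python
import numpy as np

def capped_step(theta, grad, target, alpha_max=1.0):
    """Illustrative overshoot-safe update (NOT the paper's method).

    Scales the step along the gradient direction so that `theta`
    never moves past `target`, whatever the gradient magnitude.
    """
    direction = grad / (np.linalg.norm(grad) + 1e-12)
    # Distance from theta to the target along the update direction.
    gap = np.dot(target - theta, direction)
    if gap <= 0.0:
        return theta  # already at or past the target; do not move
    # Cap the naive step alpha_max * ||grad|| at the remaining gap.
    step = min(alpha_max * np.linalg.norm(grad), gap)
    return theta + step * direction

theta = np.zeros(2)
target = np.array([1.0, 0.0])
grad = np.array([10.0, 0.0])  # a naive step of size 10 would overshoot
theta_new = capped_step(theta, grad, target)  # lands exactly on the target
```

The cap plays the same role as the paper's derived upper bound: it is the largest step that provably cannot overshoot in the chosen direction.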
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7363/_p
@ARTICLE{e101-d_9_2346,
author={Ryo IWAKI and Hiroki YOKOYAMA and Minoru ASADA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting},
year={2018},
volume={E101-D},
number={9},
pages={2346-2355},
abstract={The step size is a parameter of fundamental importance in learning algorithms, particularly for the natural policy gradient (NPG) methods. We derive an upper bound for the step size in an incremental NPG estimation, and propose an adaptive step size to implement the derived upper bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importances. We also provide tight upper and lower bounds for the step size, though they are not suitable for the incremental learning. We confirm the usefulness of the proposed step size using the classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.},
keywords={},
doi={10.1587/transinf.2017EDP7363},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 2346
EP - 2355
AU - Ryo IWAKI
AU - Hiroki YOKOYAMA
AU - Minoru ASADA
PY - 2018
DO - 10.1587/transinf.2017EDP7363
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E101-D
IS - 9
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - September 2018
AB - The step size is a parameter of fundamental importance in learning algorithms, particularly for the natural policy gradient (NPG) methods. We derive an upper bound for the step size in an incremental NPG estimation, and propose an adaptive step size to implement the derived upper bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importances. We also provide tight upper and lower bounds for the step size, though they are not suitable for the incremental learning. We confirm the usefulness of the proposed step size using the classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.
ER -