Operational Control for Earth-to-Air Heat Exchanger by Reinforcement Learning (Part 1): Applicability Verification of Algorithm Considering Counterfactual Reward

Bibliographic Information

Original Title (Japanese)
  • 強化学習による土壌熱交換システムの運用制御(その1):反事実の報酬を考慮したアルゴリズムの適用可能性の検証

Abstract

Earth-to-air heat exchangers (EAHEs) utilize the heat capacity of soil to pre-cool or pre-heat outside air (OA), which is then introduced into the air handling unit to reduce the OA heat load. In the operational phase, however, EAHE control is typically not optimized, resulting in inefficient energy saving or lower quality of the introduced air. It is also difficult to establish an optimal control rule that accounts for the future effects of sequential operations, because the control response takes an extremely long time. In this study, we focus on reinforcement learning (RL) control, which does not require the explicit construction of control rules. In RL, an agent observes a state s within an environment and learns to select the action a that maximizes reward, passing that action back to the environment. One advantage of RL is that it maximizes the cumulative reward rather than the immediate reward, which makes it well suited to unsteady problems.

The purpose of this study is to establish optimal control rules for an EAHE by RL. In this paper, we validate an RL control rule that achieves two objectives: decreasing the heat load of the fresh air handling unit (FAHU) and suppressing the occurrence of condensation in the EAHE. First, we define the control problem for RL using an environment estimated by the CFD-based long-term performance prediction method for EAHEs developed by the authors. We then incorporate training logic into the environment and train the RL agent. Second, we implement an algorithm for efficient learning by defining rewards from the results of both factual and counterfactual actions. We then verify the effectiveness of RL control by comparing it with scheduled control and other baselines. Finally, we analyze the tendency of the actions selected by RL.

The reward is defined by a method that feeds relative rewards back to the actions selected by the agent. This method compares and evaluates not only the result of the factual action but also the results of counterfactual actions estimated by a Counterfactual Predictor. This makes it possible to give the agent immediate feedback on which action was appropriate. We call this method Factual and Counterfactual Reward Estimation (FCRE).

Using FCRE, the RL agent obtains a rough policy in the early stages of learning and converges well. Compared with scheduled control, RL control increases the heat load of the FAHU by about 3% but dramatically reduces the occurrence of condensation. This confirms that RL control can achieve the two objectives above simultaneously.
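The agent-environment loop described in the abstract can be made concrete with a short sketch. The following is a minimal tabular Q-learning loop; the paper does not specify its RL algorithm, so the choice of Q-learning, the binary action set, and all hyperparameter values here are illustrative assumptions, not details from the paper.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # assumed learning rate, discount, exploration rate
ACTIONS = [0, 1]                         # hypothetical: 0 = bypass EAHE, 1 = introduce OA via EAHE

q_table = defaultdict(float)             # Q(s, a), default 0.0

def select_action(state):
    # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    # One-step Q-learning update. The discounted max over next-state values
    # is what makes the agent maximize cumulative rather than immediate reward.
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])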
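The FCRE reward can likewise be sketched. In the hypothetical interface below, predict_outcome stands in for the paper's Counterfactual Predictor and score for the evaluation of a predicted result (e.g., FAHU heat load and condensation risk); these names, and the use of the best counterfactual as the baseline, are assumptions made for illustration, not the paper's exact formulation.

from typing import Callable, Hashable, Sequence

def fcre_reward(state: Hashable,
                factual_action: Hashable,
                actions: Sequence[Hashable],
                predict_outcome: Callable[[Hashable, Hashable], object],
                score: Callable[[object], float]) -> float:
    # Evaluate the predicted result of every candidate action, factual and
    # counterfactual alike, in the current state (assumes len(actions) >= 2).
    scores = {a: score(predict_outcome(state, a)) for a in actions}
    factual = scores[factual_action]
    counterfactual = max(s for a, s in scores.items() if a != factual_action)
    # Relative reward: positive when the action actually taken outperforms
    # the best action not taken, giving the agent immediate feedback on
    # whether its choice was appropriate.
    return factual - counterfactual

# Toy usage with a hypothetical two-action predictor: action 1 costs slightly
# more heat load but avoids condensation entirely.
outcome = lambda s, a: (10.0 + a, 0.0 if a == 1 else 1.0)  # (heat load, condensation)
value = lambda o: -(o[0] + 100.0 * o[1])                   # penalize condensation heavily
print(fcre_reward(0, 1, [0, 1], outcome, value))           # positive: action 1 was appropriate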
