Lu Wang (王璐): Multi-Objective Reinforcement Learning for Optimizing Tolerant Dynamic Decision Rules

Time: Tuesday, July 22, 2025, 10:00-11:30

Venue: Conference Room 3, 1st Floor, Block A, Science and Education Building (科教楼), Feicuihu Campus

Speaker: Professor Lu Wang

Affiliation: University of Michigan, USA

Host: School of Computer and Information (School of Artificial Intelligence)

Abstract:

Many real-world problems involve multiple competing priorities, and decision rules differ when trade-offs are present. Correspondingly, there may be more than one feasible decision that leads to empirically sufficient optimization. In this talk, we present the concept of a “tolerant regime,” which provides a set of individualized feasible decision rules under a prespecified tolerance rate. A multi-objective tree-based reinforcement learning (MOT-RL) method is developed to directly estimate the tolerant dynamic treatment regime (tDTR) that optimizes multiple objectives in a multistage, multi-treatment setting. At each stage, MOT-RL constructs an unsupervised decision tree by modeling the counterfactual mean outcome of each objective via semiparametric regression and maximizing a purity measure constructed from the scalarized augmented inverse probability weighted estimators (SAIPWE). The algorithm is implemented by backward induction through the decision stages, and it estimates the optimal DTR and the tDTR according to the decision-maker’s preferences. Multi-objective tree-based reinforcement learning is robust, efficient, easy to interpret, and flexible across different settings.
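To make the idea of a tolerant set concrete, here is a minimal Python sketch. It is not the MOT-RL algorithm: there is no tree, no SAIPWE, and no backward induction. It assumes that the estimated mean outcome of each objective is already available for each treatment, that objectives are combined by a weighted sum, and that a treatment counts as "tolerant" when its scalarized value is within a relative tolerance of the best one. The function names, the weights, and this particular tolerance rule are all illustrative assumptions, not details taken from the talk.

```python
from typing import Dict, List


def scalarize(values: Dict[str, List[float]], weights: List[float]) -> Dict[str, float]:
    """Collapse per-objective values into one score per treatment (weighted sum).

    `values` maps a treatment label to its estimated mean outcomes, one per
    objective. The weighted-sum form is an assumption made for illustration.
    """
    return {
        treatment: sum(w * v for w, v in zip(weights, outcomes))
        for treatment, outcomes in values.items()
    }


def tolerant_set(scores: Dict[str, float], tolerance: float) -> List[str]:
    """Return treatments whose score is within `tolerance` (relative) of the best.

    The relative-threshold rule used here is a hypothetical stand-in for the
    "prespecified tolerance rate" mentioned in the abstract.
    """
    best = max(scores.values())
    threshold = best - tolerance * abs(best)
    return sorted(t for t, s in scores.items() if s >= threshold)


# Hypothetical estimates for three treatments on two objectives.
estimates = {"A": [1.0, 0.5], "B": [0.9, 0.7], "C": [0.4, 0.2]}
scores = scalarize(estimates, weights=[0.5, 0.5])

optimal = tolerant_set(scores, tolerance=0.0)   # only the single best treatment
tolerant = tolerant_set(scores, tolerance=0.1)  # treatments within 10% of the best
```

With a tolerance of zero the set reduces to the single best treatment, and a larger tolerance admits near-optimal alternatives, which is the kind of flexibility the abstract attributes to a tolerant regime.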

Speaker bio:

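Lu Wang is a Professor in the Department of Biostatistics at the University of Michigan, a Fellow of the American Statistical Association (ASA Fellow), and an Elected Member of the International Statistical Institute (ISI). Professor Wang received a bachelor's degree from Peking University in 2002 and a Ph.D. from Harvard University in 2008. Research interests include statistical methods for evaluating dynamic treatment regimes, personalized medicine, causal inference, nonparametric and semiparametric regression, missing data analysis, and longitudinal (correlated/clustered) data analysis. Professor Wang has published more than 180 papers in journals including JASA, Biometrika, Biometrics, and AoAS, and has co-authored a book chapter. Professor Wang currently serves as an Associate Editor of JASA and Biometrics.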

Published: 2025-07-09