I am a postdoc in the Department of Computer Science at Harvard University (since July 2020), where I work with Professor Milind Tambe and Professor Hima Lakkaraju. Previously, I was a postdoc in the Department of Computer Science at Dartmouth College (starting July 2018), where I worked with Professor V.S. Subrahmanian. I obtained my PhD from the Interdisciplinary Graduate School (IGS), Nanyang Technological University (NTU) in 2018, where I was advised by Professor Bo An. I received my B.S. in Physics from the University of Science and Technology of China (USTC) in 2013.
I study AI techniques that help us understand the world and benefit society. In particular, I am interested in reinforcement learning, graph/network-structured learning models, deep generative models, and predictive decision making.
I am also interested in social aspects of AI systems such as interpretability, fairness and safety.
I work on graph data, textual data, time-series data, and multi-modal data, in domains such as cybersecurity, online social networks, public health, and transportation.
Here are some examples of my research projects.
Governments face a worsening problem of traffic congestion in urban areas. To alleviate road congestion, a number of approaches have been proposed, among which Electronic Toll Collection (ETC) has been reported to be effective in many countries and regions (e.g., Singapore's ERP and Norway's AutoPASS). Tolls differ across roads and time periods, so vehicles are indirectly steered toward less congested roads with lower tolls. However, although current ETC schemes vary tolls across time periods within a day, the tolls are predetermined and remain fixed for the same periods from day to day.
To make tolling fully dynamic, adaptive, and optimized, we approach dynamic tolling at city scale and model the dynamics of a tolled road system as a high-dimensional Markov Decision Process (MDP).
The state of the formulated MDP is the number of vehicles on each road heading to each destination, and the action is the toll on each road. Our solution to the formulated MDP is characterized by the following properties (see Figure 1 for an illustration):
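To make the formulation concrete, here is a minimal sketch of the state and action spaces described above. The network sizes, the transition rule (a fraction of vehicles diverting to the cheapest road), and the congestion cost are illustrative assumptions, not the model from the paper:

```python
import numpy as np

# Hypothetical sizes for illustration; the real road network is city-scale.
N_ROADS, N_DESTS = 4, 3
rng = np.random.default_rng(0)

def initial_state():
    """State: n[i, d] = number of vehicles on road i heading to destination d."""
    return rng.integers(0, 20, size=(N_ROADS, N_DESTS))

def congestion_cost(state):
    """Toy congestion measure: quadratic in per-road vehicle counts."""
    per_road = state.sum(axis=1)
    return float((per_road ** 2).sum())

def step(state, tolls, sensitivity=0.05):
    """Toy transition: higher tolls divert a toll-proportional fraction of a
    road's vehicles to the currently cheapest road (an assumption for
    illustration only). Reward is negative congestion cost."""
    state = state.copy()
    cheapest = int(np.argmin(tolls))
    for i in range(N_ROADS):
        if i == cheapest:
            continue
        moved = (state[i] * min(1.0, sensitivity * tolls[i])).astype(int)
        state[i] -= moved
        state[cheapest] += moved
    reward = -congestion_cost(state)
    return state, reward
```

Even in this toy form, the state is a roads-by-destinations count matrix, which is why the real problem is a high-dimensional MDP: the state space grows with the number of roads, destinations, and vehicles.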
Other papers related to traffic control with reinforcement learning:
In complex multiagent systems, sequential decision making has long been a significant challenge with several key characteristics. One prominent characteristic is the interactions among the agents. In many practical scenarios (e.g., the Stag-Hunt game), agents need to cooperate with each other to achieve a common goal with high rewards (e.g., hunting stags), while each individual agent is self-interested and might deviate from the cooperation toward less risky goals with low rewards (e.g., hunting rabbits). Another critical characteristic comes from various uncertainties. One type of uncertainty arises from the lack of accurate knowledge of the environment and the other agents, where the uncertain information can be modelled probabilistically. A more challenging type of uncertainty lies in environment-related factors that we do not know how to model.
These two challenges are significantly amplified in sequential decision making, where we need to consider not only short-term rewards but also rewards in the long run. Therefore, one has to account for the subsequent effects of the current action, especially in a dynamic environment. Another crucial characteristic is the limited number of learning trials. On the Minecraft platform, it usually takes several seconds to complete one episode of the game, so learning an effective policy is extremely time consuming.
The Microsoft Malmo Collaborative AI Challenge (MCAC), which is designed to encourage research on various problems in collaborative AI, builds on a Minecraft mini-game called "Pig Chase". Pig Chase is played on a 9 × 9 grid where agents can either work together to catch the pig and achieve high scores, or give up on cooperation and settle for low scores. After playing a certain number of episodes (e.g., 100), the agent that achieves the highest average score wins the challenge. Despite its simple rules, this game exhibits all of the key characteristics stated above. Though there are numerous papers studying sequential decision making in complex multiagent systems, they address only a subset of these characteristics. We hope to shed some light on solving this class of problems by presenting HogRider, the champion agent that won the 2017 Microsoft Malmo Collaborative AI Challenge.
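One way to handle the uncertainty about the teammate described above is to maintain a Bayesian belief over its type (e.g., cooperative vs. random) and update it from observed actions. The sketch below illustrates this idea only; the types, likelihoods, and numbers are assumptions for illustration, not HogRider's actual model:

```python
def update_belief(prior_coop, action, likelihood_coop, likelihood_rand):
    """Posterior probability that the teammate is cooperative, given one
    observed action and per-type action likelihoods P(action | type)."""
    p_coop = prior_coop * likelihood_coop[action]
    p_rand = (1.0 - prior_coop) * likelihood_rand[action]
    return p_coop / (p_coop + p_rand)

# Assumed likelihoods: a cooperative teammate mostly moves toward the pig,
# while a random teammate picks among three moves uniformly.
likelihood_coop = {"toward_pig": 0.8, "away": 0.1, "exit": 0.1}
likelihood_rand = {"toward_pig": 1 / 3, "away": 1 / 3, "exit": 1 / 3}

belief = 0.5  # uninformative prior over the teammate's type
for observed in ["toward_pig", "toward_pig", "away"]:
    belief = update_belief(belief, observed, likelihood_coop, likelihood_rand)
```

After two pig-directed moves the belief in a cooperative teammate rises well above the prior, and a single "away" move pulls it back down; the agent can then choose to chase the pig or head for the exit based on this belief.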
The solutions underlying HogRider are characterized by:
Paper and media coverage:
Other papers at the intersections of Data, Machine Learning, and Game Theory:
The number of software vulnerabilities disclosed every year is staggering, leading to increasing risks for system security officers. Because patching is expensive (patch installation time, patch purchase costs, and the risk of disruption to production systems), many vulnerabilities go unpatched given the limited resources available to tackle thousands of patching tasks. To help alleviate the situation, when a new vulnerability is discovered, a Common Vulnerabilities and Exposures (CVE) numbering authority (such as the MITRE Corporation) assigns a CVE number to that vulnerability, along with a brief description. The US National Institute of Standards and Technology (NIST) then studies the CVE and releases a severity score as well as its associated attributes (e.g., attack vector, attack complexity) via the Common Vulnerability Scoring System (CVSS). However, we have observed that, on average, it takes NIST around 130 days to conduct this investigation and analysis. In this project, we extract online discussions (e.g., on Twitter) of the vulnerabilities and predict i) when a vulnerability will be exploited, and ii) how severe that vulnerability is.
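As a toy illustration of text-based severity prediction, the sketch below scores a vulnerability description with hand-picked keyword weights and thresholds it into a high/low label. The keywords, weights, and threshold are illustrative assumptions only; the actual project learns predictive models from online discussions rather than using a fixed keyword list:

```python
# Assumed keyword weights for illustration; not learned from data.
SEVERITY_KEYWORDS = {
    "remote": 2.0,
    "unauthenticated": 2.0,
    "code execution": 3.0,
    "overflow": 1.5,
    "denial of service": 1.0,
    "local": -0.5,
}

def severity_score(description):
    """Sum the weights of keywords that appear in the description
    (case-insensitive substring match)."""
    text = description.lower()
    return sum(w for kw, w in SEVERITY_KEYWORDS.items() if kw in text)

def predict_label(description, threshold=3.0):
    """Binary high/low severity under an assumed score threshold."""
    return "high" if severity_score(description) >= threshold else "low"
```

A remotely exploitable code-execution bug accumulates a high score, while a local denial-of-service issue stays below the threshold; a learned model plays the same role with features extracted from the discussion text instead of fixed keywords.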
Algorithmic Game Theory, by Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V. Vazirani.
Game Theory (Coursera), by Matthew O. Jackson, Kevin Leyton-Brown and Yoav Shoham.
Computational Aspects of Cooperative Game Theory, by Georgios Chalkiadakis, Edith Elkind, and Michael Wooldridge.
Pattern Recognition And Machine Learning, by Christopher M. Bishop.
ML course: Stanford CS229, by Andrew Ng.
DL course: CMU 11-785, by Bhiksha Raj.
CV course: Stanford CS231n, by Fei-Fei Li, Justin Johnson, and Serena Yeung.
NLP course: Stanford CS224n , by Chris Manning.
RL course: UCL Course on RL, by David Silver.
DRL course: UC Berkeley CS294, by Sergey Levine.
RL book: Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto.
Convex Optimization, by Stephen Boyd and Lieven Vandenberghe.
CMU 10-725, by Ryan Tibshirani.