An intelligent agent should be able to see and interact with the world in many different ways. In this talk, I summarize our recent work on seeing and interacting using language (e.g., video captioning), interacting by taking actions in specific applications (e.g., viewing-angle selection in 360 videos), and interacting by attacking other agents. Ultimately, we aim to build an embodied intelligent agent that assists us in our daily lives. Hence, we also propose a system that anticipates human intention in order to proactively provide assistance.
Min Sun is an assistant professor at National Tsing Hua University in Taiwan. Before that, he was a postdoctoral researcher at the University of Washington in Seattle, and he received his Ph.D. in EE:Systems from the University of Michigan. He won the best paper award at 3DRR in 2007, paper awards at CVGIP in 2015, 2016, and 2017, and first place in the THOR challenge at CVPR 2017.