What happens to a young data scientist who has just graduated and entered industry? How do experiments differ from applications that solve real-world problems with data science? In this talk, I will share my journey from a new machine learning and data mining graduate to a senior data engineer (or, at times, data scientist). How do you detect and define problems? How do you acquire domain knowledge and incorporate it into data models? How do you apply data science to solve those problems? How do you ensure data correctness during the ETL process? How do you evaluate performance? Is there a better solution? Becoming a one-person data science unicorn is difficult. What about a group of talented people? Building a successful data product requires cooperation with people from different functions. Finally, I will share my experience working with different departments and how we achieved better results by leveraging knowledge from every domain.
Advanced computing and imaging technologies enable scientists to study natural phenomena at unprecedented precision, resulting in an explosive growth of data. The amount of information collected about Internet and mobile device users is expected to be even greater. To make sense of, and maximize the utilization of, such vast amounts of data for knowledge discovery and decision making, we need a new set of tools beyond conventional data mining and statistical analysis methods. Visualization transforms large quantities of often multidimensional data into graphical representations that exploit the high-bandwidth channel of the human visual system, leveraging the brain's remarkable ability to detect patterns and draw inferences. It has been shown to be very effective for understanding large, complex data, and has thus become an indispensable tool in many areas of research and practice. I will present several use cases of visualization based on new concepts and techniques that my group at UC Davis has introduced to further advance visualization as a powerful discovery and communication tool.
Chatbots, virtual voice assistants, intelligent customer service, and similar technology products that have become highly popular in recent years are powered by Artificial Intelligence spanning a wide range of technical fields. Among these, the semantic analysis techniques of Natural Language Processing play a pivotal role in understanding the meaning of human language. Traditional semantic analysis relies on large amounts of annotated corpora, which demands enormous human effort; this cost is nothing short of a heavy burden when commercializing the technology. In this talk, I will use the semantic analysis techniques employed in well-known cases from major international companies to introduce recent trends in semantic analysis, including how to reduce human intervention in the development process. I will also share application cases that emerging companies have built on semantic analysis technology in recent years.
This is a statistician's view of the development of data science. In 2002, Professor Ryan Rifkin, writing from a mathematical perspective, titled his PhD thesis "Everything old is new again." Statistics has a shorter history than mathematics and the other natural sciences. In this talk, we will go over some classical statistical papers and try to connect their ideas with modern data science. Through these ideas, we hope to lower some of the barriers to learning and using statistics in the modern era.