I believe data science could deliver far more value if we eliminated the rigid boundaries between roles on a data science team: every analyst should be able to code and build models, and every statistician should be able to manipulate and visualize data. Besides staying passionate about learning, another good way to grow is to solve problems at work or in your own side projects. In this section, I would like to talk about the concept of a unicorn in data science, the skill sets involved, and a few practical problems.
Large-scale network data analysis has emerged as one of the most important tools in the big data regime for almost all scientists and practitioners. Among all characteristics and properties, community and centrality are two major components for a detailed understanding of a large-scale network. Unfortunately, traditional methods are either computationally infeasible for large-scale networks or lack statistical verification and inference. This talk introduces an SOP (standard operating procedure) for accurate and efficient analysis of large-scale network data. It consists of four main steps. First, a screening stage is proposed to roughly partition the whole network into communities via complement graph coloring. Then a likelihood-based statistical test is introduced to test the significance of the detected communities. Once the significant communities are identified, another likelihood-based statistical test is introduced to check the focus centrality of each community. Finally, a metaheuristic swarm-intelligence-based (SIB) method is proposed to fine-tune the range of each community from its original circular setting. Our proposed SOP is demonstrated on several real-life data sets, showing how the method can extract additional insights from the data.
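To make the screening step concrete, here is a minimal sketch of how a complement-graph-coloring partition could look in Python with networkx. The idea is that an independent set in the complement graph is a clique in the original graph, so each color class of a greedy coloring of the complement is a tightly connected candidate community. The function name `screen_communities` and the `largest_first` strategy are illustrative assumptions, not the talk's actual implementation, which may allow looser community structure than cliques.

```python
# Screening stage sketch: roughly partition a network into candidate
# communities by greedily coloring the COMPLEMENT graph.  A color class
# is an independent set in the complement, i.e. a clique in the original
# graph, so each class is a tightly connected candidate community.
import networkx as nx


def screen_communities(G: nx.Graph) -> list[set]:
    """Partition G into candidate communities via complement graph coloring."""
    Gc = nx.complement(G)
    # greedy_color returns a dict mapping each node to a color index
    coloring = nx.coloring.greedy_color(Gc, strategy="largest_first")
    classes: dict[int, set] = {}
    for node, color in coloring.items():
        classes.setdefault(color, set()).add(node)
    return list(classes.values())
```

In a full pipeline, each candidate community returned here would then be passed to the likelihood-based significance and centrality tests described above; greedy coloring is only a fast heuristic, which is why a separate statistical verification stage is needed.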