Original article by Joseph Rickert
A run of 35 conferences is impressive indeed, but social network analysts have been at it for even longer than that, and today they are still on the cutting edge of the statistical analysis of networks. The conference presentations have not been posted yet, but judging from the workshop program there was plenty of R action in Brighton.
Social network analysis at this level involves some serious statistics and a very specialized vocabulary. However, it seems to me that some knowledge of this field will become important to everyone working in data science. Supervised learning models, and statistical models that assume independence among the observations, will most likely represent only the first steps that data scientists take in exploring the complexity of large data sets.
And, perhaps of equal importance, working with network data is great fun. Moreover, software tools exist in R and other languages that make it relatively easy to get started with just a few pointers.
From the point of view of statistical inference, what you need to know is that Exponential Random Graph Models (ERGMs) are at the heart of modern social network analysis. An ERGM is a statistical model that specifies the probability of observing a given network, out of a specified class of networks, as a function of both the network's observed structural properties and covariates associated with its vertices. The "exponential" in the name comes from the exponential family of distributions used to specify the form of these models. ERGMs are analogous to generalized linear models, except that ERGMs take into account the dependency structure of the ties (edges) between vertices. For a rigorous definition of ERGMs, see sections 3 and 4 of the paper by Hunter et al. in the 2008 special issue of the Journal of Statistical Software (JSS), or Chapter 6 of Kolaczyk and Csárdi’s book Statistical Analysis of Network Data with R. (I have found this book to be very helpful and highly recommend it. Not only does it provide an accessible introduction to ERGMs, it also begins with basic network statistics and the igraph package and then goes on to introduce more advanced topics such as modeling processes that take place on graphs and network flows.)
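To make the definition a little more concrete, here is a minimal Python sketch of the standard ERGM form P(Y = y) = exp(θ · g(y)) / κ(θ), where g(y) is a vector of network statistics and κ(θ) sums exp(θ · g) over every graph in the class. This is only an illustration under stated assumptions, not how statnet's ergm package works: it assumes undirected graphs, uses edge and triangle counts as the (hypothetical) choice of g(y), picks made-up coefficient values, and computes κ(θ) by brute-force enumeration. All function names are hypothetical.

```python
import itertools
import math


def all_dyads(n):
    """All unordered vertex pairs (i, j) with i < j: the possible undirected ties."""
    return list(itertools.combinations(range(n), 2))


def all_graphs(n):
    """Enumerate every undirected graph on n vertices as a list of (i, j) ties."""
    dyads = all_dyads(n)
    for mask in range(2 ** len(dyads)):
        yield [d for bit, d in enumerate(dyads) if (mask >> bit) & 1]


def stats(graph, n):
    """Hypothetical choice of statistics g(y): (number of edges, number of triangles)."""
    ties = set(graph)
    triangles = sum(
        1
        for i, j, k in itertools.combinations(range(n), 3)
        if (i, j) in ties and (i, k) in ties and (j, k) in ties
    )
    return (len(ties), triangles)


def log_weight(graph, theta, n):
    """Unnormalized log-probability theta . g(y)."""
    return sum(t * s for t, s in zip(theta, stats(graph, n)))


def ergm_probability(graph, theta, n):
    """Exact P(Y = y) = exp(theta . g(y)) / kappa(theta), kappa summed over all graphs."""
    kappa = sum(math.exp(log_weight(g, theta, n)) for g in all_graphs(n))
    return math.exp(log_weight(graph, theta, n)) / kappa


def conditional_tie_logodds(graph, dyad, theta, n):
    """Log-odds that `dyad` (given as (i, j) with i < j) is a tie, given all other ties.

    Equals theta . (g(y with the tie) - g(y without the tie)).
    """
    rest = [d for d in graph if d != dyad]
    with_tie, without_tie = stats(rest + [dyad], n), stats(rest, n)
    return sum(t * (a - b) for t, a, b in zip(theta, with_tie, without_tie))


if __name__ == "__main__":
    n = 4
    theta = (-1.0, 0.5)  # hypothetical coefficients for (edges, triangles)
    print(ergm_probability([(0, 1), (0, 2), (1, 2)], theta, n))
    # Same dyad, different surroundings: closing a triangle raises the log-odds.
    print(conditional_tie_logodds([(0, 2), (1, 2)], (0, 1), theta, n))  # -0.5
    print(conditional_tie_logodds([], (0, 1), theta, n))  # -1.0
```

The conditional log-odds is where the GLM analogy and the dependence show up. With the triangle coefficient set to 0, every dyad's log-odds is just the edges coefficient no matter what the other ties are, so ties are independent Bernoulli variables and the model reduces to a logistic regression with only an intercept. A nonzero triangle coefficient makes the chance of a tie depend on the ties around it. Exact enumeration is only feasible for tiny graphs, since there are 2^(n(n-1)/2) of them; for realistic networks the ergm package relies on MCMC-based estimation instead.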
In the R world, the place to go to work with ERGMs is statnet.org. statnet is a suite of 15 or so CRAN packages that provide a complete infrastructure for working with ERGMs. statnet.org is a real gem of a site that contains documentation for all of the statnet packages along with tutorials, presentations from past Sunbelt conferences and more.
I am particularly impressed with the Shiny-based GUI for learning how to fit ERGMs. Try it out on the Shiny webpage. Click the Get Started button. Then, under File type, select “built-in network” and “ecoli 1”. After that, click the right arrow in the upper right corner. You should see a plot of the ecoli graph.