Saturday, November 17, 2012

First steps with networkx

One of my favorite topics is the study of structures and, inspired by the presentation by Jacqueline Kazil and Dana Bauer at PyCon US, I started using networkx to analyze some networks. This library provides many facilities for creating, visualizing and mining structured data, so I decided to write this post to show the first steps with it. We will see how to load a network from the gml format and how to prune the network in order to visualize only the nodes with a high degree. The following examples use the coappearance network of the characters in the novel Les Miserables, which is freely available online. In this network each node represents a character, and an edge connects two characters when they appear in the same chapter.
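To make the structure of such a network concrete, here is a minimal sketch of how a coappearance graph could be built from chapter lists and then stored in gml and read back. The chapter lists are hypothetical toy data, not the real dataset, and the `value` edge attribute (number of shared chapters) is an assumption chosen for illustration.

```python
from itertools import combinations

import networkx as nx

# Hypothetical toy data: the characters appearing in each chapter.
# (Illustrative only, NOT taken from the real Les Miserables dataset.)
chapters = [
    ["Myriel", "Napoleon"],
    ["Myriel", "Valjean", "MlleBaptistine"],
    ["Valjean", "Fantine", "Javert"],
    ["Valjean", "Javert"],
]

G = nx.Graph()
for characters in chapters:
    G.add_nodes_from(characters)
    # every pair of characters sharing a chapter gets an edge;
    # the (assumed) 'value' attribute counts the shared chapters
    for a, b in combinations(sorted(set(characters)), 2):
        if G.has_edge(a, b):
            G[a][b]["value"] += 1
        else:
            G.add_edge(a, b, value=1)

# Serialize to gml text, then parse it back using the node labels
# (the character names) as node identifiers.
gml_text = "\n".join(nx.generate_gml(G))
H = nx.parse_gml(gml_text, label="label")

print(sorted(H.nodes()))
print(H["Valjean"]["Javert"])  # {'value': 2}
```

In the gml text each node gets a numeric `id` plus a `label`, and edges refer to the ids; passing `label="label"` turns the labels back into node names.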

Let's start with the snippets. We can load and visualize the network with the following code:
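# imports needed by all the snippets in this post
import networkx as nx
from matplotlib.pyplot import figure, hist, show

# read the graph (gml format), using the node labels
# (the character names) as node identifiers
G = nx.read_gml('lesmiserables.gml', label='label')

# drawing the full network, with the character names
figure(1)
nx.draw_spring(G, node_size=0, edge_color='b', alpha=.2, font_size=10, with_labels=True)
show()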
This should be the result:


It's easy to see that this graph is not really helpful: most of the details of the network are hidden, and it's impossible to tell which nodes are the most important. Let's plot a histogram of the number of connections per node (the degree):
# distribution of the degree
figure(2)
d = dict(nx.degree(G))  # {node: number of connections}
hist(list(d.values()), bins=15)
show()
The result should be as follows:


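The same distribution can also be inspected numerically. A minimal sketch, assuming a recent networkx release that ships this dataset as `nx.les_miserables_graph()` (used here as a stand-in for the gml file so the snippet runs on its own):

```python
import networkx as nx

# Assumption: the bundled Les Miserables network stands in for lesmiserables.gml
G = nx.les_miserables_graph()

# degree_histogram(G)[k] is the number of nodes with exactly k neighbours
counts = nx.degree_histogram(G)
for k, c in enumerate(counts):
    if c:
        print(f"degree {k:2d}: {c} node(s)")

# characters with more than ten connections, most connected first
degrees = dict(G.degree())
hubs = sorted((n for n, k in degrees.items() if k > 10), key=degrees.get, reverse=True)
print(len(hubs), "characters with degree > 10:", hubs)
```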
Looking at this histogram we can see that only a few characters have more than ten connections, so we decide to visualize only those:
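def trim_nodes(G,d):
    """ returns a copy of G without
        the nodes with a degree less than or equal to d """
    Gt = G.copy()
    # snapshot of the degrees (a plain dict, so that it does
    # not change while nodes are being removed)
    dn = dict(nx.degree(Gt))
    # iterate over a list, since the node set shrinks during the loop
    for n in list(Gt.nodes()):
        if dn[n] <= d:
            Gt.remove_node(n)
    return Gt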

# drawing the network without the
# nodes with degree 10 or less
Gt = trim_nodes(G,10)
figure(3)
nx.draw(Gt,node_size=0,node_color='w',edge_color='b',alpha=.2,with_labels=True)
show()
In the graph below we can see the final result of the analysis. This time the drawing shows which characters are the most relevant and how they are related to each other through their coappearances across the chapters.
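The same pruning can also be written more compactly with `G.subgraph`. A minimal sketch, again assuming a recent networkx that provides `nx.les_miserables_graph()`; `trim_nodes_subgraph` is a hypothetical name, not part of networkx:

```python
import networkx as nx

def trim_nodes_subgraph(G, d):
    """Hypothetical alternative to trim_nodes: keep only nodes with degree > d.

    Degrees are measured once, on the original graph, as in trim_nodes.
    """
    keep = [n for n, k in G.degree() if k > d]
    # subgraph() returns a read-only view; copy() makes an independent graph
    return G.subgraph(keep).copy()

G = nx.les_miserables_graph()
Gt = trim_nodes_subgraph(G, 10)

# Survivors lose their edges to removed nodes, so some of them may now
# have ten or fewer neighbours inside the trimmed graph.
low = [n for n, k in Gt.degree() if k <= 10]
print(Gt.number_of_nodes(), "nodes,", Gt.number_of_edges(), "edges")
print("survivors now at degree <= 10:", low)
```

If every remaining node should keep at least k neighbours inside the pruned graph, the k-core (`nx.k_core`) is a different technique: it removes nodes repeatedly until that condition holds.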

5 comments:

  1. GML is so similar to plain JSON, isn't it? Good article, let's see what happens when the wonderful D3.js comes integrated into the IPython notebook :)

  2. what should I import to have access to the figure and show methods?

    1. Hello, these are the imports you need to run the examples:

      from pylab import show, hist, figure
      import networkx as nx

  3. @rouli, from http://networkx.lanl.gov/ it appears all you need to do is

    >>> import networkx as nx

    and you're good to go :-)

  4. Hi, I pasted the code of this post in www.pypedia.com and I made some minor changes to make it work there. Enjoy!
    http://bit.ly/14YdsH2
    Press the "Run" button. You can change the code and check how the graphs behave.

