Modelling Network Growth

Modelling Network Growth under Resource Constraints

We develop a resource-constrained growth model that jointly explains the emergence of multiple structural properties in real-world networks.

Harshay Shah¹ Suhansanu Kumar¹ Hari Sundaram¹

¹University of Illinois at Urbana-Champaign (UIUC)

ArXiv Code Bibtex

1 The Problem

Individuals form connections in social and information networks under constraints of limited information and partial network access. Individuals' edge formation decisions cumulatively shape the structure of large real-world networks. As shown in Figure 1, large-scale academic and judicial networks exhibit rich structural properties: heavy-tailed degree distribution, high clustering and accelerated network growth.

Structural properties of citation networks — Figure 1Structural properties of academic citation network APS and judicial network USSC. Both networks exhibit heavy-tailed degree distribution, high clustering and accelerated network growth.

Network growth models study how edge formation mechanisms give rise to distinct macroscopic properties of real-world networks. Classic models of network growth can preserve multiple structural properties, but make unrealistic assumptions about individuals' decisions to form edges. Conversely, existing resource-constrained models do not jointly preserve multiple structural properties of real-world networks.

The problem of developing a model of network growth, where agents act under resource constraints, including access to only local information is hard. The problem lies in identifying simple mechanisms, with few parameters, where the agents only use local information and jointly preserve multiple structural properties of real-world networks.

2 Proposed Model

We propose a resource-constrained growth model in which new nodes that join the network use a random walk process to link to existing nodes. The mechanism that new nodes use to select an entry point into the network and subsequently form edges intuitively corresponds to how we expect researchers to find references to cite. A researcher first finds one or more relevant paper as “starting points”. Then, under time and information constraints, he or she searches for potential references by navigating through a chain of references. After repeating this process one or more times, the researcher selects to cite a subset of these papers.

Random Walk Model — Figure 2Random walk edge formation mechanism. A new node u joins the network at time t_u with a prescribed outdegree of three and initiates a random walk from seed node v_a. The dark labeled edges denote the order in which node u traverses the graph using a random walk. Node u stops the random walk after linking to three nodes: v_b , v_d and v_f.

Similarly, as shown in Figure 2, every new node that joins the network selects a seed node from which it initiates the random walk process to search for potential links. The new node terminates the random walk process after linking to a subset of visited nodes. The random walk is parameterized to link to visited nodes, jump back to the seed and traverse outgoing & incoming edges with differing probabilities.

Our model incorporates the resource constraints that influence edge formation as nodes concurrently acquire information and form edges by exploring neighborhoods of existing nodes, without access to the entire network. Moreover, it captures two important sociological phenomena – preferential attachment and triadic closure – as new nodes are more likely to visit & link to popular, high-degree nodes and neighbors of nodes to which it has already linked.

3 Results

We conduct extensive experiments to compare our model against four well-known baseline growth model, which are represetative of common edge formation mechanisms, on five large-scale citation networks. We show that our model can accurately preserve multiple structural properties – heavy-tailed degree distribution, skewed local clustering distribution, degree-clustering relationship – under accelerated network growth.

Preserving network structure — Figure 3Accuracy of growth models in preserving structural properties of the APS citation network

As shown in Figure 3, our model outperforms the baselines in jointly preserving all three properties. Note that the Dorogotsev-Samukhin-Mendes (DMS) model can only preserve the degree distribution and models such as Holme-Kim that preserve average clustering cannot preserve the skewed clustering distribution or the degree-clustering relationship. Section 7 of the paper discusses the experiments in detail.