Graph Theory: Measures and Indices
Authors: Dr. Cesar Ducruet and Dr. Jean-Paul Rodrigue
1. Measures at the Network Level
Several measures and indices can be used to analyze the
network efficiency. Many of them were initially developed
by Kansky and can be used for:
Expressing the relationship between values and the network
structures they represent.
Comparing different transportation networks at a specific
point in time.
Comparing the evolution of a transport network at different
points in time.
Outside the description of the network size by the
number of nodes and edges, and its total
length and traffic, several measures are used to define
the structural attributes of a graph; the diameter, the
number of cycles and the order of a node.
Diameter (d).
The length of the shortest path between the most distanced
nodes of a graph is the diameter. d measures the
extent of a graph and the topological length between two
nodes.
The diameter enables to measure the
development of a network in time.
The higher diameter, the less linked a network tends to
be. In the case of a complex graph, the diameter can be
found with a topological distance matrix (Shimbel
distance), which computes for each node pair its minimal
topological distance. Graphs which extent remains constant,
but with a higher connectivity, have lower diameter values.
Planar networks often have a large diameter due to the presence
of many intermediate stops between two distant nodes.
Number of Cycles (u).
The maximum number of independent cycles in a graph. This
number (u) is estimated through the number of nodes
(v), links (e) and of sub-graphs (p).
Trees and simple networks have a value of 0 since they have
no cycles. The more complex a network is, the higher the
value of u, so it can be used as an indicator of the level
of development and complexity of a transport system.
2. Indices at the Network Level
Indices are more complex methods to represent the structural
properties of a graph since they involve the comparison
of a measure over another. Some indices take into account
spatial features (distance, surface) as well as the level
of activity (traffic), while others solely rest on the topological
dimension of the network.
Cost (C).
Represents the total length
of the network measured in real transport distances where
aij is the presence (1) or absence (0) of a link
between i and j and lij the length
of the link. This measure can also be calculated based on
two other dimensions of the network: the Minimum Spanning
Tree (MST) and the Greedy Triangulation (GT). The MST represents
the shortest and/or lowest cost subtree of the network;
it can be obtained by applying, among other shortest path
algorithms, the Kruskal algorithm, which allows finding
the lowest cost route connecting all nodes in the network.
The GT refers to the maximal connected planar graph keeping
the same number of nodes than in the original network but
adding all possible links without breaking its planarity.
Such operations take into account both the topology and
the geography of the network, while comparing the latter
with its optimal configurations. More efficient networks
have relative costs near to 1, while less efficient networks
are closer to 0.
Detour Index. A measure of the efficiency of a transport
network in terms of how well it overcomes distance or the
friction of distance. The closer the detour index gets to
1, the more the network is spatially efficient. Networks
having a detour index of 1 are rarely, if ever, seen and
most networks would fit on an asymptotic curve getting close
to 1, but never reaching it. For instance, the straight
distance, D(S), between two nodes may be 40 km but
the transport distance, D(T); real distance, is 50
km. The detour index is thus 0.8 (40 / 50). The
complexity of the topography
is often a good indicator of the level of detour.
In order to derive a measure of relative efficiency, the
Detour Index Relative Efficiency (Erel)
is thus the ratio between the Detour Index calculated from
the original network and the Detour Index calculated either
from the MST (minimum spanning tree) or the GT (greedy triangulation).
Network Density. Measures the territorial occupation
of a transport network in terms of km of links (L)
per square kilometers of surface (S). The higher
it is, the more a network is developed.
Pi Index. The relationship between the total length
of the graph L(G) and the distance along its diameter
D(d). It is labeled as Pi because of its similarity
with the real Pi value (3.14), which is expressing the ratio
between the circumference and the diameter of a circle.
A high index shows a developed network. It is a measure
of distance per units of diameter and an indicator of the
shape of a network.
Eta Index. Average length
per link. Adding new nodes will cause a decrease of Eta
as the average length per link declines.
Theta Index. Measures
the function of a node, which is the average amount of traffic
per intersection. The higher theta is, the greater the load
of the network. The measure can also be applied to the number
of links (edges).
Beta Index. Measures
the level of connectivity in a graph and is expressed by
the relationship between the number of links (e) over the
number of nodes (v). Trees and simple networks have Beta
value of less than one. A connected network with one cycle
has a value of 1. More complex networks have a value greater
than 1. In a network with a fixed number of nodes, the higher
the number of links, the higher the number of paths possible
in the network. Complex networks have a high value of Beta.
The rich-club coefficient is the Beta index applied to relations
among larger order (degree) nodes; it verifies whether the
connectivity is higher among larger degree nodes than for
the whole network.
Alpha Index. A measure
of connectivity which evaluates the number of cycles in
a graph in comparison with the maximum number of cycles.
The higher the alpha index, the more a network is connected.
Trees and simple networks will have a value of 0. A value
of 1 indicates a completely connected network. Measures
the level of connectivity independently of the number of
nodes. It is very rare that a network will have an alpha
value of 1, because this would imply very serious redundancies.
This index is also called Meshedness Coefficient in the
literature on planar networks.
(for planar graphs)(for
non-planar graphs)
Gamma Index (g). A
measure of connectivity that considers the relationship
between the number of observed links and the number of possible
links. The value of gamma is between 0 and 1 where a value
of 1 indicates a completely connected network and would
be extremely unlikely in reality. Gamma is an efficient
value to measure the progression of a network in time.
(for planar graphs)(for
non-planar graphs)
Being solely based on the number of nodes and links, Alpha,
Beta, and Gamma indices remain limited in revealing structural
differences between two networks of equal size. More robust
measures have thus been proposed by physics, which take
into account the internal complexity of the graph.
Hierarchy (h). The exponent of the slope
for the power-law line drawn in a bi-log plot of node frequency
over degree distribution. Networks characterized by strong
hierarchical configurations, such as scale-free networks
(few large degree nodes and many small degree nodes), often
have values over 1 or 2. A value lower than 1 indicates
the absence of scale-free properties and a limited hierarchy
among nodes.
Transitivity (t). Also called clustering
coefficient, it is the overall probability for the network
to have adjacent nodes interconnected, thus revealing the
existence of tightly connected communities (or clusters,
subgroups, cliques). It is calculated by the ratio between
the observed number of closed triplets and the maximum possible
number of closed triplets in the graph. Another way calculating
transitivity is to calculate the average clustering coefficient
of all nodes. Complex networks and notably small-world networks
often have a high transitivity and a low diameter. Because
triplets are not the only way for looking at neighborhood
density among nodes, this measure can be extended to cycles
of length 4 and 5.
Average shortest path length (s). A measure
of efficiency that is the average number of stops needed
to reach two distant nodes in the graph. The lower the result,
the more efficient the network in providing ease of circulation.
In comparison, the diameter is the maximum length of all
possible shortest paths.
Assortative coefficient (r). This coefficient
is the Pearson correlation between the order (degree) of
nodes at both ends of each link (edge) in the network. The
result ranges from -1 (low degree nodes often connect high
degree nodes) to 1 (nodes of equal or similar degree are
often connected). Disassortative networks (r is significantly
negative) are often those with strong hierarchical configurations
with large nodes connecting smaller nodes, as in scale-free
networks, while regular networks are often assortative.
3. Measures and Indices at the Node Level
Numerous measures exist for highlighting the situation of
a node in a network. Some of them are made at the "local
level" based on links with adjacent nodes, while others
on the "global level" consider the node's situation in the
whole network.
Order (degree) of a Node
(o). The number of its attached links and is
a simple, but effective measure of nodal importance. The
higher its value, the more a node is important in a graph
as many links converge to it. Hub nodes have a high order,
while terminal points have an order that can be as low as
1. A perfect hub would have its order equal to the summation
of all the orders of the other nodes in the graph and a
perfect spoke would have an order of 1.The percentage of
nodes directly connected in the entire graph is thus a measure
of reachability. An isolate is a node without connections
(degree equals to 0). The difference between in-degree and
out-degree in a directed graph (digraph) may underline interesting
functions of some nodes as attractors or senders. The order
may be calculated at different depths: adjacent nodes (depth
1), adjacent nodes of adjacent nodes (depth 2), etc. The
weighted degree is simply the total of values associated
with links.
Koenig number (or associated number, eccentricity).
A measure of farness based on the number of links needed
to reach the most distant node in the graph.
Shimbel Index (or Shimbel distance, nodal
accessibility, nodality). A measure of accessibility representing
the sum of the length of all shortest paths connecting all
other nodes in the graph. The inverse measure is also called
closeness centrality or distance centrality.
Betweenness centrality (or shortest-path
betweenness). A measure of accessibility that is the number
of times a node is crossed by shortest paths in the graph.
Anomalous centrality is detected when a node has a high
betweenness centrality and a low order (degree centrality),
as in air transport.
Hub Dependence (hd). A measure of node
vulnerability that is the share of the highest traffic link
in total traffic (weighted degree). Weak nodes depending
on few links will have a high hub dependence, especially
if they locate in the neighborhood of a large node, while
hubs will have a more even traffic distribution among their
connections. It indicates to what extent removing the largest
traffic link would affect the node's overall activity. The
measure can be extended to more links (2, 3 ... 10 largest
flow links).
Average nearest neighbors degree (knn).
A measure of neighborhood indicating the type of environment
in which the node situates. A node with low order (degree)
may be surrounded by a variety of other nodes, small or
large, which has a direct influence on its own centrality
and growth potential. A network is assortative or disassortative
depending on the similarity of the order (degree) among
neighboring nodes, which can be tested by means of Pearson
correlation (assortativity coefficient). Neighbor connectivity
is the correlation between the order (degree) of nodes and
the average order (degree) of their neighbors.
Cohesion index (ci). For a given (link)
edge ij, this index measures the ratio between
the number of common neighbors connected to nodes i and
j and the total number of their neighbors. Links with highest
values typically connect dense communities (or clusters)
in the graph, and can be removed in order to bisect the
graph and reveal such subgroups. Multiplying this index
by the weight (e.g. traffic) of links allows coupling topology
and flows. This cohesion index is also called Strength index
and it corresponds to the observed number of cycles of length
3 and 4 to which the edge belongs divided by the maximum
number of such cycles.
Within-module degree (Zi; or z-score).
Shows how well connected a node is to other nodes in the
same module (or cluster, community), where Ki is the order
(degree) of node i in the cluster si, Ksi is the average
order (degree) of all nodes in the cluster si, and δKsi
is the standard deviation of K in si. Because two nodes
having same z-score may play different roles within the
cluster, this measure is often compared with the Participation
Coefficient (Pi). Both measures are applied to nodes once
the clusters in the network are known.
Participation coefficient (Pi). Compares
the number of links (order, degree) of node i to nodes in
all clusters with its number of links within its own cluster.
Zi and Pi reveal whether nodes are truly hubs in the network,
while others are bound to local links and therefore do not
act as connectors between clusters.
Several critiques have been addressed to such indexes as
they do not always take into account the real length, quality,
and weight of the links; networks of equal size may exhibit
contrasted topological forms. However, they remain useful
for describing the changing structure of one given network.
Media
The Diameter of a Graph
The Diameter of a Graph - 2
Number of Cycles
Cost
The Effects of Topography on Route Selection
Pi Index and the Shape of Transportation Networks
Eta Index
Theta Index
Iota Index
Beta Index
Alpha Index
Gamma Index
Hierarchy
Transitivity
Order of a Node