This lecture is our first in a series on Graph Algorithms. We begin by studying methods of searching a
graph. In order to follow these lectures, it is important that you have a basic knowledge of graph definitions,
including adjacency lists and matrix representations. References are included on the class webpage.
In this series of lectures, we use the usual notation for a graph, G, which consists of a set of vertices, V,
and a set of edges, E. Each edge is represented as (u, v) ∈ E, where u and v are vertices in V. Furthermore,
following the notation of CLRS, we assume that each vertex v stores its neighbors in an adjacency list:
Adj[v].
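For concreteness, an adjacency-list representation can be sketched in Python as a dictionary that maps each vertex to the list of its neighbors; this particular encoding (and the tiny graph used here) is an illustrative assumption of this write-up, not part of the original notes.

# A small illustrative graph stored as a dictionary of adjacency lists.
# Adj["s"] plays the role of Adj[s] in the notes (compare the BFS example
# later, where Adj[s] = {a, c}).
Adj = {
    "s": ["a", "c"],
    "a": ["s"],
    "c": ["s", "b"],
    "b": ["c"],
}

print(Adj["s"])  # ['a', 'c']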
1 Breadth-first search
We begin with one of the simplest graph searching algorithms. The breadth-first search algorithm searches
through the graph G starting at node s, typically referred to as the source. From s, the breadth-first search
eventually discovers all other vertices in the graph that are reachable from s. The BFS search algorithm
has the following properties:
- BFS discovers all vertices reachable from s.
- It computes the shortest distance from s to every vertex in G.
- It produces a breadth-first search tree with root s that contains all vertices reachable from s.
- The algorithm works by exploring all vertices at distance k from s before discovering any vertices at distance k + 1.
Before presenting the technical details of the algorithm, we begin with a demonstration of the search
process of BFS. The graph below begins with all vertices as undiscovered (marked in red). BFS starts
the search procedure at a given source node, s. This source node is typically part of the input. This is the
first node that is visited (marked in blue in the figure below) by BFS.
Next, BFS visits all vertices that are at a distance of one from the source. Recall that we measure
distance as the number of edges from s to the vertex.
Next, BFS visits all nodes at minimum distance 2 from s, as seen below.
This search process continues until all vertices are visited.
Whenever a new node is discovered along a particular edge in the graph, that edge is added to the BFS
tree (shown as orange edges). This tree is like a trace of the BFS search process. We shall see in the
next section that this tree is not necessarily unique. Depending on which vertices were visited along which
path, different BFS trees may result.
The shortest distance from the source to vertex v is given by the path of length 4 shown in the tree on
the right. Notice that there are certainly other paths in G that lead from s to v, however the path in the
BFS tree is guaranteed to have the shortest length.
This important property is stated below:
The shortest path from s to any vertex v in G is represented by a path in the BFS tree.
1.1 Additional tools required by the BFS algorithm
In this section we look at the actual details of the BFS algorithm. Let's begin by looking at the data
structures and graph attributes that are needed. The BFS algorithm processes the vertices of G by
marking them as visited once they are discovered. Therefore we will need some way to keep track of this
property. Secondly, we need a data structure that maintains a list of the vertices that should be searched
next – and that structure should ensure that the vertices are visited in order of distance from
s. We also need to keep track of which edges are part of the final BFS tree, and what the final shortest
distance is from the source to each vertex. The tools necessary for each of these requirements are explained
below:
1. Recording when a vertex has been visited:
Each vertex v contains an attribute v.visited which is a boolean variable that indicates if this vertex
has been visited. There are variations of this concept in different versions of BFS. You may come across
the use of colors, or other boolean variables. In most cases, the idea is the same: we need a key associated
to vertex v that stores whether or not that vertex has been visited. This attribute of the vertex v will be
set to true when that vertex is discovered by the BFS algorithm:
2. How to keep track of which vertices to visit next:
The BFS algorithm uses a QUEUE, Q, to maintain a list of the vertices to be visited. When a vertex
v is visited by BFS, the unvisited neighbors of v are added to the queue. The vertices are processed one
at a time by removing them (Dequeue) from the queue. An example of this process is shown below. In
this example, vertex A is removed from the queue and the neighbors of A, vertices B and C, are marked
as visited and added to the queue. At the next stage, vertex B is dequeued, and its unvisited neighbor
(vertex D) is marked as visited and added to the queue. The properties of the queue are such that we
enqueue on one end and dequeue on the other. Therefore the vertices are added to the queue in order of
their distance from s.
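As a small illustration (not part of the original notes), Python's collections.deque provides exactly this pair of operations: append plays the role of Enqueue on one end and popleft plays the role of Dequeue on the other, so vertices leave the queue in the order they were added.

from collections import deque

Q = deque()
Q.append("A")      # Enqueue vertex A
Q.append("B")      # Enqueue vertex B
Q.append("C")      # Enqueue vertex C
u = Q.popleft()    # Dequeue: returns "A", the earliest vertex enqueued
print(u, list(Q))  # A ['B', 'C']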
3. Recording the distance from s to v
The BFS procedure not only visits all the nodes in G that are reachable from s, but it also records the
minimum distance from s to any vertex v. This is done by using an additional attribute called v.distance
which represents the minimum path length from s to vertex v.
4. Building the breadth-first search tree during the search:
In order to store the structure of the BFS tree, we need a method to record which vertices are discovered
along which path. This is done using the attributes v.children and v.parent. When a vertex v is removed
from the queue, its unvisited neighbors are added to the queue (as described above) and then we add these
neighbors as children of vertex v. In the example below left, when vertex A is removed from the queue,
the unvisited neighbors B and C are added to A.children. The parent pointers are set as: B.parent = A
and C.parent = A. The next vertex to be removed from the queue is vertex B (below right). The unvisited
neighbor D is added to B.children, and we set D.parent = B.
In summary then, each vertex uses the additional attributes which will be necessary for BFS:
v.distance, v.parent, v.children, v.visited
We are now ready to describe the BFS algorithm using the above mentioned attributes, along with the
Queue Q.
1.2 The BFS algorithm
Input: The input to BFS is a graph G, consisting of a list of vertices V and a list of edges E. Recall
that the neighbors of a vertex v are stored in Adj [v]. In the example below, Adj [s] = {a, c}. The algorithm
also takes as input a source vertex s.
The Queue operations: The algorithm uses a queue Q, and the operation Enqueue(Q, v) in order to
add a vertex to the queue, and the operation Dequeue(Q) to remove a vertex from the queue.
BFS(G,s)
Step 1: For all vertices in G, initialize them as unvisited: for all v ∈ V set v.visited = false.
Initialize the parent pointers to NIL for each vertex. Initialize the list v.children to NIL for
each vertex.
Step 2: Node s is the starting node, set s.visited = true and s.distance = 0. Add s to Q.
Step 3: Now we are ready to carry out the search. Vertices are removed from Q one at a time. When
a vertex is visited, its unvisited neighbors are added to Q.
while Q ≠ empty
    u = DEQUEUE(Q)
    for each v in Adj[u]
        if v.visited = false
            v.visited = true
            v.distance = u.distance + 1
            v.parent = u
            Add v to u.children
            ENQUEUE(Q, v)
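The pseudocode above translates almost line for line into Python. The sketch below is one possible implementation, under the assumption that the graph is given as a dictionary adj of adjacency lists and that the per-vertex attributes are kept in dictionaries rather than as fields on vertex objects; the function name bfs and these representation choices are mine, not the notes'.

from collections import deque

def bfs(adj, s):
    # Step 1: initialize every vertex as unvisited, with no parent,
    # no children and no distance yet.
    visited = {v: False for v in adj}
    distance = {v: None for v in adj}
    parent = {v: None for v in adj}
    children = {v: [] for v in adj}

    # Step 2: the source is visited at distance 0 and placed in the queue.
    visited[s] = True
    distance[s] = 0
    Q = deque([s])

    # Step 3: repeatedly dequeue a vertex and visit its unvisited neighbors.
    while Q:
        u = Q.popleft()                  # u = DEQUEUE(Q)
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                distance[v] = distance[u] + 1
                parent[v] = u
                children[u].append(v)    # (u, v) becomes a BFS tree edge
                Q.append(v)              # ENQUEUE(Q, v)

    return distance, parent, children

Calling bfs(Adj, "s") on the small dictionary sketched in the introduction would return the distance, parent and children dictionaries describing a BFS tree rooted at s.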
We trace through the execution of the above algorithm using the example shown in step 2.
Vertex S is removed from Q. Neighbors C and A are added to Q and marked as visited. Their distances
are set to s.distance + 1, and their parent pointers are set to s. Furthermore, both A and C are added to
s.children.
Vertex C is removed from Q. Neighbors B and G are added to Q and marked as visited. Their distances
are set to C.distance + 1 = 1 + 1 = 2, and their parent pointers are set to C. Furthermore, both B and G
are added to C.children.
Vertex A is removed from Q. There are no unvisited neighbors.
Vertex B is removed from Q. Neighbor D is marked as visited and added to Q. We set D.distance =
B.distance + 1 = 2 + 1 = 3, and D.parent = B, and B.children = {D}.
Vertex G is removed from Q. Neighbors H, L, F are marked as visited and added to Q. We set each
distance to G.distance + 1 = 2 + 1 = 3, and we set each parent to G. Finally G.children = {H, L, F }.
Vertex D is dequeued. Neighbor E is marked as visited and enqueued. We set E.distance = D.distance+
1 = 3 + 1 = 4 and E.parent = D and D.children = {E}.
Vertex H is dequeued. Neighbor J is marked as visited and enqueued. We set J.distance = H.distance+
1 = 3 + 1 = 4, and H.children = {J} and J.parent = H.
Vertex L is dequeued. Neighbor K is enqueued. We set K.distance = L.distance + 1 = 3 + 1 = 4, and
L.children = {K} and K.parent = L.
Finally, the remaining vertices F, E, J and K are dequeued one by one. None of them have any unvisited
neighbors. The main while loop of BFS terminates. Each vertex has the correct shortest distance from s
stored in v.distance, and the BFS tree is complete.
Runtime:
The initialization of BFS in Step 1 runs in time O(V ). During the execution of BFS, each vertex is
placed in the queue once. Therefore the runtime of adding and removing to Q over the entire execution
of BFS is O(V). Secondly, the for loop above is executed once for each vertex v in Adj[u], for every vertex
u. Since each edge is stored twice in the adjacency list of an undirected graph, the body of the for loop is
executed at most twice for each edge in the graph. However, the run time of the for loop is still O(E). The
overall runtime of BFS is
therefore O(V + E).
In the next section we look at a different search algorithm, one that doesn't branch out in the same
way BFS does, but instead opts to search deeper into the graph G before branching.
2 Depth-first search
2.1 Overview
The depth-first approach is to search as far as possible in the graph before branching out. Imagine
searching along a path exploring unvisited vertices as far as possible. When there are no more unvisited
vertices to discover, the search backtracks and looks for paths from the last visited vertex. As with BFS,
the DFS algorithm can run on both undirected and directed graphs.
DFS takes as input both a graph G and a source vertex s, and similarly uses the attribute v.visited.
Initially, only the source vertex is marked as visited (in blue below):
Next, DFS searches along a neighbor of the current vertex, continuing in this way as far as possible:
At this point the search backtracks and looks for a previous neighbor which leads to an undiscovered
node. The search then continues along edges to new unvisited neighbors:
Eventually all vertices are visited and DFS terminates:
Analogously, the path traced out by this DFS search is referred to as a DFS tree (shown in orange).
Notice that the DFS tree in this example is very different from the BFS tree.
2.2 The DFS Algorithm
The DFS algorithm requires an attribute v.visited in order to keep track of which vertices have been
visited. The proper execution of the algorithm does not actually need anything else! However, as we shall
see in the next lecture, some other attributes may be useful once DFS has completed. Therefore we typically
use the following attributes:
- The attribute v.visited is used to mark whether or not the vertex has been visited in the search.
- The attribute v.parent is used to store the parent node in the DFS tree.
- The attribute v.distance is used to store the distance from s to v in the DFS tree. Note that this is not the minimum distance in the graph G.
The algorithm does not need an external data structure such as a queue or stack, because it can be written
recursively. It is the recursive algorithm that ensures that the search continues along edges to new vertices
as long as possible. The easiest way to trace through the algorithm is to keep track of the recursive calls.
We demonstrate the process on the small undirected graph below.
DFS(s)
Step 1: For all vertices in G, initialize them as unvisited: for all v ∈ V set v.visited = false.
Initialize the parent pointers to NIL for each vertex. Set s.distance = 0.
Step 2: Call the recursive algorithm DFS-visit(s). This is a call to the recursive algorithm below:
DFS-visit(u)
    Mark node u as visited: u.visited = true
    For each v ∈ Adj[u]
        If v.visited = false
            v.parent = u
            v.distance = u.distance + 1
            DFS-visit(v)
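A direct Python rendering of DFS-visit is sketched below; as with the BFS sketch, the dictionary-based graph representation and the function names are assumptions of mine rather than part of the notes.

def dfs_visit(adj, u, visited, parent, distance):
    visited[u] = True                  # mark node u as visited
    for v in adj[u]:
        if not visited[v]:
            parent[v] = u
            distance[v] = distance[u] + 1
            dfs_visit(adj, v, visited, parent, distance)

def dfs(adj, s):
    # Step 1: initialize; Step 2: start the recursion at the source s.
    visited = {v: False for v in adj}
    parent = {v: None for v in adj}
    distance = {v: 0 for v in adj}
    dfs_visit(adj, s, visited, parent, distance)
    return visited, parent, distance

# Note: for very deep graphs, Python's default recursion limit may need
# to be raised (sys.setrecursionlimit), or the recursion replaced by an
# explicit stack.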
We now illustrate the execution of the above recursive algorithm. The first call is to DFS-visit(s), using
the source vertex, s:
DFS-visit(s): The call to DFS-visit(s) marks vertex s as visited, and the neighbors of s are processed
by the for loop. Since neighbor B is not visited, its parent is set to s, its distance is set to s.distance + 1 =
0 + 1 = 1, and a recursive call is made to DFS-visit(B). Note that the next neighbor of s, namely C, will
not be processed until DFS-visit(B) terminates.
DFS-visit(B): Vertex B is marked as visited, and suppose the first unvisited neighbor in the for loop
is D. The parent of D is set as B, and D.distance = 2. Next, a recursive call is made to DFS-visit(D).
DFS-visit(D): Vertex D is marked as visited, and suppose the first unvisited neighbor in the for loop
is F . The parent of F is set as D, and F.distance = 3. Next, a recursive call is made to DFS-visit(F).
DFS-visit(F): Vertex F is marked as visited, and the only unvisited neighbor is E. The parent of E is
set as F , and E.distance = 4. Next, a recursive call is made to DFS-visit(E).
DFS-visit(E): Vertex E is marked as visited. There are no unvisited neighbors, so no recursive calls
are made. The call to DFS-visit(E) terminates. The current stage of the recursive process is shown in the
right below:
At this point in the recursion tree, the next vertex to be processed is any other unvisited neighbor of
F. Again, there are none, so DFS-visit(F) terminates. The next vertex to be processed is any unvisited
neighbor of D. Neighbor C of vertex D is unvisited. The parent of C is set as D and C.distance = 3.
Next, a recursive call is made to DFS-visit(C).
DFS-visit(C): Vertex C is marked as visited. Since there are no unvisited neighbors, the call to DFS-visit(C)
terminates. At this point in the recursion tree, the next vertex to be processed is any other unvisited
neighbor of D. There are none. So the next vertex is any unvisited neighbor of B. Again, there are none.
Finally, since vertex S also has no unvisited neighbors, the execution of all levels of the recursion tree
terminates and DFS-visit(s) is complete.
Runtime:
The DFS algorithm marks each vertex as visited only once. For each vertex v, the algorithm carries out
a for loop over the neighbors of v. Over all vertices in the graph, this is equivalent to doing a constant
amount of work for each edge in the graph. The total runtime is then Θ(V + E).
2.3 Running DFS on the entire graph G
The DFS(s) algorithm only discovers nodes in the graph that are connected to the original node s (by
an undirected path for undirected graphs, or a directed path for directed graphs). The DFS(s) algorithm
may terminate before the entire graph G has been discovered. In order to visit each node of the graph, we
simply restart DFS as long as there are more unvisited nodes in G. The algorithm below does not take a
source vertex as input, and instead calls DFS-visit(v) for any unvisited node v. This is repeated until all
vertices of the graph are visited.
DFS(G)
Step 1: Initialize all vertices in G to unvisited. Initialize all parent pointers to NIL, and all distances
to 0.
Step 2: Now carry out DFS-visit as long as there are still unvisited vertices:
for all v ∈ V
    if v.visited = false
        DFS-visit(v)
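Reusing the dfs_visit sketch from the previous section, DFS(G) is just the same initialization followed by a loop that restarts the recursion at every vertex that is still unvisited; again the names are assumptions of mine.

def dfs_graph(adj):
    # Step 1: initialize all vertices as unvisited, parents to None,
    # distances to 0.
    visited = {v: False for v in adj}
    parent = {v: None for v in adj}
    distance = {v: 0 for v in adj}
    # Step 2: restart DFS-visit from every vertex not yet discovered.
    for v in adj:
        if not visited[v]:
            dfs_visit(adj, v, visited, parent, distance)
    return visited, parent, distance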
2.4 DFS on directed graphs
The DFS algorithm on a directed graph works in the same way. The resulting DFS tree is stored in the
parent pointers which are set during the DFS algorithm. The result of calling DFS(G) may be a single
tree, or a set of trees called a DFS forest. The example below shows the result of DFS(G) on the directed
graph. The DFS forest is drawn on the right. The edges belonging to the DFS trees are shown in orange.
The remaining edges of the graph G that are not part of the DFS trees can be classified as either Back
edges, Forward Edges or Cross edges.
In summary, the completion of DFS(G) on a directed graph results in the following 4 types of edges:
- Tree edges: edges that are in G and also part of the DFS tree.
- Back edges: non-tree edges (u, v) ∈ E that connect u to an ancestor v in the tree.
- Forward edges: non-tree edges (u, v) that connect a vertex u to a descendant v in the DFS tree.
- Cross edges: all other edges.
Graph Algorithms 2: Topological sort and Strongly
connected components
In this lecture we study algorithms on directed graphs. Our first algorithm is Topological sort which is a
sorting algorithm on the vertices of a directed graph. Suppose the nodes of a graph represent courses that
are required for a specific program, and a directed edge (a, b) indicates that course a must be taken before
course b. If the requirement is to take all the courses in order to complete the program, then it would be
necessary to find an ordering of the courses that respects the prerequisite requirements. Such an ordering
is in fact a Topological sort: A linear ordering of the courses so that for all edges (a, b) in the graph, course
a precedes course b in the ordering:
1 Topological sort
Formally, a topological sort of the graph G = (V, E) is a linear ordering of V such that for all (u, v) ∈ E,
the vertex u appears before v in the ordering. If the graph contains a cycle, then no topological ordering
is possible. Therefore we consider only directed acyclic graphs in this section, commonly referred to as
dags. Note that a directed graph may have more than one possible topological ordering, as in the example
below:
The DFS algorithm from our last lecture is in fact at the heart of finding this topological order. Let's
start by looking at an example of a dag and the resulting DFS tree. In the figure below, we show the
original graph G, and the result of DFS-visit(A). The DFS tree is drawn on the right, showing not only
the tree edges, but also the other edge types of G (back, forward, and cross). By examining the DFS tree
and the additional edges, it is relatively easy to identify a topological ordering: A, C, D, B, F, E. How
does this ordering relate to how the vertices were discovered by DFS? It is not exactly the order in which
they were discovered. Instead, this is in fact the order in which they were finished. Notice that vertex E
is the first vertex from which a backtrack was made. In other words, the order of the recursive calls
was DFS-visit(A), DFS-visit(C), DFS-visit(D), DFS-visit(E), and then DFS-visit(E) terminated because
there were not more unvisited neighbors. Next, DFS-visit(D) made a call to DFS-visit(F), which then also
terminates because it has no more unvisited neighbors. Therefore the second-last vertex to finish is vertex
F.
This is in fact the key to selecting a topological ordering from the DFS tree: write the nodes in
decreasing order of the time at which they were finished in the DFS algorithm. It is important to
note that the graph may have other topological sorts, but this method produces one of them.
1.1 The use of time-stamps in DFS
DFS can be implemented to keep track of a pair of time-stamps for each vertex. These time-stamps are
useful in the Topological sort algorithm and in many other algorithms.
Time stamps in DFS
The unit of time is incremented each time a vertex is discovered, or when a vertex terminates
DFS-visit(v).
In the example below, we write next to each vertex a pair of time stamps that represent when
DFS-visit(A) first discovers the vertex, and when it finishes processing the neighbors of that vertex.
One confusion regarding DFS time-stamps is that they are often incorrectly associated with walking on
the edges (probably due to many online animations that associate this walking with the DFS algorithm).
In the example above, vertex A is discovered at time 1, then vertex C, then vertex D, then vertex E. At
this point it is time 4. Vertex E now terminates its DFS call, and this termination counts as a step,
therefore its termination occurs at step 5. The finish time stamp of vertex E is 5. The next vertex to be
discovered is vertex F , which occurs at time step 6. Be careful! The time step is 6 – NOT 7. Often the
misconception is that we had to somehow walk over to vertex F , which takes 2 steps. This is incorrect.
Think of the time-stamps as the next event that happens, where events are either discovering or ending a
DFS call. Therefore vertex F is discovered next, at step 6, then next it terminates at time 7. The next
vertex to be discovered is vertex B, at time 8, it then terminates at time 9, and then finally D terminates
at time 10 since it has no more undiscovered neighbors. Finally C terminates at time 11, then A terminates
at time 12.
Finally, using the finish times of DFS, we can write the vertices in decreasing order of their finish time
and the result is a valid topological order, as shown in the example above.
The fact is given below:
Theorem 1. For a directed acyclic graph G, by ordering the vertices in decreasing order of their
finish time in DFS, the result is a topological ordering.
Proof: We just need to argue that for any edge (u, v) ∈ E, the finish time of v is less than the finish time
of u. At some point during the execution of DFS, the algorithm must visit node u and consider the edge
(u, v). There are three distinct cases.
Case 1: If node v is not yet visited, then node v is visited after u, and DFS will completely finish
visiting v before it finishes u. This represents the case for edge (c, b) in the above example. When DFS
first visits vertex c, the neighbor b is not yet visited. This means that the finish time of c will be after that
of b.
Case 2: If node v is already visited, but not yet finished, then this means there is a path from v to
u through some other part of the graph, and the edge (u, v) creates a cycle. This is impossible in a dag.
Case 3: If node v is already visited and finished, then its finish time has already been stamped.
Therefore, since we are still exploring u, the finish time of u will be greater than that of v. This is the case
with edge (f, e) in the above example. When DFS explores vertex f, its neighbour e is already finished.
Therefore the time stamp of f's finish time will be greater than that of e.
1.2 Updating the DFS algorithm to record time stamps
It remains to show how to update the DFS algorithm so that we can record these time-stamps. We alter
the DFS-visit algorithm by adding a global variable called time, which is initialized as 0. When all the
neighbors of a vertex have been explored, the vertex is finished and we time-stamp the attribute v.finish
with the current value of time.
DFS-visit(u)
    time = time + 1
    Mark node u as visited: u.visited = true
    u.start = time
    For each v ∈ Adj[u]
        If v.visited = false
            v.parent = u
            v.distance = u.distance + 1
            DFS-visit(v)
    time = time + 1
    u.finish = time
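In Python the global variable time can be carried, for example, as a one-element list shared by all recursive calls. The sketch below mirrors the time-stamped pseudocode; the v.distance update is left out only to keep the sketch short, and the names are again assumptions of mine.

def dfs_visit_timed(adj, u, visited, parent, start, finish, clock):
    clock[0] += 1                  # time = time + 1
    visited[u] = True              # mark node u as visited
    start[u] = clock[0]            # u.start = time
    for v in adj[u]:
        if not visited[v]:
            parent[v] = u
            dfs_visit_timed(adj, v, visited, parent, start, finish, clock)
    clock[0] += 1                  # time = time + 1
    finish[u] = clock[0]           # u.finish = time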
The topological sort algorithm follows directly from the information stored in each node after the
execution of this updated DFS:
Topological-sort(G)
Step 1: Call DFS(G) to compute the finish times.
Step 2: As each vertex is finished in DFS, insert that vertex at the front of the topological ordering.
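A self-contained Python sketch of Topological-sort(G) is given below: it runs the time-stamped DFS over the whole graph and pushes each vertex onto the front of the ordering at the moment it finishes. The graph representation and function names are assumptions carried over from the earlier sketches, and the example dag in the final comment is hypothetical.

from collections import deque

def topological_sort(adj):
    visited = {v: False for v in adj}
    start, finish = {}, {}
    clock = [0]
    order = deque()                 # the topological ordering being built

    def visit(u):
        clock[0] += 1
        visited[u] = True
        start[u] = clock[0]
        for v in adj[u]:
            if not visited[v]:
                visit(v)
        clock[0] += 1
        finish[u] = clock[0]
        order.appendleft(u)         # Step 2: insert at the front when u finishes

    for v in adj:                   # Step 1: DFS(G) over all vertices
        if not visited[v]:
            visit(v)
    return list(order)

# Example on a small hypothetical dag:
# adj = {"A": ["C"], "C": ["D"], "D": ["B", "F"], "B": ["E"], "F": ["E"], "E": []}
# topological_sort(adj)  ->  ['A', 'C', 'D', 'F', 'B', 'E']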
Runtime:
The runtime of DFS is Θ(V + E), and the extra time needed to simply place the vertices in a linked
list in the order that they are finished is O(1) per node. Therefore the overall runtime is Θ(V + E).
2 Strongly Connected Components
One important notion that characterizes graphs is the notion of connectivity. For directed graphs, we can
describe connectivity in terms of the existence of directed paths between pairs of vertices.
A Strongly connected component, or SCC, of a graph is a maximal set of vertices such that
directed paths exist between every pair of vertices (from both u to v and v to u).
The directed graph below consists of four strongly connected components. Note that each vertex within
a strongly connected component can reach every other vertex along a path. Vertex G is in its own strongly
connected component, because there is no path to and from another vertex in G. Similarly, vertex H is in
its own SCC.
The algorithm that identifies these strongly connected components is again based on DFS. There are
three main steps:
Step 1: The first step is to call DFS(G) and compute the start/finish time stamps. The first vertex
that is picked to start DFS can be any of the vertices of the graph. Recall that if DFS-visit() terminates before
visiting all vertices, it restarts from a new vertex. In the example below, DFS-visit starts on vertex E,
visits vertices G, F and then terminates at E. Next, it restarts at A, visits D, C, B and terminates at
A. Finally DFS-visit restarts at H and terminates. The corresponding DFS trees are shown on the right,
making up the DFS forest.
Step 2: Create a new graph which is the result of reversing all the edges in G. This graph is called G^T.
Step 3: Call DFS(G^T). In the main loop of DFS, the vertices are considered in decreasing order of
finish time. In the example below, we start with DFS-visit(H) since it has the highest finish time. The
call terminates immediately since H has no neighbors. The next call is to DFS-visit(A) since it has the
next largest finish time. The resulting search visits A, B, C, D and then terminates. Next the call is made
to DFS-visit(E), which has the third largest finish time.
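The three steps above amount to Kosaraju's algorithm. A compact Python sketch is given below, under the same dictionary-of-adjacency-lists assumption as the earlier sketches; since the notes break off mid-example, no attempt is made to reproduce the specific graph from the figures.

def strongly_connected_components(adj):
    # Step 1: DFS on G, recording the vertices in order of finish time.
    visited = {v: False for v in adj}
    finish_order = []

    def dfs_forward(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs_forward(v)
        finish_order.append(u)          # u finishes after all of its subtree

    for v in adj:
        if not visited[v]:
            dfs_forward(v)

    # Step 2: build G^T by reversing every edge of G.
    radj = {v: [] for v in adj}
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    # Step 3: DFS on G^T, taking start vertices in decreasing finish time.
    visited = {v: False for v in adj}
    components = []

    def dfs_reverse(u, component):
        visited[u] = True
        component.append(u)
        for v in radj[u]:
            if not visited[v]:
                dfs_reverse(v, component)

    for u in reversed(finish_order):    # highest finish time first
        if not visited[u]:
            component = []
            dfs_reverse(u, component)
            components.append(component)  # one SCC per restart
    return components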