Oct 17, 2024
Graph Representation and Coding Interview Questions
A graph is a data structure that models how objects relate to each other. It consists of vertices (or nodes) and edges that connect them. Graphs can be directed or undirected, weighted or unweighted, and cyclic or acyclic.
Types of Graph Representation
Adjacency Matrix
An adjacency matrix is a 2D array where the element at row i and column j tells you whether there is an edge from vertex i to vertex j (for an undirected graph, the matrix is symmetric). This representation works well for dense graphs, where the number of edges is large.
Example:
# Adjacency Matrix Representation
graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0]
]
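The payoff of this layout is constant-time edge lookup. A minimal sketch using the matrix above (the helper names has_edge and neighbors are mine, not standard library functions):

```python
# Reusing the 4x4 adjacency matrix from above
graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0]
]

def has_edge(graph, i, j):
    # O(1): just read the cell
    return graph[i][j] == 1

def neighbors(graph, i):
    # O(V): scan row i for nonzero entries
    return [j for j, cell in enumerate(graph[i]) if cell]
```

For example, `has_edge(graph, 1, 3)` is True, while listing neighbors always costs a full row scan, even for a vertex with no edges.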
Weighted Adjacency Matrix
A weighted adjacency matrix is a 2D array where the element at row i and column j stores the weight of the edge between vertex i and vertex j. A sentinel value such as 0 marks the absence of an edge; prefer None if 0 is a legal edge weight in your problem.
Example:
# Weighted Adjacency Matrix Representation
graph = [
    [0, 2, 0, 0],
    [2, 0, 3, 1],
    [0, 3, 0, 4],
    [0, 1, 4, 0]
]
Adjacency List
An adjacency list maps each vertex to the list of vertices adjacent to it. In Python this is commonly written as a dictionary keyed by vertex (or as an array of lists when vertices are numbered 0 to V-1). This is the go-to representation for sparse graphs, where there aren't many edges.
Example:
# Adjacency List Representation
graph = {
    0: [1],
    1: [0, 2, 3],
    2: [1, 3],
    3: [1, 2]
}
Weighted Adjacency List
A weighted adjacency list maps each vertex to a list of tuples, where each tuple holds an adjacent vertex and the weight of the edge connecting them. This works well for sparse graphs with weighted edges.
Example:
# Weighted Adjacency List Representation
graph = {
    0: [(1, 2)],
    1: [(0, 2), (2, 3), (3, 1)],
    2: [(1, 3), (3, 4)],
    3: [(1, 1), (2, 4)]
}
Edge List
An edge list is exactly what it sounds like. It is a list of all edges in the graph, where each edge is represented as a pair of vertices. This format is handy for algorithms that work with edges directly.
Example:
# Edge List Representation
edges = [
    (0, 1),
    (1, 2),
    (1, 3),
    (2, 3)
]
Weighted Edge List
A weighted edge list is a list of all edges where each edge is a tuple containing two vertices and the weight of the edge. This format is useful for edge-centric algorithms that need to consider weights, like Kruskal's algorithm for finding a Minimum Spanning Tree.
Example:
# Weighted Edge List Representation
edges = [
    (0, 1, 2),
    (1, 2, 3),
    (1, 3, 1),
    (2, 3, 4)
]
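Since a weighted edge list is exactly the input Kruskal's algorithm consumes, here is a minimal sketch of it, using a simplified union-find without rank (the function names and the num_vertices parameter are my own choices, not a standard API):

```python
def kruskal_mst(num_vertices, edges):
    # Union-find: parent[x] points toward the root of x's tree
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    mst = []
    # Consider edges from lightest to heaviest
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # u and v are in different trees: no cycle
            parent[ru] = rv   # merge the two trees
            mst.append((u, v, w))
    return mst

edges = [(0, 1, 2), (1, 2, 3), (1, 3, 1), (2, 3, 4)]
```

On the edge list above, the algorithm keeps the three cheapest edges that connect all four vertices (total weight 6) and skips the (2, 3, 4) edge, which would close a cycle.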
Pros and Cons of Different Graph Representations
Adjacency Matrix
Pros:
- Simple and easy to implement.
- Efficient for dense graphs where the number of edges is close to the maximum possible.
- Checking if an edge exists between two vertices is an O(1) operation.
Cons:
- Takes up O(V²) space, which is wasteful for large, sparse graphs.
- Iterating through all edges takes O(V²) time, which is slow for sparse graphs.
Adjacency List
Pros:
- More space-efficient for sparse graphs, using O(V + E) space.
- Faster iteration over all edges, taking O(V + E) time.
Cons:
- Checking if a specific edge exists can take O(V) time in the worst case.
- A bit more complex to implement than an adjacency matrix.
Edge List
Pros:
- Very space-efficient for storing edges, especially in sparse graphs.
- Simple to implement and understand.
- Great for edge-centric algorithms like Kruskal's algorithm.
Cons:
- Checking if an edge exists can be slow, requiring O(E) time.
- Not ideal for algorithms that need quick access to all neighbors of a vertex.
Each representation has trade-offs. The right choice depends on your specific problem. Think about how dense the graph is and what operations you need to perform.
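Interview inputs often arrive as an edge list while most traversal code wants an adjacency list, so converting between the two is a common first step. A minimal sketch for an undirected graph (assuming no duplicate edges; the function name is mine):

```python
from collections import defaultdict

def edge_list_to_adjacency_list(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)  # undirected: record the edge in both directions
        adj[v].append(u)
    return dict(adj)

edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
```

Running this on the sample edge list reproduces the adjacency list example from earlier. For a directed graph, drop the second append.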
Coding Interview Questions
For the questions below, we assume the graph is represented as an adjacency list; the weighted shortest-path question instead uses the weighted adjacency list, where each vertex maps to (neighbor, weight) tuples. Here is the unweighted format:
graph = {
    0: [1],
    1: [0, 2, 3],
    2: [1, 3],
    3: [1, 2]
}
Graph Traversal
Problem: Implement Depth-First Search (DFS) and Breadth-First Search (BFS) for a given graph.
Solution:
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    # Recurse into each unvisited neighbor
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
    return visited
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])  # deque gives O(1) pops from the left
    while queue:
        vertex = queue.popleft()
        if vertex not in visited:
            visited.add(vertex)
            # Enqueue unvisited neighbors in order for a deterministic traversal
            queue.extend(n for n in graph[vertex] if n not in visited)
    return visited
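Recursive DFS can blow the stack on deep graphs (Python's default recursion limit is around 1000 frames), so it is worth knowing the iterative variant with an explicit stack. A sketch, under the same adjacency list format:

```python
def dfs_iterative(graph, start):
    visited = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            # Push unvisited neighbors; the last one pushed is explored next
            stack.extend(n for n in graph[vertex] if n not in visited)
    return visited

graph = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
```

The visit order differs from the recursive version (neighbors are popped in reverse), but the set of visited vertices is the same.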
Cycle Detection for Directed Graph
Problem: Determine if a cycle exists in a given directed graph. In a directed graph, being able to travel from A to B does not mean you can travel from B to A.
Main Idea: Use depth-first search (DFS) to detect cycles by tracking visited nodes and the recursion stack.
Solution:
def has_cycle(graph):
    def visit(vertex, visited, rec_stack):
        visited.add(vertex)
        rec_stack.add(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                if visit(neighbor, visited, rec_stack):
                    return True
            elif neighbor in rec_stack:
                # Back edge: neighbor is an ancestor still on the DFS path
                return True
        rec_stack.remove(vertex)
        return False

    visited = set()
    rec_stack = set()
    for node in graph:
        if node not in visited:
            if visit(node, visited, rec_stack):
                return True
    return False
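An iterative alternative worth knowing is Kahn's algorithm: repeatedly remove vertices with in-degree zero, and if any vertex is never removed, it lies on (or behind) a cycle. A sketch, assuming every vertex appears as a key in the graph dictionary:

```python
from collections import deque

def has_cycle_kahn(graph):
    # Count incoming edges for every vertex
    in_degree = {v: 0 for v in graph}
    for vertex in graph:
        for neighbor in graph[vertex]:
            in_degree[neighbor] += 1
    # Start with vertices that have no incoming edges
    queue = deque(v for v in graph if in_degree[v] == 0)
    removed = 0
    while queue:
        vertex = queue.popleft()
        removed += 1
        for neighbor in graph[vertex]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    # If some vertices were never removed, the graph contains a cycle
    return removed != len(graph)
```

For instance, {0: [1], 1: [2], 2: [0]} has a cycle, while {0: [1], 1: [2], 2: []} does not. The removal order is also a topological sort, which makes this version a two-for-one in interviews.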
Cycle Detection for Undirected Graph
Explanation: In an undirected graph, each edge goes both ways. If there is an edge from node A to node B, there is also an edge from node B to node A. Detecting cycles in these graphs requires care. You do not want to mistake the two-way nature of edges for a cycle. The approach uses DFS while keeping track of the parent node so you do not count going back to where you came from as a cycle.
Solution:
def has_cycle_undirected(graph):
    def visit(vertex, visited, parent):
        visited.add(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                if visit(neighbor, visited, vertex):
                    return True
            elif neighbor != parent:
                # Visited neighbor that is not the vertex we came from: cycle
                return True
        return False

    visited = set()
    for node in graph:
        if node not in visited:
            if visit(node, visited, None):
                return True
    return False
This solution uses DFS similar to the directed graph case, but it adds a parent parameter to track which node we came from. This prevents the algorithm from falsely detecting a cycle when it encounters the edge that brought it to the current node.
Shortest Path for Unweighted Graph
Problem: Find the shortest path between two nodes in an unweighted graph.
Main Idea: Use breadth-first search (BFS) to explore all possible paths from the start node. BFS naturally finds the shortest path in unweighted graphs because it explores all nodes at distance k before moving to distance k+1.
Solution:
from collections import deque
def shortest_path(graph, start, goal):
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([(start, [start])])
    while queue:
        vertex, path = queue.popleft()
        for neighbor in graph[vertex]:  # graph[vertex] is a list, so iterate directly
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append((neighbor, path + [neighbor]))
    return None
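If you only need distances rather than the paths themselves, a leaner BFS variant tracks one hop count per vertex. A sketch under the same adjacency list format:

```python
from collections import deque

def bfs_distances(graph, start):
    # Map each reachable vertex to its hop count from start
    dist = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in dist:
                dist[neighbor] = dist[vertex] + 1
                queue.append(neighbor)
    return dist

graph = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
```

Using the dist dictionary as the visited set keeps the code short: a vertex is enqueued exactly once, the first (and therefore shortest) time BFS reaches it.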
Shortest Path for Weighted Graph
Problem: Find the shortest path between two nodes in a weighted graph.
Main Idea: Use Dijkstra's algorithm, which is the standard approach for finding shortest paths in graphs with non-negative edge weights. The algorithm uses a priority queue to always explore the node with the smallest current distance first. As it explores, it updates the shortest known distance to each node.
Solution:
import heapq
def dijkstra(graph, start, goal):
    # graph is a weighted adjacency list: vertex -> list of (neighbor, weight) tuples
    # Priority queue of (distance, vertex) tuples
    queue = [(0, start)]
    # Shortest known distance to each node
    distances = {start: 0}
    # Predecessor of each node on the best path found so far
    previous_nodes = {start: None}
    while queue:
        current_distance, current_vertex = heapq.heappop(queue)
        # Skip stale queue entries for nodes already reached more cheaply
        if current_distance > distances[current_vertex]:
            continue
        # If we reach the goal, reconstruct the path
        if current_vertex == goal:
            path = []
            while current_vertex is not None:
                path.append(current_vertex)
                current_vertex = previous_nodes[current_vertex]
            return path[::-1]
        # Explore neighbors
        for neighbor, weight in graph[current_vertex]:
            distance = current_distance + weight
            # Found a shorter path to the neighbor
            if neighbor not in distances or distance < distances[neighbor]:
                distances[neighbor] = distance
                previous_nodes[neighbor] = current_vertex
                heapq.heappush(queue, (distance, neighbor))
    return None
This implementation of Dijkstra's algorithm uses a priority queue to always expand the least costly node first, which guarantees that the shortest path is found. The algorithm also keeps track of previous nodes so you can reconstruct the actual path once you reach the goal node.
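In interviews the weighted graph often arrives as a weighted edge list, so it helps to have the conversion into the weighted adjacency list format shown earlier (each vertex maps to a list of (neighbor, weight) tuples). A sketch for an undirected graph (the function name is my own):

```python
from collections import defaultdict

def weighted_edges_to_adjacency_list(edges):
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))  # undirected: store the edge in both directions
        adj[v].append((u, w))
    return dict(adj)

edges = [(0, 1, 2), (1, 2, 3), (1, 3, 1), (2, 3, 4)]
```

Running this on the sample weighted edge list reproduces the weighted adjacency list example from the first half of this post.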