Contents
- Guide to Data Structures & Algorithms for Coding Interviews
Guide to Data Structures & Algorithms for Coding Interviews
A practical guide to recognizing patterns, acing technical interviews, and thinking like a problem-solver.
The technical interview. It’s a daunting hurdle for aspiring software engineers. You’re faced with a blank whiteboard, a complex problem, and the ticking clock of an interviewer’s expectation. Many developers try to prepare by memorizing hundreds of individual problems, but this is a losing game. You can’t possibly predict the exact question you’ll be asked.
The secret isn’t about memorizing solutions. It’s about recognizing patterns. The goal is to shift your mindset from “Have I seen this exact problem before?” to “What category of problem is this, and what tools do I have to solve it?”
This guide will walk you through the essential data structures and algorithmic patterns that form the backbone of most interview questions. By understanding these core concepts, you’ll develop the mental toolkit to deconstruct any new problem and build a solution from the ground up.
Part 1: The Building Blocks - Core Data Structures
Think of data structures as the “nouns” of programming. They are the containers that hold and organize data in specific ways, making them efficient for certain operations. Choosing the right one is the first and most critical step in solving any problem.
HashTable (aka Hash Map or Dictionary)
A HashTable stores data in key-value pairs. It uses a hash function to compute an index from a key, allowing it to rapidly locate the data.
- Analogy: A real-world dictionary or a phone book. You use a unique name (the key) to instantly look up a definition or phone number (the value) without searching every page.
- When to use it: When you need near-instant lookups, insertions, or deletions. It’s the go-to for frequency counting (e.g., “how many times does each character appear in this string?”) or for caching results to avoid re-computing (a technique called memoization). The classic “Two Sum” problem is a perfect example of its power.
- Complexity: Average time for access, insert, and delete is O(1). However, in the worst case, multiple keys can generate the same index (a “hash collision”). When this happens, performance can degrade to O(N) as the structure has to search through a list of items in that single “bucket”. A well-designed hash function minimizes these collisions.
Code Example (Python Dictionary for Frequency Count):
word = "interview"
char_counts = {}
for char in word:
char_counts[char] = char_counts.get(char, 0) + 1
# char_counts is now {'i': 2, 'n': 1, 't': 1, 'e': 2, 'r': 1, 'v': 1, 'w': 1}
print(char_counts['e']) # Output: 2
Linked List
A Linked List is a sequence of nodes, where each node contains data and a pointer to the next node. Unlike arrays, elements are not stored in contiguous memory, meaning they can be scattered all over in memory.
- Analogy: A scavenger hunt. Each clue (node) contains a piece of information and tells you exactly where to find the next one.
- Variations:
- Singly Linked List: Traversal is one-way, forward only.
- Doubly Linked List: Each node also points to the previous node, allowing for efficient forward and backward traversal. This makes certain operations, like deleting a node, much easier.
- When to use it: When your data size is dynamic and you need efficient insertions and deletions at the beginning or end of a sequence. Arrays are faster for access by index (due to better cache locality), but resizing them can be a slow O(N) operation.
- Complexity: Accessing an element by index is O(N) because you must traverse from the head. Inserting/deleting at the beginning is a speedy O(1).
Code Example (Simple Node Implementation):
class ListNode:
def __init__(self, value=0, next=None):
self.value = value
self.next = next
# Creating a simple linked list: 1 -> 2 -> 3
head = ListNode(1)
head.next = ListNode(2)
head.next.next = ListNode(3)
# Traverse the list
current = head
while current:
print(current.value, end=" -> ")
current = current.next
# Output: 1 -> 2 -> 3 ->
Stacks & Queues
These are abstract data types defined by their access rules. They can be implemented using arrays or linked lists.
- Stack (LIFO): Last-In, First-Out.
- Analogy: A stack of plates or the “Undo” function in an editor. The last action you took is the first one to be undone.
- When to use it: Managing function calls (the call stack), checking for balanced parentheses in an expression, and as the underlying mechanism for Depth-First Search (DFS).
Code Example (Using a Python list as a stack):
stack = []
stack.append('a')
# Push
stack.append('b')
stack.append('c')
print(stack.pop())
# Pop: 'c'
print(stack.pop())
# Pop: 'b'
- Queue (FIFO): First-In, First-Out.
- Analogy: A line at a coffee shop or a printer queue. The first job submitted is the first one to be printed.
- When to use it: Processing tasks in the order they were received, managing server requests, and as the engine for Breadth-First Search (BFS).
Code Example (Using Python’s deque for an efficient queue):
from collections import deque
queue = deque()
queue.append('a')
# Enqueue
queue.append('b')
queue.append('c')
print(queue.popleft())
# Dequeue: 'a'
print(queue.popleft())
# Dequeue: 'b'
Trees
Trees are hierarchical structures with a root node and child nodes, representing parent-child relationships.
- Binary Search Tree (BST): A tree where for any given node, all left children have smaller values and all right children have larger values. This ordering is key. If a BST becomes unbalanced (e.g., you insert sorted numbers), it can degenerate into a linked list, making search times O(N).
- When to use it: When you need to keep data ordered to allow for efficient range queries (e.g., find all numbers between 10 and 50) in addition to fast search, insert, and delete.
- Complexity (Balanced): Access, Insert, Delete are a very efficient O(log N).
Code Example (Simple Tree Node):
class TreeNode:
def __init__(self, value=0, left=None, right=None):
self.value = value
self.left = left
self.right = right
# Creating a simple BST:
# 10
# / \
# 5 15
root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(15)
- Heap (Priority Queue): A tree where parents are always larger (max-heap) or smaller (min-heap) than all of their children. It doesn’t sort the entire collection, it just ensures the top element is always the min/max.
- When to use it: When you only care about the largest or smallest element at any given time. It’s perfect for scheduling (highest priority task first) or problems like “Find the Top K Elements” and “Find Median from Data Stream.” Think of an emergency room triage system-the most critical patient (highest priority) is always seen next, regardless of who arrived first.
- Complexity: Find Min/Max is O(1); Insert/Delete is O(log N).
Code Example (Python’s heapq for a Min-Heap):
import heapq
# Find the 3 smallest numbers
data = [3, 8, 1, 9, 4, 2]
heapq.heapify(data)
# Transforms list into a heap, in-place, in O(n) time
print(heapq.heappop(data))
# Pop smallest: 1
print(heapq.heappop(data))
# Pop smallest: 2
Graph
A Graph consists of nodes (vertices) connected by edges. They are the ultimate structure for modeling networks and relationships.
- Analogy: A social network (people are nodes, friendships are edges), a city map (intersections are nodes, roads are edges), or the internet itself.
- When to use it: For any problem involving paths, connections, or dependencies. This includes pathfinding algorithms (like GPS navigation), checking network connectivity, and dependency resolution (e.g., figuring out the build order for a software project based on library dependencies).
Code Example (Adjacency List Representation):
# A -> B, A -> C, B -> C
graph = {
'A': ['B', 'C'],
'B': ['C'],
'C': []
}
print(graph['A']) # Output: ['B', 'C']
Part 2: The Playbook - Common Algorithmic Patterns
If data structures are the nouns, algorithms are the verbs. These are the repeatable strategies you’ll apply to solve problems.
🧩 Sliding Window
This pattern is for problems involving a contiguous subarray or substring. You create a “window” with start and end pointers. This window expands by moving the end pointer and shrinks by moving the start pointer, efficiently scanning the data to find a segment that meets a certain criteria without re-evaluating the entire segment each time.
- When to use it: Problems asking for the longest/shortest/best subarray or substring. Look for keywords like “contiguous,” “subarray,” or “substring.”
- Complexity: Time is typically O(N) because each element is visited at most twice. Space is O(1) or O(K) if a data structure is needed to track the elements in the window (where K is the window size).
- Key Question: “Find the maximum sum of any contiguous subarray of size ‘k’.”
Example Solution:
def max_sum_subarray(arr, k):
max_sum = 0
window_sum = 0
window_start = 0
for window_end in range(len(arr)):
window_sum += arr[window_end]
if window_end >= k - 1:
max_sum = max(max_sum, window_sum)
window_sum -= arr[window_start]
window_start += 1
return max_sum
# Example: max_sum_subarray([2, 1, 5, 1, 3, 2], 3) -> 9
👉 Two Pointers
This technique uses two pointers that move through a data structure, often in relation to each other, to solve a problem in a single pass.
- Variations:
- Opposite Ends: One pointer starts at the beginning, one at the end, and they move toward each other. This is extremely effective for finding pairs in a sorted array, because the sorted nature allows you to make intelligent decisions about which pointer to move next.
- Fast & Slow (Tortoise and Hare): One pointer moves one step at a time (slow) while the other moves two steps (fast). If they ever meet, you’ve detected a cycle. This is the canonical solution for finding cycles in a linked list.
- Complexity: Time is O(N). If the array must be sorted first, it becomes O(N log N). Space is usually O(1) unless sorting requires extra space.
- Key Question: “Given a sorted array, find two numbers that sum to a target.”
Example Solution:
def two_sum_sorted(arr, target):
left, right = 0, len(arr) - 1
while left < right:
current_sum = arr[left] + arr[right]
if current_sum == target:
return [left + 1, right + 1]
elif current_sum < target:
left += 1
else:
right -= 1
return []
# Example: two_sum_sorted([2, 7, 11, 15], 9) -> [1, 2]
🌳 Tree & Graph Traversal (BFS & DFS)
These are the two fundamental ways to explore every node in a tree or graph.
- Complexity: For both BFS and DFS, time is O(V + E) where V is the number of vertices and E is the number of edges. Space is O(V) for storing nodes in the queue or on the recursion stack.
- Breadth-First Search (BFS): Explores level by level. It uses a Queue to visit all neighbors at the present depth before moving on to the next level.
- Analogy: The ripple effect from a stone dropped in water. It explores everything one step away, then two steps away, and so on.
- When to use it: Finding the shortest path in an unweighted graph or any problem involving level-by-level traversal.
- Key Question (BFS): “Given a binary tree, return the level order traversal of its nodes’ values.”
Example Solution (BFS):
from collections import deque
def level_order_traversal(root):
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
current_level = []
for _ in range(level_size):
node = queue.popleft()
current_level.append(node.value)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(current_level)
return result
# Assuming TreeNode class from earlier
# root = TreeNode(3, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))
# level_order_traversal(root) would return [[3], [9, 20], [15, 7]]
- Depth-First Search (DFS): Goes as deep as possible down one path before backtracking. It naturally uses Recursion (or an explicit Stack). Think of it as exploring a maze by taking one path until you hit a dead end, then backtracking to the last junction to try a different way.
- When to use it: Checking for connectivity, finding a path between two nodes (not necessarily the shortest), or counting connected components.
- Key Question (DFS): “Given a grid of ‘1’s (land) and ‘0’s (water), count the islands.”
Example Solution (DFS):
def num_islands(grid):
if not grid: return 0
rows, cols = len(grid), len(grid[0])
island_count = 0
def dfs(r, c):
if r < 0 or c < 0 or r >= rows or c >= cols or grid[r][c] == '0':
return
grid[r][c] = '0'
dfs(r + 1, c); dfs(r - 1, c); dfs(r, c + 1); dfs(r, c - 1)
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1':
island_count += 1
dfs(r, c)
return island_count
🔄 Backtracking
Backtracking is a refined brute-force technique for exploring all potential solutions to a problem and discarding those that don’t satisfy the constraints. It incrementally builds candidates and abandons a candidate (“backtracks”) as soon as it determines it cannot possibly lead to a valid solution.
- When to use it: When you need to find all possible solutions to a problem involving choices and constraints. Think generating permutations, combinations, or solving puzzles like Sudoku or N-Queens.
- Complexity: Time is often exponential (e.g., O(2^N) for subsets, O(N!) for permutations) because it explores all possible candidates. Space is typically O(N) for the recursion call stack depth.
- Key Question: “Given a set of distinct integers, return all possible subsets.”
Example Solution:
def subsets(nums):
result = []
def backtrack(start_index, current_subset):
result.append(list(current_subset))
for i in range(start_index, len(nums)):
current_subset.append(nums[i])
backtrack(i + 1, current_subset)
current_subset.pop()
backtrack(0, [])
return result
# Example: subsets([1, 2, 3]) -> [[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
📈 Dynamic Programming (DP)
DP is a powerful technique for solving complex problems by breaking them down into simpler, overlapping subproblems. The core idea is to solve each subproblem only once and store its result (memoization or tabulation), avoiding redundant work.
- When to use it: Look for problems with two key properties: 1) Overlapping Subproblems (solutions to subproblems are reused multiple times) and 2) Optimal Substructure (the optimal solution can be constructed from the optimal solutions of its subproblems).
- Complexity: Time and space are usually polynomial, corresponding to the dimensions of the DP table (e.g., O(N^2) or O(N * M)). Space can often be optimized.
- Key Question: “How many distinct ways can you climb n stairs if you can take 1 or 2 steps at a time?”
Example Solution:
def climb_stairs(n):
if n <= 2: return n
prev2, prev1 = 1, 2
for i in range(3, n + 1):
current = prev1 + prev2
prev2 = prev1
prev1 = current
return prev1
# Example: climb_stairs(5) -> 8
⏳ Merge Intervals
This pattern deals with problems involving overlapping intervals. The strategy is almost always the same: sort the intervals by their start time. This crucial first step makes it simple to iterate through and check for overlaps with the preceding interval.
- When to use it: Scheduling problems (“how many conference rooms are needed?”), geometric overlaps, or anything involving ranges on a number line.
- Complexity: Time is dominated by the initial sort, making it O(N log N). Space is O(N) to store the sorted intervals and the merged result.
- Key Question: “Given a collection of intervals, merge all overlapping intervals.”
Example Solution:
def merge(intervals):
if not intervals:
return []
intervals.sort(key=lambda x: x[0])
merged = [intervals[0]]
for current in intervals[1:]:
last = merged[-1]
if current[0] <= last[1]:
last[1] = max(last[1], current[1])
else:
merged.append(current)
return merged
# Example: merge([[1,3],[2,6],[8,10],[15,18]]) -> [[1,6],[8,10],[15,18]]
Final Thoughts
Mastering these patterns won’t happen overnight. It takes consistent, deliberate practice. The next time you’re stuck on a problem on LeetCode or HackerRank, don’t just jump to the solution. Take a step back and engage in a mental checklist:
- What data structure best represents the input and the operations I need to perform?
- Does this problem involve a contiguous subarray? (→ Sliding Window)
- Is the input a sorted array or a linked list? (→ Two Pointers)
- Do I need to find the shortest path or explore level by level? (→ BFS)
- Do I need to explore all possible combinations or permutations? (→ Backtracking)
- Can I build the solution for n from the solutions for n-1 and n-2? (→ Dynamic Programming)
As you get more comfortable, you’ll start to see even more specialized patterns emerge. Keep an eye out for problems that might require:
- Topological Sort: For ordering tasks with dependencies (e.g., course prerequisites).
- Two Heaps: For finding the median in a constantly changing stream of numbers.
- Trie (Prefix Tree): For efficiently searching words by their prefix (e.g., an autocomplete feature).
By asking these questions, you’ll transform from a solution-memorizer into a true problem-solver. That’s the skill that interviewers are really looking for, and it’s the skill that will make you a more effective and confident engineer.
Happy coding!
Contacts
If you want to keep updated with my latest articles and projects follow me on Medium and subscribe to my mailing list. These are some of my contacts details:
