Data Structures

02-03-2020 Data Structures Comments Word Count: 4.7k(words) Read Count: 21(minutes)

Bianry Search

Use O(1) to make a T(n) problem T(n/2). So the total time complexity would be O(logn).
Ignore half of the problems in O(1) time.
First Bad Version
Search in a Big Sorted Array. Double the search number every time, then serach from n to 2n
Find Minimum in Rotated Sorted Array. Divide the array into a normal sorted array and a smaller rotated sorted array.
Search in a 2D matrix. Perform Binary Search 2 times. First find the row, then perform binary search on that row.
Search for a Range. Find the target number then perform binary search on its left and right.
Maximum Number in Mountain Sequence. Check mid number to see if it is incresing or decresing, we can return any mountain value.

Bianry Tree and Divide & Conquer

There are 3 ways to traverse a binary tree. Pre order, In order and post order
You need to know both resursive version and iterative version.
DFS includes pre order, in order, post order, divide & conquer
Bianry tree has height from O(logn) to O(n), traverse a binary tree takes O(n) time.
In order traverse a binary search tree is an increasing order traverse.
Maximum depth of Binary tree. Resursive: Divide and conquer/ traverse
Subtree with Maximum Average
Lowest Common Ancestor: Divide & Conquer, check leftsubtree and rightsubtree. See if they return A or B or null
Validate Binary Search Tree. Bottom up, return max and min return to the parent node.
Convert Binary Search Tree to Doubly Linked List. Inorder traverse it into a list and convert it.
Flattern Binary Tree to Linked List. node.right = head of left subtree, tail of left subtree = head of right subtree.
Insert and delete a node from a Binary Search Tree

BFS

1. make a list and put all start nodes into the list
1. traverse the list to get new nodes for the next level
1. use Queue for BFS
Graph BFS
Social Network
六度理论你和世界上任何一个人之间最多间隔6个人 Linkedin BFS看第几层有相同好友
树和图的区别，树是单向的，图可以是双向的（或者单项）
在遍历图中，会有环，可以用HashSet检查是否曾经访问过某个节点
树的特性，树中如果有N个点，就会有N-1条边。树中的所有点连通。用Queue和Set访问并保存所有Graph中的点
构造Graph，可以用 Map<Integer, Set<Integer>> Graph其实就是保存着（每个点和与他相邻的点的Set）的Map
BFS on a 2D-array
Number of islands
Wall and Gates
什么时候使用BFS？
1. Tree的层次遍历
1. 2D-array中求连通性，灌水
1. 拓扑排序
1. 图的最短路径

DFS

Recursion
Combination
Permutation
Graph
Non-Recursion
什么时候使用DFS？
1. 找所有方案，排列，组合
1. 找最优方案（最短，最长）（大部分是动态规划，也有可能是DFS）
Questions
1. Combination sum
1. Palindrome Partitioning 所有的切割问题都是组合问题
切割abc字符串，有两个切割位 a1b2c，可以切ab之间也可以切bc之间，把数字12放入代表切割位，所以切割的方式有
a b c [1,2]
a bc [1]
ab c [2]
abc []
以上可以看出四种切割方式可以用数字表示
所以n个字母的切割问题可以看作是n - 1个数字的组合问题
Backtracking
Permutation的去重与Combination相似，可以先定义一个boolean数组存放visited信息
N皇后
每一行和每一列都是1…n的一种排列
如何判断皇后在同一斜线上：横坐标与纵坐标之差相等，或者横坐标与纵坐标之和相等
Word Ladder: BFS, 把ListWord转换成graph的形式，对于每个word，先找到所有可能的变换，总共有单词长度*26种，然后这些变换中存在于Wordlist中的就是他的neighbor，就可以变成标准的BFS问题
Word Ladder 2：Backtracking + BFS，先从End往Begin做BFS找到每个单词距离End的长度，保证Backtrack时不会遍历更远的单词。再从Begin往End做backtracking

Linked List and Array

Reverse linked list
Reverse linked list in k group
Use dummy node to save the list head
画图去理解linked list的变化
dummy node practices

partition-list
merge two sorted list
reverse linked list 2
swap two nodes in linked list
reorder list
rotate list

Copy list with random pointer -> similar to clone graph
Linked List Cycle 1 and 2 -> Floyd’s Tortoise and Hare
Sort List -> how many sort algorithm has time complexity O(nlogn)?

Quick Sort, with O(1) Space Complexity
Merge Sort, with O(n) space
Heap Sort

Merge two sorted array
Merge small array into big array
intersection of two arrays
Median of two sorted arrays -> find the (m + n) / 2th number, use O(logn)
Median
Kth largest element
Merge Sort practice
Quick Sort practice
Same integers will not remain their relative order in the old list after using quick sort. Merge sort will keep the relative order of same integers
Merge sort will first split the array into two subarray until there is only one number left in the array, then sort, then merge the two sorted array back to one array. From small array to big array
Qucik sort will first pick a pivot value, sort the entire array based on the pivot value. Then sort the two small array. From big array two small array
Note in Quick Sort, when we find a value equals to the pivot value, we don’t swap it, leave it as where it is. We need to split these values as even as possible.
Quick sort while loop condition: while (left <= right). Don’t forget the equal sign

Subarray

PrefixSum, HashMap to store prefixsum, then linear scan, So O(n) time and space
Maximum subarray
Minimum subarray
subarray sum equals k
number of subarray equals k

Two pointers

O(n) time complexity
同向双指针
Move zeros
Remove duplicate number in array
Remove duplicate numbers in sorted array -> tortise and hare cycle detection
相向双指针
valid palindrome
rotate string -> three step reverse, when reverse a string, one start from begin and another start from end
recover rotated sorted array -> similar to above, find the rotate point, three step reverse.
Two Sum 1-7
Two Sum 1 -> 最普通的Two Sum，可以使用HashMap，或者先排序在使用相向双指针，这样可以不是用额外空间，但是时间复杂度是O(nlogn)
Two Sum 2 - Data Structure Design -> HashMap
Two Sum 3 - Array is sorted -> 相向双指针
Two sum 7 - unique pairs -> 找到所有符合的数对,相向双指针，当找到一对时，left++,right–,并继续。如果遇到重复比如1,1,3,4,4,每次我们移动时，移动到与之前的数不一样的数为止。注意此方法要求数组是有序的。
3Sum -> 使用Two sum的结果，对于array中的每一个不等的数，检查是否有一对数的和等于它的相反数。可以使用HashMap或者two pointers，记住two pointers必须先排序数组。Two pointers的速度远快于HashMap
Valid Triangle Number -> 检查有多少组三个数可以组成合法的三角形的三条边。注意我们只需要检查最小的两个数的和大于第三个数即可. 从最大的数开始loop，使用Two pointers，对于每一个符合的left and right，所有大于left的数都是符合的，所以每当我们找到一组合法的left and right，count += （right - left）. 详情查看leetcode 611
对于求2个变量如何组合的问题，可以循环其中一个变量，然后研究另外一个变量如何变化
对于求3个变量如何组合的问题，可以循环其中一个变量，然后研究另外2个变量如何变化
对于求4个变量如何组合的问题，可以循环其中两个变量，然后研究另外2个变量如何变化
Two Sum less than k
Two Sum greater than k -> same as triangle count
Two Sum closest to target -> two pointers start from begin and end, keep updating diff
3 Sum closest to target -> similar to above, loop one value and use two sum as a template
4 Sum
Two Sum - difference equals to target -> 同向双指针。两数之差等于target，先排序，两个pointers都在开始，如果他们的差大于target，小数往后移，如果他们的差小于target，大数往后移 O(n) time complexity
Partition array
move elements < k to the left and elements >= k to the right, this is a smaller step in quick sort
Parition array while loop condition: while (left < right). No equal sign
Letters by case
Partition array by parity
Interleaving positive and negative numbers -> first count positive and negative numbers, then decide starts from positive or negative, then use two pointers in the same direction.

Sort Colors

sort 3 colors in group, we can use partition array two times, first time split one color, next split the rest two colors.
We could also count the number of each color and modify the array to match the occurance of colors
three pointers
i pointer only moves when find ones and zeros, if it finds two, it will swap it with right. Throw two to the right
so left pointer will never find two.
so when i pointer finds zero, after swap it with left pointer, throw zero to the left, left pointer will only throw one back, so they can both move to right by 1

Sort Colors 2 (rainbow sort) -> 使用多次Partition，每次将一般的颜色分到array的左右，比如只有四种颜色，一次partition将1，2分到左边，3，4分到右边。花费O（n）的时间将T(k)的问题变成T(k/2)的问题。总共的时间复杂度为O(nlogk)
这个解法有点像Quick Sort。注意Quick sort and Rainbow sort的while loop条件均为 while(left < right)。可以不用要等号
其他比较高频的排序方法：
Pancake Sort
Sleep Sort
Spaghetti Sort
Bogo Sort

3 Step Reverse

Three step reverse
Recover rotated sorted array -> First use 3 step reverse then two pointer reverse.
3 step reverse
First find the rotate point
Reverse the first part(before rotate point) of the String/array
Reverse the second part(after rotate point) of the String/array
Reverse the entire String/array

Quick Select

Quick Select 快速选择算法
思想类似于快速排序,利用O(n)的时间找到前right个大数，再看k与right的关系决定下一步recursion的范围
O(n) + O(n/2) + O(n/4) +… = O(n) time
kth largest element
Median
kth smallest element
Median of two sorted arrays

Hash & Heap

了解他们的原理和应用
TreeMap
队列Queue
支持操作O(1) push O(1) pop O(1) top,多做跟BFS有关的题目
栈Stack
支持操作O(1) push O(1) pop O(1) top,非递归实现DFS的主要数据结构
哈希表Hash
支持操作O(1) insert O(1) find O(1) delete, Hash table / Hash map / Hash set的区别是什么
Hash Set只存key不存value
Hash map存key和value
Hash table，多线程安全，当多个线程同时访问一个Hash table时，它可以保证数据安全，Hash Map无法做到
Hash function -> 对于任意的Key，得到一个固定且无规律的介于0到capacity - 1的整数
数据结构分为连续性和离散型，array为连续性，List为离散型
一些著名的Hash算法：MD5， SHA-1， SHA-2，主要用于加密
用于算法中的Hash function很想进制转换

int hashfun(String key) {
    int sum = 0;
    for (int i = 0; i < key.length(); ++i) {
        sum = sum * 31 + (int)(key.charAt(i));
        sum = sum % HASH_TABLE_SIZE;
    }
    return sum;
}

Magic number - 31
通过经验得出31的冲突更少（Collision）
Magic number取质数更好，如果数太大影响效率，数太小冲突太多
Closed Hashing - line sweep
当哈希表发生冲突时，把新的数据插入到下一个空的位置
寻找，不断的找下一个位置直到找到value或者找到空位为止
删除，把值删除后用一个deleted标记出来，这个位置并不为空
缺点，当我们插入和删除的次数很多时，很多位置都会被标记为deleted，寻找的效率会变低
Open Hashing
每一个位置都是一个链表，当发生冲突时加到链表的开头，这样不用每次都遍历到链表末尾，寻找时搜寻整个链表
Rehashing
当hash table size不够用了怎么办？
像ArrayList那样不断倍增
怎么定义满？当实际的存储个数达到总共空间的1/10(经验值)时，我们就需要rehash
回顾ArrayList的倍增：当ArrayList满时，把size扩大两倍，把前面的所有数复制到后一半新扩大的ArrayList中
Hash Table的倍增：如果移动之前的数，会影响到hash function，所以我们会先扩大hash table，再把hash table中的所有值重新放到hash function算出它在新的hash table中的位置，重新插入
Rehashing很慢，所以在定义hash table的时候最好提前定义一个size，让他尽量不要rehash
在存储快的和存储慢的介质之间都会存在Cache的问题，不仅在CPU和内存之间有
使用链表储存数据的的插入顺序，使用哈希表判断将要插入的数据是否已经存在于链表中
LinkedHashMap = DoublyLinkedList + HashMap
LRU Cache -> 使用HashMap + DoublyLinkedList可以解决
Heap
支持操作O(logn) add O(logn) remove O(1) MIN or MAX,求最大值或最小值只可取其一
ugly number 2
top k largest number 2 -> 使用PriorityQueue只保存k个数
merge k sorted list -> Add all list heads to a heap, poll the smallest head, add to ans, then add next node to heap.
注意要会写heap的comparator
Practice

high five
k closest points
data stream median
kth smallest number in sorted matrix

Dynamic Programming

动态规划的分类
Triangle -> 入门题
注意以下代码中n为三角形的高度
第一种方法 DFS Traverse, Top-down

void traverse(int x, int y, int sum) {
    if (x == n) {
    // found a whole path from top to bottom
    if (sum < best) {
        best = sum;
    }
    return;
    }

    traverse(x + 1, y, sum + A[x][y]);
    traverse(x + 1, y + 1, sum + A[x][y]);
}

best = MAXINT;
traverse(0, 0, 0);

每一个节点都分出来两条路径，相当于一个binary tree，它的节点个数是2^n, 所以这个方法的时间复杂度是O(2^n)
第二种方法 DFS Divide and Conquer, bottom-up

// return minimum path from (x, y) to bottom
int divideConquer(int x, int y) {
    if (x == n) {
    return 0;
    }
    return A[x][y] + Math.min(
    divideConquer(x + 1, y),
    divideConquer(x + 1, y + 1)
    );
}

divideConquer(0, 0);

与之前一样，每一个节点都分出来两条路径，相当于一个binary tree，它的节点个数是2^n, 所以这个方法的时间复杂度是O(2^n)
DFS实际上是在枚举，把所有的方案都列出来然后看哪个更好
我们做了很多重复计算，因为实际上我们只需要计算三角形所有节点（n^{2个）的最短路径，DFS却计算了（2}n个）节点，很多节点我们计算了很多遍。所以我们需要把每个节点的结果保存下来，下次需要的时候就不需要重新算了
第三种方法 DFS Divide and conquer + Memorization

// return minimum path from (x, y) to bottom
int divideConquer(int x, int y) {
    // row index from 0 to n - 1
    if (x == n) return 0;

    // if we already got the minimum path from (x, y) to bottom, just return it
    if (hash[x][y] != Integer.MAX_VALUE) return hash[x][y]

    // set before return
    hash[x][y] = A[x][y] + Math.min(divideConquer(x + 1, y), divideConquer(x + 1, y + 1));

    return hash[x][y]
}

initialize: hash[*][*] = Integer.MAX_VALUE;
answer: divideConquer(0, 0);

在每次计算节点最小值之前，先看看之前有没有算过这个节点的结果，如果有直接从数组中拿到结果并返回，没有算过我们再递归, O(n^2) time
记忆化搜索的本质：动态规划，省去了重复计算
多重循环 vs 记忆化搜索
第四种方法多重循环，自底向上

int[][] dp = new int[row][row]; // dp[i][j] 表示从i， j出发走到最后一层的最小路径长度

// 初始化，终点先有值,最后一层
for (int i = 0; i < n; ++i) {
    dp[n - 1][i] = triangle[n - 1][i];
}

// 循环递推求解
for (int i = n - 2; i >= 0; --i) {
    for (int j = 0; j <= i; ++j) {
    dp[i][j] = Math.min(dp[i + 1][j], dp[i + 1][j +1]) + A[i][j]
    }
}

// 求结果： 起点
return dp[0][0];
}

O(n^2) time,没有递归
第五种方法多重循环，自顶向下

int[][] dp = new int[row][row];

// 初始化，起点
dp[0][0] = triangle[0][0];

// 初始化三角形的左边和右边
for (int i = 1; i < row; ++i) {
    dp[i][0] = dp[i - 1][0] + triangle[i][0]; // 左边
    dp[i][i] = dp[i - 1][i - 1] + triangle[i][i]; // 右边
}

// top down
for (int i = 1; i < row; ++i) {
    for (int j = 1; j < i; ++j) {
    dp[i][j] = Math.min(dp[i - 1][j], dp[i - 1][j - 1]) + triangle[i][j];
    }
}

Math.min(dp[row - 1][0], dp[row - 1][1], dp[row - 1][2]...);

这个方法中dp[x][y]表示从上到下到这个点的最短路径
O(n^2) time,没有递归,但是这个方法我们必须提前算出三角形的左边和右边，因为对于三角形的每一个结点dp[x][y]，我们在计算到他的最短路径的时候，我们需要用到它前一层的两个点dp[x-1][y]和dp[x-1][y-1].但是对于每一层的最左边和最右边的点，我们再上一层找不到再靠左边或者右边的点，所以为了避免越界，我们必须提前算出两条边的所有点。
什么情况下使用动态规划？
满足下满三个条件之一：

求最大值最小值
判断是否可行
统计方案个数

上面这三个情况极有可能需要使用动态规划
什么情况下不使用动态规划？

求出所有具体的方案，而不是方案的个数 -> palindrome-paritioning
输入数据是一个集合而不是序列,意思是如果我们可以将数据调换位置，则很有可能不使用动态规划，动态规划需要有方向性 -> longest-consecutive-sequence
暴力算法的复杂度已经是多项式级别。动态规划擅长优化指数级别的复杂度（2^n, n!）到多项式级别的复杂度(n^2, n^3)

动态规划四要素

状态，储存小规模问题的结果
方程，状态之间怎么联系
初始化，最极限的小状态是什么，起点
答案，最终的状态是什么，终点

动态规划的类型，按照状态分类

坐标型 triangle 10%
接龙型 20%
划分型
匹配型
背包型
区间型 longest palindrome
树图型 tree matrix
博弈型判断是否可行

坐标型动态规划
状态
f[x]:表示我从起点走到坐标x。。。
f[x][y]:表示我从起点走到坐标x, y…
方程：研究走到x， y这个点之前的一步
practice: minimum path sum
动态规划不应该存在循环依赖，从一个状态不会回到之前的一个状态
当我们初始化一个二维数组的动态规划时，需要初始化第0行和第0列
Unique path
Unique path 2
Climbing Stairs
Jump game
Jump game 2
接龙型动态规划
告诉你一个规则然后求最长的状态可以是多少，比如数组中求最大递增子序列
Longest increasing subsequence
Russian Doll Envelopes
Largest Divisible Subset
Frog jump
总结
动态规划的实质是记忆化搜索，避免重复计算中间结果
动态规划四要素：初始化，方程，起点，终点
什么时候使用动态规划:最优，可行，方案数（而非具体方案）
什么时候不使用：求具体方案，输入数据为集合而非序列（可调整顺序），暴力算法时间复杂度已经是O(n^2, n^3)

Link： http://hellcy.github.io/2020/03/02/Data-Structures/

Copyright： This article is using CC BY 4.0 license.

Yuan ChengSoftware Engineer

Wait and see