当前位置:网站首页>Review of 4246 Algorithms for Data Science
Review of 4246 Algorithms for Data Science
2022-07-17 05:23:00 【Alex Tech Bolg】
Contents
- Important algorithms
- Note
- Lecture1: Insertion sort, efficient algorithm
- Lecture2: Merge sort
- Lecture3: Binary search, quicksort
- Lecture5: Graphs, Breadth-First Search (BFS)
- Lecture6: Depth-first search, topological sorting
- Lecture7&8: Strongly connected components, single-origin shortest paths in weighted graphs
- Lecture12: Data compression and Huffman coding
- Lecture9: The dynamic programming principle; segmented least squares
- After midterm
- Lecture 11 Shortest paths in weighted graphs (Bellman-Ford)
- Lecture 15&16 Network flows
- Lecture 20&21 Reductions; independent set and vertex cover; decision problems
- Lecture 22 Satisfiability problems: SAT, 3SAT, Circuit-SAT
- Lecture 23&25 Representative NP-complete problems: TSP, Set Cover
- Lecture18&19 Linear programming
Important algorithms
- Sort: Insertion sort, merge sort, quick sort
- Binary search
- Graph: BFS、DFS and their application
- Greedy: Dijkstra’s algorithm and improved implementation
- Dynamic programming: Bellman-Ford
- Network flow: Ford-Fulkerson algorithm
- NPC: Vertex Cover(D), Independent Set(D), SAT, 3SAT, integer programming
- Linear programming (integer programming)
Note
重点是理解各种算法的应用以及时间复杂度。
Lecture1: Insertion sort, efficient algorithm
Insertion sort
Analysis of algorithms
- Correctness: Proof by induction.
- Running time: Best case: 5n − 4; worst case. 3n^2 + 7n − 4
- Space: in-place algorithm
Efficient algorithms (polynomial running time)
Lecture2: Merge sort
Asymptotic notation
- Asymptotic upper bounds: Big-O notation
- Asymptotic lower bounds: Big-Ω notation
- Asymptotic tight bounds: Θ notation
- Asymptotic upper bounds that are not tight: little-o
- Asymptotic lower bounds that are not tight: little-ω
Divide & conquer principle, application: merge sort
- Correctness
- Running time: O(nlogn)
- Space: Θ(n)
Solving recurrences and running time of merge sort
- recursion trees
- Master theorem
Lecture3: Binary search, quicksort
Binary search
- Running time: O(log n)
Quicksort
- divide and conquer
- Space: in-place
- Running time:
- Its worst-case running time is Θ(n^2) but its average-case running time is Θ(nlogn)
- Correctness:
- strong induction (the induction step at n requires that the inductive hypothesis holds at all steps 1, 2, …, n−1 and not just at step n−1, as with simple induction.)
Lecture5: Graphs, Breadth-First Search (BFS)
Definition
- Undirected / directed
- Simple: all vertices are distinct.
- An undirected graph is connected when there is a path between every pair of vertices.
- The connected component of a node u is the set of all node in the graph reachable by a path from u.
- A directed graph is strongly connected if for every pair of vertices u, v, there is a path from u to v and from v to u.
- The strongly connected component of a node u in a directed graph is the set of nodes v in the graph such that there is a path from u to v and from v to u.
- Tree: connected acyclic graph
- Bipartite graphs
- Degree theorem
- Linear graph algorithms run in O(n+m) time
Representing graphs
- adjacency matrix
- adjacency list
Breadth-first search (BFS)
- queue: FIFO data structure
- s-t connectivity
- shortest s-v paths in unweighted graphs.
- Connected components in undirected graphs
- Testing bipartiteness & graph 2-colorability
- SCC(u): SCC of node u
- All SCC: but not linear
Lecture6: Depth-first search, topological sorting
Depth-first search (DFS)
- stack: LIFO (Last-In First-Out)
- s-t connectivity
- Cycle detection
- Topological sorting in DAGs (directed acyclic graph)
- Run DFS(G); compute finish times.
- Process the tasks in decreasing order of finish times.
- Running time: O(m+n)
- different edges: forward, back, cross --> time intervals for vertices
- Undirected graphs: find all connected components
- SCC(u): SCC of node u
- All SCC: linear
- Compute Gr.
- Run DFS(Gr); compute finish(u) for all u.
- Run DFS(G) in decreasing order of finish(u).
- Output the vertices of each tree in the DFS forest of line 3 as an SCC.
Lecture7&8: Strongly connected components, single-origin shortest paths in weighted graphs
Applications of DFS: Strongly connected components
(combine it with the above)
Shortest paths in graphs with non-negative edge weights (Dijkstra’s algorithm)
- Greedy principle
- implementation and improved implementation
- running time
Lecture12: Data compression and Huffman coding
- Prefix codes and trees
- The Huffman algorithm
Lecture9: The dynamic programming principle; segmented least squares
- Overlapping subproblems
- An easy-to-compute recurrence
- Iterative, bottom-up computations
- Segmented least squares
- Sequence alignment
After midterm
Lecture 11 Shortest paths in weighted graphs (Bellman-Ford)
- Bellman-Ford algorithm (DP solution)
- OPT (i, v) = cost of shortest s-v path using at most I edges
- 二维数组: time: O(nm), space: O(n^2)
- Pseudocode: M [i, v]
- 一维数组: time: O(nm), space: O(n)
- Early termination condition: if at some iteration I no value in M changed, then stop.
- Pseudocode: M[v] similar to Dijkstra algorithm
- Detecting negative cycle
- Update all edges n times (1 more time)
Lecture 15&16 Network flows
Definition:
- Capacity constrains
- Flow conservation
- |f| = f_out
- Max flow and min cut
Residual graph and augmenting paths
- Residual graph (forward, backward)
- P is simple path. Augment f by pushing extra flow on P
- Bottleneck
Ford-Fulkerson algorithm
- Running time: O(nmU) —— pseudo-polynomial
- Correctness
- Application: max bipartite matching
- Reduction (Forward direction, Reverse direction)
Lecture 20&21 Reductions; independent set and vertex cover; decision problems
Reduction:
- x of X
- y=R(x) of Y
Reduction as a means to design efficient algorithm
- Y has polynomial computational steps
- Call polynomial number of Y
- X <= pY
Reduction as a means to argue about hard problems
- X <= p Y
- Y is at least as hard as X
- If X cannot be solved in polynomial time, then Y cannot.
- Relative level of difficulty: X <= pY, Y <= pX
Two hard problems
- Independent set
- Vertex cover
Optimization versions for IS and VC
- Max independent size
- Min vertex cover
Decision version of optimization problems
- Yes/no answer
- Max – lower, min – upper
Rough equivalence of decision & optimization problems
- Suppose we have an algorithm to solve MIS we can use it to solve IS(D)
- Suppose we have an algorithm to solve IS(D) we can use it to solve MIS
Reduction from Independent Set to Vertex Cover
- Forward direction
- Reverse direction
Class P
- Set of decision problems that can be solved by polynomial-time algorithm.
(引出NP)
- If we were given a solution S for such a problem X(D), we could check if it is correct quickly.
- Such an S is a succinct certificate that x 属于X(D)
Class NP
An efficient certifier (or verification algorithm) B for a problem X(D) is a polynomial algorithm that
- Takes two input arguments: instance x (which is a specific input of the problem) and the short certificate t
- B(x, t) = yes and |t| <= Poly(|x|), then we have x 属于 X(D)
Set of decision problems that have an efficient certifier.
P vs NP
- P 属于 NP
- P = NP ?
- Why would NP contain more problems than P? Intuitively, the hardest problems in NP are the least likely to belong to P
NPC
- The hardest problems
- NPC X(D) 定义:
- If X(D) 属于 NP
- For all Y 属于 NP, Y <= p X(D)
Show a problem is NP-complete
- Suppose we had an NP-complete problem X, to show Y is NPC, we only need to show:
- Y 属于 NP
- X <= p Y
- (相比根据定义证明NPC,这种方法只需要做一次reduction,简化了很多)
Lecture 22 Satisfiability problems: SAT, 3SAT, Circuit-SAT
Definition
- truth assignment
- A truth assignment satisfies a clause if it causes the clause to evaluate to 1.
- A formula φ is satisfiable if it has a satisfying truth assignment.
Satisfiability (SAT) and 3SAT
- SAT: Given a formula φ in CNF with n variables and m clauses, is φ
satisfiable? - 3SAT: Given a formula φ in CNF with n variables and m clauses such that each clause has exactly 3 literals, is φ satisfiable?
The art of proving NP-completeness
- Circuit-SAT ≤p SAT
- SAT ≤P 3SAT
- 3SAT ≤p IS(D)
Lecture 23&25 Representative NP-complete problems: TSP, Set Cover
- Circuit SAT
- TSP
- Integer programming

Lecture18&19 Linear programming
Definition
- feasible solutions
- Feasible region
- Optimal solution
Duality
We can alternatively solve the dual to find the optimal objective value.
An optimal dual solution can be used to derive an optimal primal solution (complementary slackness).
The dual may have structure making it easier to solve at scale (e.g., via parallel optimization).

7-step dualization
formulating LPs
- 将其他形式的问题转化为LP或IP
边栏推荐
- Performance test and price comparison of cloud servers of Alibaba cloud, Tencent cloud, Huawei cloud, ucloud and Tianyi cloud
- m基于matlab的协作mimo分布式空时编码技术的仿真
- Sword finger offer question brushing record - offer 05 Replace spaces
- About file upload and download
- Filter过滤器
- 传奇游戏架设教程
- m基于matlab的BTS天线设计,带GUI界面
- m基于Lorenz混沌自同步的混沌数字保密通信系统的FPGA实现,verilog编程实现,带MATLAB混沌程序
- 类与super、继承
- How to open the service of legendary mobile games? How much investment is needed? What do you need?
猜你喜欢

m基于matlab的超宽带MIMO雷达对目标的检测仿真,考虑时间反转

M FPGA implementation of chaotic digital secure communication system based on Lorenz chaotic self synchronization, Verilog programming implementation, with MATLAB chaotic program

爬虫基础—代理的基本原理

Pytorch learning diary (II)

STEAM游戏高主频i9-12900k 搭建CS:GO服务器

Use of urllib Library

express

Crawler foundation - Web page Foundation

Matlab implementation code of image denoising method based on Hidden Markov tree model in wavelet domain

m基于simulink的16QAM和2DPSK通信链路仿真,并通过matlab调用simulink模型得到误码率曲线
随机推荐
Mapping rule configuration of zuul route
M matlab simulation of bit error rate using LDPC, turbo and convolutional channel coding and decoding in VBLAST cooperative MIMO system segment
The principle of SYN Flood attack and the solution of SYN Flood Attack
Alibaba cloud Hangzhou arm ECS performance evaluation
PyTorch学习日记(四)
Servlet 笔记
Solve the problem that the unit test coverage of sonar will be 0
Pytorch learning notes (I)
IP fragment是什么意思?如何防御IP fragment攻击?
edit关闭保存时自动生成配置文件
How to open the service of legendary mobile games? How much investment is needed? What do you need?
5G时代服务器在这里面起着什么作用?
How does the advanced anti DDoS server confirm which are malicious ip/ traffic? ip:103.88.32. XXX
Pytorch learning diary (III)
Quickly understand redirection
PyTorch学习笔记(一)
M based on the MIMO channel capacity analysis of MATLAB, different antenna numbers are compared; Non codebook precoding SVD, GMD; Codebook precoding DFT, TXAA and spatial diversity
网络知识-04 网络层-ICMP协议
m基于matlab的超宽带MIMO雷达对目标的检测仿真,考虑时间反转
2021-10-25 浏览器兼容遇到的问题