A solid understanding of asymptotic relationships is fundamental to any meaningful algorithms research. When we compare several algorithms, we have to argue, using commonly accepted notions, how each relates to the others. See this video for a good coverage of the basics, where Pratik Shah explains the O, o, Ω, ω and Θ notations with simple examples.
Leaving out constant factors, a somewhat imprecise comparison of the notations:
f(n) = O(g(n)) ≈ f(n) ≤ g(n)
f(n) = Ω(g(n)) ≈ f(n) ≥ g(n)
f(n) = Θ(g(n)) ≈ f(n) = g(n)
f(n) = o(g(n)) ≈ f(n) < g(n)
f(n) = ω(g(n)) ≈ f(n) > g(n)
B-Trees and B+ Trees are n-ary search trees in which every node except the root must hold a minimum number of keys (at least ⌈m/2⌉ − 1 keys in a B-Tree of order m); the root may hold as few as one key.
These are indexing structures commonly used for fast retrieval. Think of each node holding a large number of keys; when a node grows beyond its threshold, it splits and new nodes are added.
A B+ Tree is a specialization of the B-Tree: only the leaf nodes of a B+ Tree contain values. All other nodes contain only keys, used for routing searches.
On the downside, the root node becomes a single point of failure, as is the case with any tree.
Found this to be a good read.
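To make the structure concrete, here is a minimal sketch of B-Tree search in Python. The node layout, class, and function names are my own illustration, not from any particular library:

```python
from bisect import bisect_left

# Sketch of a B-Tree node: a sorted list of keys, plus len(keys) + 1
# children for internal nodes (leaves have no children).
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys within this node
        self.children = children or []     # empty list for leaf nodes

def btree_search(node, key):
    """Return True if key occurs in the subtree rooted at node."""
    i = bisect_left(node.keys, key)        # first slot where key could sit
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                  # leaf: nowhere left to descend
        return False
    return btree_search(node.children[i], key)

# A small hand-built tree: root [10, 20] with three children.
root = BTreeNode([10, 20], [BTreeNode([2, 5]),
                            BTreeNode([13, 17]),
                            BTreeNode([30, 40])])
```

Within each node the keys are scanned (here with binary search), and the matching child pointer is followed; with many keys per node the tree stays shallow, which is what makes retrieval fast.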
Found this image here as a good way to refresh my knowledge of sorting algorithms:
Non-Comparative Sorting Techniques
(Source: To a large extent, wikipedia)
Bucket sort (counting sort is the special case with bucket size = 1)
- Set up an array of initially empty “buckets”.
- Scatter: Go over the original array, putting each object in its bucket.
- Sort each non-empty bucket.
- Gather: Visit the buckets in order and put all elements back into the original array.
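The four steps above can be sketched as follows, here for floats in [0, 1); the bucket count and key range are illustrative choices:

```python
# Bucket sort sketch: scatter keys into buckets, sort each bucket,
# then gather the buckets back in order.
def bucket_sort(values, n_buckets=10):
    buckets = [[] for _ in range(n_buckets)]      # set up empty buckets
    for v in values:                              # scatter
        buckets[int(v * n_buckets)].append(v)
    for b in buckets:                             # sort each non-empty bucket
        b.sort()
    return [v for b in buckets for v in b]        # gather in order
```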
LSD Radix sort
- Take the least significant digit (or group of bits, both being examples of radices) of each key.
- Group the keys based on that digit, but otherwise keep the original order of keys. (This is what makes the LSD radix sort a stable sort).
- Repeat the grouping process with each more significant digit.
Input: 170, 45, 75, 90, 802, 2, 24, 66
After grouping by the ones digit: 170, 90, 802, 2, 24, 45, 75, 66
After grouping by the tens digit: 802, 2, 24, 45, 66, 170, 75, 90
After grouping by the hundreds digit: 2, 24, 45, 66, 75, 90, 170, 802
Each digit place can be sorted in a single pass.
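A sketch of LSD radix sort with radix 10, reproducing the passes above (names are illustrative):

```python
# LSD radix sort: group keys by each digit, least significant first.
# Appending to buckets in input order keeps the grouping stable.
def radix_sort(nums):
    if not nums:
        return nums
    digits = len(str(max(nums)))
    for d in range(digits):                        # ones, tens, hundreds, ...
        buckets = [[] for _ in range(10)]
        for n in nums:
            buckets[(n // 10 ** d) % 10].append(n)  # stable grouping
        nums = [n for b in buckets for n in b]      # single pass per digit
    return nums
```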
Big O is not the only notation used to describe algorithm complexity; there are also omega (Ω) and theta (Θ). O and omega each come in big and little variants.
As I read from online sources:
f(x) = O(g(x)) (big-oh) means that the growth rate of f(x) is asymptotically less than or equal to the growth rate of g(x).
f(x) = Ω(g(x)) (big-omega) means that the growth rate of f(x) is asymptotically greater than or equal to the growth rate of g(x).
f(x) = o(g(x)) (small-oh) means that the growth rate of f(x) is asymptotically less than the growth rate of g(x).
f(x) = ω(g(x)) (small-omega) means that the growth rate of f(x) is asymptotically greater than the growth rate of g(x).
f(x) = Θ(g(x)) (theta) means that the growth rate of f(x) is asymptotically equal to the growth rate of g(x).
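A quick numeric sketch of the Θ case: for f(n) = 3n² + 5n and g(n) = n², the ratio f(n)/g(n) tends to the constant 3 as n grows, so f = Θ(g). (It would tend to 0 for f = o(g) and grow without bound for f = ω(g).) The sample points are arbitrary:

```python
# Ratio test illustrating f(n) = Theta(g(n)) for f(n) = 3n^2 + 5n, g(n) = n^2.
def f(n):
    return 3 * n * n + 5 * n

def g(n):
    return n * n

for n in (10, 1000, 1000000):
    print(n, f(n) / g(n))    # the ratio approaches the constant 3
```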
What are P, NP, NP Hard and NP Complete problems?
P = Problems solvable in polynomial time, i.e., with a polynomial upper bound on the worst-case running time. Polynomial time: O(n^2), O(n^3), O(1), O(n lg n), O(n^k)
NP (Non-Deterministic Polynomial time) = Verifiable in polynomial time. For example, finding a path may be difficult; however, given a path, verifying that it has at most k nodes is easy.
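A sketch of "easy to verify": checking that a proposed path is valid in a graph and uses at most k edges takes polynomial time, even when finding such a path may not. The graph representation here is my own assumption for the example:

```python
# Polynomial-time verification of a proposed path certificate.
def verify_path(edges, path, k):
    """edges: set of (u, v) pairs; path: list of nodes; k: max edge count."""
    if len(path) - 1 > k:                  # path uses too many edges
        return False
    return all((a, b) in edges or (b, a) in edges   # every hop is a real edge
               for a, b in zip(path, path[1:]))

edges = {(1, 2), (2, 3), (3, 4)}
```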
NP Complete = A problem that is both in NP and NP Hard: every problem in NP can be reduced to it in polynomial time, so if it is ever solved in polynomial time, every NP problem can be.
NP Hard = At least as hard as every problem in NP: every NP problem reduces to such a problem x in polynomial time, though x itself need not be in NP. There is a million-dollar award (the Clay Millennium Prize for P vs NP) riding on whether such problems can be solved efficiently, which would help computer science solve several hard problems.
Turing’s “Halting Problem” is not solvable by any computer, no matter how much time is given! Another example: given a boolean expression (3-SAT, say), can there exist an assignment of values such that the expression evaluates to true? Apparently, such problems can be solved by brute force over all possible assignments, but no polynomial-time algorithm is known.
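The brute-force approach can be sketched as trying every assignment; with n variables that is 2^n candidates, which is exactly why it is not polynomial time. The formula below is an arbitrary example of my own:

```python
from itertools import product

# Brute-force satisfiability: try every True/False assignment.
def is_satisfiable(formula, n_vars):
    """formula: callable taking a tuple of n_vars booleans."""
    return any(formula(bits)
               for bits in product([False, True], repeat=n_vars))

# Example in 3-CNF shape: (x or y) and (not x or z) and (not y or not z)
f = lambda v: (v[0] or v[1]) and (not v[0] or v[2]) and (not v[1] or not v[2])
```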