Bisection Search

Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky – Knuth

There are many variants of bisection search algorithms. We introduce a universal template to summarize them.

We discuss two scenarios: the input array is

  • strictly increasing

  • increasing (some duplicated values)

Strictly Increasing

Search a target in an array.

  • if it exists, return its index

  • else, return the insertion index such that the new array is still strictly increasing.

def bisect(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] > target:
            right = mid - 1     # target can only be in the left half
        elif nums[mid] < target:
            left = mid + 1      # target can only be in the right half
        else:
            return mid          # target found
    return left                 # insertion index when target is absent

Why return left? When the loop exits, left = right + 1 and left points to the first element larger than target (or to len(nums) if there is none), which is exactly the insertion location.
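For example, a quick check of this template on a made-up array (the values are only for illustration):

nums = [1, 3, 5, 7]
print(bisect(nums, 5))   # 2, the index of 5
print(bisect(nums, 4))   # 2, inserting 4 here keeps the array strictly increasing
print(bisect(nums, 8))   # 4, insert at the end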

Duplicates

Search a target in an array.

  • if it exists, either

    • return its leftmost index, or

    • return its rightmost index

  • else, return the insertion index such that the new array is still increasing.

We first solve the ‘if it exists’ case.

def left_bound(nums, target):
    left, right = 0, len(nums)-1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] > target:
            right = mid - 1
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1 # squeeze right pointer left
    return left  # when target exists, left stops at its leftmost index; otherwise at the insertion index

def right_bound(nums, target):
    left, right = 0, len(nums)-1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] > target:
            right = mid - 1
        elif nums[mid] < target:
            left = mid + 1
        else:
            left = mid + 1 # squeeze left pointer right
    return right  # when target exists, right stops at its rightmost index; otherwise at insertion index - 1

For the insertion case,

  • left_bound() works

  • right_bound() returns right, which equals left - 1, i.e. right_bound()+1 is the insertion index, as the example below shows.
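A quick sanity check on a small array with duplicates (the values are made up for illustration):

nums = [1, 2, 2, 2, 5]
print(left_bound(nums, 2))   # 1, the leftmost 2
print(right_bound(nums, 2))  # 3, the rightmost 2
print(left_bound(nums, 3))   # 4, the insertion index for 3
print(right_bound(nums, 3))  # 3, i.e. insertion index - 1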

Related problem: leetcode 35e.

Root finding

Let’s see two extensions.

  1. Given a target value target and an increasing array nums, find the largest index such that nums[index] <= target.

    Clearly, if target is in nums, we can use right_bound(). If not, the answer should be its insertion index - 1, i.e. left_bound()-1, which is exactly right_bound(). Therefore, we can directly use right_bound().

  2. What if we want to find the smallest index such that nums[index] >= target?

    If target is in nums, we can use left_bound(). If not, the answer should be exactly its insertion index. Therefore, we can directly use left_bound().

To sum up, whether the array is increasing or decreasing (see the next section), use right_bound() to find the largest qualifying index and left_bound() to find the smallest qualifying index.

The above problems can be generalized to root-finding problems: we simply replace nums[index] with f(x), where f is an increasing function and x is the search variable. Related problems: leetcode 69e, 410h, 875m, 1011m, 1482m.
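As a sketch of this root-finding view (in the spirit of leetcode 69), we can search for the largest integer x with x*x <= target; f(x) = x*x plays the role of nums[index], and since we want the largest qualifying x, the body follows the right_bound() pattern. The function name int_sqrt is ours:

def int_sqrt(target):
    """Largest integer x such that x * x <= target (assumes target >= 0)."""
    left, right = 0, target
    while left <= right:
        mid = (left + right) // 2
        if mid * mid > target:
            right = mid - 1
        elif mid * mid < target:
            left = mid + 1
        else:
            return mid   # target is a perfect square
    return right         # largest x with x * x <= target

print(int_sqrt(8))   # 2
print(int_sqrt(16))  # 4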

Decreasing

We assumed the array (or the function) is increasing. What if it is decreasing?

We just need to exchange the two branches taken when nums[mid] != target. That is, change from

if nums[mid] > target:
    right = mid - 1
elif nums[mid] < target:
    left = mid + 1

to

if nums[mid] > target:
    left = mid + 1
elif nums[mid] < target:
    right = mid - 1

  • The function left_bound() can still be used for insertion.

  • To find the largest index such that nums[index] >= target, use right_bound().

  • To find the smallest index such that nums[index] <= target, use left_bound().
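For example, a left_bound() variant for a decreasing array might look like this (the function name left_bound_desc and the test values are ours):

def left_bound_desc(nums, target):
    """Leftmost index of target in a decreasing array,
    or its insertion index if target is absent."""
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] > target:
            left = mid + 1      # in a decreasing array, target lies to the right
        elif nums[mid] < target:
            right = mid - 1
        else:
            right = mid - 1     # squeeze the right pointer left
    return left

nums = [9, 7, 7, 4, 1]
print(left_bound_desc(nums, 7))  # 1, the leftmost 7
print(left_bound_desc(nums, 5))  # 3, inserting 5 there keeps the array decreasing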

Bisect module

There is a bisect module in Python that implements several bisection search functions.

  • bisect.bisect_left(nums, target, lo=0, hi=len(nums)) is equivalent to our left_bound(): it returns the insertion index for target if it is not in nums, and the leftmost index otherwise.

  • bisect.bisect_right(nums, target, lo=0, hi=len(nums)) is the same as bisect.bisect(). It returns the insertion index for target if it is not in nums, but the rightmost index + 1 otherwise. Hence, it equals right_bound()+1.

  • When target is not in nums, bisect_left() = bisect_right() = left_bound().
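A small comparison, assuming left_bound() and right_bound() from above are in scope:

import bisect

nums = [1, 2, 2, 2, 5]
print(bisect.bisect_left(nums, 2), left_bound(nums, 2))    # 1 1
print(bisect.bisect_right(nums, 2), right_bound(nums, 2))  # 4 3  (bisect_right = right_bound + 1)
print(bisect.bisect_left(nums, 3), bisect.bisect_right(nums, 3))  # 4 4, the insertion index for 3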
