Computing Convex Hull in Python

In my studies to become a Google engineer, one of the topics I ran across was geometric algorithms, which involve questions that would be simple to solve by a human looking at a chart, but are complex because there needs to be an automated process.

One example is: given four points on a 2-dimensional plane, and the first three of the points create a triangle, determine if the fourth point lies inside or outside the triangle.

Another geometric problem is: given a number of points on a 2-dimensional plane, compute the minimum number of boundary points, that if connected, would contain all the points without creating a concave angle.

Here is one of the solutions I generated in Python:

convex hull plot

So how do you do it?

I got a clue from a lecture. It involves using a point as a pivot and determining which of two other points are the most clockwise from each other.

Before I watched more of the lecture, I was determined to figure out an algorithm that would solve it in a reasonable amount of time.

My idea was:

  1. sort the points from left to right (least value of x to largest) - O(n log n) where n is the number of (x, y) points
  2. starting with the leftmost point p:
    1. go through each point to the right of that point, and using p as a pivot, find which point is the most clockwise. O(n)
    2. set the most clockwise point as the new p - O(1)
    3. loop again with new p
  3. this continues until the starting point is reached O(h) - where h is the number of hull points

In order to "prematurely optimize" (I know it's bad) I was trying to make the all the comparisons only on points to the right of p, but then I would need to flip and go the other way once the max x value was reached.

It was turning out to be way more complicated than it should be. I was trying to get it from O(n2) down to O(n log n) but really all my optimizations were just making it O((n log n) + (n * h)).

I ended up cleaning it up and just getting the algorithm where it was correct, not fast. Then once it was correct, I would make it faster.

So I tore out a bunch of code and just got it working. And it worked beautifully.

I got rid of all the code that figured out if comparison points were to the right of the pivot point. They didn't help improve the complexity.

I was able to remove the sort, also. It wasn't needed. I could find my start point, the minimum x-value point, in linear time. It didn't matter what order the comparison points were in, since I was keeping track of the maximum clockwise-dness as I went along, the same as a linear search for the maximum value in an unsorted array.

This was the new way:

  1. Find the minimum x-value point, the initial point p - O(n)
  2. Using p as a pivot:
    1. find which other point is the most clockwise - O(n)
    2. set it as new p - O(1)

I ended up with h pivot points, each comparing its n neighbors to the one with the maximum clockwise angle.

This is O(n * h).

So I watched the rest of the lecture and it turns out my algorithm was one of the 2 solutions. It's called the Jarvis march, aka "the gift-wrapping algorithm", published in 1973.

from collections import namedtuple  
import matplotlib.pyplot as plt  
import random

Point = namedtuple('Point', 'x y')


class ConvexHull(object):  
    _points = []
    _hull_points = []

    def __init__(self):
        pass

    def add(self, point):
        self._points.append(point)

    def _get_orientation(self, origin, p1, p2):
        '''
        Returns the orientation of the Point p1 with regards to Point p2 using origin.
        Negative if p1 is clockwise of p2.
        :param p1:
        :param p2:
        :return: integer
        '''
        difference = (
            ((p2.x - origin.x) * (p1.y - origin.y))
            - ((p1.x - origin.x) * (p2.y - origin.y))
        )

        return difference

    def compute_hull(self):
        '''
        Computes the points that make up the convex hull.
        :return:
        '''
        points = self._points

        # get leftmost point
        start = points[0]
        min_x = start.x
        for p in points[1:]:
            if p.x < min_x:
                min_x = p.x
                start = p

        point = start
        self._hull_points.append(start)

        far_point = None
        while far_point is not start:

            # get the first point (initial max) to use to compare with others
            p1 = None
            for p in points:
                if p is point:
                    continue
                else:
                    p1 = p
                    break

            far_point = p1

            for p2 in points:
                # ensure we aren't comparing to self or pivot point
                if p2 is point or p2 is p1:
                    continue
                else:
                    direction = self._get_orientation(point, far_point, p2)
                    if direction > 0:
                        far_point = p2

            self._hull_points.append(far_point)
            point = far_point

    def get_hull_points(self):
        if self._points and not self._hull_points:
            self.compute_hull()

        return self._hull_points

    def display(self):
        # all points
        x = [p.x for p in self._points]
        y = [p.y for p in self._points]
        plt.plot(x, y, marker='D', linestyle='None')

        # hull points
        hx = [p.x for p in self._hull_points]
        hy = [p.y for p in self._hull_points]
        plt.plot(hx, hy)

        plt.title('Convex Hull')
        plt.show()


def main():  
    ch = ConvexHull()
    for _ in range(50):
        ch.add(Point(random.randint(-100, 100), random.randint(-100, 100)))

    print("Points on hull:", ch.get_hull_points())
    ch.display()


if __name__ == '__main__':  
    main()

The other algorithm, at O(n log n), uses a sort and then a simple single pass of all the points, and making only left turns as it goes around the perimeter counter-clockwise. When the next point is a right turn, it backtracks past all points (using a stack and popping points off) until that turn turns into a left turn. This algorithm is called the Graham scan.

Which algorithm is better? It depends on your points. If most of the points will lie on the hull, the n log n algorithm will be better. If you have relatively few hull points bounding most of the points, the n*h will be better. You could always plot a random sample of the points on a graph and then choose your algorithm from there. I think most points that resemble randomness will benefit from the Jarvis march.

Follow along with my studies on this blog. And check out my complete study list in this Github project called "Google interview university":

Article originally appeared here on my other blog.

comments powered by Disqus