Axiom Data Science recently presented their work on using big-data techniques to analyze AIS vessel traffic data at the 2018 Ocean Sciences Meeting in Portland, OR.
The Automatic Identification System (AIS) is a system of shipboard transmitters and land-based and satellite-based receivers that allows vessel locations to be broadcast and recorded. While traditionally used for real-time maritime applications, there is increasing interest in using these rich datasets to provide insight into a wide array of oceanographic problems, such as prioritizing hydrographic surveys, predicting the probability and impact of oil spills, quantifying the number of vessel interactions with marine wildlife, and more.
Due to the immense size of these datasets (typically tens of billions of raw messages per year) and limitations on infrastructure and computing power, AIS data must currently be processed in small temporal or spatial subsets. This has proven inadequate for decision-making that requires analysis on a national or global scale over an entire year. To overcome the limitations of traditional data storage and processing infrastructure, we have developed a big-data compute cluster using Apache Spark as the computing engine.

As a demonstration of this technical approach, we worked with NOAA’s Office of Coast Survey to produce vessel traffic heatmaps for use in their Hydrographic Health Model. Starting with a 2015 terrestrial AIS dataset composed of 74 billion raw messages, we used our computing cluster to parse the messages, clean out invalid data, and aggregate the individual messages into 20 million tracklines, representing distinct ship voyages per day. We then used these voyages to produce a set of heatmaps in GeoTIFF format at 500-meter resolution for two different metrics: total traffic volume and unique vessel count. We also developed the ability to run ad-hoc queries against both the raw messages and the ship voyages.
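To make the clean, aggregate, and grid steps concrete, here is a minimal single-machine sketch in plain Python. It is not the Spark pipeline described above, and several details are assumptions made only for illustration: each message is a hypothetical `(mmsi, epoch_seconds, x_m, y_m)` tuple with positions already projected to meters, the validity check is a placeholder for the real cleaning rules, a voyage is taken to be one vessel's positions on one UTC day, and a voyage is counted only in cells where it has a reported position (no interpolation along the trackline).

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

CELL_SIZE_M = 500.0  # grid resolution quoted in the article


def is_valid(msg):
    """Placeholder cleaning rule (an assumption): nine-digit MMSI, finite coordinates."""
    mmsi, _, x, y = msg
    return 100_000_000 <= mmsi <= 999_999_999 and math.isfinite(x) and math.isfinite(y)


def build_voyages(messages):
    """Group valid messages into one trackline per vessel per UTC day."""
    voyages = defaultdict(list)
    for mmsi, ts, x, y in filter(is_valid, messages):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        voyages[(mmsi, day)].append((ts, x, y))
    for points in voyages.values():
        points.sort()  # order each trackline by timestamp
    return dict(voyages)


def heatmaps(voyages):
    """Return (traffic_volume, unique_vessels) dicts keyed by (col, row) grid cell."""
    traffic = defaultdict(int)
    vessels = defaultdict(set)
    for (mmsi, _day), points in voyages.items():
        # Set of grid cells in which this voyage has at least one position.
        cells = {(int(x // CELL_SIZE_M), int(y // CELL_SIZE_M)) for _, x, y in points}
        for cell in cells:
            traffic[cell] += 1       # each voyage counts once per cell
            vessels[cell].add(mmsi)  # distinct vessels seen in the cell
    unique = {cell: len(ids) for cell, ids in vessels.items()}
    return dict(traffic), unique
```

In this sketch, a vessel that crosses the same cell on two different days adds two to that cell's traffic volume but only one to its unique vessel count, which is the distinction between the two metrics.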
The previous state of the art, an ArcMap plugin, takes days to weeks to process raw AIS data for one month in one UTM zone. In comparison, processing time for this analysis, which included all US waters for all of 2015, was only 48 hours using our computing cluster.
We’ve since analyzed almost ten years of AIS data across all US waters, and will be focusing on Arctic AIS data from the Marine Exchange of Alaska in the upcoming year.
For more information, see the following resources:
 
Download the datasets and learn more about the analysis at the publicly available AIS Vessel Traffic Data Products website.
Watch a video that goes into more depth about the project's background.