Building a PaaS for Forensic DNA Analysis Using AWS

This project, funded by the Department of Justice (DoJ), built a Cloud-based system for analyzing high-volume genomic (DNA) data that could benefit law enforcement investigations involving biological evidence. The AWS services used include S3, EC2, and Lambda, together with Docker containers.
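The write-up does not say how uploaded files reach the Docker workers. One common AWS pattern, assumed here rather than taken from the project, is an S3 event notification that triggers a Lambda function, which then submits one containerized job per uploaded sequence file to AWS Batch, a service that runs Docker images on auto-scaling EC2 capacity. In this Python sketch the queue and job-definition names (`ngs-analysis-queue`, `ngs-str-caller`), the `NGS_JOB_QUEUE`/`NGS_JOB_DEFINITION` environment variables, the `INPUT_URI` variable passed to the container, and the FASTQ-only filter are all hypothetical.

```python
import os
import re
import urllib.parse
from typing import Any, Callable, Dict, List, Optional

# Hypothetical names: the project does not document its queue or job definition.
JOB_QUEUE = os.environ.get("NGS_JOB_QUEUE", "ngs-analysis-queue")
JOB_DEFINITION = os.environ.get("NGS_JOB_DEFINITION", "ngs-str-caller")
SEQUENCE_SUFFIXES = (".fastq", ".fastq.gz")  # assumed raw-data formats


def build_job_requests(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn an S3 ObjectCreated event into one AWS Batch job request per sequence file."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (e.g. spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(SEQUENCE_SUFFIXES):
            continue  # skip uploads that are not raw sequence files
        # Batch job names allow letters, digits, '-' and '_' (max 128 characters)
        # and must start with an alphanumeric character, hence the prefix.
        job_name = ("ngs-" + re.sub(r"[^A-Za-z0-9_-]", "-", key))[:128]
        requests.append({
            "jobName": job_name,
            "jobQueue": JOB_QUEUE,
            "jobDefinition": JOB_DEFINITION,
            # The container is assumed to read its input location from INPUT_URI.
            "containerOverrides": {
                "environment": [{"name": "INPUT_URI", "value": f"s3://{bucket}/{key}"}]
            },
        })
    return requests


def handler(event: Dict[str, Any], context: Any,
            submit: Optional[Callable[..., Any]] = None) -> Dict[str, int]:
    """Lambda entry point; `submit` can be injected to test without AWS credentials."""
    if submit is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        submit = boto3.client("batch").submit_job
    jobs = build_job_requests(event)
    for job in jobs:
        # With a managed compute environment, Batch scales EC2 capacity to queued jobs.
        submit(**job)
    return {"submitted": len(jobs)}  # informational only for S3-triggered invocations
```

Lambda invokes the function as `handler(event, context)`, so it falls back to the real Batch client; passing a stub for `submit` keeps the event parsing testable offline. Handing each file to its own job is what lets capacity grow with the number of uploads.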

Next-generation sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, but strategies for analyzing and managing the massive datasets it produces are still in development. Here, the computing, data storage, connectivity, and security resources of Amazon Web Services (AWS) were evaluated as a model for forensic laboratory systems that produce NGS data. A complete end-to-end Cloud system was developed to upload, process, and interpret raw NGS data through a web-browser dashboard. NGS results for forensic DNA markers were concordant with previously characterized standard reference materials. Computing capacity was provided through on-demand auto-scaling of Docker containers to support analysis of multiple files. The system was designed to store results in a relational database that follows the most recent nomenclature guidelines for sequenced alleles and supports downstream sample interpretation and databasing applications. Finally, a multi-layered Cloud security architecture was tested, showing that industry-standard controls for securing data and computing resources could be readily applied to the NGS system without adverse effects on bioinformatic analysis, connectivity, or data storage and retrieval. These results demonstrate the feasibility of using Cloud-based systems for secure NGS data analysis, storage, databasing, and multi-user distributed connectivity.
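The database schema itself is not described. As a hypothetical illustration of storing sequenced alleles relationally, the sketch below uses Python's built-in `sqlite3` module (a deployed system might instead use a managed database server). Each STR call keeps both the conventional length-based allele designation and the full repeat-region sequence, so isoalleles (alleles of equal length but different sequence) stay distinguishable. The table and column names and the helpers `open_db`, `add_sample`, and `isoalleles` are assumptions, not the project's design.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema; the project's actual tables are not described.
SCHEMA = """
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    source_file TEXT NOT NULL              -- S3 URI of the raw sequence file
);
CREATE TABLE allele_call (
    call_id       INTEGER PRIMARY KEY,
    sample_id     INTEGER NOT NULL REFERENCES sample(sample_id),
    locus         TEXT NOT NULL,           -- marker name, e.g. 'D8S1179'
    length_allele TEXT NOT NULL,           -- length-based designation, e.g. '13'
    repeat_seq    TEXT NOT NULL,           -- full repeat-region sequence
    read_count    INTEGER NOT NULL CHECK (read_count >= 0),
    UNIQUE (sample_id, locus, repeat_seq)
);
"""

Call = Tuple[str, str, str, int]  # (locus, length_allele, repeat_seq, read_count)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn: sqlite3.Connection, name: str, source_file: str,
               calls: Iterable[Call]) -> int:
    """Insert a sample and its allele calls atomically; returns the new sample_id."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute(
            "INSERT INTO sample (name, source_file) VALUES (?, ?)", (name, source_file))
        sample_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO allele_call (sample_id, locus, length_allele, repeat_seq, read_count)"
            " VALUES (?, ?, ?, ?, ?)",
            [(sample_id, *call) for call in calls],
        )
    return sample_id


def isoalleles(conn: sqlite3.Connection, locus: str) -> List[Tuple[str, int]]:
    """Length-based alleles at `locus` observed with more than one distinct sequence."""
    return conn.execute(
        """SELECT length_allele, COUNT(DISTINCT repeat_seq)
           FROM allele_call
           WHERE locus = ?
           GROUP BY length_allele
           HAVING COUNT(DISTINCT repeat_seq) > 1
           ORDER BY length_allele""",
        (locus,),
    ).fetchall()
```

For example, if two samples are both typed as allele 13 at a locus but their repeat sequences differ, `isoalleles` returns `[('13', 2)]` for that locus, a distinction a length-only table could not record.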