APPLYING SPOT INSTANCES AND DOCKER CONTAINERS TO OVERCOME COMPUTATIONAL LIMITATIONS IN USING TAXONOMIC IDENTIFICATION TOOLS TO METAGENOMIC PIPELINES

Published on 26/04/2022 - ISBN: 978-65-5941-645-5

Authors
  • Tania Girão Mangolini
  • Deyvid Amgarten
  • Diego Delgado Colombo
  • Michel Chieregato
  • Murilo Cervato
Modality
Xpress presentation
Subject area
Omics
Publishing Date
26/04/2022
Country of Publishing
Brazil
Language of Publishing
English
Paper Page
https://www.even3.com.br/anais/xmeetingxp2021/414418-applying-spot-instances-and-docker-containers-to-overcome-computational-limitations-in-using-taxonomic-identifica
ISBN
978-65-5941-645-5
Keywords
metagenomics, taxonomic identification, diversity of microorganisms
Summary
Bioinformatics pipelines for the analysis of metagenomic data comprise many steps, taxonomic identification being one of the core ones. Taxonomic identification tools usually rely on large pre-computed databases and high computational power to quickly analyze the diversity of microorganisms in samples. The CCMetagen tool, for instance, is among the most effective tools currently available. However, it requires 196 GB of disk space and more than 200 GB of RAM to search against a large subset of the NCBI nt database (ncbi_nt_no_env_11jun2019). Such elevated computational requirements are a serious limitation for many users and laboratories, and in this work we evaluate a viable alternative to overcome them. Specifically, we show that combining temporary virtual machines in the cloud with Docker containers is a very cost-effective solution to the problem. Briefly, cloud service providers offer their spare computing capacity at a discount, with the disadvantage that the service can be interrupted at any time; at Amazon AWS, this service is called spot instances. To assess performance requirements and costs, we evaluated the execution of a public CCMetagen container from Docker Hub (https://hub.docker.com/repository/docker/deyvidamgarten/ccmetagen) on Amazon Linux 2 R5 spot instances. The requested instance type was r5n.8xlarge (32 vCPUs and 256 GB of RAM) with a 500 GB EBS SSD volume. From 8/15/2021 to 9/7/2021, 20 tests using sample SRR7239370 (metatranscriptome of Tadorna tadornoides), available at the Sequence Read Archive (SRA), were carried out in two AWS regions, São Paulo (sa-east-1) and Virginia (us-east-1). The following parameters were assessed: obtained instance type, total processing time, hourly processing cost, database download time, and service interruptions. No service interruption occurred in any of the tests performed.
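The spot request described above can be sketched as keyword arguments for boto3's EC2 `run_instances` call. This is a minimal sketch under stated assumptions, not the authors' actual setup: the AMI ID is a placeholder, the `gp3` volume type and `/dev/xvda` root device name are assumptions, the helper name `spot_request_kwargs` is hypothetical, and the user-data script only installs Docker and pulls the `deyvidamgarten/ccmetagen` image, because the abstract does not give the CCMetagen command line. With no `MaxPrice` set, AWS caps the spot price at the on-demand price.

```python
# Sketch only: builds run_instances() arguments for a one-time spot instance.
# Assumptions (not from the abstract): placeholder AMI ID, gp3 root volume,
# /dev/xvda device name, and a bootstrap script that only pulls the image.

USER_DATA = """#!/bin/bash
# Hypothetical bootstrap script for Amazon Linux 2
yum install -y docker
systemctl start docker
docker pull deyvidamgarten/ccmetagen
"""


def spot_request_kwargs(ami_id: str,
                        instance_type: str = "r5n.8xlarge",
                        volume_gb: int = 500) -> dict:
    """Return keyword arguments for EC2 run_instances() requesting a spot instance."""
    return {
        "ImageId": ami_id,                 # hypothetical Amazon Linux 2 AMI ID
        "InstanceType": instance_type,     # r5n.8xlarge: 32 vCPUs, 256 GiB memory
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/xvda",     # root device name on Amazon Linux 2
            "Ebs": {
                "VolumeSize": volume_gb,   # GiB; must hold the ~196 GB database
                "VolumeType": "gp3",       # assumed SSD-backed volume type
                "DeleteOnTermination": True,
            },
        }],
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
        "UserData": USER_DATA,             # boto3 base64-encodes this automatically
    }


if __name__ == "__main__":
    kwargs = spot_request_kwargs("ami-PLACEHOLDER")
    print(kwargs["InstanceType"], kwargs["InstanceMarketOptions"]["MarketType"])
    # Launching for real requires AWS credentials and incurs cost:
    # import boto3
    # boto3.client("ec2", region_name="us-east-1").run_instances(**kwargs)
```

Keeping the request as plain data lets it be inspected without credentials; passing it to `run_instances(**kwargs)` on an EC2 client for the chosen region would actually launch (and bill) the instance.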
Moreover, Amazon provided instances matching the requested memory and vCPU specification only in the us-east-1 region. In the sa-east-1 region, 80% of the tests were executed on instances with more vCPUs and memory than requested, possibly due to instance availability. In us-east-1, the average processing time was 2.09 h (standard deviation 0.15 h), with an average hourly cost of $0.85 (standard deviation $0.13). The average discount compared with on-demand instances was 50.69%. The average database download time was 1.20 h (standard deviation 0.04 h). In sa-east-1, each test took 3.29 h on average (standard deviation 0.28 h, median 3.27 h). The average hourly processing cost was $1.74 (standard deviation $1.05, median $1.08), and the average discount was 59.35%. The average database download time was 2.38 h (standard deviation 0.11 h), approximately twice that observed in us-east-1. These results show that, if there are no legal restrictions on the geographic location where the data are processed, it is more advantageous to run tools such as CCMetagen in the us-east-1 region than in the sa-east-1 region. Despite the differences in costs and processing times, the proposed solution offers good cost-effectiveness and stability, making it a viable alternative for metagenomic analysis pipelines.
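The regional averages above can be combined into rough per-run figures. The sketch below is illustrative only: multiplying mean processing time by mean hourly cost approximates, but does not equal, the mean per-run cost; the abstract does not say whether download time is included in processing time; and the on-demand prices behind the reported discounts are not given, so `spot_discount_pct` only states the usual definition. All function names are hypothetical.

```python
# Rough per-run figures derived from the regional averages reported above.
# Caveat: mean(hours) * mean(price) only approximates mean(hours * price),
# and download time is kept separate because the abstract does not say
# whether it is part of processing time.

REPORTED = {
    # mean processing time (h), mean hourly cost (USD), mean DB download time (h)
    "us-east-1": {"hours": 2.09, "usd_per_hour": 0.85, "download_h": 1.20},
    "sa-east-1": {"hours": 3.29, "usd_per_hour": 1.74, "download_h": 2.38},
}


def approx_cost_per_run(region: str) -> float:
    """Approximate USD cost of one CCMetagen run (mean hours x mean hourly cost)."""
    r = REPORTED[region]
    return r["hours"] * r["usd_per_hour"]


def ratio(metric: str, a: str = "sa-east-1", b: str = "us-east-1") -> float:
    """Ratio a/b of one reported metric between two regions."""
    return REPORTED[a][metric] / REPORTED[b][metric]


def spot_discount_pct(spot_usd: float, on_demand_usd: float) -> float:
    """Discount of a spot price relative to the on-demand price, in percent."""
    return 100.0 * (1.0 - spot_usd / on_demand_usd)


if __name__ == "__main__":
    for region in REPORTED:
        print(f"{region}: ~${approx_cost_per_run(region):.2f} per run")
    print(f"download time ratio (sa/us): {ratio('download_h'):.2f}")
    print(f"processing time ratio (sa/us): {ratio('hours'):.2f}")
```

With these inputs, a run comes to roughly $1.8 in us-east-1 versus roughly $5.7 in sa-east-1, and the download-time ratio is about 1.98, consistent with the "approximately twice" statement. Because the sa-east-1 hourly cost is right-skewed (mean $1.74, median $1.08), a median-based estimate (3.27 h × $1.08 ≈ $3.5) may better represent a typical run there.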
Title of the Event
X-Meeting XPerience 2021
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Even3
Means of Dissemination
Digital medium

How to cite

MANGOLINI, Tania Girão et al. APPLYING SPOT INSTANCES AND DOCKER CONTAINERS TO OVERCOME COMPUTATIONAL LIMITATIONS IN USING TAXONOMIC IDENTIFICATION TOOLS TO METAGENOMIC PIPELINES. In: X-Meeting presentations. Anais... São Paulo (SP): AB3C, 2021. Available at: https://www.even3.com.br/anais/xmeetingxp2021/414418-APPLYING-SPOT-INSTANCES-AND-DOCKER-CONTAINERS-TO-OVERCOME-COMPUTATIONAL-LIMITATIONS-IN-USING-TAXONOMIC-IDENTIFICA. Accessed on: 22/12/2024
