Description
Building and operating scalable, cost-efficient pipelines is an essential skill for practitioners in modern bioinformatics. Students package workflows in containers and deploy them on managed cloud services with object storage and elastic compute. In a command-line environment, students accelerate input and output through columnar data layouts, partitioning, and predicate filtering; apply appropriate access controls and private networking; and automate builds, execution, and monitoring with logs and metrics. Working end to end on real datasets, they publish performance-tuned outputs ready for downstream analysis and visualization.

Registration in this course is restricted to students admitted to the Data Visualization in Biological Sciences program.