BINF 3005
Data Preparation with Python & Bash
Lecture Hours: 6.0
Seminar Hours: 0.0
Lab Hours: 6.0
Credits: 1.0
Regular Studies
Description
Building and operating scalable, cost-efficient pipelines is an essential skill for practitioners in modern bioinformatics. Students package workflows in containers and deploy them on managed cloud services with object storage and elastic compute. In a command-line environment, students speed up data input and output (I/O) through columnar data layouts, partitioning, and predicate filtering; apply appropriate access controls and private networking; and automate builds, execution, and monitoring with logs and metrics. Working end to end on real datasets, they publish performance-tuned outputs ready for downstream analysis and visualization.
Registration in this course is restricted to students admitted to the Data Visualization in Biological Sciences program.