Overview
Make genomics data actionable: analyze and interpret data generated by modern genomics technologies using open-source software, big-data analytics, and machine learning services on Azure.
Genomics Data Science VM
The Data Science Virtual Machine for Linux or Windows is a virtual machine image that makes it easy to get started with deep learning on Azure. The Microsoft Cognitive Toolkit, TensorFlow, MXNet, Caffe, Caffe2, Chainer, NVIDIA DIGITS, Deep Water, Keras, Theano, Torch, and PyTorch are built, installed, and configured so they are ready to run immediately. The NVIDIA driver, CUDA 10, and cuDNN 7 are also included. All frameworks are the GPU versions but also run on the CPU. Many sample Jupyter notebooks are included. TensorFlow Serving, MXNet Model Server, and TensorRT are included for testing inference.
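Because the frameworks are GPU builds that also run on the CPU, it can help to check which case applies before starting a long job. The following is a minimal sketch, not part of the VM image: it assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH when a GPU is present, and the helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU.

    Assumption: the NVIDIA driver installs nvidia-smi; if the tool is
    missing or fails, we treat the machine as CPU-only.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if gpu_visible():
        print("GPU visible: frameworks can use it")
    else:
        print("No GPU visible: frameworks will run on the CPU")
```

Each framework also has its own device query, which is the more reliable check once a specific framework is chosen.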
Genomics Notebooks
Jupyter notebooks are a useful tool for data scientists working on genomics data analysis. This repo demonstrates the use of Azure Notebooks for genomics data analysis with GATK, Picard, Bioconductor, and Python libraries.
How to use the ‘genomicsnotebook’ repo in GitHub Codespaces. For more information about Codespaces, please visit the product page.
Bioconductor on Azure
Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. It uses the R statistical programming language and is both open source and openly developed. It has two releases each year and an active user community. We now offer a mirror of the official Bioconductor Docker image on Microsoft Container Registry. This image can also be used as a base for your own custom genomics-related Docker images.
Bioconductor on Azure notebook example
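To illustrate using the mirrored image as a base, here is a minimal sketch that renders a Dockerfile extending it with extra Bioconductor packages. The function name `render_dockerfile` is hypothetical, and the image reference in the usage example is an assumption that should be checked against the Microsoft Container Registry listing before use. The sketch also assumes the base image provides R with `BiocManager` installed, as the official Bioconductor image does.

```python
from typing import Sequence


def render_dockerfile(base_image: str, packages: Sequence[str]) -> str:
    """Render a Dockerfile that extends base_image with Bioconductor packages.

    Assumption: the base image ships R and BiocManager.
    """
    lines = [f"FROM {base_image}"]
    if packages:
        quoted = ", ".join(f'"{name}"' for name in packages)
        # ask = FALSE keeps the install non-interactive during the build.
        lines.append(
            f"RUN R -e 'BiocManager::install(c({quoted}), ask = FALSE)'"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed image reference; verify the exact name and tag before use.
    image = "mcr.microsoft.com/bioconductor/bioconductor_docker:latest"
    print(render_dockerfile(image, ["DESeq2", "GenomicRanges"]))
```

Saving the printed text as `Dockerfile` and running `docker build` in that directory would produce a custom image; pinning a release tag instead of `latest` makes the build reproducible.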
OpenCRAVAT on Azure
OpenCRAVAT is a Python package that performs genomic variant interpretation including variant impact, annotation, and scoring. OpenCRAVAT has a modular architecture with a wide variety of analysis modules and annotation resources that can be selected and installed/run based on the needs of a given study.
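The modular design means a run is largely defined by which annotator modules are selected. As a rough sketch, the helper below assembles an `oc run` command line for a chosen set of annotators. The helper name `build_oc_run_command`, the input file name, and the `clinvar` module choice are illustrative assumptions; the `-l` (genome assembly) and `-a` (annotators) options reflect the OpenCRAVAT command line as we understand it and should be confirmed with `oc run --help` for the installed version.

```python
import shutil
from typing import List, Sequence


def build_oc_run_command(
    input_path: str,
    annotators: Sequence[str] = (),
    genome: str = "hg38",
) -> List[str]:
    """Build the argument list for an OpenCRAVAT run.

    Assumption: "-l" sets the genome assembly and "-a" lists annotator
    modules that are already installed.
    """
    cmd = ["oc", "run", input_path, "-l", genome]
    if annotators:
        cmd += ["-a", *annotators]
    return cmd


if __name__ == "__main__":
    # Hypothetical input file and module selection.
    cmd = build_oc_run_command("example_input.vcf", annotators=["clinvar"])
    print(" ".join(cmd))
    if shutil.which("oc") is None:
        print("OpenCRAVAT CLI not found on PATH; install it before running.")
```

The sketch only prints the command, so it is safe to run on a machine without OpenCRAVAT; passing the list to `subprocess.run` would execute it once the required modules are installed.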