AI_ML_DL’s diary

A diary about artificial intelligence, machine learning, and deep learning

Deep learning in bioinformatics

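I am preparing for the Bioinformatics Engineer Certification Exam by studying the society-endorsed textbook I bought, but the content feels somewhat dated. I assumed this was because the textbook was published in 2015, but last year's exam questions gave me the same impression.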

To get a sense of where the field currently stands, I searched the literature and found this paper.

Deep learning in bioinformatics: introduction, application, and perspective in big data era

 Yu Li, KAUST CBRC CEMSE, Chao Huang, NICT CAS, Lizhong Ding, IIAI, Zhongxiao Li, KAUST CBRC CEMSE, Yijie Pan, NICT CAS, Xin Gao,∗ KAUST CBRC CEMSE

arXiv:1903.00342v1 [q-bio.QM] 28 Feb 2019

Abstract
Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in vast majorities of analysis pipelines. In this review, we provide both the exoteric introduction of deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable to use deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoder, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all the four kinds of data type, with the implementation written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples.

The paper outlines the following eight examples.

I will work through them in this order.

3.1 Identifying enzymes using multi-layer neural networks

3.2 Gene expression regression

3.3 RNA-protein binding sites prediction with CNN

3.4 DNA sequence function prediction with CNN and RNN

3.5 Biomedical image classification using transfer learning and ResNet

3.6 Graph embedding for novel protein interaction prediction using GCN

3.7 Biology image super-resolution using GAN

3.8 High dimensional biological data embedding and generation with VAE

 

[Image: style 014]

 

[Image: DeepDream]