Bibliography
[1] F.H.C. Crick. On protein synthesis. Symp. Soc. Exp. Biol, XII:139–163, 1956. http://profiles.nlm.nih.gov/SC/B/B/F/T/_/scbbft.pdf.
[2] F. Crick. Central dogma of molecular biology. Nature, 227(5258):561–563, Aug 1970.
[3] David J Lipman and William R Pearson. Rapid and sensitive protein similarity searches. Science, 227(4693):1435–1441, 1985.
[4] Brent Ewing and Phil Green. Base-calling of automated sequencer traces using phred. ii. error probabilities. Genome research, 8(3):186–194, 1998.
[5] Michael J Guertin and John T Lis. Chromatin landscape dictates hsf binding to target dna elements. PLoS Genet, 6(9):e1001114, 2010.
[6] T Tatusova, M DiCuccio, A Badretdin, V Chetvernin, S Ciufo, and W Li. The ncbi handbook. 2013.
[7] Dennis A Benson, Ilene Karsch-Mizrachi, Karen Clark, David J Lipman, James Ostell, and Eric W Sayers. GenBank. Nucleic acids research, 1:6, 2011.
[8] Kim D Pruitt, Tatiana Tatusova, and Donna R Maglott. Ncbi reference sequences (refseq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research, 35(suppl 1):D61–D65, 2007.
[9] Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977.
[10] Michael Burrows and David J. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, May 1994.
[11] Andrew D Johnson. An extended iupac nomenclature code for polymorphic nucleic acids. Bioinformatics, 26(10):1386–1389, 2010.
[12] Thomas W Burke and James T Kadonaga. Drosophila tfiid binds to a conserved downstream basal promoter element that is present in many tata-box-deficient promoters. Genes & development, 10(6):711–724, 1996.
[13] John C Wootton and Scott Federhen. Analysis of compositionally biased regions in sequence databases. Methods in enzymology, 266:554–571, 1996.
[14] G. D. Stormo, T. D. Schneider, L. Gold, and A. Ehrenfeucht. Use of the ’Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res., 10(9):2997–3011, May 1982.
[15] Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3–55, 2001.
[16] Solomon Kullback and Richard A Leibler. On information and sufficiency. The annals of mathematical statistics, 22(1):79–86, 1951.
[17] Peter JA Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11):1422–1423, 2009.
[18] Anthony Mathelier, Xiaobei Zhao, Allen W Zhang, François Parcy, Rebecca Worsley-Hunt, David J Arenillas, Sorana Buchman, Chih-yu Chen, Alice Chou, Hans Ienasescu, et al. Jaspar 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic acids research, page gkt997, 2013.
[19] Gavin E Crooks, Gary Hon, John-Marc Chandonia, and Steven E Brenner. Weblogo: a sequence logo generator. Genome research, 14(6):1188–1190, 2004.
[20] Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on pattern analysis and machine intelligence, (6):721–741, 1984.
[21] Timothy L Bailey, Charles Elkan, et al. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. 1994.
[22] Saul B Needleman and Christian D Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 48(3):443–453, 1970.
[23] Temple F Smith, Michael S Waterman, et al. Identification of common molecular subsequences. Journal of molecular biology, 147(1):195–197, 1981.
[24] Humberto Carrillo and David Lipman. The multiple sequence alignment problem in biology. SIAM Journal on Applied Mathematics, 48(5):1073–1082, 1988.
[25] David J Lipman, Stephen F Altschul, and John D Kececioglu. A tool for multiple sequence alignment. Proceedings of the National Academy of Sciences, 86(12):4412–4415, 1989.
[26] Florence Corpet. Multiple sequence alignment with hierarchical clustering. Nucleic acids research, 16(22):10881–10890, 1988.
[27] Cédric Notredame, Desmond G Higgins, and Jaap Heringa. T-coffee: A novel method for fast and accurate multiple sequence alignment. Journal of molecular biology, 302(1):205–217, 2000.
[28] Julie D Thompson, Toby J Gibson, and Des G Higgins. Multiple sequence alignment using clustalw and clustalx. Current protocols in bioinformatics, (1):2–3, 2003.
[29] Sing-Hoi Sze, Yue Lu, and Qingwu Yang. A polynomial time solvable formulation of multiple sequence alignment. Journal of computational biology, 13(2):309–319, 2006.
[30] Michael Brudno, Rasmus Steinkamp, and Burkhard Morgenstern. The chaos/dialign www server for multiple alignment of genomic sequences. Nucleic acids research, 32(suppl 2):W41–W44, 2004.
[31] Robert C Edgar. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 32(5):1792–1797, 2004.
[32] Michael Brudno, Chuong B Do, Gregory M Cooper, Michael F Kim, Eugene Davydov, Eric D Green, Arend Sidow, Serafim Batzoglou, NISC Comparative Sequencing Program, et al. Lagan and multilagan: efficient tools for large-scale multiple alignment of genomic dna. Genome research, 13(4):721–731, 2003.
[33] Mathieu Blanchette, W James Kent, Cathy Riemer, Laura Elnitski, Arian FA Smit, Krishna M Roskin, Robert Baertsch, Kate Rosenbloom, Hiram Clawson, Eric D Green, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome research, 14(4):708–715, 2004.
[34] Francesca Chiaromonte, VB Yap, and W Miller. Scoring pairwise genomic sequence alignments. In Biocomputing 2002, pages 115–126. World Scientific, 2001.
[35] Arthur L Delcher, Simon Kasif, Robert D Fleischmann, Jeremy Peterson, Owen White, and Steven L Salzberg. Alignment of whole genomes. Nucleic acids research, 27(11):2369–2376, 1999.
[36] Arthur L Delcher, Adam Phillippy, Jane Carlton, and Steven L Salzberg. Fast algorithms for large-scale genome alignment and comparison. Nucleic acids research, 30(11):2478–2483, 2002.
[37] Ramu Chenna, Hideaki Sugawara, Tadashi Koike, Rodrigo Lopez, Toby J Gibson, Desmond G Higgins, and Julie D Thompson. Multiple sequence alignment with the clustal series of programs. Nucleic acids research, 31(13):3497–3500, 2003.
[38] Fabian Sievers, Andreas Wilm, David Dineen, Toby J Gibson, Kevin Karplus, Weizhong Li, Rodrigo Lopez, Hamish McWilliam, Michael Remmert, Johannes Söding, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega. Molecular systems biology, 7(1):539, 2011.
[39] Thomas H Jukes, Charles R Cantor, et al. Evolution of protein molecules. Mammalian protein metabolism, 3(21):132, 1969.
[40] Motoo Kimura. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of molecular evolution, 16(2):111–120, 1980.
[41] Joseph Felsenstein. Evolutionary trees from dna sequences: a maximum likelihood approach. Journal of molecular evolution, 17(6):368–376, 1981.
[42] Masami Hasegawa, Hirohisa Kishino, and Taka-aki Yano. Dating of the human-ape splitting by a molecular clock of mitochondrial dna. Journal of molecular evolution, 22(2):160–174, 1985.
[43] Simon Tavaré. Some probabilistic and statistical problems in the analysis of dna sequences. Lectures on mathematics in the life sciences, 17(2):57–86, 1986.
[44] Robert R Sokal and Charles D Michener. A statistical method for evaluating systematic relationships. University of Kansas science bulletin, 28:1409–1438, 1958.
[45] Naruya Saitou and Masatoshi Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, 4(4):406–425, 1987.
[46] James S Farris. Methods for computing wagner trees. Systematic Biology, 19(1):83–92, 1970.
[47] Walter M Fitch. Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Biology, 20(4):406–416, 1971.
[48] Ziheng Yang, Sudhir Kumar, and Masatoshi Nei. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics, 141(4):1641–1650, 1995.
[49] Jeffrey M Koshi and Richard A Goldstein. Probabilistic reconstruction of ancestral protein sequences. Journal of molecular evolution, 42(2):313–320, 1996.
[50] Joseph Felsenstein. Inferring phylogenies, volume 2. Sinauer Associates, Sunderland, MA, 2004.
[51] Benjamin L Allen and Mike Steel. Subtree transfer operations and their induced metrics on evolutionary trees. Annals of combinatorics, 5(1):1–15, 2001.
[52] David L Swofford. Phylogeny reconstruction. Molecular systematics, pages 411–501, 1990.
[53] Magnus Bordewich and Charles Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of combinatorics, 8(4):409–423, 2005.
[54] Manfred G Grabherr, Brian J Haas, Moran Yassour, Joshua Z Levin, Dawn A Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, et al. Full-length transcriptome assembly from rna-seq data without a reference genome. Nature biotechnology, 29(7):644, 2011.
[55] Serena Liu and Cole Trapnell. Single-cell transcriptome sequencing: recent advances and remaining challenges. F1000Research, 5, 2016.
[56] Philip Brennecke, Simon Anders, Jong Kyoung Kim, Aleksandra A Kolodziejczyk, Xiuwei Zhang, Valentina Proserpio, Bianka Baying, Vladimir Benes, Sarah A Teichmann, John C Marioni, et al. Accounting for technical noise in single-cell rna-seq experiments. Nature methods, 10(11):1093, 2013.
[57] Julia Zeitlinger, Alexander Stark, Manolis Kellis, Joung-Woo Hong, Sergei Nechaev, Karen Adelman, Michael Levine, and Richard A Young. Rna polymerase stalling at developmental control genes in the drosophila melanogaster embryo. Nature genetics, 39(12):1512, 2007.
[58] Ginger W Muse, Daniel A Gilchrist, Sergei Nechaev, Ruchir Shah, Joel S Parker, Sherry F Grissom, Julia Zeitlinger, and Karen Adelman. Rna polymerase is poised for activation across the genome. Nature genetics, 39(12):1507, 2007.
[59] I. Jonkers and J. T. Lis. Getting up to speed with transcription elongation by RNA polymerase II. Nat. Rev. Mol. Cell Biol., 16(3):167–177, Mar 2015.
[60] Charles G Danko, Stephanie L Hyland, Leighton J Core, Andre L Martins, Colin T Waters, Hyung Won Lee, Vivian G Cheung, W Lee Kraus, John T Lis, and Adam Siepel. Identification of active transcriptional regulatory elements from gro-seq data. Nature methods, 12(5):433, 2015.
[61] Amy C Seila, J Mauro Calabrese, Stuart S Levine, Gene W Yeo, Peter B Rahl, Ryan A Flynn, Richard A Young, and Phillip A Sharp. Divergent transcription from active promoters. Science, 322(5909):1849–1851, 2008.
[62] Jennifer E Kurasz, Christine E Hartman, David J Samuels, Bijoy K Mohanty, Anquilla Deleveaux, Jan Mrázek, and Anna C Karls. Genotoxic, metabolic, and oxidative stresses regulate the rna repair operon of salmonella enterica serovar typhimurium. Journal of bacteriology, 200(23):e00476–18, 2018.
[63] Matthew K Iyer, Yashar S Niknafs, Rohit Malik, Udit Singhal, Anirban Sahu, Yasuyuki Hosono, Terrence R Barrette, John R Prensner, Joseph R Evans, Shuang Zhao, et al. The landscape of long noncoding rnas in the human transcriptome. Nature genetics, 47(3):199, 2015.
[64] Victor Ambros. The functions of animal micrornas. Nature, 431(7006):350, 2004.
[65] Alexei A Aravin, Gregory J Hannon, and Julius Brennecke. The piwi-pirna pathway provides an adaptive defense in the transposon arms race. Science, 318(5851):761–764, 2007.
[66] Anita G Seto, Robert E Kingston, and Nelson C Lau. The coming of age for piwi proteins. Molecular cell, 26(5):603–609, 2007.
[67] Julius Brennecke, Alexei A Aravin, Alexander Stark, Monica Dus, Manolis Kellis, Ravi Sachidanandam, and Gregory J Hannon. Discrete small rna-generating loci as master regulators of transposon activity in drosophila. Cell, 128(6):1089–1103, 2007.
[68] Naoki Sugimoto, Shu-ichi Nakano, Misa Katoh, Akiko Matsumura, Hiroyuki Nakamuta, Tatsuo Ohmichi, Mari Yoneyama, and Muneo Sasaki. Thermodynamic parameters to predict stability of rna/dna hybrid duplexes. Biochemistry, 34(35):11211–11216, 1995.
[69] Julian L Huppert. Thermodynamic prediction of rna–dna duplex-forming regions in the human genome. Molecular BioSystems, 4(6):686–691, 2008.
[70] Fabian A Buske, Denis C Bauer, John S Mattick, and Timothy L Bailey. Triplexator: detecting nucleic acid triple helices in genomic and transcriptomic data. Genome research, 22(7):1372–1381, 2012.
[71] M. F. Lin, I. Jungreis, and M. Kellis. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics, 27(13):i275–282, Jul 2011.
[72] L. Wang, H. J. Park, S. Dasari, S. Wang, J. P. Kocher, and W. Li. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res., 41(6):e74, Apr 2013.
[73] M. Guttman, P. Russell, N. T. Ingolia, J. S. Weissman, and E. S. Lander. Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell, 154(1):240–251, Jul 2013.
[74] A. A. Bazzini, T. G. Johnstone, R. Christiano, S. D. Mackowiak, B. Obermayer, E. S. Fleming, C. E. Vejnar, M. T. Lee, N. Rajewsky, T. C. Walther, and A. J. Giraldez. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J., 33(9):981–993, May 2014.
[75] Ruth Nussinov, George Pieczenik, Jerrold R Griggs, and Daniel J Kleitman. Algorithms for loop matchings. SIAM Journal on Applied mathematics, 35(1):68–82, 1978.
[76] David H Mathews, Jeffrey Sabina, Michael Zuker, and Douglas H Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of rna secondary structure. Journal of molecular biology, 288(5):911–940, 1999.
[77] Rahul Tyagi and David H Mathews. Predicting helical coaxial stacking in rna multibranch loops. RNA, 13(7):939–951, 2007.
[78] Padideh Danaee, Mason Rouches, Michelle Wiley, Dezhong Deng, Liang Huang, and David Hendrix. bprna: large-scale automated annotation and analysis of rna secondary structure. Nucleic acids research, 46(11):5381–5394, 2018.
[79] Ronny Lorenz, Stephan H Bernhart, Christian Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F Stadler, and Ivo L Hofacker. Viennarna package 2.0. Algorithms for molecular biology, 6(1):26, 2011.
[80] MO Dayhoff, R Schwartz, and BC Orcutt. A model of evolutionary change in proteins. In Atlas of protein sequence and structure, volume 5, pages 345–352. National Biomedical Research Foundation, Silver Spring, MD, 1978.
[81] Steven Henikoff and Jorja G Henikoff. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, 89(22):10915–10919, 1992.
[82] Samuel Karlin and Stephen F Altschul. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences, 87(6):2264–2268, 1990.
[83] Arlin Stoltzfus. On the possibility of constructive neutral evolution. Journal of Molecular Evolution, 49(2):169–181, 1999.
[84] Allan Force, Michael Lynch, F Bryan Pickett, Angel Amores, Yi-lin Yan, and John Postlethwait. Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151(4):1531–1545, 1999.
[85] Donald B Wetlaufer. Nucleation, rapid folding, and globular intrachain regions in proteins. Proceedings of the National Academy of Sciences, 70(3):697–701, 1973.
[86] S. El-Gebali, J. Mistry, A. Bateman, S. R. Eddy, A. Luciani, S. C. Potter, M. Qureshi, L. J. Richardson, G. A. Salazar, A. Smart, E. L. L. Sonnhammer, L. Hirsh, L. Paladin, D. Piovesan, S. C. E. Tosatto, and R. D. Finn. The Pfam protein families database in 2019. Nucleic Acids Res., 47(D1):D427–D432, Jan 2019.
[87] J. A. Cuff and G. J. Barton. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 40(3):502–511, Aug 2000.
[88] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25(1):25–29, May 2000.
[89] Charles W Dunnett. A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 50(272):1096–1121, 1955.
[90] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995.
[91] Steven T Runyon, Yingnan Zhang, Brent A Appleton, Stephen L Sazinsky, Ping Wu, Borlan Pan, Christian Wiesmann, Nicholas J Skelton, and Sachdev S Sidhu. Structural and functional analysis of the pdz domains of human htra1 and htra3. Protein Science, 16(11):2454–2471, 2007.
[92] David S Johnson, Ali Mortazavi, Richard M Myers, and Barbara Wold. Genome-wide mapping of in vivo protein-dna interactions. Science, 316(5830):1497–1502, 2007.
[93] Raja Jothi, Suresh Cuddapah, Artem Barski, Kairong Cui, and Keji Zhao. Genome-wide identification of in vivo protein–dna binding sites from chip-seq data. Nucleic acids research, 36(16):5221, 2008.
[94] Dominic Schmidt, Michael D Wilson, Christiana Spyrou, Gordon D Brown, James Hadfield, and Duncan T Odom. Chip-seq: using high-throughput sequencing to discover protein–dna interactions. Methods, 48(3):240–248, 2009.
[95] Ben Langmead. Aligning short sequencing reads with bowtie. Current protocols in bioinformatics, 32(1):11–7, 2010.
[96] Joel Rozowsky, Ghia Euskirchen, Raymond K Auerbach, Zhengdong D Zhang, Theodore Gibson, Robert Bjornson, Nicholas Carriero, Michael Snyder, and Mark B Gerstein. Peakseq enables systematic scoring of chip-seq experiments relative to controls. Nature biotechnology, 27(1):66, 2009.
[97] Yong Zhang, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, Richard M Myers, Myles Brown, Wei Li, et al. Model-based analysis of chip-seq (macs). Genome biology, 9(9):R137, 2008.
[98] Shiliyang Xu, Sean Grullon, Kai Ge, and Weiqun Peng. Spatial clustering for identification of chip-enriched regions (sicer) to map regions of histone methylation patterns in embryonic stem cells. In Stem Cell Transcriptional Networks, pages 97–111. Springer, 2014.
[99] Jason Ernst and Manolis Kellis. Chromhmm: automating chromatin-state discovery and characterization. Nature methods, 9(3):215, 2012.
[100] Peter M Clark, Phillipe Loher, Kevin Quann, Jonathan Brody, Eric R Londin, and Isidore Rigoutsos. Argonaute clip-seq reveals mirna targetome diversity across tissue types. Scientific reports, 4:5947, 2014.
[101] Xu Zhou and Erin K O’Shea. Integrated approaches reveal determinants of genome-wide binding and function of the transcription factor pho4. Molecular cell, 42(6):826–836, 2011.