Preface to the fifth edition xv // Plan of the book xxi // Introduction to bioinformatics on the web xxii // Acknowledgements xxiii // 1 INTRODUCTION 1 // Life in space and time 4 // phenotype = genotype + environment + life history + epigenetics 4 // Evolution is the change over time in the world of living things 5 // Biological classification and nomenclature 6 // Dogmas: central and peripheral 9 // The structure of DNA 9 // Transcription and translation 12 // The structures of proteins 12 // Statics and dynamics 17 // Systems biology 17 // The human genome 19 // Variation in human genome sequences 20 // The human genome and medicine 21 // Databases in molecular biology 28 // Observables and data archives 29 // A database without effective modes of access is merely a data graveyard 29 // Information flow in bioinformatics 31 // Curation, annotation, and quality control 32 // The World Wide Web 33 // Electronic publication 34 // Computers and computer science 34 // Programming 35 // Après moi, le déluge? Sorry—too late! 38 // How much sequencing power is there in the world? 41 // How does the amount of data in bioinformatics compare with other large scientific information archives? 41 // Recommended reading 42 // Exercises and Problems 43 // 2 FROM GENETICS TO GENOMES 48 // The classical genetics background 49 // DNA embodies genes 50 // Maps and tour guides // Linkage maps // Linkage // Chromosome banding // High-resolution maps, based directly on DNA sequences // Restriction maps // DNA sequencing // Frederick Sanger and the development of DNA sequencing // DNA sequencing by termination of chain replication // Automation of DNA sequencing // Next-generation sequencing // Paired-end reads // Life in the fast lane // Assembly—computational aspects // Pattern matching // Suffix trees // Fragment assembly // Genomics in personal identification // DNA ‘fingerprinting’ //
Personal identification by amplification of specific regions has superseded the RFLP approach // Mitochondrial DNA // Analysis of non-human DNA sequences // Parentage testing // Ethical, legal, and social issues // Databases containing human DNA sequence information // Use of DNA sequencing in research on human subjects // Recommended reading // Exercises and Problems // 3 THE PANORAMA OF LIFE // Genomes, transcriptomes, and proteomes // Genes // Proteomics and transcriptomics // Eavesdropping on the transmission of genetic information // Genome-sequencing projects // Genomes of prokaryotes // The genome of the bacterium Escherichia coli // The genome of the archaeon Methanocaldococcus jannaschii // The genome of one of the simplest organisms: Mycoplasma genitalium // Metagenomics: the collection of genomes in a coherent environmental sample // The human microbiome // Genomes of eukarya // Gene families // The genome of Saccharomyces cerevisiae (Baker’s yeast) // The genome of Caenorhabditis elegans // The genome of Drosophila melanogaster // The genome of Arabidopsis thaliana // The genome of Homo sapiens (the human genome) // Protein-coding genes // Repeat sequences // RNA // Single-nucleotide polymorphisms and haplotypes // Systematic measurements and collections of single-nucleotide polymorphisms // Genetic diversity in anthropology // DNA sequences and languages // Evolution of genomes // Please pass the genes: horizontal gene transfer // Comparative genomics of eukarya // Recommended reading // Exercises and Problems // 4 ALIGNMENTS AND PHYLOGENETIC TREES 123 // Introduction to sequence alignment 124 // Dotplots and sequence alignments 130 // Measures of sequence similarity 132 // Scoring schemes 132 // Derivation of substitution matrices: PAM matrices 133 // Computing the alignment of two sequences 135 // Variations and generalizations 135 // Approximate methods for quick screening of databases 135 //
The dynamic programming algorithm for optimal pairwise sequence alignment 137 // Significance of alignments 141 // Multiple sequence alignment 143 // Applications of multiple sequence alignments to database searching 143 // Profiles 146 // PSI-BLAST 147 // Complete pairwise sequence alignment of human PAX-6 protein and Drosophila melanogaster eyeless 151 // Hidden Markov Models 152 // Phylogeny 154 // Determination of taxonomic relationships from molecular properties 155 // Use of sequences to determine phylogenetic relationships 159 // Use of SINES and LINES to derive phylogenetic relationships 161 // Phylogenetic trees 162 // Clustering methods 164 // The maximum-likelihood method 165 // Reconstruction of ancestral sequences 165 // Pyruvate decarboxylase: synthesis, activity, and crystal structure of predicted ancestor 167 // The problem of varying rates of evolution 168 // Bayesian methods 169 // Are trees the correct way to present phylogenetic relationships? 169 // Computational considerations 170 // Putting it all together 171 // Recommended reading 171 // Exercises and Problems 172 // 5 STRUCTURAL BIOINFORMATICS AND DRUG DISCOVERY 177 // Introduction 178 // Protein stability and folding 180 // The Sasisekharan-Ramakrishnan-Ramachandran plot describes allowed mainchain conformations 180 // The sidechains 181 // Protein stability and denaturation 183 // Protein folding as a process 185 // Applications of hydrophobicity 187 // Coiled-coil proteins 187 // Description of the variety of protein structures 190 // Superposition of structures, and structural alignments 192 // Evolution of protein structures 197 // Classifications of protein structures 199 // SCOP 199 // Protein structure prediction and modelling 201 // A priori and empirical methods 202 // Critical Assessment of Structure Prediction 203 // Secondary structure prediction 204 //
Homology modelling 205 // Fold recognition 205 // Conformational energy calculations and molecular dynamics 207 // ROSETTA 209 // Protein structure prediction from contact maps derived from correlated mutations in multiple sequence alignments 210 // Design of novel proteins 213 // Drug discovery and development 215 // The lead compound 216 // Improving on the lead compound: quantitative structure-activity relationships 217 // Bioinformatics in drug discovery and development 218 // Molecular modelling in drug discovery 219 // Recommended reading 225 // Exercises and Problems 228 // 6 SCIENTIFIC PUBLICATIONS AND ARCHIVES: MEDIA, CONTENT, ACCESS, AND PRESENTATION 233 // The scientific literature 234 // Access to scholarly publications 235 // Open access 236 // The Public Library of Science 237 // Traditional and digital libraries 237 // How to populate a digital library 238 // The information explosion 239 // The web: higher dimensions 239 // New media: video, sound 240 // Searching the scientific literature 240 // Bibliography management 241 // Databases 242 // Database contents 242 // Database quality control 243 // The literature as a database 244 // Database organization 244 // Annotation 246 // Markup languages 248 // Database access 250 // Links 250 // Database interoperability 251 // Data mining 251 // Programming languages and tools for database construction and access 255 // Traditional programming languages 255 // Scripting languages 256 // Program libraries specialized for molecular biology 256 // Java—computing over the web 256 // Natural language processing 257 // Natural language processing in mining the biomedical literature 258 // Biomedical applications of text mining 260 // Hypothesis generation 264 // A glaucoma-related network derived by text mining 265 // Recommended reading 268 // Exercises and Problems 269 //
7 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING 271 // What are artificial intelligence and machine learning? 272 // Classification and clustering 273 // Binary classifier 276 // Receiver Operating Characteristic (ROC) curves 277 // Artificial neural networks 279 // Self-organizing maps 281 // Decision trees 281 // Support vector machines (SVMs) 286 // Kernel methods 286 // Clustering 287 // Clustering by graph spectral theory 291 // Recommended reading 293 // Exercises and Problems 293 // 8 INTRODUCTION TO SYSTEMS BIOLOGY 296 // Introduction 297 // Networks and graphs 298 // Connectivity in networks 299 // Dynamics, stability, and robustness 301 // Some sources of ideas for systems biology 302 // Complexity of sequences 302 // Shannon’s definition of entropy 303 // Complexity of sequences 304 // The relationship between complexity, randomness, and compressibility 305 // The Burrows-Wheeler Transform 305 // Inverting the Burrows-Wheeler Transform 306 // The Burrows-Wheeler Transform brings repeats together, facilitating compression 306 // Use of the Burrows-Wheeler transform for searching for patterns in strings 306 // Complexity of other types of biological data 308 // Static and dynamic complexity 308 // Predictability and chaos 309 // Analysis and comparison of networks 310 // Analysis of graphs by matrix algebra 311 // Graph isomorphism 312 // Recommended reading 314 // Exercises and Problems 314 // 9 METABOLIC PATHWAYS 317 // Introduction 318 // Classification of protein function 320 // The Enzyme Commission 320 // The Gene Ontology™ Consortium protein function classification 320 // Prediction of protein function 321 // Catalysis by enzymes 324 // Active sites 325 // Cofactors 325 // Protein-ligand binding equilibria 326 // Enzyme kinetics 327 // Measures of effectiveness of enzymes 328 // How do enzymes evolve new functions? 329 //
Control over enzyme activity 329 // Structural mechanisms of evolution of altered or novel protein functions 329 // Pathways and limits in the divergence of sequence, structure, and function 334 // Evolution by gene duplication 335 // Databases of metabolic pathways 337 // The Kyoto Encyclopedia of Genes and Genomes (KEGG) 339 // Evolution and phylogeny of metabolic pathways 341 // Pathway comparison 341 // Alignment of metabolic pathways 343 // Comparing linear metabolic pathways 343 // Comparing non-linear metabolic pathways: The pentose phosphate pathway and the Calvin-Benson cycle 346 // Dynamics of metabolic networks 347 // Robustness of metabolic networks 347 // Dynamic modelling of metabolism 347 // Simulation of metabolic pathways in Plasmodium falciparum 351 // The Human Metabolome Database supports clinical applications to the study of inborn errors of metabolism, and to cancer 352 // Recommended reading 353 // Exercises and Problems 353 // 10 CONTROL OF ORGANIZATION AND ORGANIZATION OF CONTROL 355 // Transcriptomics 356 // The ENCODE Project 357 // Determination of RNA sequences 358 // RNAseq v. microarrays 358 // DNA microarrays 359 // RNAseq 363 // The Genotype-Tissue Expression (GTEx) project 366 // Expression patterns in different physiological states 367 // Variation of expression patterns during the life cycle of Drosophila melanogaster 368 // Different life stages make different demands on different genes 370 // Protein complexes and aggregates 373 // Properties of protein-protein complexes 373 // Protein interaction networks 375 // Components of the primosome assembly in Bacillus subtilis 378 // Regulatory networks 380 // Signal transduction and transcriptional control 380 // Structural biology of regulatory networks 382 // Examples of relatively simple regulatory control networks 383 // Regulation of the lactose operon in E. coli 383 //
The genetic switch of bacteriophage λ 385 // The diauxic shift in Saccharomyces cerevisiae 389 // Logical structure of regulatory networks 391 // The transcriptional regulatory network of E. coli 391 // The transcriptional regulatory network of Saccharomyces cerevisiae 392 // Adaptability of the yeast regulatory network 393 // Recommended reading 396 // Exercises and Problems 396 // Conclusions 399 // Index 400