The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. In the life sciences, data analysis is now part of practically every research project. Genomics, in particular, is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to identifying microorganisms and other breakthroughs permitted by the invention of the microscope. Choice examples of these technologies are microarrays and next-generation sequencing. This book will cover several of the statistical concepts and data analytic skills needed to succeed in data-driven life science research. We go from relatively basic concepts related to computing p-values to advanced topics related to analyzing high-throughput data.
While statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory, and then showing examples, we start by stating a practical data-related challenge. This book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution. By running the code yourself, and seeing data generation and analysis happen live, you will get a better intuition for the concepts, the mathematics, and the theory. The book was created using R Markdown, and we make all this code available to the reader. This means that readers can replicate all the figures and analyses used to create the book.
Acknowledgements
Introduction
What Does This Book Cover?
How Is This Book Different?
Getting Started
Installing R
Installing RStudio
Learn R Basics
Installing Packages
Importing Data into R
Brief Introduction to dplyr
Inference
Introduction
Random Variables
The Null Hypothesis
Distributions
Probability Distribution
Normal Distribution
Populations, Samples and Estimates
Central Limit Theorem and t-distribution
Central Limit Theorem in Practice
t-tests in Practice
The t-distribution in Practice
Confidence Intervals
Power Calculations
Monte Carlo Simulation
Parametric Simulations for the Observations
Permutation Tests
Association Tests
Exploratory Data Analysis
Quantile Quantile Plots
Boxplots
Scatterplots And Correlation
Stratification
Bi-variate Normal Distribution
Plots To Avoid
Misunderstanding Correlation (Advanced)
Robust Summaries
Wilcoxon Rank Sum Test
Matrix Algebra
Motivating Examples
Matrix Notation
Solving System of Equations
Vectors, Matrices and Scalars
Matrix Operations
Examples
Linear Models
The Design Matrix
The Mathematics Behind lm()
Standard Errors
Interactions and Contrasts
Linear Model with Interactions
Analysis of variance
Co-linearity
Rank
Removing Confounding
The QR Factorization (Advanced)
Going Further
Inference For High Dimensional Data
Introduction
Inference in Practice
Procedures
Error Rates
The Bonferroni Correction
False Discovery Rate
Direct Approach to FDR and q-values (Advanced)
Basic Exploratory Data Analysis
Statistical Models
The Binomial Distribution
The Poisson Distribution
Maximum Likelihood Estimation
Distributions for Positive Continuous Values
Bayesian Statistics
Hierarchical Models
Distance and Dimension Reduction
Introduction
Euclidean Distance
Distance in High Dimensions
Dimension Reduction Motivation
Singular Value Decomposition
Projections
Rotations
Multi-Dimensional Scaling Plots
Principal Component Analysis
Basic Machine Learning
Clustering
Conditional Probabilities and Expectations
Smoothing
Bin Smoothing
Loess
Class Prediction
Cross-validation
Batch Effects
Confounding
Confounding: High-throughput Example
Discovering Batch Effects with EDA
Gene Expression Data
Motivation for Statistical Approaches
Adjusting for Batch Effects with Linear Models
Factor Analysis
Modeling Batch Effects with Factor Analysis