High-Dimensional Data Analysis with Python

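Starting from matrix computations such as eigendecomposition and singular value decomposition, this book discusses the least squares model built on the normal equations, which leads to the problem of solving rank-deficient systems of linear equations. It then introduces two lossy dimensionality-reduction methods, principal component analysis (with principal component regression) and partial least squares regression, covering their models, algorithms, and several worked examples. It extends these to regularization methods for linear regression, presenting the principles, algorithms, and examples of ridge regression and the Lasso. Finally, through examples of calibration transfer for infrared spectra, it extends linear models to the field of transfer learning.

Every chapter includes examples that analyze infrared spectral data sets with Python and the Sklearn machine learning library. An infrared spectral data set consists purely of the absorbance values of a substance and can be regressed directly against the substance concentrations given in its labels, so readers can concentrate as fully as possible on modeling high-dimensional data, implementing the algorithms, and carrying out the analysis.

The book can serve as a textbook for programs in information management and information systems, computer-related disciplines, and big data, and as a reference for engineers working in spectral and chemical analysis and for researchers in chemometrics. It is also suitable for other Python engineers interested in data analysis and research. The original literature and data cited in the book will be of considerable help to these readers.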

Chapter 1 Fundamentals of Matrix Computation
1.1 Fundamental Concepts
1.1.1 Notation
1.1.2 “Bigger-Block” Interpretations of Matrix Multiplication
1.1.3 Fundamental Linear Algebra
1.1.4 Four Fundamental Subspaces of a Matrix
1.1.5 Vector Norms
1.1.6 Determinants
1.1.7 Properties of Determinants
1.2 The Most Basic Matrix Decomposition
1.2.1 Gaussian Elimination
1.2.2 The LU Decomposition
1.2.3 The LDM Factorization
1.2.4 The LDL Decomposition for Symmetric Matrices
1.2.5 Cholesky Decomposition
1.2.6 Applications and Examples of the Cholesky Decomposition
1.2.7 Eigendecomposition
1.2.8 Matrix Norms
1.2.9 Covariance Matrices
1.3 Singular Value Decomposition (SVD)
1.3.1 Orthogonalization
1.3.2 Existence Proof of the SVD
1.3.3 Partitioning the SVD
1.3.4 Properties and Interpretations of the SVD
1.3.5 Relationship between SVD and ED
1.3.6 Ellipsoidal Interpretation of the SVD
1.3.7 An Interesting Theorem
1.4 The Quadratic Form
1.4.1 Quadratic Form Theory
1.4.2 The Gaussian Multi-Variate Probability Density Function
1.4.3 The Rayleigh Quotient
Chapter 2 The Solution of Least Squares Problems
2.1 Linear Least Squares Estimation
2.1.1 Example: Autoregressive Modelling
2.1.2 The Least-Squares Solution
2.1.3 Interpretation of the Normal Equations
2.1.4 Properties of the LS Estimate
2.1.5 Linear Least-Squares Estimation and the Cramér-Rao Lower Bound
2.2 A Generalized “Pseudo-Inverse” Approach to Solving the Least-Squares Problem
2.2.1 Least Squares Solution Using the SVD
2.2.2 Interpretation of the Pseudo-Inverse
Chapter 3 Principal Component Analysis
3.1 Introductory Example
3.2 Theory
3.2.1 Taking Linear Combinations
3.2.2 Explained Variation
3.2.3 PCA as a Model
3.2.4 Taking More Components
3.3 History of PCA
3.4 Practical Aspects
3.4.1 Preprocessing
3.4.2 Choosing the Number of Components
3.4.3 When Using PCA for Other Purposes
3.4.4 Detecting Outliers
References
3.5 Sklearn PCA
3.5.1 Source Code
3.5.2 Examples
3.6 Principal Component Regression
3.6.1 Source Code
3.6.2 K-Fold Cross-Validation
3.6.3 Examples
3.7 Subspace Methods for Dynamic Model Estimation in PAT Applications
3.7.1 Introduction
3.7.2 Theory
3.7.3 State Space Models in Chemometrics
3.7.4 Milk Coagulation Monitoring
3.7.5 State Space Based Monitoring
3.7.6 Results
3.7.7 Concluding Remarks
3.7.8 Appendix
References
Chapter 4 Partial Least Squares Analysis
4.1 Basic Concept
4.1.1 Partial Least Squares
4.1.2 Form of Partial Least Squares
4.1.3 PLS Regression
4.1.4 Statistic
Reference
4.2 NIPALS and SIMPLS Algorithms
4.2.1 NIPALS
4.2.2 SIMPLS
References
4.3 Programming Method of Standard Partial Least Squares
4.3.1 Cross-Validation
4.3.2 Procedure of NIPALS
4.4 Example Application
4.4.1 Demo of PLS
4.4.2 Corn Dataset
4.4.3 Wheat Dataset
4.4.4 Pharmaceutical Tablet Dataset
4.5 Stack Partial Least Squares
4.5.1 Introduction
4.5.2 Theory of Stack Partial Least Squares
4.5.3 Demo of SPLS
4.5.4 Experiments
References
Chapter 5 Regularization
5.1 Regularization
5.1.1 Classification
5.1.2 Tikhonov Regularization
5.1.3 Regularizers for Sparsity
5.1.4 Other Uses of Regularization in Statistics and Machine Learning
5.2 Ridge Regression: Biased Estimation for Nonorthogonal Problems
5.2.1 Properties of Best Linear Unbiased Estimation
5.2.2 Ridge Regression
5.2.3 The Ridge Trace
5.2.4 Mean Square Error Properties of Ridge Regression
5.2.5 A General Form of Ridge Regression
5.2.6 Relation to Other Work in Regression
5.2.7 Selecting a Better Estimate of β
References
5.3 Lasso
5.3.1 Introduction
5.3.2 Theory of the Lasso
References
5.4 Examples of Ridge Regression and Lasso Regression
5.4.1 Example
5.4.2 Practical Example
5.5 Sparse PCA
5.5.1 Introduction
5.5.2 Motivation and Method Details
5.5.3 SPCA for p ≥ n and Gene Expression Arrays
5.5.4 Demo of SPCA
References
Chapter 6 Transfer Methods
6.1 Calibration Transfer of Spectral Models [1]
6.1.1 Introduction
6.1.2 Calibration Transfer Setting
6.1.3 Related Work
6.1.4 New or Adapted Methods
6.1.5 Standard-Free Alternatives to Methods Requiring Transfer Standards
References
6.2 PLS Subspace-Based Calibration Transfer for NIR Quantitative Analysis
6.2.1 Calibration Transfer Method
6.2.2 Experimental
6.2.3 Results and Discussion
6.2.4 Conclusion
References
6.3 Calibration Transfer Based on Affine Invariance for NIR without Standard Samples
6.3.1 Theory
6.3.2 Experimental
6.3.3 Results and Discussion
6.3.4 Conclusions