1. Fundamental Concepts in Machine Learning
graph LR
A["**Machine Learning**"] --> B["**Classification**"]
A --> C["**Regression**"]
B --> D["**Spam Detection**
e.g., Email"]
C --> E["**Price Prediction**
e.g., House"]
B --> F["**Group Into**
**Categories**"]
C --> G["**Predict**
**Numbers**"]
F & G --> H["**Test Data**"]
H --> I["**Compare Errors**"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#bbf,stroke:#333
style C fill:#bbf,stroke:#333
style D fill:#dfd
style E fill:#dfd
style F fill:#dfd
style G fill:#dfd
style H fill:#ffe,stroke:#333
style I fill:#ffe,stroke:#333
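The classification/regression split in the diagram can be sketched in a few lines of plain Python: both tasks fit on training data, then compare errors on held-out test data. All data, thresholds, and models below are illustrative toys, not from the notes.

```python
# Regression: predict numbers (e.g., house price from size).
train_x, train_y = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
test_x, test_y = [4.0, 5.0], [41.0, 49.0]

# Simple through-the-origin least-squares slope from training data
slope = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)
reg_mse = sum((y - slope * x) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)

# Classification: group into categories (e.g., spam vs. not spam)
# via a simple score threshold (illustrative decision rule).
threshold = 0.5
test_scores, test_labels = [0.7, 0.3], [1, 0]
preds = [1 if s > threshold else 0 for s in test_scores]
clf_error_rate = sum(p != t for p, t in zip(preds, test_labels)) / len(test_labels)
```

Note the two tasks report different error types: the regressor an average squared error, the classifier a misclassification rate.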
2. Cross Validation
graph LR
A["**Data**"] --> B["**Cross Validation**
**Splits Data**"]
B --> C["**Training Set**"]
B --> D["**Testing Set**"]
subgraph Types
E["**3-Fold CV**"]
F["**10-Fold CV**"]
G["**Leave-One-Out CV**"]
end
B --> E & F & G
C --> H["**Find Patterns**"]
D --> I["**Evaluate Model**"]
style A fill:#ffe,stroke:#333
style B fill:#f9f,stroke:#333,stroke-width:2px
style C fill:#bbf,stroke:#333
style D fill:#bbf,stroke:#333
style E fill:#dfd
style F fill:#dfd
style G fill:#dfd
style H fill:#dfd
style I fill:#dfd
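A minimal k-fold split, as in the diagram, can be sketched in pure Python (the helper name and data are illustrative): each fold serves once as the testing set while the remaining folds form the training set, and leave-one-out CV is simply the case k = n.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (sizes differ by at most 1)."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        extra = 1 if i < n % k else 0  # spread the remainder over the first folds
        folds.append(list(range(start, start + fold_size + extra)))
        start += fold_size + extra
    return folds

data = list(range(12))
for test_fold in k_fold_indices(len(data), 3):          # 3-fold CV
    train = [i for i in data if i not in test_fold]
    # ...fit on `train` (find patterns), evaluate on `test_fold`...

# Leave-one-out CV: k = n, so each fold holds exactly one point.
loo = k_fold_indices(len(data), len(data))
```

In practice the data is shuffled before splitting; contiguous folds are used here only to keep the sketch deterministic.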
3. Statistical Foundations
graph TD
A["**Probability Distributions**"] --> B["**Types**"]
B --> C["**Discrete Distributions**
**Key Functions**
PMF: \(P(X=x)\)
CDF: \(P(X\le x)\)"]
B --> D["**Continuous Distributions**
**Key Functions**
PDF: \(f(x)\)
CDF: \(F(x)=P(X\le x)\)"]
C --> E["**Binomial Distribution**
$$P(x|n,p) = \frac{n!}{x!(n-x)!} p^x(1-p)^{n-x}$$
Mean: \(\mu = np\)
Var: \(\sigma^2 = np(1-p)\)"]
C --> F["**Bernoulli Distribution**
$$P(x|p)= p^x(1-p)^{1-x}$$
Mean: \(\mu = p\)
Var: \(\sigma^2 = p(1-p)\)"]
C --> G["**Poisson Distribution**
$$P(x|\lambda)= \frac{e^{-\lambda}\lambda^x}{x!}$$
Mean: \(\mu = \lambda\)
Var: \(\sigma^2 = \lambda\)"]
D --> H["**Normal Distribution**
Mean: \(\mu\), Var: \(\sigma^2\)"]
D --> I["**Uniform Distribution**
PDF: \(f(x)=\frac{1}{b-a}\) for \(a\le x\le b\)
Goodness-of-fit check (k equally likely categories):
Expected Frequency: $$\frac{\text{Total Frequency}}{k}$$"]
D --> J["**Exponential Distribution**
$$f(x;\lambda)= \lambda e^{-\lambda x}$$ (x ≥ 0)
CDF: $$P(X\le x)= 1-e^{-\lambda x}$$"]
D --> K["**Beta Distribution**
PDF: $$f(x;\alpha,\beta)= \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$$
Mean: $$E[X]=\frac{\alpha}{\alpha+\beta}$$"]
D --> L["**T Distribution**
Used for small samples
with unknown population SD"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#ffe,stroke:#333
style C fill:#bbf,stroke:#333
style D fill:#bbf,stroke:#333
style E fill:#dfd,stroke:#333
style F fill:#dfd,stroke:#333
style G fill:#dfd,stroke:#333
style H fill:#dfd,stroke:#333
style I fill:#dfd,stroke:#333
style J fill:#dfd,stroke:#333
style K fill:#dfd,stroke:#333
style L fill:#dfd,stroke:#333
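The discrete formulas above can be spot-checked numerically with the standard library (parameters here are arbitrary illustrations): the binomial mean and variance should come out to \(np\) and \(np(1-p)\), and the Poisson PMF should sum to 1.

```python
from math import comb, exp, factorial

# Binomial(n=10, p=0.3): mean should be n*p = 3.0, variance n*p*(1-p) = 2.1
n, p = 10, 0.3
binom_pmf = lambda x: comb(n, x) * p**x * (1 - p) ** (n - x)
mean = sum(x * binom_pmf(x) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x) for x in range(n + 1))

# Poisson(lambda=4): PMF over a long range sums to ~1
lam = 4.0
pois_pmf = lambda x: exp(-lam) * lam**x / factorial(x)
pois_mass = sum(pois_pmf(x) for x in range(100))
```

The same empirical check works for any of the distributions in the diagram once its PMF/PDF is written down.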
graph TD
A["**Bayesian Estimation**"]
A --> B["**Bayes' Theorem**
$$P(\theta|data)=\frac{P(data|\theta)P(\theta)}{P(data)}$$"]
A --> C["**Key Concepts**"]
C --> D["**Likelihood Function**
Measures how well
parameters explain data"]
C --> E["**Prior & Posterior**
Prior updated by data
to form posterior"]
C --> F["**Sample Size Effect**
Larger samples
dominate prior"]
A --> G["**Beta-Binomial Example**"]
G --> H["**Prior Distribution**
$$\text{Beta}(\alpha,\beta)$$"]
G --> I["**Data Collection**
x successes in n trials"]
G --> J["**Posterior Distribution**
$$\text{Beta}(\alpha+x,\beta+n-x)$$"]
G --> K["**Posterior Mean**
$$\frac{\alpha+x}{(\alpha+x)+(\beta+n-x)}$$"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#bbf,stroke:#333
style C fill:#ffe,stroke:#333
style D fill:#dfd,stroke:#333
style E fill:#dfd,stroke:#333
style F fill:#dfd,stroke:#333
style G fill:#ffe,stroke:#333
style H fill:#dfd,stroke:#333
style I fill:#dfd,stroke:#333
style J fill:#dfd,stroke:#333
style K fill:#dfd,stroke:#333
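The beta-binomial update in the diagram is a one-line computation; a sketch with an illustrative prior and data set also shows the sample-size effect (larger samples dominate the prior):

```python
# Conjugate update: Beta(alpha, beta) prior + x successes in n trials
# -> Beta(alpha + x, beta + n - x) posterior. Numbers are illustrative.
alpha, beta = 2.0, 2.0
x, n = 7, 10

post_alpha, post_beta = alpha + x, beta + (n - x)
posterior_mean = post_alpha / (post_alpha + post_beta)   # pulled toward prior

# Same 70% success rate with 100x the data: the prior barely matters.
big_alpha, big_beta = alpha + 700, beta + 300
big_mean = big_alpha / (big_alpha + big_beta)            # close to 0.7
```

With n = 10 the posterior mean (9/14 ≈ 0.643) sits between the prior mean (0.5) and the observed rate (0.7); with n = 1000 it is within 0.001 of 0.7.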
4. Linear Algebra & Regression
graph TD
A["**Linear Algebra**"]
A --> MT["**Matrix Theory**
**Rank**: Dimension of row/column space
**Determinant**: $$a(ei-fh)-b(di-fg)+c(dh-eg)$$
**Operations**: Addition, multiplication
(m×n)(n×p) = (m×p)
Non-commutative, distributive"]
A --> VO["**Vector Operations**
**Angle**: $$\cos(\theta)= \frac{u \cdot v}{\|u\|\|v\|}$$
**Cross Product**: $$u \times v = (u_2v_3-u_3v_2,\; u_3v_1-u_1v_3,\; u_1v_2-u_2v_1)$$"]
A --> EA["**Eigen Analysis**
**Eigenvalues & Eigenvectors**:
$$Av=\lambda v$$
Sum = trace of matrix
Product = determinant
**Orthogonal Matrices**:
$$Q^TQ=I$$, $$\det(Q)=\pm1$$
$$Q^{-1}=Q^T$$"]
A --> DP["**Determinant Properties**
Row swap: multiply by -1
Nonzero det ⇒ invertible
$$\det(I)=1$$
Singular if det = 0"]
A --> R["**Regression Methods**"]
R --> OLS["**Ordinary Least Squares**
- Minimizes squared residuals ('least squares')
- BLUE under classical assumptions
- Equals MLE under normal errors
**Slope Coefficient**: change in dependent variable per unit change in independent variable
**'Least squares'**: sum of squared differences between observed & predicted"]
R --> MLE["**Maximum Likelihood**
Maximizes log-likelihood
Same as OLS under normal errors"]
R --> MSE["**Mean Squared Error (MSE)**
- MSE = SSR / n (the average of squared residuals)
- Helps compare SSR across different sample sizes"]
R --> M["**Models**"]
M --> SR["**Sum of Residuals**
$$\sum (y_i - \hat{y}_i)$$
Issue: sign cancellation"]
SR --> SSR2["**SSR (Sum of Squared Residuals)**
$$\sum (y_i - \hat{y}_i)^2$$
- Avoids cancellation of positive/negative residuals by squaring
- Produces a differentiable objective for gradient-based methods
- Applies to any model shape (line, sinusoid, rocket trajectory)
- **Cannot compare across different training data sizes** (SSR grows with more data)"]
SSR2 --> MSE2["**MSE (Mean Squared Error)**
$$\text{MSE} = \frac{\sum (y_i - \hat{y}_i)^2}{n}$$
- MSE = SSR / n (the average of squared residuals)
- Helps compare SSR across different sample sizes"]
MSE2 --> R2_2["**R^2 (Coefficient of Determination)**
- Proportion of variation in dependent variable explained by the model
- Typically 0 ≤ R^2 ≤ 1
- Dimensionless, not affected by scaling
**SSR-based**: $$R^2 = \frac{\text{SSR(mean)} - \text{SSR(fitted)}}{\text{SSR(mean)}}$$
**MSE-based**: $$R^2 = \frac{\text{MSE(mean)} - \text{MSE(fitted)}}{\text{MSE(mean)}}$$"]
R --> MultiColl["**Handling Multicollinearity**
- Dropping or combining correlated variables
- Using Ridge or Lasso regression
- Ignoring is not recommended
- Can lead to unstable or inflated coefficient estimates"]
R --> Diagnostics["**Checking Model Assumptions**
- Residuals vs fitted values (patterns, homoscedasticity)
- Residuals vs each independent variable (linearity)
- Normal QQ plot (normality of errors)
- 'No outliers' is *not* an official assumption
- Hypothesis tests on coefficients do *not* check assumptions"]
R --> Perf["**Performance Evaluation**
- Adjusted R^2 (penalizes extra predictors)
- MSE, RMSE, etc. measure average error
- p-values of coefficients are about significance, *not* performance
- Plotting residuals vs X is for assumptions, *not* performance"]
R --> GradDescent["**Gradient Descent**
- Iterative optimization method
- Updates parameters to minimize cost function
- Addresses 'optimization' problem"]
A --> PCA["**Principal Component Analysis**
**Principal Components**:
- Orthogonal (uncorrelated) axes capturing maximum variance
- Sensitive to data scaling
- 2D visualization: plot top 2 components
- If first 2 PCs explain 85%, remainder is 15%"]
R --> FeatSel["**Feature Selection**
- Reduces overfitting
- Increases interpretability
- Reduces computational cost
- Not about including all features"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style MT fill:#dfd,stroke:#333
style VO fill:#dfd,stroke:#333
style EA fill:#dfd,stroke:#333
style DP fill:#dfd,stroke:#333
style R fill:#bbf,stroke:#333
style OLS fill:#dfd,stroke:#333
style MLE fill:#dfd,stroke:#333
style MSE fill:#dfd,stroke:#333
style M fill:#dfd,stroke:#333
style SR fill:#dfd,stroke:#333
style SSR2 fill:#dfd,stroke:#333
style MSE2 fill:#dfd,stroke:#333
style R2_2 fill:#dfd,stroke:#333
style MultiColl fill:#dfd,stroke:#333
style Diagnostics fill:#dfd,stroke:#333
style Perf fill:#dfd,stroke:#333
style GradDescent fill:#dfd,stroke:#333
style PCA fill:#dfd,stroke:#333
style FeatSel fill:#dfd,stroke:#333
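The regression branch of the diagram can be tied together in one short sketch: a closed-form OLS fit, the SSR → MSE → R² chain, and gradient descent reaching the same slope by iterative updates. The data is illustrative.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Closed-form OLS for simple linear regression
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

preds = [intercept + slope * x for x in xs]
ssr_fit = sum((y - yh) ** 2 for y, yh in zip(ys, preds))  # sum of squared residuals
mse_fit = ssr_fit / n                                     # MSE = SSR / n
ssr_mean = sum((y - mean_y) ** 2 for y in ys)             # SSR around the mean line
r_squared = (ssr_mean - ssr_fit) / ssr_mean               # SSR-based R^2

# Gradient descent on the same MSE cost, starting from (0, 0)
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    g0 = sum(2 * (b0 + b1 * x - y) for x, y in zip(xs, ys)) / n
    g1 = sum(2 * (b0 + b1 * x - y) * x for x, y in zip(xs, ys)) / n
    b0, b1 = b0 - lr * g0, b1 - lr * g1
# b1 converges to the closed-form slope, b0 to the intercept
```

Note the learning rate and iteration count are tuning choices for this toy problem; the point is only that the iterative optimum matches the closed-form one.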
p-value explanation: https://chatgpt.com/share/67b23858-e9dc-8002-92bd-4f3edc7a2bfb