1. Fundamental Concepts in Machine Learning
graph LR
A["**Machine Learning**"] --> B["**Classification**"]
A --> C["**Regression**"]
B --> D["**Spam Detection**
e.g., Email"]
C --> E["**Price Prediction**
e.g., House"]
B --> F["**Group Into**
**Categories**"]
C --> G["**Predict**
**Numbers**"]
F & G --> H["**Test Data**"]
H --> I["**Compare Errors**"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#bbf,stroke:#333
style C fill:#bbf,stroke:#333
style D fill:#dfd
style E fill:#dfd
style F fill:#dfd
style G fill:#dfd
style H fill:#ffe,stroke:#333
style I fill:#ffe,stroke:#333
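The classification/regression split in the diagram can be sketched in a few lines of plain Python: both tasks fit on training data, then compare errors on held-out test data. All data, thresholds, and models below are illustrative toys, not from the notes.

```python
# Regression: predict numbers (e.g., house price from size).
train_x, train_y = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
test_x, test_y = [4.0, 5.0], [41.0, 49.0]

# Simple through-the-origin least-squares slope from training data
slope = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)
reg_mse = sum((y - slope * x) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)

# Classification: group into categories (e.g., spam vs. not spam)
# via a simple score threshold (illustrative decision rule).
threshold = 0.5
test_scores, test_labels = [0.7, 0.3], [1, 0]
preds = [1 if s > threshold else 0 for s in test_scores]
clf_error_rate = sum(p != t for p, t in zip(preds, test_labels)) / len(test_labels)
```

Note the two tasks report different error types: the regressor an average squared error, the classifier a misclassification rate.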
2. Cross Validation
graph LR
A["**Data**"] --> B["**Cross Validation**
**Splits Data**"]
B --> C["**Training Set**"]
B --> D["**Testing Set**"]
subgraph Types
E["**3-Fold CV**"]
F["**10-Fold CV**"]
G["**Leave-One-Out CV**"]
end
B --> E & F & G
C --> H["**Find Patterns**"]
D --> I["**Evaluate Model**"]
style A fill:#ffe,stroke:#333
style B fill:#f9f,stroke:#333,stroke-width:2px
style C fill:#bbf,stroke:#333
style D fill:#bbf,stroke:#333
style E fill:#dfd
style F fill:#dfd
style G fill:#dfd
style H fill:#dfd
style I fill:#dfd
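A minimal k-fold split, as in the diagram, can be sketched in pure Python (the helper name and data are illustrative): each fold serves once as the testing set while the remaining folds form the training set, and leave-one-out CV is simply the case k = n.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (sizes differ by at most 1)."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        extra = 1 if i < n % k else 0  # spread the remainder over the first folds
        folds.append(list(range(start, start + fold_size + extra)))
        start += fold_size + extra
    return folds

data = list(range(12))
for test_fold in k_fold_indices(len(data), 3):          # 3-fold CV
    train = [i for i in data if i not in test_fold]
    # ...fit on `train` (find patterns), evaluate on `test_fold`...

# Leave-one-out CV: k = n, so each fold holds exactly one point.
loo = k_fold_indices(len(data), len(data))
```

In practice the data is shuffled before splitting; contiguous folds are used here only to keep the sketch deterministic.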
3. Statistical Foundations
graph TD
A["**Probability Distributions**"] --> B["**Types**"]
B --> C["**Discrete Distributions**
**Key Functions**
PMF: \(P(X=x)\)
CDF: \(P(X\le x)\)"]
B --> D["**Continuous Distributions**
**Key Functions**
PDF: \(f(x)\)
CDF: \(F(x)=P(X\le x)\)"]
C --> E["**Binomial Distribution**
$$P(x|n,p) = \frac{n!}{x!(n-x)!} p^x(1-p)^{n-x}$$
Mean: \(\mu = np\)
Var: \(\sigma^2 = np(1-p)\)"]
C --> F["**Bernoulli Distribution**
$$P(x|p)= p^x(1-p)^{1-x}$$
Mean: \(\mu = p\)
Var: \(\sigma^2 = p(1-p)\)"]
C --> G["**Poisson Distribution**
$$P(x|\lambda)= \frac{e^{-\lambda}\lambda^x}{x!}$$
Mean: \(\mu = \lambda\)
Var: \(\sigma^2 = \lambda\)"]
D --> H["**Normal Distribution**
Mean: \(\mu\), Var: \(\sigma^2\)"]
D --> I["**Uniform Distribution**
PDF: \(f(x)=\frac{1}{b-a}\) for \(a\le x\le b\)
Goodness-of-fit check (k equally likely categories):
Expected Frequency: $$\frac{\text{Total Frequency}}{k}$$"]
D --> J["**Exponential Distribution**
$$f(x;\lambda)= \lambda e^{-\lambda x}$$ (x ≥ 0)
CDF: $$P(X\le x)= 1-e^{-\lambda x}$$"]
D --> K["**Beta Distribution**
PDF: $$f(x;\alpha,\beta)= \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$$
Mean: $$E[X]=\frac{\alpha}{\alpha+\beta}$$"]
D --> L["**T Distribution**
Used for small samples
with unknown population SD"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#ffe,stroke:#333
style C fill:#bbf,stroke:#333
style D fill:#bbf,stroke:#333
style E fill:#dfd,stroke:#333
style F fill:#dfd,stroke:#333
style G fill:#dfd,stroke:#333
style H fill:#dfd,stroke:#333
style I fill:#dfd,stroke:#333
style J fill:#dfd,stroke:#333
style K fill:#dfd,stroke:#333
style L fill:#dfd,stroke:#333
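The discrete formulas above can be spot-checked numerically with the standard library (parameters here are arbitrary illustrations): the binomial mean and variance should come out to \(np\) and \(np(1-p)\), and the Poisson PMF should sum to 1.

```python
from math import comb, exp, factorial

# Binomial(n=10, p=0.3): mean should be n*p = 3.0, variance n*p*(1-p) = 2.1
n, p = 10, 0.3
binom_pmf = lambda x: comb(n, x) * p**x * (1 - p) ** (n - x)
mean = sum(x * binom_pmf(x) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x) for x in range(n + 1))

# Poisson(lambda=4): PMF over a long range sums to ~1
lam = 4.0
pois_pmf = lambda x: exp(-lam) * lam**x / factorial(x)
pois_mass = sum(pois_pmf(x) for x in range(100))
```

The same empirical check works for any of the distributions in the diagram once its PMF/PDF is written down.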
graph TD
A["**Bayesian Estimation**"]
A --> B["**Bayes' Theorem**
$$P(\theta|data)=\frac{P(data|\theta)P(\theta)}{P(data)}$$"]
A --> C["**Key Concepts**"]
C --> D["**Likelihood Function**
Measures how well
parameters explain data"]
C --> E["**Prior & Posterior**
Prior updated by data
to form posterior"]
C --> F["**Sample Size Effect**
Larger samples
dominate prior"]
A --> G["**Beta-Binomial Example**"]
G --> H["**Prior Distribution**
$$\text{Beta}(\alpha,\beta)$$"]
G --> I["**Data Collection**
x successes in n trials"]
G --> J["**Posterior Distribution**
$$\text{Beta}(\alpha+x,\beta+n-x)$$"]
G --> K["**Posterior Mean**
$$\frac{\alpha+x}{(\alpha+x)+(\beta+n-x)}$$"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#bbf,stroke:#333
style C fill:#ffe,stroke:#333
style D fill:#dfd,stroke:#333
style E fill:#dfd,stroke:#333
style F fill:#dfd,stroke:#333
style G fill:#ffe,stroke:#333
style H fill:#dfd,stroke:#333
style I fill:#dfd,stroke:#333
style J fill:#dfd,stroke:#333
style K fill:#dfd,stroke:#333
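The beta-binomial update in the diagram is a one-line computation; a sketch with an illustrative prior and data set also shows the sample-size effect (larger samples dominate the prior):

```python
# Conjugate update: Beta(alpha, beta) prior + x successes in n trials
# -> Beta(alpha + x, beta + n - x) posterior. Numbers are illustrative.
alpha, beta = 2.0, 2.0
x, n = 7, 10

post_alpha, post_beta = alpha + x, beta + (n - x)
posterior_mean = post_alpha / (post_alpha + post_beta)   # pulled toward prior

# Same 70% success rate with 100x the data: the prior barely matters.
big_alpha, big_beta = alpha + 700, beta + 300
big_mean = big_alpha / (big_alpha + big_beta)            # close to 0.7
```

With n = 10 the posterior mean (9/14 ≈ 0.643) sits between the prior mean (0.5) and the observed rate (0.7); with n = 1000 it is within 0.001 of 0.7.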
4. Linear Algebra & Regression
graph TD
A["**Linear Algebra**"]
A --> MT["**Matrix Theory**
**Rank**: Dimension of row/column space
**Determinant**: $$a(ei-fh)-b(di-fg)+c(dh-eg)$$
**Operations**: Addition, multiplication
(m×n)(n×p) = (m×p)
Non-commutative, distributive"]
A --> VO["**Vector Operations**
**Angle**: $$\cos(\theta)= \frac{u \cdot v}{\|u\|\|v\|}$$
**Cross Product**: $$u \times v = (u_2v_3-u_3v_2,\; u_3v_1-u_1v_3,\; u_1v_2-u_2v_1)$$"]
A --> EA["**Eigen Analysis**
**Eigenvalues & Eigenvectors**:
$$Av=\lambda v$$
Sum = trace of matrix
Product = determinant
**Orthogonal Matrices**:
$$Q^TQ=I$$, $$\det(Q)=\pm1$$
$$Q^{-1}=Q^T$$"]
A --> DP["**Determinant Properties**
Row swap: multiply by -1
Nonzero det ⇒ invertible
$$\det(I)=1$$
Singular if det = 0"]
A --> R["**Regression Methods**"]
R --> OLS["**Ordinary Least Squares**
- Minimizes squared residuals ('least squares')
- BLUE under classical assumptions
- Equals MLE under normal errors
**Slope Coefficient**: change in dependent variable per unit change in independent variable
**'Least squares'**: sum of squared differences between observed & predicted"]
R --> MLE["**Maximum Likelihood**
Maximizes log-likelihood
Same as OLS under normal errors"]
R --> MSE["**Mean Squared Error (MSE)**
- MSE = SSR / n (the average of squared residuals)
- Helps compare SSR across different sample sizes"]
R --> M["**Models**"]
M --> SR["**Sum of Residuals**
$$\sum (y_i - \hat{y}_i)$$
Issue: sign cancellation"]
SR --> SSR2["**SSR (Sum of Squared Residuals)**
$$\sum (y_i - \hat{y}_i)^2$$
- Avoids cancellation of positive/negative residuals by squaring
- Produces a differentiable objective for gradient-based methods
- Applies to any model shape (line, sinusoid, rocket trajectory)
- **Cannot compare across different training data sizes** (SSR grows with more data)"]
SSR2 --> MSE2["**MSE (Mean Squared Error)**
$$\text{MSE} = \frac{\sum (y_i - \hat{y}_i)^2}{n}$$
- MSE = SSR / n (the average of squared residuals)
- Helps compare SSR across different sample sizes"]
MSE2 --> R2_2["**R^2 (Coefficient of Determination)**
- Proportion of variation in dependent variable explained by the model
- Typically 0 ≤ R^2 ≤ 1
- Dimensionless, not affected by scaling
**SSR-based**: $$R^2 = \frac{\text{SSR(mean)} - \text{SSR(fitted)}}{\text{SSR(mean)}}$$
**MSE-based**: $$R^2 = \frac{\text{MSE(mean)} - \text{MSE(fitted)}}{\text{MSE(mean)}}$$"]
R --> MultiColl["**Handling Multicollinearity**
- Dropping or combining correlated variables
- Using Ridge or Lasso regression
- Ignoring is not recommended
- Can lead to unstable or inflated coefficient estimates"]
R --> Diagnostics["**Checking Model Assumptions**
- Residuals vs fitted values (patterns, homoscedasticity)
- Residuals vs each independent variable (linearity)
- Normal QQ plot (normality of errors)
- 'No outliers' is *not* an official assumption
- Hypothesis tests on coefficients do *not* check assumptions"]
R --> Perf["**Performance Evaluation**
- Adjusted R^2 (penalizes extra predictors)
- MSE, RMSE, etc. measure average error
- p-values of coefficients are about significance, *not* performance
- Plotting residuals vs X is for assumptions, *not* performance"]
R --> GradDescent["**Gradient Descent**
- Iterative optimization method
- Updates parameters to minimize cost function
- Addresses 'optimization' problem"]
A --> PCA["**Principal Component Analysis**
**Principal Components**:
- Orthogonal (uncorrelated) axes capturing maximum variance
- Sensitive to data scaling
- 2D visualization: plot top 2 components
- If first 2 PCs explain 85%, remainder is 15%"]
R --> FeatSel["**Feature Selection**
- Reduces overfitting
- Increases interpretability
- Reduces computational cost
- Not about including all features"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style MT fill:#dfd,stroke:#333
style VO fill:#dfd,stroke:#333
style EA fill:#dfd,stroke:#333
style DP fill:#dfd,stroke:#333
style R fill:#bbf,stroke:#333
style OLS fill:#dfd,stroke:#333
style MLE fill:#dfd,stroke:#333
style MSE fill:#dfd,stroke:#333
style M fill:#dfd,stroke:#333
style SR fill:#dfd,stroke:#333
style SSR2 fill:#dfd,stroke:#333
style MSE2 fill:#dfd,stroke:#333
style R2_2 fill:#dfd,stroke:#333
style MultiColl fill:#dfd,stroke:#333
style Diagnostics fill:#dfd,stroke:#333
style Perf fill:#dfd,stroke:#333
style GradDescent fill:#dfd,stroke:#333
style PCA fill:#dfd,stroke:#333
style FeatSel fill:#dfd,stroke:#333
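The regression branch of the diagram can be tied together in one short sketch: a closed-form OLS fit, the SSR → MSE → R² chain, and gradient descent reaching the same slope by iterative updates. The data is illustrative.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Closed-form OLS for simple linear regression
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

preds = [intercept + slope * x for x in xs]
ssr_fit = sum((y - yh) ** 2 for y, yh in zip(ys, preds))  # sum of squared residuals
mse_fit = ssr_fit / n                                     # MSE = SSR / n
ssr_mean = sum((y - mean_y) ** 2 for y in ys)             # SSR around the mean line
r_squared = (ssr_mean - ssr_fit) / ssr_mean               # SSR-based R^2

# Gradient descent on the same MSE cost, starting from (0, 0)
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    g0 = sum(2 * (b0 + b1 * x - y) for x, y in zip(xs, ys)) / n
    g1 = sum(2 * (b0 + b1 * x - y) * x for x, y in zip(xs, ys)) / n
    b0, b1 = b0 - lr * g0, b1 - lr * g1
# b1 converges to the closed-form slope, b0 to the intercept
```

Note the learning rate and iteration count are tuning choices for this toy problem; the point is only that the iterative optimum matches the closed-form one.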
p-value explanation: https://chatgpt.com/share/67b23858-e9dc-8002-92bd-4f3edc7a2bfb