Quantitative Methods
Study Guide
Rates and Returns
Holding Period Return (HPR) is the total return on an asset over a specific period: HPR = (P_1 − P_0 + D) / P_0, where P_1 is the ending price, P_0 is the beginning price, and D is the cash flow (e.g., a dividend or interest payment) received.
Money-Weighted Rate of Return (MWRR) is the internal rate of return (IRR) on a portfolio, accounting for the timing and amount of all cash flows. It is the rate that sets the present value of inflows equal to the present value of outflows. Best used when the manager controls the timing of cash flows.
Time-Weighted Rate of Return (TWRR) measures the compound growth rate of a portfolio. It removes the effects of cash flow timing and is the standard for investment performance reporting (GIPS). It is calculated by finding the HPR for each sub-period and linking them geometrically.
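A minimal sketch in Python of both measures; the sub-period returns and cash-flow schedule are assumed for illustration, and the MWRR is found with a simple bisection search rather than a financial library:

```python
# Time-weighted vs. money-weighted return on illustrative (assumed) numbers.

def twrr(subperiod_hprs):
    """Geometrically link sub-period holding period returns."""
    growth = 1.0
    for r in subperiod_hprs:
        growth *= (1.0 + r)
    return growth - 1.0

def mwrr(cash_flows, lo=-0.99, hi=10.0, tol=1e-10):
    """IRR via bisection: cash_flows[t] is the net flow at the end of period t
    (outflows negative, inflows/ending value positive)."""
    def npv(rate):
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Keep the bracket around the sign change of NPV.
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two sub-periods: +10% then -5%.
print(f"TWRR (2 periods): {twrr([0.10, -0.05]):.4%}")

# Invest 100 at t=0, add 50 at t=1, portfolio worth 165 at t=2.
print(f"MWRR: {mwrr([-100.0, -50.0, 165.0]):.4%}")
```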
Annualizing Returns:
- Effective Annual Rate (EAR): Accounts for compounding within a year.
- To convert a holding period return to an EAR: EAR = (1 + HPR)^(365/t) − 1, where t is the number of days in the holding period.
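For example, a sketch annualizing an assumed 2% return earned over 90 days:

```python
# Annualize an assumed 90-day HPR of 2% into an effective annual rate (EAR).
hpr, days = 0.02, 90
ear = (1.0 + hpr) ** (365.0 / days) - 1.0
print(f"EAR = {ear:.4%}")   # ~8.36%
```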
Time Value of Money in Finance
The core principle is that money available now is worth more than the same amount in the future due to its potential earning capacity.
Key Components:
- PV: Present Value
- FV: Future Value
- I/Y: Interest Rate per period
- N: Number of compounding periods
- PMT: Annuity payment
Future Value (FV) of a Single Sum: FV = PV × (1 + I/Y)^N
Present Value (PV) of a Single Sum: PV = FV / (1 + I/Y)^N
Annuities: A series of equal cash flows at regular intervals.
- Ordinary Annuity: Cash flows occur at the end of each period.
- Annuity Due: Cash flows occur at the beginning of each period. The PV and FV of an annuity due are greater than those of an otherwise identical ordinary annuity (each is larger by a factor of (1 + I/Y)).
Perpetuity: An annuity that continues forever. Its present value is PV = PMT / (I/Y).
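A compact sketch of these TVM relationships in Python; the 10% rate and the cash flows are assumed for illustration:

```python
# Time-value-of-money sketches with assumed inputs (10% rate, illustrative cash flows).

def fv_single(pv, rate, n):
    return pv * (1.0 + rate) ** n

def pv_single(fv, rate, n):
    return fv / (1.0 + rate) ** n

def pv_ordinary_annuity(pmt, rate, n):
    return pmt * (1.0 - (1.0 + rate) ** -n) / rate

def pv_annuity_due(pmt, rate, n):
    # Each payment arrives one period earlier, so value is higher by (1 + rate).
    return pv_ordinary_annuity(pmt, rate, n) * (1.0 + rate)

def pv_perpetuity(pmt, rate):
    return pmt / rate

r = 0.10
print(fv_single(100, r, 5))             # 161.05
print(pv_single(161.05, r, 5))          # ~100
print(pv_ordinary_annuity(100, r, 3))   # 248.69
print(pv_annuity_due(100, r, 3))        # 273.55
print(pv_perpetuity(100, r))            # 1000.0
```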
Statistical Measures of Asset Returns
Measures of Central Tendency:
- Arithmetic Mean: Simple average. Best for forecasting a single period's return.
- Geometric Mean: Measures compound growth rate over time. Always less than or equal to the arithmetic mean.
- Harmonic Mean: Used for averaging ratios (e.g., dollar-cost averaging).
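A short sketch of the three means on assumed data using Python's statistics module; the geometric mean is computed on growth factors (1 + r), and the harmonic mean is applied to assumed purchase prices as in dollar-cost averaging:

```python
# Arithmetic, geometric, and harmonic means of assumed data.
from statistics import mean, geometric_mean, harmonic_mean

returns = [0.10, 0.04, -0.02, 0.08]          # assumed annual returns
growth_factors = [1.0 + r for r in returns]  # geometric mean needs positive values

arithmetic = mean(returns)
geometric = geometric_mean(growth_factors) - 1.0   # compound growth rate per period
prices = [20.0, 25.0, 40.0]                        # assumed purchase prices
avg_cost = harmonic_mean(prices)                   # average cost per share

print(f"arithmetic: {arithmetic:.4%}, geometric: {geometric:.4%}, avg cost: {avg_cost:.2f}")
```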
Measures of Dispersion:
- Variance (σ²): Average of the squared deviations from the mean.
- Population Variance: σ² = Σ(X_i − μ)² / N
- Sample Variance: s² = Σ(X_i − X̄)² / (n − 1)
- Standard Deviation (σ or s): Square root of the variance. Measures the volatility of returns.
- Coefficient of Variation (CV): A measure of relative dispersion. Useful for comparing risk of assets with different expected returns. (Standard Deviation / Mean)
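A minimal sketch of these dispersion measures on an assumed return series using Python's statistics module:

```python
# Variance, standard deviation, and coefficient of variation on assumed returns.
from statistics import mean, pvariance, variance, stdev

returns = [0.06, 0.12, -0.03, 0.09, 0.01]   # assumed

pop_var = pvariance(returns)   # divides by N
samp_var = variance(returns)   # divides by n - 1
samp_sd = stdev(returns)
cv = samp_sd / mean(returns)   # risk per unit of mean return

print(f"population var: {pop_var:.6f}, sample var: {samp_var:.6f}")
print(f"sample std dev: {samp_sd:.4f}, CV: {cv:.2f}")
```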
Measures of Shape:
- Skewness: Measures the asymmetry of a distribution.
- Positive Skew: Long tail to the right (Mean > Median > Mode).
- Negative Skew: Long tail to the left (Mean < Median < Mode).
- Kurtosis: Measures the degree to which a distribution is more or less "peaked" than a normal distribution.
- Leptokurtic: More peaked, fatter tails (Kurtosis > 3).
- Platykurtic: Less peaked, thinner tails (Kurtosis < 3).
- Mesokurtic: Normal distribution (Kurtosis = 3).
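A short sketch computing these shape measures on an assumed return sample with scipy.stats; note that scipy's kurtosis reports excess kurtosis (normal = 0) by default, while the thresholds above use raw kurtosis (normal = 3):

```python
# Skewness and kurtosis of an assumed return sample.
from scipy.stats import skew, kurtosis

returns = [0.02, 0.01, -0.04, 0.03, 0.00, -0.09, 0.02, 0.05, 0.01, -0.01]  # assumed

s = skew(returns)                        # negative value => long left tail
k_excess = kurtosis(returns)             # excess kurtosis (normal = 0)
k_raw = kurtosis(returns, fisher=False)  # raw kurtosis (normal = 3)

print(f"skewness: {s:.3f}, excess kurtosis: {k_excess:.3f}, kurtosis: {k_raw:.3f}")
```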
Probability Trees and Conditional Expectations
Key Concepts:
- Conditional Probability P(A | B): The probability of event A occurring, given that event B has occurred.
- Total Probability Rule: Used to find the unconditional probability of an event, given conditional probabilities.
- Expected Value E(X): The probability-weighted average of all possible outcomes, E(X) = Σ P(x_i) x_i.
- Covariance: Measures how two variables move together.
- Correlation: A standardized measure of covariance, ranging from -1 to +1.
- Bayes' Theorem: Updates a prior probability with new information to get a posterior probability.
A probability tree is a visual tool to represent outcomes and their associated probabilities in a sequence of events.
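A minimal sketch in Python of these rules on an assumed two-state tree (expansion vs. recession); all probabilities and state-contingent returns below are illustrative assumptions:

```python
# Total probability rule, conditional expectation, and Bayes' theorem on an
# assumed two-branch tree: the economy is either an expansion or a recession.

p_expansion, p_recession = 0.70, 0.30                      # prior probabilities (assumed)
p_up_given_expansion, p_up_given_recession = 0.80, 0.30    # P(stock up | state), assumed

# Total probability rule: unconditional P(stock up).
p_up = p_up_given_expansion * p_expansion + p_up_given_recession * p_recession

# Expected return, weighting assumed state-contingent returns by their probabilities.
ret_expansion, ret_recession = 0.12, -0.05
expected_return = ret_expansion * p_expansion + ret_recession * p_recession

# Bayes' theorem: update the probability of expansion after observing the stock go up.
p_expansion_given_up = p_up_given_expansion * p_expansion / p_up

print(f"P(up) = {p_up:.2f}, E(R) = {expected_return:.2%}, "
      f"P(expansion | up) = {p_expansion_given_up:.3f}")
```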
Portfolio Mathematics
The risk and return of a portfolio depend on the risk/return of its individual assets and the correlation between them.
Asset | Weight (w) | Expected Return E(R) | Standard Deviation (σ) |
---|---|---|---|
Stock A | w_A | E(R_A) | σ_A |
Stock B | w_B | E(R_B) | σ_B |
Portfolio Expected Return: E(R_p) = w_A E(R_A) + w_B E(R_B)
Portfolio Variance: σ_p² = w_A²σ_A² + w_B²σ_B² + 2 w_A w_B σ_A σ_B ρ_AB, where ρ_AB is the correlation coefficient between A and B.
Key Insight: Diversification benefits increase as the correlation between assets decreases. Perfect negative correlation (ρ_AB = −1) can potentially eliminate all risk.
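As a sketch, the two-asset formulas above can be evaluated across several correlations; the weights, expected returns, and volatilities below are illustrative assumptions:

```python
# Two-asset portfolio return and risk for assumed inputs and a range of correlations.
from math import sqrt

w_a, w_b = 0.60, 0.40
er_a, er_b = 0.10, 0.06          # expected returns (assumed)
sd_a, sd_b = 0.20, 0.12          # standard deviations (assumed)

er_p = w_a * er_a + w_b * er_b   # portfolio expected return

for rho in (1.0, 0.3, 0.0, -1.0):
    var_p = (w_a * sd_a) ** 2 + (w_b * sd_b) ** 2 + 2 * w_a * w_b * sd_a * sd_b * rho
    print(f"rho = {rho:+.1f}: E(Rp) = {er_p:.2%}, sigma_p = {sqrt(var_p):.2%}")
```

Note that with ρ = −1 the portfolio standard deviation falls to zero only for the specific weights w_A = σ_B / (σ_A + σ_B); for other weights it simply falls as the correlation falls.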
Simulation Methods
Monte Carlo Simulation is a computer-based method that uses random sampling to model the probability of different outcomes in a process that cannot easily be predicted due to the intervention of random variables.
Steps:
- Specify the model: Define the quantity of interest and the variables that influence it.
- Define probability distributions: Specify distributions for the random variables.
- Generate random inputs: Draw random values from the specified distributions.
- Run the simulation: Calculate the quantity of interest thousands of times.
- Analyze the results: The distribution of outcomes provides an estimate of the expected value and risk of the quantity.
Applications: Valuing complex derivatives, modeling portfolio risk (VaR), financial planning. Limitations: Highly dependent on the specified model and input assumptions ("garbage in, garbage out").
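A minimal sketch of the steps above, assuming normally distributed annual returns with illustrative parameters, and reading off the mean and a 95% VaR from the simulated distribution:

```python
# Monte Carlo sketch: simulate one-year portfolio returns assumed to be normally
# distributed, then estimate the mean and a 5%-tail value-at-risk.
import random

random.seed(42)
mu, sigma = 0.07, 0.15        # assumed annual mean and volatility
n_trials = 100_000

simulated = [random.gauss(mu, sigma) for _ in range(n_trials)]

expected = sum(simulated) / n_trials
simulated.sort()
var_95 = -simulated[int(0.05 * n_trials)]   # loss exceeded in roughly 5% of trials

print(f"simulated mean: {expected:.2%}, 95% VaR: {var_95:.2%}")
```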
Estimation and Inference
Estimation involves using sample data to estimate population parameters (e.g., using the sample mean X̄ to estimate the population mean μ).
Central Limit Theorem (CLT): For any population with mean μ and variance σ², the sampling distribution of the sample mean X̄ for a large sample size (n ≥ 30) will be approximately normal with mean μ and variance σ²/n. This is crucial because it allows us to make inferences about the population mean without knowing the population's distribution.
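A quick simulation sketch of the CLT, drawing sample means from a deliberately non-normal Uniform(0, 1) population (sample sizes and counts are illustrative):

```python
# CLT sketch: sample means from a non-normal (uniform) population still cluster
# around the population mean with variance sigma^2 / n.
import random
from statistics import mean, pvariance

random.seed(0)
population_mean, population_var = 0.5, 1.0 / 12.0   # Uniform(0, 1)
n, n_samples = 30, 20_000

sample_means = [mean(random.random() for _ in range(n)) for _ in range(n_samples)]

print(f"mean of sample means: {mean(sample_means):.4f} (theory: {population_mean})")
print(f"variance of sample means: {pvariance(sample_means):.5f} "
      f"(theory: {population_var / n:.5f})")
```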
Confidence Intervals: An interval estimate provides a range of values within which the true population parameter is expected to lie, with a specified degree of confidence. Formula: Point Estimate ± (Reliability Factor × Standard Error)
- Point Estimate: The sample statistic (e.g., the sample mean X̄).
- Reliability Factor: A number based on the assumed distribution and the confidence level (e.g., 1.96 for 95% confidence using a z-statistic).
- Standard Error: The standard deviation of the sample statistic (e.g., σ/√n for the sample mean).
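For example, a sketch of a 95% z-based interval for a mean, using assumed sample statistics:

```python
# 95% confidence interval for a population mean from an assumed large sample,
# using the z reliability factor of 1.96.
from math import sqrt

sample_mean = 0.08     # assumed
sample_sd = 0.15       # assumed
n = 60

standard_error = sample_sd / sqrt(n)
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```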
Hypothesis Testing
A formal procedure for using sample data to decide whether to reject a statement (hypothesis) about a population parameter.
The Process:
- State the hypotheses:
- Null Hypothesis (H₀): The hypothesis to be tested. Usually a statement of "no effect" or "no difference." It contains an equality sign (=, ≤, or ≥).
- Alternative Hypothesis (Hₐ): The hypothesis accepted if the null is rejected. It contains an inequality sign (≠, <, or >).
- Select the test statistic: e.g., z-test, t-test.
- Specify the level of significance (α): The probability of a Type I error (e.g., 5%).
- State the decision rule: Compare the test statistic to a critical value or compare the p-value to α.
- Calculate the test statistic from the sample data.
- Make a decision: Reject or fail to reject the null hypothesis.
Decision | H₀ is True | H₀ is False |
---|---|---|
Do Not Reject H₀ | Correct Decision | Type II Error (β) |
Reject H₀ | Type I Error (α) | Correct Decision (Power = 1 − β) |
p-value: The smallest level of significance (α) at which H₀ can be rejected.
- If p-value < α, reject H₀.
- If p-value ≥ α, fail to reject H₀.
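As a sketch of the process, a one-sample t-test on an assumed return series using scipy.stats.ttest_1samp (here H₀: mean return = 0 against a two-sided alternative):

```python
# One-sample t-test: is the mean of an assumed return sample different from zero?
from scipy import stats

returns = [0.021, -0.004, 0.013, 0.032, -0.010, 0.018, 0.007, 0.025]  # assumed
alpha = 0.05

t_stat, p_value = stats.ttest_1samp(returns, popmean=0.0)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the mean return is statistically different from zero.")
else:
    print("Fail to reject H0.")
```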
Parametric and Non-Parametric Tests of Independence
Tests used to determine if a relationship exists between two variables.
Feature | Parametric Tests | Non-Parametric Tests |
---|---|---|
Assumptions | Require strong assumptions about the population distribution (e.g., normality). | Make few or no assumptions about the population distribution. |
Data Type | Interval or ratio data. | Nominal or ordinal data (ranks). Can also be used for interval/ratio if parametric assumptions are violated. |
Power | More powerful if assumptions are met. | Less powerful than parametric tests if assumptions are met. |
Example | t-test for the significance of a correlation coefficient. | Spearman rank correlation test, Chi-square test of independence. |
- Use a parametric test when the data is known to be normally distributed and other assumptions are met.
- Use a non-parametric test for ranked data or when you cannot verify the assumptions of a parametric test.
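A brief sketch contrasting the two approaches on the same assumed data: Pearson correlation (parametric) versus Spearman rank correlation (non-parametric), both via scipy.stats:

```python
# Parametric vs. non-parametric tests of association on assumed data.
from scipy import stats

x = [1.2, 2.1, 2.9, 4.2, 5.1, 6.3, 7.0, 8.4]      # assumed
y = [0.8, 1.9, 3.5, 3.9, 5.5, 6.1, 7.8, 8.0]      # assumed

pearson_r, pearson_p = stats.pearsonr(x, y)        # assumes bivariate normality
spearman_rho, spearman_p = stats.spearmanr(x, y)   # works on ranks, fewer assumptions

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.4f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.4f})")
```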
Simple Linear Regression
Models the linear relationship between one dependent variable (Y) and one independent variable (X).
Model Equation: Y_i = b₀ + b₁X_i + ε_i
- Y: Dependent variable
- X: Independent variable
- b₀: Intercept (estimated value of Y when X = 0)
- b₁: Slope coefficient (estimated change in Y for a one-unit change in X)
- ε: Error term (the portion of Y not explained by X)
Key Regression Outputs:
- Coefficient of Determination (R²): The percentage of variation in the dependent variable (Y) that is explained by the independent variable (X). Ranges from 0 to 1.
- Standard Error of the Estimate (SEE): Measures the typical distance between the observed values and the values predicted by the regression line. A smaller SEE indicates a better model fit.
- F-test: Tests the overall significance of the regression model. A significant F-test indicates that at least one independent variable is significant.
- t-test: Tests the significance of individual coefficients (b₀ and b₁). A significant t-test for b₁ indicates a statistically significant relationship between X and Y.
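A minimal sketch of fitting and reading these outputs with scipy.stats.linregress on assumed data; R² is recovered from the reported correlation, and the p-value corresponds to the t-test of the slope:

```python
# Simple linear regression on assumed data: coefficients, R^2, and slope p-value.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]          # independent variable (assumed)
y = [2.1, 3.9, 6.2, 7.8, 10.3, 11.9, 14.2, 15.8]      # dependent variable (assumed)

result = stats.linregress(x, y)

print(f"intercept b0 = {result.intercept:.3f}, slope b1 = {result.slope:.3f}")
print(f"R^2 = {result.rvalue ** 2:.4f}")        # share of variation in y explained by x
print(f"p-value for b1 = {result.pvalue:.2e}")  # t-test of H0: b1 = 0
```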
Introduction to Big Data Techniques
Big Data is characterized by its high Volume, Velocity, and Variety (the 3 V's). It requires advanced tools to capture, store, manage, and analyze.
Key Techniques:
- Machine Learning (ML): Algorithms that learn from data to make predictions or decisions.
- Supervised Learning: Uses labeled data (e.g., regression, classification). Application: Predicting credit default.
- Unsupervised Learning: Uses unlabeled data to find patterns (e.g., clustering). Application: Customer segmentation.
- Deep Learning: A subset of ML using neural networks with many layers.
- Natural Language Processing (NLP): Enables computers to understand and process human language. Application: Analyzing sentiment from news articles or social media posts (text mining).
- Data Visualization: The graphical representation of information and data. Application: Using heat maps or tree maps to visualize portfolio exposures.
Challenges in Finance:
- Overfitting: Creating a model that is too complex and fits the training data's noise, leading to poor performance on new data.
- Data Quality: Unstructured and complex data can be noisy and require significant cleaning.
- Spurious Correlation: Finding relationships that are due to chance rather than a true underlying cause.