jeff3388/awesome-etf-analysis

Awesome ETF Analysis

A practical guide to ETF selection and analysis — expense ratio impact, tracking error measurement, tax efficiency, liquidity assessment, and Python tools for systematic comparison.

Why this list exists: As of 2026 there are 10,000+ ETFs globally. Most investors use 3–5 of them, yet the process of selecting those 3–5 is poorly documented in open-source tools. Morningstar and ETFdb are excellent but paywalled or ad-heavy. This list provides Python scripts and free data sources to systematically compare ETFs on the metrics that actually matter.


ETF Selection Framework

The decision sequence for choosing between similar ETFs:

Step 1: Define what you need
  → What index/exposure do you want?
  → Accumulating (no dividend) or distributing?
  → Account type (taxable vs. IRA)?

Step 2: Filter for adequate liquidity
  → AUM > $500M (minimum) or > $1B (preferred)
  → Average daily volume > 100,000 shares
  → Bid-ask spread < 0.10% for large-caps, < 0.20% for others

Step 3: Compare expense ratios among survivors
  → Difference of 0.05% compounds to real money over decades

Step 4: Check tracking difference (not just tracking error)
  → Tracking difference = index annual return minus ETF annual return (a cost-style measure, so lower is better)
  → A negative tracking difference (ETF outperforms its index) is possible due to securities lending

Step 5: Assess tax efficiency in taxable accounts
  → Capital gains distribution history (past 5 years)
  → Dividend tax treatment (qualified vs. ordinary)
  → ETF structure (open-end vs. UIT vs. grantor trust)

Step 6: Verify index methodology
  → Market-cap weighted vs. equal-weighted vs. fundamental-weighted
  → Rebalancing frequency and reconstitution rules
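
The Step 2 liquidity filter can be sketched as a small screening function. This is a minimal illustration that uses the thresholds listed above; the `EtfQuote` class, its field names, and the tickers are hypothetical, and the spread values would have to come from a broker or data vendor.

```python
from dataclasses import dataclass


@dataclass
class EtfQuote:
    """Hypothetical container for the inputs the Step 2 filter needs."""
    ticker: str
    aum_usd: float           # assets under management, in dollars
    avg_daily_volume: float  # average shares traded per day
    spread_pct: float        # bid-ask spread as a percent of price
    large_cap: bool          # True for large-cap exposure


def passes_liquidity_screen(q: EtfQuote) -> bool:
    """Apply the minimum thresholds from Step 2."""
    max_spread = 0.10 if q.large_cap else 0.20
    return (
        q.aum_usd > 500e6
        and q.avg_daily_volume > 100_000
        and q.spread_pct < max_spread
    )


# Made-up candidates, for illustration only
candidates = [
    EtfQuote("AAA", 2.0e9, 1_500_000, 0.02, True),
    EtfQuote("BBB", 300e6, 250_000, 0.05, True),    # fails the AUM floor
    EtfQuote("CCC", 800e6, 120_000, 0.15, False),   # wider spread allowed
    EtfQuote("DDD", 900e6, 40_000, 0.08, True),     # fails the volume floor
]

survivors = [q.ticker for q in candidates if passes_liquidity_screen(q)]
print(survivors)  # ['AAA', 'CCC']
```

Only the survivors move on to the expense ratio comparison in Step 3.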

Critical ETF Metrics Explained

Expense Ratio (ER)

The annual fee deducted from fund assets. For long-term holders of passive ETFs, it is usually the most important cost factor.

Net Return = Index Return - Expense Ratio - Other Costs

Note: The expense ratio is not the whole cost picture. Securities lending income can partially or fully offset it, while trading costs and cash drag can add to it, so the effective cost (tracking difference) can be lower or higher than the stated ER.

Tracking Difference vs. Tracking Error

These are related but different concepts that are frequently confused:

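Tracking Difference = Index Annual Return − ETF Annual Return
  (Negative = ETF outperformed the index net of costs; positive = underperformed)
  (Many fund providers report the opposite sign, ETF minus index; check the convention before comparing figures)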
  → This is what you actually care about

Tracking Error = Standard Deviation of daily return differences
  → Measures consistency of tracking; high TE means erratic performance vs. index
  → A low TE with a consistently negative tracking difference is the ideal
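
To make the distinction concrete, here is a minimal sketch on synthetic daily returns. It uses the sign convention of this guide (index return minus ETF return, so a negative tracking difference means the ETF beat its index). All return series are invented for illustration, and the helper names are assumptions rather than part of any library.

```python
import statistics


def annualized_return(daily_returns, periods_per_year=252):
    """Compound daily returns and scale to one year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (periods_per_year / len(daily_returns)) - 1.0


def tracking_stats(etf_returns, index_returns, periods_per_year=252):
    """Return (tracking difference, annualized tracking error)."""
    diffs = [e - i for e, i in zip(etf_returns, index_returns)]
    # Cost-style sign: index minus ETF
    td = annualized_return(index_returns) - annualized_return(etf_returns)
    te = statistics.stdev(diffs) * periods_per_year ** 0.5
    return td, te


# 252 synthetic trading days
index = [0.001, -0.002, 0.0015, 0.0005] * 63

# "Steady" fund: lags the index by the same tiny amount every day
steady = [r - 0.000002 for r in index]

# "Erratic" fund: alternately overshoots and undershoots the index
erratic = [r + (0.0005 if k % 2 == 0 else -0.0005) for k, r in enumerate(index)]

td_steady, te_steady = tracking_stats(steady, index)
td_erratic, te_erratic = tracking_stats(erratic, index)

print(f"steady : TD {td_steady:+.3%}, TE {te_steady:.3%}")
print(f"erratic: TD {td_erratic:+.3%}, TE {te_erratic:.3%}")
```

The steady fund has a small positive tracking difference (a cost) with essentially zero tracking error, while the erratic fund shows a much larger tracking error even though its daily misses largely cancel out.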

Securities Lending Income

Vanguard, Fidelity, and iShares ETFs lend portfolio securities to short sellers and pass most or all of the net lending income back to the fund, which benefits ETF shareholders. This can partially or fully offset the expense ratio:

Effective Cost = Expense Ratio − Securities Lending Income

Example: Vanguard VTI
  Stated ER:                  0.03%
  Securities Lending Income: -0.02%
  Effective Cost:             ~0.01%
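
The same arithmetic as a small sketch, converting the effective cost into basis points and into dollars per year on a given balance. The function names and the $100,000 balance are illustrative assumptions; the ER and lending figures are the ones from the example above.

```python
def effective_cost(expense_ratio: float, lending_income: float) -> float:
    """Effective annual cost as a fraction of assets (can be negative)."""
    return expense_ratio - lending_income


def annual_cost_dollars(balance: float, expense_ratio: float,
                        lending_income: float) -> float:
    """Approximate dollars of cost per year on a given balance."""
    return balance * effective_cost(expense_ratio, lending_income)


cost = effective_cost(0.0003, 0.0002)                   # 0.03% ER, 0.02% lending
dollars = annual_cost_dollars(100_000, 0.0003, 0.0002)

print(f"Effective cost: {cost * 1e4:.1f} bps")           # 1.0 bps
print(f"Annual cost on $100,000: ${dollars:,.2f}")       # $10.00
```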

Premium/Discount to NAV

ETFs can trade above (premium) or below (discount) their net asset value. For large, liquid ETFs on major US stocks, this difference is typically < 0.05%. For illiquid or international ETFs, it can be significant.

import yfinance as yf

def check_premium_discount(etf_ticker: str) -> dict:
    """
    Check ETF's approximate premium/discount to NAV.
    Note: Intraday NAV (iNAV) is more accurate but harder to access freely.
    """
    etf = yf.Ticker(etf_ticker)
    info = etf.info

    market_price = info.get('regularMarketPrice', None)
    nav = info.get('navPrice', None)

    if market_price and nav:
        premium_pct = (market_price - nav) / nav * 100
        return {
            'Ticker': etf_ticker,
            'Market Price': f"${market_price:.2f}",
            'NAV': f"${nav:.2f}",
            'Premium/Discount': f"{premium_pct:+.3f}%"
        }
    return {'Ticker': etf_ticker, 'Note': 'NAV not available via yfinance'}

The Long-Term Cost of Expense Ratios

Small differences in expense ratios compound to significant amounts over time:

import pandas as pd

def expense_ratio_drag(
    investment: float,
    gross_return: float,    # Annual return before fees (e.g., 0.10 for 10%)
    years: int,
    expense_ratios: list    # e.g., [0.0003, 0.0007, 0.0020, 0.0075]
) -> pd.DataFrame:
    """
    Show the compounded cost of different expense ratios over time.
    """
    rows = []
    for er in expense_ratios:
        net_return = gross_return - er
        final_value = investment * (1 + net_return) ** years
        gross_value = investment * (1 + gross_return) ** years
        cost = gross_value - final_value

        rows.append({
            'Expense Ratio': f"{er:.2%}",
            'Net Annual Return': f"{net_return:.2%}",
            f'Value after {years}yr': f"${final_value:,.0f}",
            'Cost vs. No-Fee': f"${cost:,.0f}",
            'Cost %': f"{cost / gross_value:.1%}",
        })

    df = pd.DataFrame(rows)
    print(f"\nExpense Ratio Impact: ${investment:,.0f} invested, {gross_return:.0%} gross return, {years} years\n")
    print(df.to_string(index=False))
    return df


expense_ratio_drag(
    investment=100_000,
    gross_return=0.10,
    years=30,
    expense_ratios=[0.0003, 0.0006, 0.0020, 0.0050, 0.0075, 0.0100]
)

# Output (approximate; the no-fee ending value is about $1,745k):
# ER 0.03% → $1,731k (cost: $14k)
# ER 0.06% → $1,717k (cost: $28k)
# ER 0.20% → $1,652k (cost: $93k)
# ER 0.50% → $1,522k (cost: $223k)
# ER 0.75% → $1,421k (cost: $324k)
# ER 1.00% → $1,327k (cost: $418k)

Key takeaways:

  • Going from 1.00% to 0.03% ER saves ~$404,000 over 30 years on a $100,000 investment (at 10% gross return)
  • Even the difference between 0.03% (VTI) and 0.20% (some higher-cost index funds) is about $78,000 over 30 years
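
Under the same simple model (net return = gross return minus ER), the share of the no-fee ending balance lost to fees does not depend on the amount invested. The sketch below computes that share exactly and compares it with a first-order approximation, years × ER / (1 + gross return). Both function names are illustrative assumptions.

```python
def fee_drag_fraction(gross_return: float, expense_ratio: float, years: int) -> float:
    """Exact fraction of the no-fee ending balance lost to fees."""
    ratio = (1.0 + gross_return - expense_ratio) / (1.0 + gross_return)
    return 1.0 - ratio ** years


def approx_fee_drag(gross_return: float, expense_ratio: float, years: int) -> float:
    """First-order approximation; overstates the drag for large fees or long horizons."""
    return years * expense_ratio / (1.0 + gross_return)


for er in (0.0003, 0.0020, 0.0100):
    exact = fee_drag_fraction(0.10, er, 30)
    approx = approx_fee_drag(0.10, er, 30)
    print(f"ER {er:.2%}: exact {exact:.2%}, first-order {approx:.2%}")
```

The approximation is handy for mental arithmetic at low expense ratios; for fees near 1% over several decades, the exact formula is the safer choice.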

Tracking Error & Tracking Difference

Calculating Tracking Difference

import yfinance as yf
import pandas as pd

def calculate_tracking_difference(etf: str, benchmark: str,
                                   start: str = '2020-01-01') -> dict:
    """
    Estimate tracking difference between an ETF and its benchmark proxy.

    Args:
        etf: ETF ticker (e.g., 'VTI')
        benchmark: Benchmark proxy ticker (e.g., 'ITOT' or another ETF tracking a comparable index)

    Note: True tracking difference requires the actual index return, which
    is only available from fund providers. This uses a comparable ETF as proxy.
    """
    data = yf.download([etf, benchmark], start=start, auto_adjust=True)['Close']
    returns = data.pct_change().dropna()

    etf_annual = (1 + returns[etf]).prod() ** (252 / len(returns)) - 1
    bmark_annual = (1 + returns[benchmark]).prod() ** (252 / len(returns)) - 1

    # Cost-style sign, matching the prose: negative = ETF beat the benchmark proxy
    tracking_diff = bmark_annual - etf_annual
    tracking_error = (returns[etf] - returns[benchmark]).std() * (252 ** 0.5)

    return {
        'ETF': etf,
        'Benchmark Proxy': benchmark,
        'ETF Annual Return': f"{etf_annual:.3%}",
        'Benchmark Annual Return': f"{bmark_annual:.3%}",
        'Tracking Difference': f"{tracking_diff:+.3%}",
        'Tracking Error (Ann.)': f"{tracking_error:.3%}",
        'Period': f"{start} to {returns.index[-1].date()}",
    }

# Compare two S&P 500 ETFs (VOO vs. IVV as proxies for each other)
print(calculate_tracking_difference('VOO', 'IVV', start='2015-01-01'))

# Compare total market ETFs
print(calculate_tracking_difference('VTI', 'ITOT', start='2015-01-01'))

Annual Tracking Difference: Real-World Examples (2026 estimates)

| ETF | Index | Stated ER | Approx. Tracking Difference | Explanation |
|---|---|---|---|---|
| VTI | CRSP US Total Market | 0.03% | ~-0.01% (ETF outperforms) | Securities lending offsets cost |
| VOO | S&P 500 | 0.03% | ~0.00% | Near-perfect tracking |
| SCHB | Dow Jones US Broad | 0.03% | ~+0.02% | Slight underperformance |
| IVV | S&P 500 | 0.03% | ~-0.01% | iShares lending program |
| SPY | S&P 500 (SPDR) | 0.0945% | ~+0.03% | Higher ER, UIT structure |
| QQQ | NASDAQ-100 | 0.20% | ~+0.10% | High ER, less lending offset |

Note: Tracking difference changes year to year. Verify with fund provider's annual reports.


Tax Efficiency Analysis

ETF Structures and Tax Treatment

| Structure | Examples | Capital Gains Dist. Risk | Dividend Treatment |
|---|---|---|---|
| Open-End Fund (ETF) | VTI, VOO, SCHB | Very low (in-kind creation/redemption) | Qualified if held 61+ days |
| Unit Investment Trust (UIT) | SPY, QQQ, DIA | Low-medium (must hold all index stocks) | Qualified where eligible; dividends are held in cash until paid out, not reinvested |
| Grantor Trust | GLD, SLV | None (pass-through) | N/A (physically held asset) |
| Exchange-Traded Note (ETN) | Some commodity, VIX products | None (no distributions; return comes from price appreciation) | No dividends; tax treatment differs, verify |

Why open-end ETFs are most tax-efficient: The in-kind creation/redemption mechanism lets authorized participants swap ETF shares for the underlying basket of stocks, so the fund rarely needs to sell securities and realize gains. This is why broad index equity ETFs almost never distribute capital gains.

Screening for Capital Gains Distributions

import pandas as pd
import yfinance as yf

def check_capital_gains_history(tickers: list) -> pd.DataFrame:
    """
    Collect basic fund facts as a starting point for a capital gains check.
    Note: yfinance combines dividends and capital gains in .dividends
    For accurate CG data, check fund provider websites or Morningstar.
    """
    # Practical approach: check dividend yield relative to category
    results = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        # .info values can be missing or None, so fall back explicitly
        div_yield = info.get('dividendYield') or 0
        results.append({
            'ETF': ticker,
            'Fund Type': info.get('quoteType', 'N/A'),
            'Inception Date': info.get('fundInceptionDate', 'N/A'),
            # Check the scale of this field against the provider's published yield
            'Dividend Yield': f"{div_yield * 100:.2f}%",
            'Tax Note': 'Verify capital gains at fund provider website'
        })
    return pd.DataFrame(results)

# For actual capital gains history, use:
# - Vanguard: personal.vanguard.com → Tax center
# - iShares: ishares.com → Tax information
# - Schwab: schwabfunds.com → Tax information

Tax-Efficient ETF Ranking by Category

Most tax-efficient (for taxable accounts):

  1. Broad US market ETFs (VTI, SCHB, ITOT) — near-zero capital gains distributions
  2. Developed international (IEFA, VEA) — very low capital gains
  3. S&P 500 ETFs (VOO, IVV, SPLG) — essentially zero capital gains

Less tax-efficient (better in tax-advantaged accounts):

  1. Bond ETFs (BND, AGG) — interest income taxed as ordinary income
  2. REIT ETFs (VNQ, SCHH) — dividends mostly ordinary income
  3. High-dividend ETFs (SCHD, HDV) — higher dividend yield = more taxable income each year
  4. Actively managed ETFs — higher turnover = more potential gain distributions
  5. Commodity ETFs structured as limited partnerships (these issue a Schedule K-1 at tax time)
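
A rough way to compare the categories above is annual tax drag: distribution yield multiplied by a blended tax rate. The sketch below is illustrative only. The yields, qualified-dividend shares, and the 15% and 24% rates are hypothetical placeholders, not data for any specific fund or taxpayer.

```python
def annual_tax_drag(yield_pct: float, qualified_share: float,
                    qualified_rate: float, ordinary_rate: float) -> float:
    """Tax paid per year, as a percent of the holding's value."""
    blended_rate = (qualified_share * qualified_rate
                    + (1.0 - qualified_share) * ordinary_rate)
    return yield_pct * blended_rate


# Hypothetical tax rates (assumptions; actual rates depend on the investor)
QUALIFIED_RATE = 0.15
ORDINARY_RATE = 0.24

# Hypothetical (yield %, share of distributions taxed as qualified dividends)
holdings = {
    "broad equity ETF": (1.3, 0.95),
    "bond ETF": (4.0, 0.00),
    "REIT ETF": (3.8, 0.10),
}

drag = {
    name: annual_tax_drag(y, q, QUALIFIED_RATE, ORDINARY_RATE)
    for name, (y, q) in holdings.items()
}

for name, value in sorted(drag.items(), key=lambda kv: kv[1]):
    print(f"{name:<18} ~{value:.2f}% of value per year")
```

With these placeholder inputs the broad equity fund carries the lowest drag, which is the reasoning behind holding bond and REIT funds in tax-advantaged accounts where possible.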

Liquidity & Bid-Ask Spread Assessment

For large-cap broad market ETFs (VTI, SPY, QQQ), liquidity is not a concern. It matters most for:

  • Sector ETFs
  • Emerging market ETFs
  • Small/mid-cap ETFs
  • Thematic/niche ETFs

import pandas as pd
import yfinance as yf

def liquidity_assessment(tickers: list) -> pd.DataFrame:
    """Assess ETF liquidity metrics."""
    results = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        # .info values can be missing or None, so fall back explicitly
        aum = info.get('totalAssets') or 0
        avg_volume = info.get('averageVolume') or 0
        price = info.get('regularMarketPrice') or 0

        # Estimate daily dollar volume
        daily_dollar_vol = avg_volume * price

        # Rough bid-ask estimate (yfinance doesn't provide real-time spread)
        # For actual spread: check Bloomberg, Morningstar, or broker platform
        liquidity_tier = (
            'Excellent' if aum > 5e9 else
            'Good' if aum > 1e9 else
            'Adequate' if aum > 500e6 else
            'Caution' if aum > 100e6 else
            'High Risk of Closure'
        )

        results.append({
            'ETF': ticker,
            'AUM': f"${aum/1e9:.1f}B" if aum > 1e9 else f"${aum/1e6:.0f}M",
            'Avg Daily Volume': f"{avg_volume:,.0f}",
            'Daily $ Volume': f"${daily_dollar_vol/1e6:.1f}M",
            'Liquidity': liquidity_tier,
        })

    return pd.DataFrame(results)

# Compare similar ETFs
print(liquidity_assessment(['VTI', 'SCHB', 'ITOT', 'SPTM']))

ETF closure risk: ETFs with < $50M AUM are at risk of being liquidated by their issuer (not a permanent loss, but forces a taxable event). Prefer ETFs with > $500M AUM.


Python Tools & Scripts

Full ETF Comparison Tool

import yfinance as yf
import pandas as pd

def compare_etfs(tickers: list, start: str = '2015-01-01') -> pd.DataFrame:
    """
    Comprehensive ETF comparison: returns, cost, and basic structure.
    """
    price_data = yf.download(tickers, start=start, auto_adjust=True)['Close']
    returns = price_data.pct_change().dropna()

    rows = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        r = returns[ticker].dropna()

        annual_ret = (1 + r).prod() ** (252 / len(r)) - 1
        annual_vol = r.std() * (252 ** 0.5)
        sharpe = (annual_ret - 0.05) / annual_vol  # Assume 5% risk-free (adjust to current)
        dd = ((1 + r).cumprod() / (1 + r).cumprod().cummax() - 1).min()
        er = info.get('annualReportExpenseRatio') or info.get('netExpenseRatio') or 0

        rows.append({
            'ETF': ticker,
            'Category': info.get('category', 'N/A'),
            'AUM': f"${(info.get('totalAssets') or 0)/1e9:.1f}B",  # guard against None
            'Expense Ratio': f"{(er or 0):.2%}",
            f'Ann. Return ({start[:4]}-)': f"{annual_ret:.2%}",
            'Annual Volatility': f"{annual_vol:.2%}",
            'Sharpe (5% RF)': f"{sharpe:.2f}",
            'Max Drawdown': f"{dd:.2%}",
        })

    df = pd.DataFrame(rows)
    return df


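# US Total Market comparison (FSKAX is a mutual fund, included as a reference point)
print(compare_etfs(['VTI', 'SCHB', 'ITOT', 'FSKAX'], start='2015-01-01'))

# Core bond comparison (FXNAX is a mutual fund, included as a reference point)
print(compare_etfs(['BND', 'AGG', 'SCHZ', 'FXNAX'], start='2015-01-01'))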

ETF Reference: Key Categories

U.S. Equity — Core Holdings

| Category | Cheapest Options | AUM Tier | Index |
|---|---|---|---|
| Total US Market | VTI (0.03%), SCHB (0.03%), ITOT (0.03%) | $300B+ | CRSP / DJ / S&P TMI |
| S&P 500 | VOO (0.03%), IVV (0.03%), SPLG (0.02%) | $1T+ | S&P 500 |
| S&P 500 (legacy) | SPY (0.0945%) | $550B+ | S&P 500 (UIT) |
| US Small Cap | VB (0.05%), SCHA (0.04%) | $50B+ | CRSP / DJ |
| US Mid Cap | VO (0.04%), SCHM (0.04%) | $50B+ | CRSP / DJ |
| NASDAQ-100 | QQQ (0.20%), QQQM (0.15%) | $200B+ | NASDAQ-100 |

International Equity

| Category | Options | ER | Index |
|---|---|---|---|
| Total International | VXUS (0.07%), IXUS (0.07%) | Low | FTSE Global All Cap ex US / MSCI ACWI ex USA IMI |
| Developed Markets | VEA (0.05%), IEFA (0.07%), SCHF (0.06%) | Low | FTSE / MSCI EAFE |
| Emerging Markets | VWO (0.08%), IEMG (0.09%) | Low | FTSE / MSCI EM |

Fixed Income — Core

| Category | Options | ER |
|---|---|---|
| US Total Bond | BND (0.03%), AGG (0.03%), SCHZ (0.03%) | Very low |
| Short-Term | VGSH, SHY, SCHO | Very low |
| Intermediate-Term | VGIT, IEI, SCHR | Very low |
| Long-Term | VGLT, TLH | Low |
| TIPS (Inflation) | VTIP (0.04%), SCHP (0.03%) | Very low |

Specialty / Factor

| Category | Options | ER |
|---|---|---|
| Dividend Growth | VIG (0.06%), DGRO (0.08%), SCHD (0.06%) | Low |
| Value Factor | VTV (0.04%), SCHV (0.04%), IVE (0.18%) | Low-medium |
| Small-Cap Value | VBR (0.07%), IJS (0.18%) | Low-medium |
| Momentum | MTUM (0.15%), QMOM (0.35%) | Medium |
| Quality | QUAL (0.15%), JQUA (0.12%) | Low-medium |

Free Data Sources for ETF Research

| Source | What You Get | Access |
|---|---|---|
| ETF provider websites | Exact expense ratios, tracking difference, capital gains history | Free, no API |
| yfinance | Basic ETF info, price data, dividends | Free (pip install yfinance) |
| ETFdb.com | Comparison tools, screener | Free basic, paid premium |
| etf.com | Fund flows, analytics | Free basic |
| Morningstar | Fund ratings, portfolio analytics | Free basic, paid for full data |
| SEC EDGAR (N-CEN, N-PORT) | Fund holdings, expense disclosures | Free API |
| FRED | Index-level and interest rate series (e.g., SP500); not ETF prices | Free (API key needed for the API) |

Common ETF Misconceptions

"Higher AUM = Better ETF" — Not always. $500M+ is sufficient for most purposes; beyond that, AUM doesn't affect tracking quality.

"The ETF with the lowest ER is always best" — Tracking difference matters more than stated ER. An ETF with 0.06% ER but -0.02% tracking difference costs less than one with 0.03% ER and +0.03% tracking difference.

"All S&P 500 ETFs are identical" — Nearly identical returns, but structural differences matter: a UIT (SPY) must hold dividends in cash until they are paid out rather than reinvesting them, which slightly reduces returns vs. open-end ETFs (VOO, IVV) in rising markets.

"ETFs are always tax-efficient" — Open-end index equity ETFs usually are. Bond ETFs generate ordinary interest income. Commodity ETFs, actively managed ETFs, and some leveraged ETFs can distribute taxable gains.

"Inverse/leveraged ETFs are long-term holds" — Volatility decay makes leveraged ETFs unsuitable for long-term holding. They are designed for short-term tactical use. A 2× ETF does not deliver 2× the long-term return.

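The volatility decay behind the last point can be shown with a few lines of arithmetic. This is a minimal sketch that assumes a daily-reset 2× product and ignores fees and financing costs; the alternating +5% / -5% return path is invented for illustration.

```python
def compound(daily_returns, leverage=1.0):
    """Total return from compounding daily returns at a fixed daily leverage."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + leverage * r
    return value - 1.0


# 20 trading days of a choppy, roughly sideways market
choppy = [0.05, -0.05] * 10

index_total = compound(choppy)                   # unleveraged index
leveraged_total = compound(choppy, leverage=2.0) # daily-reset 2x product

print(f"Index:        {index_total:+.2%}")
print(f"2x daily ETF: {leveraged_total:+.2%}")
print(f"2x the index: {2 * index_total:+.2%}")
```

On this path the index loses roughly 2.5% while the 2× product loses roughly 9.6%, well beyond twice the index loss. The gap grows with volatility and holding period.
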

Contributing

See CONTRIBUTING.md. Most welcome:

  • Tracking difference data with dates and sources (fund provider annual reports)
  • Non-US ETF equivalents (European UCITS ETFs, Australian ETFs)
  • Updated expense ratio data when providers cut fees

License

CC0 1.0 Universal — Public domain.
