A practical guide to ETF selection and analysis — expense ratio impact, tracking error measurement, tax efficiency, liquidity assessment, and Python tools for systematic comparison.
Why this list exists: As of 2026 there are 10,000+ ETFs globally. Most investors use 3–5 of them, yet the process of selecting those 3–5 is poorly documented in open-source tools. Morningstar and ETFdb are excellent but paywalled or ad-heavy. This list provides Python scripts and free data sources to systematically compare ETFs on the metrics that actually matter.
- ETF Selection Framework
- Critical ETF Metrics Explained
- The Long-Term Cost of Expense Ratios
- Tracking Error & Tracking Difference
- Tax Efficiency Analysis
- Liquidity & Bid-Ask Spread Assessment
- Python Tools & Scripts
- ETF Reference: Key Categories
- Free Data Sources for ETF Research
- Common ETF Misconceptions
The decision sequence for choosing between similar ETFs:
Step 1: Define what you need
→ What index/exposure do you want?
→ Accumulating (dividends reinvested inside the fund) or distributing (dividends paid out)?
→ Account type (taxable vs. IRA)?
Step 2: Filter for adequate liquidity
→ AUM > $500M (minimum) or > $1B (preferred)
→ Average daily volume > 100,000 shares
→ Bid-ask spread < 0.10% for large-caps, < 0.20% for others
Step 3: Compare expense ratios among survivors
→ Difference of 0.05% compounds to real money over decades
Step 4: Check tracking difference (not just tracking error)
→ Tracking difference = index annual return − ETF annual return (the realized annual cost of holding the ETF)
→ A negative tracking difference (ETF outperforms its index) is possible due to securities lending
Step 5: Assess tax efficiency in taxable accounts
→ Capital gains distribution history (past 5 years)
→ Dividend tax treatment (qualified vs. ordinary)
→ ETF structure (open-end vs. UIT vs. grantor trust)
Step 6: Verify index methodology
→ Market-cap weighted vs. equal-weighted vs. fundamental-weighted
→ Rebalancing frequency and reconstitution rules
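The Step 2 thresholds translate directly into a filter. The sketch below is one minimal way to encode them; the `EtfSnapshot` class, its field names, and the three sample funds are hypothetical, made up for illustration. Real inputs would come from a fund provider or broker.

```python
from dataclasses import dataclass


@dataclass
class EtfSnapshot:
    ticker: str
    aum_usd: float           # assets under management, in dollars
    avg_daily_volume: float  # shares traded per day
    spread_pct: float        # bid-ask spread as a percent of price
    is_large_cap: bool


def passes_liquidity_screen(etf: EtfSnapshot) -> bool:
    """Apply the Step 2 minimums: AUM, daily volume, and bid-ask spread."""
    max_spread = 0.10 if etf.is_large_cap else 0.20
    return (
        etf.aum_usd > 500e6
        and etf.avg_daily_volume > 100_000
        and etf.spread_pct < max_spread
    )


# Made-up candidates, for illustration only
candidates = [
    EtfSnapshot("AAA", 12e9, 2_500_000, 0.02, True),
    EtfSnapshot("BBB", 80e6, 40_000, 0.35, False),
    EtfSnapshot("CCC", 900e6, 150_000, 0.15, False),
]
survivors = [c.ticker for c in candidates if passes_liquidity_screen(c)]
print(survivors)  # ['AAA', 'CCC']
```

Only the survivors move on to the expense ratio and tracking difference comparisons in Steps 3 and 4.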
Expense ratio: the annual fee deducted from fund assets, expressed as a percentage of those assets. For long-term, passive ETFs it is usually the most important cost factor.
Net Return = Index Return - Expense Ratio - Other Costs
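Note: The stated expense ratio is a starting estimate of cost drag, not the final figure. Securities lending income can partially or fully offset it, while trading costs and cash drag can add to it, so the effective cost (tracking difference) may be lower or higher than the stated ER.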
These are related but different concepts that are frequently confused:
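Tracking Difference = Index Annual Return − ETF Annual Return
(Negative = ETF outperformed the index net of costs; positive = underperformed. This guide states tracking difference as a cost; many providers report ETF return minus index return, which flips the sign.)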
→ This is what you actually care about
Tracking Error = Standard Deviation of daily return differences
→ Measures consistency of tracking; high TE means erratic performance vs. index
→ A low TE with a consistently negative tracking difference is the ideal
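A small numeric sketch may help separate the two measures. The return figures below are invented, and per-period returns stand in for the daily differences used in practice. The gap is computed as index minus ETF, so a positive value means the ETF lagged.

```python
import statistics

# Hypothetical per-period returns (decimals) for an index and an ETF tracking it
index_returns = [0.1000, -0.0500, 0.2000, 0.0800]
etf_returns = [0.0998, -0.0503, 0.1999, 0.0797]

# Gap per period: index minus ETF (positive = ETF lagged the index)
gaps = [i - e for i, e in zip(index_returns, etf_returns)]

tracking_difference = statistics.mean(gaps)  # average size of the gap
tracking_error = statistics.stdev(gaps)      # how much the gap varies

print(f"Tracking difference: {tracking_difference:.4%}")
print(f"Tracking error:      {tracking_error:.4%}")
```

Here the ETF lags by a small, steady amount: the tracking difference reflects the size of that lag, while the tracking error stays small because the lag barely changes from period to period.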
Vanguard, Fidelity, and iShares ETFs lend shares to short sellers and return the income to ETF shareholders. This can partially or fully offset the expense ratio:
Effective Cost = Expense Ratio − Securities Lending Income
Example: Vanguard VTI
Stated ER: 0.03%
Securities Lending Income: -0.02%
Effective Cost: ~0.01%
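The same arithmetic as a tiny helper, shown as a sketch. The function name is hypothetical, lending income is passed as a positive number, and the second fund is invented to show that a higher stated ER can still produce a lower effective cost.

```python
def effective_cost(expense_ratio: float, lending_income: float) -> float:
    """Effective annual cost in percent of assets: stated ER minus lending income."""
    return expense_ratio - lending_income


# Figures from the VTI example above (percent of assets per year)
vti_cost = effective_cost(expense_ratio=0.03, lending_income=0.02)

# Invented fund: higher stated ER, but more lending income
other_cost = effective_cost(expense_ratio=0.06, lending_income=0.08)

balance = 100_000
print(f"VTI-like fund: {vti_cost:.2f}% = ${balance * vti_cost / 100:.2f} per year")
print(f"Invented fund: {other_cost:.2f}% = ${balance * other_cost / 100:.2f} per year")
```

Lending income varies from year to year, so the effective cost should be read from the fund's reported returns, not projected from a single year.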
ETFs can trade above (premium) or below (discount) their net asset value. For large, liquid ETFs on major US stocks, this difference is typically < 0.05%. For illiquid or international ETFs, it can be significant.
```python
import yfinance as yf


def check_premium_discount(etf_ticker: str) -> dict:
    """
    Check ETF's approximate premium/discount to NAV.
    Note: Intraday NAV (iNAV) is more accurate but harder to access freely.
    """
    etf = yf.Ticker(etf_ticker)
    info = etf.info
    market_price = info.get('regularMarketPrice', None)
    nav = info.get('navPrice', None)
    if market_price and nav:
        premium_pct = (market_price - nav) / nav * 100
        return {
            'Ticker': etf_ticker,
            'Market Price': f"${market_price:.2f}",
            'NAV': f"${nav:.2f}",
            'Premium/Discount': f"{premium_pct:+.3f}%"
        }
    return {'Ticker': etf_ticker, 'Note': 'NAV not available via yfinance'}
```
Small differences in expense ratios compound to significant amounts over time:
```python
import pandas as pd


def expense_ratio_drag(
    investment: float,
    gross_return: float,   # Annual return before fees (e.g., 0.10 for 10%)
    years: int,
    expense_ratios: list   # e.g., [0.0003, 0.0007, 0.0020, 0.0075]
) -> pd.DataFrame:
    """
    Show the compounded cost of different expense ratios over time.
    """
    # No-fee baseline: what the investment would grow to at the gross return
    gross_value = investment * (1 + gross_return) ** years
    rows = []
    for er in expense_ratios:
        net_return = gross_return - er
        final_value = investment * (1 + net_return) ** years
        cost = gross_value - final_value
        rows.append({
            'Expense Ratio': f"{er:.2%}",
            'Net Annual Return': f"{net_return:.2%}",
            f'Value after {years}yr': f"${final_value:,.0f}",
            'Cost vs. No-Fee': f"${cost:,.0f}",
            'Cost %': f"{cost / gross_value:.1%}",
        })
    df = pd.DataFrame(rows)
    print(f"\nExpense Ratio Impact: ${investment:,.0f} invested, {gross_return:.0%} gross return, {years} years\n")
    print(df.to_string(index=False))
    return df


expense_ratio_drag(
    investment=100_000,
    gross_return=0.10,
    years=30,
    expense_ratios=[0.0003, 0.0006, 0.0020, 0.0050, 0.0075, 0.0100]
)
# Output (approximate; the no-fee baseline is about $1,745k):
# ER 0.03% → $1,731k (cost: $14k)
# ER 0.06% → $1,717k (cost: $28k)
# ER 0.20% → $1,652k (cost: $93k)
# ER 0.50% → $1,522k (cost: $223k)
# ER 0.75% → $1,421k (cost: $324k)
# ER 1.00% → $1,327k (cost: $418k)
```
Key takeaways:
- Going from a 1.00% to a 0.03% ER saves roughly $400,000 over 30 years on a $100,000 investment (at 10% gross return)
- Even the difference between 0.03% (VTI) and 0.20% (typical of pricier index funds) is roughly $78,000 over 30 years
```python
import yfinance as yf
import pandas as pd


def calculate_tracking_difference(etf: str, benchmark: str,
                                  start: str = '2020-01-01') -> dict:
    """
    Estimate tracking difference between an ETF and its benchmark proxy.
    Args:
        etf: ETF ticker (e.g., 'VTI')
        benchmark: Benchmark proxy ticker (e.g., 'ITOT' or another ETF on a similar index)
        start: First date of the comparison window (YYYY-MM-DD)
    Note: True tracking difference requires the actual index return, which
    is only available from fund providers. This uses a comparable ETF as proxy.
    """
    data = yf.download([etf, benchmark], start=start, auto_adjust=True)['Close']
    # Keep only days where both tickers have a price
    returns = data.dropna().pct_change().dropna()
    # Annualize the compounded return, assuming 252 trading days per year
    etf_annual = (1 + returns[etf]).prod() ** (252 / len(returns)) - 1
    bmark_annual = (1 + returns[benchmark]).prod() ** (252 / len(returns)) - 1
    # Cost convention used in this guide: benchmark minus ETF,
    # so a negative value means the ETF outperformed
    tracking_diff = bmark_annual - etf_annual
    tracking_error = (returns[etf] - returns[benchmark]).std() * (252 ** 0.5)
    return {
        'ETF': etf,
        'Benchmark Proxy': benchmark,
        'ETF Annual Return': f"{etf_annual:.3%}",
        'Benchmark Annual Return': f"{bmark_annual:.3%}",
        'Tracking Difference': f"{tracking_diff:+.3%}",
        'Tracking Error (Ann.)': f"{tracking_error:.3%}",
        'Period': f"{start} to {returns.index[-1].date()}",
    }


# Compare two S&P 500 ETFs (VOO vs. IVV as proxies for each other)
print(calculate_tracking_difference('VOO', 'IVV', start='2015-01-01'))
# Compare total market ETFs
print(calculate_tracking_difference('VTI', 'ITOT', start='2015-01-01'))
```
| ETF | Index | Stated ER | Approx. Tracking Difference | Explanation |
|---|---|---|---|---|
| VTI | CRSP US Total Market | 0.03% | ~-0.01% (ETF outperforms) | Securities lending offsets cost |
| VOO | S&P 500 | 0.03% | ~0.00% | Near-perfect tracking |
| SCHB | Dow Jones US Broad | 0.03% | ~+0.02% | Slight underperformance |
| IVV | S&P 500 | 0.03% | ~-0.01% | iShares lending program |
| SPY | S&P 500 (SPDR) | 0.0945% | ~+0.03% | Higher ER, UIT structure |
| QQQ | NASDAQ-100 | 0.20% | ~+0.10% | High ER, less lending offset |
Note: Tracking difference changes year to year. Verify with fund provider's annual reports.
| Structure | Examples | Capital Gains Dist. Risk | Dividend Treatment |
|---|---|---|---|
| Open-End Fund (ETF) | VTI, VOO, SCHB | Very low (in-kind creation/redemption) | Qualified if held 61+ days |
| Unit Investment Trust (UIT) | SPY, QQQ, DIA | Low-medium (must hold all index stocks) | Same qualified/ordinary rules as open-end ETFs; dividends are held in cash until paid out, not reinvested |
| Grantor Trust | GLD, SLV | None (pass-through) | N/A (physically held asset) |
| Exchange-Traded Note (ETN) | Some commodity, VIX products | None (an unsecured debt note with no underlying holdings) | No dividends; returns come from price appreciation and the tax treatment differs, so verify |
Why open-end ETFs are most tax-efficient: The in-kind creation/redemption mechanism lets authorized participants (large institutional traders) swap ETF shares for the underlying basket of stocks. The fund hands over securities instead of selling them, so meeting redemptions does not force it to realize gains. This is why broad equity index ETFs almost never distribute capital gains.
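A toy model can make the mechanism concrete. This is a simplified sketch, not how any real fund's accounting works: the `Lot` class, the `redeem` function, and the prices are all hypothetical. It assumes a cash redemption forces the fund to sell its highest-basis lots (the tax-minimizing choice), while an in-kind redemption lets it hand off its lowest-basis lots without realizing anything.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    shares: float
    cost_basis: float  # purchase price per share


def redeem(lots, shares_out, price, in_kind):
    """Remove `shares_out` shares from the fund.

    Returns (remaining_lots, gain_realized_inside_the_fund).
    """
    # In-kind: hand off lowest-basis lots. Cash: sell highest-basis lots.
    ordered = sorted(lots, key=lambda lot: lot.cost_basis, reverse=not in_kind)
    remaining, realized = [], 0.0
    for lot in ordered:
        take = min(lot.shares, shares_out)
        shares_out -= take
        if not in_kind:
            realized += take * (price - lot.cost_basis)  # a sale realizes the gain
        if lot.shares > take:
            remaining.append(Lot(lot.shares - take, lot.cost_basis))
    return remaining, realized


def embedded_gain(lots, price):
    """Unrealized gain still sitting inside the fund."""
    return sum(lot.shares * (price - lot.cost_basis) for lot in lots)


lots = [Lot(100, 20.0), Lot(100, 80.0)]
price = 100.0

rem_cash, gain_cash = redeem(lots, 100, price, in_kind=False)
rem_kind, gain_kind = redeem(lots, 100, price, in_kind=True)

print(gain_cash, embedded_gain(rem_cash, price))  # 2000.0 8000.0
print(gain_kind, embedded_gain(rem_kind, price))  # 0.0 2000.0
```

In this toy case the in-kind path realizes nothing and also leaves less unrealized gain behind, because the low-basis shares have left the fund.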
```python
import pandas as pd
import yfinance as yf


def check_capital_gains_history(tickers: list) -> pd.DataFrame:
    """
    Collect basic fund info as a starting point for a capital gains check.
    Note: yfinance combines dividends and capital gains in .dividends
    For accurate CG data, check fund provider websites or Morningstar.
    """
    # Practical approach: check dividend yield relative to category
    results = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        # Field may be missing or None; its units can differ between yfinance versions
        div_yield = info.get('dividendYield') or 0
        results.append({
            'ETF': ticker,
            'Fund Type': info.get('quoteType', 'N/A'),
            # Raw value as returned by yfinance (not the legal structure)
            'Inception Date': info.get('fundInceptionDate', 'N/A'),
            'Dividend Yield': f"{div_yield * 100:.2f}%",
            'Tax Note': 'Verify capital gains at fund provider website'
        })
    return pd.DataFrame(results)


# For actual capital gains history, use:
# - Vanguard: personal.vanguard.com → Tax center
# - iShares: ishares.com → Tax information
# - Schwab: schwabfunds.com → Tax information
```
Most tax-efficient (for taxable accounts):
- Broad US market ETFs (VTI, SCHB, ITOT) — near-zero capital gains distributions
- Developed international (IEFA, VEA) — very low capital gains
- S&P 500 ETFs (VOO, IVV, SPLG) — essentially zero capital gains
Less tax-efficient (better in tax-advantaged accounts):
- Bond ETFs (BND, AGG) — interest income taxed as ordinary income
- REIT ETFs (VNQ, SCHH) — dividends mostly ordinary income
- High-dividend ETFs (SCHD, HDV) — higher dividend yield = more taxable events
- Actively managed ETFs — higher turnover = more potential gain distributions
- Commodity ETFs structured as limited partnerships (these issue a Schedule K-1 at tax time)
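To put rough numbers on the two lists above, the sketch below estimates the yearly tax cost of a fund's distributions. Everything in it is an assumption for illustration: the function name, the yields, the qualified share, and the 24% / 15% tax rates of a hypothetical US investor. It ignores capital gains distributions and state taxes.

```python
def annual_tax_drag(yield_pct: float, qualified_share: float,
                    ordinary_rate: float, qualified_rate: float) -> float:
    """Estimated yearly tax cost of distributions, in percent of assets."""
    ordinary_part = yield_pct * (1 - qualified_share) * ordinary_rate
    qualified_part = yield_pct * qualified_share * qualified_rate
    return ordinary_part + qualified_part


# Hypothetical investor: 24% ordinary income rate, 15% qualified dividend rate
ORDINARY, QUALIFIED = 0.24, 0.15

# Illustrative inputs, not actual fund data
equity_drag = annual_tax_drag(1.3, 0.95, ORDINARY, QUALIFIED)  # broad equity ETF
bond_drag = annual_tax_drag(4.0, 0.00, ORDINARY, QUALIFIED)    # bond ETF

print(f"Broad equity ETF: {equity_drag:.2f}% of assets per year")
print(f"Bond ETF:         {bond_drag:.2f}% of assets per year")
```

Under these assumptions the bond fund's tax drag is several times the equity fund's, which is the usual argument for holding it in a tax-advantaged account.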
For large-cap broad market ETFs (VTI, SPY, QQQ), liquidity is not a concern. It matters most for:
- Sector ETFs
- Emerging market ETFs
- Small/mid-cap ETFs
- Thematic/niche ETFs
```python
import pandas as pd
import yfinance as yf


def liquidity_assessment(tickers: list) -> pd.DataFrame:
    """Assess ETF liquidity metrics."""
    results = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        # Fields can be missing or None, so fall back to 0
        aum = info.get('totalAssets') or 0
        avg_volume = info.get('averageVolume') or 0
        price = info.get('regularMarketPrice') or 0
        # Estimate daily dollar volume
        daily_dollar_vol = avg_volume * price
        # Tier by AUM only (yfinance doesn't provide a reliable real-time spread)
        # For actual spread: check Bloomberg, Morningstar, or broker platform
        liquidity_tier = (
            'Excellent' if aum > 5e9 else
            'Good' if aum > 1e9 else
            'Adequate' if aum > 500e6 else
            'Caution' if aum > 100e6 else
            'High Risk of Closure'
        )
        results.append({
            'ETF': ticker,
            'AUM': f"${aum/1e9:.1f}B" if aum > 1e9 else f"${aum/1e6:.0f}M",
            'Avg Daily Volume': f"{avg_volume:,.0f}",
            'Daily $ Volume': f"${daily_dollar_vol/1e6:.1f}M",
            'Liquidity': liquidity_tier,
        })
    return pd.DataFrame(results)


# Compare similar ETFs
print(liquidity_assessment(['VTI', 'SCHB', 'ITOT', 'SPTM']))
```
ETF closure risk: ETFs with < $50M AUM are at risk of being liquidated by their issuer (not a permanent loss, but it forces a taxable event). Prefer ETFs with > $500M AUM.
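When a live quote is available from a broker, the bid-ask spread and its dollar cost are simple to compute. The sketch below assumes a hypothetical bid/ask pair; the function names are not from any library.

```python
def spread_pct(bid: float, ask: float) -> float:
    """Bid-ask spread as a percent of the midpoint price."""
    mid = (bid + ask) / 2
    return (ask - bid) / mid * 100


def round_trip_cost(bid: float, ask: float, shares: int) -> float:
    """Dollar cost of buying at the ask and immediately selling at the bid."""
    return (ask - bid) * shares


# Hypothetical quote for a liquid large-cap ETF
bid, ask = 99.98, 100.02

print(f"Spread: {spread_pct(bid, ask):.3f}%")
print(f"Round trip on 500 shares: ${round_trip_cost(bid, ask, 500):.2f}")
```

A spread of about 0.04% is comfortably inside the 0.10% threshold for large-cap funds; the same calculation on a thinly traded thematic ETF can come out many times higher.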
```python
import yfinance as yf
import pandas as pd


def compare_etfs(tickers: list, start: str = '2015-01-01') -> pd.DataFrame:
    """
    Comprehensive ETF comparison: returns, cost, and basic structure.
    """
    price_data = yf.download(tickers, start=start, auto_adjust=True)['Close']
    returns = price_data.pct_change().dropna()
    rows = []
    for ticker in tickers:
        info = yf.Ticker(ticker).info
        r = returns[ticker].dropna()
        # Annualize assuming 252 trading days per year
        annual_ret = (1 + r).prod() ** (252 / len(r)) - 1
        annual_vol = r.std() * (252 ** 0.5)
        sharpe = (annual_ret - 0.05) / annual_vol  # Assume 5% risk-free (adjust to current)
        # Max drawdown: worst decline from a running peak of cumulative growth
        growth = (1 + r).cumprod()
        dd = (growth / growth.cummax() - 1).min()
        # Expense ratio fields vary by yfinance version and may differ in units
        # (fraction vs. percent); verify against the fund provider
        er = info.get('annualReportExpenseRatio') or info.get('netExpenseRatio') or 0
        aum = info.get('totalAssets') or 0
        rows.append({
            'ETF': ticker,
            'Category': info.get('category', 'N/A'),
            'AUM': f"${aum/1e9:.1f}B",
            'Expense Ratio': f"{er:.2%}",
            f'Ann. Return ({start[:4]}-)': f"{annual_ret:.2%}",
            'Annual Volatility': f"{annual_vol:.2%}",
            'Sharpe (5% RF)': f"{sharpe:.2f}",
            'Max Drawdown': f"{dd:.2%}",
        })
    df = pd.DataFrame(rows)
    return df


# US Total Market comparison (FSKAX is a mutual fund, included for reference)
print(compare_etfs(['VTI', 'SCHB', 'ITOT', 'FSKAX'], start='2015-01-01'))
# Core bond comparison (FXNAX is a mutual fund, included for reference)
print(compare_etfs(['BND', 'AGG', 'SCHZ', 'FXNAX'], start='2015-01-01'))
```
| Category | Cheapest Options | AUM Tier | Index |
|---|---|---|---|
| Total US Market | VTI (0.03%), SCHB (0.03%), ITOT (0.03%) | $300B+ | CRSP / DJ / S&P TMI |
| S&P 500 | VOO (0.03%), IVV (0.03%), SPLG (0.02%) | $1T+ | S&P 500 |
| S&P 500 (legacy) | SPY (0.0945%) | $550B+ | S&P 500 (UIT) |
| US Small Cap | VB (0.05%), SCHA (0.04%) | $50B+ | CRSP / DJ |
| US Mid Cap | VO (0.04%), SCHM (0.04%) | $50B+ | CRSP / DJ |
| NASDAQ-100 | QQQ (0.20%), QQQM (0.15%) | $200B+ | NASDAQ-100 |
| Category | Options | ER | Index |
|---|---|---|---|
| Total International | VXUS (0.07%), IXUS (0.07%) | Low | FTSE All-World ex-US / MSCI ACWI ex-US |
| Developed Markets | VEA (0.05%), IEFA (0.07%), SCHF (0.06%) | Low | FTSE / MSCI EAFE |
| Emerging Markets | VWO (0.08%), IEMG (0.09%) | Low | FTSE / MSCI EM |
| Category | Options | ER |
|---|---|---|
| US Total Bond | BND (0.03%), AGG (0.03%), SCHZ (0.03%) | Very low |
| Short-Term | VGSH, SHY, SCHO | Very low |
| Intermediate-Term | VGIT, IEI, SCHR | Very low |
| Long-Term | VGLT, TLH | Low |
| TIPS (Inflation) | VTIP (0.04%), SCHP (0.03%) | Very low |
| Category | Options | ER |
|---|---|---|
| Dividend Growth | VIG (0.06%), DGRO (0.08%), SCHD (0.06%) | Low |
| Value Factor | VTV (0.04%), SCHV (0.04%), IVE (0.18%) | Low-medium |
| Small-Cap Value | VBR (0.07%), IJS (0.18%) | Low-medium |
| Momentum | MTUM (0.15%), QMOM (0.35%) | Medium |
| Quality | QUAL (0.15%), JQUA (0.12%) | Low-medium |
| Source | What You Get | Access |
|---|---|---|
| ETF provider websites | Exact expense ratios, tracking difference, capital gains history | Free, no API |
| yfinance | Basic ETF info, price data, dividends | pip install yfinance |
| ETFdb.com | Comparison tools, screener | Free basic, paid premium |
| etf.com | Fund flows, analytics | Free basic |
| Morningstar | Fund ratings, portfolio analytics | Free basic, paid for full data |
| SEC EDGAR (N-CEN, N-PORT) | Fund holdings, expense disclosures | Free API |
| FRED | Index levels and interest-rate series (e.g., SP500, DGS10), useful for benchmarks and risk-free rates | Free API (key required) |
"Higher AUM = Better ETF": Not always. $500M+ is sufficient for most purposes; beyond that, AUM doesn't affect tracking quality.
"The ETF with the lowest ER is always best": Tracking difference matters more than stated ER. An ETF with 0.06% ER but -0.02% tracking difference costs less than one with 0.03% ER and +0.03% tracking difference.
"All S&P 500 ETFs are identical": Returns are nearly identical, but structural differences matter. A UIT (SPY) must hold dividends as cash until they are paid out, which creates a small cash drag versus open-end ETFs (VOO, IVV) in rising markets.
"ETFs are always tax-efficient": Open-end equity index ETFs generally are. Bond ETFs generate ordinary interest income, and commodity ETFs, actively managed ETFs, and some leveraged ETFs can distribute taxable gains.
"Inverse/leveraged ETFs are long-term holds": Volatility decay makes leveraged ETFs unsuitable for long-term holding. They reset their leverage daily and are designed for short-term tactical use. A 2× ETF does not deliver 2× the long-term return.
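Volatility decay is easiest to see with a stylized example. The sketch below assumes an index that alternates +2% and -2% days for 250 trading days and a fund that delivers exactly 2× each daily return; fees, financing costs, and tracking slippage are ignored, and the return path is invented.

```python
def total_return(daily_returns):
    """Compound a sequence of daily returns into one total return."""
    value = 1.0
    for r in daily_returns:
        value *= 1 + r
    return value - 1


# Stylized choppy market: +2%, -2%, repeated for 250 trading days
index_daily = [0.02, -0.02] * 125
leveraged_daily = [2 * r for r in index_daily]  # daily-reset 2x exposure

index_total = total_return(index_daily)
leveraged_total = total_return(leveraged_daily)

print(f"Index:           {index_total:.2%}")
print(f"2x daily fund:   {leveraged_total:.2%}")
print(f"2x index return: {2 * index_total:.2%}")
```

In this path the index ends roughly 5% lower, while the 2× fund ends roughly 18% lower, far worse than twice the index loss. The gap comes from compounding: each +2%/-2% pair costs the index 0.04%, but each +4%/-4% pair costs the leveraged fund 0.16%.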
See CONTRIBUTING.md. Most welcome:
- Tracking difference data with dates and sources (fund provider annual reports)
- Non-US ETF equivalents (European UCITS ETFs, Australian ETFs)
- Updated expense ratio data when providers cut fees
CC0 1.0 Universal — Public domain.