Live here.
Using the investor-company investment network, we compute Personalized PageRank (PPR) with teleportation focused on unicorn companies, normalized by vanilla PageRank (PR). This captures both direct unicorn investments and indirect paths through co-investments, providing a network-aware measure of an investor's historical success in identifying exceptional companies, normalized for high-volume investors.
See the example usage notebook.
Call compute_investor_score with a DataFrame of investments with these column types:
investor_id int64
company_id int64
deal_date datetime64[ns]
(deal_date is only needed if you want to try recency_bias)
and a list of unicorn company ids.
Choose values for ppr_teleport_prob and pr_teleport_prob. The values of 0.1 and 0.999 work well empirically for us (these are the defaults). See below for how to interpret these parameters.
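As a rough illustration of the expected input, here is a minimal sketch that builds a toy investments DataFrame with the column types listed above. The ids and dates are made up, and the commented-out call at the end is an assumption about the argument order and import path, which this README does not spell out; check the example usage notebook for the real signature.

```python
import pandas as pd

# Toy data (made up): three investors, three companies.
investments = pd.DataFrame(
    {
        "investor_id": [1, 1, 2, 3],
        "company_id": [10, 11, 11, 12],
        "deal_date": ["2015-03-01", "2017-06-15", "2018-01-20", "2020-09-30"],
    }
).astype(
    {
        "investor_id": "int64",
        "company_id": "int64",
        "deal_date": "datetime64[ns]",  # only needed for recency_bias
    }
)

# Company 10 is treated as the only unicorn in this toy example.
unicorn_ids = [10]

# Assumed call shape (positional order and import path are NOT confirmed here):
# scores = compute_investor_score(
#     investments, unicorn_ids, ppr_teleport_prob=0.1, pr_teleport_prob=0.999
# )
```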
Our method uses personalized PageRank (PPR) with teleportation focused on unicorn companies, effectively measuring how frequently a random walk beginning at unicorns lands on each investor. This captures both direct unicorn investments and indirect paths through co-investments, providing a network-aware measure of an investor's historical success in identifying exceptional companies. PPR_TELEPORT_PROB roughly controls the balance between direct and indirect paths: a higher value sends the walk back to the unicorns more often and so emphasizes direct investments, while a lower value lets the walk travel farther and gives more weight to indirect paths.
The PPR score is then normalized by vanilla PageRank (PR) to control for general network centrality, addressing the confound that high-degree investors could achieve high PPR scores through mere investment volume or connectedness. As PR_TELEPORT_PROB approaches 1, the PR computation increasingly ignores network structure, with random teleportation dominating the walk process. In the limit, PR scores converge to uniform values, making PR normalization equivalent to division by a constant, which reduces the score to pure PPR rankings.
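The walk described above can be sketched in plain Python. This is not the package's implementation, only a minimal power-iteration illustration on an unweighted, undirected investor-company graph; the names pagerank and investor_scores, the string node labels, and the fixed iteration count are all assumptions made for the example.

```python
from collections import defaultdict


def pagerank(edges, teleport_prob, seeds=None, iterations=200):
    """Power iteration on an undirected investor-company graph.

    edges: list of (investor, company) pairs with distinct labels.
    teleport_prob: chance of jumping to a teleport target at each step.
    seeds: teleport targets; None means all nodes (vanilla PageRank).
    """
    neighbors = defaultdict(list)
    for investor, company in edges:
        neighbors[investor].append(company)
        neighbors[company].append(investor)
    nodes = list(neighbors)
    # Seeds that never appear in the graph are ignored.
    targets = nodes if seeds is None else [s for s in seeds if s in neighbors]
    teleport = {n: 0.0 for n in nodes}
    for t in targets:
        teleport[t] += 1.0 / len(targets)
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: teleport_prob * teleport[n] for n in nodes}
        for n in nodes:
            # Mass that does not teleport is split evenly among neighbors.
            share = (1.0 - teleport_prob) * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


def investor_scores(edges, unicorns, ppr_teleport_prob=0.1, pr_teleport_prob=0.999):
    """Raw score per investor: PPR seeded at unicorns divided by vanilla PR."""
    ppr = pagerank(edges, ppr_teleport_prob, seeds=unicorns)
    pr = pagerank(edges, pr_teleport_prob)
    investors = {investor for investor, _ in edges}
    return {i: ppr[i] / pr[i] for i in investors}


# Toy chain: A invested in unicorn u1; B co-invested with A; C co-invested with B.
edges = [("A", "u1"), ("A", "c1"), ("B", "c1"), ("B", "c2"), ("C", "c2"), ("C", "c3")]
scores = investor_scores(edges, unicorns=["u1"])
```

In this toy graph the direct unicorn investor A scores highest, B (one co-investment away) comes next, and C trails, which is the direct-versus-indirect behavior the paragraph describes. With real integer ids, investor and company nodes would need to be namespaced so that equal ids do not collide.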
To pick our defaults we tuned the two parameters to:
- avoid excessive correlation with the total number of investments,
- avoid excessive correlation with the total number of direct unicorn investments, and
- agree with our intuition of who the top investors are.
If you want to see how rankings differ as a function of these two parameters, see this notebook.
The defaults we landed on are PPR_TELEPORT_PROB=0.1 and PR_TELEPORT_PROB=0.999.
To get our final score we take the log and min-max normalize to [0, 100].
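A minimal sketch of this last step, assuming the raw scores are strictly positive (the log is undefined otherwise). The function name final_scores and the handling of the all-equal case are assumptions for illustration, not the package's code.

```python
import math


def final_scores(raw):
    """Log-transform positive raw scores, then min-max rescale to [0, 100]."""
    logs = {key: math.log(value) for key, value in raw.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:
        # Arbitrary choice for this sketch when every score is identical.
        return {key: 0.0 for key in logs}
    return {key: 100.0 * (value - lo) / (hi - lo) for key, value in logs.items()}


# Made-up raw PPR/PR ratios for three investors.
normalized = final_scores({"A": 4.0, "B": 2.0, "C": 1.0})
# The top investor maps to 100 and the bottom one to 0.
```

Because min-max scaling divides out any constant factor, the choice of log base does not change the final scores.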
While this method can work with any starting list of companies, we use "unicorn" status as it's an objective, widely-recognized benchmark for exceptional company performance. You could adapt this approach by choosing a different company seed set using whatever company success metrics you find meaningful (IPO performance, acquisition value, etc.).
We experimented with different edge-weighting schemes:
- Temporal decay to emphasize recent activity
- Upweighting earlier investments in a company's history (pure-temporally, round-wise, a mixture, etc.)
We didn't feel these made enough of a difference to justify the complexity. A recency_bias parameter is exposed for testing one weighting variant.
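To make the temporal-decay idea concrete, here is one possible form of such an edge weight. This README does not say which decay function was tried or how recency_bias is defined, so the exponential half-life form, the function name recency_weight, and the five-year default below are purely hypothetical.

```python
from datetime import date


def recency_weight(deal_date, as_of, half_life_years=5.0):
    """Hypothetical edge weight that halves every half_life_years."""
    age_years = (as_of - deal_date).days / 365.25
    return 0.5 ** (age_years / half_life_years)


as_of = date(2020, 1, 1)
w_recent = recency_weight(date(2019, 1, 1), as_of)
w_old = recency_weight(date(2010, 1, 1), as_of)
# Older deals get smaller weights, so recent activity counts for more.
```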
- Early-stage investors who haven't existed long enough to make investments in verified unicorns will rank below more established investors who have.
- We don't take into account when a company became a unicorn. Our data isn't complete enough for this.
- We're missing stable identifiers for some companies (in particular, some foreign companies, e.g. Chinese ones, might be missing).
With regard to the first point, correlation with direct unicorn investments isn't perfect:
Clearly some investors with zero direct unicorn investments rank above investors with one unicorn investment.
One could imagine extending the method to account for the investor's years of activity, or perhaps otherwise normalizing for the number of direct unicorn investments.