  • What if the game Asteroids used Wikipedia edits to drive the volume and size of the objects hurtling towards your ship? Kevin Payravi turns your dreams into reality with WikiAsteroids:

    I recently launched WikiAsteroids (asteroids.wiki), a browser game that takes the classic arcade space shooter concept and adds a Wikipedia twist: each time someone makes an edit on Wikipedia, a new asteroid spawns. The size of the asteroid corresponds to the size of the edit. Blue asteroids represent edits with a net increase in content, while red ones indicate a net decrease. A new article creation spawns an extra life, and new user registrations spawn one of several possible power-ups (such as a shield, faster shooting, or slow-motion).
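
    The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the event dictionaries, the `spawn_for` function, the radius formula, and the treatment of zero-byte edits as blue are all assumptions, and the power-up list includes only the three examples named in the quote.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional

# Only the three power-ups named in the quote; the real game may have more.
POWER_UPS = ["shield", "faster_shooting", "slow_motion"]

@dataclass
class Spawn:
    kind: str                        # "asteroid", "extra_life", or "power_up"
    color: Optional[str] = None      # set for asteroids only
    radius: Optional[float] = None   # set for asteroids only
    power_up: Optional[str] = None   # set for power-ups only

def spawn_for(event, rng=random):
    """Turn one (hypothetical) Wikipedia event dict into a game object."""
    etype = event["type"]
    if etype == "new_article":
        return Spawn(kind="extra_life")
    if etype == "new_user":
        return Spawn(kind="power_up", power_up=rng.choice(POWER_UPS))
    # Otherwise treat it as an ordinary edit with a net byte change.
    delta = event["delta"]
    # Assumption: a zero-byte edit counts as blue; the source does not say.
    color = "blue" if delta >= 0 else "red"
    # Assumption: log scaling so huge edits do not fill the screen.
    radius = 8.0 + 4.0 * math.log2(1 + abs(delta))
    return Spawn(kind="asteroid", color=color, radius=radius)

print(spawn_for({"type": "edit", "delta": -300}))
```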

  • For the Washington Post, Emily Giambalvo, Kati Perry, and Jesse Dougherty analyze the playing time for players who transferred from another program.

    To understand the phenomenon and its impact, look no further than the men’s and women’s NCAA tournament fields. On the men’s side, 53 percent of all rotation players previously logged minutes at another Division I school, according to a Washington Post analysis. Roughly one-third of these key contributors — the top eight players in total minutes on each roster — played for another D-I program just last season.

    Now I want an analysis for academics and research prowess. “I will be taking my statistical talents to the University of California, Berkeley this semester.”

  • Members Only

    This is how the mortality simulation machine gets made.

  • With absolute certainty, you will die. When will it happen? That is a trickier question. But we can run simulations to explore the possibilities.

  • For Letterform Archive, designer Angie Wang examines a collection of chopstick sleeves as it relates to Japan:

    Paper chopstick sleeves emerged at the turn of the 20th century when disposable chopsticks and packaged meals gained popularity with the advent of train travel. In addition to ensuring cleanliness, printed paper chopstick sleeves became vernacular advertisements for shops and restaurants.

    The latest addition to the Archive’s holdings of Asian ephemera is the hashibukuro collection of Mr. Susumu Kitagawa of Fuji City, Japan. While individually modest in their design and messaging, when considered as a whole the sleeves that comprise this collection map a singular history of Japanese ideology and aesthetics.

  • The purpose of onomatopoeia is to imitate sounds with words, so you might expect the words for animal sounds to be similar across languages. For the Pudding, Vivian Li shows that this is not always the case.

    Onomatopoeia offers a fascinating glimpse into the interaction between sound and language. The way humans mimic animal sounds reflects not only shared biological instincts but also distinct cultural filters. Although onomatopoeia intends to imitate faithfully, its differences are ultimately far from arbitrary. In trying to capture the same auditory essence, English interprets a pig’s sound as [ojŋk], yet Hungarian hears [røf], and Vietnamese hears [ʔut it]. Even among the three animals discussed, cats are more consistent in their sound interpretation, while pigs are more variable — whether because pigs’ vocalizations are innately more complex, or because they call upon different phonotactic rules.

    All the words are clickable so that you can hear pronunciations for different languages. Colors indicate phone groups, such as nasal consonants and mid central vowels.

  • I like this chart set from Bloomberg that shows the top brands, ranked by market share in 2024. Faded lines show true estimates, and thicker lines in the foreground provide the trends. Tick labels are limited to the first column on the left to avoid busyness. Straightforward but effective.

    In the U.S., we usually see BYD, an electric vehicle brand, mentioned in the context of Tesla as the competition. But it doesn’t look like much of a competition. BYD has rapidly gained market share in China over the last five years.

  • FiveThirtyEight is gone, and along with its visualization-centric projects, so is the poll tracking that it and others used to analyze public sentiment. The New York Times is picking up the baton:

    As one half of the Times/Siena College poll, which has been recognized as one of the country’s premier pollsters, we believe there’s value in an individual poll. But we also think aggregating polls and providing analysis of them collectively, as we did during last year’s election, is a service worth preserving — one that may be needed even more today with the profusion of polling, contradictory findings and loud partisan voices.

    We’re building on the work of the politics website 538, which for several years released this data as a public service until it was shuttered by ABC News this month, and which itself followed in the path of Pollster.com at The Huffington Post. Our goal is to ensure that this resource, which is a foundational tool for many journalists and researchers, remains updated long-term. The data will be made available free for anybody to use as they wish, so long as they provide attribution to The Times. (If you’re still using data collected by 538, you may still need to give it attribution as well.)

    They’re starting with presidential approval ratings.

  • Leading up to the NCAA men’s basketball tournament, the Athletic has a bracket with projections expressed as win probabilities in each round. Surprise, Duke is heavily favored to win, which can only mean everyone’s brackets will be ruined early.

    On methodology:

    We create an offensive and defensive projection for every college basketball team using various box score metrics. These projections estimate how many points a team would score and allow in a game against an average opponent on a neutral court. We then assign a probability of how likely a team is to win a given game by adjusting for opponent, location and team health. Taking into account the bracket, we use the projections to simulate the tournament 200,000 times.

    After those 200,000 simulations, we calculate how often a team is to make each round of the tournament and win the championship. For example, if a team has a 10 percent chance of making the Final Four, that means that they’ve made the Final Four in 10 percent of the simulations run.
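
    The simulate-and-count idea in that methodology can be sketched as follows. This is a toy version under stated assumptions, not the Athletic's model: the team names, the single `ratings` number per team, and the logistic `win_probability` function are all hypothetical stand-ins for their offensive and defensive projections and their adjustments for opponent, location, and health.

```python
import random
from collections import Counter

def win_probability(rating_a, rating_b, scale=10.0):
    """Hypothetical logistic model: chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def simulate_tournament(bracket, ratings, rng):
    """Play one single-elimination bracket; return the winners of each round."""
    alive = list(bracket)
    rounds = []
    while len(alive) > 1:
        winners = []
        # Adjacent teams in the list play each other.
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_probability(ratings[a], ratings[b])
            winners.append(a if rng.random() < p else b)
        rounds.append(winners)
        alive = winners
    return rounds

def advancement_rates(bracket, ratings, n_sims=20000, seed=0):
    """Share of simulations in which each team survives each round.

    Assumes len(bracket) is a power of two.
    """
    rng = random.Random(seed)
    n_rounds = len(bracket).bit_length() - 1
    counts = [Counter() for _ in range(n_rounds)]
    for _ in range(n_sims):
        for i, winners in enumerate(simulate_tournament(bracket, ratings, rng)):
            counts[i].update(winners)
    return [{team: c[team] / n_sims for team in bracket} for c in counts]

# Made-up four-team bracket: A plays B, C plays D, winners meet in the final.
ratings = {"A": 20, "B": 10, "C": 10, "D": 0}
rates = advancement_rates(["A", "B", "C", "D"], ratings)
print("Chance of winning it all:", rates[-1])
```

    The last dictionary in `rates` is the championship column of the bracket: a value of 0.10 for a team means it won the title in 10 percent of the simulated tournaments.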

  • When you enter a query in traditional search engines, you get a list of results. They are possible answers to your question, and you decide what resources you want to trust. On the other hand, when you query an AI chatbot, you get a small number of answers, phrased as sentences, that sound confident in context.

    For Columbia Journalism Review, Klaudia Jaźwińska and Aisvarya Chandrasekar tested this accuracy and confidence by asking several chatbots to cite articles:

    Overall, the chatbots often failed to retrieve the correct articles. Collectively, they provided incorrect answers to more than 60 percent of queries. Across different platforms, the level of inaccuracy varied, with Perplexity answering 37 percent of the queries incorrectly, while Grok 3 had a much higher error rate, answering 94 percent of the queries incorrectly.

    So not great.

    I am sure someone is working on improving that accuracy, but we’ll have to develop our own skills in separating truth from junk, just like we have with past online things. Going forward, maybe keep an eye out for the younger and older generations who tend to accept online things as automatic truth. Things could get dicey.

  • The “Department of Government Efficiency” keeps a “Wall of Receipts” to signal transparency in how they are “saving” money. However, it’s difficult to take it seriously when the data keeps changing, disappearing, and reappearing. Ethan Singer and Emily Badger, for NYT’s the Upshot, go with the clustered bubbles to show the edits since February 16, 2025.

    I’ll be the first one to tell you that working with data is tricky and that there are bound to be mistakes. But it’s in everyone’s best interest to find the mistakes first instead of making life-changing decisions and then finding out what breaks after.

  • Pam Johnson got an email from her bank about her husband’s death. The Social Security Administration deducted funds from their account. The problem: her husband, Ned Johnson, is still alive. From Danny Westneat for the Seattle Times:

    “We recently received notification of LEONARD A. JOHNSON’s passing,” it began. “We offer our sincerest condolences …”

    At first she figured it was a scam — her husband, after all, was sitting right there. But then the bank got to the point.

    “We know this is a difficult time, and we’re here to help,” the bank wrote. “We received a request from Social Security Administration to return benefits paid to LEONARD A. JOHNSON’s account after their passing.”

    “There’s nothing you need to do — we’ve deducted the funds from LEONARD A. JOHNSON’s account.”

    Uh oh. It itemized how $5,201 had been stricken from their bank account, on the grounds that Ned wasn’t justified to get those benefits — because he was dead. That was for payments he’d received in December and January.

    After several weeks, they were able to get Johnson revived in the SSA database, but they still don’t know why he was marked dead to begin with. Whatever the reason, it should be obvious why it’s important to measure twice and cut once.

  • In an effort to reduce repeat offenses in Spain thirty years ago, researchers developed a formula that assigned a risk score to individuals. The score was used to decide if prisons should grant a prisoner temporary release, and the formula still factors into decision-making today. Civio describes the current downsides of using the scores, which are based on a relatively small sample of prisoners from the 1990s.

    An interactive graphic, shown above, illustrates the system and how a score goes up and down as you change variables in the drop-down menus. Foreign status increases the risk score the most, even more so than if a prisoner tried to escape.

  • For Axios, Marc Caputo reports:

    Secretary of State Marco Rubio is launching an AI-fueled “Catch and Revoke” effort to cancel the visas of foreign nationals who appear to support Hamas or other designated terror groups, senior State Department officials tell Axios.

    Why it matters: The effort — which includes AI-assisted reviews of tens of thousands of student visa holders’ social media accounts — marks a dramatic escalation in the U.S. government’s policing of foreign nationals’ conduct and speech.

    Something tells me that the view into the system’s usage, classification process, and underlying data will be quite fuzzy.

  • Members Only

    This week is about highlighting changes in data visually to make them glaringly obvious.

  • Amanda Shendruk and Catherine Rampell, for Washington Post Opinion, highlight the current strategies of removing data from public view so there’s no baseline to compare against.

    Curating reality is an age-old political game. Politicians spin facts, cherry-pick and create “truth” through repetition. Statistical sleight of hand has long been part of that tool kit, as has burying inconvenient numbers. (In 1994, for instance, U.S. lawmakers blocked federal data collection on “green” gross domestic product.) But Trump’s statistical purges have been faster and more sweeping — picking off not just select factoids but entire troves of public information.

    The deletions self-contradict when the same groups are also saying that “data does not lie” in reference to spending cuts and takedowns. Why delete all the truth about how the United States functions, how we live, and where we are headed?

  • According to data from ActivTrak, people are working shorter days while productivity goes up. For Bloomberg, Nibras Suliman reports that workdays were 36 minutes shorter at the end of 2024 compared to 2022.

    I don’t know anything about ActivTrak, so I wonder what kind of work they track and how they measure productivity. Either way, it’s good to see minutes going down. I think we could stand to work less, myself included.

  • This might come as a surprise to some, but since congestion pricing in Manhattan began, the number of complaints about honking declined. For The City, Jose Martinez and Mia Hollie looked at the 311 service data:

    “One more reason to love congestion relief — less honking,” Juliette Michaelson, the MTA’s deputy chief of policy and external relations, said in a statement to THE CITY. “Turns out it is, in fact, possible to make Manhattan a little more peaceful.”

    In addition, between Jan. 5 and March 4, the two Department of Environmental Protection noise cameras south of 60th Street didn’t issue a single horn-honking summons, according to numbers provided by the city agency. In contrast, those two cameras issued 27 summonses for excessive horn blowing during the same time period last year.

    311 service data can be found here.

  • In almost every dataset about life and people that stretches back past March 2020, you can find the blip when Covid changed how we live. Aatish Bhatia and Irineo Cabreros, for NYT’s the Upshot, used a stack of 30 charts to show the shifts.

    Each chart shows a pre-Covid gray on the left and a post-Covid red-orange on the right. The lines (or bars) on the post-Covid side extend the past when you scroll. Usually charts that show an empty space to start and then animate the rest are gimmicks, but the extensions highlight the sudden changes in this series.

    The scroll style and dimensions are very mobile-first, as the stack plays out in a more familiar way on a phone. The style also keeps the 30 charts from feeling like too much.

  • From Pew Research, this political typology quiz is from four years ago but is as relevant as it was then. Answer a handful of questions and see where you fall in the spectrum of nine groups. As the split between Democrat and Republican in the U.S. grows wider, maybe that means it gets easier to see the differences and similarities in the space between.

    On the methodology to define the groups:

    The typology groups are created with a statistical procedure that uses respondents’ scores on all 27 items to sort them into relatively homogeneous groups. The specific statistical technique used to calculate group membership is weighted clustering around medoids (using the WeightedCluster package version 1.4-1 in R version 4.1.1). The items selected for inclusion in the clustering were chosen based on extensive testing to find the model that fit the data best and produced groups that were substantively meaningful. Most prior Pew Research Center typologies used a closely related method, cluster analysis via the k-means algorithm, to identify groups.
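
    To get a feel for weighted clustering around medoids, here is a minimal sketch in pure Python. It is not the WeightedCluster package and not Pew's procedure: it uses a simple alternating assign-and-update loop rather than the PAM swap search, and the function name, toy data, weights, and starting medoids are all made up for illustration.

```python
def weighted_k_medoids(dist, weights, init, max_iter=100):
    """Alternate between assigning points and updating weighted medoids.

    dist: n x n matrix of pairwise distances
    weights: one survey-style weight per point
    init: starting medoid indices, one per cluster
    """
    n, k = len(dist), len(init)
    medoids = list(init)

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest weighted
            # total distance to the other members of its cluster.
            best = min(members,
                       key=lambda m: sum(weights[i] * dist[m][i] for i in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Toy one-dimensional example with made-up scores and weights.
points = [0, 1, 2, 10, 11, 12]
weights = [1, 1, 5, 1, 1, 1]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = weighted_k_medoids(dist, weights, init=[0, 3])
print(medoids, labels)  # [2, 4] [0, 0, 0, 1, 1, 1]
```

    With equal weights the first medoid would be the middle point at index 1. The heavier weight on index 2 pulls the medoid there, which is the point of weighting: respondents who represent more of the population have more say in where a group's center lands. Unlike k-means, each center is always an actual respondent rather than an average.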