Dobbyns on Data

Recent content on Dobbyns on Data (feed last built Fri, 10 May 2019)

🔥
Fri, 10 May 2019 | /2019/05/10/
I gave a talk at the New York R Conference on the API pipeline I set up for figuring out when and where fires happen in New York. You can find video of the talk here. It’s similar to the drake talk this winter, but less drake and more fire trivia 😆 Peep the full repo here. Yay R confs!

The Malort Report
Sat, 27 Apr 2019 | /2019/04/27/the-malort-report/
Tweets from the Nemesis twitter by our court stenographers documenting the best tradition ever.
2015
Just 24 minutes until we begin #malortcourt… GET READY Defense attorney, Kim Streff. That’s correct, there are no pants here in #malortcourt, or else it is a mistrial. http://t.co/OrQFeM4qqB What our defendants will be sworn in on. #malortcourt http://t.co/NmnNgtsnBK CARGO SHORTS were seen in appearance here at #malortcourt by the only gentleman here.

Drake's Plan
Tue, 12 Feb 2019 | /2019/02/12/drakes-plan/
I gave a talk on the drake package for workflow management to the wonderful RLadies of NYC. In it, we hit the Twitter API to get NYCFireWire tweets, clean the raw tweet data, send the resulting addresses to the Google Maps API for geocoding, and then plot where fires happen on a map of New York.
Links: Repo, Slides, Live coding walkthrough.
Many thanks to drake’s creator and maintainer Will Landau.

On Y2King Myself, or "What Would Marie Kondo Do?"
Thu, 24 Jan 2019 | /2019/01/24/on-y2king-myself-or-what-would-marie-kondo-do/
This is a fun one and mostly a self-heckle. I’ll set the scene. It’s Jan 1, 2019, the day after New Year’s Eve. I realize I’ve made it home with my phone. Nice. 2019 is off to a good start. I check it and notice a big, big notifications number on the Inbox app.
Like 12,000+ kinda big. Normally, I keep a pretty clean inbox. At any given time I’ll have max 3 or 4 unread emails because I aggressively archive new emails that don’t need attention and religiously unsubscribe from marketing emails.

How does {multicolor} actually work?
Thu, 19 Jul 2018 | /2018/07/19/how-does-multicolor-actually-work/
Today in R/mildlyinteresting…the multicolor package! It’s built on Gábor Csárdi’s crayon for use in conjunction with Scott Chamberlain’s cowsay. Here’s an example of what it does.

    library(multicolor)
    multi_color(things[["buffalo"]])

So yeah, mostly useless! But if you’re interested in how it works, I’ll take it apart and show you the parts that matter.
Background
The idea came about after I submitted a pull request to cowsay adding the ability to add a single color to the output of a call to cowsay::say.

Catching Kareem
Wed, 13 Jun 2018 | /2018/06/13/catching-kareem/
Lightning round of basketball analysis! My friend and coworker Brad, who designed this very blog, is a sports fan and curious person. He wanted to know whether LeBron James is on track to overtake the NBA’s all-time leading scorer, Kareem Abdul-Jabbar (38,387 career points!), in average number of points scored per game. He threw Kevin Durant in as a third point of comparison. So our question is: who’s on track to unseat Kareem?

98% green spaghetti, sliced and chopped
Sun, 15 Apr 2018 | /2018/04/15/98-green-spaghetti-sliced-and-chopped/
This is the latest stop in an analysis tour of free-range menu data. One of the goals of fishing for real recipes is to suss out patterns in how foods are combined, and in what amounts, so that we can generate new recipes. However, this post will mostly eschew creating anything useful and just mess around with the words in recipes themselves.
As a step toward creating new ingredients and recipes in interesting ways, we’ll…

Monkeys are like Onions
Sun, 25 Mar 2018 | /2018/03/25/monkeys-are-like-onions/
This is part two of a series on scraping content from the satirical news site The Onion and feeding that content to the newly spruced-up monkeylearn package. Part one deals with the scraping and munging of the data itself. In this chunk of work, we’ll go about classifying that data and getting a very unscientific measure of how “well” the classifier performed.
MonkeyLearn Background
I’ve spent a really fun chunk of time in the last month or so developing the rOpenSci text processing package monkeylearn along with the fantastic research software engineer Maëlle Salmon.

Peeling back The Onion
Sun, 25 Mar 2018 | /2018/03/25/peeling-back-the-onion/
In this post I’ll programmatically find The Onion article links, scrape them for content, and clean them up into a tidy format. I chose The Onion because while not real news, the site does a great job of approximating the tone and cadence of real news stories. In the next post, I’ll use the monkeylearn text processing package to hand these to the MonkeyLearn API and then compare the classifications that MonkeyLearn generates with the URL’s subdomain to get an imperfect measure of the classifier’s accuracy.

Scraping Together a Recipe, Episode I
Sun, 25 Feb 2018 | /2018/02/25/scraping-together-a-recipe-episode-i/
The Internet is full of amazing content. Like these names of actual recipes. Methodology for getting these to follow.

    Recipe Name
    Sea-Purb Seafood Pasta
    Tuna Salad for Grown-ups
    Easy Ham Balls
    No Ordinary Meatloaf
    CindyD’s Somewhat Southern Fried Chicken
    Crust for Two
    Butterbeer III

This is a snapshot-in-time look at where I am with a data analysis project related to building daily menus.

Scraping Together a Recipe, Episode II
Sun, 25 Feb 2018 | /2018/02/25/scraping-together-a-recipe-episode-ii/
One of the goals here is to see what portion of a menu tends to be devoted to, say, meat or spices or a word that appears in the recipe name etc. In order to answer that, we’ll need to extract portion names and portion sizes from the text. That would be pretty simple with a fixed list of portion names (“gram”, “lb”) if portion sizes were always just a single number.

Scraping Together a Recipe, Episode III
Sun, 25 Feb 2018 | /2018/02/25/scraping-together-a-recipe-episode-iii/
Converting to Grams
Rather than rolling our own conversion dictionary, let’s turn to the measurements package that sports the conv_unit() function for going from one unit to another. For example, converting 12 inches to centimeters, we get:

    library(measurements)
    conv_unit(12, "inch", "cm")
    ## [1] 30.48

Let’s see how that’ll work with our data. Grabbing the first few recipes from scratch and generating a sample_recipes_df, we begin with

    sample_recipes_df <- get_recipes(urls[1:3]) %>%
      dfize() %>%
      get_portions(pare_portion_info = TRUE) %>%
      add_abbrevs()
    ## Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada)
    ## Easy 4-Ingredient Margarita
    ## Blueberry and Spice Smoothie
    ## Number bad URLs: 0
    ## Number duped recipes: 0

    sample_recipes_df %>%
      select(recipe_name, ingredients, portion, portion_abbrev) %>%
      slice(1:5) %>%
      kable()

    recipe_name | ingredients | portion | portion_abbrev
    Drowned Beef Sandwich with Chipotle Sauce (Torta Ahogada) | 12 ounces chipotle cooking sauce (such a Knorr®) | 12.

On Brewing Beer-in-Hand Data Science
Mon, 12 Feb 2018 | /2018/02/12/on-brewing-beer-in-hand-data-science/
This past summer I spent a chunk of time gathering and analyzing beer data in what I started calling beer-in-hand data science.
I ended up giving a talk on the analysis to the wonderful women of RLadies Chicago, and afterward a few people were interested in getting ahold of some beer data for themselves. I hope to spread the wealth in this quick post by going through some of the get-off-and-running steps that I took to grab the data and get it into a usable, clean format.

About Me
/about/
Hi! I’m Amanda. I’m a data engineer at Deck. I work remotely from Brooklyn, NY where I also play ultimate frisbee and eat donuts.
Contact: amanda[dot]e[dot]dobbyn[at]gmail.com

CV
/vitae/
Current
Deck Technologies, Data Engineer

Open Source Software
Contributor, sendgridr package, October 2021
Author, covid19us (on CRAN, RStudio March 2020 Top 40 Packages) and covid19france (on CRAN) packages, March 2020
Author, votesmart package, March 2020
Author, multicolor package, July 2018 (accepted to CRAN August 2018)
Author, postal package, June 2018 (accepted to CRAN July 2018)
Co-author, rOpenSci monkeylearn package, February 2018
Co-author, rOpenSci roomba package, May 2018
Co-author, cowsay package, June 2018