WebPagetest Charts Open Source Release

Who doesn’t love WebPagetest? The browsers, the insight, the configurability, it’s got everything. Unfortunately, it also has an interface that only its creator could love :-)

So I’m happy to announce the open source release of Trulia’s WebPagetest Charts API. It was built to run WebPagetest tests, store their results, and provide endpoints that allow better visualization and exploration of the data. This is demonstrated by the companion project, WebPagetest Charts UI, which builds charts from the API endpoints.


This tool has proven to be very helpful for us at Trulia, particularly with mobile websites, and we wanted to share it to help others improve their mobile web (and regular web) experiences.

As UIs get more complex, especially on mobile networks, tracking down a performance regression can be really difficult. Having a visible history, with a filmstrip and links right into the full WebPagetest results, has been a very powerful tool. When trying to see what gives a visitor “bad times,” a history of requests is great to peruse. By flicking through the results, it’s easy to see patterns and get a sense of what consistently contributes to mediocre performance.


WebPagetest also gives you access to Google’s Speed Index, which is one of their key performance factors for SEO.

Nobody knows exactly how Speed Index factors in, but at this point, if SEO matters to your business, it would be irresponsible not to occasionally view your performance through Google’s eyes.

As mentioned above, there are two Node applications, both available on GitHub:

WebPagetest Charts API
This is a Node server that runs tests on a schedule and offers up charting endpoints. Use it to power whatever you want with the data (custom dashboards, alerts, CI systems, etc.). If you just want the data, this is all you need.

WebPagetest Charts UI
Another Node server, which charts and displays data points using the API and its data. It’s a pretty way to make a performance point to non-technical stakeholders (the filmstrips have proven invaluable), or just to browse performance data.

If you have any questions about getting set up, or if you want to contribute, GitHub Issues and PRs are the best way. While this code is used internally for important things at Trulia, it’s still a young project, so there will probably be bugs in new and different use cases. We’ll be making improvements as we continue to figure out how best to measure performance.


Leveling Up

At any fast-paced, successful company there are areas of the codebase that are not-so-affectionately referred to as “legacy code”, and Trulia is no different. We’re always innovating, and we have done some amazing things under incredible deadlines.

As we grow, we continue to learn from our mistakes and have made some great advances in performance and scale in line with our engineering organization’s goals. However, we noticed that we were spending too much time on maintenance, and getting bogged down in legacy code when developing new features.

Let’s be honest: writing tests for legacy code is hard. It’s a familiar catch-22: in order to write tests for untestable code you have to refactor; but in order to refactor, you need tests.

Another obvious concern for us is that our code is quite heterogeneous: different people within and across teams use different patterns, styles and OO approaches when programming.

Stability and product execution are important to us: we needed to do something better.

The Solution: Training


At Trulia we love doing awesome things with awesome people, so we decided to bring in two well-known trainers from thephp.cc: Sebastian Bergmann (the creator of PHPUnit) and Stefan Priebsch (enterprise & architecture expert).

Given the size of our frontend engineering team (over 80 and growing!) and the breadth of topics we wanted to cover with two trainers, it was clear that we needed to create a custom, week-long program for the Trulia Engineering team.

For Trulia, it was worth setting aside our roadmap for a few days to make sure our team keeps doing what we do best: writing great code. In fact, this attitude is old hat for us; every quarter we spend an entire week innovating on new product & engineering features.

Together with Sebastian & Stefan, we constructed a program that consisted of a series of presentations, workshops, pair-programming, coaching and even a focus group for each team. Loving iteration as much as we do, we modified the content and structure of the sessions daily as we learnt from our observations & feedback.

We now have more shared knowledge of testing, SOLID principles, design patterns and other best practices that we can apply to existing and new features. This not only helps us write new code, but also lets us know when & how to make our legacy code better (more testable and reliable).

One thing we learned about our legacy code is that it’s actually not as bad as you might think. Sure, there may be pieces in the wrong places, but with small changes to how we work going forward, it will become more testable and maintainable over time.

At Trulia, we believe that small is beautiful. This philosophy covers both how we design software (small methods, classes with a single responsibility) and how we work (small teams, refactoring the smallest part).

The Results

We learnt a great deal from this program, and it has inspired many of us. Here are some perspectives from the engineers in training:

“I learned a lot and at the same time [the training] gave more confidence into myself to not be afraid of doing the right thing.”

“I started writing tests for a class I’ve been working on, and I found several bugs.”

“Over time, as you build tests on top of tests we create more and more business value, by being able to implement new features faster, proclaim higher quality of code, and lower maintenance costs.”

“This code is probably run a billion times a day. Making it more efficient will have a huge impact.”

Of course, it doesn’t stop there. Investing in our team is something we take seriously, and it is an ongoing process. We will continue to invest in and grow our team with future trainings, coding clubs, engineering forums and other goodies. Having an architectural vision for our software is critical for a team of our size, as it provides a guiding light for our design & code. Last year, we saw firsthand the benefits of following a vision when we designed and launched Object-Oriented CSS (OOCSS). Nicole Sullivan talks about some of the performance & efficiency gains we achieved on her blog.

We are also building a wiki of patterns & solutions to common problems that are in line with our vision. A focus group now meets regularly to iterate on our architectural vision and refine our coding guidelines to help us all apply best practices.

Sound like something that interests you? We’re hiring. Check out current opportunities at trulia.com/jobs.


choroplethr version 1.7.0

Today I am pleased to announce several significant changes to the choroplethr package for R. If you deal with visualizing geographic data in R, you might find these changes to be useful. You can get the latest version of the package by typing the following from the R console:


US Maps Now Include Hawaii and Alaska as Insets

Creating a choropleth map in R where Alaska and Hawaii appear as insets is a challenge. I have implemented a solution, and it is now the default behavior when you call choroplethr or choroplethr_acs with data that covers all 50 states. Here is an example:

library(choroplethr)
choroplethr(df_pop_county, "county", title="2012 County Population Estimates")


Previous versions of choroplethr did not render Alaska or Hawaii at all. For those who are interested in the technical details: choroplethr first renders the continental US, Alaska and Hawaii as three separate images and then combines them. If you are interested in the code, please type


and look at the examples. Some additional bookkeeping is required to have all three maps use the same scale.
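For readers curious about that bookkeeping, here is a minimal base-R sketch of the idea, not choroplethr’s actual code: compute the bucket boundaries once from the data for all 50 states, then split the data into the continental, Alaska and Hawaii pieces. The random values and variable names below are made up for illustration.

```r
# Sketch only: random values for the 50 states (datasets::state.name)
set.seed(1)
df <- data.frame(region = state.name,
                 value  = sample(1:1000, 50))

# Compute 9 equal-count bucket boundaries ONCE, from all 50 states
breaks <- quantile(df$value, probs = seq(0, 1, length.out = 10))
df$bucket <- cut(df$value, breaks = breaks, include.lowest = TRUE)

# Split into the three pieces that are rendered as separate images
insets      <- c("Alaska", "Hawaii")
continental <- df[!df$region %in% insets, ]
alaska      <- df[df$region == "Alaska", ]
hawaii      <- df[df$region == "Hawaii", ]

# Subsetting keeps all factor levels, so every piece can share one legend
# (for example with drop = FALSE on a ggplot2 discrete fill scale)
print(levels(alaska$bucket))
```

Because the boundaries come from the full data set, a state lands in the same bucket (and color) no matter which of the three images it is drawn in.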

Support for World Map Choropleths

The most requested feature for choroplethr has been support for country-level choropleths. Today I am happy to announce that choroplethr version 1.7 implements such support. This is a new direction for choroplethr, and I hope to refine it as I get feedback from the community. Here is an example of how to use it:

library(choroplethr)
# give each entry in country.names a random value
df = data.frame(region=country.names, value=sample(1:length(country.names)))
choroplethr(df, lod="world")


A New World Map

I originally attempted to simplify the creation of world choropleths by using the world map that ships with the maps package. However, this map is quite old and even contains the USSR as a country. This means that modern data cannot be bound to the map, since modern data does not list the USSR as a region. To address this, choroplethr now ships with a world map from Natural Earth Data. This map is not without its own problems, though: the resolution of the map seems to have made smaller countries such as Singapore disappear. For details type:


I hope to continue to improve choroplethr’s support for world maps.
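The USSR problem is ultimately a name-matching problem: data rows whose region names do not appear on the map have nothing to bind to. Below is a minimal base-R sketch of how one might list those rows before rendering; both name vectors are hypothetical and are not taken from choroplethr’s own data.

```r
# Hypothetical region names; neither vector comes from choroplethr
map_regions  <- c("ussr", "united states of america", "canada", "france")
data_regions <- c("russia", "ukraine", "united states of america",
                  "canada", "france", "singapore")

# Regions present in the data but absent from the map
unmatched <- setdiff(data_regions, map_regions)
print(unmatched)

# An inner join (merge's default) keeps only the matched regions
df     <- data.frame(region = data_regions, value = seq_along(data_regions),
                     stringsAsFactors = FALSE)
map_df <- data.frame(region = map_regions, stringsAsFactors = FALSE)
bound  <- merge(map_df, df, by = "region")
nrow(bound)   # 3: the unmatched rows were dropped without a warning
```

Since merge() drops unmatched rows silently, an explicit setdiff() check can be worth running before binding data to any map.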

An Introduction to GIS for R Programmers

These improvements to choroplethr required me to learn a great deal about mapmaking outside the context of R. To help other R programmers make similar contributions, I created a page on the choroplethr wiki titled Mapmaking for R Programmers. If you are interested in using R for a customized mapmaking project, this article might be useful to you.


Introducing choroplethr version 1.2.0


Today I am pleased to announce that the latest version of choroplethr is now available on CRAN. To install it, type the following from an R console:

install.packages("choroplethr")
library(choroplethr)

You only need to install the package once, but in each session in which you want to use choroplethr you need to type the library command.
The most requested feature has been to give users more control over how maps are rendered. Version 1.2 provides three functions (get_acs_df, bind_df_to_map and render_choropleth) to address this. Here are four examples of using those functions to extract meaning from maps.

Example 1: Showing states with population over 1M

The choroplethr_acs function, available since version 1.0, makes it easy to create maps from ACS data.  Consider the example of creating a map of the population of US states:

# see ?choroplethr_acs for an explanation of the parameters
choroplethr_acs("B01003", "state")


This map is very informative, but no map can tell you everything.  For example, in this map you cannot tell which states have a population above or below 1 million residents.  The new features in choroplethr make this easy:

# Use get_acs_df to get an ACS table as a data.frame
df = get_acs_df("B01003", "state")

# Use bind_df_to_map to bind the data to a map
df.map = bind_df_to_map(df, "state")
# change the population from a number to a factor
# which shows whether the value is above or below 1M
library(Hmisc) # for cut2
df.map$value = cut2(df.map$value, cuts=c(0,1000000,Inf))

# use render_choropleth to render the resulting object
render_choropleth(df.map, "state", "States with a population over 1M", "Population")


Example 2: Comparing States and Counties with Populations over 1M

These features open a new door for analysis.  As a small example, let’s create a map that compares states with populations over 1M with counties that have populations over 1M:

# States with greater than 1M residents
library(Hmisc)     # for cut2
library(gridExtra) # for grid.arrange
df = get_acs_df("B01003", "state") # population
df.map = bind_df_to_map(df, "state")
df.map$value = cut2(df.map$value, cuts=c(0, 1000000, Inf))
state.pop = render_choropleth(df.map, "state", "States with a population over 1M", "Population")

# Counties with greater than 1M residents
df = get_acs_df("B01003", "county") # population
df.map = bind_df_to_map(df, "county")
df.map$value = cut2(df.map$value, cuts=c(0, 1000000, Inf))
county.pop = render_choropleth(df.map, "county", "Counties with a population over 1M", "Population")

grid.arrange(state.pop, county.pop, nrow=2, ncol=1)


Example 3: “The 1%”

One of the most talked-about demographic topics in the news today is “The 1%”. This refers to individuals whose income is in the 99th percentile in a given year. In this example we use the new features in choroplethr to highlight counties where the median family income is in the 99th percentile of all counties nationwide:

df = get_acs_df("B19113", "county") # median family income
df.map = bind_df_to_map(df, "county")
df.map$value = cut2(df.map$value, cuts=c(min(df$value), quantile(df$value, 0.99), max(df$value)))
render_choropleth(df.map, "county", "Counties with the Top 1% Median Family Income")
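If you would rather not depend on Hmisc, a rough base-R analogue of the cut2() call above might look like this. The incomes are synthetic, and the boundary handling differs slightly: Hmisc’s cut2() uses left-closed intervals by default, while base cut() is right-closed.

```r
# Synthetic incomes: $100 to $100,000 in $100 steps (1,000 values), not ACS data
income <- seq(100, 100000, by = 100)

# 99th-percentile threshold using R's default (type 7) quantile
threshold <- quantile(income, 0.99)

# Two buckets: at or below the threshold, and the top 1%
bucket <- cut(income,
              breaks = c(min(income), threshold, max(income)),
              include.lowest = TRUE)
table(bucket)   # 990 values in the lower bucket, 10 in the upper one
```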


Example 4: ZIPs in California

As a final example, let’s try to identify ZIP codes in California where the median age is between 20 and 30. (In this post I use the term ZIP code, which is not technically correct, because it is more widely understood. The correct term here is ZCTA; I explain why in this blog post.) Note that we can simply remove the ZIPs that we are not interested in.

df = get_acs_df("B01002", "zip") # median age
df.map = bind_df_to_map(df, "zip")
ca_zips = render_choropleth(df.map, "zip", "CA ZIPs", "Median Age", states="CA")

df = df[df$value >= 20 & df$value <= 30, ]
df.map = bind_df_to_map(df, "zip")
ca_zips_20s = render_choropleth(df.map, "zip", "CA ZIPs with median age between 20 and 30", "Median Age", states="CA")

grid.arrange(ca_zips, ca_zips_20s, nrow=2, ncol=1)



Hopefully these new features allow you to do more interesting work with visualizing and analyzing spatial information. If you would like to share any of your work, please feel free to contact me on Twitter or post on the choroplethr forum.


choroplethr version 1.1.0 released

I am pleased to announce that choroplethr version 1.1.0 is now available on CRAN!  (If you are new to the choroplethr package, please see this blog post).  There are three features that I would like to make users aware of.

Feature #1: Available on CRAN

Version 1.1.0 is the first version that is available on CRAN. This means that you can install and load it by typing the following in an R console:

install.packages("choroplethr")
library(choroplethr)

The first line downloads and installs the package; you only need to type it once. In each session where you want to use the package, you must type the library command.

Feature #2: Subsets of States

My previous blog post demonstrated how to use the choroplethr_acs function to create choropleth maps from 2011 American Community Survey (ACS) data. Let’s look at an example of county-level total population, table B01003:

choroplethr_acs("B01003", "county")


This map is very interesting in and of itself: the distribution of people across the US has many interesting patterns. But it also has limitations: many small but important counties, such as New York County (Manhattan), are not visible at all. And because there is a population cluster in the Northeast, the counties there all appear the same color. Each county in this map is assigned one of 9 colors, and there are an equal number of counties with each color. This means that we can’t see how counties in the Northeast compare with each other.
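The “equal number of counties with each color” rule is quantile (equal-count) bucketing. A minimal base-R sketch of that idea follows; it assumes buckets are cut at evenly spaced quantiles, it is not choroplethr’s internal code, and the skewed values are randomly generated.

```r
# Synthetic, heavily right-skewed "population" values for 900 regions
set.seed(2013)
population <- rlnorm(900, meanlog = 10, sdlog = 1.5)

# Equal-count bucketing: boundaries at the 0, 1/9, ..., 9/9 quantiles
num_buckets <- 9
breaks <- quantile(population, probs = seq(0, 1, length.out = num_buckets + 1))
bucket <- cut(population, breaks = breaks, include.lowest = TRUE)

# Each bucket holds 900 / 9 = 100 regions, however skewed the values are
table(bucket)
```

Because every bucket holds the same number of regions, a color encodes rank rather than magnitude, which is why a dense cluster of populous counties can end up sharing one color.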

Version 1.1 of choroplethr attempts to solve these problems by allowing users to specify a states parameter, which defaults to state.abb (a vector that has the abbreviations of all 50 states). Here is how the county-level population map looks when we zoom in on the Northeast:

northeast_states=c("CT", "DE", "ME", "MD", "MA", "NH", "NY", "PA", "RI", "VT")
choroplethr_acs("B01003", "county", states=northeast_states)


When choroplethr zooms in, it also recalculates the buckets. This map shows that there is much larger variation in the Northeastern US than originally appeared on the national map. Hopefully this feature will be useful to users in their own analyses.

Feature #3: View “any” ACS

By default choroplethr_acs maps data from the 2011 5-year ACS. However, the Census Bureau has been conducting the American Community Survey since 2005. You can see a list of all ACS here. I was originally quite excited by this wealth of data, and was hoping to make animated GIFs showing demographic changes from 2005 to 2012. However, Ezra Glenn, the author of the acs package that choroplethr uses to get Census data, told me that this is not possible.

Simply put, the data that the Census Bureau makes available via its API is only a subset of all its data; choroplethr can only access the 5-year surveys that ended in 2010 or later.

Comparing two Choropleths

Even though the data is limited, let’s still try to compare the population of the Northeast between 2010 and 2012 (the largest date range that the Census API supports).  In order to make the comparisons as valid as possible, let’s set the num_buckets parameter to 1 so that we use a continuous scale in both images.  We can then compare the maps side by side:

library(gridExtra) # for grid.arrange
pop_2010 = choroplethr_acs("B01003", "county", num_buckets=1, endyear=2010, span=5, states=northeast_states)
pop_2012 = choroplethr_acs("B01003", "county", num_buckets=1, endyear=2012, span=5, states=northeast_states)

grid.arrange(pop_2010, pop_2012, nrow=1, ncol=2)


Interpreting the Results

The problem with the above comparison is that the legends are not equal, so comparing the changes in colors between the maps has limited value. The problem is exacerbated if you choose a value of num_buckets greater than 1, because then you are comparing divisions between populations with different ranges. In the future I hope to create helper functions to support the comparison between choropleths. One idea is to create a new data.frame that is the difference between the data.frames that make up each constituent map, and then plot that (as either a percent change or an absolute value). Another idea is to force the same scale onto both choropleths. But since the range of data that is available via the Census API is so small, I’m not sure how high a priority to make this. This leads me to my final point: asking the Census Bureau to make more data available from its API.
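The difference data.frame idea could be sketched in base R roughly as follows. The region ids and population values are made up (a real choroplethr call would need county FIPS codes in the region column), and shared_limits illustrates the other idea: a single range that both maps could use.

```r
# Hypothetical populations for the same three regions in two survey years
pop_2010 <- data.frame(region = c("region_a", "region_b", "region_c"),
                       value  = c(1000, 2000, 4000),
                       stringsAsFactors = FALSE)
pop_2012 <- data.frame(region = c("region_a", "region_b", "region_c"),
                       value  = c(1100, 1900, 4000),
                       stringsAsFactors = FALSE)

# Join the two years on region; the suffixes tell the value columns apart
both <- merge(pop_2010, pop_2012, by = "region",
              suffixes = c("_2010", "_2012"))

# Absolute and percent change for each region
both$abs_change <- both$value_2012 - both$value_2010
both$pct_change <- 100 * both$abs_change / both$value_2010

# A region/value data.frame of the kind choroplethr() takes
diff_df <- data.frame(region = both$region, value = both$pct_change,
                      stringsAsFactors = FALSE)

# Alternative: one shared range covering both years
shared_limits <- range(c(both$value_2010, both$value_2012))
```

shared_limits could, for example, be passed as the limits argument of a ggplot2 continuous fill scale so that both maps use the same color range.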

Petitioning the Census Bureau to Add Historical Data

I think that if the Census Bureau added historical data to its API, it would have tremendous value to researchers and all citizens who want to better understand the changing demographic nature of the US. To help make this happen, I created a petition on whitehouse.gov. The text of the entire petition reads:


make all Census data available via the Census API.

The US Census Department provides the most comprehensive demographic information about America. Unfortunately, most of this data is difficult for researchers to access.

As an example, here is a list of all American Community Surveys (ACS): http://1.usa.gov/1geFSSj. However, only surveys taken since 2010 are accessible via the Developer API (http://1.usa.gov/1nYk8OU).

This makes it extremely difficult to perform historical demographic analysis. As an example of research that this API enables, please see my choroplethr R package which creates thematic maps (choropleths) of Census data: http://bit.ly/1eZzNWP.

I would like to empower people to view demographic maps of any Census data from any year. That is only possible if the data is made available via the API.

If this is something which you would also like to see, please consider signing the petition here.

Feedback is Welcome

If you would like to give feedback on version 1.1.0 of choroplethr, report bugs, request features, or share your own interesting choropleths, please consider posting on the choroplethr Google Group.


Zip Code Population and Per-Capita Income in the 2011 ACS

Recently I was pleased to write a blog post introducing the choroplethr package for R. One of the goals of choroplethr is to make it easy to display data from the Census’ American Community Survey (ACS). To accomplish this, the choroplethr_acs function works with R’s acs package to map results from the Census API. To demonstrate this, I displayed several images, including this map showing 2011 per-capita income on a ZIP code basis. (Technically, the Census uses ZIP Code Tabulation Areas, or ZCTAs, and not postal ZIP codes. See this page for details about ZCTAs.)


Twitter user @BrashEQLibrium made an interesting comment on this:

If you click through, you will see a funny comic that says that many geographic profile maps wind up just being population maps. It wasn’t immediately clear to me whether ZCTA population and income would be correlated, so I decided to investigate.

Analysis 1: Side-by-side comparison

As a first pass, I simply used the choroplethr_acs function to place maps of population and income side by side:


# create two maps, side by side, of the tables in question
# 2011 ACS table ids can be found here: http://factfinder2.census.gov/faces/help/jsf/pages/metadata.xhtml?lang=en&type=dataset&id=dataset.en.ACS_11_5YR#
library(gridExtra) # for grid.arrange
incomeTableId     = "B19301"
populationTableId = "B01003"
map_income     = choroplethr_acs(tableId = incomeTableId,     lod = "zip")
map_population = choroplethr_acs(tableId = populationTableId, lod = "zip")
grid.arrange(map_income, map_population, nrow = 1, ncol = 2)


These maps appear to have significant differences. For example, the northern central part of the country appears much darker in the income map than in the population map.

Analysis 2: Animated GIF

My colleague Chris Vensko recommended creating an animated GIF of these two images on an infinite loop, so that the differences between the images would stand out more. Here is the resulting image, with a five-second delay.


Analysis 3: Scatterplot

The traditional way to explore the relationship between two variables is a scatterplot with a smoothed conditional mean.

library(acs)
library(ggplot2)
library(scales) # for the comma label formatter

# data from 2011 ACS for ZCTAs for income
income.data = acs.fetch(geography=choroplethr:::make_geo("zip"), table.number = incomeTableId, col.names = "pretty")
income.df = choroplethr:::make_df("zip",income.data, 1)
colnames(income.df)[2] = "Income"

# data from 2011 ACS for ZCTAs for population
population.data = acs.fetch(geography=choroplethr:::make_geo("zip"), table.number = populationTableId, col.names = "pretty")
population.df = choroplethr:::make_df("zip",population.data, 1)
colnames(population.df)[2] = "Population"

final_df = merge(income.df, population.df)
ggplot(final_df, aes(Population, Income)) +
 geom_jitter(alpha=1/5) +
 geom_smooth() +
 scale_x_continuous(labels=comma) +
 scale_y_continuous(labels=comma) +
 ggtitle("Relationship between ZIP Code Population and Income\nData from 2011 Census ACS ZCTAs")


Again, there doesn’t seem to be a strong relationship here between ZIP (ZCTA) population and income. Income does seem to rise a bit at first, but then it falls.
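A single number can complement the scatterplot. One option, not used in the original analysis, is Spearman’s rank correlation, which is less sensitive to the long tails in both variables. The sketch below runs it on a synthetic stand-in for final_df, so its output says nothing about the real ACS data.

```r
# Synthetic stand-in for final_df; the real one comes from the ACS code above
set.seed(7)
final_df <- data.frame(Population = round(runif(500, min = 100, max = 50000)))
final_df$Income <- round(25000 + rnorm(500, sd = 8000))

# Spearman's rho works on ranks, so extreme values have less influence
rho <- cor(final_df$Population, final_df$Income, method = "spearman")
print(round(rho, 3))   # close to 0: Income was generated independently here
```

cor.test() with the same method argument would add a p-value if one is wanted.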


The data do not seem to bear out the relationship that some people expected between ZIP/ZCTA population and income. I thought about this for a while and have a possible explanation: people might be confusing ZCTA population with whether a given ZIP is rural or urban.

I am not an expert in demography, but I seem to remember there being many studies of the demographic differences between rural and urban America.  For example, this article claims that there is a large difference in per-capita income between rural and urban counties.  Some other references on the demographic differences between rural and urban America are here and here.

But the key point, I believe, is this: ZCTA population counts need not correlate with whether or not a ZIP is in a rural or urban county.  ZIP codes can be of varying size, so the same population count in two different ZIP codes can mean two different population densities.  ZIP codes are created and maintained by the postal service for the sole purpose of facilitating mail delivery.  Here is a quote from my previous article which attempts to explain some of the issues related to using ZCTAs for data analysis:

The highest level of detail that choroplethr supports is the zip code. From both a mapping and demographic standpoint zip codes are problematic. On the one hand, zip codes are useful because they are smaller than counties (so you can get a higher level of detail) and everyone knows which zip code they live in (so they are an intuitive unit for people). On the other hand zip codes are managed by the postal service for the sole purpose of delivering mail. This means that they can change without notice and are not always polygons. For an in depth discussion of these problems see this article from georeference.org; for an overview of zip codes in general see this article on Wikipedia.

Despite these problems the US Census Bureau attempts to capture demographics at the zip level. They have created ZCTAs (ZIP Code Tabulation Areas), which roughly correspond to zip codes. You can learn more about ZCTAs here. Because of these issues choroplethr renders zip code choropleths as scatterplots. It uses the zipcode package, created by Jeffrey Breen, to map each zip code to a longitude and latitude point.
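The density point made earlier in this post (the same population count in two ZIP codes can mean two very different population densities) is simple arithmetic. A toy illustration with two hypothetical ZIP codes and made-up areas:

```r
# Two hypothetical ZIP codes: same population, very different land area
zips <- data.frame(zip        = c("zip_A", "zip_B"),
                   population = c(10000, 10000),
                   area_sq_mi = c(5, 500),
                   stringsAsFactors = FALSE)

# Population density = people per square mile
zips$density <- zips$population / zips$area_sq_mi
print(zips)   # densities of 2000 and 20 people per square mile
```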


The choroplethr package for R


At Trulia we deal with a lot of spatial information: housing markets vary dramatically from one part of the country to another, as do the demographics of each region. Being able to visualize these regional differences helps us to understand them. Choropleth maps are a useful way to visualize this kind of information. In a choropleth, regions are colored based on some metric, such as which presidential candidate a state voted for. I recently created an R package called choroplethr to facilitate creating choropleths. choroplethr also makes it easy to visualize data from the US Census. You can install it from an R console like this:

# install.packages("devtools")  # only needed once
library(devtools)
install_github("trulia/choroplethr")

The American Community Survey (ACS)

choroplethr was initially created to visualize information from the American Community Survey (ACS). The ACS is an ongoing statistical survey run by the US Census Bureau. Most people are familiar with the decennial census, which asks a handful of questions of all Americans every 10 years. The ACS, by contrast, asks a large number of questions of a sampling of the population every year. You can learn more about the ACS here. An important point to note is that, because the ACS only samples the population, all of the reported numbers are estimates. The results from the ACS are summarized and available as tables. You can see a list of the 2011 5-year ACS tables here. Throughout this blog post we will be using table B19301, which contains information about per capita income in the last 12 months. choroplethr uses the R package acs to get ACS data. The acs package was developed by Ezra Glenn, a lecturer in the Department of Urban Studies and Planning at MIT.

The choroplethr_acs function

The State level of detail

To view a choropleth of an ACS table you simply need to call the function choroplethr_acs and pass it a table number from the 2011 5-year ACS and a level of detail (LOD). Valid LODs  are “state”, “county” and “zip”. For example, to see a choropleth of state-level per-capita income type:

choroplethr_acs(tableId="B19301", lod="state")


By default choroplethr divides the lower 48 states into 9 equally sized buckets and colors the buckets using a sequential ColorBrewer scale, where darker colors indicate a larger value. Many patterns become immediately apparent when the data is displayed this way. For example, there are clusters of wealth in the northeast and west coasts, as well as the north central part of the country. Additionally, there is a cluster of lower-income states in the southeast. From the legend we can see that the difference between the richest and poorest states is approximately $17,000, and each bucket covers approximately $2,000. choroplethr renders maps with the ggplot2 library.

The County level of detail

Things change dramatically when we look at the same dataset at the county level of detail:

choroplethr_acs(tableId="B19301", lod="county")


Many people are not familiar with county-level maps of the continental US and are surprised by both the number of counties (3,076) and their relative size (counties on the west coast tend to be larger than counties on the east coast). As before, choroplethr divides the regions into 9 equally sized buckets. This map allows us to look within a state and see that some states have both extremes of wealth, while others are more consistent. It is instructive to compare and contrast these two maps.

It is worth studying the legend as well. Now the scale has a range of $53,000; moving from the state LOD to the county LOD increased the range of our scale by over 3x. This is a trend that occurs frequently in choropleths: as the level of detail becomes higher, the range of the scale increases as well. Another trend is that the buckets at the extremes cover increasingly large ranges. The highest bucket now covers a range of approximately $32,000, which is larger than the entire range covered in the state choropleth. The lowest bucket now covers approximately $9,000.
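Why do the extreme buckets grow? With right-skewed data, equal-count buckets must stretch to cover the sparse tails. The sketch below uses a deterministic, log-normal set of synthetic incomes (an assumption chosen only for illustration, not ACS data) to show the effect.

```r
# Deterministic, right-skewed synthetic incomes: 1,000 log-normal quantiles
income <- qlnorm(ppoints(1000), meanlog = 10, sdlog = 0.5)

# Nine equal-count buckets, as in the county map
breaks <- quantile(income, probs = seq(0, 1, length.out = 10))

# Dollar range covered by each bucket
widths <- diff(breaks)
print(round(widths))

# The top bucket is the widest; the bottom one is wider than the middle one
which.max(widths)
```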

The Zip level of detail

The highest level of detail that choroplethr supports is the zip code. From both a mapping and demographic standpoint zip codes are problematic. On the one hand, zip codes are useful because they are smaller than counties (so you can get a higher level of detail) and everyone knows which zip code they live in (so they are an intuitive unit for people). On the other hand zip codes are managed by the postal service for the sole purpose of delivering mail. This means that they can change without notice and are not always polygons. For an in depth discussion of these problems see this article from georeference.org; for an overview of zip codes in general see this article on Wikipedia.

Despite these problems the US Census Bureau attempts to capture demographics at the zip level. They have created ZCTAs (ZIP Code Tabulation Areas), which roughly correspond to zip codes. You can learn more about ZCTAs here. Because of these issues choroplethr renders zip code choropleths as scatterplots. It uses the zipcode package, created by Jeffrey Breen, to map each zip code to a longitude and latitude point. To render an estimate of per capita income at the zip code level, type this:

choroplethr_acs(tableId="B19301", lod="zip")


The acs package returns 32,481 ZCTAs for this query, so overplotting is a serious issue. That being said, it is still an informative map. For example, many people are surprised by the low number of zip codes in the western part of the US. Also, the color distribution between the county and zipcode maps is roughly analogous. Additionally, the range of the scale has increased dramatically to $375,900. The highest bucket alone accounts for $339,000 of that range.

At the zip LOD, outliers and sampling error become a serious issue. For example, it is unlikely that the annual per-capita income in zip 54307 is truly $137. The acs package was developed to make it easy to access not only estimates, but also the statistical uncertainty measurements that accompany these estimates. You can learn more about these features of the acs package here. As an aside, you can learn more about zip code 54307 by simply searching Google for “zip 54307”.
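As a sketch of what those uncertainty measurements mean: the Census Bureau publishes ACS margins of error at the 90% confidence level, so a standard error can be recovered by dividing the margin by 1.645. The estimate and margin below are invented for illustration; they are not the actual figures for zip 54307, and truncating the interval at zero is a choice made here because income cannot be negative.

```r
# Hypothetical ACS-style estimate and its published 90% margin of error
estimate <- 137
moe_90   <- 250

# ACS margins of error are at the 90% level, so SE = MOE / 1.645
se <- moe_90 / 1.645
cv <- 100 * se / estimate   # coefficient of variation, in percent

# 90% interval, truncated at zero
ci <- c(lower = max(0, estimate - moe_90), upper = estimate + moe_90)
print(round(c(se = se, cv = cv), 1))
print(ci)
```

A coefficient of variation above 100%, as in this made-up case, is a strong hint that an estimate should not be taken at face value.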

Discrete and Continuous Scales

Discrete Scales

By default choroplethr creates a scale by dividing the regions into 9 equally sized buckets. This is an example of a discrete scale. For discrete scales you can choose between 2 and 9 equally sized buckets, and each bucket count provides you with different information. For example, using two buckets will show you which regions are above and below the median. Here is how to show which ZIP codes have above- and below-median income:

choroplethr_acs(tableId="B19301", lod="county", num_buckets=2)
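"Equally sized buckets" means each bucket holds roughly the same number of regions, which is quantile-based binning. The following is a minimal base R sketch of that idea, not choroplethr's internal code; bucket_values is a hypothetical helper and the income values are randomly generated.

```r
# Quantile ("equally sized") bucketing: each bucket gets roughly the same
# number of regions. Illustration only, not choroplethr's implementation.
bucket_values <- function(values, num_buckets) {
  stopifnot(num_buckets >= 2, num_buckets <= 9)
  # Break points at evenly spaced quantiles of the data
  breaks <- quantile(values, probs = seq(0, 1, length.out = num_buckets + 1),
                     na.rm = TRUE)
  # unique() guards against duplicate break points when many values tie
  cut(values, breaks = unique(breaks), include.lowest = TRUE)
}

set.seed(1)
income <- runif(100, 5000, 60000)  # random stand-in for per capita income
table(bucket_values(income, 2))    # 50 regions below the median, 50 above
```

With two buckets the only interior break is the median, which is why a two-bucket map reads as "above versus below the median".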


Continuous Scale

Setting num_buckets to 1 will force a continuous scale:

choroplethr_acs(tableId="B19301", lod="county", num_buckets=1)


What’s notable about this map is that most of the regions appear to be the same color. To understand why, it is useful to view the values as a boxplot:


Most counties have a per capita income in the range of $20,000-$25,000. But there are outliers both over $60,000 and below $6,000. Because a single color range must contain all values, most values are mapped to a similar color.
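A quick calculation shows the effect. On a linear color scale, a value's position is (x - min) / (max - min). The sketch below plugs in the approximate outlier bounds mentioned above ($6,000 and $60,000) as an assumption; the real extremes lie slightly beyond them, which would compress the typical range even further.

```r
# Position of a value on a linear (continuous) color scale:
# 0 = lowest color, 1 = highest color
scale_position <- function(x, lo, hi) (x - lo) / (hi - lo)

lo <- 6000    # approximate low outlier bound from the boxplot
hi <- 60000   # approximate high outlier bound from the boxplot
round(scale_position(c(20000, 25000), lo, hi), 3)
# → 0.259 0.352
```

So the $20,000-$25,000 band that contains most counties occupies only about 9% of the color range, which is why those counties look nearly identical.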

The choroplethr function

All of our examples so far have used the choroplethr_acs function to create choropleths of ACS data. But we can create similar maps of arbitrary data with the choroplethr function. All of the parameters are the same except for the first: instead of a tableId, we pass in a data.frame with one column named region and one column named value. For state level choropleths, region can be any common naming of a state (e.g. “California”, “california” or “CA”):

df = data.frame(region=state.abb, value=sample(100, 50))
choroplethr(df, lod="state")
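Accepting several spellings of a state requires normalizing them to one canonical form. Here is a minimal sketch using base R's built-in state.name and state.abb vectors; normalize_state is a hypothetical helper, and choroplethr's own matching may work differently.

```r
# Hypothetical helper: map full names (any case) and postal abbreviations
# to lowercase state names; unrecognized input becomes NA.
normalize_state <- function(region) {
  region <- trimws(as.character(region))
  by_abb <- match(toupper(region), state.abb)          # "CA", "ca"
  by_name <- match(tolower(region), tolower(state.name)) # "California"
  idx <- ifelse(is.na(by_abb), by_name, by_abb)
  tolower(state.name)[idx]
}

normalize_state(c("California", "california", "CA"))
# → "california" "california" "california"
```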


For county level choropleths, region must be a 4- or 5-digit county FIPS code:

data(county.fips, package="maps")
df = data.frame(region=county.fips$fips, value=sample(100, nrow(county.fips), replace=TRUE))
choroplethr(df, lod="county", num_buckets=2)
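The 4-digit case arises because a county FIPS code is a 2-digit state code followed by a 3-digit county code, and states 01 through 09 lose their leading zero when codes are stored as integers (as in county.fips). A minimal sketch of restoring the canonical 5-character form; pad_fips is a hypothetical helper.

```r
# Zero-pad integer county FIPS codes to the canonical 5 characters
# (2-digit state code + 3-digit county code).
pad_fips <- function(fips) sprintf("%05d", as.integer(fips))

pad_fips(c(6037, 1001, 36061))
# → "06037" "01001" "36061"
```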


For zip level choropleths, region must be a 5-digit zip code:

data(zipcode, package="zipcode", envir=environment())
df = data.frame(region=zipcode$zip, value = sample(100, nrow(zipcode), replace=TRUE))
choroplethr(df, lod="zip", num_buckets=1)



I hope that you found this tour of choroplethr version 1.0.0 useful. In summary, choroplethr seeks to provide a simple interface to create choropleths in R at 3 levels of detail (state, county and zip) and 2 scale types (discrete and continuous). It attempts to work seamlessly with the acs package to create choropleths of US Census data.

Version 1.1.0, which is already under development, will support rendering choropleths for a subset of states, as well as mapping data from any ACS survey, not just the 2011 5-year survey.

Questions and Discussion

If you have any technical support issues, feature requests or want to share your results, please post to the choroplethr Google Group.


Natural Hazard Maps: the Why’s and How’s

Hi, my name is Peter Black and I’m the lead geospatial engineer at Trulia. We’ve been making some interesting maps here at Trulia: crime heatmaps, a commute tool that selects homes within a travel-time polygon, and home value estimates down to the parcel level. Today, I’m writing to tell you the why’s and how’s of our most recent series on natural hazards.


When Hurricane (ok, ok, it was an extratropical storm) Sandy slammed into the New Jersey shoreline on October 30th, I watched with horror and tried to stay in contact with my loved ones and friends in harm’s way. Seeing the awful damage that resulted cemented my feeling that I had to bring natural hazard maps into the Trulia experience, using data I knew was readily available from various federal sources. Doing so would open up a new avenue for millions of people to better understand the natural world and the risks they face when deciding where to buy a home.

There are many types of natural hazards, of course, and we couldn’t possibly put all of them on Trulia. So we chose the five hazards that have caused the most damage in the past few decades: hurricanes, floods, earthquakes, tornadoes, and wildfires. Fortunately, there is excellent data available for each hazard, mostly from federal sources. In compiling the data, I noticed some interesting things. For example, why was the Charleston, South Carolina area at risk for earthquakes? As it turns out, a magnitude 7.3 earthquake shook the area in 1886.

Charleston SC m7.3 1886

There have been tornadoes in my neck of the woods in northern California. Southern New Jersey (the Pine Barrens) is at risk for forest fires.

Pine Barrens Forest Fire 2007

These revelations only made us work harder to bring this insightful information to a new audience. We noticed that there wasn’t a really good mashup of all the historical information on hurricanes and tornadoes. So for each, we took the historical track data along with its attributes and assigned it to an underlying nested hexagon grid. Once that was done, we classified the data and created a really cool visualization of historical hotspots for each hazard. I stress the word historical intentionally, since we have no idea where the next hurricane or tornado will hit. Our intention is solely to show where storms have hit in the past 60 years or so, the period in which meteorological data became more reliable and sophisticated thanks to technologies like radar (in the 1940s) and satellite imagery (in the 1960s).
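To make the hexagon-grid step concrete, here is a minimal R sketch of one common technique for binning points into hexagons (axial coordinates with cube-coordinate rounding), followed by a count per cell. It is not Trulia's actual pipeline. It assumes projected planar coordinates rather than raw longitude/latitude, assumes the storm tracks have already been sampled into points, and handles a single resolution; re-running it at several cell sizes approximates multiple zoom levels but does not produce an exactly nested grid. hex_cell and hex_counts are hypothetical helpers.

```r
# Axial (q, r) coordinates of the pointy-top hexagon (circumradius `size`)
# containing each planar point (x, y); uses cube-coordinate rounding.
hex_cell <- function(x, y, size) {
  q <- (sqrt(3) / 3 * x - y / 3) / size
  r <- (2 / 3 * y) / size
  s <- -q - r                      # cube coordinates satisfy q + r + s = 0
  rq <- round(q); rr <- round(r); rs <- round(s)
  dq <- abs(rq - q); dr <- abs(rr - r); ds <- abs(rs - s)
  # Recompute the component with the largest rounding error so that the
  # rounded coordinates still sum to zero
  fix_q <- dq > dr & dq > ds
  fix_r <- !fix_q & dr > ds
  rq[fix_q] <- -rr[fix_q] - rs[fix_q]
  rr[fix_r] <- -rq[fix_r] - rs[fix_r]
  data.frame(q = rq, r = rr)
}

# Count points per hexagon; high counts mark historical hotspots
hex_counts <- function(x, y, size) {
  cells <- hex_cell(x, y, size)
  aggregate(list(count = rep(1, nrow(cells))), by = cells, FUN = sum)
}

# Usage: two points fall in the same cell, one in the neighboring cell
hex_counts(x = c(0, 0.1, sqrt(3)), y = c(0, 0.1, 0), size = 1)
```

Hexagons are a popular choice for this kind of hotspot map because every cell's neighbors are equidistant from its center, so counts are less sensitive to grid orientation than with square cells.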

On the technical side of things, we take shapefile or raster data from the various sources and use shp2pgsql or GeoDjango to import it into our database. For that, we use the open source database PostgreSQL, with the geographic information system extension PostGIS. We model our maps using the excellent open source TileMill by MapBox. We use TileStache to serve the maps, and if we want interactivity in our layers we use GeoJSON, provided the feature set is small enough not to slow down the browsing experience. For larger feature sets we use UTFGrid and a modified version of Wax for rendering in JavaScript. UTFGrid is pretty amazing, as it gives millions of points or polygons a level of interactivity that not long ago would have been completely out of the question.

I hope you enjoy the maps. They are informative and provide an interesting tool that can help homebuyers make more informed choices. I’d like to thank my excellent team of engineers, whose talent and professionalism are truli-amazing, our awesome PR crew, and the senior management team at Trulia, who supported this idea from its inception.


Trulia-mazing Events at Trulia


Trulia is at it again, hosting unique and informative events right here in our San Francisco office for employees and the local community alike. We take tech (and socializing) pretty seriously around here, staying up on the latest gadgets, understanding the current market and interacting with fellow data-obsessed techies. Over the last several weeks we have been honored to host some “trulia-mazing” speakers and to welcome visitors from far and wide. During the UX in Space! vol. 2 event, Eric Bell, Jess Zak, and Ulrika Andersson joined us to discuss how their designs have solved interesting challenges in varying environments. The Storylines Meetup Group brought in both Wendy Yu and Trulia’s very own Heather Fernandez to share their personal stories of where they are today and the path they took to get there. Being the data geeks that we are, we were excited to host the Urban Data Challenge Showcase, presented by Young Professionals in Transportation, where participants demoed their submissions and spoke about the challenges, the process, and the findings from their projects. SF Data Mining stopped by our offices to host a Crowdsourcing Meetup, where Edwin Chen spoke about human-powered machine learning, covering use cases, methods for quality control and running your first task.

Take a look at what’s coming up next at our office and grab a spot while you still can. We look forward to seeing you at Trulia soon!


Upcoming Events @ Trulia:
• 6.25.2013      Storylines presents: Anne Raimondi’s Story
• 6.26.2013      Mobile Advertising: Challenges and Opportunities
• 7.23.2013      DataMining
• 7.25.2013      Data Visualization



What’s Happenin’ at Trulia?

Now showing at trulia

Trulia HQ has been host to many tech events here in the Bay Area, welcoming an array of organizations into our office space. Data visualization, tech leadership, UXD, and data mining groups have all spent an evening or two with us. Our unique rooftop event space (aka the “Trulia Penthouse”) is the perfect backdrop, offering beautiful views of the city. Trulia is proud to support the tech community in San Francisco and foster a great environment for idea exchange and learning. Our employees are encouraged to attend these evening events and mingle with fellow SF techies. One of our recent featured events was Pamela Fox’s Story, where Pamela joined us to discuss her experience as a front-end engineer and her involvement in Girl Develop It. We also hosted speaker Scott Murray, author of Interactive Data Visualization for the Web, and Chris Viau, the force behind @d3visualization, during the Bay Area d3 User Group event. Trulia is closely linked to many groups in the Meetup.com world and we always seem to have an event just around the corner. Check it out:

Upcoming Events @ Trulia:

4.22.2013  Data Mining: streamdrill and H2O from 6:30 – 8:30 PM

4.24.2013  SF Bayarea Machine Learning: Real-time Online Learning for Event Streams from 6:30 – 8:30 PM