Real estate photos come in all shapes and sizes. By this we mean that there is a lot of diversity in terms of both image quality and content. At Trulia, we know the first property photo we show a consumer is the most important one. It’s what we call the hero image: the photo that represents a listing across Trulia on all platforms. It needs to be aesthetically pleasing so that people want to click through for more information on the listing.
How Trulia Traditionally Chose Hero Images
Each new property listing we receive at Trulia comes with a collection of photos. Historically, the first photo in this collection was selected as the hero image. But we have found this is not always ideal. While some property listings come with high-resolution, professionally staged photos, others contain low-resolution, poorly taken photos with problems like low brightness, blurriness, and watermarks.
We often find cases where the first photo is lower quality or has irrelevant content, and a better, more representative photo is available in the collection. In these cases, surfacing that first photo would not only leave a sub-optimal first impression, it would also likely reduce the chances of a consumer clicking through for more information and eventually submitting a lead. So, we decided to test a new method that uses image recognition and deep learning to help determine which photo in a listing’s collection should be selected as the hero image.
Looking for a Change: Our Hypothesis and Methodology
We hypothesized that by selecting a more attractive and relevant photo from the collection to be the hero image, we would be able to improve the user experience while increasing the likelihood of properties getting clicked on. To test our hypothesis, we focused on Trulia Rentals and scored each photo in the property collection across three parameters:
- Image Quality
Image quality is subjective, but for the purpose of selecting hero images, we want photos that are professionally staged and captured, are high resolution, and contain luxury elements such as chandeliers and fireplaces to score higher, while photos that are low resolution or cluttered score lower. One of the challenges in training a visual model for image quality is collecting a large enough labeled dataset of high- and low-quality images. To overcome this, we used predictions from another machine-learned, attribute-based model that categorizes properties as either “Luxury” or “Fixer” based on features like price and keywords. We found that homes scored as “Luxury” often had high-resolution, professionally taken photos of attractive spaces, while “Fixer” homes usually had low-resolution, poorly captured photos. This matched our definition of image quality, so we trained a deep convolutional neural network (CNN) to differentiate between photos from luxury homes (positive class) and photos from fixer homes (negative class).
- Image Relevance
Another key aspect in determining the hero image is understanding which scene or room type the photo belongs to, so that the content is relevant and representative of the property. For example, it might make more sense to prefer an indoor photo of the property to an outdoor view of the neighborhood. Similarly, a kitchen photo might appeal more than a bathroom photo. By using our scene classification model (again a CNN), we enable our product team to weight certain scenes more heavily than others based on their criteria of relevance.
- Image Appropriateness
Unlike image quality, image appropriateness can be easily defined in terms of artifacts that violate Trulia photo upload policies. This includes photos with prominent text and watermarks, advertisements, humans, animals, and non-real estate content. As before, we trained another CNN to differentiate between appropriate and inappropriate photos, this time using the curated dataset maintained by our quality control team.
For each of the above models, we used a CNN architecture very similar to the GoogLeNet model that won the ILSVRC 2014 competition. All of these networks were trained using the Caffe deep learning framework on a Titan X GPU.
Finally, each photo in the property’s collection is scored using the three models, and a final score is computed as a weighted sum. The photo with the highest score is selected as the hero image.
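
The weighted-sum selection step can be sketched roughly as follows. This is a minimal illustration, not Trulia's production code: the weight values, the scene labels, and the names `MODEL_WEIGHTS`, `SCENE_WEIGHTS`, `PhotoScores`, `final_score`, and `select_hero` are all assumptions made for the example, and the three per-photo scores are assumed to have already been produced by the quality, scene, and appropriateness models.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights for the three model outputs; the actual values
# are not given in this post.
MODEL_WEIGHTS = {"quality": 0.5, "relevance": 0.3, "appropriateness": 0.2}

# Hypothetical per-scene relevance weights, standing in for the
# product team's relevance criteria.
SCENE_WEIGHTS = {"living_room": 1.0, "kitchen": 0.9, "bathroom": 0.4, "outdoor": 0.3}


@dataclass
class PhotoScores:
    photo_id: str
    quality: float          # assumed: P(luxury-style photo) from the quality CNN
    scene: str              # assumed: top label from the scene classifier
    appropriateness: float  # assumed: P(appropriate) from the appropriateness CNN


def final_score(photo: PhotoScores) -> float:
    """Combine the three model outputs into one score as a weighted sum."""
    relevance = SCENE_WEIGHTS.get(photo.scene, 0.0)  # unknown scenes get no credit
    return (
        MODEL_WEIGHTS["quality"] * photo.quality
        + MODEL_WEIGHTS["relevance"] * relevance
        + MODEL_WEIGHTS["appropriateness"] * photo.appropriateness
    )


def select_hero(photos: List[PhotoScores]) -> PhotoScores:
    """Return the highest-scoring photo; ties keep the earliest photo."""
    if not photos:
        raise ValueError("listing has no photos")
    return max(photos, key=final_score)


listing = [
    PhotoScores("a", quality=0.2, scene="outdoor", appropriateness=0.9),
    PhotoScores("b", quality=0.9, scene="kitchen", appropriateness=0.95),
    PhotoScores("c", quality=0.95, scene="living_room", appropriateness=0.05),
]
hero = select_hero(listing)
```

In this made-up listing, photo "c" has the best quality and scene but is penalized for likely being inappropriate (for example, a heavy watermark), so the well-shot kitchen photo "b" is chosen over both it and the original first photo "a". Because `max` returns the first maximal item, ties fall back to the listing's original photo order.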
The Results
We conducted thorough A/B tests and analysis to compare our hero image selection algorithm against the existing method of selecting the first photo as the hero image. We found significant improvements in consumer engagement as well as consumer and landlord contact rates on all Trulia Rentals platforms, confirming our hypothesis. Not surprisingly, the improvements were most pronounced on the mobile app and web platforms, which show more of the photo in the UI.