Big, Weird, and Pretty: Visualizing HAM Museum Data

For our curation assignment, we explored Palladio, RAWGraphs, and Tableau to reflect on how these platforms could visualize our research questions about data from the Harvard Art Museums’ (HAM) API. We asked probing questions about the collection’s provenance and provenience, including:

  1. What is the relationship among accession year, date of first pageview, and last pageview?
  2. What is the relationship among accession year, culture, and department?
  3. What is the relationship between total unique pageviews and culture?
  4. What is the relationship between exhibition count and culture?  

The goals of this assignment were to (1) compare different visualization platforms, (2) gather big picture information about the collection’s provenance and provenience, and (3) identify objects in the collection that could be useful for our final digital storytelling project. This blog post recounts our reflections on the first objective, while our final digital storytelling project will feature what we learned in pursuit of the second and third objectives.

 

Palladio

We were drawn to Palladio because of its potential for visualizing multiple aspects of an object’s provenance and provenience. Palladio can “visualize complex historical data with ease” by generating maps, graphs, lists, galleries, and timelines/timespans. We were hooked by this suite of data visualization tools and by the prospect of using Palladio to create a comprehensive, multilayered narrative of the collection’s provenance and provenience.

Overall, we found Palladio easy to use (especially once we looked over Miriam Posner’s guide), though restrictive in the ways it recognized different types of data. The platform couldn’t process special characters and was picky about other data formatting. To take full advantage of Palladio’s offerings, we would need to devote significant time to re-formatting dates, locations, and other categories in the data from the HAM’s API.
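To give a sense of what that re-formatting involves, here is a minimal Python sketch of one possible cleaning pass; it is an illustration, not a record of our own process. It assumes the HAM data has been exported to a UTF-8 CSV whose date columns are named accessionyear, dateoffirstpageview, and dateoflastpageview (assumed names; check your own export), and it assumes Palladio copes best with ASCII-only text and dates written as YYYY-MM-DD.

```python
import csv
import re
import unicodedata

# Assumed column names for an export of HAM API object records; adjust to your CSV.
DATE_COLUMNS = ["accessionyear", "dateoffirstpageview", "dateoflastpageview"]

# A four-digit year, optionally followed by -MM and then -DD, not followed by another digit.
DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(?!\d)")


def to_ascii(text: str) -> str:
    """Fold accented letters to ASCII (e.g. 'ô' -> 'o') and drop other non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_palladio_date(value: str) -> str:
    """Reduce a bare year or a 'YYYY-MM-DD hh:mm:ss' timestamp to YYYY-MM-DD.

    Assumption: a bare year is padded to January 1 so every value has the same
    shape; values that do not start with a four-digit year become empty strings.
    """
    match = DATE_PATTERN.match(value.strip())
    if not match:
        return ""
    year, month, day = match.groups()
    return f"{year}-{month or '01'}-{day or '01'}"


def clean_rows(rows, date_columns=DATE_COLUMNS):
    """Return new row dicts with ASCII-only text and normalized date columns."""
    cleaned = []
    for row in rows:
        # csv.DictReader stores surplus fields under the key None; skip them.
        new_row = {key: to_ascii(value or "") for key, value in row.items() if key is not None}
        for column in date_columns:
            if column in new_row:
                new_row[column] = to_palladio_date(new_row[column])
        cleaned.append(new_row)
    return cleaned


def clean_csv(in_path: str, out_path: str, date_columns=DATE_COLUMNS) -> None:
    """Read a UTF-8 CSV, clean every row, and write a Palladio-friendlier copy."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = clean_rows(reader, date_columns)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Folding accents to ASCII is lossy (“Côte d’Ivoire” would become “Cote dIvoire” with a curly apostrophe, or “Cote d'Ivoire” with a straight one), so it is probably worth keeping the untouched export alongside the cleaned copy.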

The Good

Our favorite Palladio visualization addressed our second research question—what is the relationship among accession year, culture, and department? We used Palladio’s timeline feature to display accession year (x-axis) and number of objects (y-axis). The data points were grouped according to culture, which we could highlight by hovering over a section of the timeline. While we were not able to layer in department data, this visualization allowed us to easily detect collecting patterns—not just for culturally specific items, but for the sculpture collection as a whole. The timeline also revealed clusters of objects with “unidentified” and/or “null” cultural data, which we found particularly intriguing for thinking about the collection’s provenance and provenience.

Palladio timeline – accession year, culture
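The tally behind a timeline like this can also be reproduced outside Palladio, which makes the “unidentified” and “null” clusters easy to double-check. The sketch below is an illustration, not how Palladio computes its timeline: it assumes each object is a dict with accessionyear and culture keys (assumed HAM field names), keeps “Unidentified” as its own label, and counts a missing culture under a hypothetical “(null)” label so the two kinds of gap stay distinguishable.

```python
from collections import Counter

NULL_LABEL = "(null)"  # hypothetical label for records with no culture value


def accession_timeline(objects):
    """Count objects per (accession year, culture) pair.

    Objects without an accession year cannot be placed on a timeline, so they
    are skipped. A missing or blank culture is counted under NULL_LABEL.
    """
    counts = Counter()
    for obj in objects:
        year = obj.get("accessionyear")
        if year is None:
            continue
        culture = (obj.get("culture") or "").strip() or NULL_LABEL
        counts[(year, culture)] += 1
    return counts


def gaps_by_year(counts, gap_labels=("Unidentified", NULL_LABEL)):
    """Return {year: number of objects whose culture is a gap label}, sorted by year."""
    gaps = Counter()
    for (year, culture), n in counts.items():
        if culture in gap_labels:
            gaps[year] += n
    return dict(sorted(gaps.items()))
```

Sorting the gap counts by year gives a quick list of the accession years where unidentified or null records cluster, which can then be compared against the shape of the Palladio timeline.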

The Bad

We found that Palladio is ultimately limited by its inability to embed or easily disseminate visualizations beyond the platform. To share our timeline outside Palladio, for example, we can either download a static SVG or share the CSV from the API along with instructions on how to upload the data into Palladio and re-make the interactive visualization.

The Takeaway

Palladio works best as a research method, not a visualization tool. The platform can handle a variety of data relevant to our research questions, both qualitative and quantitative, but only if that data is formatted in specific ways. We found the timeline and list functions, in particular, to be useful for analyzing the provenance and provenience of different objects in the collection. Using these tools, we were able to find overarching collecting patterns, in addition to identifying outliers that we might pursue as object case studies. Because we can’t easily share interactive visualizations beyond the platform, however, we are less interested in using Palladio for public-facing aspects of our project.

 

RAWGraphs

RAWGraphs bills itself as the “missing link” between spreadsheets and data visualization. We were interested in the platform because Patrick Rashleigh recommended it as a lighter alternative to D3, the JavaScript library for data visualizations that works through HTML, SVG, and CSS. As powerful as D3 is, it’s quite a challenge to learn. RAWGraphs offers a similarly robust library of visualizations that work (somewhat) seamlessly with user data without the need for heavy coding. In fact, the site itself is built on d3.js and offers documentation on GitHub. We opted to use the web-hosted version of RAWGraphs.

The Good

The site is incredibly easy to use. Users upload data, create visualizations, and export images (as PNGs and JSON models) all within their browser. RAWGraphs also clearly explains the design and hierarchy behind each graph. This was especially helpful, as we approached the curation assignment as a way to explore the aesthetics of data visualizations.

The Bad

RAWGraphs did not fit well with our project.

RAWGraphs circle packing – exhibition count, culture

The site favored quantitative data, while our research questions were largely qualitative. We were able to create visualizations, but they weren’t necessarily the site’s most exciting. The circle packing chart above shows exhibition count by culture, with the larger circles representing the most frequently exhibited cultures. At the same time, RAWGraphs generates static images. Our data is messy, and there’s a lot of it, so it really benefits from interactive detail. Consider the Gantt chart below. Each rectangle represents a sculpture in the HAM collection. The width tracks the span of time between its accession year and its last pageview online, with the different colors representing departments. The visualization makes for a colorful, Richter-scale-esque image of physical provenience and digital provenance. Still, the sheer amount of data limits our ability to gather information at a quick glance.

RAWGraphs Gantt chart – accession year, first pageview, department

The Takeaway

Even so, RAWGraphs made some compelling data visualizations. The circular dendrogram was one of the few graphs that didn’t require a lot of numerical data, so we used it to plot accession year by culture and department. The result is a weird HAM tree ring. It’s nearly impossible to read, given the way RAWGraphs handles text, but it still reflects the connections between objects. In this sense, making visualizations in RAWGraphs helped us get a better sense of the data. The site also provided a helpful reminder of scale: how our aesthetic choices change when we’re working with 4,000+ objects versus a scant 97.

RAWGraphs circular dendrogram – accession year, department, culture, object

 

 

Tableau

We were familiar with Tableau from our previous work with the HAM’s API. The platform had impressed us with its ability to generate dynamic, interactive data visualizations, and we wanted to revisit Tableau with a more critical eye. We asked: what could Tableau, a platform frequently used by businesses to “make an impact” with their data analytics, offer us in terms of our new research questions about provenance and provenience? The answer: a lot. We found ourselves overwhelmed by the visualization possibilities and by the amount of data we were working with. Tableau, however, continued to meet our expectations as a nimble, customizable, and easily sharable platform that could address our research questions, once we got over the learning curve.

The Good

Perhaps our most arresting visualization answered our third research question: what is the relationship between total unique pageviews and culture? Beyond looking like an optometrist’s exam, this visualization could easily demonstrate which cultural proveniences were attracting the most attention on the HAM’s website. By selecting a culture on the sidebar, we could also see data clusters and identify variation within a cultural category. This visualization suggested that the collection has comparatively few “null” or “unidentified” proveniences, represented by the center blue and the outer light green, respectively.

The Bad

This visualization, however, also reveals some of Tableau’s flaws. The platform doesn’t seem to automatically distinguish individual objects that share identical titles. We found several instances in which Tableau combined the pageviews of separate objects, producing an incorrect total for unique pageviews. Additionally, this visualization doesn’t include objects with zero pageviews. This is a significant omission, as we believe several hundred objects in the HAM’s sculpture collection have never been viewed online. Finally, while this visualization is fascinating to mine for more information about the HAM, the sheer number of cultures represented ultimately detracts from its readability.
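The title problem is easy to reproduce outside Tableau: totals keyed on a title merge every object that shares it, while totals keyed on a unique identifier keep them apart. In this minimal sketch, objectid and totaluniquepageviews are assumed HAM field names, the three sculptures are hypothetical, and treating a missing or null count as zero is an assumption about how never-viewed objects appear in the data.

```python
from collections import defaultdict


def views_by(objects, key):
    """Total 'totaluniquepageviews' for each distinct value of `key`.

    A missing or null pageview count is treated as 0 (an assumption), so
    never-viewed objects stay in the result instead of being dropped.
    """
    totals = defaultdict(int)
    for obj in objects:
        totals[obj[key]] += obj.get("totaluniquepageviews") or 0
    return dict(totals)


# Hypothetical records: two different sculptures that happen to share a title.
sculptures = [
    {"objectid": 101, "title": "Untitled", "totaluniquepageviews": 40},
    {"objectid": 202, "title": "Untitled", "totaluniquepageviews": 12},
    {"objectid": 303, "title": "Figure", "totaluniquepageviews": None},
]

print(views_by(sculptures, "title"))     # {'Untitled': 52, 'Figure': 0}
print(views_by(sculptures, "objectid"))  # {101: 40, 202: 12, 303: 0}

# Objects that have never been viewed online, found by ID.
never_viewed = [oid for oid, views in views_by(sculptures, "objectid").items() if views == 0]
```

In Tableau, the analogous fix would presumably be to make the object ID, rather than the title, the field that defines each mark, with the title shown only as a label.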

The Takeaway

With a little more practice on our end, we think Tableau could offer our project a dynamic, sharable, and compelling way to visualize our research questions about the HAM’s collection. The drawbacks of a large dataset could be mitigated through further cleaning of the data, and we think that time investment would significantly enhance Tableau’s outputs. Plus, with the Tableau Online hosting platform, we can easily embed interactive visualizations to share with readers.
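One form that further cleaning could take is collapsing the long tail of cultures into a single “Other” category before the data reaches Tableau. The sketch below is hypothetical: the cutoff (top_n) is an arbitrary choice, and ranking cultures by object count rather than by pageviews is an assumption. Tableau’s own grouping and Top N filtering could likely achieve something similar inside the workbook.

```python
from collections import Counter


def collapse_rare(labels, top_n=10, other="Other"):
    """Keep the top_n most frequent labels and relabel everything else as `other`.

    Counter.most_common orders equal counts by first appearance, so which
    label survives a tie at the cutoff depends on input order.
    """
    keep = {label for label, _ in Counter(labels).most_common(top_n)}
    return [label if label in keep else other for label in labels]
```

Applied to a culture column, this leaves the dominant cultures intact while turning dozens of small slices into one, at the cost of hiding exactly the rare proveniences that might make good case studies.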

 

Final Thoughts

In thinking through the first objective of our curation assignment (to compare different visualization platforms), we agreed that Tableau best suits our project’s aims. The platform allows us the most control over our data and design. We realized in this exercise how closely aesthetics are bound to data. The qualitative nature of both our collections data and our research questions limited our ability to experiment with visualizations across all platforms. Some graphs were illegible. Others were too static. A few did not work at all. Still, this work reiterated how effective data visualization can be as a research method. Rather than create basic, uncritical graphics, we appreciated the process of cleaning, interpreting, and visualizing data as a way to conceptualize new paths into the museum that can be further explored in our final project.
