Telling a Story

  • Data visualization helps make an argument more persuasive by being easier to digest. Data visualization is a means to tell a story about the world.
    • The story may or may not be true. Data visualization can be used to lie.
  • Sketch a data story with this four-part process:
    • Define the problem to be solved (“We need to X because Y”)
    • Frame the problem as a question.
      • Use this question to facilitate the search for data.
      • Be open to changing this question when evidence points to a more meaningful direction.
      • Always state the question before the answer.
      • Make sure the question is aligned with the problem being solved.
    • Define the means to answer the question.
    • Plan a data visualization to tell the story.
  • Choose an appropriate data visualization tool. (see here)
  • Data and numbers are not necessarily neutral. When working with data, ask whose story is being told. Data can be used to lie or obfuscate.
  • Tell, Show, Why
    • Tell the interesting aspects of the data. Tell it in the most succinct way possible.
    • Show visual evidence to support this argument
    • Remind the reader why it matters.
  • Do not leave it up to the audience to guess what the data means. Data visualizations do not speak for themselves.

Data Preprocessing

  • Consider what kind of data are out there, where and how they can be obtained, and whether the data raise privacy issues or should not be published.
  • Source your data by having a descriptive file name, detailed notes, and a backup of the data. Also explain any blank or null values.
  • Clean your data. Look out for missing or incorrect values within the data.
    • If the data are too messy, it is at the analyst’s discretion whether or not to continue using these data.
    • Most of the work in data analysis is often just making sure the data is clean.
    • Automate this process:
      • Use Find and Replace liberally
      • Use Histograms to find outliers or weird entries.
      • Transpose the data for data visualization purposes.
      • Split data into more columns by splitting on some delimiter (i.e., spaces or commas).
      • Use Formulas (as in spreadsheets) liberally to get derived values
      • Automate data extraction through OCR or Tabula
  • Question your data. Understand how to interpret the data and any inherent methodological or sociopolitical biases within the data. Know which aspects are unanswerable from the data alone.
  • Know the limitations of the data. Specify these in the actual data story.

Data Processing

  • Data are only useful if you use it in comparison to something else.
    • Be precise. Use the correct terminologies. Be wary of data phrased in a misleading way.
    • Causation != correlation. Phrase with this in mind.
    • Prefer to normalize the data to make meaningful comparisons. Prefer to normalize data when it is not formatted in a common scale or metric to allow for direct comparison.
      • Adjust to rescale data appropriately. Compare rates not values.
      • Adjust to account for seasonality or change over time.
      • Use a standard score or index for the data.
    • Account for biases in the data when making a comparison.

“It’s unrealistic to pretend that we can create a perfect model. But we can certainly come up with a good enough one” 1

Techniques

Charts

  • What chart type should you use?
  • Before creating a chart ask: Does a visualized data pattern really matter to the story?
    • Judge a chart based on its goals. (for more on this, see here)

Rules of Thumb

  • The title should tell a story of its own. This may be combined with a more technical subtitle.
  • Use legends and annotations to provide more context.
  • Specify Data Sources and Credits to give context on where the data came from.
  • Specify uncertainties within the data.
  • Make sure all essential information is visible without any user interaction.
  • Bar, Column and Area charts always start at the zero baseline. The baseline of the chart should be obvious as this is one way to lie with the data.
  • Pie charts represent 100% of the quantity.
  • Line charts are judged based on the shape of the line. Use appropriate scales.
  • Use only one vertical axis. Any more than this will confuse the reader.

Aesthetic Guidelines

  • Add elements as they are needed to convey the story. Do not add bloat to the chart.
    • Avoid decorative elements as these can mislead the viewer.
  • Sort the slices in a pie chart from largest to smallest starting at 12 o clock.
    • Favor rendering the data as a bar chart instead if there are many slices.
  • Do not make people turn their heads to read labels. Orient the labels horizontally.
  • Order the categories appropriately.
  • Choose natural increments that space equally. Keep typography simple
  • Choose an appropriate color palette. Consider as well the inherent symbology of some colors (i.e., red means loss, green means profit)

Maps

  • The main consideration here is the data format and the data story being told.
    • Before using a map, ask if your data can be mapped as points or polygons. These data should have a spatial component
    • Maps should correspond to only one variable. Any more and the map becomes overloaded.
    • Choropleths should use smaller geographies to better visualize the polygons.
    • Use the appropriate scale for the data story. (beware the Modifiable Areal Unit Problem)
    • Normalize all the values.
  • What Map Type should you use?
  • Choose the appropriate color palettes to help readers correctly interpret the information.
    • Use linear interpolation to emphasize outliers.
    • Use quantiles and non-linear groupings to reveal diversity in the middle ranges.
    • Use a continuous color palette to show nuances in the data unless the data story has a compelling reason to display discrete steps.
  • Use cartograms to remove biases due to map area or map projection at the cost of the map becoming more abstract.

Tables

  • Tables make sense when readers want to look up a specific row of data that is highly relevant to them.
  • Use tables when there are no visual patterns or spatial data.

Rules of Thumb

  • Make column headers stand out above data.
  • Use light shading to separate rows or columns.
  • Left align text and right align numbers for easier reading.
  • Avoid repetition by placing labels only on the first row.
  • Group and sort data to highlight meaningful patterns.
  • Use color to highlight key items or outliers.
  • Place independent variables at the column headers, and dependent variables on the side of each row
  • Percentages go vertically downward not horizontally.

Data Visualization as Interpretative

  • Visualizations are wrong if they misstate the evidence or violate design rules for good visualization.
  • Visualizations are misleading if they follow good design rules, but unreasonably hide or twist the appearance of relevant data.
  • Visualizations are truthful if they both follow the design rules, and also show accurate data.

Lying with Charts

  • Exaggerate change by truncating the baseline and zooming in.
  • Diminish change by lengthening the vertical axis or warping its aspect ratio.
  • Add more data and a dual vertical axis to mislead the reader.

Lying with Maps

  • Changing the color ranges and scales can create the wrong impression about the data.

Lying with the Data Itself

  • Be aware of the biases present within the data such as the following:
    • Sampling biases
    • Cognitive biases
    • Algorithmic biases
    • Intergroup biases
    • Map Area bias - the tendency to focus on large regions than smaller ones.
    • Projection bias - biases due to how a map is projected.
    • Disputed Territory bias - the tendency for map providers to display different view of the world.
    • Map exclusion bias - the tendency to fail to represent people or land by omitting them or aggregating them as part of another territory.

Tools

  • OpenRefine - automate cleanup of messy tabular data.
  • Tabula - extract tables from text-based PDFs.
  • DataWrapper - offers sophisticated charts.
  • Tableau Public -data visualization tool that also allows one to make dashboards.
  • Geojson.io - for creating GeoJSON files for geospatial data.

Links

Footnotes

  1. Alberto Cairo, The Truthful Art: Data, Charts, and Maps for Communication_(Pearson Education, 2016),