by Thomas Mejtoft, PhD
People stop and stare at me. We just walk on by. We just keep on dreaming.
Blondie*
It is easy to use data to deceive your readers by making your results hard to understand. However, don’t we all want our work to be read and to have an impact? It is therefore important to think about how we visualize our data and to make those visualizations understandable.
This short note gives some suggestions on how to visualize different kinds of data to make them easier to understand.
These guidelines are loosely based on the following sources:
Publication manual of the American Psychological Association by APA
Tables and Figures by APA Style
ACM Web Accessibility Statement by ACM
Resolution and Size by IEEE Author Center
Day, R., & Gastel, B. (2012). How to write and publish a scientific paper (7th ed.). Cambridge University Press
Hope you find this material useful!
If you are looking for other resources around writing to use, here is a page with resources and material. If you are looking for how to use and cite figures, screenshots, code etc. please refer to the following documents: How to use and cite figures from other sources, How to cite screenshots, References to secondary sources and review articles, Writing references to personal communication, Writing references to programming code, and Citing content created by generative AI. Regarding quotes and visualizing data, please read the following documents: Master quotes in writing and How to visualize your data in an understandable way.
Visualizing data in figures
Going from data to a useful visualization takes practice. There are many simple tips and tricks that should be followed when creating your own figures to improve readability.
The most important things when creating figures
Think about the readability and accessibility when creating the figures:
- Make figures that are readable “stand alone”. Use labels, scales, units, legends etc. in your figures to help people understand what the different parts of the figure are about. A figure (with the figure text) should be readable without any further explanation. This means that you should avoid uncommon abbreviations etc. in the figure.
- Use fonts that are of a readable size. When the figure is placed in the text, the font should be readable and comparable to other text in the document. In general, 8–10pt is a minimum for text in a figure to be readable, and the smallest font in a figure should not be smaller than the smallest font in the rest of the text.
- Avoid distracting elements. Keep the figures simple and without elements that are not needed for the figures to be understood. For example, keep purely esthetic elements to a minimum.
- Design for high accessibility. Avoid colors, shades of gray, and other elements that might create accessibility issues with your figures. Use colors that are distinguishable for those with color vision deficiency, or combine e.g., colors with patterns to increase the accessibility of a figure. This is also important for those printing your documents in black and white. For highest accessibility, also describe in the text what is depicted in the figures. Furthermore, metadata, so-called “alternative text”, should be added to each figure to make the digital document accessible.
- Be honest about the sample size. Add the number of respondents (n) to each figure to be transparent about the sample size. This can be done either in the figure or in the figure text by adding n, e.g., (n=142).
- Create high quality graphics. Creating figures as vector graphics is best practice and gives the best quality both on screen and in print. However, most graphics are bitmaps (e.g., jpg, png, etc.), and with these formats the figures should be large enough for the print resolution to be >300dpi for color and grayscale figures and >600dpi for black and white line art. This also gives graphics with high readability on screen. Using vector graphics (e.g., eps etc.) is preferred but fairly uncommon as output.
- Give appropriate credit. If you have based your figure on someone else’s material, give a reference or other appropriate credit in the figure or figure text. Here you can read about how to use and cite figures from other sources.
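The resolution guideline in the list above can be turned into a quick size check. The sketch below is only an illustration: the function name `min_pixels` and the 3.5 inch print width are assumptions and do not come from the sources listed in this note. It estimates how many pixels wide a bitmap needs to be to reach a given print resolution.

```python
import math


def min_pixels(print_width_in: float, dpi: int) -> int:
    """Smallest pixel width that gives at least `dpi` at the printed width."""
    return math.ceil(print_width_in * dpi)


# Hypothetical figure that is printed 3.5 inches wide.
color_px = min_pixels(3.5, 300)     # color or grayscale figure
line_art_px = min_pixels(3.5, 600)  # black and white line art
print(color_px, line_art_px)        # → 1050 2100
```

If an exported bitmap is narrower than this, it is probably better to export it again at a larger size than to scale it up afterwards.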
Be consistent: don’t use numbers to deceive the readers
Be consistent in how your data is presented. Don’t mix units between data that should be comparable. You can (most often) use any unit, but data that the reader should be able to compare should have the same unit throughout your writing. It is fairly obvious not to mix units that need a conversion, such as °C and °F or kWh and Joules. Nevertheless, also avoid mixing units that need only a simple conversion, such as decimals, text, fractions, and percentages.
Good example:
Out of the respondents, 50% preferred email, 33% text messages, and 17% personal communication.
The easiest way is not to vary the units but to simply denote all data in the same unit. In the example above, percentages are used for all of the different means of communication.
Bad example:
(mixing units of comparable data)
Half of the respondents preferred email, 1/3 text messages, and 17% personal communication.
This way of presenting the data makes it hard for most readers to compare, due to the mix of units used. In this example, text, fractions, and percentages are used for the different means of communication in the study.
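When the underlying data arrives in mixed notations, it can be converted to one unit before it is written up. The following is a minimal sketch, assuming the values are available as short strings; the helper `to_percent` and the `WORDS` lookup are hypothetical names made up for this illustration.

```python
from fractions import Fraction

# Hypothetical lookup for shares written as words (an assumption for this sketch).
WORDS = {"half": Fraction(1, 2), "a third": Fraction(1, 3), "a quarter": Fraction(1, 4)}


def to_percent(value: str) -> int:
    """Convert '17%', '1/3', '0.5' or 'half' to a whole-number percentage."""
    text = value.strip().lower()
    if text in WORDS:
        share = WORDS[text]
    elif text.endswith("%"):
        share = Fraction(text[:-1]) / 100
    else:
        share = Fraction(text)  # handles '1/3' as well as '0.5'
    return round(share * 100)


# The mixed notation from the bad example.
mixed = {"email": "Half", "text messages": "1/3", "personal communication": "17%"}
consistent = {channel: f"{to_percent(v)}%" for channel, v in mixed.items()}
print(consistent)
# → {'email': '50%', 'text messages': '33%', 'personal communication': '17%'}
```

The result matches the good example, where every share is given as a percentage.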
Think about how your choice of illustration of the data affects how it is perceived.
A table, a list, or a figure stands out compared to results written in the text. Results of similar importance should be illustrated in a similar way. Don’t use e.g., a figure for some results so that other results “disappear” in the text. Using different ways of visualizing data is not only confusing, it also makes the data hard to compare, and different people might perceive the data differently depending on their ability to read the text and analyze the figures. In general, people tend to look at figures, tables, and lists and regard that information as more important.
Bad example (mixing visualizations and unintentionally highlighting some of the data):
The female respondents preferred sms (35%) as their primary communication channels (Figure 1) and among male respondents, email was preferred (50%). Other means of communication stated were phone call (1%), sms (10%), WhatsApp (7%), Messenger (2%), and postal mail (30%).
In the example above, the data from the female respondents is visualized in Figure 1 and their most common means of communication is mentioned in the text, while the data from the male respondents is only stated in the text. Combining these visualization methods makes the female data more visual and makes it stand out compared to the male data. If this is not intended, all data should be presented either in-text or in figures and described in a similar way.
Use the same scale on the y-axis for comparable data.
Automatic functions for creating diagrams usually adjust the y-axis to the highest number in the data. This causes a situation where different figures containing similar numbers are hard to compare, and one column might appear more important than it is.
Bad example:
(different scale)
The female respondents preferred sms (35%) as their primary communication channels (Figure 2) while male respondents preferred email (50%) (Figure 3).
In the example above, different scales are used on the y-axis in Figure 2 and Figure 3, even though the data is supposed to be comparable. In this example, 35% in Figure 2 seems higher than 50% in Figure 3 because the y-axis in Figure 2 (max 40%) differs from the one in Figure 3 (max 60%). In this case, not having the same y-axis, especially if the figures are next to each other, is a way to deceive most readers.
Good example:
(same scale in comparable figures)
The female respondents preferred sms (35%) as their primary communication channels (Figure 4) while male respondents preferred email (50%) (Figure 5).
In the example above, the same scale on the y-axis (max 60%) is used in both Figure 4 and Figure 5. In this case, the readers can visually compare the height of the columns between the figures.
Good example:
(data in same figure and same scale)
The female respondents preferred sms (35%) as their primary communication channels while male respondents preferred email (50%) (Figure 6).
In the example above, one figure (Figure 6) is used to visualize the data from both series (female and male respondents). This makes the data easy for the reader to compare, increases the understandability, and is space efficient in the publication. If it is possible to combine data that should be comparable in the same figure, it should be done.
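A shared y-axis limit for comparable figures can be computed from all series at once instead of letting each figure scale itself. The sketch below is one possible rule, not a standard: the function `shared_ymax`, the tick step of 10, and the cap of 100% are assumptions for this illustration. The male shares are taken from the bad example earlier in this note; the female shares other than sms (35%) are made up.

```python
import math


def shared_ymax(series, step=10, cap=100):
    """Next tick above the overall maximum of all series, never above `cap`."""
    highest = max(max(values) for values in series)
    next_tick = (math.floor(highest / step) + 1) * step
    return min(next_tick, cap)


# Male shares (%) as stated in the text above.
male = {"phone call": 1, "sms": 10, "email": 50, "WhatsApp": 7, "Messenger": 2, "postal mail": 30}
# Female shares (%): only sms (35) comes from the text, the rest is illustrative.
female = {"phone call": 5, "sms": 35, "email": 20, "WhatsApp": 15, "Messenger": 15, "postal mail": 10}

print(shared_ymax([female.values()]))                 # → 40, scaled on its own
print(shared_ymax([male.values()]))                   # → 60, scaled on its own
print(shared_ymax([female.values(), male.values()]))  # → 60, shared by both figures
```

The shared value would then be applied to both figures, for instance through the axis-limit setting of whichever plotting tool is used. The cap also keeps the axis from exceeding a theoretical maximum such as 100%.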
Do not let your scale be bigger than your theoretical maximum
Having the maximum of the y-axis higher than the maximum possible in the study might confuse (or trick) the reader. The best practice might be to set e.g., the maximum of the y-axis to 100% (if more than 100% cannot be obtained in the study) or, if absolute numbers are used on the y-axis, to not let the axis maximum exceed the highest number that can be obtained.
Bad example:
(scale out-of-bounds)
Out of all respondents, 3% did not know if they were smart or not.
In the example above, the maximum of the y-axis of Figure 7 is 120% even though no column can ever exceed 100%. Consequently, the maximum of the y-axis should be set to 100% or lower (in this example 100% is the most suitable maximum for the y-axis).
Start your scale at zero
If there is a natural starting point or a zero in your data, start your y-axis from this zero point (e.g., 0%). This is especially important when longitudinal differences are to be illustrated, so that an increase or drop does not appear bigger than it is.
Bad example:
(scale blown up)
The number of people defining themselves as "pet lovers" has declined since 2000.
In the example above, the decrease is only 1% over the first 10 years and 2% over the next 10 years (Figure 8). However, in the figure it seems much larger due to the y-axis being zoomed in on 57% to 62%. If this scale is not used intentionally to point out differences, the range of the y-axis should be larger to show that there is “almost no change”.
It is important to make the figure show significant differences and not insignificant differences.
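The effect of a zoomed-in axis can be estimated by comparing how tall two values are drawn above the axis baseline. This is a rough sketch: the values 61% and 58% are assumed, chosen only to be consistent with the description of the example above (a drop of 1% and then 2% on an axis from 57% to 62%), and `drawn_ratio` is a hypothetical helper.

```python
def drawn_ratio(first: float, last: float, axis_min: float = 0.0) -> float:
    """Ratio between the drawn heights of two values when the y-axis starts at `axis_min`."""
    return (first - axis_min) / (last - axis_min)


# Assumed shares of "pet lovers" at the start and end of the period.
start, end = 61.0, 58.0

truncated = drawn_ratio(start, end, axis_min=57.0)  # y-axis starts at 57%
honest = drawn_ratio(start, end)                     # y-axis starts at 0%
print(f"{truncated:.2f} vs {honest:.2f}")            # → 4.00 vs 1.05
```

Under these assumptions, the first value is drawn four times as tall as the last one on the zoomed-in axis, although it is only about 5% larger.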
Adjust the axis to make differences show
If the data only needs to be comparable within a single figure, adjust the axis so that differences in the data show in the visualization. Do not use a too large scale to hide differences or a too small scale to show insignificant differences. However, avoid the mistake of not starting at the natural zero-point.
If the data should be comparable between figures, it is more important to create comparable scales and, consequently, the adjustment is done according to the data in all comparable figures.
Good example:
Hippos, Kangaroos and Ants are significantly more preferred as pets than cats and dogs (Figure 9).
In the example above, the y-axis of Figure 9 is adjusted to the range of the data. This makes it easy to see significant differences and understand the data.
Bad example: — however, not incorrect
Hippos, Kangaroos and Ants are significantly more preferred as pets than cats and dogs (Figure 10).
In the example above, the y-axis of Figure 10 runs from 0% to 100%, which makes it hard to see the significant differences between columns, and there is a general waste of space.
Put units on the axis
Do not forget to put units on the axis. Missing units make the data hard to understand and might lead to misunderstandings. Even if the units are stated in the figure text, units on the axis are preferred.
Good example:
(units on y-axis)
More energy is used during the weekdays compared to weekends (Figure 11).
In the example above, there is a label (kWh) on the y-axis of Figure 11. Hence, the reader can understand the figure and interpret the energy consumption.
Bad example:
(no units)
More energy is used during the weekdays compared to weekends (Figure 12).
In the example above, there is no label on the y-axis of Figure 12. Consequently, the readers have no idea what the energy consumption is or what is shown in the figure.
Visualizing different types of data in the most understandable way
Visualizing simple data — is a figure even necessary?
The first step is to decide if a figure of the data is necessary. In some cases, we have data that is low in complexity, e.g., female vs. male, yes vs. no, or similar data. In this case it is possible to visualize the data in three different ways: in the text, as a table, or as a figure (see the example below). Looking strictly at the cost of space vs. value, the first (text only) is the most efficient way to visualize the data. Please note that a table or a figure also needs some text in the document to refer to the table or the figure.
A general rule of thumb is that simple data might not need a complex visualization, but can rather be written in-text. However, other ways of visualizing data are in no way incorrect, but might be inefficient in terms of space and might not increase the understanding of the data.
Alternative 1 (text only):
Among the respondents in the survey, 57% were female and 43% were male.
Writing the data in-text when there is a simple data set, with few options, is good practice since it is space efficient and easy to understand.
Alternative 2 (table):
Table 1. Distribution of male and female respondents.
Respondents | |
Female | 57% |
Male | 43% |
A table (Table 1) gives a good overview of the data and is an alternative in-between text only and a figure. However, it might be seen as a waste of space compared to using only text.
Alternative 3 (figure): — Might be seen as a waste of space, but not wrong.
In the example above with the sex of the respondents (Figure 13), a figure gives a good overview of the data, but it might be seen as a waste of space.
Visualizing non-ordered data
Unordered data is a set that doesn’t have any defined order and, consequently, where the nearby alternatives have no connection to the current alternative. This can be e.g., when people choose between different options. This type of data can be presented either in the text, as a table, or as a figure. The most space-efficient way might be text only, but tables or figures are preferred due to understandability. Please note that a table or a figure also needs some text in the document to make a cross-reference to the table or figure.
Alternative 1 (text only):
According to the results, the preferred ways of communicating with other students are phone call (2%), sms (23%), email (35%), WhatsApp (8%), Messenger (12%), and postal mail (20%).
Alternative 2 (table):
Table 2. Preferred way of communication among respondents.
Preferred way of communication | |
Phone call | 2% |
Sms | 23% |
Email | 35% |
WhatsApp | 8% |
Messenger | 12% |
Postal mail | 20% |
The table in this case (Table 2) is fairly easy to understand. It is, however, rather space consuming compared to using only text and does not provide the same visual understanding as a figure.
Alternative 3 (figure):
A pie chart illustrates the distribution nicely (Figure 14). Please note that the colors might be problematic for someone with color vision deficiency.
A bar chart (Figure 15) gives the reader a good overview of the data and it has high accessibility.
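To illustrate why a labeled bar chart does not depend on color, the data from the text-only alternative above can be drawn as a plain-text bar chart. This is only a sketch for quick inspection, not a replacement for a proper figure; the function `text_bar_chart` and its parameters are made up for this example.

```python
def text_bar_chart(data: dict, unit: str = "%", width: int = 40, axis_max: float = 100) -> str:
    """Draw one labeled bar per entry, scaled so that `axis_max` fills `width` characters."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / axis_max * width)
        # Every bar carries its label, value, and unit, so no color is needed.
        lines.append(f"{label:<{label_width}} | {bar} {value}{unit}")
    return "\n".join(lines)


# Preferred way of communication, as listed in the text-only alternative.
communication = {
    "Phone call": 2, "Sms": 23, "Email": 35,
    "WhatsApp": 8, "Messenger": 12, "Postal mail": 20,
}
chart = text_bar_chart(communication, axis_max=40)
print(chart)
```

With `axis_max=40` and `width=40`, each character corresponds to one percentage point, so the bar lengths can be compared directly.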
Visualizing Likert-like data
Data that is ordered, e.g., from low to high, 1 to 5, etc. (e.g., Likert scales), can be visualized using a table, a list, or a figure (stacked bar chart or histogram). A stacked bar chart or histogram should be used to increase the understanding of the data, since the alternatives selected by the respondents in the survey are ordered. Using e.g., a circle diagram makes it hard to see the distribution in relation to the order of the answers and should not be used. Writing the data in text only usually makes it hard to understand. Please note that a table or a figure also needs some text in the document to refer to the table or figure.
Alternative 1 (figure): — preferred
A stacked bar chart (Figure 16) is space efficient and shows the data in a way that is easy to read.
There are many other ways of creating understandable stacked bar charts that work very well.
A histogram (Figure 17) clearly shows the distribution of the data.
A stacked bar chart can have a fixed zero-baseline to easily show results from different questions and make them easy to compare (Figure 18).
A stacked bar chart with a fixed zero-baseline (Figure 18) makes it easy to compare Likert scale answers between different questions and gives understandable visualizations. If you have several comparable questions, it is also a space saver.
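One common way to get a fixed zero-baseline is to center the middle (neutral) category on zero, so that disagreement extends to the left and agreement to the right. Whether Figure 18 is built exactly like this is an assumption; the sketch below only shows the arithmetic, using the percentages listed in Table 3, and `diverging_segments` is a hypothetical helper that expects an odd number of categories.

```python
def diverging_segments(shares: list) -> list:
    """Return (start, end) per segment so that the middle category is centered on zero."""
    middle = len(shares) // 2
    # Everything below the middle category, plus half of the middle, goes left of zero.
    left = sum(shares[:middle]) + shares[middle] / 2
    segments, position = [], -left
    for share in shares:
        segments.append((position, position + share))
        position += share
    return segments


# Strongly disagree, Disagree, Neutral, Agree, Strongly agree (%), from Table 3.
likert = [4, 12, 24, 32, 28]
print(diverging_segments(likert))
# → [(-28.0, -24.0), (-24.0, -12.0), (-12.0, 12.0), (12.0, 44.0), (44.0, 72.0)]
```

Each question gets its own row of segments, and because all rows share the same zero point, the balance between agreement and disagreement can be compared across questions.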
Using e.g., a circle diagram for Likert scale style answers makes it hard for the reader to see the distribution in relation to the order of the answers and should not be used.
Alternative 2 (table):
Table 3. Opinion on smartness by respondents.
Do you think you are smart? | |
Strongly disagree | 4% |
Disagree | 12% |
Neutral | 24% |
Agree | 32% |
Strongly agree | 28% |
In this case, a table (Table 3) is fairly easy to understand. However, it is rather space consuming and does not provide the same visual understanding as a figure.
*Quote from the song Dreaming (Stein & Harry, 1979).
Stein, C., & Harry, D. (1979). Dreaming. Dreaming [Single]. Chrysalis.
Cite this page as (APA style):
Mejtoft, T. (2024). How to visualize data in an understandable way. Notes on (scientific) writing, no 7. Retrieved from https://www.mejtoft.se/thomas/education/academic-writing/how-to-visualize-your-data/
If you have any questions, praise, critique, or feed-forward, please do not hesitate to contact me.
(First published by Thomas Mejtoft: 2023-11-21; Last updated: 2024-11-25)