Central Tendency - Which Measure is Best?
Written by tutor Yvonne W.
To understand a set of data, it is helpful to organize it and provide summary descriptions of the set. Central tendency measures are used to describe the middle value of a data set. There are (at least) three different ways to describe the middle value: mean, median and mode.
Which method you use depends on the characteristics of the data set and how you plan to use the information.
Let us explore this a bit more. Before we get started, please refer to Table 1 for a review of the definitions for mean, median and mode.
Table 1: Review of Central Tendency Measures
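As a quick illustration of the three definitions, here is a minimal sketch using Python's standard `statistics` module. The data set is made up for illustration and does not come from the article.

```python
import statistics

# Hypothetical data: number of items bought by nine customers.
items_bought = [1, 2, 2, 2, 3, 4, 5, 6, 11]

mean_value = statistics.mean(items_bought)      # sum of the values divided by the count
median_value = statistics.median(items_bought)  # middle value of the sorted list
mode_value = statistics.mode(items_bought)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

In this made-up data set the three measures differ, which is why the choice among them matters.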
Criterion 1: Choice driven by intended use of measurement
Let's consider a candy shop that sells mints, chocolate and taffy. In this case, what you want to know about your sales will influence the measure you select to describe your data.
For example, you would use the mode if you wanted to know the best-selling item. The mode is generally used to describe the most common or most popular item in the data set.
You would also choose the mode if you wanted to know the most common number of customers waiting for service per day, or the day of the month on which the most product went stale.
The mean would be selected if you wanted to know how much money your shop collected per customer this week. The mean is used when you want to know the average value in a set of values. It is the single value that produces the lowest total squared error when compared against every value in the data set, which makes it a good best guess each time you take the measure, run the test or ask the question.
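The "lowest error" property can be checked numerically. The sketch below assumes that "error" means the sum of squared differences between one guess and every value. The spending figures and the helper `sum_squared_error` are hypothetical.

```python
import statistics

# Hypothetical amounts spent by five customers (in dollars).
spending = [4, 6, 7, 9, 14]


def sum_squared_error(guess, values):
    """Total squared distance between one guess and every value."""
    return sum((v - guess) ** 2 for v in values)


mean_spend = statistics.mean(spending)

# Try every candidate guess from 0.0 to 20.0 in steps of 0.1
# and keep the one with the smallest total squared error.
candidates = [c / 10 for c in range(0, 201)]
best_guess = min(candidates, key=lambda c: sum_squared_error(c, spending))

print(mean_spend, best_guess)
```

On this grid the best guess lands on the mean. Any other candidate, including the median, gives a larger total squared error.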
Other examples for using the mean include the average number of boxes of chocolate sold each year around Valentine’s Day or the usual number of hours an employee works in the month of December.
If you wanted to describe how much money a typical customer spent at the candy shop, you would use the median. The median is chosen when you want the number to represent the midpoint in an ordered list of values.
This measure is used often in survey research. Let’s say that you run a customer satisfaction survey to determine how successful you are in generating repeat business. You already know that customers will return to your store if they rated your service at 4 or more out of 10 for total satisfaction.
If you surveyed 10 customers, you would want to be sure that at least 50% of them gave you a rating of 4 or higher. To get an accurate picture of your customers' opinions, you would want to know the median satisfaction rating of the customers surveyed: if the median is 4 or higher, at least half of the ratings are 4 or higher.
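The survey check can be sketched as follows. The ten ratings are invented for illustration; only the threshold of 4 comes from the example above.

```python
import statistics

# Hypothetical satisfaction ratings (out of 10) from ten surveyed customers.
ratings = [2, 3, 3, 4, 5, 6, 6, 7, 8, 9]
RETURN_THRESHOLD = 4  # customers rating 4 or more are expected to return

# With an even number of ratings, the median is the average of the two middle values.
median_rating = statistics.median(ratings)

# Share of customers at or above the threshold.
share_at_or_above = sum(r >= RETURN_THRESHOLD for r in ratings) / len(ratings)

print(median_rating, share_at_or_above)
```

Because the median here is above the threshold, at least half of the surveyed customers are at or above it.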
Criterion 2: Choice driven by characteristics of the data collected
Characteristics of the data being measured will sometimes drive your choice of measurement.
These characteristics are summarized in Table 2 below.
Table 2: Data Characteristics
The mode is best used with categorical (nominal) or discrete data.
It is difficult to use with continuous data because a single value is often not repeated exactly. Categorical or discrete data, by contrast, often have one or two distinct favorites.
The mode has a drawback: it may not be a measure of centrality at all if the most common item lies far from the rest of the data set.
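A small sketch of why the mode suits categorical data better than continuous data. Both lists are hypothetical. `statistics.multimode` returns every value tied for the highest count.

```python
import statistics

# Hypothetical continuous data: weights of taffy bags in grams.
weights = [250.3, 249.8, 251.1, 250.7, 248.9]

# Hypothetical categorical data: the item bought in six sales.
items = ["taffy", "mints", "taffy", "chocolate", "taffy", "mints"]

# No weight repeats, so every value ties for "most common".
weight_modes = statistics.multimode(weights)

# One item clearly stands out.
item_modes = statistics.multimode(items)

print(len(weight_modes), item_modes)
```

For the weights, the mode returns the whole list and tells you nothing. For the items, it picks out a single favorite.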
The mean is most often chosen when the data is continuous and symmetrical (normal).
If the data has outliers or is skewed, the mean gives a distorted picture of centrality. The mean should also be used carefully with ordinal data. For instance, the mean placement of all the runners in an eight-person race will always be 4.5, so it does not deliver meaningful information.
Mean is best used with interval data or ratio data. It is chosen when it is important to reduce the amount of error in a prediction.
The median is especially useful with skewed distributions because it draws the line right in the middle of your data set.
It provides a better measure of centrality for such data because half of the values lie above the median and half lie below it. The median can be used with interval or ratio data, and it is usually the preferred measure for ordinal data.
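The effect of an outlier on the mean and the median can be seen in a short sketch. The spending figures are hypothetical, with one unusually large purchase added on purpose.

```python
import statistics

# Hypothetical amounts spent by six customers; the last value is an outlier.
spending = [8, 9, 10, 11, 12, 100]

mean_spend = statistics.mean(spending)      # pulled upward by the $100 purchase
median_spend = statistics.median(spending)  # midpoint of the sorted values

# Count how many values fall on each side of the median.
below = sum(v < median_spend for v in spending)
above = sum(v > median_spend for v in spending)

print(mean_spend, median_spend, below, above)
```

Here the mean is higher than what five of the six customers spent, while the median still splits the data evenly.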
Problem 1: Use the data below to answer the following questions.
During the past week the candy shop sold 25 boxes of chocolate, 18 boxes of mints and 40 boxes of taffy. There were five customers—Customer A spent $93, Customer B spent $152, Customer C spent $219, Customer D spent $108 and Customer E spent $123.
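One way to explore this data is sketched below. It computes the best-selling item (the mode of items sold) and the mean and median spending per customer. The variable names are illustrative only.

```python
import statistics

# Data from Problem 1.
boxes_sold = {"chocolate": 25, "mints": 18, "taffy": 40}
customer_spend = {"A": 93, "B": 152, "C": 219, "D": 108, "E": 123}

# The most frequently sold item is the mode of all boxes sold.
best_seller = max(boxes_sold, key=boxes_sold.get)

amounts = list(customer_spend.values())
mean_spend = statistics.mean(amounts)      # total spending divided by five customers
median_spend = statistics.median(amounts)  # middle amount once sorted

print(best_seller, mean_spend, median_spend)
```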
Problem 3: Use the data below to answer the following questions.
There are 3 bags containing seven checks each.
Bag A has $5, $10, $20, $20, $50, $50, $125
Bag B has $5, $10, $20, $20, $75, $75, $75
Bag C has $10, $10, $10, $50, $50, $50, $100
Choose the best bag using the mean, median or mode to make your choice.
Critical Thinking: You can draw one check from the bag of your choice to keep.
Which bag would you choose to draw from?
The answer depends on which measure you use. If you chose to use the mode, the answer is Bag B. This bag has more $75 checks than checks of any other value.
I am more likely to draw $75 than any other check value. For Bag A, I would have an equal chance of drawing a $20 or a $50 check, both of which are lower than the most frequently occurring check in Bag B.
Likewise, for Bag C, I would most likely draw a $10 or a $50 check.
If you chose to use the median, the answer is Bag C because you would have at least a 50% chance of drawing a check of $50 or more. For Bags A and B, by contrast, the median is $20, so the same guarantee only holds for checks of $20 or more. Bag C also has more checks greater than $20 than either of the other two bags.
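The comparison can be reproduced with a short sketch that computes all three measures for each bag. The check values are those listed in the problem. The `summary` dictionary is just an illustrative way to organize the results.

```python
import statistics

# Check values from Problem 3.
bags = {
    "A": [5, 10, 20, 20, 50, 50, 125],
    "B": [5, 10, 20, 20, 75, 75, 75],
    "C": [10, 10, 10, 50, 50, 50, 100],
}

# multimode returns every value tied for the highest count,
# which matters for Bags A and C, where two values tie.
summary = {
    name: {
        "mean": statistics.mean(checks),
        "median": statistics.median(checks),
        "modes": statistics.multimode(checks),
    }
    for name, checks in bags.items()
}

for name, stats in summary.items():
    print(name, stats)
```

With these values every bag has the same mean of $40, so the mean alone does not separate them. The median favors Bag C and the mode favors Bag B.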