
Basic statistics of Deal or no Deal


Played the world over, Deal or no Deal is a simple game of risk and reward, a blend of psychology and statistics. It is also a game from which we can learn some of the statistics taught in a first statistics course.

To get you in the mood, here is an edited version of a game starring Olly, who appeared in the UK's 2009 X-Factor singing competition.

Descriptive statistics

The game begins with 22 numbers. Faced with data, it's a good idea to start with a picture that brings out its main features. In this case, a histogram is a good choice of plot.


In a histogram the data is split into groups and the graph shows the number of values within each group. What general points can you make from the histogram?

  • the numbers bunch towards the left, among the smaller values
  • most values are less than 80,000
  • the shape has a long right tail
In more formal terms, we can say the distribution is right skewed (also called positively skewed). This means that the contestant immediately faces an uphill struggle to win big (say, over 40,000). There is a formula that measures the degree of skewness in the data: using a stats package, the measure of skewness comes to +3.3, hence the term "positive skew". A skewness of 0 means the distribution is symmetrical. A positive value tells us the distribution has a long right tail.
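To see where a skewness figure comes from, here is a minimal sketch in Python. Two things in it are assumptions rather than part of the discussion above: the 22 prize values (taken to be the standard UK board, which the article does not list) and the choice of formula (the adjusted Fisher-Pearson coefficient, one of several in common use). Because stats packages differ in the formula they apply, the figure printed need not match the one quoted above exactly, but it should come out strongly positive.

```python
import math

# Assumption: the 22 prizes of the standard UK board, in pounds.
# The article does not list the values, so treat these as illustrative.
BOARD = [0.01, 0.10, 0.50, 1, 5, 10, 50, 100, 250, 500, 750,
         1_000, 3_000, 5_000, 10_000, 15_000, 20_000, 35_000,
         50_000, 75_000, 100_000, 250_000]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # unadjusted skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)   # small-sample adjustment


print(f"board:         {sample_skewness(BOARD):+.2f}")
print(f"symmetric set: {sample_skewness([1, 2, 3, 4, 5]):+.2f}")
```

The second line is a sanity check: a perfectly symmetric set of values has a skewness of 0.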

Measure of central tendency

A next question might be: "What value is 'typical' of the values in the game?" In stats we measure the 'typical' value in these basic ways: mean, median, and mode. Using a package like SPSS/PASW or Minitab we compute:

mean = 25,712gbp;   median = 875gbp;   modal group = the 1st group (the smallest values)

What we can see is that the mean is greater than the median, which is greater than the mode. This ordering of mean, median, and mode is typical of right skewed distributions.

Now all the measures of the 'typical value' in the data are different, and they are very different. Which should we trust? Here, we would not use the mean. Why? Because the value 25,712 is affected by a few very high values (100k and 250k) which push up the measure. The mode is too low. It would seem the median is a better measure.
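The effect of the big prizes on the mean can be checked without a stats package. The sketch below uses Python's standard statistics module. The prize values are an assumption (the standard UK board; the article does not list them), but they do reproduce the mean of 25,712 and the median of 875 quoted above. Dropping the two largest prizes then shows how each measure reacts.

```python
import statistics

# Assumption: the 22 prizes of the standard UK board, in pounds.
BOARD = [0.01, 0.10, 0.50, 1, 5, 10, 50, 100, 250, 500, 750,
         1_000, 3_000, 5_000, 10_000, 15_000, 20_000, 35_000,
         50_000, 75_000, 100_000, 250_000]

print(f"full board:   mean {statistics.mean(BOARD):>9,.0f}"
      f"   median {statistics.median(BOARD):,.0f}")

# Remove the two largest prizes and recompute both measures.
trimmed = sorted(BOARD)[:-2]
print(f"top two gone: mean {statistics.mean(trimmed):>9,.0f}"
      f"   median {statistics.median(trimmed):,.0f}")
```

With these values the mean falls by more than half when the two top prizes go, while the median only moves from 875 to 625. That is the sense in which the median is the more robust measure here.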

Say we accept the typical value on the board is 875gbp, so what? If we were to pick a box at random, it would be as likely to hold less than 875gbp as more. Of course you won't get exactly 875gbp because it is not a value on the board.

We know that the histogram, and hence the measures of central tendency, will change as the game is played and boxes are eliminated. If the measure of the typical value increases over the course of the game, the values left on the board are getting higher.

Suppose the contestant gets lucky as the game goes on. A strong board will have a histogram with a negative/left skew, with more high numbers than low ones.
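To make this concrete, here is a small sketch that recomputes the mean and median after each round. Everything specific in it is an assumption made for illustration: the prize values (the standard UK board), the helper names, and above all the sequence of opened boxes, which is an invented lucky run in which only low values are eliminated.

```python
import statistics

# Assumption: the 22 prizes of the standard UK board, in pounds.
BOARD = [0.01, 0.10, 0.50, 1, 5, 10, 50, 100, 250, 500, 750,
         1_000, 3_000, 5_000, 10_000, 15_000, 20_000, 35_000,
         50_000, 75_000, 100_000, 250_000]


def remove_opened(board, opened):
    """Return a copy of the board without the boxes just opened."""
    remaining = list(board)
    for prize in opened:
        remaining.remove(prize)  # ValueError if the prize is not in play
    return remaining


# Hypothetical lucky run: every box opened holds a low value.
opened_by_round = [
    [0.01, 0.10, 0.50, 1, 10],
    [50, 100, 250],
    [500, 750, 1_000],
    [3_000, 5_000, 10_000],
]

remaining = list(BOARD)
for round_no, opened in enumerate(opened_by_round, start=1):
    remaining = remove_opened(remaining, opened)
    mean = statistics.mean(remaining)
    median = statistics.median(remaining)
    print(f"after round {round_no}: {len(remaining):2d} boxes, "
          f"mean {mean:,.0f}, median {median:,.0f}")
```

In this run both measures rise round after round, which is the signature of a board getting stronger. Swapping a few high prizes into the opened lists shows the opposite effect.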

Measure of dispersion

The main measures of the variability of the data are the variance and the standard deviation (s.d.). These measures are numbers that indicate the variability of the data about the mean. (So we need to calculate the mean before we can get the variance or s.d.) The higher the number, the more variability there is in the data. The variability is high if the s.d. is large relative to the mean.

The game begins with a board with a mean value of 25,712gbp and a standard deviation of about 56,900gbp, more than twice the mean.

The board is risky when the standard deviation is below but close to the mean, and riskier still when it is above the mean.
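One way to compare the s.d. with the mean is to take their ratio, often called the coefficient of variation. The sketch below does this for two late-game boards. Both boards are invented for illustration, and the function name is our own; the s.d. used is the sample standard deviation from Python's statistics module.

```python
import statistics


def sd_to_mean_ratio(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Two hypothetical three-box boards late in a game.
steady_board = [15_000, 20_000, 35_000]  # prizes of similar size
risky_board = [1, 100, 250_000]          # one big prize, two tiny ones

for name, board in [("steady", steady_board), ("risky", risky_board)]:
    print(f"{name:6s} board: mean {statistics.mean(board):>9,.0f}, "
          f"s.d. {statistics.stdev(board):>9,.0f}, "
          f"ratio {sd_to_mean_ratio(board):.2f}")
```

On the steady board the s.d. is well below the mean, so the ratio is under 1. On the risky board the s.d. exceeds the mean and the ratio is above 1, which is the situation described above.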
