Different Types of Data

Created on Wed, 04/19/2017 - 03:55
Last updated on Sat, 07/22/2017 - 19:44

Generally speaking, data can be classified as qualitative or quantitative, though the distinction is somewhat illusory (qualitative data can be represented numerically, and vice versa). Qualitative data consist of categorical variables and quantitative data consist of numerical variables. Categorical variables come in nominal or ordinal flavours, whereas numerical variables can be discrete or continuous. The type of data tends to determine how sophisticated the statistical tests applied to it can be.

This chapter addresses part of Section A(c) of the Primary Syllabus: "Describe the different types of data". It has no corresponding topic among the Fellowship exam revision chapters. Among the Primary papers, it is represented only by Viva 1 from the second paper of 2007 and Question 17 from the second paper of 2015. "Any reasonable classification was awarded marks", according to the examiners. Examples are usually required in such questions, and the author has made some effort to offer some.


"Standard textbooks well describe this topic, often in their opening chapter" according to the primary examiner's answer to Question 17. They were clearly referring to Myles and Gin, where (at least in my 2000 edition) "Data Types" is the title of Chapter 1. Most of this summary was created using this chapter as the main source. An alternative resource is  "Types of data" by Derek Richards (2007). 

In summary, types of data can be rapidly classified in a nested list:

  • Qualitative data: described by a characteristic
    • Categorical (i.e. described as a category)
      • Nominal data: an unordered list of categories
  • Quantitative data: described by a numerical scale
    • Numerical (i.e. described as a number)
      • Ordinal (in a scale, or ordered by magnitude)
      • Discrete (where the data plugs into a limited range of values)
      • Continuous (where there is an infinite range of possible values)
        • Interval (no true zero value)
        • Ratio (there is a true zero value)

Qualitative vs. quantitative data

Qualitative data: defined by some characteristic. An example might be blood group or gender.

Quantitative data: measured on some numerical scale. An example might be heart rate or blood pressure.

Categorical vs numerical variables

Categorical variable: a variable which can take only one value from a limited range of values. For example, blood group and gender are forms of categorical data. The values belong to some sort of category, on the basis of a qualitative property. Essentially, "categorical" is a synonym for "qualitative".

Numerical variable: when the variable takes some numerical value. An example might be heart rate or blood pressure.

Nominal vs ordinal data

Nominal data: the range of values is not ordered in any sense, but simply named (hence the nom). Blood group and gender are again the examples. This is a form of categorical data.

Ordinal data: the range of values is ordered along a scale, e.g. disease staging (advanced, moderate, mild) or degree of pain (severe, moderate, mild, none). 
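The difference between nominal and ordinal data can be sketched in a few lines of Python. This is only an illustration: the names `BLOOD_GROUPS`, `Pain`, `worst_pain` and `typical_pain` are hypothetical, and the pain scores are made up. The point is that ordinal categories can be ranked and compared, whereas nominal categories can only be told apart.

```python
from enum import IntEnum
from statistics import median_low

# Nominal: names only, so an unordered set is a natural container.
BLOOD_GROUPS = {"A", "B", "AB", "O"}

# Ordinal: categories with a rank. The integers encode order only;
# the size of the gaps between them carries no meaning.
class Pain(IntEnum):
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3

def worst_pain(scores):
    """Return the highest-ranked pain score in a list."""
    return max(scores)

def typical_pain(scores):
    """Return the median category (the lower middle one for even-sized lists)."""
    return median_low(scores)

scores = [Pain.MILD, Pain.SEVERE, Pain.NONE]  # made-up observations
print(worst_pain(scores).name)    # SEVERE
print(typical_pain(scores).name)  # MILD
```

Asking for the "worst" or "median" blood group would make no sense, which is why no such functions are offered for the nominal set.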

Discrete vs. continuous data

Discrete data: when the variable is restricted to specific defined values. For example, "male" and "female" are categorical discrete data values. Mortality (e.g. 20 patients dead at 6 months) is an example of numerical discrete data: there can be no 20.5 dead patients.

Continuous data: when the variable is unrestricted and can have any value from a potentially infinite range, e.g. "blue" and "red" might be the ends of a categorical range, but the true value can be any subtle shade of purple. An example of numerical continuous data is weight: one does not have to be exactly 65 or 70 kg; one may easily be 67.5567 kg.
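As a rough sketch of the same idea in Python, a head count is naturally an integer, whereas a weight is naturally a floating-point number. The helper `is_valid_count` and the variable names are hypothetical, chosen only for this illustration.

```python
def is_valid_count(value):
    """True for values that could be a head count: whole and non-negative."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0

deaths_at_6_months = 20   # discrete: only whole patients can die
weight_kg = 67.5567       # continuous: any value in the range is possible

print(is_valid_count(deaths_at_6_months))  # True
print(is_valid_count(20.5))                # False
```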

Scales of Measurement

Nominal scale: only an identity; values assigned to variables are merely descriptive. An example is gender.

Ordinal scale: values have both an identity and a magnitude. A familiar ordinal scale is an exam ranking of first, second or third. You know the rank, but not the size of the gap between ranks: you don't know by how much you missed first place.

Interval scale: values have identity, magnitude, and equal intervals, but no true zero. An example is temperature in degrees Celsius: every degree is the same interval, but 0°C is an arbitrary point, so 20°C is not "twice as hot" as 10°C.

Ratio scale: values have identity, magnitude, equal intervals, and a true zero, which makes ratios meaningful. An example of this is weight: 80 kg is twice 40 kg.
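The practical difference between interval and ratio scales can be checked with simple arithmetic. The sketch below, with made-up temperatures and a hypothetical helper `celsius_to_kelvin`, shows that differences survive a change of scale but ratios do not, unless the scale has a true zero (as Kelvin does).

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin, which has a true zero."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # made-up readings

# Differences mean the same thing on both scales.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful where zero means "none of the quantity".
naive_ratio = warm_c / cool_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(round(difference_c, 6), round(difference_k, 6))  # 10.0 10.0
print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.035
```

The Celsius "ratio" of 2.0 is an artefact of where the zero happens to sit; on the Kelvin scale the warmer reading is only about 3.5% higher.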

Importance of data types

Why do we need to know this? Well: data types determine the sort of statistical tests which are applicable.

  • If your measurement scale is nominal or ordinal, you generally use non-parametric statistics.
  • If you are using interval or ratio scales, you can use parametric statistics, provided their assumptions (such as a normal distribution) are met.
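This rule of thumb is simple enough to write down as a lookup. The function below is a minimal sketch, not a test selection algorithm: the name `statistics_family` and the set names are hypothetical, and it deliberately ignores everything else that matters in practice (sample size, distribution, study design).

```python
NON_PARAMETRIC_SCALES = {"nominal", "ordinal"}
PARAMETRIC_SCALES = {"interval", "ratio"}

def statistics_family(scale):
    """Map a measurement scale to the broad family of tests usually applied."""
    key = scale.strip().lower()
    if key in NON_PARAMETRIC_SCALES:
        return "non-parametric"
    if key in PARAMETRIC_SCALES:
        return "parametric"
    raise ValueError(f"unknown measurement scale: {scale!r}")

print(statistics_family("ordinal"))  # non-parametric
print(statistics_family("Ratio"))    # parametric
```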

 

References

Calkins, Keith G. Lecture on types of data.

Richards, Derek. "Types of data." Evidence-based dentistry 8.2 (2007): 57-58.