The Science

The science behind our personality questionnaires.


Brief History of Personality Theory and Psychometric Testing

The origins of type theory, factor analysis and psychometric testing

Attempts to describe human personality systematically date back thousands of years, with some of the earliest philosophers outlining their theories of character and temperament. However, since the advent of psychology (and particularly psychometrics) as an empirical science, our understanding of personality has grown considerably, as has our ability to measure it reliably and accurately. Although some theories of personality are more strongly supported by the research than others, “the science” behind human personality is highly developed, and academic consensus broadly supports the measurement of personality.

Perhaps the most famous of humanity's early attempts to systematise human personality was the work of Hippocrates, whose humoral theory underpinned the proto-psychological theory of the four temperaments. This theory held that there were four temperaments, each linked to one of four bodily fluids known as “humors”: blood, yellow bile, black bile, and phlegm. Mood, temperament, and emotion were believed to result from imbalances of these fluids within the body. For example, people with an excess of blood would be described as “Sanguine”, making them extraverted and energetic, whereas an excess of black bile would make someone “Melancholic”, resulting in an introverted or neurotic disposition. Although these theories have long been discredited in favor of more empirical models of personality, they provided an early framework on which later, evidence-based models could build.


Following the work of Francis Galton, the science of psychometrics began to formalise, allowing psychologists to measure psychological constructs using the questionnaire method. The method was further extended by early researchers including Pearson, Thurstone, and Thorndike, establishing psychometrics as a robust quantitative field. In particular, factor analysis, pioneered by Charles Spearman on the basis of Pearson's correlational methods and later extended by Thurstone, has proven a major boon to personality researchers, and has become a core component of psychometric R&D in both academic and commercial settings. This work provided a framework by which human personality could be measured scientifically, enabling subsequent researchers to develop empirically supported personality questionnaires and accompanying theoretical models.

Jungian Type-based Models of Personality

A popular type-based model with 16 types, based on the work of Carl Jung

Type-based models of personality posit that people belong to distinct categories that qualitatively describe their personality, rather than scoring quantitatively on a continuous scale. For example, a person could be described as being an “Introvert” or an “Extravert”, rather than merely scoring 8/12 on a measure of extraversion.

This represents a major conceptual distinction between type-based and trait-based (continuous-scale) models of personality. Type-based models are perhaps best represented by Jungian models of personality, as used by Myers-Briggs Type Indicator (MBTI®) assessments. These assessments posit that each person has a specific personality type, which is itself a combination of four distinct dichotomies: extraversion vs introversion, sensing vs intuition, thinking vs feeling, and judgment vs perception, yielding 16 possible types (2 × 2 × 2 × 2).

Type-based theorists cite a number of practical advantages of this approach over a continuous trait-based approach. Chief among these is that trait-based assessments may implicitly overstate the precision of scores, and thus of a person's position on the continuum. Even when scales meet the minimum requirements for reliability (Taber, 2018), classical test theory treats every observed score as a person's “true score” plus measurement error (Crocker & Algina, 1986), so the true score is never observed directly and could differ noticeably from the observed score. Because of this, type-based theorists argue that assigning numbers to quantify a person's level of, say, extraversion is somewhat misleading, and that classifications would be more appropriate. This is especially true when presenting information to a lay audience, who are often simply interested in knowing whether they are an introvert or an extravert, not in their z-score on an extraversion scale.
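In classical test theory terms, the argument can be written down directly. The relations below are the standard ones; the worked figures are illustrative assumptions (an observed-score standard deviation of 10, as on a T-score scale, and two example reliabilities) rather than values from any particular questionnaire.

```latex
\begin{aligned}
X &= T + E \\
\mathrm{SEM} &= s_X \sqrt{1 - r_{XX'}} \\
r_{XX'} = 0.70:\quad \mathrm{SEM} &= 10\sqrt{0.30} \approx 5.48, \quad X \pm 1.96\,\mathrm{SEM} \approx X \pm 10.7 \\
r_{XX'} = 0.95:\quad \mathrm{SEM} &= 10\sqrt{0.05} \approx 2.24, \quad X \pm 1.96\,\mathrm{SEM} \approx X \pm 4.4
\end{aligned}
```

Here X is the observed score, T the true score, E the measurement error, s_X the standard deviation of observed scores and r_XX' the scale's reliability. Even a reasonably reliable scale leaves a band of several points around each observed score, which is the imprecision type theorists point to; the band narrows as reliability rises, which is the basis of the trait theorists' reply.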


In response, trait-based theorists point out that at high levels of reliability the standard error of measurement can be very low, so scores can be quite precise (Crocker & Algina, 1986). They also note that most people obtain fairly average scores, in line with the normal distribution (Cowles, 2005). As a result, dichotomising extraversion into either “Extravert” or “Introvert” can itself be misleading, because most people sit near the middle of the scale. This is a particularly salient criticism of binary type models: statistically speaking, few people are true introverts or extraverts, and most could instead be described as ambiverts (Georgiev, Christov & Philipova, 2014). Lastly, the lack of continuous scores makes some practical applications difficult. For example, type-based assessments cannot be used in employee selection, because candidates cannot be rank-ordered by score and so cannot be distinguished from one another within a personality type.
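The tension between the two positions can be illustrated with a short calculation. The sketch below assumes a T-score metric (standard deviation 10) with a hypothetical median split at 50 that turns extraversion scores into “Introvert” or “Extravert”; the function names and the cut-off are illustrative, not taken from any real assessment. It flags a type label as uncertain when the roughly 95% band (±1.96 SEM) around the observed score straddles the cut, and estimates what share of a normally distributed population, centred on the cut, falls into that zone.

```python
from statistics import NormalDist

# Hypothetical setup: extraversion reported on a T-score metric (SD 10),
# with a median split at 50 used to turn the score into a "type".
SD = 10.0
CUT = 50.0


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement in classical test theory: sd * sqrt(1 - r)."""
    return sd * (1.0 - reliability) ** 0.5


def classify(score: float, cut: float = CUT) -> str:
    """Dichotomise a continuous extraversion score into a type label."""
    return "Extravert" if score >= cut else "Introvert"


def type_is_uncertain(score: float, reliability: float,
                      sd: float = SD, cut: float = CUT, z: float = 1.96) -> bool:
    """True when the ~95% band (score +/- z * SEM) straddles the cut-off."""
    return abs(score - cut) < z * sem(sd, reliability)


def share_uncertain(reliability: float, z: float = 1.96) -> float:
    """Share of a normal population centred on the cut whose label is uncertain.

    Observed scores are treated as normal with mean equal to the cut, so the
    uncertain zone is |score - cut| < z * SEM, i.e. |Z| < z * sqrt(1 - r).
    """
    half_width = z * (1.0 - reliability) ** 0.5  # in standard-deviation units
    return 2 * NormalDist().cdf(half_width) - 1


if __name__ == "__main__":
    for score in (48, 52, 70):
        print(score, classify(score), type_is_uncertain(score, reliability=0.80))
    for r in (0.70, 0.80, 0.90, 0.95):
        print(f"reliability {r:.2f}: {share_uncertain(r):.0%} of labels uncertain")
```

Under these assumptions, scores of 48 and 52 receive opposite labels even though both are flagged as uncertain at a reliability of .80, and the uncertain share shrinks as reliability rises but remains substantial, because so many people score near the middle.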

Figure 1: Concept of dichotomous scales (type-based model of personality)


Although academics remain divided over the Myers-Briggs model on theoretical grounds, the evidence suggests that the MBTI® assessment broadly measures the same psychological constructs as Big Five measures (Furnham, Moutafi & Crump, 2003). Research also supports the criterion-related validity of MBTI® assessments, which predict certain real-world outcomes such as promotion prospects (Barrett, 1991; Furnham & Crump, 2015). Other research shows that the MBTI® meets psychometric standards for reliability, with acceptable internal consistency and test-retest reliability (Schaubhut, Herk & Thompson, 2009). Therefore, theoretical and philosophical differences aside, the MBTI® assessment in particular shows favorable psychometric properties in terms of both validity and reliability, providing empirical support for the model.

However, these standards of validity and reliability are more applicable to trait-based models, especially those that employed factor-analytic approaches during their initial R&D. For type-based models, alternative forms of validity and reliability may fit their philosophical underpinnings better and could be more salient. One example is experiential validity, which asks “does the person taking the assessment experience the whole process (including feedback) as personally valuable?” (Moyle & Hackston, 2018). This definition of validity is more in line with the model's original Jungian and psychodynamic underpinnings than the statistical definitions of validity used by trait-based theorists.

Regardless of the controversy and differing opinions, the Myers-Briggs model and associated assessments are among the most commonly used, and rank among the most popular personality assessments on the market.

Enneagram of Personality

A type-based model with nine types, used in personal development

The Enneagram of personality represents an alternative approach, differing significantly from both the Big Five and the MBTI®. Rather than merely providing a descriptive account of personality, the Enneagram was designed with therapeutic counselling and personal development in mind. The model draws heavily on Islamic, Judeo-Christian, and Greek philosophy, and posits nine distinct personality types, which can be arranged graphically on a nine-pointed figure called an enneagram (Wagner & Walker, 1983).

Each personality type (or enneatype) within the enneagram has a numerical designation, e.g. Type 1, Type 2, Type 3, and is also commonly given a qualitative title, e.g. the Reformer, the Helper. Each of the nine enneatypes displays a distinct behavioral style that distinguishes it from the other enneatypes in the model. In addition to their primary type, many people are said to have a secondary type, known as a “wing”, so a person could be described as a Type Two with a Three wing (2w3). A wing is always one of the two enneatypes adjacent to the primary type on the enneagram figure, so a Type Two can only have a One wing or a Three wing.
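The adjacency rule for wings amounts to modular arithmetic on the nine points of the figure, which wrap around so that Type 9 sits between Types 8 and 1. A minimal sketch; the function names are hypothetical, and it encodes only the adjacency rule described above, not any method for determining which wing a person actually has.

```python
def wings(enneatype: int) -> tuple[int, int]:
    """Return the two enneatypes adjacent to `enneatype` on the figure.

    The nine points wrap around the circle, so Type 9 neighbours Types 8 and 1.
    """
    if not 1 <= enneatype <= 9:
        raise ValueError("enneatypes are numbered 1 to 9")
    lower = (enneatype - 2) % 9 + 1  # previous point, wrapping 1 -> 9
    upper = enneatype % 9 + 1        # next point, wrapping 9 -> 1
    return lower, upper


def wing_label(enneatype: int, wing: int) -> str:
    """Format a label such as '2w3', rejecting wings that are not adjacent."""
    if wing not in wings(enneatype):
        raise ValueError(f"Type {enneatype} cannot have a {wing} wing")
    return f"{enneatype}w{wing}"


if __name__ == "__main__":
    for t in range(1, 10):
        print(t, wings(t))
    print(wing_label(2, 3))  # the 2w3 example from the text
```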

Figure 2: Enneagram of personality model




What distinguishes the Enneagram of personality from other models is its prescriptive nature: practical recommendations for personal development are built into the model. Each personality type has levels of health, which indicate whether a person of a given enneatype is healthy, average, or unhealthy. Personality configurations can predispose people to certain development needs, personal struggles, or even mental health issues (Lamers, Westerhof, Kovács & Bohlmeijer, 2012), and the enneagram tries to account for this. Identifying a person's enneatype is therefore intended to generate a standard set of recommendations that would otherwise require considerable coaching or counselling to identify.

Although little empirical research has been conducted on the Enneagram of personality so far, there is some evidence to support its validity and reliability. The Enneagram Personality Inventory (EPI) has shown sufficient test-retest reliability and internal consistency, as well as stability among enneatypes (Wagner, 1981). The EPI shows concurrent validity with the MBTI®, suggesting that the two measure similar underlying personality constructs and providing some evidence of construct validity (Wagner & Walker, 1983). In addition, a twin study found that enneatypes are more commonly shared by monozygotic (identical) twins than by dizygotic (non-identical) twins, suggesting that enneatypes have a partly genetic component (Brooks, 1998).

However, not all empirical research has been favorable, and the overall scarcity of research on the enneagram makes broad generalisations about its validity difficult. A systematic review of the enneagram literature found that factor-analytic studies reveal fewer than nine broad factors, and that no work has used clustering techniques to identify enneatypes (Hook, Hall, Davis, Van Tongeren & Conner, 2021). That said, this may simply indicate that enneatypes are not each underpinned by a single factor, and that existing questionnaires did not employ factor analysis during their initial construction. More research is required to appraise the Enneagram of personality properly on theoretical grounds, as current research focuses mainly on the psychometric properties of its associated questionnaires.

Overall, the evidence is insufficient to fully appraise the Enneagram of personality, but the available research does offer some support for its use in personal development settings. However, as with the MBTI® and other type-based models, the research does not support its use in high-stakes decision-making; indeed, proponents of these models do not themselves recommend such use.

Future direction

Further research and technological advancements

Until recently, the questionnaire method was the dominant way of assessing individual differences in personality, and all major models of personality relied on it. However, as technology advances, alternative approaches are being developed, and the future holds many promising innovations in psychometric testing. Equally, because character and temperament are highly personal, many of these innovations are likely to attract controversy, as they raise important ethical questions around privacy and individuality.

Social media has revolutionised how human beings communicate, interact, market products, and manage their careers. Through social media, people inevitably consume and produce content that is highly specific to them, reflecting their interests, values, objectives, and goals. As a result, a person's social media profiles and activity are, at least partially, a reflection of their personality. Researchers, including a group at the University of Cambridge, have developed ways to estimate Big Five personality profiles from people's social media data without the need to complete questionnaires, and this work has since been meta-analysed (Marengo & Montag, 2020). Although this saves individuals considerable time, it raises important questions around privacy and consent: in theory, corporations could use the technology to make high-stakes decisions about people based on their personality profile, without their knowledge or consent.

Another recent innovation in personality measurement is gamification: using games to measure behavioral constructs. This comes in two forms:

  • Games which were explicitly designed to measure personality, and
  • Games which were designed for another purpose (leisure, training etc.), but are able to measure personality indirectly.

The former offers a more engaging and enjoyable way to measure personality and raises no new ethical or moral dilemmas, but the latter has potential for abuse. Game developers could, theoretically, capture key elements of a person's temperament without their knowledge or consent. Agreeing to play a game for personal enjoyment does not imply consent to be psychologically assessed, or to have that data used in high-stakes decision-making.

Lastly, perhaps the most controversial personality measurement innovation would be genetic testing. Since the completion of the Human Genome Project, genetic analysis has become significantly cheaper and more effective (Niedringhaus, Milanova, Kerby, Snyder & Barron, 2011). This could eventually allow psychometricians to identify genes associated with key behavioral traits, enabling genetics-based personality measurement. In principle, individuals could then have their personality assessed objectively, without the need for subjective self-report assessments. Although an exciting prospect, it has the greatest potential to become draconian: if misused, it could result in a genetic caste system, whereby preferential treatment is given to those with “good genes” and discrimination to those with “bad genes”.

Time will tell whether any or all of these innovations in psychometric testing take root, and it is the responsibility of society as a whole to decide whether or not they should be considered ethical, moral, or even legal.

Conclusion and Practical Implications

Considerations on the use of psychometric testing

Overall, the science of psychometric testing has a long lineage dating back well over a hundred years, and represents one of the most empirically supported fields within psychology.

As with any field, certain theoretical models are better supported than others, and the individual tools used to measure those underlying constructs will vary in quality. Certain models of personality have received near-universal acclaim among academics, and are accompanied by highly valid, reliable, and fair assessments. Other models of personality are more questionable, requiring significantly more research to properly support their claims regarding human personality. Although personality theory and the questionnaire method themselves are extremely well supported by the academic literature, specific assessments must be judged on their own merits, and the existence of poorly designed assessments does not discredit the questionnaire method, or personality psychology as a whole.

Therefore, when deciding whether to use personality assessments for recruitment, selection, personal development, or purely for individual interest, always consider the quality, and thus the likely utility, of the assessment in question, to ensure that the information gained is valid, useful, and free from unfair bias. However, you can rest assured that personality testing itself, using the questionnaire method, is empirically sound and well supported by the science.


Academic references

Barrett, L. A. (1991). Relationship of Observable Teaching Effectiveness Behaviors to MBTI Personality Types.

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: a meta-analysis. Personnel Psychology, 44(1), 1-26.

Brooks, D. (1998). Are personality traits inherited? South African Journal of Science, 94(1), 9-11.

Cowles, M. (2005). Statistics in psychology: An historical perspective. Psychology Press.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston.

Georgiev, S. Y., Christov, C. V., & Philipova, D. T. (2014). Ambiversion as independent personality characteristic. Act. Nerv. Super. Rediviva, 56(3-4), 65-72.

Gutiérrez, J. L. G., Jiménez, B. M., Hernández, E. G., & Pcn, C. (2005). Personality and subjective well-being: Big Five correlates and demographic variables. Personality and Individual Differences, 38(7), 1561-1569.

Hook, J. N., Hall, T. W., Davis, D. E., Van Tongeren, D. R., & Conner, M. (2021). The Enneagram: A systematic review of the literature and directions for future research. Journal of Clinical Psychology, 77(4), 865-883.

Marengo, D., & Montag, C. (2020). Digital phenotyping of Big Five personality via Facebook data mining: a meta-analysis. Digital Psychology, 1(1), 52-64.

Marston, W. M. (1928). Emotions of normal people. Routledge.

Moyle, P., & Hackston, J. (2018). Personality assessment for employee development: Ivory tower or real world? Journal of Personality Assessment, 100(5), 507-517.

Niedringhaus, T. P., Milanova, D., Kerby, M. B., Snyder, M. P., & Barron, A. E. (2011). Landscape of next-generation sequencing technologies. Analytical Chemistry, 83(12), 4327-4341.

O’Connor, M. C., & Paunonen, S. V. (2007). Big Five personality predictors of post-secondary academic performance. Personality and Individual Differences, 43(5), 971-990.

Ones, D. S., & Anderson, N. (2002). Gender and ethnic group differences on personality scales in selection: Some British data. Journal of Occupational and Organizational Psychology, 75(3), 255-276.

Rothmann, S., & Coetzer, E. P. (2003). The Big Five personality dimensions and job performance. SA Journal of Industrial Psychology, 29(1), 68-74.

Schaubhut, N. A., Herk, N. A., & Thompson, R. C. (2009). MBTI® Form M manual supplement. Retrieved May 15, 2010.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262-274.

Taber, K. S. (2018). The use of Cronbach's alpha when developing and reporting research instruments in science education. Research in Science Education, 48(6), 1273-1296.

Wagner, J. P. (1981). A descriptive, reliability, and validity study of the enneagram personality typology. (Doctoral dissertation, Loyola University of Chicago, 1981). Dissertation Abstracts International, 41, 4664.

Wagner, J. P., & Walker, R. E. (1983). Reliability and validity study of a Sufi personality typology: The enneagram. Journal of Clinical Psychology, 39(5), 712-717.



Ronit Vishwanathan, MSc

Birkbeck, University of London, Occupational Psychology

Ronit Vishwanathan holds an MSc in Psychology from Birkbeck. Ronit has written extensively on personality type and scale theory and has helped develop our personality questionnaires according to best practice and international standards.