In qualitative analysis it’s sometimes difficult to agree even with yourself. With complex data sets and ‘wicked’ issues, there are times when a researcher coding qualitative data will not consistently code different sources to the same themes or codes in the same way. Often there are many themes, rich and numerous sources, and difficult decisions to be made about where sections of text fit. So to promote consistency, researchers often take a cyclical approach to coding.

However, some researchers aim for better accuracy and consistency by having multiple people code the data, checking that they are making the same interpretations. Some would argue that this mitigates the subjectivity of a single coder/interpreter, producing a more valid and rigorous analysis (a very positivist interpretation). But multiple coders can also check each other’s work, and use differences to spark a discussion about the best way to interpret complex qualitative data. This is essentially a triangulation process between different researchers’ interpretations and coding of qualitative data. Often, researchers will be asked to quantify the level of agreement between the different coders, looking at how often the coding agrees: usually, whether two (or more) coders have applied the same code to the same section of the data.

There are a couple of different ways to calculate and measure agreement, the simplest being percentages, ratios or the joint probability of agreement. Cohen’s Kappa and Fleiss’ Kappa are two statistical tests often used in qualitative research to demonstrate a level of agreement. The basic difference is that Cohen’s Kappa is used between two coders, while Fleiss’ Kappa can be used between more than two. However, they use different methods to calculate ratios (and account for chance), so they should not be directly compared. All of these are methods of calculating what is called ‘inter-rater reliability’ (IRR or RR) – how much raters agree about something.

These tests are very common in psychology, where they are used when multiple people give binary diagnostics (positive/negative diagnoses) or deliver standardised tests – both situations that are probably better suited to measures of Kappa than qualitative coding. But before qualitative researchers use any method of inter-rater reliability, they should understand what these tests are and how they work – the sketch below walks through the arithmetic.
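To make the mechanics concrete, here is a minimal Python sketch of the three measures mentioned above: simple percent agreement, Cohen’s Kappa for two coders, and Fleiss’ Kappa for three or more. The codes (“hope”, “fear”, “cost”) and the coders’ decisions are invented purely for illustration; real figures would come from however your own project assigns codes to segments of text.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Proportion of segments where two coders applied the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders: observed agreement corrected for chance."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    # Chance agreement: for each code, the product of the two coders'
    # marginal probabilities of using it, summed over all codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                     for c in set(coder_a) | set(coder_b))
    return (p_observed - p_expected) / (1 - p_expected)

def fleiss_kappa(ratings):
    """Fleiss' Kappa. `ratings` holds one row per segment, each row listing
    the code chosen by every coder (works for three or more coders)."""
    n_items, n_raters = len(ratings), len(ratings[0])
    # Per-segment agreement: fraction of coder pairs that chose the same code.
    p_per_item = []
    for row in ratings:
        pairs = sum(c * (c - 1) for c in Counter(row).values())
        p_per_item.append(pairs / (n_raters * (n_raters - 1)))
    p_observed = sum(p_per_item) / n_items
    # Chance agreement from the overall proportion of each code.
    totals = Counter(code for row in ratings for code in row)
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_observed - p_expected) / (1 - p_expected)

# Invented example: the code two coders applied to ten text segments.
coder_a = ["hope", "fear", "hope", "cost", "cost", "hope", "fear", "hope", "cost", "hope"]
coder_b = ["hope", "fear", "cost", "cost", "cost", "hope", "hope", "hope", "cost", "hope"]
print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")  # 0.80
print(f"Cohen's Kappa:     {cohens_kappa(coder_a, coder_b):.2f}")       # 0.67

# A third coder turns it into a Fleiss' Kappa problem.
coder_c = ["hope", "fear", "hope", "cost", "hope", "hope", "fear", "hope", "cost", "hope"]
print(f"Fleiss' Kappa:     {fleiss_kappa(list(zip(coder_a, coder_b, coder_c))):.2f}")  # 0.67
```

Notice how the two coders above agree on 80% of segments, yet the Kappa is noticeably lower once chance is subtracted. These hand-rolled functions are only there to show the arithmetic; in practice you would use an established implementation (for example scikit-learn’s `cohen_kappa_score`, or the inter-rater routines in statsmodels).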
The first question should be: what are you looking to test? Is it that multiple researchers created the same codes in a grounded theory approach? How often codes were used? How many of the highlights/quotes were coded in the same way? Each of these situations can be tested for IRR, but the last is the most common.

Also, note that when we are looking for agreement, there is an important difference between precision and accuracy:

“If we actually hit the bull’s-eye, we are accurate. If all our shots land together, we have good precision (good reliability). If all our shots land together and we hit the bull’s-eye, we are accurate as well as precise. It is possible, however, to hit the bull’s-eye purely by chance.” (Viera et al.)

So be aware that just because two coders interpreted text the same way, it doesn’t mean they were both correct: they were reliable, but possibly inaccurate. It’s also important to realise that while the two Kappa measurements have some control for chance and probability, they will be affected by sample size – the short simulation at the end of this post shows how strongly. And as always, the type and amount of data entered into the test will affect the outcome and its significance. Some qualitative software tools will give agreement figures (% or Kappa) to two decimal places, and unless you have a very large project, this level of precision can be meaningless and misleading.

One 2019 paper notes that “RR can be confusing because it merges a quantitative method, which has roots in positivism and objective discovery, with qualitative methods that favor an interpretivist view of knowledge”. I personally feel that it is methodologically incorrect to try and quantify qualitative data or the process behind it. Unless there is a very specific, tested and validated reason to do so, qualitative methods and data should not be analysed with quantitative tools.

So why are people so desperate for these quantitative metrics? The most common queries on this topic come from students or academics who have been told to demonstrate that their coding is robust by showing a Kappa figure of 0.8 or higher. These requests often come from supervisors, examiners or journal reviewers who have a quantitative background, and little understanding of qualitative methods and analysis. A Kappa value has become a shortcut for demonstrating the quality of research, and that the data was interpreted without bias.
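To illustrate the sample-size point made above, here is a small simulation sketch (invented for this post, not drawn from any study): two coders assign codes completely at random, so their true shared interpretation is zero, and we record the range of Cohen’s Kappa values that chance alone produces at different project sizes.

```python
import random
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa for two coders (same arithmetic as the earlier sketch)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

random.seed(1)
codes = ["hope", "fear", "cost"]
for n_segments in (10, 50, 500):
    # Two coders picking codes uniformly at random: any Kappa away
    # from zero here is produced by chance alone.
    kappas = []
    for _ in range(2000):
        a = [random.choice(codes) for _ in range(n_segments)]
        b = [random.choice(codes) for _ in range(n_segments)]
        kappas.append(cohens_kappa(a, b))
    print(f"{n_segments:>4} segments: chance alone gives Kappa between "
          f"{min(kappas):+.2f} and {max(kappas):+.2f}")
```

On a handful of segments, purely random coding can occasionally look moderately ‘reliable’, and the chance-driven spread only tightens once hundreds of segments are coded. A bare Kappa figure quoted without the sample size behind it therefore says very little.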