What test reliability is
Reliability is the degree to which a test measures consistently: if it were repeated under similar conditions, it would give a similar result, not one driven by chance. It’s one of the two key properties of any serious assessment, along with validity. It answers a simple question: is this result stable, or would it change if the person took the test again?
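One common way to put a number on "would it change if the person took the test again?" is a test-retest correlation: the same people take the test twice and the two sets of scores are correlated. The sketch below is only an illustration of that idea. The function name `pearson_r` and the scores are hypothetical, invented for this example, and nothing here describes how any particular test is actually evaluated.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(x), mean(y)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five people on two sittings (illustrative, not real data).
first_sitting = [12, 18, 25, 31, 40]
second_sitting = [14, 17, 27, 30, 41]

retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {retest_r:.2f}")
```

With these made-up scores the correlation comes out at about 0.99: people keep almost the same rank order across the two sittings, which is what a stable result looks like. A value near 0 would mean the second sitting tells you little about the first.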
Reliability: one concept, one name
Reliability is about consistency over repeated measurement. It’s worth keeping it distinct from accuracy or correctness: a result can be highly stable, giving the same outcome again and again, without that stability telling you whether the test measures the right thing. Stability is one property; relevance is another.
Why it matters: the margin of error
No psychological test measures without error. Tiredness, the day’s focus, a misread question or simple chance introduce variation. Reliability estimates how much of a result is stable signal and how much is noise. The more reliable a test is, the more confidence we can have that the score reflects something real and not just the moment.
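In classical test theory, the link between reliability and margin of error is usually expressed as the standard error of measurement: the score spread (standard deviation) multiplied by the square root of one minus the reliability. The sketch below shows that arithmetic. The function names are hypothetical, and the numbers (a spread of 15 points, a reliability of 0.91, an observed score of 100) are assumptions chosen for illustration, not figures from any real test.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate range around an observed score (z=1.96 gives roughly 95%)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin


# Assumed, illustrative values: not taken from any real test.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = score_band(observed=100, sd=15, reliability=0.91)
print(f"SEM: {sem:.1f} points, likely range: {low:.0f} to {high:.0f}")
```

With these assumed values the error is about 4.5 points, so an observed 100 is better read as roughly 91 to 109. Raise the reliability and the range narrows; lower it and the same score says less.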
The relationship with validity
Reliability and validity are not the same thing, and it’s best not to confuse them:
| | Reliability | Validity |
|---|---|---|
| Question | Does it measure consistently? | Does it measure the right thing? |
| It is | Consistency | Relevance |
| Relationship | Necessary for validity | Requires reliability, but also something more |
A test can be consistent and still invalid, but it cannot be valid if it isn’t consistent. That’s why reliability is the first requirement, not the last. It is complemented by validity; see what test validity is.
See how we build tests designed to deliver stable signal.
See the science behind it
In short
Reliability is the consistency with which a test measures: if repeated, it would give a similar result. It estimates how much of a score is stable signal and how much is noise, which is why results are best read as ranges, not as exact figures. It’s distinct from validity (consistency versus relevance), and it’s a prerequisite for it. In Kokoro, the tests in the library are designed to deliver stable, comparable signal; you can see the approach in the science behind it.