What makes a measure valid
Then a score is computed for each set of items, and the relationship between the two sets of scores is examined (see Figure 5). There are many ways to split a set of 10 items into two sets of five. Many behavioural measures involve significant judgment on the part of an observer or a rater. Inter-rater reliability is the extent to which different observers are consistent in their judgments.
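The split-half procedure described above can be sketched in Python. The item responses below are hypothetical, and the final Spearman-Brown step (a standard companion to split-half correlation, not discussed in the text) estimates full-length reliability from the half-test correlation:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: each row is one respondent's ratings on 10 items.
responses = [
    [4, 5, 3, 4, 5, 4, 4, 5, 3, 4],
    [2, 1, 2, 3, 1, 2, 2, 1, 3, 2],
    [5, 5, 4, 5, 5, 5, 4, 5, 4, 5],
    [3, 2, 3, 3, 2, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 2],
]

# Split items into odd- and even-numbered halves and score each half.
odd_scores = [sum(r[0::2]) for r in responses]
even_scores = [sum(r[1::2]) for r in responses]

r_half = pearson_r(odd_scores, even_scores)
# Spearman-Brown correction: estimate the reliability of the full-length test
# from the correlation between its two halves.
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```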

Validity is the extent to which the scores from a measure represent the variable they are intended to represent. But how do researchers make this judgment? We have already considered one factor they take into account: reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that its scores represent what they are supposed to.

There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. Imagine, for example, trying to assess people's self-esteem by measuring the length of their fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. Here we consider three basic kinds: face validity, content validity, and criterion validity. Face validity is the extent to which a measurement method appears, on its surface, to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities.

So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity. Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of several hundred statements applies to them, even though many of the statements have no obvious relationship to the construct they measure.

Content validity is the extent to which a measure covers the full range of the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his or her measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that he or she thinks positive thoughts about exercising, feels good about exercising, and actually exercises.

Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct. Criterion validity, in contrast, is the extent to which people's scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people's test anxiety scores should be negatively correlated with their performance on an important exam. If it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them.

Barbara Ferrell, PhD

Validity

Validity is the extent to which an instrument measures what it claims to measure.

How do I determine if my measurements are reliable and valid? List three things that might have introduced random error into Ms. Jones' blood pressure reading. Some possibilities are: the person taking the reading, the time of day, and an instrument that might not be reliable. What might you do to attempt to help establish the reliability of Ms. Jones' blood pressure measurement? Take her blood pressure again. Are there any non-random sources of error possible in your assessment of Ms. Jones' blood pressure?

Example: Inter-rater reliability might be employed when different judges are evaluating the degree to which art portfolios meet certain standards. Inter-rater reliability is especially useful when judgments can be considered relatively subjective. Thus, the use of this type of reliability would probably be more likely when evaluating artwork as opposed to math problems.
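One way to quantify inter-rater agreement for judgments like the art-portfolio example can be sketched as follows. The ratings are invented, and Cohen's kappa (a standard chance-corrected agreement statistic, not discussed in the text) is only one of several possible indices:

```python
from collections import Counter

# Hypothetical ratings: two judges classify 10 art portfolios as
# "meets" or "fails" the standard.
judge_a = ["meets", "meets", "fails", "meets", "fails",
           "meets", "fails", "meets", "meets", "fails"]
judge_b = ["meets", "meets", "fails", "fails", "fails",
           "meets", "fails", "meets", "meets", "meets"]

n = len(judge_a)
# Observed agreement: proportion of portfolios the judges rate identically.
p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n

# Chance agreement: probability both judges pick the same category at random,
# given each judge's own category frequencies.
freq_a, freq_b = Counter(judge_a), Counter(judge_b)
p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)

# Cohen's kappa corrects the raw agreement rate for chance agreement.
kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 2), round(kappa, 2))
```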

Validity refers to how well a test measures what it is purported to measure. Why is it necessary? While reliability is necessary, it alone is not sufficient. For a test to be valid, it also needs to be reliable, but a reliable test is not necessarily valid. For example, suppose your scale is off by 5 lbs: it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight.
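The biased-scale example can be simulated directly. The true weight, the bias, and the day-to-day jitter values below are all made up for illustration:

```python
import statistics

true_weight = 150.0  # the person's actual weight in lbs (assumed)
bias = 5.0           # systematic error: the scale always reads 5 lbs high

# Daily readings: a constant 5 lb bias plus tiny day-to-day jitter.
readings = [true_weight + bias + jitter
            for jitter in (0.1, -0.2, 0.0, 0.2, -0.1)]

# Reliability: the readings barely vary from day to day.
spread = statistics.stdev(readings)
# Validity: the average reading misses the true weight by the bias.
error = statistics.mean(readings) - true_weight

print(round(spread, 2), round(error, 2))
```

The small spread shows the scale is consistent (reliable), while the 5 lb average error shows it is not measuring the true weight (not valid).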

Types of Validity

Example: If a measure of art appreciation is created, all of the items should be related to the different components and types of art.

An alternative to bivariate correlational analysis, and a more common statistical method used to demonstrate convergent and discriminant validity, is exploratory factor analysis.

This is a data reduction technique that aggregates a given set of items into a smaller set of factors based on the bivariate correlation structure discussed above, using a statistical technique called principal components analysis.

These factors should ideally correspond to the underlying theoretical constructs that we are trying to measure. The general norm for factor extraction is that each extracted factor should have an eigenvalue greater than 1. The extracted factors can then be rotated using orthogonal or oblique rotation techniques, depending on whether the underlying constructs are expected to be relatively uncorrelated or correlated, to generate factor weights that can be used to aggregate the individual items of each construct into a composite measure.
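A minimal sketch of the eigenvalue-greater-than-1 extraction rule, assuming a hypothetical correlation matrix in which items 1-3 tap one construct and items 4-6 another (a real analysis would start from raw item responses and would also rotate the solution):

```python
import numpy as np

# Hypothetical correlation matrix for six items: within-cluster correlations
# are high, cross-cluster correlations are low.
R = np.array([
    [1.0, 0.7, 0.6, 0.1, 0.2, 0.1],
    [0.7, 1.0, 0.7, 0.2, 0.1, 0.1],
    [0.6, 0.7, 1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.6, 0.7],
    [0.2, 0.1, 0.1, 0.6, 1.0, 0.6],
    [0.1, 0.1, 0.2, 0.7, 0.6, 1.0],
])

# For a correlation matrix, each principal component's eigenvalue is the
# amount of variance (in item-variance units) that the component explains.
eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted largest first

# Extraction norm: retain factors whose eigenvalue exceeds 1.
n_factors = int(np.sum(eigenvalues > 1))
print(np.round(eigenvalues, 2), n_factors)
```

With this two-cluster structure, exactly two eigenvalues exceed 1, matching the two constructs the items were written to tap.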

For adequate convergent validity, it is expected that items belonging to a common construct should exhibit high factor loadings on that construct's factor (a cutoff such as 0.60 is commonly used). A more sophisticated technique for evaluating convergent and discriminant validity is the multi-trait multi-method (MTMM) approach. This technique requires measuring each construct (trait) using two or more different methods (e.g., a survey and a personal interview).

This is an onerous and relatively less popular approach, and is therefore not discussed here. Criterion-related validity can also be assessed based on whether a given measure relates well to a current or future criterion, which are respectively called concurrent and predictive validity. Predictive validity is the degree to which a measure successfully predicts a future outcome that it is theoretically expected to predict.

For instance, can standardized test scores (e.g., SAT scores) predict academic success in college? Concurrent validity examines how well one measure relates to another concrete criterion that is presumed to occur simultaneously; for example, students' scores on one mathematics test should correlate with their scores on another mathematics test taken at about the same time. These scores should be related concurrently because they are both tests of mathematics.
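A sketch of how predictive validity might be checked in practice: correlate the measure with the later outcome. The admission test scores and first-year GPAs below are invented for illustration:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: admission test scores and later first-year GPA
# for eight students.
test_scores = [1100, 1250, 980, 1400, 1050, 1300, 1200, 900]
first_year_gpa = [3.0, 3.4, 2.6, 3.8, 2.9, 3.5, 3.2, 2.5]

# Predictive validity: the measure should correlate with the future criterion.
r = pearson_r(test_scores, first_year_gpa)
print(round(r, 2))
```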

Unlike convergent and discriminant validity, concurrent and predictive validity are frequently ignored in empirical social science research. Now that we know the different kinds of reliability and validity, let us try to synthesize our understanding of reliability and validity in a mathematical manner using classical test theory, also called true score theory. This is a psychometric theory that examines how measurement works, what it measures, and what it does not measure.

This theory postulates that every observation has a true score T that could be observed accurately if there were no errors in measurement. However, the presence of measurement errors E results in a deviation of the observed score X from the true score as follows:

X = T + E

Across a set of observed scores, the variance of observed and true scores can be related using a similar equation:

var(X) = var(T) + var(E)
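The true score equation can be illustrated with a small simulation, assuming normally distributed true scores and errors (the parameter values below are arbitrary):

```python
import random
import statistics

random.seed(1)

# Classical test theory: X = T + E, with T and E independent.
true_scores = [random.gauss(50, 10) for _ in range(5000)]   # T, var(T) = 100
observed = [t + random.gauss(0, 5) for t in true_scores]    # X, var(E) = 25

var_T = statistics.variance(true_scores)
var_X = statistics.variance(observed)

# Because T and E are independent, var(X) ≈ var(T) + var(E) = 100 + 25.
print(round(var_T), round(var_X))
```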

The goal of psychometric analysis is to estimate and, if possible, minimize the error variance var(E), so that the observed score X is a good measure of the true score T. Measurement errors can be of two types: random error and systematic error. Random error is error that can be attributed to a set of unknown and uncontrollable external factors that randomly influence some observations but not others.

As an example, during the time of measurement, some respondents may be in a nicer mood than others, which may influence how they respond to the measurement items. For instance, respondents in a nicer mood may respond more positively to constructs like self-esteem, satisfaction, and happiness than those who are in a poor mood. However, it is not possible to anticipate which subject is in what type of mood or control for the effect of mood in research studies.

Likewise, at an organizational level, if we are measuring firm performance, regulatory or environmental changes may affect the performance of some firms in an observed sample but not others. Systematic error is error introduced by factors that systematically affect all observations of a construct across an entire sample. In our previous example of firm performance, since the recent financial crisis impacted the performance of financial firms disproportionately more than other types of firms (such as manufacturing or service firms), if our sample consisted only of financial firms, we might expect a systematic reduction in the performance of all firms in our sample due to the financial crisis.

Unlike random error, which may be positive, negative, or zero across observations in a sample, systematic error tends to be consistently positive or negative across the entire sample. Since an observed score may include both random and systematic errors, our true score equation can be modified as:

X = T + Er + Es

where Er is the random error and Es is the systematic error in measurement. The statistical impact of these errors is that random error adds variability (e.g., standard deviation) to the distribution of observed scores but does not affect its central tendency (e.g., mean), while systematic error shifts the central tendency of the distribution. What do random and systematic error imply for measurement procedures?

By increasing variability in observations, random error reduces the reliability of measurement. In contrast, by shifting the central tendency measure, systematic error reduces the validity of measurement.
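A small simulation of this contrast, with arbitrary parameters: random error inflates the spread of scores, while systematic error shifts their central tendency:

```python
import random
import statistics

random.seed(42)

true_scores = [random.gauss(50, 10) for _ in range(5000)]

# Random error: zero-mean noise added independently to each observation.
noisy = [t + random.gauss(0, 5) for t in true_scores]
# Systematic error: a constant shift applied to every observation.
shifted = [t + 5 for t in true_scores]

# Random error inflates the spread but leaves the mean essentially unchanged.
print(round(statistics.stdev(noisy) - statistics.stdev(true_scores), 1))
# Systematic error shifts the mean but leaves the spread unchanged.
print(round(statistics.mean(shifted) - statistics.mean(true_scores), 1))
```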

Validity concerns are far more serious problems in measurement than reliability concerns, because an invalid measure is probably measuring a different construct than what we intended, and hence validity problems cast serious doubts on findings derived from statistical analysis. Note that reliability is a ratio or a fraction that captures how close the true score is relative to the observed score.

Hence, reliability can be expressed as:

reliability = var(T) / var(X) = var(T) / (var(T) + var(E))

If the error variance is zero, then var(T) = var(X) and the reliability is 1; the greater the error variance, the lower the reliability.
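The reliability ratio can be estimated from the same kind of simulation. With var(T) = 100 and var(E) = 25 (arbitrary choices), the theoretical reliability is 100 / 125 = 0.8:

```python
import random
import statistics

random.seed(7)

# Estimate reliability = var(T) / var(X) from a simulated sample
# where X = T + E with independent random error E.
true_scores = [random.gauss(50, 10) for _ in range(5000)]   # var(T) = 100
observed = [t + random.gauss(0, 5) for t in true_scores]    # var(E) = 25

reliability = statistics.variance(true_scores) / statistics.variance(observed)
# Theoretical value: 100 / (100 + 25) = 0.8.
print(round(reliability, 2))
```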


