Monday, February 25, 2013

The Promises and Pitfalls of Pay for Performance

There's been a great deal of discussion about health care payment reform. Prominent in this discussion is "Pay for Performance" (P4P). The idea is simple -- rather than pay providers based on volume of care (fee-for-service) or number of patients (capitation), tie their payment to one or more measures of performance. There has been substantial concern about the quality of care delivered to patients, so pay for performance appears to make a lot of sense. Don't we want to reward providers for good performance? Shouldn't this encourage them to provide high quality care?

Unfortunately, this is not as straightforward as it might appear. While the idea of pay for performance is very appealing and intuitive, there are some major pitfalls in implementation. First, let's consider what we want to accomplish. We want to set up a system for paying providers that aligns their incentives with what's best for patients, taking into account the benefits and the costs of treatment. In practice, P4P systems are set up by payers to align providers' incentives with the payers' own objectives. One question that emerges immediately is whether the payer's objectives are the right ones. If payers do not have the best interest of patients at heart, a perfectly designed P4P scheme may work extremely well at changing provider behavior and still not benefit patients. This may be true regardless of whether the payer is public or private.

Aside from the issue of the payer's motivation, there are a number of design issues that are critical for the effectiveness of P4P. This is truly a situation where "the devil is in the details."

A number of issues revolve around how performance is measured. First, "you get what you pay for." Providers will respond to the incentive, but this may come at the expense of things that are not measured and therefore not rewarded. For example, aspects of quality that are hard to measure may suffer. If P4P is at the individual provider level, then informal consults or other aspects of being a "team player" may decline. Second, if the performance measure can be manipulated, then P4P may actually generate perverse incentives. For example, suppose performance is measured by patient outcomes incompletely adjusted for patient severity (as is certainly the case). Then providers may attempt to see only patients who are easy to treat and avoid difficult cases. Third, if the performance measure isn't very accurate, then chance will play a large role in measured performance. In this case, provider effort won't play a large role in determining payment, so providers will have little incentive to try hard. In addition, rewards can be perceived as unfair -- some providers who aren't so good will receive rewards and some good doctors won't. How accurate the performance measure is depends (among other things) on the size of a provider's practice. A larger practice with a larger patient population will yield statistically more reliable performance measures. Unfortunately, statistical reliability may be hard to achieve in practice. An article by Nyweide et al. finds that "Relatively few primary care physician practices are large enough to reliably measure 10% relative differences in common measures of quality and cost performance among fee-for-service Medicare patients."
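To get a feel for why practice size matters, here is a rough sketch of how much a measured rate can move around by chance alone. It assumes the simplest possible model: each patient independently meets the quality measure with the same probability, so the measured rate is a binomial proportion. The 75% rate and the panel sizes are hypothetical, and this is not the method used by Nyweide et al.

```python
import math

def rate_standard_error(true_rate, n_patients):
    """Standard error of a measured rate under a simple binomial model (an assumption)."""
    return math.sqrt(true_rate * (1.0 - true_rate) / n_patients)

def ci_half_width(true_rate, n_patients, z=1.96):
    """Approximate 95% margin of error around the measured rate."""
    return z * rate_standard_error(true_rate, n_patients)

# Hypothetical practice sizes; the true rate of 75% is also just an illustration.
for n in (25, 100, 400, 1600):
    margin = ci_half_width(0.75, n)
    print(f"{n:5d} patients: measured rate is 75% +/- {100 * margin:.1f} points")
```

Under these assumptions the margin of error shrinks only with the square root of the panel size, so a practice needs four times as many eligible patients to cut the noise in half.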

The figure below illustrates the problem with chance and fairness. (Note: This figure and the one below are borrowed from Tom McGuire. His original presentation at the Third International Jerusalem Conference on Health Policy, which I highly recommend, is here.) The "bell curve" to the left represents the performance distribution of "not so good" doctors. Some do better than others on the performance measure just by pure chance. The curve to the right represents the performance distribution of "good doctors." They clearly do better as a group than the "not so good" doctors, but purely by chance some of them will do worse than the "not so good" group. Given a target, a proportion A of the good doctors will end up falling below the target and not getting rewarded. Similarly, a proportion B of the not so good doctors will end up being rewarded. First, if the proportion of good doctors who will fall below the target just by chance is high enough, even good doctors won't bother trying. Second, given that a large proportion (in this example) of good doctors will not be rewarded and some not so good ones will, the system is likely to be perceived as unfair.
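The proportions A and B can be computed directly if we are willing to assume the two bell curves are normal distributions with the same spread. The sketch below does that with Python's standard library; the means, noise levels, and target are made-up numbers, not values taken from the figure.

```python
from statistics import NormalDist

def misclassification(mean_not_so_good, mean_good, noise_sd, target):
    """Return (A, B) for two normal performance distributions (an assumed model).

    A: share of good doctors who fall below the target by chance.
    B: share of not-so-good doctors who land at or above the target by chance.
    """
    good = NormalDist(mean_good, noise_sd)
    not_so_good = NormalDist(mean_not_so_good, noise_sd)
    share_a = good.cdf(target)
    share_b = 1.0 - not_so_good.cdf(target)
    return share_a, share_b

# Hypothetical scores: groups centered at 70 and 80, target halfway between.
for sd in (10, 5, 2):
    a, b = misclassification(70, 80, sd, 75)
    print(f"noise sd {sd:2d}: A = {a:.3f}, B = {b:.3f}")
```

With these illustrative numbers, a noisy measure (standard deviation of 10) leaves roughly 3 in 10 good doctors unrewarded, while a precise one (standard deviation of 2) makes both kinds of error rare.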

Another important factor is the amount of money at stake. If the amount at risk isn't large enough then it won't get providers' attention -- the incentive will be too weak (Ashish Jha has a nice blog post on this, and some other aspects of P4P, here). On the other hand, if the amount at risk is too high, then providers can be placed in the position of bearing too much risk -- a bad event can put their practice under water. This is not only undesirable for providers, it can have undesired consequences -- providers will have strong incentives to avoid difficult patients or to "teach to the test," i.e., distort treatment decisions to ensure meeting measured performance goals. In addition, payers that impose a large amount of risk on providers will have to pay more to have them see their patients and take on that risk.

One way to mitigate accuracy problems in performance measures and risk is to use P4P for groups of providers instead of individuals. Performance measures for groups will have better statistical properties than for individuals and groups of providers can spread risk (pdf). Unfortunately, there's no free lunch. Using P4P for groups weakens individual incentives -- the well known "free rider problem." The larger the number of providers in the group, the weaker is the incentive for individuals (pdf). The weakening effect on incentives can be substantial.
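The tradeoff can be sketched with two back-of-the-envelope formulas. Both rest on assumptions that are mine, not the cited papers': the group bonus is split equally among members, and each provider's measurement noise is independent and of the same size.

```python
import math

def individual_share(group_size):
    """Fraction of the reward from her own extra effort that one provider keeps,
    assuming an equal split of the group bonus."""
    return 1.0 / group_size

def relative_noise(group_size):
    """Noise in the group's average measure relative to a solo provider,
    assuming independent, equally noisy individual measures."""
    return 1.0 / math.sqrt(group_size)

for n in (1, 4, 25, 100):
    print(f"group of {n:3d}: keeps {individual_share(n):.0%} of own reward, "
          f"noise is {relative_noise(n):.0%} of a solo provider's")
```

In this stylized setup the individual incentive falls faster than the noise does: a group of 25 cuts measurement noise to a fifth, but each provider keeps only 4% of the reward her own effort generates.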

Third, most P4P programs use targets -- there's a measured performance goal and payments depend on reaching that target. Using targets in P4P presents a number of issues. First, whether P4P works at all depends critically on where the target is set. Set the target too high and no one will be able to reach it, so no one will try. Set the target too low and everyone will be able to reach it, so no one will have to try. As a consequence, P4P schemes which use targets are very fragile. Getting the target right requires a lot of information on the part of the payer, especially because where the target should be set will change over time and also across providers. How much providers differ in their responsiveness or abilities to reach the target is also critical.

For example, consider the figure below. Each angled line represents a different provider, e.g. a primary care physician. The horizontal axis is each provider's immunization rate for their patients and the vertical axis is their marginal cost of improving the immunization rates for their patient populations. The lines slope up, indicating that the cost of getting more patients immunized increases with the immunization rate -- it's pretty easy to get the first patients immunized because they're aware and compliant, but getting the last few patients immunized can be difficult. A fixed target for immunization is set, e.g., 75%, and providers receive a performance payment if they are at or over the target. Now consider four different providers. Provider A is so far below the target that she will never reach it no matter how hard she works, so P4P gives her no incentive for performance. Provider D is so far beyond the target that she will reach it no matter what she does. She also has no incentive for performance. It's only Providers B and C who have any incentive to respond to this P4P scheme -- the rest of the providers will ignore it.
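Here is a minimal sketch of that logic. It assumes each provider starts from some baseline immunization rate and faces a marginal cost that rises linearly as she pushes the rate above that baseline. The baselines, the cost slope, and the bonus are all hypothetical numbers chosen to reproduce the pattern described above, not values read off the figure.

```python
def cost_to_reach(baseline, target, slope):
    """Total cost of moving from baseline to target (both in percentage points)
    when marginal cost is slope * (points above baseline). Assumed functional form."""
    gap = max(0, target - baseline)
    return 0.5 * slope * gap ** 2

def response(baseline, target, slope, bonus):
    """Classify how a provider reacts to a fixed-target bonus."""
    if baseline >= target:
        return "already above target, no change"
    if cost_to_reach(baseline, target, slope) <= bonus:
        return "responds and reaches target"
    return "too far below, gives up"

TARGET = 75    # target immunization rate, in percent
SLOPE = 10     # hypothetical: dollars per point, per point above baseline
BONUS = 1000   # hypothetical bonus in dollars

baselines = {"A": 40, "B": 65, "C": 70, "D": 90}  # hypothetical starting rates
for name, rate in baselines.items():
    print(name, "->", response(rate, TARGET, SLOPE, BONUS))
```

Moving `TARGET` up to 95 or down to 35 in this toy setup leaves no provider with a reason to change anything, which is the fragility described earlier.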

Last, P4P with a target can be wasteful. In the figure above, only Providers B and C respond to the incentive. Nonetheless, they plus Provider D and all of the providers to the right of Provider D will earn a reward, even though only B and C changed their behavior. This is clearly wasteful: the effect of P4P is small relative to the cost. The extent to which this is true depends on how much providers differ, and where the target is set. For example, in the figure above if all providers were like B or C, then P4P using the target in the figure would work quite well. If the target were set substantially above or below B or C, however, then P4P would likely fail.
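To put rough numbers on the waste, the sketch below tallies who gets paid and how much improvement the payer actually buys. The before and after rates, the bonus, and the extra Provider E (standing in for "providers to the right of D") are hypothetical.

```python
def payout_summary(outcomes, target, bonus):
    """Summarize a fixed-target bonus scheme.

    outcomes maps provider name -> (rate_before, rate_after), in percentage points.
    """
    paid = [n for n, (_, after) in outcomes.items() if after >= target]
    improved = [n for n in paid if outcomes[n][1] > outcomes[n][0]]
    total_paid = bonus * len(paid)
    points_gained = sum(after - before for before, after in outcomes.values())
    return {
        "total_paid": total_paid,
        "paid_without_improving": bonus * (len(paid) - len(improved)),
        "points_gained": points_gained,
        "cost_per_point": total_paid / points_gained if points_gained else None,
    }

# Hypothetical rates before and after the incentive is introduced.
outcomes = {
    "A": (40, 40),  # never tries
    "B": (65, 75),  # responds
    "C": (70, 75),  # responds
    "D": (90, 90),  # paid without changing anything
    "E": (95, 95),  # paid without changing anything
}
print(payout_summary(outcomes, target=75, bonus=1000))
```

In this made-up example half of the money goes to providers who did nothing differently, and the payer spends about $267 for each percentage point of improvement.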

In sum, incentives matter, but the problems with P4P are substantial enough that simply using high powered pay for performance schemes may not be a practical or desirable way to try to improve quality or lower costs. Pay for performance has potential, but it has to be used carefully to avoid its pitfalls. It's important to realize that addressing health care quality and costs requires multiple tools and provider pay is merely one of them.
