Decision-making in research tasks with sequential testing.

Pfeiffer T, Rand DG, Dreber A - PLoS ONE (2009)

Affiliation: Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts, United States of America. pfeiffer@fas.harvard.edu

ABSTRACT

Background: In a recent, controversial essay published in PLoS Medicine, JPA Ioannidis argued that in some research fields most published findings are false. Theoretical reasoning shows that small effect sizes, error-prone tests, low prior probabilities of the tested hypotheses, and biases in the evaluation and publication of research findings all increase the fraction of false positives. These findings raise concerns about the reliability of research. However, they rest on a very simple model of scientific research, in which single tests are used to evaluate independent hypotheses.
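
The bias-free core of this argument is a short calculation. If R is the pre-study odds that a tested relationship is true, 1 - β the power of the test and α its false-positive rate, then the expected fraction of positive findings that are false is α / ((1 - β)R + α), the complement of the positive predictive value (1 - β)R / (R - βR + α) given in Ioannidis's essay. The sketch below evaluates this expression; the function name and example numbers are illustrative, and the essay's bias term is left out.

```python
def false_positive_fraction(prior_odds: float, power: float, alpha: float) -> float:
    """Expected fraction of positive findings that are false (no bias term).

    prior_odds -- R, odds that a tested relationship is true before testing
    power      -- 1 - beta, chance that a true relationship tests positive
    alpha      -- chance that a false relationship tests positive
    """
    true_positives = power * prior_odds   # proportional to P(true and positive)
    false_positives = alpha               # proportional to P(false and positive)
    return false_positives / (true_positives + false_positives)


# Well-powered test of a plausible hypothesis vs. weak test of an unlikely one.
print(round(false_positive_fraction(1.0, 0.8, 0.05), 3))   # 0.059
print(round(false_positive_fraction(0.1, 0.2, 0.05), 3))   # 0.714
```

Lower prior odds and lower power both push the fraction of false positives up, which is the pattern the essay describes.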

Methodology/principal findings: In this study, we present computer simulations and experimental approaches for analyzing more realistic scenarios, in which research tasks are solved sequentially, i.e. subsequent tests can be chosen depending on previous results. We investigate simple sequential testing and scenarios where only a selected subset of results can be published and used for future rounds of test choice. Results from computer simulations indicate that, for the tasks analyzed in this study, the fraction of false findings among the positive findings declines over several rounds of testing if the most informative tests are performed. Our experiments show that human subjects frequently perform the most informative tests, leading to the decline in false positives expected from the simulations.
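
Sequential testing can be read as repeated Bayesian updating. A minimal formulation, assuming that each published result r_n of test t_n is conditionally independent of earlier results given the hypothesis h, is shown below. The symbols P_n, r_n, t_n, h* (the true hypothesis) and O_n are introduced here for illustration and are not taken from the paper.

```latex
P_n(h) = \frac{P(r_n \mid h, t_n)\, P_{n-1}(h)}{\sum_{h'} P(r_n \mid h', t_n)\, P_{n-1}(h')},
\qquad
O_n = \frac{P_n(h^\ast)}{1 - P_n(h^\ast)}
```

Under this reading, choosing t_n on the basis of the current belief P_{n-1} is what separates informative from random test choice, and O_n is one natural way to express "the odds for the true hypothesis" after n rounds.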

Conclusions/significance: For the research tasks studied here, findings tend to become more reliable over time. We also find that performance was surprisingly inefficient in those experimental settings where not all performed tests could be published. Our results may help optimize existing procedures used in the practice of scientific research and provide guidance for the development of novel forms of scholarly communication.

Fig. 2. Simulation results. (A) Evolution of knowledge. The odds for the true hypothesis increase at the slowest rate for random test choice (SIM-R), at an intermediate rate for the scenario where the most informative test is chosen and published in each round (SIM-1), and at the fastest rate for the scenario where two tests are chosen in each round and the more informative result is published (SIM-2). This illustrates that informative test choice outperforms random test choice (SIM-1 > SIM-R), and that performing two tests is advantageous even if only one result can be published (SIM-2 > SIM-1). (B) Fraction of false findings among the positive results. For random test choice, the fraction of false positives stays constant at 0.26. For both scenarios with informative test choice (SIM-1 and SIM-2), the fraction of false findings among the positives declines over the rounds. (C) Fraction of false findings among the negative results. For random test choice, this fraction remains constant at 0.15. For SIM-1, the fraction of false negatives tends to increase over the rounds, while for SIM-2 it fluctuates around the level for random test choice. (D) Frequency of tests that support the true hypothesis. For random test choice, the chance of picking a test that is expected to support the true hypothesis (i.e. AB and BC for sequence ABC) is 1/3, because each hypothesis is supported by two of the six tests. Over the rounds, tests that support the true hypothesis tend to be chosen preferentially in the scenarios with informative test choice, which reduces the fraction of false findings among the positives. For SIM-1, where all results are published, this implies an increase in the fraction of false negatives, as shown in panel C. For SIM-2, where results can be selected for publication, accumulating knowledge can be used to avoid publishing false findings.
The grey line shows the probability that a false finding is published in SIM-2; this probability declines over the rounds.
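
The 1/3 in panel D follows from a simple count. The sketch below assumes, as the example of AB and BC for ABC suggests, that the hypotheses are the six orderings of A, B and C and that a test XY is expected to support a hypothesis when X immediately precedes Y in it. The names HYPOTHESES, TESTS, supporting_tests and chance_random_test_supports are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Assumed task model: hypotheses are orderings of A, B, C; a test "XY" is
# expected to support a hypothesis when X immediately precedes Y in it.
HYPOTHESES = ["".join(p) for p in permutations("ABC")]
TESTS = [x + y for x in "ABC" for y in "ABC" if x != y]


def supporting_tests(hypothesis: str) -> list:
    """Tests expected to come out positive if `hypothesis` is true."""
    adjacent = {hypothesis[i:i + 2] for i in range(len(hypothesis) - 1)}
    return [t for t in TESTS if t in adjacent]


def chance_random_test_supports(hypothesis: str) -> Fraction:
    """Probability that a uniformly random test supports `hypothesis`."""
    return Fraction(len(supporting_tests(hypothesis)), len(TESTS))


for h in HYPOTHESES:
    print(h, supporting_tests(h), chance_random_test_supports(h))
```

Every ordering of three items has exactly two adjacent ordered pairs, so each hypothesis is supported by two of the six tests under this model.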

For the more complex scenario of informative test choice (SIM-2), we assume that in each round two tests can be performed, but only one result can be published, i.e. used in subsequent rounds. The two tests are selected independently of each other. First, the expected informativity of each test is calculated. Two tests are then sampled randomly, with replacement, from among the tests with the highest expected informativity; if a single test has the highest expected informativity, it is therefore performed twice. Once the results are obtained, the more informative result is published and the other is discarded; if both results are equally informative, one is chosen at random. Details on the informativity of a result are given in the Methods section. An example simulation for this scenario is shown in Fig. 1B. For each of the three scenarios (SIM-R, SIM-1, SIM-2) we performed 10,000 simulations. Results are shown in Fig. 2.
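
The SIM-2 selection rule can be sketched as follows. The paper defines informativity in its Methods section, which is not reproduced here, so this sketch assumes that the informativity of a result is the reduction in Shannon entropy of the belief over hypotheses, and that a test's expected informativity is that quantity averaged over the test's possible outcomes. The task model (orderings of A, B, C; test XY positive for hypotheses in which X immediately precedes Y), the error rates, and all function names are hypothetical.

```python
import math
import random
from itertools import permutations

# Hypothetical task model and error rates (assumptions, not the paper's values).
HYPOTHESES = ["".join(p) for p in permutations("ABC")]
TESTS = [x + y for x in "ABC" for y in "ABC" if x != y]
P_POS_IF_SUPPORTED = 0.8
P_POS_IF_NOT = 0.2


def p_positive(hypothesis, test):
    """Chance of a positive result for `test` if `hypothesis` is true."""
    adjacent = (hypothesis[0:2], hypothesis[1:3])
    return P_POS_IF_SUPPORTED if test in adjacent else P_POS_IF_NOT


def entropy(belief):
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def update(belief, test, positive):
    """Bayes' rule: reweight each hypothesis by the likelihood of the result."""
    weights = {}
    for h, p in belief.items():
        q = p_positive(h, test)
        weights[h] = p * (q if positive else 1.0 - q)
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def informativity(belief, test, positive):
    """Assumed measure: entropy reduction caused by one result."""
    return entropy(belief) - entropy(update(belief, test, positive))


def expected_informativity(belief, test):
    """Average informativity over both outcomes, weighted by their probability."""
    p_pos = sum(p * p_positive(h, test) for h, p in belief.items())
    return (p_pos * informativity(belief, test, True)
            + (1.0 - p_pos) * informativity(belief, test, False))


def sim2_round(belief, truth, rng):
    """One SIM-2 round: run two tests, publish only the more informative result."""
    scores = {t: expected_informativity(belief, t) for t in TESTS}
    best = max(scores.values())
    candidates = [t for t in TESTS if math.isclose(scores[t], best, abs_tol=1e-12)]
    # Two independent draws with replacement: a unique best test is run twice.
    chosen = [rng.choice(candidates) for _ in range(2)]
    results = [(t, rng.random() < p_positive(truth, t)) for t in chosen]
    gains = [informativity(belief, t, r) for t, r in results]
    if math.isclose(gains[0], gains[1], abs_tol=1e-12):
        published = rng.choice(results)   # tie: pick one result at random
    else:
        published = results[0] if gains[0] > gains[1] else results[1]
    # Only the published result enters the shared belief; the other is discarded.
    return update(belief, *published), published


rng = random.Random(1)
belief = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
for _ in range(10):
    belief, (test, positive) = sim2_round(belief, "ABC", rng)
print(max(belief, key=belief.get), round(belief["ABC"], 3))
```

Under the same assumptions, drawing a single test per round and publishing every result gives an analogous sketch of SIM-1, and drawing tests uniformly from TESTS gives SIM-R.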

