An unbiased index to quantify participant's phenotypic contribution to an open-access cohort


ABSTRACT

The Personal Genome Project (PGP) is an effort to enroll many participants to create an open-access repository of genome, health and trait data for research. However, PGP participants are not enrolled to study any specific trait, and participants choose which phenotypes to disclose. To measure how much, and how willingly, participants contribute phenotypes, and to encourage and guide further contributions, we developed an algorithm to score and rank the phenotypes and participants of the PGP. The scoring algorithm calculates the participation index (P-index) for every participant, where 0 indicates no reported phenotypes and 100 indicates complete phenotype reporting. We calculated the P-index for all 5,015 participants in the PGP; the values ranged from 0 to 96.7. We found that participants mainly have either high scores (P-index > 90, 29.5%) or low scores (P-index < 10, 57.8%). While there are significantly more male than female participants (1,793 versus 1,271), females tend to have higher P-indexes on average (P = 0.015). We also report the P-indexes of participants by demographics; states like Missouri and Massachusetts have better P-indexes than states like Utah and Minnesota. The P-index can therefore be used as an unbiased way to measure and rank participants' phenotypic contributions towards the PGP.



Fig. 4: The P-index based on state. Only states with 30 or more participants were included. (A) Box plot of the P-index of participants from each state. (B) Dot plot of the P-index of participants from each state.

We next explored the scores of participants based on geographic location. In total, 2,780 participants reported the state in which they reside (Table S9). We separated the participants into their respective states and kept only states with at least 30 participants, which left 26 states. We then calculated the median P-index for each state, as well as the percentage of participants in each state with a P-index > 90. Overall, Missouri and Massachusetts performed the best: their median P-indexes are well over 90, and more than 60% of their participants have P-indexes > 90 (Fig. 4, Table S10). The state with the lowest median P-index is Minnesota, with a median P-index of 8.17. South Carolina has the lowest proportion of participants with P-indexes > 90; only 29% of its participants (9 out of 31) had P-indexes > 90 (Fig. 4, Table S10). We also analyzed the P-indexes of participants stratified by zip code. While 2,556 participants provided zip codes (Table S2), only 2,538 of these could be linked to actual longitude and latitude coordinates (Table S11). We clustered these 2,538 participants into 354 distinct clusters and plotted their median P-indexes (see Materials and Methods). We observed a trend similar to that of the analysis by state: the hotspots of participants with high P-indexes are in Massachusetts, Missouri and California (Fig. 5).
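The per-state summary described above (group by state, keep states with at least 30 participants, then take the median P-index and the percentage of participants above 90) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `summarize_by_state`, its parameters and the toy records are all hypothetical, and the P-index values are assumed to be already computed.

```python
from collections import defaultdict
from statistics import median


def summarize_by_state(records, min_participants=30, high_cutoff=90.0):
    """Summarize P-indexes per state.

    records: iterable of (state, p_index) pairs (hypothetical input format).
    Returns {state: (n_participants, median_p_index, percent_above_cutoff)}
    for states with at least `min_participants` participants.
    """
    by_state = defaultdict(list)
    for state, p_index in records:
        by_state[state].append(p_index)

    summary = {}
    for state, scores in by_state.items():
        if len(scores) < min_participants:
            continue  # mirrors the "at least 30 participants" filter
        n_high = sum(1 for s in scores if s > high_cutoff)
        summary[state] = (
            len(scores),
            median(scores),
            100.0 * n_high / len(scores),
        )
    return summary


# Made-up toy data, not PGP data: state "A" has 31 participants,
# state "B" has only 3 and is therefore filtered out.
toy = [("A", 95.0)] * 20 + [("A", 5.0)] * 11 + [("B", 50.0)] * 3
result = summarize_by_state(toy)
for state, (n, med, pct_high) in sorted(result.items()):
    print(f"{state}: n={n}, median={med:.2f}, P-index > 90: {pct_high:.1f}%")
```

The same arithmetic underlies the South Carolina figure quoted above: 9 of 31 participants is 9/31 ≈ 29%.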

