pvsR: An Open Source Interface to Big Data on the American Political Sphere.

Matter U, Stutzer A - PLoS ONE (2015)

Bottom Line: The free and open code is expected to substantially reduce the cost of research with PVS' new big public data in a vast variety of possible applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as already existing interfaces. Similar OSIs are recommended for other big public databases.


Affiliation: University of Basel/Faculty of Business and Economics, Peter-Merian-Weg 6, 4002 Basel, Switzerland.

ABSTRACT
Digital data from the political sphere is abundant, omnipresent, and increasingly accessible directly through the Internet. Project Vote Smart (PVS) is a prominent example of such big public data and covers various aspects of U.S. politics in astonishing detail. Despite the vast potential of PVS' data for political science, economics, and sociology, it is hardly used in empirical research. The systematic compilation of semi-structured data can be complicated and time-consuming, as the data format is not designed for conventional scientific research. This paper presents a new tool that makes the data easily accessible to a broad scientific community. We provide the software, called pvsR, as an add-on to the R programming environment for statistical computing. This open source interface (OSI) serves as a direct link between a statistical analysis and the large PVS database. The free and open code is expected to substantially reduce the cost of research with PVS' new big public data in a vast variety of possible applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as already existing interfaces. The validity of the library is documented based on an illustration involving female representation in local politics. In addition, pvsR facilitates the low-cost replication of research with PVS data, including the pre-processing of the data. Similar OSIs are recommended for other big public databases.



Fig 3. The current share of women in county legislative offices across the United States. Data sources: Own compilation based on Project Vote Smart using pvsR.

The study of women in politics (in and outside the United States) has so far focused primarily on the national or the state level. However, the few studies analyzing gender differences in local politics show that politicians’ gender appears to matter a great deal for local policy outcomes (see, e.g., [28] and [29]). Studies of local politics in the United States often appear to be shaped, at least in part, by data availability, focusing, for example, only on the mayors of rather large cities or on relatively small samples drawn from a local survey (see, e.g., [30] and [31]). Classical data sources on women in U.S. politics cover federal and state offices over many years (see, as a reference point, the Center for American Women and Politics of the Eagleton Institute of Politics at Rutgers University). Apart from data on city executives, however, data on local politicians is not readily available from classical data sources. We show how the simple approach of querying data on politicians’ gender via pvsR can easily be extended to a thorough inquiry into female representation in county legislative offices across the United States. To the best of our knowledge, this is the first study to reveal the share of women in local U.S. politics at such a fine level of granularity. To do so, we first gather the raw PVS data on all county officials with the high-level function getAllLocalOfficials(locality = counties) (see the next section for more details on high-level functions in pvsR) and then query the biographical data on all of these officials via CandidateBio.getBio(). As not all biographical profiles reveal the officials’ gender, we match the officials’ first names with census data on the most common female and male first names in the United States ([32] for female first names and [33] for male first names). We code each official’s gender according to whether his or her first name appears in the female or the male census list.
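A minimal Python sketch of this first-name coding step, assuming the two census lists have already been loaded into dictionaries that map an upper-case first name to its frequency among women or among men. The table contents below are made-up placeholders, not census figures, and the helper names (`normalize_first_name`, `classify_gender`) are hypothetical; this is neither pvsR code nor the authors' S1 Code. Names found in both lists are resolved by relative frequency, following the rule stated in the next paragraph, and names given only as initials are left unclassified.

```python
from typing import Dict, Optional

# Placeholder excerpts of census first-name tables: upper-case name ->
# frequency (in percent) among women or men. Made-up values, NOT census data.
FEMALE_FREQ: Dict[str, float] = {"MARY": 2.6, "LINDA": 1.0, "JORDAN": 0.01}
MALE_FREQ: Dict[str, float] = {"JAMES": 3.3, "JOHN": 3.2, "JORDAN": 0.1}


def normalize_first_name(raw: str) -> Optional[str]:
    """Return the upper-cased first token, or None if it is empty or an initial."""
    parts = raw.split()
    if not parts:
        return None
    token = parts[0].rstrip(".").upper()
    if len(token) <= 1:  # e.g. "J." is only an initial and cannot be matched
        return None
    return token


def classify_gender(first_name: str) -> Optional[str]:
    """Code gender from a first name using the two frequency tables.

    Returns "female", "male", or None when the name is an initial, appears
    in neither list, or (an assumption of this sketch) is equally frequent
    in both lists.
    """
    name = normalize_first_name(first_name)
    if name is None:
        return None
    f = FEMALE_FREQ.get(name)
    m = MALE_FREQ.get(name)
    if f is None and m is None:
        return None
    if m is None:
        return "female"
    if f is None:
        return "male"
    # Name appears in both lists: choose the sex for which it is more frequent.
    if m > f:
        return "male"
    if f > m:
        return "female"
    return None  # tie: not covered by the text, so left unclassified here


for raw in ["Mary", "james", "Jordan", "J.", "Zebulon"]:
    print(raw, "->", classify_gender(raw))
```

With these placeholder tables, "Jordan" is coded as male because its (invented) male frequency exceeds its female frequency. Returning None for exact ties is a choice made for this sketch only; the text does not say how ties are handled.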
Officials whose first name appears in both the female and the male list are coded as male if the name is more frequently used for men than for women, and as female if it is more frequently used for women than for men. A comparison of the resulting categories with the true gender categories for the 3,344 cases with information on gender indicates that the classification based on the first name is correct in 98.4 percent of the cases and is thus highly accurate. Based on the classification of all officials in county legislative offices across the United States, we then compute the share of women in such offices per county. The results are presented in Fig 3: (a) for all counties in the United States, (b) for the state of New York as an example, and (c) for the state of Texas as an example. Counties for which no data were available are shown as grey areas (in some counties only the initials of the first names are provided, so a gender classification based on first names is not feasible). The original code and data for this analysis are provided in the Supporting Information (S1 Code and S1 Data).
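The validation and aggregation steps can be sketched in the same way. This minimal Python sketch assumes each official is a hypothetical record of (county, gender reported in the profile or None, name-based code or None). Accuracy is measured only on officials who have both a reported gender and a name-based code, which is an assumption, since the text does not say how unclassifiable names among the 3,344 cases were treated. Counties without any classifiable official are left out of the result, mirroring the grey areas in Fig 3. The example records are invented for illustration and are not PVS data.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (county, gender reported in the profile or None,
# gender coded from the first name or None if the name could not be classified).
Record = Tuple[str, Optional[str], Optional[str]]


def classification_accuracy(records: List[Record]) -> float:
    """Share of name-based codes that match the reported gender.

    Assumption: only officials with both a reported gender and a
    name-based code enter the comparison.
    """
    compared = [(true, coded) for _, true, coded in records
                if true is not None and coded is not None]
    if not compared:
        raise ValueError("no officials with both a reported and a coded gender")
    correct = sum(1 for true, coded in compared if true == coded)
    return correct / len(compared)


def share_of_women_by_county(records: List[Record]) -> Dict[str, float]:
    """Share of women among name-classified officials, per county.

    Counties without any classifiable official are omitted
    (they would be drawn as grey areas on a map).
    """
    women: Dict[str, int] = defaultdict(int)
    classified: Dict[str, int] = defaultdict(int)
    for county, _, coded in records:
        if coded is None:
            continue  # e.g. only an initial was available
        classified[county] += 1
        if coded == "female":
            women[county] += 1
    return {county: women[county] / n for county, n in classified.items()}


# Invented example data for three hypothetical counties.
example_records: List[Record] = [
    ("County A", "female", "female"),
    ("County A", "male", "male"),
    ("County A", None, "male"),
    ("County A", None, "female"),
    ("County B", "male", "female"),  # a misclassified official
    ("County B", None, "male"),
    ("County B", None, "male"),
    ("County C", None, None),        # first name given only as an initial
]

print(classification_accuracy(example_records))  # 2 of 3 compared cases match
print(share_of_women_by_county(example_records))
```

Here County A gets a share of 0.5, County B a share of one third, and County C is absent from the result because none of its officials could be classified.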