Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

Fariss CJ, Linder FJ, Jones ZM, Crabtree CD, Biek MA, Ross AS, Kaur T, Tsai M - PLoS ONE (2015)

Bottom Line: To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.


Affiliation: Department of Political Science, Pennsylvania State University, University Park, PA, 16802, United States of America.

ABSTRACT
We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
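
As a rough illustration of what a document-term matrix looks like, the following minimal Python sketch counts each unique term in each document. It is not the authors' pipeline: the tokenizer, the function names (tokenize, document_term_matrix), and the two toy documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (terms, matrix).

    terms  : alphabetically sorted list of every unique term in the corpus
    matrix : one row per document, one column per term, holding word counts
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for term in terms] for c in counts]
    return terms, matrix


# Hypothetical toy documents, not taken from the corpus.
docs = [
    "Torture was reported. Torture continued.",
    "No torture was reported.",
]

terms, matrix = document_term_matrix(docs)
print(terms)
for row in matrix:
    print(row)
```

Each row corresponds to one document and each column to one term, so a cell holds the number of times that term occurs in that document. A real corpus of this size would normally call for a sparse representation and more careful preprocessing than this sketch assumes.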



Fig 1 (pone.0138935.g001): The number of human rights documents by year from the four publication sources that we have collected. The figure shows the year-by-year distribution of reports by Amnesty International (grey), Human Rights Watch (orange), the Lawyers Committee for Human Rights (blue), and the United States Department of State (green). The increasing number of reports each year coincides with both expanding coverage in the early years of the series and the increasing number of countries entering the international state system. Some of the older documents are not easily found. We will continue to search for missing documents and plan eventually to expand this corpus to cover a large number of human rights publications.

Mentions: The corpus includes the raw text of over 14,000 human rights country reports published by four sources: Amnesty International (1974–2012), Human Rights Watch (1989–2014), the Lawyers Committee for Human Rights (1982–1996), and the United States Department of State (1977–2013). Fig 1 presents a coverage plot that indicates the temporal scope of the reports within the corpus. Fig 2 presents the average number of words per report over time. Though all four reporting agencies share similar goals (the cataloging of human rights abuses throughout the world), each uses somewhat different methods and serves a different audience [2]. Taken together, these sources provide an increasingly detailed and accurate picture of the condition of human rights throughout the globe [1–5].

