True Randomness from Big Data


ABSTRACT

Generating random bits is a difficult task that is important for the simulation of physical systems, for cryptography, and for many other applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomy and genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as sampling from a big source, a random variable of size at least a few gigabytes. This view initiates the study of big sources in the randomness-extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources, and we validate it extensively on real data sets. The experimental findings indicate that our method is efficient enough to handle sources of this size, whereas previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical-world empirical extractors, as measured by standardized tests.
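As background on what seeded extraction involves (the paper's own streaming extractor is a different, more efficient construction), the following is a minimal sketch of the classical Toeplitz-hashing extractor: a 2-universal hash family whose output is close to uniform by the leftover hash lemma. The block length n, the assumed min-entropy k, the error parameter, and the 0.8 bias are illustrative assumptions, not values from the paper.

    import numpy as np

    def toeplitz_extract(block: np.ndarray, seed: np.ndarray, m: int) -> np.ndarray:
        """Hash an n-bit source block down to m bits with a seeded Toeplitz
        matrix over GF(2). Toeplitz hashing is 2-universal, so by the leftover
        hash lemma the output is eps-close to uniform whenever the block has
        min-entropy k and m <= k - 2*log2(1/eps)."""
        n = block.size
        assert seed.size == n + m - 1, "a Toeplitz seed needs n + m - 1 bits"
        # T[i, j] = seed[i - j + n - 1]; entries are constant along diagonals.
        i = np.arange(m)[:, None]
        j = np.arange(n)[None, :]
        T = seed[i - j + n - 1]
        return (T.astype(np.int64) @ block.astype(np.int64)) % 2

    # Toy usage with a deliberately biased (hence non-uniform) source block.
    rng = np.random.default_rng(0)
    n, k = 4096, 1000                       # k: *assumed* min-entropy of the block
    m = k - 2 * 64                          # output length for eps = 2**-64
    block = (rng.random(n) < 0.8).astype(np.uint8)
    seed = rng.integers(0, 2, size=n + m - 1, dtype=np.uint8)
    out = toeplitz_extract(block, seed, m)  # 872 almost-uniform bits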

No MeSH data available.


Figure 1: High-level overview of the extraction method. (a) The core of the proposed method is the streaming extractor. The description corresponds to one big source, with parameter estimation and initial seed generation done only once. Repeating the seed generation is optional and is not needed for practical purposes. (b) The output of the proposed method is validated by standard statistical tests (NIST). Under the ideal (theoretical) uniform distribution, the P-values are uniformly distributed. We plot the histogram of 1,512,000 P-values, providing a high-level indication of the uniformity of the extracted bits, i.e., frequencies well concentrated around the expected frequency μ = 0.01. This graph provides a visual overview (averaging); detailed statistics are in the Supplementary Information.
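For concreteness, the histogram check in panel (b) can be reproduced from a vector of test P-values along the following lines. This is a sketch under the figure's parameters (μ = 0.01 implies one hundred equal-width bins; the NIST SP 800-22 suite's own second-level uniformity test uses ten), not the authors' exact evaluation script.

    import numpy as np

    def pvalue_uniformity(pvalues: np.ndarray, bins: int = 100):
        """Histogram P-values into equal-width bins on [0, 1]. Under the ideal
        uniform distribution each bin's relative frequency concentrates around
        mu = 1/bins (0.01 for 100 bins, as in panel (b)); a chi-square
        statistic summarizes the deviation from uniformity."""
        counts, _ = np.histogram(pvalues, bins=bins, range=(0.0, 1.0))
        expected = pvalues.size / bins
        chi2 = float(((counts - expected) ** 2 / expected).sum())
        return counts / pvalues.size, chi2

    # Sanity check on truly uniform P-values (1,512,000 as in the figure).
    freqs, chi2 = pvalue_uniformity(np.random.default_rng(1).random(1_512_000))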

Mentions: We propose and validate a new empirical method for true-randomness extraction from big sources. The method consists of a novel extractor together with empirical procedures both to estimate the min-entropy of the source and to generate the initial random seed. Figure 1 depicts a high-level view of the complete extraction method. This is the first complete, general extraction method, not only for big sources but for every statistical source.
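The min-entropy H_min(X) = -log2 max_x Pr[X = x] is the quantity in which extractor guarantees are stated. Purely to illustrate what "estimating the min-entropy" means, here is the textbook plug-in estimate over fixed-width symbols; the paper's actual estimation procedure is its own, and the one-byte block width here is an arbitrary choice.

    import math
    from collections import Counter

    def plugin_min_entropy(data: bytes, block_bytes: int = 1) -> float:
        """Naive plug-in min-entropy estimate per block: -log2(p_max), where
        p_max is the empirical frequency of the most common block value.
        A textbook stand-in, not the paper's estimation procedure."""
        blocks = [data[i:i + block_bytes]
                  for i in range(0, len(data) - block_bytes + 1, block_bytes)]
        p_max = max(Counter(blocks).values()) / len(blocks)
        return -math.log2(p_max)

    # A heavily repetitive stream has low min-entropy per byte:
    print(plugin_min_entropy(b"aaaaab" * 1000))  # -log2(5/6), about 0.26 bits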

