True Randomness from Big Data


ABSTRACT

Generating random bits is a difficult task that is important for the simulation of physical systems, for cryptography, and for many other applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as those from astronomy and genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as sampling from a big source, a random variable whose samples are at least a few gigabytes in size. Our view initiates the study of big sources in the randomness-extraction literature. Previous approaches to big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources, and we validate it extensively on real data sets. The experimental findings indicate that our method is efficient enough to handle sources of this size, whereas previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical-world empirical extractors, as measured by standardized tests.

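For reference, the standard notions from the randomness-extraction literature that the abstract and the Figure 2 caption below rely on, namely min-entropy, the min-entropy rate κ, and ε-closeness to uniform, are as follows; these are textbook definitions rather than anything specific to this paper. A source X over {0,1}^n has min-entropy and min-entropy rate

    H_\infty(X) = -\log_2 \max_{x \in \{0,1\}^n} \Pr[X = x], \qquad \kappa = \frac{H_\infty(X)}{n},

and a seeded function \mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m is a (k, \varepsilon)-extractor if, for every X with H_\infty(X) \ge k,

    \frac{1}{2} \sum_{z \in \{0,1\}^m} \Bigl| \Pr[\mathrm{Ext}(X, U_d) = z] - 2^{-m} \Bigr| \le \varepsilon,

i.e. the output is ε-close to uniform; this is the standard formalization of the "almost-uniform" guarantee stated in the abstract.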


Figure 2: The Random Re-Bucketing (RRB) extractor. The random seed of size polylog(n) is used only in Stages I and III. In Stage III the same local extractor h is used for the first γ fraction of blocks. The number of super-blocks b also depends on an error tolerance ε and the empirically estimated min-entropy rate κ. In the main body, we explain how to realize this description as an algorithm that uses two streams.
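To make the staged description above easier to follow, here is a minimal, hypothetical Python skeleton of a seeded, streaming, three-stage pipeline in the spirit of the caption. It is a sketch under stated assumptions, not the authors' algorithm: the concrete contents of each stage, the choice of the local extractor h (here a keyed hash), and the formula for b are placeholders invented for illustration; only the roles of the seed (Stages I and III), the fraction γ, the super-block count b, and the parameters ε and κ are taken from the caption.

    import hashlib
    import math
    import random

    def rrb_style_sketch(source_stream, seed, eps=1e-3, kappa=0.5, gamma=0.5,
                         block_bytes=4096):
        """Hypothetical three-stage, seeded streaming pipeline (illustration only)."""
        rng = random.Random(seed)  # Stages I and III are the only seed consumers.

        # Read the source as fixed-size blocks (a first sequential pass).
        blocks = list(iter(lambda: source_stream.read(block_bytes), b""))

        # Placeholder rule: the number of super-blocks b depends on eps and kappa.
        b = max(1, int(kappa * len(blocks) // max(1, math.ceil(math.log2(1.0 / eps)))))

        # Stage I (placeholder): seed-driven random re-bucketing of blocks
        # into b super-blocks.
        buckets = [[] for _ in range(b)]
        for blk in blocks:
            buckets[rng.randrange(b)].append(blk)

        # Stage II (placeholder): materialize the super-blocks; in a genuine
        # two-stream realization this would be the second sequential pass.
        super_blocks = [b"".join(bucket) for bucket in buckets]

        # Stage III (placeholder): apply the same local extractor h, here a
        # seed-keyed hash, to the first gamma fraction of super-blocks.
        def h(data):
            return hashlib.blake2b(data, key=str(seed).encode()[:64]).digest()

        keep = max(1, int(gamma * len(super_blocks)))
        return b"".join(h(sb) for sb in super_blocks[:keep])

For instance, opening a file with open("sample.bin", "rb") as f and calling rrb_style_sketch(f, seed=2016) returns a byte string; none of the numeric choices above carry the paper's correctness guarantee.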

Mentions: A key feature of the RRB extractor in Fig. 2 is its simplicity; the technical difficulty lies in proving its correctness, which requires a novel, non-trivial analysis. RRB is the first extractor capable of handling big sources without additional assumptions. Previous works require either (i) unrealistic running times or (ii) ad hoc assumptions about the source. In particular, local extractors such as von Neumann's extractor22 and Local Hash fail significantly in terms of output quality, whereas the throughput of Trevisan's extractor19 and its follow-ups degrades significantly with the size of the sample12 (see Fig. 3), even when practical optimizations are considered; e.g., 103,407 years of computing time for a 20 GB input sample with ε = 10^−3 and κ = 1/2. We choose to compare to the Local Hash and von Neumann extractors because these are the only extractors experimented upon in previous work (see ref. 23 for empirical work using von Neumann's extractor, and refs 12, 24 and 25 for empirical work using Local Hash); importantly, both happen to be streaming extractors. Given this special attention in previous work, they are two ideal candidates for comparison. We refer the reader to Table 2, Fig. 3, and the Supplementary Information for details.
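As a concrete point of reference for the comparison above, the von Neumann extractor mentioned here is the classical debiasing procedure: the input is read in non-overlapping bit pairs, a 01 pair emits 0, a 10 pair emits 1, and 00 or 11 pairs are discarded. Its output is unbiased only under the assumption that the input bits are independent flips of a coin with a fixed bias, which is exactly the kind of ad hoc assumption about the source that RRB is designed to avoid. A minimal Python rendering:

    def von_neumann_extract(bits):
        """Classical von Neumann debiasing of a 0/1 sequence.

        Assumes the bits are independent flips of a coin with a fixed,
        possibly unknown bias; without that assumption the output is not
        guaranteed to be uniform.
        """
        out = []
        it = iter(bits)
        for a, b in zip(it, it):   # non-overlapping pairs
            if a != b:             # 01 -> 0, 10 -> 1
                out.append(a)      # 00 and 11 pairs are discarded
        return out

For example, von_neumann_extract([0, 0, 1, 0, 1, 1, 0, 1]) returns [1, 0]; in expectation the yield is at most one output bit per four input bits, and correlations in real data break the unbiasedness guarantee altogether.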

