A survey on platforms for big data analytics.

Singh D, Reddy CK - J Big Data (2014)

Affiliation: Department of Computer Science, Wayne State University, Detroit, MI 48202 USA.

ABSTRACT

The primary purpose of this paper is to provide an in-depth analysis of different platforms available for performing big data analytics. This paper surveys different hardware platforms available for big data analytics and assesses the advantages and drawbacks of each of these platforms based on various metrics such as scalability, data I/O rate, fault tolerance, real-time processing, data size supported and iterative task support. In addition to the hardware, a detailed description of the software frameworks used within each of these platforms is also discussed along with their strengths and drawbacks. Some of the critical characteristics described here can potentially aid the readers in making an informed decision about the right choice of platforms depending on their computational needs. A rigorous qualitative comparison between the platforms, summarized in a star-ratings table, is also presented for each of the six characteristics that are critical for the algorithms of big data analytics. To provide more insight into the effectiveness of each platform in the context of big data analytics, specific implementation-level details of the widely used k-means clustering algorithm on the various platforms are also described in the form of pseudocode.

Figure 6: Pseudocode of k-means clustering algorithm using MPI in a master–slave configuration.

MPI [44] implementations typically use a master–slave setting in which the data is distributed among the slaves. Figure 6 shows the pseudocode for K-means using MPI. In the first step, each slave reads its portion of the data. In the second step, the master broadcasts the current centroids to the slaves. Next, each slave assigns its data instances to the nearest clusters and computes new local centroids, which are then sent back to the master. The master then computes new global centroids by aggregating the local centroids, weighted by the local cluster sizes. These new global centroids are broadcast back to the slaves for the next iteration of K-means, and the process continues in this manner until the centroids converge. In this implementation the data is not written to disk; the primary bottleneck is instead communication. When MPI is used over peer-to-peer networks, the aggregation step is costly and network performance is low.
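The master–slave iteration described above can be illustrated with a minimal single-process sketch in Python. It is a simulation under stated assumptions, not the authors' MPI code: each "slave" is simply a list of points (a hypothetical data partition), and the broadcast and gather steps that an MPI program would typically perform with collectives such as MPI_Bcast and MPI_Gather are modelled as ordinary function calls. The helper names (`slave_step`, `master_step`, `kmeans_master_slave`), the squared-distance convergence threshold `tol` and the iteration cap `max_iter` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def slave_step(partition: List[Point], centroids: List[Point]) -> List[Tuple[Point, int]]:
    """Hypothetical slave work: assign each local point to its nearest centroid and
    return, per cluster, the local centroid and the local cluster size."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    result = []
    for j in range(k):
        if counts[j]:
            local = tuple(s / counts[j] for s in sums[j])
        else:
            local = tuple(centroids[j])  # empty local cluster: its weight is 0, so the value is ignored
        result.append((local, counts[j]))
    return result


def master_step(partials: List[List[Tuple[Point, int]]], old: List[Point]) -> List[Point]:
    """Hypothetical master work: average local centroids, weighted by local cluster size."""
    new = []
    for j in range(len(old)):
        total = sum(part[j][1] for part in partials)
        if total == 0:
            new.append(old[j])  # no points anywhere: keep the previous centroid
            continue
        new.append(tuple(
            sum(part[j][0][d] * part[j][1] for part in partials) / total
            for d in range(len(old[j]))
        ))
    return new


def kmeans_master_slave(partitions: List[List[Point]], init: List[Point],
                        tol: float = 1e-9, max_iter: int = 100) -> List[Point]:
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        # Simulated broadcast: every slave sees the same centroids.
        partials = [slave_step(part, centroids) for part in partitions]
        # Simulated gather, then weighted aggregation on the master.
        new = master_step(partials, centroids)
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new))  # largest squared move
        centroids = new
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    partitions = [
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],   # slave 0
        [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],  # slave 1
    ]
    print(kmeans_master_slave(partitions, init=[(0.0, 0.0), (10.0, 10.0)]))
    # roughly [(0.333, 0.333), (10.333, 10.333)]
```

Weighting each local centroid by its cluster size makes the aggregated centroid equal to the mean of all points in that cluster across every slave, so the result matches k-means on the pooled data. Each slave sends only k (centroid, count) pairs per iteration, so the message size depends on k and the dimensionality rather than on the size of the partition.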