Visualizing data helps reveal interesting patterns in large data sets that might not be obvious in other representations. It also aids domain experts in extracting information, generating ideas, and formulating hypotheses from the data, which is why visualization plays a central role in the data analytics process. However, visualizing big and high-dimensional data is challenging, since humans can only perceive up to three dimensions. Traditional techniques are also incapable of visualizing huge amounts of data, because their processing time grows rapidly with the number of data points, while the volume of data itself continues to grow exponentially. In fact, the data generated in the past decade is much larger than all the data collected in the past century combined \cite{data2013}. For now, no algorithm exists that tackles all the problems of handling big data, although there have been many works that address specific aspects of it \cite{xu2016exploring}.

Some existing ways to visualize high-dimensional data rely on dimensionality reduction techniques such as Random Projections \cite{bingham2001random,kaski1998dimensionality}, Self-Organizing Maps (SOM) \cite{kohonen1990self}, Multidimensional Scaling (MDS) \cite{kruskal1964multidimensional}, and Principal Components Analysis (PCA) \cite{dunteman1989principal}, which significantly reduce the dimensionality by mapping high-dimensional data into a lower-dimensional space. This mapping inevitably loses information, but these algorithms are designed so that useful distances are preserved and information loss is minimized. The problem is that their computational cost grows too quickly with the size of the data to be practical for big data. Parallelizable implementations of SOM \cite{carpenter1987massively}, MDS \cite{varoneckas2015parallel}, and PCA \cite{andrecut2009parallel} exist, but they only reduce the complexity by a linear factor, which may be adequate for now but will not scale well in the future.
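To make concrete what such a distance-preserving, dimension-reducing map looks like, the following is a minimal sketch of random projection; the notation here is introduced for illustration and is not taken from the cited works. Given $n$ data points arranged as the rows of $X \in \mathbb{R}^{n \times d}$, a random projection to $k \ll d$ dimensions computes
\begin{equation}
Y = \frac{1}{\sqrt{k}} X R,
\end{equation}
where $R \in \mathbb{R}^{d \times k}$ has entries drawn independently from $\mathcal{N}(0,1)$. By the Johnson--Lindenstrauss lemma, choosing $k = O(\varepsilon^{-2} \log n)$ suffices to preserve all pairwise distances within a factor of $1 \pm \varepsilon$ with high probability, which is the sense in which such mappings preserve useful distances while discarding most of the original dimensions.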