Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2017 |
Subjects: | |
Online Access: | http://psasir.upm.edu.my/id/eprint/67852/1/FSKTM%202017%2024%20IR.pdf |
Summary: | The rapid growth of data content on the web produces a huge
amount of collective resources. Twitter, one of the biggest social media sites,
collects millions of tweets every day, amounting to petabytes per year. People
share their experiences and thoughts, or simply talk about whatever concerns them, online. Unstructured big data from social media plays a vital role in sentiment
analysis, also known as opinion mining.
Structured and unstructured data are generated continuously and on a large
scale every day. These data are meaningless unless they are captured and
analyzed. Traditional RDBMS technology becomes less reliable
when dealing with huge amounts of structured data, and processing
slows down if the infrastructure is not upgraded to match the
volume of data. Furthermore, RDBMS is not capable of dealing with
unstructured data.
Because petabytes of records are generated on the net every year, capturing and
analyzing big data can be challenging. Cloud computing technologies can
provide on-demand infrastructure and services based on user
requirements. Therefore, this thesis aims to use a cloud-based infrastructure,
Amazon Web Services, to capture unstructured big data and then to
analyze, visualize and extract useful information from large, diverse,
distributed and mixed data gathered from public data sets and Twitter’s
Application Programming Interface (API).
The results and explanation of the experiments, presented in chapter four,
show the test bed results for collecting Twitter data, for processing
Twitter input data and for the output data. The analysis emphasizes
the elapsed time when collecting Twitter data and the performance of
Amazon Elastic MapReduce (EMR). The infrastructure provided by Amazon
Web Services is capable of capturing and manipulating large volumes of
unstructured big data from Twitter. This study also tested the capability
of Amazon Elastic MapReduce (EMR) to process the Twitter input data
collected earlier and transform it into meaningful output that can be used
for decision making. |
---|