Mircea Trifan edited Introduction.tex
about 10 years ago
Commit id: 9cb11f6664e827f5a54b50aa3a139473b3e1aac3
diff --git a/Introduction.tex b/Introduction.tex
index ae4828a..7f77ea0 100644
--- a/Introduction.tex
+++ b/Introduction.tex
...
\section{Introduction}
There are four types of Twitter streams that an ordinary user has access to: trends, search phrases, which return up to a maximum of 1500 tweets, the user timeline, streams parametrized by keywords or users, and the spritzer stream, which is 10\% of overall tweets.
These can be implemented as tab panels in a spreadsheet-like user interface.
Big data processing can be integrated in M3Data. The underlying database is Apache Accumulo and the processing could be done in a pipeline approach by Cascading. Cascading can run on top of Accumulo (or Storm).
M3Data blocks can be made for the Twitter streams as defined above. Other processing blocks can be built for the following named entity categories: users prefixed by the @ sign, topics identified by the hash sign, web pages prefixed by http://, YouTube videos, stock market companies prefixed by \$, retweets (RT), OpenNLP entities (people, places, dates), Freebase entities, news, and schema.org. A regex block could make implementation easier.
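A regex block of the kind suggested above could look roughly like the following Python sketch. The category names, the patterns, and the function \texttt{extract\_entities} are illustrative assumptions and are not part of M3Data; real tweets would likely need more careful patterns.

```python
import re

# Hypothetical patterns for the prefix-based entity categories listed above.
# These are assumptions for illustration, not an M3Data API.
PATTERNS = {
    "user": re.compile(r"(?<!\w)@(\w+)"),                # users prefixed by @
    "hashtag": re.compile(r"(?<!\w)#(\w+)"),             # topics prefixed by the hash sign
    "url": re.compile(r"https?://\S+"),                  # web pages prefixed by http://
    "youtube": re.compile(
        r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"
    ),                                                   # YouTube video links
    "cashtag": re.compile(r"(?<!\w)\$([A-Za-z]{1,6})\b"),  # companies prefixed by $
    "retweet": re.compile(r"\bRT\b"),                    # retweet marker
}


def extract_entities(tweet):
    """Return a dict mapping each category name to its matches in the tweet."""
    return {name: pattern.findall(tweet) for name, pattern in PATTERNS.items()}


example = "RT @alice: $AAPL up today #stocks http://example.com/a"
entities = extract_entities(example)
```

Keeping all patterns in one dictionary means a new category can be added without touching the extraction function.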
A co-occurrence matrix of entities can be computed by Cascading on Accumulo and presented in the spreadsheet interface in all four stream tabs identified previously. Another M3Data block for tf-idf for concepts as defined in
For a corpus of existing tweets, the Twitter2011 or 2012 TREC corpus can be used.
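The co-occurrence matrix of entities mentioned above can be illustrated with a small in-memory Python sketch. It assumes that entities have already been extracted per tweet and it counts each unordered pair once per tweet. The function name \texttt{cooccurrence} and the sample data are hypothetical, and the sketch says nothing about how the same computation would be expressed as a Cascading flow on Accumulo.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(entity_lists):
    """Count how many tweets contain each unordered pair of entities."""
    counts = Counter()
    for entities in entity_lists:
        # Deduplicate and sort so that every pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


# Made-up entity lists, one per tweet.
tweets = [
    ["@alice", "#stocks", "$AAPL"],
    ["#stocks", "$AAPL"],
    ["@alice", "#news"],
]
matrix = cooccurrence(tweets)
```

The result is a sparse representation: only pairs that actually occur together get an entry, which suits a spreadsheet view where most cells would otherwise be zero.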
co-occurrence matrix on spritzer or trends or search phrase or schema.org
tf-idf for concepts with Cascading on Accumulo
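As a rough illustration of the tf-idf note, the Python sketch below scores concepts per tweet. It assumes one common variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency); the text does not say which variant is intended, and the function name \texttt{tf\_idf} and the sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one dict per document mapping each term to its tf-idf score."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {term: (count / len(doc)) * math.log(n / df[term])
             for term, count in tf.items()}
        )
    return scores


# Made-up concept lists, one per tweet.
docs = [["apple", "stock", "stock"], ["apple", "news"]]
scores = tf_idf(docs)
```

With this variant, a concept that appears in every document gets a score of zero, so only concepts that distinguish one tweet from the rest stand out.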
...
cascading in M3Data (Lingual) + ML (PMML)
OLAP cube
UIMA