ROUGH DRAFT authorea.com/109627

THESIS

Chapter I. The Problem and Its Background

Introduction

News articles are published every day in dozens of languages. According to Chart Beat (2014), over 92,000 news articles are posted to the web every 24 hours, not counting blogs. Readers can access these stories conveniently, but reading all of them takes too much time. The volume of articles on a single subject is unmanageable for a reader to go through one by one, especially when the articles are written in different languages, so gathering information on a subject of interest is difficult. Learning a foreign language is not an ideal solution either, because learning a new language takes time and dedication (Simon Ager, 1998-2016).

The proponents will therefore develop a multilingual, multi-document news summarizer whose domain is current events around the world, such as terrorism and health news. The multilingual, multi-document news summarizer is a system that uses Natural Language Processing to create a shortened, summarized version of the news articles supplied by the user. The input documents may be in English, French, or Spanish; the output is a summary in English. The system is intended for people who struggle to read lengthy news articles and to identify their main points. It would also eliminate redundant information across the articles, reorganize the news for readers, and help them overcome language barriers. Such a tool matters because being able to read or understand other languages gives access to a greater range of information about a subject.

Background of the Study

In the field of Natural Language Processing, text summarization is a topic of growing interest. Text summarizers are widely used today, especially for educational purposes. Most of them handle only one language and produce the summary in that same language. This study focuses on a broader problem: news about current events written in different languages. Because the study deals with multiple documents, the system will accept any set of news articles about a given current event. Even when the articles are in several languages, the system will translate the content into English so that the output is a single summary in standard English. From a more specific technical perspective, one family of text summarization methods creates summaries by ranking and extracting sentences from the original documents (Gong et al.). This study will use the TextRank model. With this method, the study will evaluate the system's efficiency and accuracy, and the completeness and correctness of its output, by having professors in the field of English review the generated summaries.
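
To make the idea of ranking and extracting sentences concrete, the following is a minimal sketch of TextRank-style extractive summarization. It is not the proponents' implementation. It assumes the articles have already been brought into English (translation is out of scope here), it uses a naive regular-expression sentence splitter, and all function names (`split_sentences`, `tokenize`, `similarity`, `textrank`, `summarize`) are hypothetical. Sentences are treated as graph nodes, edges are weighted by normalized word overlap, and scores are computed with a PageRank-style iteration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens; no stemming or stop-word removal in this sketch."""
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) = 0 could make the denominator vanish
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score each sentence with a weighted PageRank-style iteration."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weight matrix; a sentence has no edge to itself.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge between j and i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) + damping * rank)
        scores = updated
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=2)` on the concatenated text of several articles about the same event returns the two sentences that share the most vocabulary with the rest of the collection. A sentence with no words in common with any other sentence receives only the baseline score of `1 - damping`, so off-topic sentences tend to be left out. A full system would also need language detection, translation, and redundancy removal across documents, none of which this sketch attempts.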

Theoretical Framework
