Because the refugee crisis is a complex social phenomenon, we will draw on many data sources to feed
our model. They fall into three groups:

  • Social media data
    • Existing social media aggregator streams
    • liveuamap (http://syria.liveuamap.com/), which mines Twitter, YouTube and the social media accounts of large news
      portals in real time for posts related to the conflict and the refugee crisis
    • Our own Facebook harvester for demographic profiling
  • Semi-official and highly embedded data
    • NGOs: Many NGOs collect data about refugees trying to get into Europe, for example the number of boat crossings, people rescued, and fatalities. The problem is that they do not have a unified reporting system, and the data is often reported in sources that are hard to mine, such as blog posts or infographics.
    • News aggregator streams, such as GDELT, offer very rich data that is also hard to mine.
  • Official data
    • UNHCR: They have a great amount of information about Syrian refugees, since they control and administer most refugee camps in Egypt, Jordan, Iraq, Lebanon and Turkey. They have weekly per-camp reports covering the number of households, population, demographics, and incoming and leaving refugees. They also have data about the health, educational, and financial situation of these camps. The problem is that this data is not organized and is stored in PDF format, so we need to scrape it into a usable data set. We have been trying to contact UNHCR here in Abu Dhabi, in Jordan, and in Lebanon to get cleaner data, so far with no luck.
    • UN and World Bank statistics to understand the pre-crisis Syrian society
    • EUROSTAT: Provides information about asylum seekers in Europe at a monthly resolution, with some demographic data included
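
To illustrate the kind of scraping the UNHCR reports would need, here is a minimal sketch that turns text already extracted from a PDF into per-camp records. It uses only the Python standard library. The line layout (camp name followed by households, population, arrivals and departures), the field names, and the sample text are all assumptions made for illustration; the actual reports may be laid out quite differently, and the PDF-to-text step itself is not shown.

```python
import re

# Hypothetical layout of one line of text already extracted from a PDF report:
# camp name, then households, population, arrivals, departures.
# The real UNHCR reports may look quite different; adjust the pattern to match.
LINE_PATTERN = re.compile(
    r"^(?P<camp>[A-Za-z' -]+?)\s+"
    r"(?P<households>\d[\d,]*)\s+"
    r"(?P<population>\d[\d,]*)\s+"
    r"(?P<arrivals>\d[\d,]*)\s+"
    r"(?P<departures>\d[\d,]*)$"
)

NUMERIC_FIELDS = ("households", "population", "arrivals", "departures")


def parse_report_text(text):
    """Turn extracted report text into a list of per-camp records."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip headers, footers and free text
        record = {"camp": match.group("camp").strip()}
        for field in NUMERIC_FIELDS:
            # Drop thousands separators before converting to int.
            record[field] = int(match.group(field).replace(",", ""))
        records.append(record)
    return records


# Made-up sample text, for illustration only (not real figures).
SAMPLE_TEXT = """Weekly camp report (illustrative)
Camp Households Population Arrivals Departures
Camp A 1,200 5,400 85 40
Camp B 300 1,150 12 7
"""

records = parse_report_text(SAMPLE_TEXT)
```

Lines that do not match the assumed layout are skipped rather than raising an error, which seems a reasonable default when the reports mix tables with headers and free text. In practice each report format would probably need its own pattern.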