We collected a sample of the PDF files issued by the various UNHCR camp directorates and looked at the
feasibility of extracting textual information from them in the event that we are unable to get the data
from UNHCR.

We have also implemented a Facebook data harvester to map the existing Syrian diaspora and its
demographics, so that these can be compared with pre-conflict census data.

  • Using Facebook’s Graph Search one can get the current locations of people originating from a
    certain city, as long as they have not set this option to hidden on their Facebook profile (it is
    public by default). This information block often contains data about the person’s job, education and
    marital status as well. Since an API-based approach has not been possible for this case since
    Facebook’s 2013 policy change, we have implemented a data harvester in Python that automates a
    browser to load the search results pages (to get past Facebook’s memory-saving ‘lazy-load’
    mechanism). We found empirically that about 1000 profiles take up 2 GB of RAM; the number of
    profiles that can be mined with this method is therefore significant. Of course, extrapolating from a
    relatively small sample to real population dynamics is still a significant challenge, but for cases of
    cities of 5,000-10,000 people, the search results have been exhaustive in most cases, resulting
    in about 1000 profiles, or 10% of that location’s population. For larger cities, the results are
    limited by the workstation’s memory to about 5000 hits. Significant data cleaning had to be
    conducted on the harvested data, as the fields are often incomplete and very noisy: in many cases
    people mix up the fields or report irrelevant information. Overall, however, we consider this data
    source very useful for dynamic diaspora comparison purposes.
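
The loading loop and the memory limit described above can be sketched as follows. This is a minimal illustration, not the harvester itself: the function names, the `patience` parameter and the assumption that memory use grows linearly with the number of loaded profiles are ours; only the figure of about 2 GB per 1000 profiles comes from the measurement reported above. The browser is abstracted behind two callables so that the logic can be run without Facebook or a browser driver.

```python
from typing import Callable

# Empirical figure from the text: about 1000 profiles occupy 2 GB of RAM.
GB_PER_1000_PROFILES = 2.0


def max_profiles_for_ram(ram_gb: float) -> int:
    """Estimate how many profiles fit in ram_gb, assuming linear memory growth."""
    return int(ram_gb / GB_PER_1000_PROFILES * 1000)


def load_all_results(scroll_once: Callable[[], None],
                     count_results: Callable[[], int],
                     max_profiles: int,
                     patience: int = 3) -> int:
    """Keep scrolling a lazy-loading results page until it is exhausted.

    scroll_once   -- hypothetical hook that scrolls the page to trigger loading
    count_results -- hypothetical hook returning the number of loaded profiles
    max_profiles  -- stop once this many profiles are loaded (memory cap)
    patience      -- stop after this many scrolls in a row that add nothing
    """
    seen = count_results()
    stalled = 0
    while seen < max_profiles and stalled < patience:
        scroll_once()
        now = count_results()
        # Count consecutive scrolls that did not load any new profile.
        stalled = stalled + 1 if now <= seen else 0
        seen = now
    return seen


if __name__ == "__main__":
    # Simulated page with 1000 results that loads 100 more per scroll.
    page = {"loaded": 0}

    def fake_scroll() -> None:
        page["loaded"] = min(page["loaded"] + 100, 1000)

    cap = max_profiles_for_ram(10.0)  # a workstation with 10 GB to spare
    print(cap, load_all_results(fake_scroll, lambda: page["loaded"], cap))
```

With a real browser driver such as Selenium, `scroll_once` could wrap a call like `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`, and `count_results` could count the result elements currently on the page; the selector needed for that depends on the page markup and is not specified here.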