We collected a sample of the PDF files issued by the various UNHCR camp directorates and assessed the
feasibility of extracting textual information from them, in the event that we are unable to obtain the data
directly from UNHCR.
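One way to gauge this feasibility is to measure how many pages of a sample file yield a usable text layer. The sketch below is illustrative only: it assumes the third-party `pypdf` package is installed, and the helper names and the 50-character threshold are our own hypothetical choices, not part of the work described above.

```python
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extracted text of each page (empty string if none)."""
    # Assumption: pypdf is installed. It is imported lazily so that
    # text_coverage() below can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def text_coverage(page_texts: List[str], min_chars: int = 50) -> float:
    """Fraction of pages with at least `min_chars` non-whitespace characters.

    The threshold is an arbitrary illustration value; scanned pages with no
    text layer typically yield little or no text.
    """
    if not page_texts:
        return 0.0
    usable = sum(
        1 for text in page_texts if len("".join(text.split())) >= min_chars
    )
    return usable / len(page_texts)
```

A coverage close to 1.0 would suggest that direct text extraction is workable for that file, while a low value would point towards scanned pages that need OCR instead.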
Also, we have implemented a Facebook data-harvester to map the existing Syrian diaspora and its
demographics, in order to be able to compare this with pre-conflict census data.
- Using Facebook’s Graph Search, one can get the current locations of people originating from a
certain city, as long as they have not set this information to hidden on their Facebook profile (it is
public by default). This information block often also contains data about the person’s job, education and
marital status. As an API-based approach has not been possible for this case since Facebook’s 2013
policy change, we implemented a data harvester in Python that automates a browser to load
the search results pages (to work around Facebook’s memory-preserving ‘lazy-load’ method). We found
empirically that about 1000 profiles take up 2 GB of RAM, so the number of profiles that can be
mined with this method is significant, though bounded by the available memory. Of course, extrapolating from a
relatively small sample to real population dynamics is still a significant challenge, but for
cities of 5,000-10,000 people, the search results have been exhaustive in most cases, yielding
about 1000 profiles, or 10% of that location’s population. For larger cities, the results are
limited by the workstation’s memory to about 5000 hits. Significant data cleaning had to be
conducted on the harvested data, as the fields do not contain all information and are very noisy,
with people mixing up the fields in many cases or reporting irrelevant information. Overall, however, this data
source is deemed very useful for dynamic diaspora comparison purposes.
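The lazy-load workaround and the memory limit described above can be sketched as a scroll loop with a cap derived from available RAM. This is a minimal illustration under stated assumptions, not the harvester itself: `driver` is assumed to be any object exposing `execute_script` (for example a Selenium WebDriver), `count_results` is a hypothetical callable that reports how many results are currently loaded, and the cap assumes memory use scales linearly from the observed figure of about 2 GB per 1000 profiles.

```python
import time


def max_profiles_for_ram(ram_gb: float, gb_per_1000_profiles: float = 2.0) -> int:
    """Rough cap on loadable profiles, assuming linear memory growth."""
    return int(ram_gb / gb_per_1000_profiles * 1000)


def scroll_until_exhausted(driver, count_results, max_profiles,
                           pause_s=2.0, max_idle_rounds=3):
    """Scroll to the page bottom until no new results load or the cap is hit.

    driver: object with execute_script(), e.g. a Selenium WebDriver (assumed).
    count_results: hypothetical callable(driver) -> int giving the number of
        results currently present on the page.
    Returns the number of results loaded when the loop stops.
    """
    idle_rounds = 0
    loaded = count_results(driver)
    while loaded < max_profiles and idle_rounds < max_idle_rounds:
        # Scrolling to the bottom triggers the next lazy-loaded batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_s)  # give the page time to fetch more results
        now_loaded = count_results(driver)
        # Count consecutive rounds in which nothing new appeared.
        idle_rounds = idle_rounds + 1 if now_loaded == loaded else 0
        loaded = now_loaded
    return loaded
```

Under the linear assumption, `max_profiles_for_ram(10)` gives 5000, so the roughly 5000-hit ceiling reported for larger cities would correspond to about 10 GB of available memory.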