Juan de Monasterio edited section_Data_sources_As_a__.tex  almost 8 years ago

Commit id: c60d07319ef78bf14e116b2473a1fc93513fb734


As a general rule, privacy is ensured by handling data on private servers and by identifying users only through a hashed ID, with the encryption keys managed exclusively by the TelCo. In addition, all logs are timestamped and geolocated by the position of the antenna used to place the call. To exclude outlying users such as call centers or inactive phones, numbers with 5 or fewer calls, or more than 400 calls, per month were automatically filtered out.

\subsection{Argentina}
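The general rules above do not say how the hashed IDs are computed. One common way to obtain stable pseudonyms whose key never leaves the operator is a keyed hash such as HMAC-SHA256. The minimal Python sketch below assumes that scheme; the function name `pseudonymize` and the example key are hypothetical and are not taken from the original pipeline.

```python
import hashlib
import hmac


def pseudonymize(phone_number: str, key: bytes) -> str:
    """Map a phone number to a stable pseudonymous ID using HMAC-SHA256.

    The same number and key always give the same ID, so calls can still be
    grouped per user, but the ID cannot be recomputed without the key.
    """
    mac = hmac.new(key, phone_number.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


# Hypothetical key; in the setting described above it would be held only by the TelCo.
key = b"example-telco-key"
print(pseudonymize("5550001111", key) == pseudonymize("5550001111", key))  # True
```

A plain (unkeyed) hash of a phone number can be reversed by enumerating the relatively small space of valid numbers; keeping the key with the operator is what prevents this.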

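The activity filter stated in the general rules can be sketched as follows, under explicit assumptions: each record is a hypothetical `(hashed_caller_id, month)` pair for one outgoing call, and a user is kept only if their call count in every month of the observation window is more than 5 and at most 400. The source does not say whether incoming calls are counted or whether the bound is checked per month or on a monthly average, so both choices here are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical record layout: (hashed_caller_id, month), one tuple per outgoing call,
# with month written as "YYYY-MM".
CallRecord = Tuple[str, str]


def active_users(calls: Iterable[CallRecord], months: List[str],
                 low: int = 5, high: int = 400) -> Set[str]:
    """Keep users whose call count is > low and <= high in every listed month."""
    # Number of calls placed by each user in each month; missing pairs count as 0,
    # so a user with no activity in some month (e.g. a dead phone) is dropped.
    counts = Counter(calls)
    users = {user for user, _ in counts}
    return {
        user for user in users
        if all(low < counts[(user, month)] <= high for month in months)
    }


calls = ([("a", "2014-01")] * 10 + [("a", "2014-02")] * 6
         + [("b", "2014-01")] * 3 + [("b", "2014-02")] * 20
         + [("c", "2014-01")] * 500 + [("c", "2014-02")] * 10)
print(active_users(calls, ["2014-01", "2014-02"]))  # {'a'}
```

Here user `b` is dropped for having too few calls in one month and user `c` for a call-center-like month with 500 calls.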
The Mexican data source is an anonymized dataset from a national mobile phone operator. Data is available for every call made within a period of 19 months from January 2014 to September 2015. The raw logs contain about 12 million calls per day from more than 8 million users who accessed the telecommunication company's (TelCo) network. Calls involving users of other TelCos are also logged, as long as one of the two parties is a client of the operator. In practice, we only considered call detail records (CDRs) between TelCo users, since geolocation was only possible for this group. Information logged for each call includes the call duration (EXAMPLE OF A RAW DATA TABLE, or simpleformat?).

Of the two datasets, long-term mobility can only be observed in the Mexican one: users living in a given region can be traced back to their area of influence at least one year earlier. This opens the possibility of modeling the question as a supervised learning problem where a user's past