Methodology

This section presents the methodology used to analyze variables related to rape crime, based on the merged Kaggle and NYPD citywide incident-level data, in order to develop a predictive model of a rape crime index. Such a model could help allocate police resources across different time periods.
Time Series Analysis  As an exploratory analysis, this project applies hourly, weekly, and monthly time series analysis to rape crime data from 2015 and 2016.
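One possible way to prepare the hourly, weekly, and monthly views is to bucket each incident's timestamp by hour of day, day of week, and month of year. The sketch below assumes each reported incident is available as a Python `datetime`; the function name `temporal_profiles` and the sample records are hypothetical, not taken from the NYPD or Kaggle schemas. It pools all years into profiles rather than building a continuous calendar series.

```python
from collections import Counter
from datetime import datetime

# Fixed labels avoid locale-dependent day names from strftime("%A").
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def temporal_profiles(timestamps):
    """Count incidents per hour of day, day of week, and month of year.

    `timestamps` is assumed to be an iterable of datetime objects,
    one per reported incident (hypothetical input format).
    """
    stamps = list(timestamps)  # allow any iterable to be scanned three times
    hour_counts = Counter(ts.hour for ts in stamps)
    weekday_counts = Counter(ts.weekday() for ts in stamps)  # Monday == 0
    month_counts = Counter(ts.month for ts in stamps)
    return {
        # Empty bins are kept as zeros so every profile has a fixed length.
        "hourly": [hour_counts[h] for h in range(24)],
        "weekly": {WEEKDAYS[d]: weekday_counts[d] for d in range(7)},
        "monthly": [month_counts[m] for m in range(1, 13)],
    }


# Hypothetical example records, not real NYPD data.
sample = [
    datetime(2015, 1, 3, 23, 15),  # Saturday
    datetime(2015, 1, 4, 2, 40),   # Sunday
    datetime(2016, 7, 12, 23, 5),  # Tuesday
]
profiles = temporal_profiles(sample)
```

Keeping zero-count bins matters for plotting and for comparing 2015 against 2016, since hours or months with no reports would otherwise silently disappear from the series.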
OLS Regression  A multivariate OLS regression model has been constructed to analyze features associated with rape crimes and to predict the rape crime index. The variables are as follows:
1. Building Type: Starting from NYPD's own categories, I have regrouped each building type into three categories: private, public, and transportation. Each category is labeled according to its openness to the public. For example, "Parking lot" has been labeled public because of its accessibility, while "Residential buildings - APT" has been categorized as private because entering this type of building mostly requires the owner's permission.
2. Weather: There are two weather components: temperature (°F) and precipitation (inches). However, the effect of temperature may also be explained by season.
3. Day of the week: This is a dummy variable whose baseline (0) indicates that the crime took place on Friday, Saturday, or Sunday; 1 indicates Monday through Thursday.
4. Month of the year
5. Season: I have assigned seasons as follows to account for seasonal differences while minimizing the variance within each season: 1: Jan-Mar; 2: Apr-Jun; 3: Jul-Sep; 4: Oct-Dec. This yields an ordered categorical variable from 1 to 4; the higher the number, the higher the expected average number of crimes.
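The encodings in the list above can be sketched as a small feature-building step. This is an illustration under stated assumptions, not the project's actual code: the record fields (`timestamp`, `premise`, `temperature_f`, `precip_in`) and the function names are hypothetical, and only "Parking lot" and "Residential buildings - APT" come from the text; "Subway station" is a made-up stand-in for NYPD's transportation categories.

```python
from datetime import datetime

# Partial, illustrative mapping. The first two labels come from the text;
# "Subway station" is a HYPOTHETICAL placeholder for a transportation category.
BUILDING_GROUP = {
    "Parking lot": "public",
    "Residential buildings - APT": "private",
    "Subway station": "transportation",
}


def weekday_dummy(ts):
    # Baseline 0 = Friday, Saturday, or Sunday; 1 = Monday through Thursday.
    # datetime.weekday() numbers Monday as 0 and Sunday as 6, so Fri-Sun are 4-6.
    return 1 if ts.weekday() < 4 else 0


def season(month):
    # Ordered code: 1 = Jan-Mar, 2 = Apr-Jun, 3 = Jul-Sep, 4 = Oct-Dec.
    return (month - 1) // 3 + 1


def encode(record):
    """Turn one hypothetical incident record into regression features."""
    ts = record["timestamp"]
    return {
        "building": BUILDING_GROUP[record["premise"]],  # KeyError if unmapped
        "weekday": weekday_dummy(ts),
        "month": ts.month,
        "season": season(ts.month),
        "temperature_f": record["temperature_f"],
        "precip_in": record["precip_in"],
    }


row = encode({
    "timestamp": datetime(2015, 8, 6, 21, 30),  # a Thursday
    "premise": "Parking lot",
    "temperature_f": 78.0,
    "precip_in": 0.0,
})
```

In an OLS model the three building groups would normally enter as two dummy variables with the third as baseline. Likewise, if month is entered as a full set of monthly dummies, the ordinal season code is an exact linear combination of them, so the two would usually not appear in the same specification.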
Current Limitations
Data Limitation  Due to the sensitivity of sexual crimes, NYPD stopped disclosing location information for offenses reported after 2015, and it has recently restricted public access to historical data from 1990 to 2015. The 2015 dataset used here was downloaded from the NYPD historical dataset in early August of this year, so its accuracy and precision cannot be guaranteed.
Methodology Limitation  The features selected for this multivariate regression model are external factors that are weak determinants of rape crimes. The low R^2 indicates that the regression line does not fit the observed data points well. Since not all rapes happen under the same circumstances, the factors that “permit” rape to happen vary from case to case. The motivations for committing rape can range from “poverty, anger, power, sadism, ethical views and attitudes projected onto the victims, to evolutionary pressures.” As Hanief (2013) pointed out, there is no scientific evidence covering all the correlated variables across all the dissimilar types of rape. A feature that shows a strong negative correlation in one scenario may be only weakly correlated in another.
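R^2 (the coefficient of determination) measures the share of variance in the observed outcome that the fitted values account for: R^2 = 1 - SS_res / SS_tot. The minimal sketch below computes it from any model's fitted values; the function name and the numbers in the example are illustrative only and do not come from this project's results.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_res is the sum of squared residuals (observed - fitted);
    SS_tot is the sum of squared deviations of observed values from their mean.
    """
    if len(observed) != len(fitted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only.
perfect = r_squared([2, 4, 6], [2, 4, 6])   # fitted values match exactly
mean_only = r_squared([2, 4, 6], [4, 4, 4])  # model no better than the mean
partial = r_squared([2, 4, 6], [3, 4, 5])    # SS_res = 2, SS_tot = 8
```

A value near 0 means the predictors explain little more than the sample mean does, which is the situation the paragraph above describes.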
Justice System Limitation  The difficulty of legally defining rape, and of distinguishing it from sexual harassment and sexual assault, has left what constitutes a rape crime persistently unclear. Based on the finding that 75% of sexual crimes were committed inside private areas, it is possible that many of these victims were assaulted by acquaintances rather than complete strangers. This may cause some rapes to be underreported as sexual harassment. Even though sexual assault has been defined as “any type of sexual contact or behavior that occurs without the explicit consent of the recipient,” there is no concrete baseline for measuring both physical and mental damage, and there is unlikely ever to be one.

Result

Figure 1 below shows all reported rape crimes that occurred across New York City in 2015. The red dots are distributed fairly evenly across Manhattan, Brooklyn, the Bronx, and Queens, with little significant spatial clustering.
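Visual inspection of a point map can be complemented by a simple clustering statistic. One standard option, swapped in here for illustration and not necessarily the method used for Figure 1, is the Clark-Evans nearest-neighbour ratio R: the observed mean nearest-neighbour distance divided by the distance expected under complete spatial randomness, 1 / (2 * sqrt(n / A)). R near 1 suggests randomness, R below 1 clustering, and R above 1 a regular pattern. The sketch assumes projected planar coordinates (for example, feet rather than latitude/longitude), ignores edge effects, and omits the significance test.

```python
import math


def clark_evans_ratio(points, area):
    """Clark-Evans nearest-neighbour ratio for 2-D points in a study area.

    `points` is a list of (x, y) tuples in projected units and `area` is the
    study-area size in the same squared units. Edge effects are ignored.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    nn_total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Brute-force O(n^2) search for this point's nearest neighbour.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nn_total += nearest
    observed_mean = nn_total / n
    expected_mean = 0.5 / math.sqrt(n / area)  # mean NN distance under randomness
    return observed_mean / expected_mean


# Synthetic patterns, not NYPD data: a regular grid and a tight cluster.
grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
cluster = [(5.0, 5.0), (5.01, 5.0), (5.0, 5.01), (5.01, 5.01)]
r_grid = clark_evans_ratio(grid, area=100.0)
r_cluster = clark_evans_ratio(cluster, area=100.0)
```

For a city-wide incident map, the irregular borough boundaries make edge effects non-trivial, so a result from this sketch would be a rough indication rather than a formal test.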