In this paper we classify Twitter users along three attributes, “Gender”, “Age” and “Political Affiliation”, with an application to Indian Twitter users. Our approach automatically predicts these attributes by leveraging observable
information such as tweeting behavior, the linguistic content of the user’s Twitter feed, and the celebrities the user follows.
This paper also introduces a novel feature that we define as “class influencers”.
Class influencers are Twitter users who influence a particular class so strongly that they themselves can be used as a
discriminative feature.
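
The abstract does not state how class influencers are selected, so the following is only a minimal sketch of one possible operationalization. It assumes a labeled training set and marks an account as an influencer for a class when most of its labeled followers belong to that class. The function name `find_class_influencers` and the thresholds `min_followers` and `min_purity` are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def find_class_influencers(follows, labels, min_followers=3, min_purity=0.8):
    """Return {account: class} for accounts followed mostly by one class.

    follows: dict mapping a labeled user to the set of accounts they follow.
    labels:  dict mapping the same users to their class label.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Count, per followed account, how many followers come from each class.
    counts = defaultdict(Counter)
    for user, accounts in follows.items():
        for account in accounts:
            counts[account][labels[user]] += 1

    influencers = {}
    for account, by_class in counts.items():
        total = sum(by_class.values())
        top_class, top_count = by_class.most_common(1)[0]
        # Keep accounts with enough labeled followers and a dominant class.
        if total >= min_followers and top_count / total >= min_purity:
            influencers[account] = top_class
    return influencers
```

Under this assumption, whether a user follows a given influencer becomes a binary feature for the corresponding class. The purity score ignores class imbalance in the training set, so a real system would likely need to normalize by class size.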
Our approach first extracts linguistic-content-based features using the Linguistic Inquiry and Word Count (LIWC) dictionary. We then derive behavioral features such as smiley types, smiley count, tweet frequency, and night-time tweet frequency. We also derive celebrity-based features, such as the age, genre, and gender of the celebrities a user follows, using Wikipedia and Freebase. Finally, we refine the results using class influencers. Results show that rich linguistic features, combined with popular-neighborhood and influencer features, are valuable and promising for additional user classification needs.
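
To make the behavioral features concrete, the sketch below computes smiley count, number of distinct smiley types, tweet frequency, and night-time tweet share from a list of timestamped tweets. It is an illustration under stated assumptions rather than the feature extractor used in the paper: the emoticon pattern `SMILEY_RE`, the night window (00:00 to 06:00), and the function name `behavior_features` are all hypothetical.

```python
import re
from datetime import datetime

# Assumed emoticon pattern; the paper's actual smiley inventory is not given.
SMILEY_RE = re.compile(r"[:;=][-']?[)(DPp]")


def behavior_features(tweets, night_start=0, night_end=6):
    """Compute simple behavioral features from (datetime, text) pairs.

    The night window [night_start, night_end) is an assumption.
    """
    if not tweets:
        return {"smiley_count": 0, "smiley_types": 0,
                "tweets_per_day": 0.0, "night_fraction": 0.0}

    times = [t for t, _ in tweets]
    # Observation span in days, floored at one day to avoid division by zero.
    span_days = max((max(times) - min(times)).total_seconds() / 86400.0, 1.0)

    smileys = [m for _, text in tweets for m in SMILEY_RE.findall(text)]
    night = sum(1 for t in times if night_start <= t.hour < night_end)

    return {
        "smiley_count": len(smileys),
        "smiley_types": len(set(smileys)),
        "tweets_per_day": len(tweets) / span_days,
        "night_fraction": night / len(tweets),
    }
```

The resulting dictionary can be concatenated with the LIWC and celebrity-based features before classification.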