A Simple Custom Similarity-Based Recommender System


Demand for recommender systems has grown considerably in recent years as personalization has spread across many kinds of systems. Better recommendations produce more precise results for both service providers and users. There is therefore a real need for users to be able to create their own custom recommendations on a small scale, so that they can get personal recommendations as simply and quickly as possible without getting involved in a complex system. For better and more precise results, the system can be enhanced with a semantic similarity method. One practical way to achieve this is to use a framework, such as Crab, a Python framework for building recommender engines. As the outcome, regular users can produce simple custom recommendations from their own configuration. In the best case, users are presented with the most related recommendations, closely matched to their preferences.


Recommender systems have been used for almost anything that involves relations among users, subjects or entities, and ratings. Most are built for one particular subject. They are used for research, user assistance, personal needs, or even just for fun. This kind of dynamic feedback, based on actual collected data, helps to find the most related or appropriate information to present. Nevertheless, such systems are still mostly built by researchers, producers, or creators. Regular users who want to configure exactly what they need and get small-scale recommendations for personal use need a simpler way to reach the results a recommender system can produce, without first having to work through various algorithms and a lot of background knowledge. In this case, using a framework is the recommended way to build such a recommender system. Users still learn the underlying principles behind the system, but they understand and build it at the highest layer. Moreover, the system can be enhanced with a semantic similarity (or semantic relatedness) method, which should make the recommendation results better and more precise.

Similarity-Based Recommender System

A recommender system, or recommendation engine, gives users more offers or options closely related to the item or content they are liking, watching, or using. The items could be foods, books, movies, songs, games, places, and so on, even people. Most such systems are built by developers to recommend one chosen kind of subject. People have varied tastes, but those tastes can be captured as patterns or converted into models. This work takes those patterns or models and generates recommendations with similarity in mind. For that reason, it can also be considered a semantic similarity system, since the relations between items are logically related and meaningful. Strictly speaking, a semantic recommender is any system that bases its performance on a knowledge base (Peis 2008). More recently, a good recommendation has been described as one that increases the usefulness of your product in the long run, even though that is hard to measure directly (Levy 2013). Better yet, it predicts similar, better, or new desirable things that users might not have known about or discovered yet.

This work in progress focuses on the item-based approach: figuring out which items are similar to the ones a user has already liked. Item-based, or similarity-based, recommendation belongs to the larger family of collaborative filtering and, more broadly, machine learning. In general it produces recommendations from what is known about users' relationships to one or more items. There is therefore no requirement for prior knowledge of the properties or attributes of the items themselves. The items can be as varied as those mentioned above, and nothing about their attributes needs to enter into the input.
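To make the idea concrete, here is a minimal sketch of item-to-item similarity computed only from user ratings. The toy preferences, the function names, and the choice of cosine similarity are all assumptions made for illustration; this is not how Crab implements it.

```python
from math import sqrt

# Hypothetical toy preferences: {user_id: {item_id: rating}}.
prefs = {
    "ann": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob": {"A": 4.0, "B": 5.0, "C": 2.0},
    "cat": {"A": 1.0, "B": 2.0, "C": 5.0},
}

def invert(prefs):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item

def cosine(a, b):
    """Cosine similarity over the users who rated both items."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)

def similar_items(prefs, target):
    """Rank every other item by its similarity to the target item."""
    by_item = invert(prefs)
    scores = [(other, cosine(by_item[target], by_item[other]))
              for other in by_item if other != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(similar_items(prefs, "A"))
```

Item B ranks above item C for target A because the same users rate A and B alike. Note that the code never looks at what the items actually are, only at who rated them and how.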

Crab Recommender Systems Framework

Crab, formerly known as scikits.recommender, is a Python framework for building recommender systems and engines, integrated with the world of scientific Python packages (Caraciolo 2012). It is released as an open source project and is commercially usable under the BSD license (3-clause).

Its features fall into three groups:

  • Recommender Algorithms: User-Based Filtering and Item-Based Filtering.
  • Work in progress: Slope One, SVD, Evaluation of Recommenders.
  • Planned: Sparse Matrices, REST APIs.

This work uses item-based filtering to define semantically similar items.


A few main components need to be installed, which can be done from either of the following:

  • Operating system package distribution
  • Official release archive or repository

Installing from source is recommended, after first installing these dependencies (listed with their common package names):

  • Python development tools (python-dev)
  • NumPy (python-numpy, python-numpy-dev)
  • setuptools (python-setuptools)
  • SciPy (python-scipy)
  • ATLAS linear algebra library (libatlas-dev)
  • GNU C++ Compiler (g++)
  • scikit-learn for Machine Learning (scikits.learn)
  • matplotlib (python-matplotlib)
  • nose Unit Testing Framework (nose)

Finally, install the Crab framework itself (crab). Everything can be installed with an OS package manager such as apt-get, or with pip or easy_install. Alternatively, get the repository and install it with python setup.py install.


To keep things simple, the main components and tools of the experiment, with their associated resources, are grouped into the following layers.

System Base

Python, NumPy, and SciPy

Engine or Framework

Crab


Dataset Load

Datasets can be created from scratch or obtained from GroupLens, for example MovieLens for movie ratings, HetRec (Delicious bookmarks, listening records, and MovieLens with IMDb/Rotten Tomatoes ratings), Book-Crossing (BX), and the Jester joke list.

Approach Model

Item-based or similarity-based recommender system.

Recommendation Result

Array of possible recommendations.

Simple Application

Creating a simple app takes three steps:

  • Create the data representation (the input).
  • Build the recommender engine (the output).
  • Evaluate its effectiveness (the test).
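For the third step, one common way to measure effectiveness is to hold back some known ratings, ask the recommender to predict them, and compute the root-mean-square error (RMSE). The sketch below is an assumption-laden illustration: the `rmse` helper and the predicted scores are hypothetical, and it does not use Crab's own evaluation module, which is still listed as work in progress.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root-mean-square error over the (user, item) pairs present in both."""
    shared = set(predicted) & set(actual)
    if not shared:
        raise ValueError("no overlapping (user, item) pairs to compare")
    squared = sum((predicted[key] - actual[key]) ** 2 for key in shared)
    return sqrt(squared / len(shared))

# Ratings held back from training, keyed by (user_id, item_id).
held_out = {(1, 3): 3.5, (2, 6): 2.0, (7, 5): 4.0}

# Hypothetical scores that a recommender predicted for those same pairs.
predicted = {(1, 3): 3.0, (2, 6): 2.5, (7, 5): 4.0}

error = rmse(predicted, held_out)
print(round(error, 3))  # lower is better; 0.0 would be a perfect prediction
```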

The input data takes the form of preferences in a dataset. Preferences are associations from users to items; both can be anything that can be identified by a specific ID, and each association carries an expression strength or rating. An example is a dataset of users, identified by numbers, and their preferences for movies. The dataset is a dictionary-like object, or a simple comma-separated values (.csv) file, that includes data and metadata; the preferences are stored in its .data attribute in the following format: {user_id1: {item_id1: preference, ...}, user_id2: {...}, ...}
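As an illustration of that format, the following sketch builds the nested dictionary from CSV rows using only the Python standard library. The column order (user_id, item_id, preference), the sample rows, and the `load_preferences` helper are assumptions for this example, not part of Crab.

```python
import csv
import io

# Hypothetical CSV content: one "user_id,item_id,preference" row per rating.
SAMPLE_CSV = """1,1,3.0
1,2,4.0
2,1,3.0
2,3,2.0
"""

def load_preferences(lines):
    """Build {user_id: {item_id: preference}} from an iterable of CSV lines."""
    data = {}
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        user_id, item_id, preference = row
        data.setdefault(int(user_id), {})[int(item_id)] = float(preference)
    return data

preferences = load_preferences(io.StringIO(SAMPLE_CSV))
print(preferences)
```

To read from disk instead, an open file object can be passed in place of the `io.StringIO` wrapper.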


To keep this short, this work uses the sample datasets provided with Crab.

    from scikits.crab import datasets        # import the dataset tools
    movies = datasets.load_sample_movies()   # load a dataset into a variable
    print(movies.data)                       # print the data attribute of the variable

Run it in the Python interpreter, and it prints:

{1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0}, 2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0}, 3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0}, 4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0}, 5: {2: 4.5, 3: 1.0, 4: 4.0}, 6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5}, 7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}}

Each number is the key for a user or an item. To see what they refer to, print the user_ids or item_ids attribute of the variable. From the data alone, the similarity or correlation between users and their item ratings can be visualized with the following heat map, which makes the current state of the data easier to see. Ratings are on a scale from 0 to 5, shown from the darkest to the lightest shade of blue.
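A heat map like this needs the nested dictionary laid out as a users-by-items grid. The sketch below shows one way to build that grid from the sample data printed above, using plain Python; the variable names are illustrative, and missing ratings are kept as None so that a plotting library such as matplotlib could mask them.

```python
# Sample preferences as printed above: {user_id: {item_id: rating}}.
data = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}

# Row labels (users) and column labels (items), both sorted.
user_ids = sorted(data)
item_ids = sorted({item for ratings in data.values() for item in ratings})

# One row per user, one column per item; None marks an unrated item.
matrix = [[data[user].get(item) for item in item_ids] for user in user_ids]

for user, row in zip(user_ids, matrix):
    cells = " ".join("%.1f" % v if v is not None else " - " for v in row)
    print(user, cells)
```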

Here's the complete code:

```
from scikits.crab import datasets
from scikits.crab.models import MatrixPreferenceDataModel
from scikits.crab.metrics import pearson_correlation
from scikits.crab.similarities import UserSimilarity
from scikits.crab.recommenders.knn import UserBasedRecommender

# Load the dataset
movies = datasets.load_sample_movies()

# Build the model from the preference data
model = MatrixPreferenceDataModel(movies.data)

# Build the similarity using Pearson correlation
similarity = UserSimilarity(model, pearson_correlation)

# Build the user-based recommender
recommender = UserBasedRecommender(model, similarity, with_preference=True)

# Recommend items for user 7
recommender.recommend(7)
```
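The recommender call hides how the scores are produced. As a rough illustration of the principle behind a user-based recommender with Pearson correlation, here is a plain-Python sketch on the same sample data. It is an assumption about the general technique (a similarity-weighted average over positively correlated users), not Crab's actual implementation, so its scores are not expected to match Crab's output.

```python
from math import sqrt

data = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in shared) / n
    mean_b = sum(b[i] for i in shared) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def recommend(data, user_id):
    """Score each unrated item by a similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, ratings in data.items():
        if other == user_id:
            continue
        sim = pearson(data[user_id], ratings)
        if sim <= 0:  # ignore users with no positive correlation
            continue
        for item, rating in ratings.items():
            if item not in data[user_id]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

print(recommend(data, 7))
```

User 7 has not rated items 3 and 6, so those are the only candidates the sketch can score, which matches the items in the output that follows.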

This outputs: [(6, 2.8092760065251263), (3, 2.6946367039803634)]. That means user 7 (Michael) is recommended to watch movie 3 (You, Me a