Stian Soiland-Reyes added Use_case_pharmacological_reference_data__.md  almost 9 years ago

Commit id: 54162c2eca640d5e7e9e23010b111f72fcecc00d


## Use case: pharmacological reference data

Open PHACTS combines pharmacological data in a single Linked Data cache and provides RESTful APIs to query and examine the data through a uniform interface. Data is loaded from several data providers such as EBI and UniProt; some of it is loaded directly as RDF, while other datasets are translated to RDF from more specific data formats. A series of VoID linksets is also loaded to provide identity equivalence across the different RDF graphs. These linksets are derived from existing references in the previously loaded data sources, but can also be found computationally (e.g. by comparing chemical structures or protein sequences).

Building Open PHACTS revealed a challenge in keeping all these data sources up to date, particularly when changing our architecture to allow for a Docker-based installation of the Open PHACTS platform and data at customer sites.

The Open PHACTS platform is a service-oriented architecture with components like a MySQL database, Virtuoso RDF store, memcache, PHP, Tomcat and ElasticSearch. We found Docker to be a great tool for managing and deploying these components in isolation and, combined with Docker Compose, an easy way to link them together to form a uniform and installable platform.

The challenge in setting up the Open PHACTS platform is the data loading. Docker images are stored as a series of differential file system layers, which can be pushed to and pulled from registries like the Docker Hub and third-party installations of the Docker Registry. This mechanism works well for typical Docker applications, where each layer is on the order of 100 MB and the full Docker image on the order of 1 GB. We found that this mechanism breaks down when used with the Open PHACTS data, which can be on the order of 100 GB as uncompressed RDF.

A `Dockerfile` provides a reproducible way to build Docker images, which can be built automatically by the Docker Hub, typically from a GitHub repository. Changes pushed to the GitHub repository cause a new Docker image to be built. Instructions in the `Dockerfile` consist of commands like `ADD` and `RUN`.
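
As an illustration of the `ADD` and `RUN` instructions, a minimal `Dockerfile` might look like the sketch below. It is not taken from the Open PHACTS repositories: the base image, the installed package and the `load-data.sh` script are hypothetical placeholders chosen only to show the shape of such a file.

```dockerfile
# Illustrative sketch only; base image, package and script name are assumptions.
FROM debian:stable-slim

# RUN executes a command at build time; the resulting file system
# changes are stored as a new layer of the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# ADD copies a file from the build context into the image,
# which also produces a new layer.
# load-data.sh is a hypothetical helper script.
ADD load-data.sh /usr/local/bin/load-data.sh

# Make the hypothetical script executable.
RUN chmod +x /usr/local/bin/load-data.sh
```

Because each `ADD` or `RUN` step contributes its own layer, a step that placed the full dataset inside the image would produce a layer far larger than the roughly 100 MB layers for which the registry mechanism works well.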