Tuesday 25 November 2014

Creating a graph of software technologies with Docker, Neo4j, NodeJS and D3.js Part 1

Feedback highly appreciated


Okay! So what would I like to do?


I would like to experiment with some technologies that have been on my mind lately, and have fun. I might swap some of the pieces later (maybe move from Node to a JVM platform, or to another client-side JS framework). The idea is to visualize the connections between software technologies using this stack.

The imagined problem and solution are as follows. (It might turn out to be a good thing, but maybe it will just remain a prototype.)

Usually we are interested in particular technologies and their relations to others, so we start googling. We do this because we know what we want to achieve, but we don't know what tools we have, or we know some of the tools but want to see if there are comparable alternatives. We usually select by checking whether the technology is free, whether it is mature enough for our purpose, whether there is a community or a known company behind it, and whether it is maintained frequently. Sometimes we also need to predict how long the technology will be around before a new one replaces it.

To visualize the technologies I would like to create a web application.

  • It will run in a browser, so I need some JavaScript libs for sure. I always wanted to try out the powerful D3.js.
  • On the server side I will go with Node.js as a middleware for now (I might swap to a JVM platform later). It will provide an API for the client application.
  • The data, as the whole model is basically a graph, will be stored in a graph database. For now, let's go with Neo4j.
  • I would like to put the server-side pieces in lightweight containers. Docker will be perfect for this.
  • I will need to manage the containers and I don't want to do this manually, so I might use Fig or Flocker or Kubernetes or all of them :).

Let's start and see what happens :)


What types of connections should we consider?


Implements

Technology A (e.g. a library or framework) implements specification or protocol B.

Uses

Technology A uses technology B. This is a transitive relation.

Extends

Technology A extends technology B. This is a transitive relation.

Relates to

Technology A is related to technology B. This is a symmetric relation.

A is an alternative of B

Technology A was created for a similar purpose as technology B. This is a symmetric relation.

Later we can consider the inverted connections, like contains, specifies etc. We have to do this carefully: more specific relationship types can speed up our queries, but they can also slow them down in some cases. It really depends on the use cases and the size of the data. For now this is enough. To learn more about this, the Graph Databases book is a great reference.

We can model these relationships with a property graph model.
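
As a rough illustration of that model, the sketch below writes a few Cypher statements to a file. The labels `Technology` and `Specification`, the `name` property, the relationship type names and the sample technologies are all assumptions picked for illustration, not a fixed schema.

```shell
# Write an illustrative Cypher script to a file.
# Labels, property names, relationship types and sample data are assumptions.
cat > technologies.cypher <<'EOF'
// Nodes for technologies and specifications, plus directed, typed relationships
CREATE (fig:Technology {name: 'Fig'}),
       (docker:Technology {name: 'Docker'}),
       (kubernetes:Technology {name: 'Kubernetes'}),
       (jersey:Technology {name: 'Jersey'}),
       (jaxrs:Specification {name: 'JAX-RS'}),
       (fig)-[:USES]->(docker),
       (jersey)-[:IMPLEMENTS]->(jaxrs),
       (fig)-[:ALTERNATIVE_OF]->(kubernetes);

// Transitive relation: follow USES over any number of hops
MATCH (a:Technology {name: 'Fig'})-[:USES*]->(b) RETURN b.name;

// Symmetric relation: stored once, matched while ignoring the direction
MATCH (a:Technology {name: 'Kubernetes'})-[:ALTERNATIVE_OF]-(b) RETURN b.name;
EOF
```

Neo4j relationships always have a direction, so a symmetric connection is stored only once and queried without an arrow. The statements can be pasted one at a time into the Neo4j browser once the database is running.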


What do we need?

(Note: A Linux distribution with Docker installed is required. Windows and Mac users have to use boot2docker and set up port forwarding for their boot2docker-controlled VM. Another option for Windows users is Spoon, but that is not a Docker-based platform.)
Let's build our stack from the bottom up.

We need a running Neo4j instance. Why not run it in a Docker container? We could move our instance whenever we want, or reuse the image for staging environments, integration tests, or for adding new nodes. Unfortunately there is no official Neo4j Docker image on Docker Hub, but there is a quite popular one created by tpires. It could be a good fit, but first let's check its Dockerfile to see how the image was built.

It is based on dockerfile/java, which is based on an ubuntu image.

Looks good!

To check that it works (and to pull the dependencies), let's run what the author says:
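
The exact command from the image's README is not reproduced here. What follows is a minimal sketch that matches the explanation below, wrapped in a shell function so it is easy to re-run. The function name `run_neo4j` and the container name `neo4j` are assumptions of this sketch, and the README may list additional flags.

```shell
# Start the tpires/neo4j image in the background.
#   -d            run detached
#   --name neo4j  container name (an assumption, any name works)
#   -p 7474:7474  publish container port 7474 on host port 7474
run_neo4j() {
  docker run -d --name neo4j -p 7474:7474 tpires/neo4j
}
```

Calling `run_neo4j` pulls the image on first use and prints the ID of the new container.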

Here we are saying: "Hey Docker, please run the tpires/neo4j image in a container, and bind the host machine's port 7474 to the container's port 7474."

Great. Let's check if it's running.
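
One simple way to check, sketched here as a small helper, is to look for the image name in the list of running containers. The function name `neo4j_running` is an assumption; filtering with `grep` is just one possible approach.

```shell
# Succeeds (and prints the matching line) if a running container
# was started from the tpires/neo4j image.
neo4j_running() {
  docker ps | grep tpires/neo4j
}
```

Calling `neo4j_running` should print one line with the container ID, the image name and the port mapping.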

Yup!

Neo4j runs a webserver for us, so let's open our browser and type localhost:7474.


It works!

It would be good to separate the data from the functionality into two containers so that we don't lose portability, so let's create a data-only container.
Create a Dockerfile.

Let's reuse the ubuntu image and add a volume.
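
A minimal sketch of such a Dockerfile, written from the shell. The directory name `neo4j-data` and the volume path `/var/lib/neo4j/data` are assumptions; the path should match wherever the Neo4j image actually keeps its data.

```shell
# Create a build directory that holds the data-only Dockerfile.
mkdir -p neo4j-data
cat > neo4j-data/Dockerfile <<'EOF'
# Reuse the ubuntu base image
FROM ubuntu
# Declare the directory to share with the Neo4j container
# (the path is an assumption, check the Neo4j image's Dockerfile)
VOLUME /var/lib/neo4j/data
# Nothing has to keep running; the container only holds the volume
CMD ["true"]
EOF
```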

Build the image.

Run the image.

Bind the volume to our Neo4j container and run it. (Don't forget to stop the Neo4j container that is already running first.)
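
The build, run and bind steps above can be sketched as one shell function. The image name `neo4j-data-image`, the container names `neo4j-data` and `neo4j`, the build directory `neo4j-data` and the function name are all assumptions of this sketch; `--volumes-from` is the Docker flag that mounts another container's volumes.

```shell
# All names below (image, containers, build directory) are assumptions.
start_neo4j_with_data() {
  # Stop and remove the earlier Neo4j container so its name and port 7474 are free
  docker stop neo4j && docker rm neo4j
  # Build the data-only image from the directory that contains the Dockerfile
  docker build -t neo4j-data-image neo4j-data
  # Create the data-only container; it exits at once but keeps its volume
  docker run --name neo4j-data neo4j-data-image true
  # Run Neo4j with the volumes of the data-only container mounted
  docker run -d --name neo4j --volumes-from neo4j-data -p 7474:7474 tpires/neo4j
}
```

Because the data lives in the `neo4j-data` container, the Neo4j container can be removed and recreated without losing what was stored in the shared volume.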

Nice. 
Ok, so now we have a running database. Of course it is not production ready, but for this prototype it is enough.

Next we will add a middleware based on the Node platform, which will call Neo4j's REST API and add some business logic. We will do this in the next part.
(Note: If we wanted to, we could extend Neo4j's REST API by writing extensions in Java using JAX-RS annotations.)
