Tuesday, 25 November 2014

Creating a graph of software technologies with Docker, Neo4j, NodeJS and D3.js Part 1

Feedback highly appreciated


Okay! So what would I like to do?


I would like to experiment with some technologies that have been on my mind lately, and have fun. I might later swap some of the pieces (maybe Node for the JVM platform, or D3.js for another client-side JS framework). The idea is to use this stack to visualize the connections between software technologies.

The problem I imagine and its solution are as follows. (It might turn out to be useful, or it may just remain a prototype.)

Usually, when we are interested in particular technologies and their relations to others, we start googling. We know what we want to achieve, but we don't know what tools we have, or we know some of the tools but want to see whether there are comparable alternatives. We usually select by checking whether the technology is free, whether it is mature enough for our purpose, whether there is a community or a known company behind it, and whether it is maintained frequently. Sometimes we also need to predict how long the technology will be around before a newer one replaces it.

To visualize the technologies I would like to create a web application.

  • It will run in a browser, so I will need some JavaScript libraries for sure. I have always wanted to try out the powerful D3.js.
  • On the server side I will go with Node.js as middleware for now (I might swap to the JVM platform later). It will provide an API for the client application.
  • As the whole model is basically a graph, the data will be stored in a graph database. For now, let's go with Neo4j.
  • I would like to put the server-side pieces in lightweight containers. Docker is perfect for this.
  • I will need to manage the containers and I don't want to do it manually, so I might use Fig, Flocker, Kubernetes, or all of them :).

Let's start and see what happens :)


What types of connections should we consider?


Implements

Technology A (e.g. a library or framework) implements specification or protocol B.

Uses

Technology A uses technology B. This is a transitive relation: if A uses B and B uses C, then A also uses C.

Extends

Technology A extends technology B. This is also a transitive relation.

Relates to

Technology A is related to technology B. This is a symmetric relation: if A is related to B, then B is related to A.

A is an alternative to B

Technology A was created for a similar purpose as B. This is a symmetric relation.
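To make the transitive and symmetric properties concrete, here is a minimal Python sketch (outside the planned stack, purely illustrative) that derives the implied connections from a handful of stored ones. The relationship type names (USES, EXTENDS, RELATES_TO, ALTERNATIVE_TO) and the sample edges are hypothetical.

```python
# Hypothetical type names; IMPLEMENTS is neither transitive nor symmetric.
TRANSITIVE = {"USES", "EXTENDS"}
SYMMETRIC = {"RELATES_TO", "ALTERNATIVE_TO"}


def reachable(successors, start):
    """All nodes reachable from `start` by following one or more edges (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def implied_edges(edges):
    """Stored (source, type, target) edges plus those implied by symmetry or transitivity."""
    result = set(edges)
    # Symmetric types: A -r-> B implies B -r-> A.
    for a, rel, b in edges:
        if rel in SYMMETRIC:
            result.add((b, rel, a))
    # Transitive types: A -r-> B and B -r-> C imply A -r-> C, so take the closure per type.
    for rel in TRANSITIVE:
        successors = {}
        for a, r, b in edges:
            if r == rel:
                successors.setdefault(a, set()).add(b)
        for a in successors:
            for c in reachable(successors, a):
                result.add((a, rel, c))
    return result


# Hypothetical sample data.
stored = {
    ("Express", "USES", "Node.js"),
    ("Node.js", "USES", "V8"),
    ("Neo4j", "ALTERNATIVE_TO", "OrientDB"),
}
for edge in sorted(implied_edges(stored) - stored):
    print(edge)
# prints ('Express', 'USES', 'V8') and then ('OrientDB', 'ALTERNATIVE_TO', 'Neo4j')
```

In Neo4j itself, such chains would more likely be followed at query time with a variable-length pattern such as (a)-[:USES*]->(c) rather than stored as extra relationships.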

Later we can consider inverse connections such as contains, specifies, etc. We have to do this carefully: more specific relationship types can speed up some queries, but they can also slow things down in other cases. It really depends on the use cases and the size of the data. For now, this is enough. To learn more about this, the Graph Databases book is a great reference.

We can model these relationships with the property graph model.
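In a property graph, both nodes and relationships can carry key/value properties, and every relationship has a type and a direction. A minimal in-memory Python sketch of that model could look like this; the class names, the `free` property, and the GraphDB-X and GraphDB-Y technologies are hypothetical, chosen to echo the selection criteria mentioned earlier (is it free, is it maintained, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A technology, described by arbitrary key/value properties."""
    name: str
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection that can carry its own properties."""
    start: str
    rel_type: str
    end: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}  # name -> Node
        self.rels = []   # list of Relationship

    def add_node(self, name, **props):
        self.nodes[name] = Node(name, props)

    def connect(self, start, rel_type, end, **props):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("add both technologies as nodes first")
        self.rels.append(Relationship(start, rel_type, end, props))

    def neighbours(self, name, rel_type):
        """Nodes linked to `name` by `rel_type` in either direction (suits symmetric types)."""
        found = []
        for r in self.rels:
            if r.rel_type != rel_type:
                continue
            if r.start == name:
                found.append(self.nodes[r.end])
            elif r.end == name:
                found.append(self.nodes[r.start])
        return found


# Hypothetical sample data: which alternatives to Neo4j are free?
g = PropertyGraph()
g.add_node("Neo4j")
g.add_node("GraphDB-X", free=True)
g.add_node("GraphDB-Y", free=False)
g.connect("Neo4j", "ALTERNATIVE_TO", "GraphDB-X")
g.connect("GraphDB-Y", "ALTERNATIVE_TO", "Neo4j")
free_alternatives = [n.name for n in g.neighbours("Neo4j", "ALTERNATIVE_TO") if n.props.get("free")]
print(free_alternatives)  # ['GraphDB-X']
```

Neo4j stores the same shape natively: labelled nodes with properties, and typed, directed relationships with properties.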


What do we need?

(Note: You need a Linux distribution with Docker installed. Windows and Mac users have to use boot2docker and set up port forwarding for the boot2docker-controlled VM. Another option for Windows users is Spoon, but that is not a Docker-based platform.)
Let's build our stack from the bottom up.

We need a running Neo4j instance. Why not run it in a Docker container? We could move our instance whenever we want, or reuse the image for staging environments, integration tests, or for adding new nodes. Unfortunately there is no official Neo4j Docker image on Docker Hub, but there is a quite popular one created by tpires. It could be a good fit, but first let's check its Dockerfile to see how the image was built.

It is based on dockerfile/java, which in turn is based on an Ubuntu image.

Looks good!

To check that it works (and to pull the image with its dependencies), let's run the command the author suggests:
docker run -i -t -d --name neo4j --privileged -p 7474:7474 tpires/neo4j

Here we are telling Docker: please run the tpires/neo4j image in a container named neo4j, detached in the background (-d), and bind port 7474 of the host machine to port 7474 of the container.
ogi@ubuntu:~$ sudo docker run -i -t -d --name neo4j --privileged -p 7474:7474 tpires/neo4j
Unable to find image 'tpires/neo4j' locally
Pulling repository tpires/neo4j
bc1c23b28916: Pulling dependent layers
511136ea3c5a: Download complete
d497ad3926c8: Download complete
bc1c23b28916: Download complete
e791be0477f2: Download complete
3680052c0f5c: Download complete
22093c35d77b: Download complete
5506de2b643b: Download complete
b08854b89605: Download complete
d0ca2a3c0233: Download complete
1716e82f74f0: Download complete
b41d25703535: Download complete
e95dbc5735e1: Download complete
5992007b07de: Download complete
b4e54ddfb2af: Download complete
cb875b6a5e56: Download complete
ea9d3f0791a1: Download complete
ad4d64683ae2: Download complete
1e40114b530a: Download complete
78bda9302d72: Download complete
1c2b68432f4e: Download complete
33e130bf1c86: Download complete
dabafd1110de: Download complete
d35b10c1f6c2: Download complete
5b83638ca8f8: Download complete
c3e91297793d: Download complete
Status: Downloaded newer image for tpires/neo4j:latest
ad63739868ec29d7fd820517e9b69f6743eb9f552d41c1b6184a85eb2e1ce927

Great. Let's check if it's running.

ogi@ubuntu:~$ sudo docker ps
CONTAINER ID   IMAGE                 COMMAND                CREATED      STATUS      PORTS                              NAMES
ad63739868ec   tpires/neo4j:latest   "/bin/bash -c /launc   4 days ago   Up 4 days   1337/tcp, 0.0.0.0:7474->7474/tcp   neo4j
Yup!

Neo4j runs a web server for us, so let's open a browser and go to localhost:7474.


It works!

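It would be good to separate the data from the functionality by using two containers, so that we don't lose portability: the Neo4j container itself stays stateless and can be replaced or upgraded, while the data lives in its own container. So let's create a data-only container.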
Create a Dockerfile.
ogi@ubuntu:~$ touch Dockerfile

Let's reuse the ubuntu image and add a volume.
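FROM ubuntu
# Declare the directory as a volume; other containers can mount it with --volumes-from
VOLUME /var/lib/neo4j/data
# Nothing to run: the container only has to exist to hold the volume
CMD ["true"]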

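Build the image. (The build context is the current directory, here the home directory, which is why 116.6 MB is sent to the daemon; creating the Dockerfile in an empty directory keeps the context small.)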
ogi@ubuntu:~$ sudo docker build .
Sending build context to Docker daemon 116.6 MB
Sending build context to Docker daemon
Step 0 : FROM ubuntu
---> 5506de2b643b
Step 1 : VOLUME /var/lib/neo4j/data
---> Using cache
---> 2ba1b980567c
Step 2 : CMD true
---> Using cache
---> 4b6c62e5e10c
Successfully built 4b6c62e5e10c

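Run the image as a container named neo4j-data (4b6 is a prefix of the image ID printed by the build). The container exits right away because its command is just true, but it does not need to keep running: the volume it declares stays available to other containers.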
ogi@ubuntu:~$ sudo docker run --name neo4j-data 4b6

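Now run Neo4j again, this time with the volume from the data container mounted. (Don't forget to stop and remove the neo4j container we started earlier; otherwise the name neo4j is already in use.)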
sudo docker run -i -t -d --name neo4j --volumes-from neo4j-data --privileged -p 7474:7474 tpires/neo4j

Nice. 
OK, so now we have a running database. Of course it is not production-ready, but for this prototype it is enough.

Next we will add middleware based on the Node platform, which will call Neo4j's REST API and add some business logic. We will do this in the next part.
(Note: If we wanted to, we could extend Neo4j's REST API by writing extensions in Java using JAX-RS annotations.)
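As a preview of what the middleware will have to send, here is a minimal sketch, written in Python rather than Node.js, of a call to the transactional Cypher endpoint of the Neo4j 2.x REST API (POST /db/data/transaction/commit). It assumes the server from above on localhost:7474 with no authentication; the Technology label, the USES relationship type, and the helper names are hypothetical. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumption: Neo4j 2.x on localhost:7474, authentication disabled.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"


def build_request(statement, parameters=None):
    """Wrap a single Cypher statement in the JSON body the transactional endpoint expects."""
    body = {"statements": [{"statement": statement, "parameters": parameters or {}}]}
    return urllib.request.Request(
        COMMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def rows(response_text):
    """Return the rows of the first statement's result; raise if Neo4j reported errors."""
    payload = json.loads(response_text)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return [item["row"] for item in payload["results"][0]["data"]]


# {name} is the Neo4j 2.x parameter syntax (later versions use $name).
req = build_request(
    "MATCH (a:Technology {name: {name}})-[:USES]->(b) RETURN b.name",
    {"name": "Express"},
)
# Sending it would be: rows(urllib.request.urlopen(req).read().decode("utf-8"))
```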
