Exploring Algorand: Preparing to query blockchain data

I’ve been exploring Algorand as a blockchain platform to build on. Algorand supports smart contracts via its own virtual machine, offers a good transaction rate and low fees, is decentralized, and uses a proof-of-stake consensus algorithm. This post starts off exploring the Algorand network and blockchain data: specifically, how to run an Algorand Node and an Algorand Indexer to explore the data in the blockchain. The Algorand Node synchronizes with the Algorand network and receives all block, transaction, and address data. The Algorand Indexer connects to and reads data from the Algorand Node, and stores the data in an easy-to-search structure in PostgreSQL.
To get this all running, in this article we’re going to walk through:
  • Creating a virtual machine in Google Cloud Platform via Vagrant
  • Setting up the VM with everything we need including Docker
  • Running the Algorand Indexer, PostgreSQL, and Algorand Node via Docker compose
  • Checking progress of syncing and indexing
We’ll explore querying the data in the indexer in a future post. The code related to this post can be found at:

Google or other cloud provider

The examples and code here are based on Google Cloud. You can sign up for an account with a free tier if you don’t have one already. The code and examples can be adapted to another cloud provider, or to run everything locally; the main thing you’d need to change is the Google-specific parts of the Vagrantfile.

To host on Google, follow the steps in https://github.com/mitchellh/vagrant-google#google-cloud-platform-setup to:
  • Set up a Google account if you don’t have one
  • Create a new project if you don’t have one and enable Compute Engine API
  • Create a service account, export its JSON API key.
    Note the service account email, and save the JSON API key file. You’ll need both later.
  • Add an SSH key you want to use to the Compute Engine.
The steps in the link above don’t call out a role to assign to the new service account. I gave the service account the “Compute Admin” role in IAM to make sure it could create the server.
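If you prefer the command line, the role can also be granted with the gcloud CLI (substitute your own project ID and the service account’s email):

gcloud projects add-iam-policy-binding <your-project-id> \
  --member="serviceAccount:<service-account-email>" \
  --role="roles/compute.admin"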

Automating server creation and setup

We use Vagrant to automate the creation and setup of the Google VM. Vagrant can automate the creation of many types of VMs, both in the cloud and locally. If you don’t have it installed already, you can install it here.

To create a Google Cloud VM, you’ll also need to install the Vagrant Google plugin.

vagrant plugin install vagrant-google
Following is a sample Vagrantfile to create the VM.

Vagrant.configure("2") do |config|
  config.vm.box = "google/gce"

  # Provider to set up VM in Google Cloud
  config.vm.provider :google do |google, override|
    google.google_project_id = "<Your google cloud project ID here>"
    google.google_json_key_location = "<Path to JSON key here>"

    # 2vCPU, 4GB
    google.machine_type='e2-medium'

    # Use Ubuntu 20.04
    google.image_family = 'ubuntu-2004-lts'

    google.name = 'algorand-index-server'

    # Allocate 400 GB for disk.  You may need more if running
    # mainnet node
    google.disk_size = '400'

    # Tags to apply to server
    google.tags = ['algorand-indexer']

    override.ssh.username = "<username you want to create on server>"
    override.ssh.private_key_path = "<local path to your SSH private key you want to use>"
  end

  # Copy docker-compose.yml and Algorand Node config files to VM
  config.vm.provision "file", source: "./docker-compose.yml", destination: "docker-compose.yml"
  config.vm.provision "file", source: "./node-config.json", destination: "config.json"

  # Execute setup script on the VM
  config.vm.provision "shell", path: "setup.sh"
end
The Vagrantfile will create the VM in Google Cloud, set up an SSH user on it, copy docker-compose.yml and node-config.json to the VM, and finally run the setup.sh script to do the rest of the setup.

Note: you’ll need to substitute the <…> in the file with your own values.

The setup.sh file, copied to the server and executed by Vagrant, provisions the server: it installs all the needed packages, sets up and populates directories, and finally starts the services defined in docker-compose.yml:

#!/bin/sh

#
# Setup script for VM
#

# Exit on any error
set -e

# Install docker: https://docs.docker.com/engine/install/ubuntu/
sudo apt-get update
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Install Docker compose
wget https://github.com/docker/compose/releases/download/v2.1.0/docker-compose-linux-x86_64
sudo mv ./docker-compose-linux-x86_64 /usr/local/bin/docker-compose
sudo chmod a+x /usr/local/bin/docker-compose

# Create data directory for algorand, which will be shared among node & indexer
sudo mkdir -p /var/algorand/data

# Data directory for postgresql
sudo mkdir  -p /var/lib/postgresql/data/

# Copy node-config.json to data directory
sudo cp config.json /var/algorand/config.json

# Bootstrap Algorand node data directory on VM from algorand-node docker image
sudo docker-compose run algorand-node sh -c "cp -R /root/node/data/* /var/algorand/data/"
sudo docker-compose run algorand-node sh -c "cp /var/algorand/config.json /var/algorand/data/"

# Start everything up
sudo docker-compose up -d

Docker Compose

Docker Compose orchestrates the three docker containers which must be run:
  • Algorand Node
  • Algorand Indexer
  • PostgreSQL database used by indexer
Following is the docker-compose.yml which sets up all of the above containers. Comments describe each element.
version: "2.4"
services:
  # Algorand node.  Can't use catchup mode, so takes a long time
  # to get to current block.
  algorand-node:
    # Use Algorand testnet.  To use mainnet, change to algorand/stable.
    image: algorand/testnet
    command: /bin/sh -c "./goal node start -l 0.0.0.0:8080 -d /var/algorand/data && sleep infinity"
    ports:
      - 8080:8080
    volumes:
      # Mount data directory on host so block data survives container.
      - /var/algorand/data:/var/algorand/data:rw
      # Mount config so it can be changed outside image
      - /var/algorand/config.json:/var/algorand/config.json:ro

  # Postgres database where indexer stores data
  indexer-db:
    image: "postgres"
    ports:
      - 5433:5432
    expose:
      - 5432
    environment:
      POSTGRES_USER: algorand
      POSTGRES_PASSWORD: indexer34u
      POSTGRES_DB: pgdb
    volumes:
      - /var/lib/postgresql/data/:/var/lib/postgresql/data/:rw

  # Algorand indexer which reads from algorand-node,
  # and writes to indexer-db
  indexer:
    image: "rcodesmith/algorand-indexer:2.6.4"
    ports:
      - 8980:8980
    restart: unless-stopped
    environment:
      DATABASE_NAME: pgdb
      DATABASE_USER: algorand
      DATABASE_PASSWORD: indexer34u
      ALGORAND_HOST: algorand-node
    depends_on: 
      - indexer-db
      - algorand-node
    volumes:
      # Mount Algorand node data, to get token
      - /var/algorand/data:/var/algorand/data:rw

Indexer docker image

The Indexer is the one component where we’re not using an existing Docker image. I forked the indexer repo and added a Dockerfile:
FROM golang:alpine

# Dependencies
RUN apk add --update make bash libtool git python3 autoconf automake g++ boost-dev busybox-extras curl

# Add code to gopath and build
RUN mkdir -p src/github.com/algorand/indexer
WORKDIR src/github.com/algorand/indexer
COPY . .
RUN make

# Launch indexer with a script
COPY run.sh /tmp/run.sh
CMD ["/tmp/run.sh"]
The Indexer docker image can be found on docker hub here, and the github repo is here.

Create VM via Vagrant

Now that we’ve reviewed everything, it’s time to create the server and start everything up.

To do everything, run the following in the algorand-indexer-server top directory:

vagrant up --provider=google
If it completes successfully, the VM has been created, and all of the containers have been started up.

To work with the VM, you’ll need the public IP address that’s been allocated by Google. You can find it in Google Cloud Console, Compute Engine page.
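If you have the gcloud CLI installed locally, you can also list the instance and its external IP from your workstation:

gcloud compute instances list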

You should now be able to SSH into the server, using the username and SSH key you substituted earlier in the Vagrantfile. e.g.

ssh <username>@<server external IP>
Once you’re on the server, a couple things to point out:

The docker-compose.yml is in the user’s home directory. /var/algorand/data contains the Algorand Node data. This is also where the Node config.json is stored.

You can check on the volume free space via ‘df’.
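For example, to see the free space on the filesystem holding the node data:

df -h /var/algorand/data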

Note that we’re running a full Algorand node, so it has a copy of all block data, and is continuously increasing in size.

You can confirm everything is running in docker-compose:

[Screenshot: inspecting containers in docker-compose]
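For example, listing the container status on the VM (each of the three services should show a State of “Up”):

sudo docker-compose ps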

As these containers run, the Node will continue to receive blocks from the Algorand network, and the Indexer will continue to index data into PostgreSQL. It can take days for everything to get caught up.

To check the status of the node, first, start an interactive bash shell in the algorand-node container:

sudo docker-compose exec algorand-node bash
Then use ‘goal node status’ to get the status of the Node process
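For example, pointing goal at the data directory mounted into the container (goal may also be on the PATH, depending on the image):

./goal node status -d /var/algorand/data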

When I ran this, the Node process had last processed block 9,050,016. Note that in this example I was running a Node on mainnet.

You can check this against the latest block generated by the network as reported in Algorand Explorer:

You can change the network (mainnet or testnet) in the top right. In this case, the latest block generated is 17,674,713, so my node is about halfway caught up with the network.

To check the progress of the Indexer, look at the output of the Indexer container via the docker-compose logs command:

sudo docker-compose logs --tail=100 indexer
The “rounds” correspond to the block numbers.
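You can also check progress via the Indexer’s REST health endpoint, which reports the round it has indexed up to (using port 8980 as mapped in docker-compose.yml; the exact response fields may vary by Indexer version):

curl http://localhost:8980/health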

Stopping and Deleting everything

Since all of the persistent state (the Algorand Node data and PostgreSQL storage) is stored in volumes outside the Docker containers, you can safely stop everything and start it back up later on the VM:

# To stop and remove all containers:
sudo docker-compose stop
sudo docker-compose rm

# To start everything back up:
sudo docker-compose up -d
When all done, you can delete everything, including the VM, via Vagrant:

vagrant destroy

Summary

We now have all Algorand blockchain data being synchronized to a PostgreSQL database. We’ll follow up in a future post with how to query that data via Indexer APIs, or directly in PostgreSQL.

Automatically installing and cleaning up software on Kubernetes hosts

I had a need to automatically install software on each node in a Kubernetes cluster; in my case, security scanning software. Kubernetes can start new nodes to scale up automatically, destroy nodes when no longer needed, and create and destroy nodes as part of automatic Kubernetes upgrades. For this reason, the mechanism to install this software has to be integrated into Kubernetes itself, so that when Kubernetes creates nodes, it automatically installs whatever additional software is needed.

I came across a clever solution using Kubernetes DaemonSets and the Linux nsenter command, described here. The solution consists of:
  • A Kubernetes DaemonSet which ensures that each server in the cluster (or some subset of them you specify) runs a single copy of an installer pod.
  • The installer pod runs an installer Docker image which copies the installer script and other needed files onto the node, and runs the installer script you provide via nsenter, so the script runs within the host namespace instead of the Docker container.
The DaemonSet runs a given pod, in our case the installer pod which runs the installer script, automatically on each Kubernetes server, including any new servers created as part of horizontal scaling or upgrades.

Shekhar Patnaik has implemented and packaged this pattern up into a Docker image and sample DaemonSet. The project is here (AKSNodeInstaller).

There are a couple of additional things I needed which the above project doesn’t do:
  • The ability to clean up installed software before a Kubernetes node is destroyed; in my case, uninstalling packages and de-registering agents
  • Support for copying files onto the node for installation (e.g. debian package files)
To support this, I extended AKSNodeInstaller with the above features, and a sample of how to test in VirtualBox/Minikube. The forked github repo is at https://github.com/rcodesmith/KubeNodeInstaller and the installer docker image is at rcodesmith/kubenodeinstaller.

Please read the original blog post from Shekhar Patnaik to understand how the DaemonSet and installer Docker image work together.
To support registering a cleanup script to be called before a node is destroyed, I use a Container preStop hook in the DaemonSet. The preStop hook lets you specify a command to be run before a container is stopped. Since the DaemonSet pod and its containers are started when a node is created, and stopped before a node is destroyed, the preStop hook lets us run a cleanup shell script just before the Kubernetes node is destroyed.
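Under the hood, the install and cleanup scripts run on the host via nsenter from the privileged pod (hostPID: true makes the host’s init visible as PID 1). The following is only a rough sketch of the idea, with illustrative paths; the actual runCleanup.sh in the image may differ:

# Copy the script from the ConfigMap mount onto the host via the hostPath mount
cp /tmp/cleanup.sh /host/cleanup.sh

# Execute it inside the host's namespaces (/host maps to /tmp/install on the host)
nsenter --target 1 --mount --uts --ipc --net --pid -- bash /tmp/install/cleanup.sh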

The fragment of the sample DaemonSet manifest showing the preStop hook and the install and cleanup scripts volume mount looks like this:

apiVersion: v1
kind: Namespace
metadata:
  name: node-installer
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: installer
  namespace: node-installer
spec:
  selector:
    matchLabels:
      job: installer
  template:
    metadata:
      labels:
        job: installer
    spec:
      hostPID: true
      restartPolicy: Always
      containers:
      - image: rcodesmith/kubenodeinstaller:1.1
        name: installer
        securityContext:
          privileged: true
        volumeMounts:
        - name: install-cleanup-scripts
          mountPath: /tmp
        - name: host-mount
          mountPath: /host
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","./runCleanup.sh"]
      volumes:
      - name: install-cleanup-scripts
        configMap:
          name: sample-installer-config
      - name: host-mount
        hostPath:
          path: /tmp/install
The runCleanup.sh script will run a cleanup.sh script you provide on the host via nsenter. You supply the cleanup.sh script via a ConfigMap that is mounted into the pod as a volume, same as the install.sh script. Following is an example ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-installer-config
  namespace: node-installer
data:
  install.sh: |
    #!/bin/bash
    # Test that the install file we provided in Docker image is there
    if [ ! -f /vagrant/files/sample_install_file.txt ]; then
        echo "sample_install_file not found on host!"
        exit 0
    fi
    # Update and install packages
    sudo apt-get update
    sudo apt-get install cowsay -y
    touch /vagrant/samplefile.txt
  cleanup.sh: |
    #!/bin/bash
    sudo apt-get remove cowsay -y
    rm /vagrant/samplefile.txt
I also had a need to install a package from a file that wasn’t in a repository. To support this, I add whatever files are needed to a custom installer Docker image, then copy those files onto the node. The install script you supply can then make use of those files. To use this, supply your own Docker image which copies whatever additional install files you need in a files/ directory. For example:

FROM rcodesmith/kubenodeinstaller
COPY files /files
Then use the docker image in your DaemonSet manifest instead of rcodesmith/kubenodeinstaller.
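For example, only the image line in the DaemonSet manifest changes; the image name below is hypothetical:

      containers:
      - image: yourrepo/your-node-installer:1.0
        name: installer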

Finally, you can make use of whatever files you copied in your install script. The files will be copied onto the host in whatever directory you mounted into /host in your DaemonSet.

In summary, to use this solution:
  1. Create a ConfigMap with the installer script, named install.sh, with whatever install commands you want. They’ll be executed on the node whenever a new server is added.
  2. If you need some additional files for your install script, such as debian package files, create a custom Docker Image and include those files in the image via the Docker COPY command. Then use the Docker image in your DaemonSet manifest.
  3. If you have some cleanup steps to execute, provide a cleanup.sh script in the same ConfigMap. The script will be executed on the node before a server is destroyed.
Testing in VirtualBox and Minikube

Initially, I was testing out the solution and my install script by creating / destroying Kubernetes node pools in GKE. This wasn’t ideal, so I wanted a faster, local way to test. Following is a way to test this out locally using Vagrant, VirtualBox and Minikube. VirtualBox is a free machine virtualization product from Oracle that runs on Mac, Linux, and Windows. We’ll use VirtualBox to run an Ubuntu VM locally on top of which Minikube will run. Essentially, the VM will be our Kubernetes host.

Minikube is a Kubernetes implementation suitable for running locally on Mac, Linux, or Windows.

Vagrant is a tool that can automate the creation and setup of machines, and supports multiple providers including VirtualBox. We’ll use it to automate the creation of and setup of the VirtualBox Ubuntu VM and Minikube.

Following are install instructions for Mac using Homebrew, but you can also use Windows and Linux:

Install VirtualBox, extensions, and Vagrant:
brew install Caskroom/cask/virtualbox
brew install Caskroom/cask/virtualbox-extension-pack
brew install vagrant
vagrant plugin install vagrant-vbguest
Install whatever Vagrant box you need, corresponding to what you’ll use for your Kubernetes nodes:

You can find boxes at: https://app.vagrantup.com/boxes/search

I’m using this Ubuntu box.

To get started with a Vagrant box:
vagrant init ubuntu/focal64
The above command will generate a Vagrantfile in the current directory which describes the VM to be created, and steps to provision it. The Vagrantfile I used is here. You might need to add more memory for the VM in the Vagrantfile:

  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
#    vb.gui = true
  
    # Customize the amount of memory on the VM:
    vb.memory = "2024"
  end

In the Vagrantfile, use the Vagrant shell provisioner to install Minikube, Docker, and kubectl. We’re using the Minikube ‘none’ driver, which causes Minikube to run Kubernetes on the current server (the Vagrant VM). And finally, start Minikube.
  # Enable provisioning with a shell script. Additional provisioners such as
  # Ansible, Chef, Docker, Puppet and Salt are also available. Please see the
  # documentation for more information about their specific syntax and use.
  config.vm.provision "shell", inline: <<-SHELL
    sudo apt update
    sudo curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64   && sudo chmod +x minikube
    sudo mv minikube /usr/local/bin/minikube
    sudo apt install conntrack
    sudo minikube config set vm-driver none
    sudo sysctl fs.protected_regular=0
    sudo apt install -y docker.io
    sudo apt-get install -y apt-transport-https
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
    sudo apt-get update
    sudo apt-get install -y kubectl
    sudo minikube start --driver=none
  SHELL
To verify Minikube is running in the VM:
> sudo minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
To start Minikube if it isn’t running:
sudo minikube start --driver=none
Now that Minikube is running, you can interact with the Kubernetes cluster using Kubectl.

> sudo kubectl get nodes
 
NAME STATUS ROLES AGE VERSION
ubuntu-focal Ready control-plane,master 10d v1.21.2
Now, apply your ConfigMap and DaemonSet. Following is an example from https://github.com/rcodesmith/KubeNodeInstaller

# Change to project directory mounted in VM
cd /vagrant

# Apply ConfigMap and DaemonSet
sudo kubectl apply -f k8s/sampleconfigmap.yaml
sudo kubectl apply -f k8s/daemonset.yaml

# The DaemonSet's pods should be running, one per server (1 here). Check:
sudo kubectl get pods -n node-installer

# Look at pod logs, look for errors:
sudo kubectl logs daemonset/installer -c installer -n node-installer
My DaemonSet and Docker image had an install file which should have been copied to the VM.
Additionally, the install script wrote to /vagrant/samplefile.txt. Check for these:
> ls -l /vagrant/files/sample_install_file.txt
> ls -l /vagrant/samplefile.txt
The cleanup script should delete /vagrant/samplefile.txt. Let’s test this by deleting the DaemonSet, then verifying the file is deleted.

> sudo kubectl delete -f k8s/daemonset.yaml
> ls -l /vagrant/samplefile.txt
ls: cannot access '/vagrant/samplefile.txt': No such file or directory
Now that we’ve tested everything, you can destroy the VM and everything in it by running the following back on your workstation:
vagrant destroy

Apache Spark Experiments

I’m in the process of learning Apache Spark for processing and transforming large data sets, as well as machine learning. As I dig into different facets of Spark, I’m compiling notes and experiments in a series of Jupyter notebooks.

I published these notebooks to a github repo, spark-experiments. Right now it has some basic and spark-sql based experiments. I’ll be adding more as I go.

Rather than setting up Jupyter, Spark, and everything else needed locally, I found an existing Docker image, pyspark-notebook, that contains everything I needed, including matplotlib to visualize the data as I get further along. If you have Docker installed, you just run the Docker container via a single command, and you’re off and running. See the spark-experiments installation instructions for details.

Initially, I was going to create my own sample data sets for the experiments. I’m mostly interested in learning the operations and process rather than executing with a large data set across a cluster of servers, so it’s ok to use a small data set. But I hit on the idea of using publicly available data sets such as those from data.cms.gov instead. Maybe we’ll turn up something interesting, and it’ll be more real-worldish.


Migrating Drupal and WordPress sites using Docker

There are several sites I host for family and friends in addition to this site. It’s a mix of WordPress, Drupal, and static sites, all running on a Linux virtual host hosted by Rackspace. The Linux host is pretty old at this point and really should be upgraded. Additionally, I wanted to give DigitalOcean a try, as I can get a virtual server there for less.

Although I kept the installations for each site pretty well organized in different directories, migrating them over the traditional way would still be time-consuming and error-prone: copying over all the directories and databases that are needed, migrating users, making sure permissions are right, and making sure to get any service scripts and configurations that need to come along. This is all a very manual process. If (when) I get something wrong, I’d have to troubleshoot it on the target server, and the whole process isn’t repeatable or version-controlled. I wasn’t looking forward to it.

While working on our Pilot product at Encanto Squared, a new tool came on our radar: Docker. We adopted it, and it greatly simplified and streamlined our deployment and server provisioning process at Encanto.

Naturally, I decided to use Docker to migrate my sites to another server, and to generally improve how they’re being managed.

The overall configuration looks like this:

[Diagram: rpsSitesDocker (Docker containers and links for the hosted sites)]

The above diagram is inspired by this dockerboard tool. The tool works but the diagram required some style tweaking so I did it in OmniGraffle.

Each of the rounded rectangles above is a separate docker container, and all of the containers are orchestrated by docker compose. The blue lines between the containers are docker compose links, which connect the two containers at a network level, and create an entry in the source’s host file pointing to the target container. Each docker container runs with its own network, layered filesystem, and process space. So for one container to be able to communicate with another it has to be specifically enabled, via links in the case of docker compose.

Following is a breakdown of each container and its configuration:

nginx – front-end reverse proxy

  • I’m using this as a reverse proxy into the other docker containers.
  • This is the only container with an exposed port, 80
  • It has a link to each of the individual site containers to be able to proxy HTTP requests to them.
  • In the future, I may have this serve up the static sites rather than proxying to another nginx process. It’ll still be needed to proxy the WordPress and Drupal sites
  • This image is based on the official nginx image, with the addition of the Nginx configuration files into the Docker image. Dockerfile:
FROM nginx
COPY conf.d /etc/nginx/conf.d
  • Each of the sites gets a separate Nginx configuration file under conf.d. They proxy to the specific site by name (mfywordpress in the example below). Here’s what one of them looks like:
server {
  listen 80;
  server_name www.mfyah.org mfyah.org;

  location / {
    proxy_pass http://mfywordpress:80;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}

latinchristmas – This is a static site hosted by nginx

  • This is a static site served up by its own nginx process.
  • This is an image that is based on the official nginx image. The only thing it does in addition is add the static content to /usr/share/nginx/html
  • Dockerfile:
FROM nginx

COPY WWW /usr/share/nginx/html

mfy – WordPress-based site

  • This image is based on the official WordPress image, with some additional packages installed.
  • The official WordPress image uses Apache.
  • This container maps the directory /var/www/html to /var/lib/cont/mfywp on the host to store the WordPress site files. Having the site files on the host makes it easier to backup and ensures any changes to the site survive a restart.
  • Dockerfile:
FROM wordpress

RUN apt-get update && apt-get install -y libcurl4-openssl-dev

RUN docker-php-ext-install curl

I won’t go into the other WordPress-based containers. They’re essentially the same.

DB – MariaDB

  • This is the database for all of the WordPress sites.
  • This container maps the directory /var/lib/mysql to /var/lib/cont/db on the host to store the database files so they survive restarts & can be backed up easily.
  • It is running the official MariaDB Docker image.

Docker compose and usage

As mentioned above, all of this is managed by Docker Compose. Following is a portion of the Docker Compose configuration file.

latinchristmas:
  image: somedockerrepo/someuser/latinchristmaswebsite:latest
  restart: always

mfywordpress:
  image: somedockerrepo/someuser/mfy
  restart: always
  links:
    - db:mysql
  environment:
    WORDPRESS_DB_NAME: mfy
  volumes:
    - /var/lib/cont/mfywp:/var/www/html

db:
  image: mariadb
  restart: always
  environment:
    MYSQL_ROOT_PASSWORD: PutSomethingHere
  volumes:
    - /var/lib/cont/db:/var/lib/mysql

nginx:
  build: nginx
  restart: always
  ports:
    - "80:80"
  links:
    - latinchristmas
    - mfywordpress

The WordPress-based site images are stored on a Docker repository. The proxy nginx image is built locally by Docker Compose.
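With that in place, refreshing and starting everything on the server comes down to a few Docker Compose commands (using the service names defined in the compose file above):

# Pull the latest site images from the Docker repository
docker-compose pull

# Build the reverse-proxy image locally and (re)start everything
docker-compose build nginx
docker-compose up -d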

The steps I took to get this all working on the server were roughly:

  • Install Docker if it’s not already there: sudo apt-get install lxc-docker
  • Create the directories for the individual sites (e.g. /var/lib/cont/mfywp) and copy the site files over to them
  • Create the directory for the database under /var/lib/cont/db, empty
  • Copy the Docker Compose file and the nested nginx Dockerfile and configuration files over to the server. This is in a git repository, so I packaged it up as a tar file to send: git archive --format=tar --prefix=rpstechServer/ HEAD > send.tar
  • If you’re hosting your images in a private Docker repository, create a .dockercfg file on the server containing the credentials to your private Docker repository. Docker Compose will use this on the server when pulling the images from the Docker repository. If your images are all in a public repository, this isn’t needed. You can remove the .dockercfg after the next step to avoid having the credentials on the server.
  • Run docker-compose up -d

Everything should be running at this point.

I haven’t converted over the Drupal sites yet, but the approach will be the same as the WordPress sites.

The benefits to this setup are:

  • Each site is largely self contained and easy to migrate to a different server
  • The sites are independent of each other. I can install new packages or upgrade packages for one site without affecting the other sites.
  • I’m able to make changes and run the sites locally and test them out before pushing out any changes.

Future improvements:

  • Avoid having the MariaDB password in the Docker Compose or any other file
  • Combine some of the lines in the Dockerfiles, reducing the number of Docker layers that are created
  • Consider running the WordPress sites using a lighter weight process rather than requiring Apache. Maybe this isn’t a problem at all.

Type-checked JavaScript : TypeScript and Flow

The last couple systems I’ve been working on have been almost completely JavaScript, with a bit of Python thrown in where it made sense.

Working in a dynamic language like JavaScript, small mistakes like mistyping a symbol name don’t get caught by a compiler as they do in statically typed languages. Instead they come up during runtime when that code is executed, or worse, they won’t fail right away, leading to incorrect results or failure somewhere else. To mitigate this, it becomes even more important to use unit testing extensively. If you have an extensive set of unit tests that verify almost every line of code, they’ll catch these syntax/typing bugs in addition to functional bugs.

But verifying almost every line of code with unit tests is very difficult, and I’ve rarely seen it done. Also, it’d be nice to get more immediate feedback of a syntax error, in the IDE/editor, even before running unit tests. Additionally, static typing serves as a form of documentation in the code, and enables IDEs to more accurately auto-suggest completions, which cuts down on the amount of time you spend looking up function and variable names from other modules.

That’s not to say the answer is to only use statically-typed languages. There’s many benefits to dynamic languages and reasons we’re using them in the first place.

Ideally, I’d like to have an optional typing system where typing can be specified where it makes sense, and not where it doesn’t add enough value or is impossible due to the dynamic nature of the code. Additionally, the system should be smart, using type inference to cut down on the amount of type annotations that need to be made.

Lucky for us, JavaScript has a couple excellent options that aim to do just that.

One option is TypeScript, backed by Microsoft. TypeScript supports React via a plugin, and is used by Angular 2. TypeScript has been around for several years, and has a rich set of type definitions available for popular JavaScript libraries.

TypeScript is a separate language that transpiles to JavaScript. It’s a superset of JavaScript, so anything that works in JavaScript should work in TypeScript, and they’ve worked to keep up with JavaScript and supporting ES6 features.

Another option is Flow, backed by Facebook. Coming from Facebook, it has good support for React. Flow is a relatively new option, released in 2014, so it doesn’t have as much of an ecosystem as TypeScript and doesn’t have many type definitions for 3rd-party libraries, although supporting TypeScript’s definitions is on their roadmap.

Flow makes more extensive use of type inference, so it’s able to infer types and detect errors without requiring as much explicit type annotations.

Flow has a different philosophy than TypeScript. The philosophy behind flow is to make the minimal amount of additions to JavaScript to facilitate type checking. Rather than being a separate language, flow is based on JavaScript, only extending the language with type annotations. These type annotations are stripped out by a simple transformer or via a transpiler such as Babel if you’re using that already. Also, it’s easier to gradually adopt Flow for an existing codebase as you can enable it module by module, use a ‘weak’ mode for adapting existing modules, and gradually add annotations.
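As a tiny illustration (my own example, not taken from any particular project), a Flow-annotated function is just ES6 plus type annotations, which get stripped at build time:

// @flow
function totalPrice(prices: Array<number>, taxRate: number): number {
  return prices.reduce((sum, p) => sum + p, 0) * (1 + taxRate);
}

totalPrice([10, 20], '0.08'); // Flow flags this: string is incompatible with number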

My project is starting with a significant ES6 code base. We’re pretty happy with ES6 as it is, so the main thing I’m looking for is to add type checking rather than a new language. Based on these factors, we decided to try out flow.

In a future post I’ll write about our experience with trying out flow, and steps to adopt it into an existing codebase.


Converting Maven APT docs to Markdown

In a project I worked on many moons ago we were writing documentation in the APT format, and publishing to HTML and PDF using a Maven-based toolchain.

APT served us well, but it hasn’t been supported or improved by the community in a long time. When the time came to update the documentation for a major release, we decided to switch to using Markdown, which was a format everyone was already familiar with, and allowed the team to take advantage of all the tools, such as Sublime plugins, that support Markdown.

Converting APT documents to Markdown is a two-step process, APT -> XHTML -> Markdown, using the Doxia converter, which can be downloaded here, and the excellent Swiss-army document format conversion tool Pandoc:

# Converting over the existing APT docs to XHTML via Doxia converter
> java -jar ~/Downloads/doxia-converter-1.2-jar-with-dependencies.jar -in your_doc.apt \
  -from apt -out ./ -to xhtml

# Convert resulting XHTML to Markdown
> pandoc -o your_doc.md your_doc.apt.xhtml

The end result will require a bit of manual fixing up, but in my experience it was pretty minimal and beats doing it manually or writing your own converter.


Encanto Squared

I’ve been working with Encanto Squared lately, and will be posting on the Encanto Squared Engineering site, with more of a focus on Node.js, Polymer, AngularJS, and other technologies we’re using.

Speaking of which, Encanto Squared is hiring. If you’re passionate about solving interesting problems, creating products that are key to our customers, and enjoy working with new technologies, drop us a note.


Sculptor point release and documentation

This post is just a couple quick updates on the Sculptor Generator project.

Hot on the heels of the major 3.0 release, release 3.0.1 is out with additional improvements and examples. Kudos to Torsten, who’s been on fire cranking out code and documentation.

I made my own small contributions to the documentation, with a blog post on the shipping example project, which shows how to override Sculptor templates in your own project, and documentation on the Sculptor overrides and extension mechanism.


Profiling Maven Sculptor execution using YourKit

The latest version of the Sculptor Generator is based on XTend 2, which is compiled to Java bytecode rather than interpreted as XTend 1 and XPand were. This should bring large performance improvements to the code generation cycle, and it certainly feels faster for my projects. Of course, since code generation is part of the development cycle, we’d always like the performance to be better. In order to improve the performance, we first need to know what the bottlenecks are, which is where a profiler comes in; specifically, I’ll describe using YourKit to profile code generation for one of the Sculptor sample projects.

The first step is to start the YourKit profiler. YourKit will start up with the welcome view, and will show any running Java processes, ready to attach to one of them.

[Screenshot: yourkitWelcome]

Now we need to execute the Sculptor generator, first attaching it to the YourKit process. Sculptor code generation is typically executed as part of a Maven build, via the Sculptor Maven plugin. Since Maven isn’t a long-running process, and we want to make sure to profile all of the code generation cycle, the best way to attach Sculptor to Maven is to do it at Maven startup via JVM command line arguments. Specifically -agentpath to attach the process to YourKit and enable profiling, and YourKit startup options that can be used to enable different types of profiling, taking snapshots, etc.

To pass these arguments to Maven, we can use the MAVEN_OPTS environment variable. I already had some JVM arguments to set the maximum memory. So on my Mac, I ended up with:

#!bash
export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=tracing,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m'

The above will enable the tracing profiling method (vs sampling), and instruct YourKit to record a snapshot that may later be inspected on process exit.

You can control how YourKit performs tracing via Settings -> CPU Tracing… The only tracing setting I changed was to disable adaptive tracing, which omits some small frequently called methods from profiling. This lessens the profiling overhead, but I’m not really concerned about that and want to make sure I’m getting complete results.

[Screenshot: yourKitTracingSettings]

Now that the options are set up, run Maven in the Sculptor project to be profiled. In my case, the library-example project:

#!bash
mvn sculptor:generate -Dsculptor.generator.force=true

Once it’s done executing, we can open the previously recorded snapshot via File->Open Snapshot.., and look at the different reports and views. This is what the call tree view looks like:

[Screenshot: yourKitCallTree]

These results are fine, but the trace results are cluttered with many methods we’re not interested in, since the entire Maven execution has been traced. The best option I found to only trace those methods we’re interested in was to initially disable tracing, then use a YourKit Trigger to enable tracing on entry and exit of the main Sculptor generation method, org.sculptor.generator.SculptorGeneratorRunner.doRun.

In YourKit, you can add a trigger via the “Trigger action on event” button.

[Screenshot: triggerActionOnEventSelection]

The problem is this button seems to only be enabled if YourKit is actively profiling an application, and since the Maven execution isn’t a long-running process, you can’t configure it in time. The solution I used was to start Maven suspended in debug mode, configure the trigger, then kill Maven. Again, this can be done by adding some JVM arguments to MAVEN_OPTS, and running Maven again:

#!bash
export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=tracing,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y'

Once YourKit is attached to the running Maven process, we can add the trigger:

[Screenshot: triggerActionOnEventDialog]

To be able to use this trigger each time Sculptor is executed via Maven, we have to export the trigger configuration into a file, then when running Maven, specify the trigger file via another YourKit argument. We can export the trigger via Popup menu->Export Triggers…

Following is the exported trigger configuration. The above steps are just a means to end up with this configuration, so you can skip them and simply copy the following into a triggers.txt file.

MethodListener methodPattern=org.sculptor.generator.SculptorGeneratorRunner\s:\sdoRun\s(\sString\s) instanceOf= fillThis=true fillParams=true fillReturnValue=true maxTriggerCount=-1
  onenter: StartCPUTracing
  onreturn: StopCPUProfiling
  onexception: StopCPUProfiling

To specify the trigger file that should be used, use the ‘triggers’ command line argument. Since tracing will now be enabled via the trigger, I also removed the ‘tracing’ argument so tracing wouldn’t be enabled on startup:

#!bash
export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=triggers=triggers.txt,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m'

Working with Geospatial support in MongoDB: the basics

A project I’m working on requires storage of and queries on Geospatial data. I’m using MongoDB, which has good support for Geospatial data, at least good enough for my needs. This post walks through the basics of inserting and querying Geospatial data in MongoDB.

First off, I’m working with MongoDB 2.4.5, the latest. I initially tried this out using 2.2.3 and it wasn’t recognizing the 2dsphere index I set up, so I had to upgrade.

MongoDB supports storage of Geospatial types, represented as GeoJSON objects, specifically the Point, LineString, and Polygon types. I’m just going to work with Point objects here.

Once Geospatial data is stored in MongoDB, you can query for:

  • Inclusion: Whether locations are included in a polygon
  • Intersection: Whether locations intersect with a specified geometry
  • Proximity: Querying for points nearest other points

You have two options for indexing Geospatial data:

  • 2d : Calculations are done based on flat geometry
  • 2dsphere : Calculations are done based on spherical geometry

As you can imagine, 2dsphere is more accurate, especially for points that are further apart.

In my example, I’m using a 2dsphere index, and doing proximity queries.

First, create the collection that’ll hold a point. I’m planning to work this into the Sculptor code generator so I’m using the ‘port’ collection which is part of the ‘shipping’ example MongoDB-based project.

> db.createCollection("port")
{ "ok" : 1 }

Next, insert records into the collection including a GeoJSON type, point. According to MongoDB docs, in order to index the location data, it must be stored as GeoJSON types.

> db.port.insert( { name: "Boston", loc : { type : "Point", coordinates : [ 71.0603, 42.3583 ] } })
> db.port.insert( { name: "Chicago", loc : { type : "Point", coordinates : [ 87.6500, 41.8500 ] } })

> db.port.find()

{ "_id" : ObjectId("51e47b4588ecd4e8dedf7185"), "name" : "Boston", "loc" : { "type" : "Point", "coordinates" : [  71.0603,  42.3583 ] } }
{ "_id" : ObjectId("51e47ee688ecd4e8dedf7187"), "name" : "Chicago", "loc" : { "type" : "Point", "coordinates" : [  87.65,  41.85 ] } }

The coordinates above, as with all coordinates in MongoDB, are in longitude, latitude order.

Next, we create a 2dsphere index, which supports geolocation queries over spherical spaces.

> db.port.ensureIndex( { loc: "2dsphere" })

Once this is set up, we can issue location-based queries, in this case using the ‘geoNear’ command:

> db.runCommand( { geoNear: 'port', near: {type: "Point", coordinates: [87.9806, 42.0883]}, spherical: true, maxDistance: 40000})

{
    "ns" : "Shipping-test.port",
    "results" : [
        {
            "dis" : 38110.32969523317,
            "obj" : {
                "_id" : ObjectId("51e47ee688ecd4e8dedf7187"),
                "name" : "Chicago",
                "loc" : {
                    "type" : "Point",
                    "coordinates" : [
                        87.65,
                        41.85
                    ]
                }
            }
        }
    ],
    "stats" : {
        "time" : 1,
        "nscanned" : 1,
        "avgDistance" : 38110.32969523317,
        "maxDistance" : 38110.32969523317
    },
    "ok" : 1
}

For some reason, a similar query using ‘find’ and the ‘near’ operator, which should work, doesn’t:

> db.port.find( { "port" : { $near : { $geometry : { type : "Point", coordinates: [87.9806, 42.0883] } }, $maxDistance: 40000 } } )

error: {
"$err" : "can't find any special indices: 2d (needs index), 2dsphere (needs index),  for: { port: { $near: { $geometry: { type: \"Point\", coordinates: [ 87.9806, 42.0883 ] } }, $maxDistance: 40000.0 } }",
"code" : 13038
}
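Looking at the error, the likely culprit is that the query filters on a field named ‘port’ rather than the indexed ‘loc’ field. A version targeting loc, with $maxDistance moved inside $near as the GeoJSON form expects, would look something like this (an untested sketch):

> db.port.find( { loc : { $near : { $geometry : { type : "Point", coordinates: [ 87.9806, 42.0883 ] }, $maxDistance: 40000 } } } )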