Deploy Container API

Everything you need to know to deploy the pdfRest Container API with your Docker framework.

Loading the image

The Docker container image is available as a tar archive from links provided by the Datalogics Sales team. Questions regarding container licensing can be directed to sales@datalogics.com.

Once the image has been downloaded, load it into your environment, typically with docker load.

Using the Container API image

This guide covers two containerized deployment methods: Docker and Kubernetes. It references some Environment Variables, which are covered in depth in the Configure Container API Guide.

Docker

The simplest Docker Compose file to define the pdfRest Container API can be found below:

services:
  pdfrest-toolkit:
    platform: linux/amd64
    image: <image_name>
    restart: always
    environment:
      - PDFREST_SERVER_DOMAIN=https://api.yourDomainHere.com
    ports:
      - "80:3000"
    volumes:
      - /tmp:/opt/datalogics/public

This will instruct Docker to set up a pdfRest instance on a Linux machine using the pdfRest container image previously loaded with docker load.

The optional PDFREST_SERVER_DOMAIN Environment Variable formats the input and output URLs of documents uploaded to and processed by the API.

The ports section of the YAML instructs the stack to listen for API calls on port 80 and forward those calls to port 3000 inside of the container.
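As a small illustration of this mapping, assuming port 80 is already in use on the host and a hypothetical host port 8080 is chosen instead, only the left-hand side of the entry would change. The container side stays at 3000, since that is where the pdfRest server listens:

```yaml
services:
  pdfrest-toolkit:
    image: <image_name>
    ports:
      # Format is "<host_port>:<container_port>"
      # 8080 is an illustrative host port; 3000 is the pdfRest port inside the container
      - "8080:3000"
```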

If you require shared storage between multiple running instances, set up a shared volume as described in the Docker storage volume documentation and configure the volumes section of the YAML to mount that volume as shown below:

volumes:
  - <your_volume>:/opt/datalogics/public
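When the mount source is a named Docker volume rather than a host path, Compose also expects that volume to be declared under a top-level volumes key. The following is a minimal sketch of a complete file, assuming a hypothetical volume name pdfrest-data; the rest mirrors the Compose file shown earlier:

```yaml
services:
  pdfrest-toolkit:
    platform: linux/amd64
    image: <image_name>
    restart: always
    environment:
      - PDFREST_SERVER_DOMAIN=https://api.yourDomainHere.com
    ports:
      - "80:3000"
    volumes:
      # Mount the named volume at the directory pdfRest reads from and writes to
      - pdfrest-data:/opt/datalogics/public

# Named volumes used by services are declared at the top level of the file.
# "pdfrest-data" is an illustrative name, not one defined by pdfRest.
volumes:
  pdfrest-data: {}
```

A volume declared this way uses the default local driver, so it is shared only between containers on the same Docker host. Sharing across hosts would need a volume driver backed by networked storage, as covered in the Docker storage volume documentation.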

Kubernetes

Using the image previously loaded with docker load, you can now create and expose a deployment. The only item to note is that the pdfRest server listens on port 3000:

kubectl create deployment pdfrest --image=<image_name>
kubectl expose deployment pdfrest --type=NodePort --port=3000
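If you prefer to keep the configuration in a file, the kubectl expose command above can be approximated by a Service manifest. This is a sketch rather than the exact object kubectl generates. It assumes the deployment was created with kubectl create deployment pdfrest, which labels its pods app: pdfrest, and the file name pdfrest-service.yaml is hypothetical:

```yaml
# pdfrest-service.yaml (hypothetical file name)
apiVersion: v1
kind: Service
metadata:
  name: pdfrest
spec:
  type: NodePort
  selector:
    # Label that "kubectl create deployment pdfrest" puts on the pods
    app: pdfrest
  ports:
    - port: 3000        # port exposed by the Service
      targetPort: 3000  # port the pdfRest server listens on in the container
```

It could then be applied with kubectl apply -f pdfrest-service.yaml in place of the kubectl expose command.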

When running multiple instances you will want to set up a shared volume so that all instances of pdfRest can read and write to the same location. This ensures that all input and output documents are available no matter which instance a processing or retrieval API call load-balances to. This requires the creation of a PersistentVolume.

Here, we will demonstrate with a hostPath PersistentVolume for testing and development purposes. It is not recommended to use a hostPath in a production cluster. A cluster administrator should provision a networked resource such as an NFS share, a Google Cloud persistent disk, Azure File Share, or an AWS EFS volume.

First, create the PersistentVolume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pdfrest-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt/data"

Save it as pdfrest-pv.yaml and apply it to your cluster with:

kubectl apply -f pdfrest-pv.yaml

This configuration specifies that the volume will be found at /mnt/data/ on the cluster Node. It also defines the StorageClass name for the PersistentVolume as manual, which will be used to bind PersistentVolumeClaim requests to this PersistentVolume.
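For a production cluster, the hostPath block would be replaced with a networked source. The following is a sketch only, assuming an NFS share; the server name and export path are hypothetical placeholders that a cluster administrator would supply. Keeping the same name and storageClassName means the PersistentVolumeClaim described next would not need to change:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pdfrest-pv
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal  # hypothetical NFS server address
    path: /exports/pdfrest        # hypothetical exported directory
```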

Then, create the PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pdfrest-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Save it as pdfrest-pvc.yaml and apply it to your cluster with:

kubectl apply -f pdfrest-pvc.yaml

The PersistentVolumeClaim should be automatically bound to the PersistentVolume you created.

Then configure the pdfRest deployment to use the PersistentVolumeClaim as a volume. Open the deployment for editing:

kubectl edit deployment pdfrest

In the editor that opens, add the volume and volume mount under spec > template > spec:

spec:
# ...
  template:
    # ...
    spec:
      # Add the PersistentVolumeClaim under volumes
      volumes:
      - name: pdfrest-storage
        persistentVolumeClaim:
          claimName: pdfrest-pvc
      containers:
        - name: <container_name>
          # ...
          # Mount to the /opt/datalogics/public directory
          volumeMounts:
            - name: pdfrest-storage
              mountPath: /opt/datalogics/public
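For reference, the pieces above can be combined into a single Deployment manifest. This is a sketch intended as a starting point for creating the deployment from a file, instead of using kubectl create deployment followed by kubectl edit. The container name, the replica count of 2, and the file name are assumptions for illustration:

```yaml
# pdfrest-deployment.yaml (hypothetical file name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdfrest
  labels:
    app: pdfrest
spec:
  replicas: 2  # illustrative; multiple instances share the volume below
  selector:
    matchLabels:
      app: pdfrest
  template:
    metadata:
      labels:
        app: pdfrest
    spec:
      volumes:
        # Backed by the PersistentVolumeClaim created earlier
        - name: pdfrest-storage
          persistentVolumeClaim:
            claimName: pdfrest-pvc
      containers:
        - name: pdfrest  # assumed container name
          image: <image_name>
          ports:
            - containerPort: 3000  # pdfRest server port
          volumeMounts:
            # All instances read and write documents in the same location
            - name: pdfrest-storage
              mountPath: /opt/datalogics/public
```

Because every replica mounts the same claim, an uploaded or processed document should be reachable regardless of which instance handles a given call.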