Batch processing using Cron Jobs: automated MySQL backup on OpenShift/K8s.

Daniel Izquierdo
Mar 22, 2021

A few days ago I started a small personal project to deploy a database on OpenShift as a replacement for Excel, so that I can use SQL queries to retrieve the data I need.

To do so, I log into my OpenShift cluster and deploy a MariaDB database using one of the predefined templates that ship with OpenShift. I pick the persistent storage template so that if the pod goes down, the data inside does not disappear into outer space.

MariaDB with persistent storage template
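
If you prefer the CLI, a minimal sketch of the same deployment could look like the one below; it assumes the default "mariadb-persistent" template and its standard parameter names, so adjust the values to your own project:

# Deploy MariaDB with persistent storage from the built-in template
# (template and parameter names assume the OpenShift defaults)
oc new-app --template=mariadb-persistent \
  -p MYSQL_USER=userdb \
  -p MYSQL_PASSWORD=userdbpass \
  -p MYSQL_DATABASE=databasename \
  -p VOLUME_CAPACITY=1Gi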

Now that the database is running and I have imported some data into it, I start thinking about how to automate a process that, using only OpenShift, creates a daily backup of my data and saves it inside the pod where the database is running. I know this is not a best practice, but it can save you if someone runs a truncate table by mistake, or something similar. Update: here you have another article in which I automate the data backup between two MySQL databases.

I decided to create a Cron Job that schedules a task to perform this backup using kubectl commands. Don't be afraid of the Cron Job; I will explain it step by step:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mariadump
  namespace: my-namespace
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: mariadbdumpsa
          containers:
            - name: kubectl
              image: garland/kubectl:1.10.4
              command:
                - /bin/sh
                - -c
                - kubectl exec $(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}') -- /opt/rh/rh-mariadb102/root/usr/bin/mysqldump --skip-lock-tables -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename >/tmp/backup.sql;kubectl cp my-namespace/$(kubectl get pods | grep Running | grep 'mariadbdump' | awk '{print $1}'):/tmp/backup.sql my-namespace/$(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}'):/tmp/backup.sql;echo 'Backup process completed.'
          restartPolicy: OnFailure

Let's explain a little bit of what the Cron Job is doing.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mariadump
  namespace: my-namespace

Above we define the name of the Cron Job and the namespace in which it will be created (the name of the project in OpenShift).

spec:
  schedule: "0 0 * * *"

We define the schedule for our process, in cron format. In our case, every midnight.
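
Since the schedule field uses standard cron syntax, it is easy to adapt; a few illustrative values (not part of the original setup):

# cron format: minute hour day-of-month month day-of-week
schedule: "0 0 * * *"    # every day at midnight (used above)
schedule: "0 */6 * * *"  # every 6 hours
schedule: "30 2 * * 0"   # every Sunday at 02:30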

serviceAccountName: mariadbdumpsa

Here we link a serviceAccountName so that our Cron Job has the proper permissions. After explaining the Cron Job, I will show how the ServiceAccount is created.

containers:
  - name: kubectl
    image: garland/kubectl:1.10.4

Here we set the image of the container that is created each time the Cron Job runs. Since we need an image that includes the kubectl binary, we use "garland/kubectl:1.10.4".

command:
  - /bin/sh
  - -c
  - kubectl exec $(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}') -- /opt/rh/rh-mariadb102/root/usr/bin/mysqldump --skip-lock-tables -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename >/tmp/backup.sql;kubectl cp my-namespace/$(kubectl get pods | grep Running | grep 'mariadbdump' | awk '{print $1}'):/tmp/backup.sql my-namespace/$(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}'):/tmp/backup.sql;echo 'Backup process completed.'
restartPolicy: OnFailure

Here we define the commands we need to run to back up our MariaDB automatically. Let's analyze them one by one.

kubectl exec $(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}') -- /opt/rh/rh-mariadb102/root/usr/bin/mysqldump --skip-lock-tables -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename >/tmp/backup.sql

We run this command, which lets us execute another command inside another pod:

kubectl exec

To automatically find the name of our MariaDB pod, we execute:

$(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}')
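
As a side note, if your MariaDB pods carry a label (for example the name=mariadb label that the OpenShift template usually sets; check with kubectl get pods --show-labels), a label selector is a more robust lookup than grep. This is only an alternative sketch, not what the Cron Job above uses:

# Find the running MariaDB pod via a label selector (the label is an assumption)
kubectl get pods -l name=mariadb --field-selector=status.phase=Running \
  -o jsonpath='{.items[0].metadata.name}'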

Once we have retrieved our MariaDB pod name, we execute the command that dumps the database:

-- /opt/rh/rh-mariadb102/root/usr/bin/mysqldump --skip-lock-tables -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename >/tmp/backup.sql

Once the dump is done, it is saved inside the pod that the Cron Job spins up while running. That pod is ephemeral, so if we want the backup to be kept in the pod where our database is running, we need to copy the .sql file from one pod to the other. To do so, we execute the following command:

kubectl cp my-namespace/$(kubectl get pods | grep Running | grep 'mariadbdump' | awk '{print $1}'):/tmp/backup.sql my-namespace/$(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}'):/tmp/backup.sql

We copy the backup.sql file from the 'mariadbdump…' pod (the one that the Cron Job starts) to the MariaDB pod.
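
If you want to double-check that the copy landed, a quick manual check from your own terminal could be:

# Verify that the dump is now present inside the MariaDB pod
kubectl exec -n my-namespace $(kubectl get pods -n my-namespace | grep Running | grep 'mariadb-' | awk '{print $1}') -- ls -lh /tmp/backup.sql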

echo 'Backup process completed.'

Finally, we execute an echo command that lets us know the process has finished.

With the Cron Job explained, let's look at the three resources we need to create to give the Cron Job the right permissions.

We need to create a Role, a RoleBinding and a ServiceAccount.

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: my-namespace
  name: mariadbdump
rules:
  - apiGroups:
      - ""
    resources:
      - deployments
      - replicasets
      - pods
      - pods/exec
    verbs:
      - 'watch'
      - 'get'
      - 'create'
      - 'list'

---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mariadbdump
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: mariadbdumpsa
    namespace: my-namespace
roleRef:
  kind: Role
  name: mariadbdump
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mariadbdumpsa
  namespace: my-namespace
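
Once these three resources exist, you can sanity-check that the ServiceAccount really got the permissions it needs with kubectl auth can-i (just a quick check, not required by the setup):

# Check that the ServiceAccount may exec into pods in the namespace
kubectl auth can-i create pods/exec -n my-namespace \
  --as=system:serviceaccount:my-namespace:mariadbdumpsa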

With all this, we will have our backup automated and saved inside our pod, in case we ever need to restore the database.
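
For completeness, a restore from that dump could look roughly like this; it is only a sketch and assumes the same credentials as above and that the mysql client sits next to mysqldump inside the MariaDB pod:

# Load the dump back into the database from inside the MariaDB pod
kubectl exec $(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}') -- \
  /bin/sh -c "/opt/rh/rh-mariadb102/root/usr/bin/mysql -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename < /tmp/backup.sql"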

In order to create everything at the same time, I recommend saving it all in a YAML file called "backup-cronjob.yaml", like the following one:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: my-namespace
  name: mariadbdump
rules:
  - apiGroups:
      - ""
    resources:
      - deployments
      - replicasets
      - pods
      - pods/exec
    verbs:
      - 'watch'
      - 'get'
      - 'create'
      - 'list'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mariadbdump
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: mariadbdumpsa
    namespace: my-namespace
roleRef:
  kind: Role
  name: mariadbdump
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mariadbdumpsa
  namespace: my-namespace
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mariadump
  namespace: my-namespace
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: mariadbdumpsa
          containers:
            - name: kubectl
              image: garland/kubectl:1.10.4
              command:
                - /bin/sh
                - -c
                - kubectl exec $(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}') -- /opt/rh/rh-mariadb102/root/usr/bin/mysqldump --skip-lock-tables -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename >/tmp/backup.sql;kubectl cp my-namespace/$(kubectl get pods | grep Running | grep 'mariadbdump' | awk '{print $1}'):/tmp/backup.sql my-namespace/$(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}'):/tmp/backup.sql;echo 'Backup process completed.'
          restartPolicy: OnFailure

And import it into OpenShift with:

oc create -f backup-cronjob.yaml
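
To confirm that the Cron Job was created, and to trigger a one-off run without waiting for midnight, something like this should work (the job name test-backup is just an example):

# List the CronJobs in the project and fire a manual test run
oc get cronjobs -n my-namespace
oc create job test-backup --from=cronjob/mariadump -n my-namespace
oc logs -f job/test-backup -n my-namespace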

As a future step, we could schedule a process that not only gives us the .sql file for recovering our database, but also loads that .sql file into another database, so that we have a replica of our database in case of an extreme failure.
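
A rough idea of how that could look is to pipe the dump straight into a second database instead of writing it to a file; everything about the target (the replica-mariadb host, its credentials, and the presence of a mysql client where the command runs) is a placeholder assumption:

# Stream the dump directly into a second MariaDB instance (all target names are placeholders)
kubectl exec $(kubectl get pods | grep Running | grep 'mariadb-' | awk '{print $1}') -- \
  /opt/rh/rh-mariadb102/root/usr/bin/mysqldump --skip-lock-tables -h 127.0.0.1 -P 3306 -u userdb --password=userdbpass databasename \
  | mysql -h replica-mariadb -P 3306 -u userdb --password=userdbpass databasename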

Thanks a lot!
