# Deployment Procedure

This document describes our internal deployment procedure. We share it publicly in case it is of use to others. Instructions for deploying cBioPortal itself can be found elsewhere; see e.g. Deploying the web application and Deploy using Docker.

We deploy the master branches of both the backend and the frontend to production. The public portal (https://www.cbioportal.org) runs on AWS EKS. The configuration can be found in the knowledgesystems repo:


Other internal MSK portals run on AWS EKS as well.

The frontend and backend can be upgraded independently. The following events can require a new deployment:

  • New frontend commit in master
  • New backend commit in master

# New frontend commit in master

We currently auto-deploy the frontend master branch to Netlify: https://frontend.cbioportal.org. Any change is therefore automatically built and deployed to the relevant portals, provided the frontend configuration has been set up properly. Note that the current build time for the frontend project is ~15 minutes. To see which frontend commit is deployed, check window.FRONTEND_COMMIT in the browser console.

# Public Portal Frontend URL

The public portal runs on AWS inside a Kubernetes cluster. The URL it pulls the frontend version from is defined here:


This should be a URL pointing to Netlify.

# Internal Portal Frontend URL

For the internally running portals, the frontend.url property is defined in the application.properties file in the Mercurial portal-configuration repo. If set up correctly, it should point to a file on both dashi and dashi2 that in turn contains a Netlify frontend URL. We keep the URL in a separate file because that allows us to update the frontend URL without redeploying the backend.

# New backend commit in master

A new backend commit usually also means a new frontend change is necessary. The following sections therefore assume that is the case.

# Public Portal Backend Upgrade

Once the backend repo has been tagged on GitHub, a Docker image is built automatically on Docker Hub. It can take ~5 min before the image is available. You can check the status of the builds here: https://github.com/cBioPortal/cbioportal/actions?query=workflow%3A%22Docker+Image+CI%22.

After that, if you have access to the Kubernetes cluster, you can change the image in the cluster configuration:


Point this line to the new tag on Docker Hub, e.g.:

image: cbioportal/cbioportal:6.0.2-web-shenandoah

Make sure it is an image with the suffix -web-shenandoah. This image contains only the web part of cBioPortal and uses the Shenandoah garbage collector.
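If you are unsure which tags are available, you can list them via Docker Hub's public API. This is a sketch, assuming curl, jq, and network access; the response shape is Docker Hub's v2 repositories API:

```shell
# List recent tags for the cbioportal image on Docker Hub and keep
# only the web/Shenandoah variants described above.
curl -s 'https://hub.docker.com/v2/repositories/cbioportal/cbioportal/tags?page_size=25' \
  | jq -r '.results[].name' \
  | grep -- '-web-shenandoah'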

Also remove the -Dfrontend.url parameter so that the frontend version bundled inside the war is used:


Then running this command applies the changes to the cluster:

kubectl apply -f public-eks/cbioportal-prod/cbioportal_spring_boot.yaml

You can keep track of what's happening by looking at the pods:

kubectl get po

If you have the watch command installed, you can also use it to refresh this output every 2s:

watch kubectl get po

Another thing to look at is the events:

kubectl get events --sort-by='{.lastTimestamp}'
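You can also follow the rollout of the new image directly. A sketch, assuming the Deployment is named cbioportal-spring-boot (check the real name with kubectl get deploy):

```shell
# Block until the new pods are up and the old ones are terminated,
# or fail after the timeout if the rollout is stuck.
kubectl rollout status deployment/cbioportal-spring-boot --timeout=10m
```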

If there are any issues, point the image back to what it was, set -Dfrontend.url again, and run kubectl apply -f filename again.

If everything went OK, re-enable auto-deployment on Netlify, set -Dfrontend.url in the Kubernetes file, and run kubectl apply -f filename again.

Make sure to commit your changes to the knowledgesystems-k8s-deployment repo and push them to the main repo, so that other people making changes to the Kubernetes config will be using the latest version.
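Committing the change back might look like the following sketch; the branch name and commit message are assumptions, adjust to the repo's conventions:

```shell
cd knowledgesystems-k8s-deployment
git add public-eks/cbioportal-prod/cbioportal_spring_boot.yaml
git commit -m 'Upgrade public portal image to new cbioportal tag'
git push origin master   # or whatever the repo's default branch is
```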

# Upgrading Related Backend Components

Backend upgrades involving the database schema, DAO classes, etc. require updates to databases and importers. cBioPortal has multiple MySQL databases backing different portals (see the table below for their locations). Similarly, there are multiple importers responsible for loading portal-specific data. Every database must be migrated manually on an individual basis; all importers/data fetchers can be updated simultaneously through an existing deployment script.

Before upgrading, make sure to turn off import jobs in the crontab and alert the backend pipelines team (Avery, Angelica, Rob, Manda).

To access the crontab, log in to pipelines, switch to the cbioportal_importer user (sudo su - cbioportal_importer), and run crontab -e. Comment out any lines that run import jobs, then save and exit. Make sure to uncomment these lines once the upgrade (database and importers) is complete. The lines that need to be commented out are under the Import Jobs section, shown here.
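If you prefer a non-interactive edit to crontab -e, the commenting step can be scripted. This is a sketch: the "Import Jobs" section marker and the assumption that the section ends at the next blank line must be verified against the real crontab first.

```shell
# Dump the current crontab, comment out every active line in the
# Import Jobs section, review the change, then install the edited copy.
crontab -l > crontab.txt
cp crontab.txt crontab.txt.bak
sed -i '/^# Import Jobs/,/^$/ s/^\([^#]\)/#\1/' crontab.txt
diff crontab.txt.bak crontab.txt   # review before installing
crontab crontab.txt
```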

# Updating Databases

First, make sure there is a backup of the database being migrated. If there is no weekly dump, back up the database using mysqldump. This process may take a while depending on the size of the database.

mysqldump -u <user> -h <host> -p <database name> | gzip > <database_name>_`date +%Y%m%d_%H%M`.sql.gz 
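The backticks in the command above expand to a timestamp, so each dump gets a unique name. A quick sketch of the resulting filename (the database name here is just an example):

```shell
# Show the filename pattern produced by the timestamp expansion above.
db=cgds_public
dump="${db}_$(date +%Y%m%d_%H%M).sql.gz"
echo "$dump"
```

To restore later, pipe the dump back the other way: `gunzip < <dump file> | mysql -u <user> -h <host> -p <database name>`.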

The second step is to migrate the database. Make sure the migration script is the same version as the deployed cBioPortal website. It is recommended to first test the migration script manually, line by line, on a copy of the existing database. This will catch any data-related bugs that might not be caught by the Python migration script. After testing is successful, migrate the production databases following these steps here.
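Setting up such a test copy might look like the sketch below. The scratch database name, dump filename, and the migrate_db.py invocation are assumptions to verify against your cbioportal checkout:

```shell
# Create a scratch database and load the production dump into it.
mysql -u <user> -h <host> -p -e 'CREATE DATABASE cgds_migration_test'
gunzip < <dump file> | mysql -u <user> -h <host> -p cgds_migration_test
# Point the properties file at the scratch database, then run the
# migration script from the cbioportal checkout against it.
python migrate_db.py --properties-file portal.properties --sql migration.sql
```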

These are all cBioPortal databases and their locations:

| Website | Database | Location |
| --- | --- | --- |
| cbioportal.mskcc.org | cgds_gdac | pipelines |
| cbioportal.org | cgds_public | AWS |
| genie.cbioportal.org | cgds_genie | AWS |
| triage.cbioportal.org | cgds_triage | pipelines |

To obtain information such as usernames, passwords, and hostnames, ask Avery, Angelica, Rob, Manda, or Ino.

# Updating Importers/Data Fetchers

Importers (code found here) and data fetchers (code found here) use code from the cBioPortal codebase. The cbioportal dependency is packaged with the genome-nexus-annotation-pipeline and specified in the pipelines importer pom.

The following steps are used during releases/updates to build new importers with the most up-to-date cBioPortal and genome-nexus-annotation-pipeline code. All steps should be performed on the pipelines machine.

  1. Set the jitpack hash here in the genome-nexus-annotation-pipeline codebase to the most recent cbioportal/cbioportal commit hash in master.

  2. Merge this change into genome-nexus-annotation-pipeline/master.

  3. Set the commit hash here in the pipelines codebase to the most recent genome-nexus/genome-nexus-annotation-pipeline commit hash (after the merge specified in step 2). Also ensure the db version in the pom here matches the db schema version in the cbioportal codebase.

  4. Merge this change into pipelines/master.

  5. Set the commit hash here in the cmo-pipelines codebase to the most recent genome-nexus/genome-nexus-annotation-pipeline commit hash (after the merge specified in step 2).

  6. Merge this change into cmo-pipelines/master.

  7. Run the deployment wrapper script. See details here.

  8. Verify new importers/data fetchers have been placed in /data/portal-cron/lib by checking timestamps.

ls -tlra /data/portal-cron/lib

# Deployment Script

The wrapper script is found on pipelines here: /data/portal-cron/git-repos/pipelines-configuration/build-importer-jars/buildproductionjars.sh.

Run git pull to pull in any updates to the build script.

The wrapper script takes two arguments:

  1. --cbioportal-git-hash (required): Set to the cBioPortal commit hash being used in the pipelines build (the hash specified in step 1 of updating importers). This must match because the build copies out resource files (e.g., application-context-business.xml) from the cbioportal codebase.
  2. --skip-deployment (optional): Set to true to skip auto-deployment to /data/portal-cron/lib. Built jars will be found in /data/portal-cron/git-repos/pipelines-configuration/build-importer-jars/ and can be manually moved.
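A typical invocation might look like the following sketch; the commit hash is a placeholder for the hash chosen in step 1 of updating importers:

```shell
cd /data/portal-cron/git-repos/pipelines-configuration/build-importer-jars
git pull   # pick up any updates to the build script
./buildproductionjars.sh --cbioportal-git-hash <cbioportal-commit-hash>
```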

The wrapper script will automatically back up the importers/data-fetchers to /data/portal-cron/lib/backup.