Using the metaImport script
Importing Data into cBioPortal
The metaImport script should be used to automate validating and loading datasets. It also provides an option to load only datasets that fully pass validation (that is, with no errors; warnings can be explicitly allowed by the user).
Running the metaImport Script
```shell
docker compose exec cbioportal metaImport.py -h
```
This will tell you the parameters you can use:
```text
$ docker compose exec cbioportal metaImport.py -h
usage: metaImport.py [-h] [-s STUDY_DIRECTORY | -d DATA_DIRECTORY] [-u URL_SERVER | -p PORTAL_INFO_DIR | -n] [-jvo JAVA_OPTS] [-jar JAR_PATH]
                     [-html HTML_TABLE] [-v] [-o] [-r] [-m] [-update UPDATE_GENERIC_ASSAY_ENTITY] [-oncokb] [-skipimport] [--no-derive-tables |
                     --derived-table-sql PATH]

cBioPortal meta Importer

options:
  -h, --help            show this help message and exit
  -s, --study_directory STUDY_DIRECTORY
                        path to study directory.
  -d, --data_directory DATA_DIRECTORY
                        path to data directory for incremental upload.
  -u, --url_server URL_SERVER
                        URL to cBioPortal server. You can set this if your URL is not http://localhost:8080
  -p, --portal_info_dir PORTAL_INFO_DIR
                        Path to a directory of cBioPortal info files to be used instead of contacting the web API
  -n, --no_portal_checks
                        Skip tests requiring information from the cBioPortal installation
  -jvo, --java_opts JAVA_OPTS
                        Path to specify JAVA_OPTS for the importer. (default: gets the JAVA_OPTS from the environment)
  -jar, --jar_path JAR_PATH
                        Path to scripts JAR file (default: locate it relative to the import script)
  -html, --html_table HTML_TABLE
                        path to html report
  -v, --verbose         report status info messages while validating
  -o, --override_warning
                        override warnings and continue importing
  -r, --relaxed_clinical_definitions
                        Option to enable relaxed mode for validator when validating clinical data without header definitions
  -m, --strict_maf_checks
                        Option to enable strict mode for validator when validating mutation data
  -update, --update_generic_assay_entity UPDATE_GENERIC_ASSAY_ENTITY
                        Set as True to update the existing generic assay entities, set as False to keep the existing generic assay entities for
                        generic assay
  -oncokb, --import_oncokb
                        Set as True to download OncoKB annotations for Mutations and CNA and load as custom driver annotations
  -skipimport, --skip_db_import
                        Perform validation and OncoKB download but do not import study into database.
  --no-derive-tables    Skip derived table construction after import.
  --derived-table-sql PATH
                        Path to SQL file used for derived table construction.
```
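The -skipimport option listed above makes it possible to check a study without loading it into the database. The sketch below wraps that in a small bash function, assuming the `cbioportal` service name used throughout this page. The names `run`, `validate_only` and `DRY_RUN` are hypothetical helpers for illustration, not part of cBioPortal; with `DRY_RUN=1` the command is printed instead of executed.

```shell
# Sketch only: validate a study without importing it.
# run, validate_only and DRY_RUN are hypothetical names, not part of cBioPortal.

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: study directory as seen inside the container, e.g. /study/lgg_ucsf_2014
validate_only() {
  local study_dir="$1"
  # -skipimport: perform validation but do not import the study into the database
  # -v: report status info messages while validating
  run docker compose exec cbioportal metaImport.py -s "$study_dir" -skipimport -v
}
```

For example, `DRY_RUN=1 validate_only /study/lgg_ucsf_2014` prints the full command, while `validate_only /study/lgg_ucsf_2014` runs the validation.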
Example of Importing a study
```shell
docker compose exec cbioportal metaImport.py -s /study/lgg_ucsf_2014 -o
```
Note:
-o overrides validation warnings and proceeds with the import. If you are confident your data will pass all validation checks without warnings, you can drop -o.
Adding -v reports status info messages while validating.
Generating an HTML Validation Report
```shell
# From the cbioportal-docker-compose repo
mkdir -p study/reports
docker compose exec cbioportal metaImport.py -s /study/my_study -o -html /study/reports/report.html
```
The HTML report is written to the mounted cbioportal-docker-compose/study/reports directory on your host and can be opened directly.
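To keep one report per study, the report file can be named after the study. This is a minimal sketch assuming, as in the example above, that it is run from the cbioportal-docker-compose repo, where ./study on the host appears as /study inside the container. The names `run`, `validation_report` and `DRY_RUN` are hypothetical and only serve the illustration.

```shell
# Sketch only: import one study and write its HTML report to study/reports/<study>.html.
# run, validation_report and DRY_RUN are hypothetical names, not part of cBioPortal.
# Assumes ./study on the host is mounted as /study in the cbioportal container.

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

validation_report() {
  local study="$1"                              # study folder name, e.g. my_study
  local report="/study/reports/${study}.html"   # report path inside the container
  run mkdir -p study/reports                    # host-side folder for the reports
  run docker compose exec cbioportal metaImport.py \
    -s "/study/${study}" -o -html "$report"
  # The same file is visible on the host through the mounted study directory.
  echo "Report on host: study/reports/${study}.html"
}
```

Calling `validation_report my_study` imports /study/my_study and leaves the report at study/reports/my_study.html on the host.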
Incremental Upload
To load data incrementally, specify --data_directory (or -d) instead of --study_directory (or -s).
Incremental upload lets you update entries of certain data types without re-uploading the whole study.
The data directory follows the same structure and data format as the study directory.
It should contain complete information about entries you want to add or update.
Note that some data types, such as the study metadata, are not supported and must not be present in the data directory.
More details are available in the cBioPortal documentation on incremental data loading.
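An incremental upload uses the same command as a full import, with -d in place of -s. The sketch below assumes a hypothetical data directory /study/my_study_update laid out like a study directory; `run`, `incremental_upload` and `DRY_RUN` are hypothetical names, and adding -o simply mirrors the full import example above.

```shell
# Sketch only: load an incremental data directory instead of a full study.
# run, incremental_upload, DRY_RUN and the example path are hypothetical.

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

incremental_upload() {
  local data_dir="$1"   # e.g. /study/my_study_update, same layout as a study directory
  # -d (--data_directory) instead of -s selects incremental upload;
  # -o overrides validation warnings, as in the full import example.
  run docker compose exec cbioportal metaImport.py -d "$data_dir" -o
}
```

For example, `incremental_upload /study/my_study_update` adds or updates only the entries contained in that directory.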
Derived Tables
After each import (full or incremental), metaImport.py automatically rebuilds the derived tables: ClickHouse tables that pre-join and denormalize data for fast Study View queries. See the ClickHouse Setup Guide for details on what derived tables are and why they matter.
Rebuilding Derived Tables Only
You can rebuild derived tables without importing any studies by running:
```shell
docker compose exec cbioportal metaImport.py derive-tables
```
This command executes the derived table SQL scripts against your ClickHouse database without performing any study validation or import. It is useful after batch imports, study deletions, or whenever the derived tables need to be refreshed.
Skipping Derived Table Rebuild
When batch-importing multiple studies, you can skip the derived table rebuild after each import with --no-derive-tables:
```shell
docker compose exec cbioportal metaImport.py -s /study/your_study -o --no-derive-tables
```
Then rebuild derived tables once after all studies have been imported:
```shell
docker compose exec cbioportal metaImport.py derive-tables
```
This can save considerable time when importing many studies in sequence.
Important: Always rebuild derived tables before using the portal in production. Without them, the cBioPortal web app will not function properly.
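The batch workflow described in this section can be scripted as a loop. This is a sketch under stated assumptions: `run`, `import_batch` and `DRY_RUN` are hypothetical names, the study paths are examples, and stopping at the first failure assumes metaImport.py exits with a non-zero status when an import fails (docker compose exec passes that status through).

```shell
# Sketch only: import several studies without per-study derived table rebuilds,
# then rebuild the derived tables once at the end.
# run, import_batch and DRY_RUN are hypothetical names, not part of cBioPortal.

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments: study directories as seen inside the container.
import_batch() {
  local study
  for study in "$@"; do
    # Assumption: a failed import yields a non-zero exit status, which stops the batch.
    run docker compose exec cbioportal metaImport.py -s "$study" -o --no-derive-tables || return 1
  done
  # A single rebuild after all studies have been imported.
  run docker compose exec cbioportal metaImport.py derive-tables
}
```

For example, `import_batch /study/study_a /study/study_b` imports both studies and then runs derive-tables once. Because the function returns before the final rebuild when an import fails, run `metaImport.py derive-tables` manually once the problem is fixed, so the derived tables are rebuilt before the portal is used.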
Development / debugging mode
For developers and specific testing purposes, an extra script, cbioportalImporter.py, is available that imports data regardless of validation results. See its page in the cBioPortal documentation for more information on how to use it.