Importing test data with Docker
Instructions
This is an example to import the sample study study_es_0. study_es_0 is a testing and evaluation dataset that covers a broad range of cBioPortal data types. It is intended to help ensure that your cBioPortal importer is correctly handling all supported data types. We will follow this flow:
- import gene sets (if applicable)
- import gene panels (if applicable, studies without gene panels are assumed to be whole exome/genome)
- import study data
Warning: Importing gene sets (Step 1) removes all existing gene set, gene set hierarchy, and gene set genetic profile data from the database. It is strongly recommended to do this on a fresh database instance rather than one that already contains gene set data you want to keep. Importing other studies may not work after importing study_es_0 and may require loading a fresh database.
Step 1 - Import gene sets
study_es_0 relies on gene set data, so gene set definitions must be loaded before the study. Place the required reference files in the ./study/reference_data/ directory on the host (mounted as /study/reference_data/ inside the container), then run:
docker compose exec cbioportal importGenesetData.pl \
--data /study/reference_data/study_es_0_genesets.gmt \
--new-version msigdb_7.5.1 \
--supp /study/reference_data/study_es_0_supp-genesets.txt
docker compose exec cbioportal importGenesetHierarchy.pl \
--data /study/reference_data/study_es_0_tree.yaml
For studies that do not include gene set data, skip this step.
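Because the import scripts read the reference files from inside the container, a missing file on the host only shows up once the import is already running. The sketch below checks for the three Step 1 files beforehand. check_reference_files is a hypothetical helper written for this guide, not part of cBioPortal, and it assumes the files sit in the host directory passed as its first argument.

```shell
#!/usr/bin/env bash
# Pre-flight check for Step 1: confirm the reference files exist on the host.
# check_reference_files is a hypothetical helper, not a cBioPortal command.

check_reference_files() {
    local dir="$1"      # host directory, e.g. ./study/reference_data
    local missing=0
    local name
    for name in \
        study_es_0_genesets.gmt \
        study_es_0_supp-genesets.txt \
        study_es_0_tree.yaml
    do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

A typical call would be check_reference_files ./study/reference_data, run from the directory that contains the ./study folder, before starting the Step 1 commands.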
Step 2 - Import gene panels
To import gene panels for your study, refer to the example commands in this file.
These are the commands for importing study_es_0 gene panels (data_gene_panel_testpanel1 and data_gene_panel_testpanel2):
docker compose exec cbioportal importGenePanel.pl --data /study/reference_data/data_gene_panel_testpanel1.txt
docker compose exec cbioportal importGenePanel.pl --data /study/reference_data/data_gene_panel_testpanel2.txt
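When a study ships with more than a couple of panels, the same command can be generated per file. The following is a minimal sketch under stated assumptions: panel files are named data_gene_panel_*.txt, they live in a host directory mounted at /study/reference_data, and data_gene_panel_matrix.txt is skipped because it is study data rather than a panel definition. import_gene_panels and the DRY_RUN variable are hypothetical names, not cBioPortal features.

```shell
#!/usr/bin/env bash
# Import every gene panel file found in a host directory.
# import_gene_panels and DRY_RUN are hypothetical names for this sketch.

import_gene_panels() {
    local host_dir="$1"                          # e.g. ./study/reference_data
    local container_dir="/study/reference_data"  # mount point inside the container
    local path name
    for path in "$host_dir"/data_gene_panel_*.txt; do
        [ -e "$path" ] || continue               # the glob matched nothing
        name=$(basename "$path")
        # The panel matrix is part of the study data, not a panel definition.
        [ "$name" = "data_gene_panel_matrix.txt" ] && continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "docker compose exec cbioportal importGenePanel.pl --data $container_dir/$name"
        else
            docker compose exec cbioportal importGenePanel.pl --data "$container_dir/$name"
        fi
    done
}
```

Setting DRY_RUN=1 before calling import_gene_panels ./study/reference_data prints the commands without touching the database, which is a cheap way to confirm the file list first.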
Step 3 - Import data
To import data for your study, refer to the example commands in this file.
Command for importing study_es_0 data:
docker compose exec cbioportal metaImport.py -s /study/study_es_0 -o
⚠️ After importing a study, restart cbioportal so that the study appears on the home page: run docker compose restart cbioportal.
You have now imported the test study study_es_0. The process for adding a study that is outside of the container is similar: place the data files in the ./study folder on the host, which is mounted as /study/ inside the container.
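The order of the steps matters: gene sets first, then gene panels, then the study, then the restart. The sketch below chains the commands from this page in that order and stops at the first failure. It assumes the paths used above; run_step, in_container, import_study_es_0 and DRY_RUN are hypothetical names introduced only for illustration.

```shell
#!/usr/bin/env bash
# Ordered import of study_es_0, mirroring Steps 1 to 3 and the final restart.
# run_step, in_container, import_study_es_0 and DRY_RUN are hypothetical names.

run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"       # show the command instead of running it
    else
        "$@"
    fi
}

in_container() {
    run_step docker compose exec cbioportal "$@"
}

import_study_es_0() {
    local ref=/study/reference_data
    local panel

    # Step 1: gene sets, then the gene set hierarchy.
    in_container importGenesetData.pl \
        --data "$ref/study_es_0_genesets.gmt" \
        --new-version msigdb_7.5.1 \
        --supp "$ref/study_es_0_supp-genesets.txt" || return 1
    in_container importGenesetHierarchy.pl \
        --data "$ref/study_es_0_tree.yaml" || return 1

    # Step 2: gene panels.
    for panel in testpanel1 testpanel2; do
        in_container importGenePanel.pl \
            --data "$ref/data_gene_panel_${panel}.txt" || return 1
    done

    # Step 3: study data, then restart so the study shows on the home page.
    in_container metaImport.py -s /study/study_es_0 -o || return 1
    run_step docker compose restart cbioportal
}
```

With DRY_RUN=1 set, calling import_study_es_0 only prints the six commands, so the sequence can be reviewed before running it against a fresh database.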
Frequently Asked Questions
Gene panel ID is not in database
If you see an error like this when importing the data:
ERROR: data_gene_panel_matrix.txt: lines [2, 3, 4, (10 more)]: Gene panel ID is not in database. Please import this gene panel before loading study data.; values encountered: ['TESTPANEL1', 'TESTPANEL2']
please follow Step 2 to import gene panels (e.g. import data_gene_panel_testpanel1 and data_gene_panel_testpanel2 for study_es_0), then try to import the data again.
Error occurred during validation step
Please make sure the seed database was correctly imported.
Study imported correctly, but queries return an error
Remember to restart cbioportal after the data has been imported:
docker compose restart cbioportal
Import GRCh38 data
If you are importing GRCh38 data, please remember to set the reference_genome: hg38 field in the meta_study.txt file. See also cancer study metadata.
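A quick way to confirm the field before importing is to search the metadata file for it. The sketch below assumes meta_study.txt uses one key: value pair per line, as in the field quoted above. has_hg38_reference is a hypothetical helper, not a cBioPortal command.

```shell
#!/usr/bin/env bash
# Check that a meta_study.txt file declares the hg38 reference genome.
# has_hg38_reference is a hypothetical helper for this sketch.

has_hg38_reference() {
    local meta_file="$1"   # e.g. ./study/<your_study>/meta_study.txt
    # Match a line of the form "reference_genome: hg38", allowing
    # optional whitespace after the colon and at the end of the line.
    grep -Eq '^reference_genome:[[:space:]]*hg38[[:space:]]*$' "$meta_file"
}
```

The function exits with status 0 when the field is present and set to hg38, so it can guard the metaImport.py call in a script.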