Introduction
Running the validator
Offline Validation
To facilitate the loading of new studies into its database, cBioPortal provides a set of staging file formats for the various data types. To validate your files you can use the dataset validator script.
To run the validator, first go to the importer folder <cbioportal_source_folder>/core/src/main/scripts/importer and then run the following command:
./validateData.py --help
This will tell you the parameters you can use:
usage: validateData.py [-h] -s STUDY_DIRECTORY
                       [-u URL_SERVER | -p PORTAL_INFO_DIR | -n]
                       [-P PORTAL_PROPERTIES] [-html HTML_TABLE]
                       [-e ERROR_FILE] [-v] [-r] [-m] [-a MAX_REPORTED_VALUES]

cBioPortal study validator

optional arguments:
  -h, --help            show this help message and exit
  -s STUDY_DIRECTORY, --study_directory STUDY_DIRECTORY
                        path to directory.
  -u URL_SERVER, --url_server URL_SERVER
                        URL to cBioPortal server. You can set this if your URL
                        is not http://localhost:8080
  -p PORTAL_INFO_DIR, --portal_info_dir PORTAL_INFO_DIR
                        Path to a directory of cBioPortal info files to be
                        used instead of contacting a server
  -n, --no_portal_checks
                        Skip tests requiring information from the cBioPortal
                        installation
  -html HTML_TABLE, --html_table HTML_TABLE
                        path to html report output file
  -e ERROR_FILE, --error_file ERROR_FILE
                        File to which to write line numbers on which errors
                        were found, for scripts
  -v, --verbose         report status info messages in addition to errors and
                        warnings
  -r, --relaxed-clinical_definitions
                        Option to enable relaxed mode for validator when validating
                        clinical data without header definitions
  -m, --strict_maf_checks
                        Option to enable strict mode for validator when validating
                        mutation data
  -a MAX_REPORTED_VALUES, --max_reported_values MAX_REPORTED_VALUES
                        Cutoff in HTML report for the maximum number of line
                        numbers and values encountered to report for each
                        message. For example, set this to a high number to
                        report all genes that could not be loaded, instead of
                        reporting "GeneA, GeneB, GeneC, 213 more"
For more information on the --portal_info_dir option, see Offline validation below. If your cBioPortal is not using hg19, you have to specify the reference_genome field in your meta_study.txt. For more information, see Validation of non-human data.
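To check which reference genome a study declares before running the validator, a small helper can read meta_study.txt. The sketch below is illustrative only: it assumes the meta file consists of plain "key: value" lines, the function names are hypothetical, and the hg19 fallback mirrors the default that the validator reports in its log.

```python
from pathlib import Path

# Default reported by the validator when no reference genome is specified.
DEFAULT_REFERENCE_GENOME = "hg19"


def parse_meta_lines(lines):
    """Parse 'key: value' lines into a dict (assumed meta file layout)."""
    fields = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


def reference_genome(meta_fields):
    """Return the declared reference genome, or the hg19 default if absent."""
    return meta_fields.get("reference_genome") or DEFAULT_REFERENCE_GENOME


def reference_genome_of_study(study_directory):
    """Read <study_directory>/meta_study.txt and report its reference genome."""
    meta_path = Path(study_directory) / "meta_study.txt"
    with meta_path.open(encoding="utf-8") as handle:
        return reference_genome(parse_meta_lines(handle))
```

For example, reference_genome_of_study("../../../test/scripts/test_data/study_es_0/") would return whatever that study's meta_study.txt declares, falling back to hg19 when the field is missing.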
When run with the -r parameter, the validator still validates the clinical data, but it ignores all failing checks about values in the headers of the clinical data file.
When run with the -m parameter, the validator applies the MAF-specific checks for the mutation file in strict mode. This means that when one of these checks fails, it is reported as an error instead of a warning.
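The options described so far can be combined on one command line. As a minimal sketch, the hypothetical helper below assembles an argument list using only flags from the usage text, and rejects combinations of -u, -p and -n, which the usage text lists as alternatives. It only builds the list; passing it to subprocess.run from the importer folder is left to the caller.

```python
def build_validator_command(study_directory, url_server=None, portal_info_dir=None,
                            no_portal_checks=False, html_table=None, verbose=False,
                            relaxed_clinical=False, strict_maf=False):
    """Build an argument list for validateData.py from the documented flags."""
    # The usage text shows -u, -p and -n as alternatives: at most one may be set.
    portal_choices = [url_server is not None, portal_info_dir is not None,
                      no_portal_checks]
    if sum(portal_choices) > 1:
        raise ValueError("-u, -p and -n are mutually exclusive")

    command = ["./validateData.py", "-s", study_directory]
    if url_server is not None:
        command += ["-u", url_server]
    if portal_info_dir is not None:
        command += ["-p", portal_info_dir]
    if no_portal_checks:
        command.append("-n")
    if html_table is not None:
        command += ["-html", html_table]
    if verbose:
        command.append("-v")
    if relaxed_clinical:
        command.append("-r")  # relaxed clinical header checks
    if strict_maf:
        command.append("-m")  # report MAF check failures as errors
    return command
```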
As an example, you can try the validator with one of the test studies found in <cbioportal_source_folder>/core/src/test/scripts/test_data. Example, assuming port 8080 and using the -v option to also see the progress:
./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -u http://localhost:8080 -v
Results in:
DEBUG: -: Requesting info from portal at 'http://localhost:8080'
DEBUG: -: Requesting cancer-types from portal at 'http://localhost:8080'
DEBUG: -: Requesting genes from portal at 'http://localhost:8080'
DEBUG: -: Requesting genesets from portal at 'http://localhost:8080'
DEBUG: -: Requesting genesets_version from portal at 'http://localhost:8080'
DEBUG: -: Requesting gene-panels from portal at 'http://localhost:8080'

DEBUG: meta_cancer_type.txt: Starting validation of meta file
INFO: meta_cancer_type.txt: Validation of meta file complete

DEBUG: meta_clinical_patients.txt: Starting validation of meta file
INFO: meta_clinical_patients.txt: Validation of meta file complete

DEBUG: meta_clinical_samples.txt: Starting validation of meta file
INFO: meta_clinical_samples.txt: Validation of meta file complete

DEBUG: meta_cna_discrete.txt: Starting validation of meta file
INFO: meta_cna_discrete.txt: Validation of meta file complete

DEBUG: meta_cna_hg19_seg.txt: Starting validation of meta file
INFO: meta_cna_hg19_seg.txt: Validation of meta file complete

DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

DEBUG: meta_cna_log2.txt: Starting validation of meta file
INFO: meta_cna_log2.txt: Validation of meta file complete

DEBUG: meta_expression_median.txt: Starting validation of meta file
INFO: meta_expression_median.txt: Validation of meta file complete

DEBUG: meta_expression_median_Zscores.txt: Starting validation of meta file
INFO: meta_expression_median_Zscores.txt: Validation of meta file complete

DEBUG: meta_fusions.txt: Starting validation of meta file
INFO: meta_fusions.txt: Validation of meta file complete

DEBUG: meta_gene_panel_matrix.txt: Starting validation of meta file
INFO: meta_gene_panel_matrix.txt: Validation of meta file complete

DEBUG: meta_gistic_genes_amp.txt: Starting validation of meta file
INFO: meta_gistic_genes_amp.txt: Validation of meta file complete

DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

DEBUG: meta_gsva_pvalues.txt: Starting validation of meta file
INFO: meta_gsva_pvalues.txt: Validation of meta file complete

DEBUG: meta_gsva_scores.txt: Starting validation of meta file
INFO: meta_gsva_scores.txt: Validation of meta file complete

DEBUG: meta_methylation_hm27.txt: Starting validation of meta file
INFO: meta_methylation_hm27.txt: Validation of meta file complete

DEBUG: meta_mutational_signature.txt: Starting validation of meta file
INFO: meta_mutational_signature.txt: Validation of meta file complete

DEBUG: meta_mutations_extended.txt: Starting validation of meta file
INFO: meta_mutations_extended.txt: Validation of meta file complete

DEBUG: meta_resource_definition.txt: Starting validation of meta file
INFO: meta_resource_definition.txt: Validation of meta file complete

DEBUG: meta_resource_patient.txt: Starting validation of meta file
INFO: meta_resource_patient.txt: Validation of meta file complete

DEBUG: meta_resource_sample.txt: Starting validation of meta file
INFO: meta_resource_sample.txt: Validation of meta file complete

DEBUG: meta_resource_study.txt: Starting validation of meta file
INFO: meta_resource_study.txt: Validation of meta file complete

DEBUG: meta_structural_variants.txt: Starting validation of meta file
INFO: meta_structural_variants.txt: Validation of meta file complete

DEBUG: meta_study.txt: Starting validation of meta file
INFO: meta_study.txt: Validation of meta file complete

DEBUG: -: Study Tag file found. It will be validated.

DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
INFO: meta_treatment_ec50.txt: Validation of meta file complete

DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
INFO: meta_treatment_ic50.txt: Validation of meta file complete

DEBUG: data_cancer_type.txt: Starting validation of file
INFO: data_cancer_type.txt: line 1: New disease type will be added to the portal; value encountered: 'brca-es0'
INFO: data_cancer_type.txt: Validation of file complete
INFO: data_cancer_type.txt: Read 1 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_clinical_samples.txt: Starting validation of file
INFO: data_clinical_samples.txt: Validation of file complete
INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_definition.txt: Starting validation of file
INFO: data_resource_definition.txt: Validation of file complete
INFO: data_resource_definition.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_sample.txt: Starting validation of file
INFO: data_resource_sample.txt: Validation of file complete
INFO: data_resource_sample.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: study_tags.yml: Starting validation of study tags file
INFO: study_tags.yml: Validation of study tags file complete.

DEBUG: -: Validating case lists

DEBUG: case_lists/cases_cnaseq.txt: Starting validation of meta file
INFO: case_lists/cases_cnaseq.txt: Validation of meta file complete

DEBUG: case_lists/cases_test.txt: Starting validation of meta file
INFO: case_lists/cases_test.txt: Validation of meta file complete

DEBUG: case_lists/cases_sequenced.txt: Starting validation of meta file
INFO: case_lists/cases_sequenced.txt: Validation of meta file complete

DEBUG: case_lists/cases_custom.txt: Starting validation of meta file
INFO: case_lists/cases_custom.txt: Validation of meta file complete

DEBUG: case_lists/cases_cna.txt: Starting validation of meta file
INFO: case_lists/cases_cna.txt: Validation of meta file complete

INFO: -: Validation of case list folder complete

DEBUG: data_gene_panel_matrix.txt: Starting validation of file
INFO: data_gene_panel_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'gistic'
INFO: data_gene_panel_matrix.txt: Validation of file complete
INFO: data_gene_panel_matrix.txt: Read 21 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_discrete.txt: Starting validation of file
INFO: data_cna_discrete.txt: Validation of file complete
INFO: data_cna_discrete.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_clinical_patients.txt: Starting validation of file
INFO: data_clinical_patients.txt: Validation of file complete
INFO: data_clinical_patients.txt: Read 845 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_expression_median.txt: Starting validation of file
INFO: data_expression_median.txt: Validation of file complete
INFO: data_expression_median.txt: Read 7 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_expression_median_Zscores.txt: Starting validation of file
INFO: data_expression_median_Zscores.txt: Validation of file complete
INFO: data_expression_median_Zscores.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_fusions.txt: Starting validation of file
INFO: data_fusions.txt: Validation of file complete
INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_mutational_signature.txt: Starting validation of file
INFO: data_mutational_signature.txt: Validation of file complete
INFO: data_mutational_signature.txt: Read 62 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_treatment_ec50.txt: Starting validation of file
INFO: data_treatment_ec50.txt: Validation of file complete
INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_treatment_ic50.txt: Starting validation of file
INFO: data_treatment_ic50.txt: Validation of file complete
INFO: data_treatment_ic50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gistic_genes_amp.txt: Starting validation of file
INFO: data_gistic_genes_amp.txt: Validation of file complete
INFO: data_gistic_genes_amp.txt: Read 13 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gsva_pvalues.txt: Starting validation of file
INFO: data_gsva_pvalues.txt: Validation of file complete
INFO: data_gsva_pvalues.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gsva_scores.txt: Starting validation of file
INFO: data_gsva_scores.txt: Validation of file complete
INFO: data_gsva_scores.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_log2.txt: Starting validation of file
INFO: data_cna_log2.txt: Validation of file complete
INFO: data_cna_log2.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_methylation_hm27.txt: Starting validation of file
INFO: data_methylation_hm27.txt: Validation of file complete
INFO: data_methylation_hm27.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_mutations_extended.maf: Starting validation of file
INFO: data_mutations_extended.maf: lines [4, 5, 6, (3 more)]: column 164: Values contained in the column cbp_driver_tiers that will appear in the "Mutation Color" menu of the Oncoprint; values encountered: ['Class 2', 'Class 1', 'Class 4', '(1 more)']
INFO: data_mutations_extended.maf: lines [7, 9]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; value encountered: 'Silent'
INFO: data_mutations_extended.maf: Validation of file complete
INFO: data_mutations_extended.maf: Read 35 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_patient.txt: Starting validation of file
INFO: data_resource_patient.txt: Validation of file complete
INFO: data_resource_patient.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_study.txt: Starting validation of file
INFO: data_resource_study.txt: Validation of file complete
INFO: data_resource_study.txt: Read 2 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_hg19.seg: Starting validation of file
INFO: data_cna_hg19.seg: Validation of file complete
INFO: data_cna_hg19.seg: Read 10 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_structural_variants.txt: Starting validation of file
INFO: data_structural_variants.txt: Validation of file complete
INFO: data_structural_variants.txt: Read 46 lines. Lines with warning: 0. Lines with error: 0

INFO: -: Validation complete
Validation of study succeeded.
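Each data file in this output ends with a summary message of the form "Read N lines. Lines with warning: W. Lines with error: E". When the output is long, it can help to collect these summaries into one table. The sketch below is one way to do that, assuming the validator output has been captured as text with one message per line; the function name and regular expression are illustrative and not part of the validator.

```python
import re

# Matches the per-file summary message shown in the validator output.
SUMMARY_PATTERN = re.compile(
    r"^INFO: (?P<file>\S+): Read (?P<lines>\d+) lines\. "
    r"Lines with warning: (?P<warnings>\d+)\. "
    r"Lines with error: (?P<errors>\d+)$"
)


def file_summaries(log_text):
    """Map each data file to (lines read, lines with warning, lines with error)."""
    summaries = {}
    for line in log_text.splitlines():
        match = SUMMARY_PATTERN.match(line.strip())
        if match:
            summaries[match["file"]] = (
                int(match["lines"]),
                int(match["warnings"]),
                int(match["errors"]),
            )
    return summaries
```

Filtering the result for entries whose last value is non-zero gives a quick list of the files that need attention.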
When using the -html option, a report will be generated, which looks like this for the previous example:
[Screenshot: HTML validation report for study_es_0]
More test studies for trying the validator (study_es_1 and study_es_3) are available in <cbioportal_source_folder>/core/src/test/scripts/test_data. Example, assuming port 8080 and using the -v option:
./validateData.py -s ../../../test/scripts/test_data/study_es_1/ -u http://localhost:8080 -v
Results in:
DEBUG: -: Requesting info from portal at 'http://localhost:8080'
DEBUG: -: Requesting cancer-types from portal at 'http://localhost:8080'
DEBUG: -: Requesting genes from portal at 'http://localhost:8080'
DEBUG: -: Requesting genesets from portal at 'http://localhost:8080'
DEBUG: -: Requesting genesets_version from portal at 'http://localhost:8080'
DEBUG: -: Requesting gene-panels from portal at 'http://localhost:8080'

DEBUG: meta_expression_median.txt: Starting validation of meta file
ERROR: meta_expression_median.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'Z-SCORE'; expected one of [mrna_U133_Zscores, rna_seq_mrna_median_Zscores, mrna_median_Zscores, rna_seq_v2_mrna_median_Zscores, rna_seq_v2_mrna_median_normals_Zscores, mirna_median_Zscores, mrna_merged_median_Zscores, mrna_zbynorm, mrna_seq_tpm_Zscores, mrna_seq_cpm_Zscores, rna_seq_mrna_capture_Zscores, mrna_seq_fpkm_capture_Zscores, mrna_seq_fpkm_polya_Zscores, mrna_U133_all_sample_Zscores, mrna_all_sample_Zscores, rna_seq_mrna_median_all_sample_Zscores, mrna_median_all_sample_Zscores, rna_seq_v2_mrna_median_all_sample_Zscores, rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores, mrna_seq_cpm_all_sample_Zscores, mrna_seq_tpm_all_sample_Zscores, rna_seq_mrna_capture_all_sample_Zscores, mrna_seq_fpkm_capture_all_sample_Zscores, mrna_seq_fpkm_polya_all_sample_Zscores]; value encountered: 'mrna'

DEBUG: meta_samples.txt: Starting validation of meta file
WARNING: meta_samples.txt: Unrecognized field in meta file; values encountered: ['show_profile_in_analysis_tab', 'profile_description', 'profile_name']
INFO: meta_samples.txt: Validation of meta file complete

DEBUG: meta_study.txt: Starting validation of meta file
INFO: meta_study.txt: Validation of meta file complete
INFO: meta_study.txt: No reference genome specified -- using default (hg19)

DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
INFO: meta_treatment_ec50.txt: Validation of meta file complete

DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
INFO: meta_treatment_ic50.txt: Validation of meta file complete

DEBUG: data_samples.txt: Starting validation of file
INFO: data_samples.txt: Validation of file complete
INFO: data_samples.txt: Read 831 lines. Lines with warning: 0. Lines with error: 0

DEBUG: -: Validating case lists

DEBUG: case_lists/cases_all.txt: Starting validation of meta file
INFO: case_lists/cases_all.txt: Validation of meta file complete
ERROR: case_lists/cases_all.txt: Sample ID not defined in clinical file; value encountered: 'INVALID-A2-A0T2-01'

INFO: -: Validation of case list folder complete

DEBUG: data_treatment_ec50.txt: Starting validation of file
ERROR: data_treatment_ec50.txt: line 2: column 1: Do not use space in the stable id; value encountered: '17 AAG'
ERROR: data_treatment_ec50.txt: line 7: column 5: Blank cell found in column; value encountered: ''' (in column 'TCGA-A1-A0SB-01')'
INFO: data_treatment_ec50.txt: Validation of file complete
INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 2

DEBUG: data_treatment_ic50.txt: Starting validation of file
ERROR: data_treatment_ic50.txt: line 7: column 5: Blank cell found in column; value encountered: ''' (in column 'TCGA-A1-A0SB-01')'
INFO: data_treatment_ic50.txt: Validation of file complete
INFO: data_treatment_ic50.txt: Read 10 lines. Lines with warning: 0. Lines with error: 1

INFO: -: Validation complete
Validation of study failed.
And the respective HTML report:
[Screenshot: HTML validation report for study_es_1]
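When a study fails validation, the ERROR and WARNING messages are the ones to act on, and each names the file it refers to. The following sketch groups them by file so that problems can be fixed one file at a time. It assumes the validator output was captured as text with one message per line in the "LEVEL: file: message" layout shown in the examples; the function name is hypothetical.

```python
from collections import defaultdict


def problems_by_file(log_text, levels=("ERROR", "WARNING")):
    """Group validator messages of the given levels by the file they refer to."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        # Split into level, file name and the rest of the message.
        parts = line.strip().split(": ", 2)
        if len(parts) == 3 and parts[0] in levels:
            level, filename, message = parts
            grouped[filename].append((level, message))
    return dict(grouped)
```

Applied to the failing run above, this would list two errors under data_treatment_ec50.txt and one each under data_treatment_ic50.txt, case_lists/cases_all.txt and meta_expression_median.txt, plus the warning for meta_samples.txt.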
The validation script can be used offline, without connecting to a cBioPortal server. The portal-specific information that some tests depend on (which clinical attributes and cancer types have been previously defined, and which Entrez gene identifiers and corresponding symbols are supported) will instead be read from a folder with .json files generated from the portal.
To run the validator with a folder of portal information files, add the -p/--portal_info_dir option to the command line, followed by the path to the folder:
./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -p ../../../test/scripts/test_data/api_json_system_tests/ -v
DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/cancer-types.json
DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/genes.json
DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/genesets.json
DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/genesets_version.json
DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/gene-panels.json

DEBUG: meta_cancer_type.txt: Starting validation of meta file
INFO: meta_cancer_type.txt: Validation of meta file complete

DEBUG: meta_clinical_patients.txt: Starting validation of meta file
INFO: meta_clinical_patients.txt: Validation of meta file complete

DEBUG: meta_clinical_samples.txt: Starting validation of meta file
INFO: meta_clinical_samples.txt: Validation of meta file complete

DEBUG: meta_cna_discrete.txt: Starting validation of meta file
INFO: meta_cna_discrete.txt: Validation of meta file complete

DEBUG: meta_cna_hg19_seg.txt: Starting validation of meta file
INFO: meta_cna_hg19_seg.txt: Validation of meta file complete

DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

DEBUG: meta_cna_log2.txt: Starting validation of meta file
INFO: meta_cna_log2.txt: Validation of meta file complete

DEBUG: meta_expression_median.txt: Starting validation of meta file
INFO: meta_expression_median.txt: Validation of meta file complete

DEBUG: meta_expression_median_Zscores.txt: Starting validation of meta file
INFO: meta_expression_median_Zscores.txt: Validation of meta file complete

DEBUG: meta_fusions.txt: Starting validation of meta file
INFO: meta_fusions.txt: Validation of meta file complete

DEBUG: meta_gene_panel_matrix.txt: Starting validation of meta file
INFO: meta_gene_panel_matrix.txt: Validation of meta file complete

DEBUG: meta_gistic_genes_amp.txt: Starting validation of meta file
INFO: meta_gistic_genes_amp.txt: Validation of meta file complete

DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

DEBUG: meta_gsva_pvalues.txt: Starting validation of meta file
INFO: meta_gsva_pvalues.txt: Validation of meta file complete

DEBUG: meta_gsva_scores.txt: Starting validation of meta file
INFO: meta_gsva_scores.txt: Validation of meta file complete

DEBUG: meta_methylation_hm27.txt: Starting validation of meta file
INFO: meta_methylation_hm27.txt: Validation of meta file complete

DEBUG: meta_mutational_signature.txt: Starting validation of meta file
INFO: meta_mutational_signature.txt: Validation of meta file complete

DEBUG: meta_mutations_extended.txt: Starting validation of meta file
INFO: meta_mutations_extended.txt: Validation of meta file complete

DEBUG: meta_resource_definition.txt: Starting validation of meta file
INFO: meta_resource_definition.txt: Validation of meta file complete

DEBUG: meta_resource_patient.txt: Starting validation of meta file
INFO: meta_resource_patient.txt: Validation of meta file complete

DEBUG: meta_resource_sample.txt: Starting validation of meta file
INFO: meta_resource_sample.txt: Validation of meta file complete

DEBUG: meta_resource_study.txt: Starting validation of meta file
INFO: meta_resource_study.txt: Validation of meta file complete

DEBUG: meta_structural_variants.txt: Starting validation of meta file
INFO: meta_structural_variants.txt: Validation of meta file complete

DEBUG: meta_study.txt: Starting validation of meta file
INFO: meta_study.txt: Validation of meta file complete

DEBUG: -: Study Tag file found. It will be validated.

DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
INFO: meta_treatment_ec50.txt: Validation of meta file complete

DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
INFO: meta_treatment_ic50.txt: Validation of meta file complete

DEBUG: data_cancer_type.txt: Starting validation of file
INFO: data_cancer_type.txt: Validation of file complete
INFO: data_cancer_type.txt: Read 1 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_clinical_samples.txt: Starting validation of file
INFO: data_clinical_samples.txt: Validation of file complete
INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_definition.txt: Starting validation of file
INFO: data_resource_definition.txt: Validation of file complete
INFO: data_resource_definition.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_sample.txt: Starting validation of file
INFO: data_resource_sample.txt: Validation of file complete
INFO: data_resource_sample.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: study_tags.yml: Starting validation of study tags file
INFO: study_tags.yml: Validation of study tags file complete.

DEBUG: -: Validating case lists

DEBUG: case_lists/cases_cnaseq.txt: Starting validation of meta file
INFO: case_lists/cases_cnaseq.txt: Validation of meta file complete

DEBUG: case_lists/cases_test.txt: Starting validation of meta file
INFO: case_lists/cases_test.txt: Validation of meta file complete

DEBUG: case_lists/cases_sequenced.txt: Starting validation of meta file
INFO: case_lists/cases_sequenced.txt: Validation of meta file complete

DEBUG: case_lists/cases_custom.txt: Starting validation of meta file
INFO: case_lists/cases_custom.txt: Validation of meta file complete

DEBUG: case_lists/cases_cna.txt: Starting validation of meta file
INFO: case_lists/cases_cna.txt: Validation of meta file complete

INFO: -: Validation of case list folder complete

DEBUG: data_gene_panel_matrix.txt: Starting validation of file
INFO: data_gene_panel_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'gistic'
INFO: data_gene_panel_matrix.txt: Validation of file complete
INFO: data_gene_panel_matrix.txt: Read 21 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_discrete.txt: Starting validation of file
INFO: data_cna_discrete.txt: Validation of file complete
INFO: data_cna_discrete.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_clinical_patients.txt: Starting validation of file
INFO: data_clinical_patients.txt: Validation of file complete
INFO: data_clinical_patients.txt: Read 845 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_expression_median.txt: Starting validation of file
INFO: data_expression_median.txt: Validation of file complete
INFO: data_expression_median.txt: Read 7 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_expression_median_Zscores.txt: Starting validation of file
INFO: data_expression_median_Zscores.txt: Validation of file complete
INFO: data_expression_median_Zscores.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_fusions.txt: Starting validation of file
INFO: data_fusions.txt: Validation of file complete
INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_mutational_signature.txt: Starting validation of file
INFO: data_mutational_signature.txt: Validation of file complete
INFO: data_mutational_signature.txt: Read 62 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_treatment_ec50.txt: Starting validation of file
INFO: data_treatment_ec50.txt: Validation of file complete
INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_treatment_ic50.txt: Starting validation of file
INFO: data_treatment_ic50.txt: Validation of file complete
INFO: data_treatment_ic50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gistic_genes_amp.txt: Starting validation of file
INFO: data_gistic_genes_amp.txt: Validation of file complete
INFO: data_gistic_genes_amp.txt: Read 13 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gsva_pvalues.txt: Starting validation of file
INFO: data_gsva_pvalues.txt: Validation of file complete
INFO: data_gsva_pvalues.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gsva_scores.txt: Starting validation of file
INFO: data_gsva_scores.txt: Validation of file complete
INFO: data_gsva_scores.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_log2.txt: Starting validation of file
INFO: data_cna_log2.txt: Validation of file complete
INFO: data_cna_log2.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_methylation_hm27.txt: Starting validation of file
INFO: data_methylation_hm27.txt: Validation of file complete
INFO: data_methylation_hm27.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_mutations_extended.maf: Starting validation of file
INFO: data_mutations_extended.maf: lines [4, 5, 6, (3 more)]: column 164: Values contained in the column cbp_driver_tiers that will appear in the "Mutation Color" menu of the Oncoprint; values encountered: ['Class 2', 'Class 1', 'Class 4', '(1 more)']
INFO: data_mutations_extended.maf: lines [7, 9]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; value encountered: 'Silent'
INFO: data_mutations_extended.maf: Validation of file complete
INFO: data_mutations_extended.maf: Read 35 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_patient.txt: Starting validation of file
INFO: data_resource_patient.txt: Validation of file complete
INFO: data_resource_patient.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_study.txt: Starting validation of file
INFO: data_resource_study.txt: Validation of file complete
INFO: data_resource_study.txt: Read 2 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_hg19.seg: Starting validation of file
INFO: data_cna_hg19.seg: Validation of file complete
INFO: data_cna_hg19.seg: Read 10 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_structural_variants.txt: Starting validation of file
INFO: data_structural_variants.txt: Validation of file complete
INFO: data_structural_variants.txt: Read 46 lines. Lines with warning: 0. Lines with error: 0

INFO: -: Validation complete
Validation of study succeeded.
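Before starting an offline run, it can be useful to confirm that the portal information folder is complete. The sketch below checks for the five .json files named in the "Reading portal information from ..." messages of this example. The list is an assumption taken from that output only; other versions of the validator may read more or fewer files, and the function name is hypothetical.

```python
from pathlib import Path

# File names taken from the "Reading portal information from ..." log messages.
EXPECTED_PORTAL_INFO_FILES = (
    "cancer-types.json",
    "genes.json",
    "genesets.json",
    "genesets_version.json",
    "gene-panels.json",
)


def missing_portal_info_files(portal_info_dir):
    """Return the expected .json files that are absent from portal_info_dir."""
    directory = Path(portal_info_dir)
    return [name for name in EXPECTED_PORTAL_INFO_FILES
            if not (directory / name).is_file()]
```

An empty list means all of the files listed here are present in the folder.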
The portal information files can be generated on the server, using the dumpPortalInfo script. Go to <cbioportal_source_folder>/core/src/main/scripts, make sure the environment variables $JAVA_HOME and $PORTAL_HOME are set, and run dumpPortalInfo.pl with the name of the directory you want to create:
export JAVA_HOME='/usr/lib/jvm/default-java'
export PORTAL_HOME=<cbioportal_configuration_folder>
./dumpPortalInfo.pl /home/johndoe/my_portal_info_folder/
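If the dump is started from a wrapper script, the two environment variables can be checked first. A minimal sketch, with a hypothetical function name, that only reports which of the required variables are unset or empty:

```python
import os

# Variables the text says must be set before running dumpPortalInfo.pl.
REQUIRED_VARIABLES = ("JAVA_HOME", "PORTAL_HOME")


def unset_variables(environment=None, required=REQUIRED_VARIABLES):
    """Return the required variable names that are missing or empty."""
    env = os.environ if environment is None else environment
    return [name for name in required if not env.get(name)]
```

Calling unset_variables() with no arguments inspects the current process environment; passing a dict makes the check easy to test.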
Alternatively, you can run the validation script with the -n/--no_portal_checks flag to entirely skip checks relating to installation-specific metadata. Be warned that files that pass this validation may still fail to load, or may load incorrectly.
./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -n -v
WARNING: -: Skipping validations relating to cancer types defined in the portal
WARNING: -: Skipping validations relating to gene identifiers and aliases defined in the portal
WARNING: -: Skipping validations relating to gene set identifiers
WARNING: -: Skipping validations relating to gene panel identifiers

DEBUG: meta_cancer_type.txt: Starting validation of meta file
INFO: meta_cancer_type.txt: Validation of meta file complete

DEBUG: meta_clinical_patients.txt: Starting validation of meta file
INFO: meta_clinical_patients.txt: Validation of meta file complete

DEBUG: meta_clinical_samples.txt: Starting validation of meta file
INFO: meta_clinical_samples.txt: Validation of meta file complete

DEBUG: meta_cna_discrete.txt: Starting validation of meta file
INFO: meta_cna_discrete.txt: Validation of meta file complete

DEBUG: meta_cna_hg19_seg.txt: Starting validation of meta file
INFO: meta_cna_hg19_seg.txt: Validation of meta file complete

DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

DEBUG: meta_cna_log2.txt: Starting validation of meta file
INFO: meta_cna_log2.txt: Validation of meta file complete

DEBUG: meta_expression_median.txt: Starting validation of meta file
INFO: meta_expression_median.txt: Validation of meta file complete

DEBUG: meta_expression_median_Zscores.txt: Starting validation of meta file
INFO: meta_expression_median_Zscores.txt: Validation of meta file complete

DEBUG: meta_fusions.txt: Starting validation of meta file
INFO: meta_fusions.txt: Validation of meta file complete

DEBUG: meta_gene_panel_matrix.txt: Starting validation of meta file
INFO: meta_gene_panel_matrix.txt: Validation of meta file complete

DEBUG: meta_gistic_genes_amp.txt: Starting validation of meta file
INFO: meta_gistic_genes_amp.txt: Validation of meta file complete

DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

DEBUG: meta_gsva_pvalues.txt: Starting validation of meta file
INFO: meta_gsva_pvalues.txt: Validation of meta file complete

DEBUG: meta_gsva_scores.txt: Starting validation of meta file
INFO: meta_gsva_scores.txt: Validation of meta file complete

DEBUG: meta_methylation_hm27.txt: Starting validation of meta file
INFO: meta_methylation_hm27.txt: Validation of meta file complete

DEBUG: meta_mutational_signature.txt: Starting validation of meta file
INFO: meta_mutational_signature.txt: Validation of meta file complete

DEBUG: meta_mutations_extended.txt: Starting validation of meta file
INFO: meta_mutations_extended.txt: Validation of meta file complete

DEBUG: meta_resource_definition.txt: Starting validation of meta file
INFO: meta_resource_definition.txt: Validation of meta file complete

DEBUG: meta_resource_patient.txt: Starting validation of meta file
INFO: meta_resource_patient.txt: Validation of meta file complete

DEBUG: meta_resource_sample.txt: Starting validation of meta file
INFO: meta_resource_sample.txt: Validation of meta file complete

DEBUG: meta_resource_study.txt: Starting validation of meta file
INFO: meta_resource_study.txt: Validation of meta file complete

DEBUG: meta_structural_variants.txt: Starting validation of meta file
INFO: meta_structural_variants.txt: Validation of meta file complete

DEBUG: meta_study.txt: Starting validation of meta file
INFO: meta_study.txt: Validation of meta file complete

DEBUG: -: Study Tag file found. It will be validated.

DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
INFO: meta_treatment_ec50.txt: Validation of meta file complete

DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
INFO: meta_treatment_ic50.txt: Validation of meta file complete

DEBUG: data_cancer_type.txt: Starting validation of file
INFO: data_cancer_type.txt: Validation of file complete
INFO: data_cancer_type.txt: Read 1 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_clinical_samples.txt: Starting validation of file
INFO: data_clinical_samples.txt: Validation of file complete
INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_definition.txt: Starting validation of file
INFO: data_resource_definition.txt: Validation of file complete
INFO: data_resource_definition.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_resource_sample.txt: Starting validation of file
INFO: data_resource_sample.txt: Validation of file complete
INFO: data_resource_sample.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

DEBUG: study_tags.yml: Starting validation of study tags file
INFO: study_tags.yml: Validation of study tags file complete.

DEBUG: -: Validating case lists

DEBUG: case_lists/cases_cnaseq.txt: Starting validation of meta file
INFO: case_lists/cases_cnaseq.txt: Validation of meta file complete

DEBUG: case_lists/cases_test.txt: Starting validation of meta file
INFO: case_lists/cases_test.txt: Validation of meta file complete

DEBUG: case_lists/cases_sequenced.txt: Starting validation of meta file
INFO: case_lists/cases_sequenced.txt: Validation of meta file complete

DEBUG: case_lists/cases_custom.txt: Starting validation of meta file
INFO: case_lists/cases_custom.txt: Validation of meta file complete

DEBUG: case_lists/cases_cna.txt: Starting validation of meta file
INFO: case_lists/cases_cna.txt: Validation of meta file complete

INFO: -: Validation of case list folder complete

DEBUG: data_gene_panel_matrix.txt: Starting validation of file
INFO: data_gene_panel_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'gistic'
INFO: data_gene_panel_matrix.txt: Validation of file complete
INFO: data_gene_panel_matrix.txt: Read 21 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_cna_discrete.txt: Starting validation of file
INFO: data_cna_discrete.txt: Validation of file complete
INFO: data_cna_discrete.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_clinical_patients.txt: Starting validation of file
INFO: data_clinical_patients.txt: Validation of file complete
INFO: data_clinical_patients.txt: Read 845 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_expression_median.txt: Starting validation of file
INFO: data_expression_median.txt: Validation of file complete
INFO: data_expression_median.txt: Read 7 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_expression_median_Zscores.txt: Starting validation of file
INFO: data_expression_median_Zscores.txt: Validation of file complete
INFO: data_expression_median_Zscores.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_fusions.txt: Starting validation of file
INFO: data_fusions.txt: Validation of file complete
INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_mutational_signature.txt: Starting validation of file
INFO: data_mutational_signature.txt: Validation of file complete
INFO: data_mutational_signature.txt: Read 62 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_treatment_ec50.txt: Starting validation of file
INFO: data_treatment_ec50.txt: Validation of file complete
INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_treatment_ic50.txt: Starting validation of file
INFO: data_treatment_ic50.txt: Validation of file complete
INFO: data_treatment_ic50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

DEBUG: data_gistic_genes_amp.txt: Starting validation of file
INFO: data_gistic_genes_amp.txt: Validation of file complete
INFO: data_gistic_genes_amp.txt: Read 13 lines. Lines with warning: 0.
Lines with error: 0​DEBUG: data_gsva_pvalues.txt: Starting validation of fileINFO: data_gsva_pvalues.txt: Validation of file completeINFO: data_gsva_pvalues.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_gsva_scores.txt: Starting validation of fileINFO: data_gsva_scores.txt: Validation of file completeINFO: data_gsva_scores.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_cna_log2.txt: Starting validation of fileINFO: data_cna_log2.txt: Validation of file completeINFO: data_cna_log2.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_methylation_hm27.txt: Starting validation of fileINFO: data_methylation_hm27.txt: Validation of file completeINFO: data_methylation_hm27.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_mutations_extended.maf: Starting validation of fileINFO: data_mutations_extended.maf: lines [4, 5, 6, (3 more)]: column 164: Values contained in the column cbp_driver_tiers that will appear in the "Mutation Color" menu of the Oncoprint; values encountered: ['Class 2', 'Class 1', 'Class 4', '(1 more)']INFO: data_mutations_extended.maf: lines [7, 9]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; value encountered: 'Silent'INFO: data_mutations_extended.maf: Validation of file completeINFO: data_mutations_extended.maf: Read 35 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_resource_patient.txt: Starting validation of fileINFO: data_resource_patient.txt: Validation of file completeINFO: data_resource_patient.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_resource_study.txt: Starting validation of fileINFO: data_resource_study.txt: Validation of file completeINFO: data_resource_study.txt: Read 2 lines. Lines with warning: 0. 
Lines with error: 0​DEBUG: data_cna_hg19.seg: Starting validation of fileINFO: data_cna_hg19.seg: Validation of file completeINFO: data_cna_hg19.seg: Read 10 lines. Lines with warning: 0. Lines with error: 0​DEBUG: data_structural_variants.txt: Starting validation of fileINFO: data_structural_variants.txt: Validation of file completeINFO: data_structural_variants.txt: Read 46 lines. Lines with warning: 0. Lines with error: 0​INFO: -: Validation completeValidation of study succeeded with warnings.
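Verbose output like the log above gets long for studies with many files. The sketch below is not part of the validator; it is a hypothetical helper (`summarize_log`) that condenses such a log, assuming each message sits on its own line in the `LEVEL: file: message` layout seen above. The `ERROR` level is an assumption, since the sample log only shows `DEBUG`, `INFO` and `WARNING`.

```python
import re
from collections import Counter

# Per-file summary line as it appears in the sample log, e.g.
# "INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0"
SUMMARY_RE = re.compile(
    r"^INFO: (?P<file>[^:]+): Read (?P<read>\d+) lines\. "
    r"Lines with warning: (?P<warn>\d+)\. Lines with error: (?P<err>\d+)"
)
# Log level prefix; ERROR is assumed, it does not occur in the sample log.
LEVEL_RE = re.compile(r"^(DEBUG|INFO|WARNING|ERROR): ")


def summarize_log(lines):
    """Return (message count per log level, per-file line statistics)."""
    levels = Counter()
    files = {}
    for line in lines:
        line = line.strip()
        level = LEVEL_RE.match(line)
        if level:
            levels[level.group(1)] += 1
        summary = SUMMARY_RE.match(line)
        if summary:
            files[summary.group("file")] = {
                "read": int(summary.group("read")),
                "warnings": int(summary.group("warn")),
                "errors": int(summary.group("err")),
            }
    return levels, files


if __name__ == "__main__":
    sample = [
        "WARNING: -: Skipping validations relating to gene panel identifiers",
        "DEBUG: data_fusions.txt: Starting validation of file",
        "INFO: data_fusions.txt: Validation of file complete",
        "INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0",
    ]
    levels, files = summarize_log(sample)
    print(dict(levels))
    print(files)
```

Files whose `errors` or `warnings` count is non-zero are the ones worth opening first.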
When importing a study with a reference genome other than hg19/GRCh37, specify it in the meta_study.txt file using the reference_genome field. Supported values are hg19, hg38 and mm10.
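As a rough illustration of this rule, the sketch below checks the reference_genome value before a study is handed to the validator. It is not the validator's own code. It assumes that meta_study.txt consists of `key: value` lines and that a missing field means hg19; the helper names and the sample file content are hypothetical.

```python
# Values listed as supported in the text above.
SUPPORTED_GENOMES = {"hg19", "hg38", "mm10"}


def read_reference_genome(meta_text, default="hg19"):
    """Return the reference_genome value from meta file text.

    Assumes "key: value" lines; falls back to `default` when the
    field is absent (assumption: hg19 is the implied default).
    """
    for line in meta_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "reference_genome":
            return value.strip()
    return default


def check_reference_genome(meta_text):
    """Return the genome build, or raise ValueError if it is unsupported."""
    genome = read_reference_genome(meta_text)
    if genome not in SUPPORTED_GENOMES:
        raise ValueError(
            "Unsupported reference_genome %r; expected one of %s"
            % (genome, sorted(SUPPORTED_GENOMES))
        )
    return genome


if __name__ == "__main__":
    # Illustrative meta file content, not taken from a real study.
    example = "cancer_study_identifier: study_es_0\nreference_genome: mm10\n"
    print(check_reference_genome(example))
```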
cBioPortal is gradually introducing support for mouse. If you want to load mouse studies, you first have to set up your database for mouse.
As an example, the command for the mouse example using the three parameters is given:
./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -P ../../../../../src/main/resources/portal.properties -u http://localhost:8080 -v
The importer folder <cbioportal_source_folder>/core/src/main/scripts/importer also contains a script for running the validator for multiple studies:
./validateStudies.py --help
The following parameters can be used:
usage: validateStudies.py [-h] [-d ROOT_DIRECTORY] [-l LIST_OF_STUDIES]
                          [-html HTML_FOLDER]
                          [-u URL_SERVER | -p PORTAL_INFO_DIR | -n]
                          [-P PORTAL_PROPERTIES] [-m] [-a MAX_REPORTED_VALUES]

Wrapper where cBioPortal study validator is run for multiple studies

optional arguments:
  -h, --help            show this help message and exit
  -d ROOT_DIRECTORY, --root-directory ROOT_DIRECTORY
                        Path to directory with all studies that should be
                        validated
  -l LIST_OF_STUDIES, --list-of-studies LIST_OF_STUDIES
                        List with paths of studies which should be validated
  -html HTML_FOLDER, --html-folder HTML_FOLDER
                        Path to folder for output HTML reports
  -u URL_SERVER, --url_server URL_SERVER
                        URL to cBioPortal server. You can set this if your URL
                        is not http://localhost:8080
  -p PORTAL_INFO_DIR, --portal_info_dir PORTAL_INFO_DIR
                        Path to a directory of cBioPortal info files to be
                        used instead of contacting a server
  -n, --no_portal_checks
                        Skip tests requiring information from the cBioPortal
                        installation
  -m, --strict_maf_checks
                        Option to enable strict mode for validator when
                        validating mutation data
  -a MAX_REPORTED_VALUES, --max_reported_values MAX_REPORTED_VALUES
                        Cutoff in HTML report for the maximum number of line
                        numbers and values encountered to report for each
                        message. For example, set this to a high number to
                        report all genes that could not be loaded, instead of
                        reporting "GeneA, GeneB, GeneC, 213 more"
The parameters --url_server, --portal_info_dir, --no_portal_checks and --portal_properties behave the same as the parameters with the same name in validateData.py. The script saves a log file with the validation output (log-validate-studies.txt) and prints the validation status of each input study:
=== Validating study ../../../test/scripts/test_data/study_es_0
Result: VALID (WITH WARNINGS)

=== Validating study ../../../test/scripts/test_data/study_es_1
Result: INVALID

=== Validating study ../../../test/scripts/test_data/study_es_invalid
directory cannot be found: ../../../test/scripts/test_data/study_es_invalid
Result: INVALID (PROBLEMS OCCURRED)
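When this status output feeds another script, it can be turned into a mapping from study path to result. The sketch below is a hypothetical helper (`parse_results`), not something shipped with cBioPortal. It assumes the output has one message per line, with each `=== Validating study` line followed at some point by a `Result:` line.

```python
STUDY_PREFIX = "=== Validating study "
RESULT_PREFIX = "Result: "


def parse_results(output):
    """Map each study path in validateStudies.py status output to its result."""
    results = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(STUDY_PREFIX):
            current = line[len(STUDY_PREFIX):]
        elif line.startswith(RESULT_PREFIX) and current is not None:
            results[current] = line[len(RESULT_PREFIX):]
            current = None
    return results


if __name__ == "__main__":
    sample = """\
=== Validating study ../../../test/scripts/test_data/study_es_0
Result: VALID (WITH WARNINGS)

=== Validating study ../../../test/scripts/test_data/study_es_1
Result: INVALID
"""
    for study, result in parse_results(sample).items():
        print(study, "->", result)
```

Lines between the study header and its result, such as the "directory cannot be found" message, are ignored by this sketch.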
Validation can be run for all studies in a certain directory by using the --root-directory parameter. The script will append each folder in the root directory to the list of studies to validate:
./validateStudies.py -d ../../../test/scripts/test_data/
Validation can also be run for specific studies by using the --list-of-studies parameter. The paths to the studies are given as a comma-separated list:
./validateStudies.py -l ../../../test/scripts/test_data/study_es_0,../../../test/scripts/test_data/study_es_1
Validation can also be run on specific studies in a certain directory by combining the --root-directory and --list-of-studies parameters:
./validateStudies.py -d ../../../test/scripts/test_data/ -l study_es_0,study_es_1
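The three ways of selecting studies can be summarized in a small sketch. This is an illustration of the behavior described above, not the actual code of validateStudies.py; the function name `resolve_study_paths` and the sorting of folder names are assumptions.

```python
import os


def resolve_study_paths(root_directory=None, list_of_studies=None):
    """Return the study paths implied by the two selection parameters.

    - only root_directory: every folder inside it
    - only list_of_studies: the comma-separated paths as given
    - both: each listed name joined onto root_directory
    """
    if list_of_studies:
        names = [name.strip() for name in list_of_studies.split(",") if name.strip()]
        if root_directory:
            return [os.path.join(root_directory, name) for name in names]
        return names
    if root_directory:
        # Sorting is a choice made for this sketch, to get a stable order.
        return sorted(
            os.path.join(root_directory, entry)
            for entry in os.listdir(root_directory)
            if os.path.isdir(os.path.join(root_directory, entry))
        )
    return []


if __name__ == "__main__":
    print(resolve_study_paths("../../../test/scripts/test_data/", "study_es_0,study_es_1"))
```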
When HTML validation reports are desired, an output folder for these HTML files can be specified with the --html-folder parameter. This folder does not have to exist; the script can create it. Each report is named <study_name>-validation.html:
./validateStudies.py -d ../../../test/scripts/test_data/ -l study_es_0,study_es_1 -html ../../../test/scripts/test_data/validation-reports
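To locate a report afterwards, its path can be derived from the naming scheme above. The sketch below assumes that <study_name> is the last component of the study path, which the text does not state explicitly; `html_report_path` is a hypothetical helper.

```python
import os


def html_report_path(html_folder, study_path):
    """Return the expected report path for a study.

    Assumption: the study name is the final component of the study path.
    """
    study_name = os.path.basename(os.path.normpath(study_path))
    return os.path.join(html_folder, study_name + "-validation.html")


if __name__ == "__main__":
    print(html_report_path(
        "../../../test/scripts/test_data/validation-reports",
        "../../../test/scripts/test_data/study_es_0",
    ))
```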