Using the Dataset Validator

Introduction

To facilitate the loading of new studies into its database, cBioPortal provides a set of staging file formats for the various data types. To validate your files, you can use the dataset validator script.

Running the validator

To run the validator, first go to the importer folder <cbioportal_source_folder>/core/src/main/scripts/importer and then run the following command:

    ./validateData.py --help

This will tell you the parameters you can use:

    usage: validateData.py [-h] -s STUDY_DIRECTORY
                           [-u URL_SERVER | -p PORTAL_INFO_DIR | -n]
                           [-P PORTAL_PROPERTIES] [-html HTML_TABLE]
                           [-e ERROR_FILE] [-v] [-r] [-m]

    cBioPortal study validator

    optional arguments:
      -h, --help            show this help message and exit
      -s STUDY_DIRECTORY, --study_directory STUDY_DIRECTORY
                            path to directory.
      -u URL_SERVER, --url_server URL_SERVER
                            URL to cBioPortal server. You can set this if your URL
                            is not http://localhost:8080
      -p PORTAL_INFO_DIR, --portal_info_dir PORTAL_INFO_DIR
                            Path to a directory of cBioPortal info files to be
                            used instead of contacting a server
      -n, --no_portal_checks
                            Skip tests requiring information from the cBioPortal
                            installation
      -html HTML_TABLE, --html_table HTML_TABLE
                            path to html report output file
      -e ERROR_FILE, --error_file ERROR_FILE
                            File to which to write line numbers on which errors
                            were found, for scripts
      -v, --verbose         report status info messages in addition to errors and
                            warnings
      -r, --relaxed-clinical_definitions
                            Option to enable relaxed mode for validator when validating
                            clinical data without header definitions
      -m, --strict_maf_checks
                            Option to enable strict mode for validator when validating
                            mutation data

For more information on the --portal_info_dir option, see Offline validation below.
If your cBioPortal installation does not use hg19, you must specify the reference_genome field in your meta_study.txt. For more information, see Validation of non-human data.
With the -r option, the validator runs in relaxed mode for clinical data: it ignores all failing checks on values in the header lines of the clinical data file.
With the -m option, the validator runs the MAF-specific checks on the mutation file in strict mode: any issue found by these checks is reported as an error instead of a warning.
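
When the validator is called from another script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of cBioPortal: the helper name build_validator_command and its keyword names are hypothetical, and only flags shown in the help text above are used. It enforces the rule from the usage line that -u, -p and -n are mutually exclusive.

```python
from typing import List, Optional


def build_validator_command(
    study_directory: str,
    url_server: Optional[str] = None,
    portal_info_dir: Optional[str] = None,
    no_portal_checks: bool = False,
    html_table: Optional[str] = None,
    error_file: Optional[str] = None,
    verbose: bool = False,
    relaxed_clinical: bool = False,
    strict_maf: bool = False,
) -> List[str]:
    """Assemble an argument list for validateData.py (hypothetical helper)."""
    # The usage line shows [-u | -p | -n], so allow at most one of them.
    chosen = [url_server is not None, portal_info_dir is not None, no_portal_checks]
    if sum(chosen) > 1:
        raise ValueError(
            "use at most one of url_server, portal_info_dir, no_portal_checks"
        )

    command = ["./validateData.py", "-s", study_directory]
    if url_server is not None:
        command += ["-u", url_server]
    elif portal_info_dir is not None:
        command += ["-p", portal_info_dir]
    elif no_portal_checks:
        command.append("-n")
    if html_table is not None:
        command += ["-html", html_table]
    if error_file is not None:
        command += ["-e", error_file]
    if verbose:
        command.append("-v")
    if relaxed_clinical:
        command.append("-r")
    if strict_maf:
        command.append("-m")
    return command
```

The returned list can be passed to subprocess.run with the importer folder as the working directory.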

Example 1: test study_es_0

As an example, you can try the validator with one of the test studies found in <cbioportal_source_folder>/core/src/test/scripts/test_data. For example, assuming the portal runs on port 8080 and using the -v option to also see progress messages:

    ./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -u http://localhost:8080 -v

Results in:

    DEBUG: -: Requesting info from portal at 'http://localhost:8080'
    DEBUG: -: Requesting cancer-types from portal at 'http://localhost:8080'
    DEBUG: -: Requesting genes from portal at 'http://localhost:8080'
    DEBUG: -: Requesting genesets from portal at 'http://localhost:8080'
    DEBUG: -: Requesting genesets_version from portal at 'http://localhost:8080'
    DEBUG: -: Requesting gene-panels from portal at 'http://localhost:8080'

    DEBUG: meta_cancer_type.txt: Starting validation of meta file
    INFO: meta_cancer_type.txt: Validation of meta file complete

    DEBUG: meta_clinical_patients.txt: Starting validation of meta file
    INFO: meta_clinical_patients.txt: Validation of meta file complete

    DEBUG: meta_clinical_samples.txt: Starting validation of meta file
    INFO: meta_clinical_samples.txt: Validation of meta file complete

    DEBUG: meta_cna_discrete.txt: Starting validation of meta file
    INFO: meta_cna_discrete.txt: Validation of meta file complete

    DEBUG: meta_cna_hg19_seg.txt: Starting validation of meta file
    INFO: meta_cna_hg19_seg.txt: Validation of meta file complete

    DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

    DEBUG: meta_cna_log2.txt: Starting validation of meta file
    INFO: meta_cna_log2.txt: Validation of meta file complete

    DEBUG: meta_expression_median.txt: Starting validation of meta file
    INFO: meta_expression_median.txt: Validation of meta file complete

    DEBUG: meta_expression_median_Zscores.txt: Starting validation of meta file
    INFO: meta_expression_median_Zscores.txt: Validation of meta file complete

    DEBUG: meta_fusions.txt: Starting validation of meta file
    INFO: meta_fusions.txt: Validation of meta file complete

    DEBUG: meta_gene_panel_matrix.txt: Starting validation of meta file
    INFO: meta_gene_panel_matrix.txt: Validation of meta file complete

    DEBUG: meta_gistic_genes_amp.txt: Starting validation of meta file
    INFO: meta_gistic_genes_amp.txt: Validation of meta file complete

    DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

    DEBUG: meta_gsva_pvalues.txt: Starting validation of meta file
    INFO: meta_gsva_pvalues.txt: Validation of meta file complete

    DEBUG: meta_gsva_scores.txt: Starting validation of meta file
    INFO: meta_gsva_scores.txt: Validation of meta file complete

    DEBUG: meta_methylation_hm27.txt: Starting validation of meta file
    INFO: meta_methylation_hm27.txt: Validation of meta file complete

    DEBUG: meta_mutational_signature.txt: Starting validation of meta file
    INFO: meta_mutational_signature.txt: Validation of meta file complete

    DEBUG: meta_mutations_extended.txt: Starting validation of meta file
    INFO: meta_mutations_extended.txt: Validation of meta file complete

    DEBUG: meta_resource_definition.txt: Starting validation of meta file
    INFO: meta_resource_definition.txt: Validation of meta file complete

    DEBUG: meta_resource_patient.txt: Starting validation of meta file
    INFO: meta_resource_patient.txt: Validation of meta file complete

    DEBUG: meta_resource_sample.txt: Starting validation of meta file
    INFO: meta_resource_sample.txt: Validation of meta file complete

    DEBUG: meta_resource_study.txt: Starting validation of meta file
    INFO: meta_resource_study.txt: Validation of meta file complete

    DEBUG: meta_structural_variants.txt: Starting validation of meta file
    INFO: meta_structural_variants.txt: Validation of meta file complete

    DEBUG: meta_study.txt: Starting validation of meta file
    INFO: meta_study.txt: Validation of meta file complete

    DEBUG: -: Study Tag file found. It will be validated.

    DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
    INFO: meta_treatment_ec50.txt: Validation of meta file complete

    DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
    INFO: meta_treatment_ic50.txt: Validation of meta file complete

    DEBUG: data_cancer_type.txt: Starting validation of file
    INFO: data_cancer_type.txt: line 1: New disease type will be added to the portal; value encountered: 'brca-es0'
    INFO: data_cancer_type.txt: Validation of file complete
    INFO: data_cancer_type.txt: Read 1 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_clinical_samples.txt: Starting validation of file
    INFO: data_clinical_samples.txt: Validation of file complete
    INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_definition.txt: Starting validation of file
    INFO: data_resource_definition.txt: Validation of file complete
    INFO: data_resource_definition.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_sample.txt: Starting validation of file
    INFO: data_resource_sample.txt: Validation of file complete
    INFO: data_resource_sample.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: study_tags.yml: Starting validation of study tags file
    INFO: study_tags.yml: Validation of study tags file complete.

    DEBUG: -: Validating case lists

    DEBUG: case_lists/cases_cnaseq.txt: Starting validation of meta file
    INFO: case_lists/cases_cnaseq.txt: Validation of meta file complete

    DEBUG: case_lists/cases_test.txt: Starting validation of meta file
    INFO: case_lists/cases_test.txt: Validation of meta file complete

    DEBUG: case_lists/cases_sequenced.txt: Starting validation of meta file
    INFO: case_lists/cases_sequenced.txt: Validation of meta file complete

    DEBUG: case_lists/cases_custom.txt: Starting validation of meta file
    INFO: case_lists/cases_custom.txt: Validation of meta file complete

    DEBUG: case_lists/cases_cna.txt: Starting validation of meta file
    INFO: case_lists/cases_cna.txt: Validation of meta file complete

    INFO: -: Validation of case list folder complete

    DEBUG: data_gene_panel_matrix.txt: Starting validation of file
    INFO: data_gene_panel_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'gistic'
    INFO: data_gene_panel_matrix.txt: Validation of file complete
    INFO: data_gene_panel_matrix.txt: Read 21 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_cna_discrete.txt: Starting validation of file
    INFO: data_cna_discrete.txt: Validation of file complete
    INFO: data_cna_discrete.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_clinical_patients.txt: Starting validation of file
    INFO: data_clinical_patients.txt: Validation of file complete
    INFO: data_clinical_patients.txt: Read 845 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_expression_median.txt: Starting validation of file
    INFO: data_expression_median.txt: Validation of file complete
    INFO: data_expression_median.txt: Read 7 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_expression_median_Zscores.txt: Starting validation of file
    INFO: data_expression_median_Zscores.txt: Validation of file complete
    INFO: data_expression_median_Zscores.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_fusions.txt: Starting validation of file
    INFO: data_fusions.txt: Validation of file complete
    INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_mutational_signature.txt: Starting validation of file
    INFO: data_mutational_signature.txt: Validation of file complete
    INFO: data_mutational_signature.txt: Read 62 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_treatment_ec50.txt: Starting validation of file
    INFO: data_treatment_ec50.txt: Validation of file complete
    INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_treatment_ic50.txt: Starting validation of file
    INFO: data_treatment_ic50.txt: Validation of file complete
    INFO: data_treatment_ic50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_gistic_genes_amp.txt: Starting validation of file
    INFO: data_gistic_genes_amp.txt: Validation of file complete
    INFO: data_gistic_genes_amp.txt: Read 13 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_gsva_pvalues.txt: Starting validation of file
    INFO: data_gsva_pvalues.txt: Validation of file complete
    INFO: data_gsva_pvalues.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_gsva_scores.txt: Starting validation of file
    INFO: data_gsva_scores.txt: Validation of file complete
    INFO: data_gsva_scores.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_cna_log2.txt: Starting validation of file
    INFO: data_cna_log2.txt: Validation of file complete
    INFO: data_cna_log2.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_methylation_hm27.txt: Starting validation of file
    INFO: data_methylation_hm27.txt: Validation of file complete
    INFO: data_methylation_hm27.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_mutations_extended.maf: Starting validation of file
    INFO: data_mutations_extended.maf: lines [4, 5, 6, (3 more)]: column 164: Values contained in the column cbp_driver_tiers that will appear in the "Mutation Color" menu of the Oncoprint; values encountered: ['Class 2', 'Class 1', 'Class 4', '(1 more)']
    INFO: data_mutations_extended.maf: lines [7, 9]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; value encountered: 'Silent'
    INFO: data_mutations_extended.maf: Validation of file complete
    INFO: data_mutations_extended.maf: Read 35 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_patient.txt: Starting validation of file
    INFO: data_resource_patient.txt: Validation of file complete
    INFO: data_resource_patient.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_study.txt: Starting validation of file
    INFO: data_resource_study.txt: Validation of file complete
    INFO: data_resource_study.txt: Read 2 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_cna_hg19.seg: Starting validation of file
    INFO: data_cna_hg19.seg: Validation of file complete
    INFO: data_cna_hg19.seg: Read 10 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_structural_variants.txt: Starting validation of file
    INFO: data_structural_variants.txt: Validation of file complete
    INFO: data_structural_variants.txt: Read 46 lines. Lines with warning: 0. Lines with error: 0

    INFO: -: Validation complete
    Validation of study succeeded.

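For long runs, the per-file summary lines are easier to review once collected into a table. The sketch below is one possible way to do that in Python; it assumes the verbose output was saved to a text file and that the summary and final status lines keep the wording shown in the example above. The function names summarize_validated_files and study_succeeded are hypothetical and not part of the validator.

```python
import re
from typing import Dict, Tuple

# Matches summary lines such as:
# INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0
SUMMARY_RE = re.compile(
    r"^INFO: (?P<file>.+?): Read (?P<lines>\d+) lines\. "
    r"Lines with warning: (?P<warnings>\d+)\. Lines with error: (?P<errors>\d+)$"
)


def summarize_validated_files(log_text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each data file to (lines read, lines with warning, lines with error)."""
    summary = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            summary[match["file"]] = (
                int(match["lines"]),
                int(match["warnings"]),
                int(match["errors"]),
            )
    return summary


def study_succeeded(log_text: str) -> bool:
    """Return True if the last non-blank line reports a successful validation."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "Validation of study succeeded."
```

Only files that produce a "Read N lines" summary appear in the result, so meta files and case lists are not included.
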
When using the -html option, a report will be generated, which looks like this for the previous example:

Example 2: test study_es_1

More test studies for trying the validator (study_es_1 and study_es_3) are available in <cbioportal_source_folder>/core/src/test/scripts/test_data. For example, assuming the portal runs on port 8080 and using the -v option:

    ./validateData.py -s ../../../test/scripts/test_data/study_es_1/ -u http://localhost:8080 -v

Results in:

    DEBUG: -: Requesting info from portal at 'http://localhost:8081'
    DEBUG: -: Requesting cancer-types from portal at 'http://localhost:8081'
    DEBUG: -: Requesting genes from portal at 'http://localhost:8081'
    DEBUG: -: Requesting genesets from portal at 'http://localhost:8081'
    DEBUG: -: Requesting genesets_version from portal at 'http://localhost:8081'
    DEBUG: -: Requesting gene-panels from portal at 'http://localhost:8081'

    DEBUG: meta_expression_median.txt: Starting validation of meta file
    ERROR: meta_expression_median.txt: Invalid stable id for genetic_alteration_type 'MRNA_EXPRESSION', data_type 'Z-SCORE'; expected one of [mrna_U133_Zscores, rna_seq_mrna_median_Zscores, mrna_median_Zscores, rna_seq_v2_mrna_median_Zscores, rna_seq_v2_mrna_median_normals_Zscores, mirna_median_Zscores, mrna_merged_median_Zscores, mrna_zbynorm, mrna_seq_tpm_Zscores, mrna_seq_cpm_Zscores, rna_seq_mrna_capture_Zscores, mrna_seq_fpkm_capture_Zscores, mrna_seq_fpkm_polya_Zscores, mrna_U133_all_sample_Zscores, mrna_all_sample_Zscores, rna_seq_mrna_median_all_sample_Zscores, mrna_median_all_sample_Zscores, rna_seq_v2_mrna_median_all_sample_Zscores, rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores, mrna_seq_cpm_all_sample_Zscores, mrna_seq_tpm_all_sample_Zscores, rna_seq_mrna_capture_all_sample_Zscores, mrna_seq_fpkm_capture_all_sample_Zscores, mrna_seq_fpkm_polya_all_sample_Zscores]; value encountered: 'mrna'

    DEBUG: meta_samples.txt: Starting validation of meta file
    WARNING: meta_samples.txt: Unrecognized field in meta file; values encountered: ['show_profile_in_analysis_tab', 'profile_description', 'profile_name']
    INFO: meta_samples.txt: Validation of meta file complete

    DEBUG: meta_study.txt: Starting validation of meta file
    INFO: meta_study.txt: Validation of meta file complete
    INFO: meta_study.txt: No reference genome specified -- using default (hg19)

    DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
    INFO: meta_treatment_ec50.txt: Validation of meta file complete

    DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
    INFO: meta_treatment_ic50.txt: Validation of meta file complete

    DEBUG: data_samples.txt: Starting validation of file
    INFO: data_samples.txt: Validation of file complete
    INFO: data_samples.txt: Read 831 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: -: Validating case lists

    DEBUG: case_lists/cases_all.txt: Starting validation of meta file
    INFO: case_lists/cases_all.txt: Validation of meta file complete
    ERROR: case_lists/cases_all.txt: Sample ID not defined in clinical file; value encountered: 'INVALID-A2-A0T2-01'

    INFO: -: Validation of case list folder complete

    DEBUG: data_treatment_ec50.txt: Starting validation of file
    ERROR: data_treatment_ec50.txt: line 2: column 1: Do not use space in the stable id; value encountered: '17 AAG'
    ERROR: data_treatment_ec50.txt: line 7: column 5: Blank cell found in column; value encountered: ''' (in column 'TCGA-A1-A0SB-01')'
    INFO: data_treatment_ec50.txt: Validation of file complete
    INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 2

    DEBUG: data_treatment_ic50.txt: Starting validation of file
    ERROR: data_treatment_ic50.txt: line 7: column 5: Blank cell found in column; value encountered: ''' (in column 'TCGA-A1-A0SB-01')'
    INFO: data_treatment_ic50.txt: Validation of file complete
    INFO: data_treatment_ic50.txt: Read 10 lines. Lines with warning: 0. Lines with error: 1

    INFO: -: Validation complete
    Validation of study failed.

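When a study fails, it is often convenient to see only the warnings and errors, grouped by the file they refer to. The following Python sketch does this under the assumption that every message follows the "LEVEL: file: message" layout seen in the output above; the function name problems_by_file is hypothetical and not part of the validator.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Splits a log line into level, source file ("-" for general messages) and message.
LOG_LINE_RE = re.compile(
    r"^(?P<level>DEBUG|INFO|WARNING|ERROR): (?P<source>.+?): (?P<message>.*)$"
)


def problems_by_file(log_text: str) -> Dict[str, List[str]]:
    """Group WARNING and ERROR messages by the file named in each log line."""
    problems = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE_RE.match(line.strip())
        if match and match["level"] in ("WARNING", "ERROR"):
            problems[match["source"]].append(
                f'{match["level"]}: {match["message"]}'
            )
    return dict(problems)
```

Applied to the output of Example 2, this would list the invalid stable id under meta_expression_median.txt and the blank cells under the two treatment data files.
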
And the respective HTML report:

Offline validation

The validation script can be used offline, without connecting to a cBioPortal server. For the tests that depend on portal-specific information (which clinical attributes and cancer types have been previously defined, and which Entrez gene identifiers and corresponding symbols are supported), this information is instead read from a folder of .json files generated from the portal.

Example 3: validation with a portal info folder

To run the validator with a folder of portal information files, add the -p/--portal_info_dir option to the command line, followed by the path to the folder:

    ./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -p ../../../test/scripts/test_data/api_json_system_tests/ -v


    DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/cancer-types.json
    DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/genes.json
    DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/genesets.json
    DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/genesets_version.json
    DEBUG: -: Reading portal information from ../../../test/scripts/test_data/api_json_system_tests/gene-panels.json

    DEBUG: meta_cancer_type.txt: Starting validation of meta file
    INFO: meta_cancer_type.txt: Validation of meta file complete

    DEBUG: meta_clinical_patients.txt: Starting validation of meta file
    INFO: meta_clinical_patients.txt: Validation of meta file complete

    DEBUG: meta_clinical_samples.txt: Starting validation of meta file
    INFO: meta_clinical_samples.txt: Validation of meta file complete

    DEBUG: meta_cna_discrete.txt: Starting validation of meta file
    INFO: meta_cna_discrete.txt: Validation of meta file complete

    DEBUG: meta_cna_hg19_seg.txt: Starting validation of meta file
    INFO: meta_cna_hg19_seg.txt: Validation of meta file complete

    DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

    DEBUG: meta_cna_log2.txt: Starting validation of meta file
    INFO: meta_cna_log2.txt: Validation of meta file complete

    DEBUG: meta_expression_median.txt: Starting validation of meta file
    INFO: meta_expression_median.txt: Validation of meta file complete

    DEBUG: meta_expression_median_Zscores.txt: Starting validation of meta file
    INFO: meta_expression_median_Zscores.txt: Validation of meta file complete

    DEBUG: meta_fusions.txt: Starting validation of meta file
    INFO: meta_fusions.txt: Validation of meta file complete

    DEBUG: meta_gene_panel_matrix.txt: Starting validation of meta file
    INFO: meta_gene_panel_matrix.txt: Validation of meta file complete

    DEBUG: meta_gistic_genes_amp.txt: Starting validation of meta file
    INFO: meta_gistic_genes_amp.txt: Validation of meta file complete

    DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

    DEBUG: meta_gsva_pvalues.txt: Starting validation of meta file
    INFO: meta_gsva_pvalues.txt: Validation of meta file complete

    DEBUG: meta_gsva_scores.txt: Starting validation of meta file
    INFO: meta_gsva_scores.txt: Validation of meta file complete

    DEBUG: meta_methylation_hm27.txt: Starting validation of meta file
    INFO: meta_methylation_hm27.txt: Validation of meta file complete

    DEBUG: meta_mutational_signature.txt: Starting validation of meta file
    INFO: meta_mutational_signature.txt: Validation of meta file complete

    DEBUG: meta_mutations_extended.txt: Starting validation of meta file
    INFO: meta_mutations_extended.txt: Validation of meta file complete

    DEBUG: meta_resource_definition.txt: Starting validation of meta file
    INFO: meta_resource_definition.txt: Validation of meta file complete

    DEBUG: meta_resource_patient.txt: Starting validation of meta file
    INFO: meta_resource_patient.txt: Validation of meta file complete

    DEBUG: meta_resource_sample.txt: Starting validation of meta file
    INFO: meta_resource_sample.txt: Validation of meta file complete

    DEBUG: meta_resource_study.txt: Starting validation of meta file
    INFO: meta_resource_study.txt: Validation of meta file complete

    DEBUG: meta_structural_variants.txt: Starting validation of meta file
    INFO: meta_structural_variants.txt: Validation of meta file complete

    DEBUG: meta_study.txt: Starting validation of meta file
    INFO: meta_study.txt: Validation of meta file complete

    DEBUG: -: Study Tag file found. It will be validated.

    DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
    INFO: meta_treatment_ec50.txt: Validation of meta file complete

    DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
    INFO: meta_treatment_ic50.txt: Validation of meta file complete

    DEBUG: data_cancer_type.txt: Starting validation of file
    INFO: data_cancer_type.txt: Validation of file complete
    INFO: data_cancer_type.txt: Read 1 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_clinical_samples.txt: Starting validation of file
    INFO: data_clinical_samples.txt: Validation of file complete
    INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_definition.txt: Starting validation of file
    INFO: data_resource_definition.txt: Validation of file complete
    INFO: data_resource_definition.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_sample.txt: Starting validation of file
    INFO: data_resource_sample.txt: Validation of file complete
    INFO: data_resource_sample.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: study_tags.yml: Starting validation of study tags file
    INFO: study_tags.yml: Validation of study tags file complete.

    DEBUG: -: Validating case lists

    DEBUG: case_lists/cases_cnaseq.txt: Starting validation of meta file
    INFO: case_lists/cases_cnaseq.txt: Validation of meta file complete

    DEBUG: case_lists/cases_test.txt: Starting validation of meta file
    INFO: case_lists/cases_test.txt: Validation of meta file complete

    DEBUG: case_lists/cases_sequenced.txt: Starting validation of meta file
    INFO: case_lists/cases_sequenced.txt: Validation of meta file complete

    DEBUG: case_lists/cases_custom.txt: Starting validation of meta file
    INFO: case_lists/cases_custom.txt: Validation of meta file complete

    DEBUG: case_lists/cases_cna.txt: Starting validation of meta file
    INFO: case_lists/cases_cna.txt: Validation of meta file complete

    INFO: -: Validation of case list folder complete

    DEBUG: data_gene_panel_matrix.txt: Starting validation of file
    INFO: data_gene_panel_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'gistic'
    INFO: data_gene_panel_matrix.txt: Validation of file complete
    INFO: data_gene_panel_matrix.txt: Read 21 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_cna_discrete.txt: Starting validation of file
    INFO: data_cna_discrete.txt: Validation of file complete
    INFO: data_cna_discrete.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_clinical_patients.txt: Starting validation of file
    INFO: data_clinical_patients.txt: Validation of file complete
    INFO: data_clinical_patients.txt: Read 845 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_expression_median.txt: Starting validation of file
    INFO: data_expression_median.txt: Validation of file complete
    INFO: data_expression_median.txt: Read 7 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_expression_median_Zscores.txt: Starting validation of file
    INFO: data_expression_median_Zscores.txt: Validation of file complete
    INFO: data_expression_median_Zscores.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_fusions.txt: Starting validation of file
    INFO: data_fusions.txt: Validation of file complete
    INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_mutational_signature.txt: Starting validation of file
    INFO: data_mutational_signature.txt: Validation of file complete
    INFO: data_mutational_signature.txt: Read 62 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_treatment_ec50.txt: Starting validation of file
    INFO: data_treatment_ec50.txt: Validation of file complete
    INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_treatment_ic50.txt: Starting validation of file
    INFO: data_treatment_ic50.txt: Validation of file complete
    INFO: data_treatment_ic50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_gistic_genes_amp.txt: Starting validation of file
    INFO: data_gistic_genes_amp.txt: Validation of file complete
    INFO: data_gistic_genes_amp.txt: Read 13 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_gsva_pvalues.txt: Starting validation of file
    INFO: data_gsva_pvalues.txt: Validation of file complete
    INFO: data_gsva_pvalues.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_gsva_scores.txt: Starting validation of file
    INFO: data_gsva_scores.txt: Validation of file complete
    INFO: data_gsva_scores.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_cna_log2.txt: Starting validation of file
    INFO: data_cna_log2.txt: Validation of file complete
    INFO: data_cna_log2.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_methylation_hm27.txt: Starting validation of file
    INFO: data_methylation_hm27.txt: Validation of file complete
    INFO: data_methylation_hm27.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_mutations_extended.maf: Starting validation of file
    INFO: data_mutations_extended.maf: lines [4, 5, 6, (3 more)]: column 164: Values contained in the column cbp_driver_tiers that will appear in the "Mutation Color" menu of the Oncoprint; values encountered: ['Class 2', 'Class 1', 'Class 4', '(1 more)']
    INFO: data_mutations_extended.maf: lines [7, 9]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; value encountered: 'Silent'
    INFO: data_mutations_extended.maf: Validation of file complete
    INFO: data_mutations_extended.maf: Read 35 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_patient.txt: Starting validation of file
    INFO: data_resource_patient.txt: Validation of file complete
    INFO: data_resource_patient.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_resource_study.txt: Starting validation of file
    INFO: data_resource_study.txt: Validation of file complete
    INFO: data_resource_study.txt: Read 2 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_cna_hg19.seg: Starting validation of file
    INFO: data_cna_hg19.seg: Validation of file complete
    INFO: data_cna_hg19.seg: Read 10 lines. Lines with warning: 0. Lines with error: 0

    DEBUG: data_structural_variants.txt: Starting validation of file
    INFO: data_structural_variants.txt: Validation of file complete
    INFO: data_structural_variants.txt: Read 46 lines. Lines with warning: 0. Lines with error: 0

    INFO: -: Validation complete
    Validation of study succeeded.
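
Before starting an offline run, a quick check that the portal info folder is complete can save a failed validation. The sketch below is an illustration only: the list of file names is taken from the "Reading portal information from" lines in the output above and is an assumption, since other cBioPortal versions may read a different set of files. The function name missing_portal_info_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: file names as they appear in the example output above.
EXPECTED_FILES = [
    "cancer-types.json",
    "genes.json",
    "genesets.json",
    "genesets_version.json",
    "gene-panels.json",
]


def missing_portal_info_files(portal_info_dir: str) -> List[str]:
    """Return the expected .json files that are absent from the folder."""
    folder = Path(portal_info_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

An empty result means all five files are present; it does not say anything about whether their contents are up to date with the portal.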

Example 4: generating the portal info folder

The portal information files can be generated on the server, using the dumpPortalInfo script. Go to <cbioportal_source_folder>/core/src/main/scripts, make sure the environment variables $JAVA_HOME and $PORTAL_HOME are set, and run dumpPortalInfo.pl with the name of the directory you want to create:

    export JAVA_HOME='/usr/lib/jvm/default-java'
    export PORTAL_HOME=<cbioportal_configuration_folder>
    ./dumpPortalInfo.pl /home/johndoe/my_portal_info_folder/
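
If the dump is triggered from a Python script rather than an interactive shell, the two environment variables can be set for that one call only. This is a minimal sketch under that assumption; the helper name dump_portal_info_invocation is hypothetical, and the paths passed to it are placeholders to be replaced with your own.

```python
import os
from typing import Dict, List, Tuple


def dump_portal_info_invocation(
    output_dir: str, java_home: str, portal_home: str
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for one dumpPortalInfo.pl run."""
    # Copy the current environment so the caller's own variables stay untouched.
    env = dict(os.environ)
    env["JAVA_HOME"] = java_home
    env["PORTAL_HOME"] = portal_home
    return ["./dumpPortalInfo.pl", output_dir], env
```

The returned pair can be passed to subprocess.run(command, env=env, cwd=scripts_dir, check=True), where scripts_dir is the scripts folder named above.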

Example 5: validating without portal-specific information

Alternatively, you can run the validation script with the -n/--no_portal_checks flag to entirely skip checks relating to installation-specific metadata. Be warned that files that pass this validation may still fail to load, or may load incorrectly.

    ./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -n -v

    WARNING: -: Skipping validations relating to cancer types defined in the portal
    WARNING: -: Skipping validations relating to gene identifiers and aliases defined in the portal
    WARNING: -: Skipping validations relating to gene set identifiers
    WARNING: -: Skipping validations relating to gene panel identifiers

    DEBUG: meta_cancer_type.txt: Starting validation of meta file
    INFO: meta_cancer_type.txt: Validation of meta file complete

    DEBUG: meta_clinical_patients.txt: Starting validation of meta file
    INFO: meta_clinical_patients.txt: Validation of meta file complete

    DEBUG: meta_clinical_samples.txt: Starting validation of meta file
    INFO: meta_clinical_samples.txt: Validation of meta file complete

    DEBUG: meta_cna_discrete.txt: Starting validation of meta file
    INFO: meta_cna_discrete.txt: Validation of meta file complete

    DEBUG: meta_cna_hg19_seg.txt: Starting validation of meta file
    INFO: meta_cna_hg19_seg.txt: Validation of meta file complete

    DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

    DEBUG: meta_cna_log2.txt: Starting validation of meta file
    INFO: meta_cna_log2.txt: Validation of meta file complete

    DEBUG: meta_expression_median.txt: Starting validation of meta file
    INFO: meta_expression_median.txt: Validation of meta file complete

    DEBUG: meta_expression_median_Zscores.txt: Starting validation of meta file
    INFO: meta_expression_median_Zscores.txt: Validation of meta file complete

    DEBUG: meta_fusions.txt: Starting validation of meta file
    INFO: meta_fusions.txt: Validation of meta file complete

    DEBUG: meta_gene_panel_matrix.txt: Starting validation of meta file
    INFO: meta_gene_panel_matrix.txt: Validation of meta file complete

    DEBUG: meta_gistic_genes_amp.txt: Starting validation of meta file
    INFO: meta_gistic_genes_amp.txt: Validation of meta file complete

    DEBUG: -: Retrieving chromosome lengths from '/home/sander/git/cbioportal/core/src/main/scripts/importer/chromosome_sizes.json'

    DEBUG: meta_gsva_pvalues.txt: Starting validation of meta file
    INFO: meta_gsva_pvalues.txt: Validation of meta file complete

    DEBUG: meta_gsva_scores.txt: Starting validation of meta file
    INFO: meta_gsva_scores.txt: Validation of meta file complete

    DEBUG: meta_methylation_hm27.txt: Starting validation of meta file
    INFO: meta_methylation_hm27.txt: Validation of meta file complete

    DEBUG: meta_mutational_signature.txt: Starting validation of meta file
    INFO: meta_mutational_signature.txt: Validation of meta file complete

    DEBUG: meta_mutations_extended.txt: Starting validation of meta file
    INFO: meta_mutations_extended.txt: Validation of meta file complete

    DEBUG: meta_resource_definition.txt: Starting validation of meta file
    INFO: meta_resource_definition.txt: Validation of meta file complete

    DEBUG: meta_resource_patient.txt: Starting validation of meta file
    INFO: meta_resource_patient.txt: Validation of meta file complete

    DEBUG: meta_resource_sample.txt: Starting validation of meta file
65
INFO: meta_resource_sample.txt: Validation of meta file complete
66
67
DEBUG: meta_resource_study.txt: Starting validation of meta file
68
INFO: meta_resource_study.txt: Validation of meta file complete
69
70
DEBUG: meta_structural_variants.txt: Starting validation of meta file
71
INFO: meta_structural_variants.txt: Validation of meta file complete
72
73
DEBUG: meta_study.txt: Starting validation of meta file
74
INFO: meta_study.txt: Validation of meta file complete
75
76
DEBUG: -: Study Tag file found. It will be validated.
77
78
DEBUG: meta_treatment_ec50.txt: Starting validation of meta file
79
INFO: meta_treatment_ec50.txt: Validation of meta file complete
80
81
DEBUG: meta_treatment_ic50.txt: Starting validation of meta file
82
INFO: meta_treatment_ic50.txt: Validation of meta file complete
83
84
DEBUG: data_cancer_type.txt: Starting validation of file
85
INFO: data_cancer_type.txt: Validation of file complete
86
INFO: data_cancer_type.txt: Read 1 lines. Lines with warning: 0. Lines with error: 0
87
88
DEBUG: data_clinical_samples.txt: Starting validation of file
89
INFO: data_clinical_samples.txt: Validation of file complete
90
INFO: data_clinical_samples.txt: Read 847 lines. Lines with warning: 0. Lines with error: 0
91
92
DEBUG: data_resource_definition.txt: Starting validation of file
93
INFO: data_resource_definition.txt: Validation of file complete
94
INFO: data_resource_definition.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0
95
96
DEBUG: data_resource_sample.txt: Starting validation of file
97
INFO: data_resource_sample.txt: Validation of file complete
98
INFO: data_resource_sample.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0
99
100
DEBUG: study_tags.yml: Starting validation of study tags file
101
INFO: study_tags.yml: Validation of study tags file complete.
102
103
DEBUG: -: Validating case lists
104
105
DEBUG: case_lists/cases_cnaseq.txt: Starting validation of meta file
106
INFO: case_lists/cases_cnaseq.txt: Validation of meta file complete
107
108
DEBUG: case_lists/cases_test.txt: Starting validation of meta file
109
INFO: case_lists/cases_test.txt: Validation of meta file complete
110
111
DEBUG: case_lists/cases_sequenced.txt: Starting validation of meta file
112
INFO: case_lists/cases_sequenced.txt: Validation of meta file complete
113
114
DEBUG: case_lists/cases_custom.txt: Starting validation of meta file
115
INFO: case_lists/cases_custom.txt: Validation of meta file complete
116
117
DEBUG: case_lists/cases_cna.txt: Starting validation of meta file
118
INFO: case_lists/cases_cna.txt: Validation of meta file complete
119
120
INFO: -: Validation of case list folder complete
121
122
DEBUG: data_gene_panel_matrix.txt: Starting validation of file
123
INFO: data_gene_panel_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'gistic'
124
INFO: data_gene_panel_matrix.txt: Validation of file complete
125
INFO: data_gene_panel_matrix.txt: Read 21 lines. Lines with warning: 0. Lines with error: 0
126
127
DEBUG: data_cna_discrete.txt: Starting validation of file
128
INFO: data_cna_discrete.txt: Validation of file complete
129
INFO: data_cna_discrete.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0
130
131
DEBUG: data_clinical_patients.txt: Starting validation of file
132
INFO: data_clinical_patients.txt: Validation of file complete
133
INFO: data_clinical_patients.txt: Read 845 lines. Lines with warning: 0. Lines with error: 0
134
135
DEBUG: data_expression_median.txt: Starting validation of file
136
INFO: data_expression_median.txt: Validation of file complete
137
INFO: data_expression_median.txt: Read 7 lines. Lines with warning: 0. Lines with error: 0
138
139
DEBUG: data_expression_median_Zscores.txt: Starting validation of file
140
INFO: data_expression_median_Zscores.txt: Validation of file complete
141
INFO: data_expression_median_Zscores.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0
142
143
DEBUG: data_fusions.txt: Starting validation of file
144
INFO: data_fusions.txt: Validation of file complete
145
INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0
146
147
DEBUG: data_mutational_signature.txt: Starting validation of file
148
INFO: data_mutational_signature.txt: Validation of file complete
149
INFO: data_mutational_signature.txt: Read 62 lines. Lines with warning: 0. Lines with error: 0
150
151
DEBUG: data_treatment_ec50.txt: Starting validation of file
152
INFO: data_treatment_ec50.txt: Validation of file complete
153
INFO: data_treatment_ec50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0
154
155
DEBUG: data_treatment_ic50.txt: Starting validation of file
156
INFO: data_treatment_ic50.txt: Validation of file complete
157
INFO: data_treatment_ic50.txt: Read 11 lines. Lines with warning: 0. Lines with error: 0
158
159
DEBUG: data_gistic_genes_amp.txt: Starting validation of file
160
INFO: data_gistic_genes_amp.txt: Validation of file complete
161
INFO: data_gistic_genes_amp.txt: Read 13 lines. Lines with warning: 0. Lines with error: 0
162
163
DEBUG: data_gsva_pvalues.txt: Starting validation of file
164
INFO: data_gsva_pvalues.txt: Validation of file complete
165
INFO: data_gsva_pvalues.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0
166
167
DEBUG: data_gsva_scores.txt: Starting validation of file
168
INFO: data_gsva_scores.txt: Validation of file complete
169
INFO: data_gsva_scores.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0
170
171
DEBUG: data_cna_log2.txt: Starting validation of file
172
INFO: data_cna_log2.txt: Validation of file complete
173
INFO: data_cna_log2.txt: Read 8 lines. Lines with warning: 0. Lines with error: 0
174
175
DEBUG: data_methylation_hm27.txt: Starting validation of file
176
INFO: data_methylation_hm27.txt: Validation of file complete
177
INFO: data_methylation_hm27.txt: Read 9 lines. Lines with warning: 0. Lines with error: 0
178
179
DEBUG: data_mutations_extended.maf: Starting validation of file
180
INFO: data_mutations_extended.maf: lines [4, 5, 6, (3 more)]: column 164: Values contained in the column cbp_driver_tiers that will appear in the "Mutation Color" menu of the Oncoprint; values encountered: ['Class 2', 'Class 1', 'Class 4', '(1 more)']
181
INFO: data_mutations_extended.maf: lines [7, 9]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; value encountered: 'Silent'
182
INFO: data_mutations_extended.maf: Validation of file complete
183
INFO: data_mutations_extended.maf: Read 35 lines. Lines with warning: 0. Lines with error: 0
184
185
DEBUG: data_resource_patient.txt: Starting validation of file
186
INFO: data_resource_patient.txt: Validation of file complete
187
INFO: data_resource_patient.txt: Read 4 lines. Lines with warning: 0. Lines with error: 0
188
189
DEBUG: data_resource_study.txt: Starting validation of file
190
INFO: data_resource_study.txt: Validation of file complete
191
INFO: data_resource_study.txt: Read 2 lines. Lines with warning: 0. Lines with error: 0
192
193
DEBUG: data_cna_hg19.seg: Starting validation of file
194
INFO: data_cna_hg19.seg: Validation of file complete
195
INFO: data_cna_hg19.seg: Read 10 lines. Lines with warning: 0. Lines with error: 0
196
197
DEBUG: data_structural_variants.txt: Starting validation of file
198
INFO: data_structural_variants.txt: Validation of file complete
199
INFO: data_structural_variants.txt: Read 46 lines. Lines with warning: 0. Lines with error: 0
200
201
INFO: -: Validation complete
202
Validation of study succeeded with warnings.
Copied!
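Verbose output of this length is easier to review when condensed. The sketch below is not part of the validator: it assumes the output was captured as text, counts messages per level, and collects the per-file "Read N lines" summaries. The function name summarize_log is hypothetical, and the ERROR prefix is an assumption, since this example run only shows WARNING, DEBUG and INFO.

```python
import re
from collections import Counter

# Matches lines such as:
# INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0
SUMMARY_RE = re.compile(
    r"^INFO: (?P<file>[^:]+): Read (?P<lines>\d+) lines\. "
    r"Lines with warning: (?P<warnings>\d+)\. Lines with error: (?P<errors>\d+)$"
)
# ERROR is assumed; it does not occur in the example output above.
LEVEL_RE = re.compile(r"^(DEBUG|INFO|WARNING|ERROR): ")


def summarize_log(text):
    """Count messages per log level and collect per-file line summaries."""
    levels = Counter()
    per_file = {}
    for line in text.splitlines():
        line = line.strip()
        level = LEVEL_RE.match(line)
        if level:
            levels[level.group(1)] += 1
        summary = SUMMARY_RE.match(line)
        if summary:
            per_file[summary.group("file")] = {
                "lines": int(summary.group("lines")),
                "warnings": int(summary.group("warnings")),
                "errors": int(summary.group("errors")),
            }
    return levels, per_file


sample = """\
WARNING: -: Skipping validations relating to gene panel identifiers
DEBUG: data_fusions.txt: Starting validation of file
INFO: data_fusions.txt: Validation of file complete
INFO: data_fusions.txt: Read 6 lines. Lines with warning: 0. Lines with error: 0
"""
levels, per_file = summarize_log(sample)
print(dict(levels))
print(per_file)
```

For scripted checks, the -e/--error_file option listed in the validator's help may be a better fit than parsing console output.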

Validation of non-human data

When importing a study with a reference genome other than hg19/GRCh37, specify it in the reference_genome field of the meta_study.txt file. Supported values are hg19, hg38 and mm10.
cBioPortal is gradually introducing support for mouse. If you want to load mouse studies, you have to set up your database for mouse.
As an example, the command for the mouse example using the three parameters is:
./validateData.py -s ../../../test/scripts/test_data/study_es_0/ -P ../../../../../src/main/resources/portal.properties -u http://localhost:8080 -v
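Before running the validator, it can help to confirm which reference genome a study declares. The sketch below is an illustration only and not the validator's own check. It assumes meta_study.txt consists of "key: value" lines and that a missing reference_genome field means hg19. The function names and the sample field values are hypothetical.

```python
SUPPORTED_GENOMES = {"hg19", "hg38", "mm10"}  # values listed in the text above


def read_meta(text):
    """Parse 'key: value' lines into a dict (assumed meta file layout)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def check_reference_genome(meta_text):
    """Return the declared genome; fall back to hg19 when the field is absent."""
    genome = read_meta(meta_text).get("reference_genome", "hg19")
    if genome not in SUPPORTED_GENOMES:
        raise ValueError(f"unsupported reference_genome: {genome!r}")
    return genome


# Hypothetical meta_study.txt content for a mouse study.
mouse_meta = """\
type_of_cancer: mixed
cancer_study_identifier: mouse_example
reference_genome: mm10
"""
print(check_reference_genome(mouse_meta))
```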

Running the validator for multiple studies

The importer folder <cbioportal_source_folder>/core/src/main/scripts/importer also contains a script for running the validator for multiple studies:
./validateStudies.py --help
The following parameters can be used:
usage: validateStudies.py [-h] [-d ROOT_DIRECTORY] [-l LIST_OF_STUDIES]
                          [-html HTML_FOLDER]
                          [-u URL_SERVER | -p PORTAL_INFO_DIR | -n]
                          [-P PORTAL_PROPERTIES] [-m]

Wrapper where cBioPortal study validator is run for multiple studies

optional arguments:
  -h, --help            show this help message and exit
  -d ROOT_DIRECTORY, --root-directory ROOT_DIRECTORY
                        Path to directory with all studies that should be
                        validated
  -l LIST_OF_STUDIES, --list-of-studies LIST_OF_STUDIES
                        List with paths of studies which should be validated
  -html HTML_FOLDER, --html-folder HTML_FOLDER
                        Path to folder for output HTML reports
  -u URL_SERVER, --url_server URL_SERVER
                        URL to cBioPortal server. You can set this if your URL
                        is not http://localhost:8080
  -p PORTAL_INFO_DIR, --portal_info_dir PORTAL_INFO_DIR
                        Path to a directory of cBioPortal info files to be
                        used instead of contacting a server
  -n, --no_portal_checks
                        Skip tests requiring information from the cBioPortal
                        installation
  -m, --strict_maf_checks
                        Option to enable strict mode for validator when
                        validating mutation data
The parameters --url_server, --portal_info_dir, --no_portal_checks and --portal_properties behave the same as the parameters with the same names in validateData.py. The script saves a log file with the validation output (log-validate-studies.txt) and prints the validation status of each input study:
=== Validating study ../../../test/scripts/test_data/study_es_0
Result: VALID (WITH WARNINGS)

=== Validating study ../../../test/scripts/test_data/study_es_1
Result: INVALID

=== Validating study ../../../test/scripts/test_data/study_es_invalid
directory cannot be found: ../../../test/scripts/test_data/study_es_invalid
Result: INVALID (PROBLEMS OCCURRED)
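When this status output feeds another script, it can be reduced to a mapping from study path to result. The following is a minimal sketch that assumes the output keeps the "=== Validating study" and "Result:" layout shown above. The function name parse_study_results is hypothetical.

```python
STUDY_PREFIX = "=== Validating study "
RESULT_PREFIX = "Result: "


def parse_study_results(text):
    """Map each study path to the result string printed for it."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(STUDY_PREFIX):
            current = line[len(STUDY_PREFIX):]
        elif line.startswith(RESULT_PREFIX) and current is not None:
            results[current] = line[len(RESULT_PREFIX):]
            current = None
        # Any other line (for example "directory cannot be found: ...") is skipped.
    return results


output = """\
=== Validating study ../../../test/scripts/test_data/study_es_0
Result: VALID (WITH WARNINGS)

=== Validating study ../../../test/scripts/test_data/study_es_invalid
directory cannot be found: ../../../test/scripts/test_data/study_es_invalid
Result: INVALID (PROBLEMS OCCURRED)
"""
results = parse_study_results(output)
for study, result in results.items():
    print(study, "->", result)
```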

Example 1: Root directory parameter

Validation can be run for all studies in a certain directory by using the --root-directory parameter. The script will append each folder in the root directory to the study list to validate:
./validateStudies.py -d ../../../test/scripts/test_data/

Example 2: List of studies parameter

Validation can also be run for specific studies by using the --list-of-studies parameter. The paths to the studies are separated by commas:
./validateStudies.py -l ../../../test/scripts/test_data/study_es_0,../../../test/scripts/test_data/study_es_1

Example 3: Combination root directory and list of studies parameter

Validation can also be run on specific studies in a certain directory by combining the --root-directory and --list-of-studies parameters:
./validateStudies.py -d ../../../test/scripts/test_data/ -l study_es_0,study_es_1
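The three ways of selecting studies in Examples 1 to 3 can be summarized in a few lines. This is a sketch of the behaviour as described in this section, not the script's actual code. The function name resolve_study_paths is hypothetical, and sorting the folder names is an added choice for predictable output.

```python
import os


def resolve_study_paths(root_directory=None, list_of_studies=None):
    """Return the study directories implied by the -d and -l values."""
    studies = []
    if list_of_studies:
        # -l takes a comma-separated list of paths or folder names.
        studies = [s.strip() for s in list_of_studies.split(",") if s.strip()]
    if root_directory and studies:
        # Example 3: listed names are looked up inside the root directory.
        return [os.path.join(root_directory, s) for s in studies]
    if root_directory:
        # Example 1: every folder in the root directory is a study.
        return sorted(
            os.path.join(root_directory, name)
            for name in os.listdir(root_directory)
            if os.path.isdir(os.path.join(root_directory, name))
        )
    # Example 2: the listed paths are used as given.
    return studies


combined = resolve_study_paths("../../../test/scripts/test_data/", "study_es_0,study_es_1")
print(combined)
```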

Example 4: HTML folder parameter

To create an HTML validation report for each study, specify an output folder with the --html-folder parameter. The folder does not have to exist; the script can create it. Each report is named <study_name>-validation.html:
./validateStudies.py -d ../../../test/scripts/test_data/ -l study_es_0,study_es_1 -html ../../../test/scripts/test_data/validation-reports
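To locate a report afterwards, the naming rule can be expressed as a small helper. This sketch assumes that <study_name> is the last component of the study directory path, which the text does not state explicitly. The function name html_report_path is hypothetical.

```python
import os


def html_report_path(html_folder, study_directory):
    """Build the expected report path: <html_folder>/<study_name>-validation.html."""
    # normpath drops a trailing slash so basename returns the folder name.
    study_name = os.path.basename(os.path.normpath(study_directory))
    return os.path.join(html_folder, f"{study_name}-validation.html")


report = html_report_path(
    "../../../test/scripts/test_data/validation-reports",
    "../../../test/scripts/test_data/study_es_0/",
)
print(report)
```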