# Onco Query Language (OQL)

# Introduction to OQL

The Onco Query Language (OQL) is used to define which types of alterations are included in a query on the cBioPortal for Cancer Genomics. By default, querying for a gene includes mutations, fusions, amplifications and deep deletions. OQL can be used to select specific mutations (e.g. BRAF V600E) or types of mutations (e.g. BRCA1 truncating mutations), lower-level copy number alterations (e.g. CDKN2A shallow deletions), changes in mRNA or protein expression, and more.

OQL-specified alterations will be reflected on most tabs, including OncoPrint, but are not currently reflected on the Plots, Co-Expression or Expression tabs.

Note that OQL assumes any word that it doesn't recognize is a mutation code.

Additional explanation and examples using OQL are available in the OQL tutorial.

# OQL Keywords

Users can define specific subsets of genetic alterations for five data types:

| Data Type | Keywords and Syntax | Description | Default* |
|---|---|---|---|
| Mutations | MUT | All non-synonymous mutations | MUT |
| | MUT = <protein change> | Specific amino acid changes (e.g. V600E or V600) | |
| | MUT = <mutation type> | Acceptable values are: MISSENSE, NONSENSE, NONSTART, NONSTOP, FRAMESHIFT, INFRAME, SPLICE, TRUNC | |
| Fusions | FUSION | All fusions (note that many studies lack fusion data) | FUSION |
| Copy Number Alterations | AMP | Amplifications | AMP HOMDEL |
| | HOMDEL | Deep Deletions | |
| | GAIN | Gains | |
| | HETLOSS | Shallow Deletions | |
| | | Comparison operators can also be used with CNA (e.g. CNA >= GAIN is the same as AMP GAIN) | |
| mRNA Expression | EXP < -x | mRNA expression is less than x standard deviations (SD) below the mean | EXP >= 2 EXP <= -2 |
| | EXP > x | mRNA expression is greater than x SD above the mean | |
| | | The comparison operators <= and >= also work | |
| Protein/phosphoprotein level | PROT < -x | Protein expression is less than x standard deviations (SD) below the mean | PROT >= 2 PROT <= -2 |
| | PROT > x | Protein expression is greater than x SD above the mean | |
| | | The comparison operators <= and >= also work | |

* These are the default OQL keywords used for each data type when a gene is queried without any explicit OQL.

# OQL Modifiers

Mutations, fusions and copy number alterations can be further refined using modifiers:

| Keyword | Applicable Data Type | Explanation |
|---|---|---|
| DRIVER | Mutations, Fusions, Copy Number Alterations | Include only mutations, fusions and copy number alterations which are driver events, as defined in OncoPrint (default: OncoKB and CancerHotspots). |
| GERMLINE | Mutations | Include only mutations that are defined as germline events by the study. |
| SOMATIC | Mutations | Include all mutations that are not defined as germline. |
| (a-b) (protein position range) | Mutations | Include all mutations that overlap with the protein position range a-b, where a and b are integers. If you add a * (i.e. (a-b*)) then it will only include those mutations that are fully contained inside a-b. The open-ended ranges (a-) and (-b) are also allowed. |
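
The difference between the overlapping form (a-b) and the fully contained form (a-b*) can be sketched in Python. This is an illustration only, not the portal's implementation: the function name `in_position_range` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def in_position_range(mut_start, mut_end, a=None, b=None, fully_contained=False):
    """Check a mutation spanning protein positions mut_start..mut_end
    against a range modifier.

    a=None models the open-ended form (-b); b=None models (a-).
    fully_contained=True models the starred form (a-b*).
    Inclusive bounds are an assumption of this sketch.
    """
    lo = a if a is not None else float("-inf")
    hi = b if b is not None else float("inf")
    if fully_contained:
        # The whole mutation must lie inside the range.
        return lo <= mut_start and mut_end <= hi
    # Any overlap with the range is enough.
    return mut_start <= hi and mut_end >= lo
```

For example, a deletion covering positions 95-105 overlaps the range (100-200) but is not fully contained in it, so under these assumptions it would match (100-200) and not (100-200*).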

# Basic Usage

When querying a gene without providing any OQL specifications, cBioPortal will default to these OQL terms for a query with Mutation and Copy Number selected in the Genomic Profiles section: MUT FUSION AMP HOMDEL

image of basic query

You can see the OQL terms applied by hovering over the gene name in OncoPrint:

image of basic query oncoprint

If you select RNA and/or Protein in the "Genomic Profiles" section of the query, the default settings are:

RNA: EXP >= 2 EXP <= -2

Protein: PROT >= 2 PROT <= -2

image of exp prot query oncoprint

You must select the relevant Genomic Profile in order for OQL to query that data type. For example, you can't add EXP > 2 to the query without also selecting an RNA profile.
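
As a rough illustration of how the defaults combine, the sketch below assembles the default OQL line for a gene from a list of selected data types. The dictionary keys and the function name `default_query` are hypothetical labels chosen for this example; only the OQL terms themselves come from the keyword table above.

```python
# Default OQL terms per data type, taken from the keyword table.
# The dictionary keys are hypothetical labels, not portal identifiers.
DEFAULT_OQL = {
    "mutations": "MUT",
    "fusions": "FUSION",
    "cna": "AMP HOMDEL",
    "mrna": "EXP >= 2 EXP <= -2",
    "protein": "PROT >= 2 PROT <= -2",
}


def default_query(gene, data_types):
    """Build the OQL line that would apply to a gene when no explicit
    OQL is given, for the listed data types."""
    terms = [DEFAULT_OQL[data_type] for data_type in data_types]
    return f"{gene}: {' '.join(terms)}"


print(default_query("TP53", ["mutations", "fusions", "cna"]))
```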

Proper formatting for OQL is straightforward: gene name, followed by a colon, followed by any OQL keywords and ending in a semicolon, an end-of-line, or both.

GENE1: OQL KEYWORDS;
GENE2: OQL KEYWORDS

In general, any combination of OQL keywords and/or expressions can annotate any gene, and the order of the keywords is immaterial.
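
The line format described above can be illustrated with a small Python sketch that splits OQL text into genes and whitespace-separated tokens. It is a deliberately naive reading of the format, not the portal's parser: `parse_oql` is a hypothetical name, and the sketch does not handle the DATATYPES command or merged gene tracks described later.

```python
def parse_oql(text):
    """Split OQL text into {gene: [token, ...]}.

    Statements end at a semicolon, an end-of-line, or both.
    A gene without a colon gets an empty token list.
    """
    result = {}
    # Treat ';' and newline as equivalent statement terminators.
    for statement in text.replace(";", "\n").splitlines():
        statement = statement.strip()
        if not statement:
            continue
        gene, _, keywords = statement.partition(":")
        result[gene.strip()] = keywords.split()
    return result


parsed = parse_oql("BRAF: MUT = V600E;\nTP53: TRUNC INFRAME")
print(parsed)
```

Because keyword order is immaterial, comparing the token lists as sets (for example `set(parsed["TP53"])`) is one way to check whether two simple keyword lists mean the same thing.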

Below we will go into greater detail about each data type.

# Mutations

To view cases with specific mutations, provide the specific amino acid change of interest:

BRAF: MUT = V600E

You can also view all mutations at a particular position:

BRAF: MUT = V600

Or all mutations of a specific type:

TP53: MUT = <mutation type>

<mutation type> can be one or more of:

  • MISSENSE
  • NONSENSE
  • NONSTART
  • NONSTOP
  • FRAMESHIFT
  • INFRAME
  • SPLICE
  • TRUNC

For example, to view TP53 truncating mutations and in-frame insertions/deletions:

TP53: MUT = TRUNC INFRAME

OQL for mutations can also be written without MUT =. In each pair below, the two lines are equivalent:

BRAF: MUT = V600E
BRAF: V600E
TP53: MUT = TRUNC INFRAME
TP53: TRUNC INFRAME

OQL can also be used to exclude a specific protein change, position or type of mutation. For example, the following queries return all EGFR mutations except T790M, all BRAF mutations except those at V600, and all TP53 mutations except missense:

EGFR: MUT != T790M
BRAF: MUT != V600
TP53: MUT != MISSENSE

Note that exclusion only works for a single event. Because OQL combines terms with 'OR' logic, excluding multiple mutations, or excluding one mutation while including another (e.g. BRAF: MUT=V600 MUT!=V600E), results in a query for all mutations: any mutation that one term excludes is still matched by another term.
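
The effect of 'OR' logic on BRAF: MUT=V600 MUT!=V600E can be reproduced with a short Python sketch. The helper names and the list of protein changes are illustrative inputs chosen for this example, not data from any study, and matching a position by string prefix is a simplification.

```python
def at_v600(change):
    """Models MUT = V600 (any change at position 600); prefix match is a simplification."""
    return change.startswith("V600")


def not_v600e(change):
    """Models MUT != V600E."""
    return change != "V600E"


def matches_any(change, predicates):
    """OR logic: a mutation is included if any single term matches it."""
    return any(predicate(change) for predicate in predicates)


changes = ["V600E", "V600K", "G469A"]  # illustrative inputs only
selected = [c for c in changes if matches_any(c, [at_v600, not_v600e])]
print(selected)
```

V600E is admitted by the first term and everything else by the second, so no mutation is filtered out.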

# Copy Number Alterations

To view cases with specific copy number alterations, provide the appropriate keywords for the copy number alterations of interest. For example, to see amplifications:

CCNE1: AMP

Or amplified and gained cases:

CCNE1: CNA >= GAIN

Which can also be written as:

CCNE1: GAIN AMP
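
One way to picture why CNA >= GAIN and GAIN AMP are the same query is to place the copy number keywords on an ordered scale. The sketch below assumes the ordering HOMDEL < HETLOSS < unaltered < GAIN < AMP; the `DIPLOID` placeholder and the function `expand_cna` are hypothetical names for this example. Only the CNA >= GAIN case is confirmed by the text above, so the other operators show how the same idea would extend.

```python
import operator

# Assumed ordering of copy number levels, lowest to highest.
# "DIPLOID" is a placeholder for the unaltered state, not an OQL keyword.
CNA_LEVELS = ["HOMDEL", "HETLOSS", "DIPLOID", "GAIN", "AMP"]

OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def expand_cna(op, level):
    """Expand a comparison such as ('>=', 'GAIN') into explicit keywords."""
    pivot = CNA_LEVELS.index(level)
    compare = OPERATORS[op]
    return [
        name
        for index, name in enumerate(CNA_LEVELS)
        if compare(index, pivot) and name != "DIPLOID"
    ]


print(expand_cna(">=", "GAIN"))
```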

# Expression

High or low mRNA expression of a gene is determined by the number of standard deviations (SD) from the mean. For example, to see cases where mRNA for CCNE1 is greater than 3 SD above the mean:

CCNE1: EXP > 3
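
To make the threshold concrete, the following Python sketch computes plain z-scores over a set of expression values and keeps the samples above a cutoff. It is an assumption-laden illustration: the portal works from its own expression z-score profiles, whose reference population may differ, and the function name `high_expression` and the sample identifiers are hypothetical.

```python
from statistics import mean, pstdev


def high_expression(values, threshold):
    """Return the sample ids whose z-score exceeds the threshold.

    values: {sample_id: expression value}
    The z-score is computed against the mean and population standard
    deviation of the supplied values (an assumption of this sketch).
    """
    mu = mean(values.values())
    sd = pstdev(values.values())
    if sd == 0:
        return []  # no variation, so nothing can be above the mean
    return [sample for sample, value in values.items() if (value - mu) / sd > threshold]


toy_values = {"s1": 1, "s2": 1, "s3": 1, "s4": 1, "s5": 10}  # made-up numbers
print(high_expression(toy_values, 1.5))
```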

# Protein

High or low protein expression is similarly determined by the number of SD from the mean. For example, to see cases where protein expression is greater than 2 SD above the mean:

EGFR: PROT > 2

Protein expression can also be queried at the phospho-protein level:

EGFR_PY992: PROT > 2

# Modifiers

Modifiers can be used on their own or in combination with other OQL terms for mutations, fusions and copy number alterations to further refine the query. Modifiers can be combined with other OQL terms using an underscore. The order in which terms are combined is immaterial.

# Driver

The DRIVER modifier applies to mutations, fusions and copy number alterations. The definition of what qualifies as a driver alteration comes from the "Mutation Color" menu in OncoPrint. By default, drivers are defined as mutations, fusions and copy number alterations in OncoKB or CancerHotspots.

On its own, the DRIVER modifier includes driver mutations, fusions and copy number alterations:

EGFR: DRIVER

Or it can be used in combination with another OQL term. For example, to see only driver fusion events:

EGFR: FUSION_DRIVER

Or driver missense mutations:

EGFR: MUT = MISSENSE_DRIVER

When combining DRIVER with another OQL term, the order doesn't matter: MUT_DRIVER and DRIVER_MUT are equivalent. DRIVER can be combined with:

  • MUT
  • MUT = <mutation type> or MUT = <protein change>
  • FUSION
  • CNA
  • AMP or GAIN or HETLOSS or HOMDEL
  • GERMLINE or SOMATIC (see below)

# Germline/Somatic

The GERMLINE and SOMATIC modifiers only apply to mutations. A mutation can be explicitly defined as germline during the data curation process. Note that very few studies on the public cBioPortal contain germline data.

GERMLINE or SOMATIC can be combined with:

  • MUT
  • MUT = <mutation type> or MUT = <protein change>
  • DRIVER

To see all germline BRCA1 mutations:

BRCA1: GERMLINE

Or to see only truncating germline mutations:

BRCA1: TRUNC_GERMLINE
BRCA1: GERMLINE_TRUNC

The order is immaterial; both options produce identical results.

Or to see somatic missense mutations:

BRCA1: MUT = MISSENSE_SOMATIC

GERMLINE or SOMATIC can also be combined with DRIVER and, optionally, a more specific mutation term (e.g. NONSENSE):

BRCA1: NONSENSE_GERMLINE_DRIVER
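
Since the parts of an underscore-combined term can appear in any order, one simple way to compare such terms is to treat each as an unordered set of parts. The Python sketch below does this; `normalize_term` is a hypothetical helper, and it applies only to the term itself, not to gene names that happen to contain an underscore (such as EGFR_PY992).

```python
def normalize_term(term):
    """Split an underscore-combined OQL term into an unordered set of parts."""
    return frozenset(term.split("_"))


# Both spellings reduce to the same set of parts.
print(normalize_term("TRUNC_GERMLINE") == normalize_term("GERMLINE_TRUNC"))
```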

# The DATATYPES Command

To save copying and pasting, the DATATYPES command sets the genetic annotation for all subsequent genes. Thus,

DATATYPES: AMP GAIN HOMDEL EXP > 1.5 EXP < -1.5; CDKN2A MDM2 TP53

is equivalent to:

CDKN2A: AMP GAIN HOMDEL EXP > 1.5 EXP < -1.5
MDM2: AMP GAIN HOMDEL EXP > 1.5 EXP < -1.5
TP53: AMP GAIN HOMDEL EXP > 1.5 EXP < -1.5
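
The expansion shown above can be mimicked in a few lines of Python. This is a sketch of the equivalence only, under the assumption that the command has the single-line shape used in the example; `expand_datatypes` is a hypothetical name.

```python
def expand_datatypes(query):
    """Expand 'DATATYPES: <terms>; GENE1 GENE2 ...' into one OQL line per gene."""
    header, _, genes = query.partition(";")
    keyword, _, terms = header.partition(":")
    if keyword.strip() != "DATATYPES":
        raise ValueError("expected a DATATYPES command")
    terms = " ".join(terms.split())  # normalize whitespace
    return [f"{gene}: {terms}" for gene in genes.split()]


for line in expand_datatypes(
    "DATATYPES: AMP GAIN HOMDEL EXP > 1.5 EXP < -1.5; CDKN2A MDM2 TP53"
):
    print(line)
```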

# Merged Gene Tracks

OQL can be used to create a merged gene track in OncoPrint, in which alterations in multiple genes appear as a single track. This is done by enclosing a list of genes in square brackets. By default, the track will be labeled by the gene names, separated by '/'. To instead specify a label, type the desired label within double quotes at the beginning of the square brackets. For example:

["CDK INHIBITORS" CDKN2A CDKN2B]
[MDM2 MDM4]

The resulting merged gene track will be visible in OncoPrint and can be expanded to view the individual gene tracks. For example:

Image of merged genes in OncoPrint

https://www.cbioportal.org/results/oncoprint?session_id=5c1966e2e4b05228701f958e

It is possible to include OQL for specific alterations in merged gene tracks, and to query a combination of single and merged gene tracks.

Note that merged gene tracks only appear in OncoPrint. All other pages show the individual genes.
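
The labeling rule for merged tracks can be sketched in Python as follows. The function `merged_track_label` is hypothetical, it handles only plain gene lists (no per-gene OQL inside the brackets), and joining with a bare '/' without surrounding spaces is an assumption about the exact label text.

```python
import re


def merged_track_label(track):
    """Return the label for a merged gene track written in square brackets.

    A leading double-quoted string is used as the label; otherwise the
    gene names are joined with '/'.
    """
    inner = track.strip()[1:-1].strip()  # drop the surrounding brackets
    quoted = re.match(r'"([^"]*)"', inner)
    if quoted:
        return quoted.group(1)
    return "/".join(inner.split())


print(merged_track_label('["CDK INHIBITORS" CDKN2A CDKN2B]'))
print(merged_track_label("[MDM2 MDM4]"))
```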

# Example: RB Pathway Alterations

The example below illustrates how OQL can refine a query. Additional examples are available in the OQL tutorial.

# Using the Defaults

Select Ovarian Serous Cystadenocarcinoma (TCGA, Nature 2011) with the following data types:

  • Mutations
  • Putative copy-number alterations (GISTIC)
  • mRNA expression (mRNA expression Z-scores (all genes))

Input the following three genes in the RB pathway:

  • CCNE1
  • RB1
  • CDKN2A

image of rb query

Submit this query and note how many samples have alterations in more than one of these genes:

image of rb oncoprint

https://www.cbioportal.org/results/oncoprint?session_id=5c1966cee4b05228701f958d

# Greater Insight with OQL

Given what is known about the RB pathway, the events that are most likely selected for in the tumors are CCNE1 amplification, RB1 deletions or mutations, and loss of expression of CDKN2A. To investigate this hypothesis, we can use OQL to display only these events. Modify the query to reflect this:

CCNE1: AMP MUT
RB1: HOMDEL MUT
CDKN2A: HOMDEL EXP < -1

Examine the updated OncoPrint:

image of modified rb oncoprint

https://www.cbioportal.org/results/oncoprint?session_id=5c1966aee4b05228701f958c

This shows that alterations in these genes are almost entirely mutually exclusive: no cases are altered in all three genes and only six are altered in two genes. This supports the hypothesis that the tumors have selected for these events.
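
The kind of tally behind this observation (how many samples are altered in one, two or three of the queried genes) can be sketched in Python. The function name `genes_altered_per_sample` and the sample identifiers below are made up for illustration and are not taken from the study.

```python
from collections import Counter


def genes_altered_per_sample(altered):
    """Count samples by the number of queried genes altered in them.

    altered: {gene: set of sample ids with an alteration in that gene}
    Returns a Counter mapping number-of-genes-altered to number-of-samples.
    """
    per_sample = Counter()
    for samples in altered.values():
        per_sample.update(samples)  # one count per gene per sample
    return Counter(per_sample.values())


# Toy input with hypothetical sample ids.
toy = {
    "CCNE1": {"s1", "s2"},
    "RB1": {"s2", "s3"},
    "CDKN2A": {"s4"},
}
print(genes_altered_per_sample(toy))
```

In a mostly mutually exclusive pattern, nearly all altered samples fall in the bucket for a single gene.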

# Questions? Feedback?

Please share any questions or feedback on OQL with us: https://groups.google.com/group/cbioportal

Also note that additional explanation and examples using OQL are available in the OQL tutorial.