Energy Parameters
Modified Bases
The functions vrna_sc_mod(), vrna_sc_mod_json(), and alike implement an energy
correction framework to account for modified bases in secondary structure
predictions. To supply these functions with the energy parameters and general
specifications of the base modification, the following JSON data format may be
used:
JSON data must consist of a header section modified_base. This header is an
object with the mandatory keys:
name, specifying a name of the modified base,
unmodified, which consists of a single upper-case letter for the unmodified version of this base,
one_letter_code, which specifies the letter used for the modified base in the subsequent energy parameters, and
pairing_partners, an array of the nucleotides that may pair with the modified base.
The pairing partners must be upper-case characters. An optional sources key may
contain an array of related publications, e.g. those the parameters have been
derived from.
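The header rules above can be checked before a file is handed to the library. The following is a minimal sketch in plain Python, not part of ViennaRNA: the function name check_header is hypothetical, the header key modified_base is taken from the JSON examples on this page, and the single-character check on one_letter_code is an assumption based on the key's name.

```python
import json

REQUIRED_KEYS = ("name", "unmodified", "one_letter_code", "pairing_partners")


def _is_upper_letter(value):
    """True for a one-character upper-case string such as "G"."""
    return isinstance(value, str) and len(value) == 1 and value.isupper()


def check_header(json_text):
    """Return a list of problems found in the header (empty if none).

    Hypothetical helper: it only mirrors the rules stated in the text.
    """
    header = json.loads(json_text).get("modified_base")
    if not isinstance(header, dict):
        return ["missing 'modified_base' object"]

    problems = [f"missing key '{k}'" for k in REQUIRED_KEYS if k not in header]

    if "unmodified" in header and not _is_upper_letter(header["unmodified"]):
        problems.append("'unmodified' must be a single upper-case letter")

    # Assumption: the one-letter code is exactly one character.
    code = header.get("one_letter_code")
    if "one_letter_code" in header and not (isinstance(code, str) and len(code) == 1):
        problems.append("'one_letter_code' must be a single character")

    partners = header.get("pairing_partners", [])
    if not isinstance(partners, list) or not all(_is_upper_letter(p) for p in partners):
        problems.append("'pairing_partners' must be an array of upper-case letters")

    return problems
```

An empty list means that the header satisfies the rules listed above. The optional sources key is deliberately not inspected.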
Next to the header may follow additional keys to specify the actual energy
contributions of the modified base in various loop contexts. All energy
contributions must be specified as free energies \(\Delta G\) in units of
\(\text{kcal} \cdot \text{mol}^{-1}\). To allow for rescaling of the free
energies at temperatures that differ from the default (\(37^\circ\text{C}\)),
enthalpy parameters \(\Delta H\) may be specified as well. Those, however,
are optional. The keys for free energy (at \(37^\circ\text{C}\)) and enthalpy
parameters have the suffixes _energies and _enthalpies, respectively.
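To illustrate why enthalpies matter for other temperatures, the sketch below applies the standard thermodynamic relation \(\Delta G(T) = \Delta H - T \cdot (\Delta H - \Delta G_{37}) / T_{37}\), with temperatures in Kelvin and \(\Delta H\) and \(\Delta S\) taken as temperature independent. That the library rescales in exactly this way is an assumption, and the function name rescale_free_energy is hypothetical.

```python
T37_KELVIN = 37.0 + 273.15  # reference temperature of the *_energies tables


def rescale_free_energy(dG37, dH, temperature_celsius):
    """Free energy (kcal/mol) at another temperature.

    Assumes dH and dS do not depend on temperature, so that
    dS = (dH - dG37) / T37 and dG(T) = dH - T * dS.
    """
    t_kelvin = temperature_celsius + 273.15
    dS = (dH - dG37) / T37_KELVIN  # entropy in kcal/(mol*K)
    return dH - t_kelvin * dS


# Example with a free energy of -1.2 and an enthalpy of -11.1 kcal/mol.
print(round(rescale_free_energy(-1.2, -11.1, 25.0), 2))  # -1.58
```

At \(37^\circ\text{C}\) the function returns the tabulated free energy unchanged. Without an enthalpy value no such rescaling is possible for that parameter.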
The parser and underlying framework currently support the following loop contexts:
base pair stacks (via the stacking key prefix). This key must point to an object with one key-value pair for each stacking interaction for which data is provided. Here, the key consists of four upper-case characters denoting the interacting bases, where the first two represent one strand in 5’ to 3’ direction and the last two the opposite strand in 3’ to 5’ direction. The values are energies in \(\text{kcal} \cdot \text{mol}^{-1}\).
terminal mismatches (via the mismatch key prefix). This key points to an object with key-value pairs for each mismatch energy parameter that is available. Keys are four-character strings of nucleotide one-letter codes, as used for base pair stacks above. The second and fourth characters denote the two unpaired mismatching bases, while the other two represent the closing base pair.
dangling ends (via the dangle5 and dangle3 key prefixes). The object behind these keys, again, consists of key-value pairs for each dangling end energy parameter. Keys are three characters long, where the first two represent the two nucleotides that form the base pair, and the third is the unpaired base that stacks on either the 3’ or 5’ end of the enclosed part of the base pair.
terminal pairs (via the terminal key prefix). Terminal base pairs, such as AU or GU, sometimes receive an additional energy penalty. The object behind this key may list energy parameters to apply whenever particular base pairs occur at the end of a helix. Each of those parameters is specified as a key-value pair, where the key consists of two upper-case characters denoting the terminal base pair.
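The key layouts described above can be made concrete with a small decoder. This is a sketch of one reading of the text, not library code: the function decode_key and the names of the returned fields are hypothetical. For stacking keys it assumes that characters at the same offset in the two halves face each other, which follows from the first half being written 5’ to 3’ and the second half 3’ to 5’.

```python
def _expect_length(key, length):
    if len(key) != length:
        raise ValueError(f"key {key!r} must have {length} characters")


def decode_key(context, key):
    """Split a parameter key into the roles described for its loop context."""
    if context == "stacking":
        _expect_length(key, 4)
        # key[0:2] is one strand 5'->3', key[2:4] the opposite strand 3'->5',
        # so key[0] faces key[2] and key[1] faces key[3].
        return {"pairs": [(key[0], key[2]), (key[1], key[3])]}
    if context == "mismatch":
        _expect_length(key, 4)
        # First and third characters close the loop, second and fourth are unpaired.
        return {"closing_pair": (key[0], key[2]), "unpaired": (key[1], key[3])}
    if context in ("dangle5", "dangle3"):
        _expect_length(key, 3)
        return {"pair": (key[0], key[1]), "unpaired": key[2]}
    if context == "terminal":
        _expect_length(key, 2)
        return {"pair": (key[0], key[1])}
    raise ValueError(f"unknown loop context {context!r}")


print(decode_key("stacking", "APUA"))  # {'pairs': [('A', 'U'), ('P', 'A')]}
print(decode_key("mismatch", "AGUM"))  # {'closing_pair': ('A', 'U'), 'unpaired': ('G', 'M')}
```

Under this reading, a stacking key such as APUA describes an A-U pair stacked on a P-A pair, which is consistent with a modified base P whose only pairing partner is A.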
Below is a JSON template specifying most of the possible input
parameters. Actual energy parameter files can be found in the
source code tarball within the misc/ subdirectory.
{
  "modified_base" : {
    "name" : "My modification (M)",
    "sources" : [
      {
        "authors" : "Author 1, Author 2",
        "title" : "UV-melting of modified oligos",
        "journal" : "Some journal",
        "year" : 2022,
        "doi" : "10.0000/000000"
      }
    ],
    "unmodified" : "G",
    "pairing_partners" : [
      "U","A"
    ],
    "one_letter_code" : "M",
    "fallback" : "G",
    "stacking_energies" : {
      "MAUU" : -1.2,
      "AGMC" : -2.73
    },
    "stacking_enthalpies" : {
      "MAUU" : -11.1,
      "AGMC" : -9.73
    },
    "terminal_energies" : {
      "MU" : 0.5,
      "UM" : 0.5
    },
    "terminal_enthalpies" : {
      "MU" : 2.0,
      "UM" : 2.0
    },
    "mismatch_energies" : {
      "CMGM" : -1.11,
      "AGUM" : -0.73
    },
    "mismatch_enthalpies" : {
      "CMGM" : -11.11,
      "AGUM" : -7.73
    },
    "dangle5_energies" : {
      "UAM" : -1.01
    },
    "dangle5_enthalpies" : {
      "UAM" : -6.01
    },
    "dangle3_energies" : {
      "CGM" : -2.1,
      "GCM" : -1.3
    }
  }
}
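As a sketch of how the _energies and _enthalpies suffixes pair up, the following plain-Python snippet groups the parameter tables of such a file by loop context and reports contexts that have free energies but no enthalpies. The helper names collect_tables and missing_enthalpies are hypothetical, and the snippet follows the template in looking for the tables inside the modified_base object.

```python
import json

SUFFIXES = {"_energies": "dG37", "_enthalpies": "dH"}


def collect_tables(json_text):
    """Group parameter tables as {context: {"dG37": {...}, "dH": {...}}}."""
    entry = json.loads(json_text)["modified_base"]
    tables = {}
    for key, value in entry.items():
        for suffix, kind in SUFFIXES.items():
            if key.endswith(suffix) and isinstance(value, dict):
                tables.setdefault(key[: -len(suffix)], {})[kind] = value
    return tables


def missing_enthalpies(tables):
    """Contexts that provide free energies but no enthalpies."""
    return sorted(c for c, t in tables.items() if "dG37" in t and "dH" not in t)


# Excerpt of the template above, reduced to two loop contexts.
TEMPLATE_EXCERPT = """
{ "modified_base": {
    "name": "My modification (M)",
    "unmodified": "G",
    "pairing_partners": ["U", "A"],
    "one_letter_code": "M",
    "stacking_energies":   { "MAUU": -1.2,  "AGMC": -2.73 },
    "stacking_enthalpies": { "MAUU": -11.1, "AGMC": -9.73 },
    "dangle3_energies":    { "CGM": -2.1,   "GCM": -1.3 }
} }
"""

tables = collect_tables(TEMPLATE_EXCERPT)
print(sorted(tables))              # ['dangle3', 'stacking']
print(missing_enthalpies(tables))  # ['dangle3']
```

In the template, dangle3 is the one context without an enthalpy table, which is permitted because enthalpies are optional.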
A real-world example may look like this:
{
  "modified_base" : {
    "name" : "Pseudouridine",
    "sources" : [
      {
        "authors": "Graham A. Hudson, Richard J. Bloomingdale, and Brent M. Znosko",
        "title" : "Thermodynamic contribution and nearest-neighbor parameters of pseudouridine-adenosine base pairs in oligoribonucleotides",
        "journal" : "RNA 19:1474-1482",
        "year" : 2013,
        "doi" : "10.1261/rna.039610.113"
      }
    ],
    "unmodified" : "U",
    "pairing_partners" : [
      "A"
    ],
    "one_letter_code" : "P",
    "fallback" : "U",
    "stacking_energies" : {
      "APUA" : -2.8,
      "CPGA" : -2.77,
      "GPCA" : -3.29,
      "UPAA" : -1.62,
      "PAAU" : -2.10,
      "PCAG" : -2.49,
      "PGAC" : -2.2,
      "PUAA" : -2.74
    },
    "stacking_enthalpies" : {
      "APUA" : -22.08,
      "CPGA" : -16.23,
      "GPCA" : -24.07,
      "UPAA" : -20.81,
      "PAAU" : -12.47,
      "PCAG" : -17.29,
      "PGAC" : -11.19,
      "PUAA" : -26.94
    },
    "terminal_energies" : {
      "PA" : 0.31,
      "AP" : 0.31
    },
    "terminal_enthalpies" : {
      "PA" : -2.04,
      "AP" : -2.04
    },
    "duplexes" : {
      "CGAPACGGCUAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -9.93,
        "dG37_p" : -10.12
      },
      "CGCPACGGCGAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -10.96,
        "dG37_p" : -11.17
      },
      "CGGPACGGCCAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -11.71,
        "dG37_p" : -11.53
      },
      "CGUPACGGCAAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -9.10,
        "dG37_p" : -8.83
      },
      "CGAPCCGGCUAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -11.92,
        "dG37_p" : -11.53
      },
      "CGCPCCGGCGAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -12.93,
        "dG37_p" : -12.57
      },
      "CGGPCCGGCCAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -12.76,
        "dG37_p" : -12.94
      },
      "CGUPCCGGCAAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -9.76,
        "dG37_p" : -10.24
      },
      "CGAPGCGGCUACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -11.45,
        "dG37_p" : -11.40
      },
      "CGCPGCGGCGACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -12.35,
        "dG37_p" : -12.45
      },
      "CGGPGCGGCCACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -12.59,
        "dG37_p" : -12.81
      },
      "CGUPGCGGCAACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -10.34,
        "dG37_p" : -10.11
      },
      "CGAPUCGGCUAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -10.42,
        "dG37_p" : -10.86
      },
      "CGCPUCGGCGAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -12.06,
        "dG37_p" : -11.91
      },
      "CGGPUCGGCCAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -12.51,
        "dG37_p" : -12.27
      },
      "CGUPUCGGCAAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37" : -9.51,
        "dG37_p" : -9.58
      },
      "GCGCAPCGCGUA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -9.90,
        "dG37_p" : -9.71
      },
      "GCGCCPCGCGGA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -10.63,
        "dG37_p" : -10.84
      },
      "GCGCGPCGCGCA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -10.43,
        "dG37_p" : -10.46
      },
      "GCGCUPCGCGAA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -8.55,
        "dG37_p" : -8.50
      },
      "PAGCGCAUCGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -8.93,
        "dG37_p" : -8.99
      },
      "PCGCGCAGCGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -9.56,
        "dG37_p" : -9.66
      },
      "PGGCGCACCGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -10.30,
        "dG37_p" : -10.27
      },
      "PUGCGCAACGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37" : -9.77,
        "dG37_p" : -9.65
      }
    }
  }
}