Structure Mapping Simulator (SMS)
Simulation of multiplexed sequencing-based structure mapping experiments
Matlab scripts and data used in the manuscript "Rational Experiment Design for Sequencing-Based RNA Structure Mapping" are available for download. The scripts simulate two main approaches to next-generation sequencing-based structure mapping experiments for sequences with known structural profiles. Statistical properties of data variation within technical replicates can be estimated from the simulations, and results are saved to a user-specified *.mat file. Additional scripts produce plots that visualize the data and summarize simulation results.
Simulations and estimation of data variation
Scripts in the Simulations directory simulate experiments that rely on two approaches to obtaining cDNA fragment libraries prior to sequencing: single primer extension (SPE) and random primer extension (RPE). Scripts assume a particular input file with SHAPE or similar chemical reactivity data for a given RNA sequence. Two forms of structural input data are supported, obtained via capillary electrophoresis (CE) or next-generation sequencing (NGS).
The scripts and a subdirectory with sample input files can be accessed by clicking the Simulations directory in the Box file sharing frame below. Each script begins with a set of simulation parameters that the user can specify, including the name and location of the file in which all simulation results will be stored for subsequent visualization. Simulations also rely on an auxiliary function (randp) that draws random values from a given probability distribution. Its implementation is included and should be located in the same directory as the simulation scripts.
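As an illustration of what drawing random values from a given probability distribution involves, here is a minimal Python sketch of inverse-CDF sampling from a discrete distribution. It is not the included randp implementation, and the function name draw_from_distribution and its arguments are hypothetical, chosen only for this example.

```python
import bisect
import itertools
import random


def draw_from_distribution(weights, n_draws, rng=None):
    """Draw n_draws 0-based indices with probability proportional to weights.

    Illustrative only: this is not the randp function shipped with the scripts.
    """
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = rng or random.Random()

    # Running sums play the role of an unnormalized cumulative distribution.
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    draws = []
    for _ in range(n_draws):
        u = rng.random() * total  # uniform value in [0, total)
        # First index whose cumulative weight exceeds u.
        index = bisect.bisect_right(cumulative, u)
        draws.append(min(index, len(weights) - 1))  # guard against rounding
    return draws


if __name__ == "__main__":
    # Hypothetical weights: outcome 0 has zero weight and is never drawn.
    sample = draw_from_distribution([0.0, 1.0, 3.0], 10, random.Random(1))
    print(sample)
```

The weights need not sum to one, since each draw is scaled by their total. Passing a seeded random.Random instance makes a run reproducible, which helps when comparing simulated replicates.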
Note: for SHAPE-CE input data, the assumed format is the same as that required for integrating SHAPE-CE data into the RNAstructure program for SHAPE-directed secondary structure prediction (*.shape format). The SHAPE-CE data file used in the manuscript and the probed sequence (in *.seq format) are found in the Data subdirectory. Also included is a sample file with NGS read counts from the (+) and (-) channels of a SHAPE-Seq experiment (in *.adducts format) and the probed sequence (S. aureus plasmid pT181 sense RNA).
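For readers unfamiliar with the *.shape format, the following Python sketch shows one way such a file could be read. It assumes the usual RNAstructure convention of two whitespace-separated columns (nucleotide index, then reactivity), with values below -500 (commonly -999) marking positions without data. The function name parse_shape_text and the sample values are hypothetical, and the sketch is independent of the Matlab scripts.

```python
def parse_shape_text(text, missing_threshold=-500.0):
    """Parse *.shape-style text into a {position: reactivity or None} dict.

    Assumed layout: two whitespace-separated columns per line, a 1-based
    nucleotide index followed by a reactivity. Values below missing_threshold
    are treated as "no data" and stored as None.
    """
    reactivities = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        position = int(fields[0])
        value = float(fields[1])
        reactivities[position] = None if value < missing_threshold else value
    return reactivities


if __name__ == "__main__":
    # Made-up example values, not taken from the data files described above.
    example = "1 -999\n2 0.12\n3 1.50\n"
    print(parse_shape_text(example))
```

To read a file from disk, pass its contents, for example parse_shape_text(open(path).read()), where path points to a *.shape file.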
Visualization of reactivities and data variation
Scripts in the Visualization and Figures directory can produce Figures 2-6 in the manuscript as well as additional plots that summarize statistical variation in the generated sequencing data.
The scripts can be accessed by clicking the Visualization and Figures directory in the Box file sharing frame below. They accept a *.mat file with simulation results as input, whose name and location need to be specified by the user at the top of the script. Some of the plots make use of a customized box plot function (RD_boxplot), which is based on the freely available advanced box plot tool for Matlab by Alex Bikfalvi. An implementation is included and should be located in the same directory. The graphics also make use of an auxiliary function for coloring subplots (freezeColors), which should also be located in this directory. For visualization of RPE data, the user can define a zoom-in window to be plotted, since simulated transcripts may be long in order to mimic realistic RNAs probed in such experiments.