2 | METHODS
Introducing a new category into CASP requires planning data workflows, designing formats and technical parameters for new types of models, and incorporating those into the existing CASP infrastructure. Subsections 2.1-2.4 below describe the implementation details for four new CASP15 categories.
2.1 | RNA structure prediction. Prediction of RNA structure from nucleotide sequence is a challenging task, as RNA molecules, like proteins, can fold into a wide variety of 3D shapes. Several research groups have been actively working in this area, and in 2010 Eric Westhof pioneered the CASP-like RNA-Puzzles challenge9 to track the state of the art in RNA structure prediction and provide a forum for discussing methodological advances. Over the course of 12 years (2010-2021), 22 RNA-Puzzles challenges were evaluated, attracting around 10 returning participants10. In 2022, on the initiative of Rhiju Das, Eric Westhof, and the CASP organizers, RNA-Puzzles joined forces with CASP, and RNA structure prediction became a prediction category in CASP15. The merger helped expand the target and predictor base of the RNA-modeling experiment (12 targets and 25 research groups in CASP15), stimulate the development of new RNA prediction methods through the exchange of ideas and techniques with the protein prediction community, where deep learning methods have recently made a significant impact on modeling accuracy11,12, increase the visibility of the field, and bring RNA prediction onto CASP's standardized platform for managing predictions and for evaluating and comparing different prediction methods.
To incorporate RNA prediction into CASP, we adhered as closely as possible to the requirements and recommendations of the RNA-Puzzles experiments9.
2.1.1 | RNA prediction format (https://predictioncenter.org/casp15/index.cgi?page=format#TS). As in protein structure prediction, a CASP RNA submission file starts with the CASP header, which includes a format specification code, target identifier, author identifier, and a description of the methods used for modeling. The file can include up to five RNA 3D models, each enclosed by the MODEL/END keywords. Models are formatted according to the established standards of the RNA-Puzzles community9:
In the case of protein-RNA complexes, protein chains are designated with letters (A, B, C, …) and RNA chains with numbers (0, 1, 2, …).
An example of an RNA prediction is provided in Example 3 on the CASP15 format page (https://predictioncenter.org/casp15/index.cgi?page=format).
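The chain-labeling convention above can be sketched in a few lines of Python; the function name and the chain descriptors are illustrative, not part of the CASP tooling:

```python
from string import ascii_uppercase

def assign_casp_chain_ids(chains):
    """Assign CASP15-style chain IDs to a protein-RNA complex:
    protein chains get letters (A, B, C, ...) and RNA chains get
    digits (0, 1, 2, ...), in order of appearance.

    `chains` is a list of (name, kind) tuples, with kind in
    {"protein", "rna"}; returns a dict original name -> CASP chain ID.
    """
    letters = iter(ascii_uppercase)  # A, B, C, ...
    digits = iter("0123456789")      # 0, 1, 2, ...
    return {name: (next(letters) if kind == "protein" else next(digits))
            for name, kind in chains}
```

For example, a complex with two protein chains and one RNA strand would be relabeled A, B, and 0, respectively.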
2.1.2 | Preparation of targets and model templates. The CASP organizers prepare a FASTA file with the sequence of the target RNA. The file begins with a header containing the target ID (e.g., >R1117) and the chain IDs (i.e., numbers from 0 to 9) of all strands in the target structure. The body of the file includes the nucleic acid sequence(s). In addition, the organizers generate a 3D structure template using the RNA-Puzzles formatting tool13. The template is a PDB file containing all the required ATOM records with zeroed coordinate values. Information on targets is communicated to participating groups via the CASP web portal (e.g., https://predictioncenter.org/casp15/target.cgi?id=30&view=all).
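A target file of this shape can be consumed with a short parser. The exact header syntax shown here (target ID followed by space-separated chain IDs, one sequence line per strand) is an illustrative assumption, not a specification of the CASP file layout:

```python
def parse_casp_rna_target(fasta_text):
    """Parse a CASP-style RNA target FASTA file into
    (target_id, chain_ids, sequences).

    Assumed layout (illustrative): a header such as '>R1117 0 1'
    giving the target ID and chain IDs, followed by one nucleic
    acid sequence per strand on subsequent lines.
    """
    lines = [l.strip() for l in fasta_text.strip().splitlines() if l.strip()]
    header = lines[0].lstrip(">").split()
    target_id, chain_ids = header[0], header[1:]
    sequences = lines[1:]  # one sequence per strand
    return target_id, chain_ids, sequences
```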
Prior to submission, predictors can verify the compatibility of their models with the provided templates by running the RNA-Puzzles tool, which checks the number and ordering of residues and atoms in the submission13. If a prediction file does not comply with the requirements, error messages are reported to a log file. Non-compliant files can be reformatted with the rna_pdb_toolsx.py script available from the rna-tools toolbox13,14.
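Conceptually, this compatibility check amounts to comparing the ordered atom signature of a model against that of the zero-coordinate template. The sketch below illustrates the idea only; it is not the RNA-Puzzles checker, which performs additional validation:

```python
def atom_signature(pdb_lines):
    """Extract the ordered (chain, residue number, atom name) signature
    from ATOM records of a PDB file (fixed-column format)."""
    sig = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom = line[12:16].strip()    # atom name, columns 13-16
            chain = line[21]              # chain ID, column 22
            resnum = line[22:26].strip()  # residue number, columns 23-26
            sig.append((chain, resnum, atom))
    return sig

def model_matches_template(model_lines, template_lines):
    """True if the model contains exactly the atoms and residues of the
    organizer-provided template, in the same order (coordinates ignored)."""
    return atom_signature(model_lines) == atom_signature(template_lines)
```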
2.1.3 | Setting the acceptance system. At target release time, each target is assigned a prediction window, typically 3 days for servers and 3 weeks for expert groups. RNA structure models are accepted within this window via email or a dedicated CASP prediction submission web form. The CASP submission system automatically checks submissions for compliance with the deadlines and format requirements and provides feedback to predictors. The prediction format is checked with the same tools used to generate the model templates (Section 2.1.2). If a prediction is rejected, an error message is sent to the submitter, who has until the target deadline to fix the reported issue(s) and resubmit. Accepted predictions are stored in the CASP system and evaluated once the target structure becomes available.
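The deadline side of this check reduces to simple interval arithmetic on the release time. A minimal sketch, with window lengths taken from the text and hypothetical function names (actual deadlines are set per target by the organizers):

```python
from datetime import datetime, timedelta

# Illustrative prediction windows: ~3 days for servers,
# ~3 weeks for expert groups (per the CASP15 description).
WINDOWS = {"server": timedelta(days=3), "expert": timedelta(weeks=3)}

def within_window(release, submitted, group_type):
    """Check whether a submission timestamp falls inside the
    prediction window opened at target release time."""
    return release <= submitted <= release + WINDOWS[group_type]
```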
The same submission rules apply to other prediction categories discussed further in this paper.
2.1.4 | RNA evaluation measures. Predictions in the RNA category are assessed by checking their geometric plausibility and comparing them with target structures. Given the early stage of RNA modeling, when alternative target structures were available we reported the best score per model. Evaluation measures include Clashscore15, Root Mean Square Deviation (RMSD)16, Local Distance Difference Test (lDDT)17, Template Modeling score (TM-score)18, and Global Distance Test - Total Score (GDT-TS)19. These measures are commonly used in protein-CASP evaluation and are adopted here for RNA evaluation as well. However, none of them is suitable for assessing RNA-specific features, such as canonical (G-C, A-U, G-U), non-canonical, and stacking interactions between the nucleobases that contribute to RNA folding and stabilization. Correct prediction of canonical interactions alone is usually insufficient to obtain a good model of an RNA molecule (example in Figure 1), while prediction of non-canonical interactions is very valuable but hard to achieve due to high computational demands. We therefore additionally consider an RNA-specific measure, Interaction Network Fidelity (INF)13,20, which evaluates how well different types of RNA interactions are reproduced in models. Calculation of this measure requires prior determination of RNA interactions from the atomic coordinates. This is done using 2D structure annotators such as RNAView21, MC-Annotate22, ClaRNA23, or FR3D24, which identify base pairs and classify them25. Given two sets of interactions, one for the model and one for the target, we identify true positives (correctly predicted base pairs), false positives (base pairs predicted in the model but absent from the target), and false negatives (target base pairs missed by the model), and then calculate the INF score as the Matthews correlation coefficient26. The score ranges from 0.0 to 1.0, with higher scores indicating better prediction of base-base interactions.
The INF score is determined for all interactions (INF_all), and separately for canonical (Watson-Crick, INF_WC), non-canonical (non-Watson-Crick, INF_nWC), and stacking (INF_stacking) interactions.
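Assuming the formulation commonly used in the RNA-Puzzles context, where the Matthews correlation coefficient is computed as the geometric mean of precision and recall over the annotated interaction sets, the score can be sketched as follows (interaction representation is illustrative):

```python
from math import sqrt

def inf_score(model_pairs, target_pairs):
    """Interaction Network Fidelity for one interaction type.

    `model_pairs` and `target_pairs` are sets of interactions, e.g.
    tuples identifying the interacting nucleotides. Returns a value
    in [0.0, 1.0]; higher is better.
    """
    tp = len(model_pairs & target_pairs)  # correctly predicted pairs
    fp = len(model_pairs - target_pairs)  # predicted but absent in target
    fn = len(target_pairs - model_pairs)  # in target but missed by model
    if tp == 0:
        return 0.0
    ppv = tp / (tp + fp)  # precision
    sty = tp / (tp + fn)  # sensitivity (recall)
    return sqrt(ppv * sty)
```

Running this separately on the Watson-Crick, non-Watson-Crick, stacking, or combined interaction sets yields INF_WC, INF_nWC, INF_stacking, and INF_all, respectively.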