Methodological assessment of Mexican Clinical Practice Guidelines: Critical appraisal and GRADE framework adherence.

Background and Objective Clinical Practice Guidelines (CPGs) provide evidence-based recommendations to healthcare professionals, policy makers, patients and other stakeholders. Mexico is the largest producer of CPGs in Latin America and the Caribbean. The National Healthcare Technology Excellence Center (acronym in Spanish: CENETEC) is responsible for CPG development, adaptation and updating. The aim of this study was to assess the quality of Mexican CPGs and their adherence to the GRADE framework. Study design We conducted a descriptive cross-sectional study of 86 CPGs representing all the CPGs produced by CENETEC between 2015 and 2017 and published in an online database called “Catálogo Maestro”. We performed quality assessment with the online AGREE II tool and assessed the reporting on the GRADE framework. Results Of the 86 CPGs, 34 were published in 2015, 21 in 2016 and 31 in 2017. The overall AGREE II quality score had a median of 16.6% (min 16.6%, max 50%). Of the 86 CPGs, 25 (29%) used the GRADE framework; adherence to GRADE standards was, however, inconsistent and generally poor. Conclusion CPGs produced by CENETEC during this period had a low score by AGREE II standards and low adherence to the GRADE framework. A concerted initiative could rapidly improve CENETEC guidelines.


Introduction
Clinical Practice Guidelines (CPGs) are health-related documents created to provide evidence-based recommendations to healthcare professionals, policy makers, patients and other stakeholders. (1) In Mexico, CPGs are developed by the National Healthcare Technology Excellence Center (acronym in Spanish: CENETEC). CENETEC was founded in 2004 and includes the Mexican healthcare institutions. To date, CENETEC has produced more than 760 CPGs, focusing on four categories: diseases, nursing, medical procedures and healthcare processes. (2) CPG methodology has improved over time, from non-standardized "Good Old Boys Sitting Around a Table" (GOBSAT) approaches to standardized methods. (3) Different resources and methodologies have contributed to this improvement. (7)(8)(9)(10) CENETEC's massive CPG production may be impressive and admirable, but it raises the question of how rigorously its guidelines are produced and how applicable and useful they are. To date, few studies have appraised CENETEC guidelines with the AGREE tool; (11) however, there has been no systematic assessment of the quality of CENETEC guidelines that aims to cover their entire production. The aim of the current study was to assess the quality of Mexican CPGs and to evaluate their adherence to the GRADE framework.

Study design
We conducted a descriptive cross-sectional study of Mexican CPGs produced by the CENETEC.

Selection criteria and sample size calculation
Our population of interest was all the CPGs, regardless of their scope, produced by CENETEC and published in its online database, the "Catálogo Maestro" (Master Database), between 2015 and 2017. (Website citation) At the time the study was designed, the total population comprised 224 CPGs (accessed on 20/11/2017 at http://cenetec-difusion.com/gpc-sns/?cat=52).
We considered the GRADE framework as the reference standard to frame the question, judge the quality of evidence, and move from evidence to recommendations. (9) Leyva de Los Rios et al. reported that the GRADE framework was used in 10% of the CPGs from CENETEC. (12) Therefore, considering an alpha error of 0.05 and a 95% confidence level, we determined that we needed to assess 86 CPGs. We drew a random sample from the total population of 224 CPGs using the software Epidat, version 4.2. (13) The CPGs were allocated randomly among the assessors (GMV, VAA, PGL, YOE, MTU, SVS, GML, GTF, LEC). The list of the included CPGs is reported in Table 1S.
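The sample size above can be approximated with the standard formula for estimating a proportion in a finite population. The following is a minimal sketch, not the Epidat computation itself: the function name is hypothetical, and the absolute precision of 5 percentage points is an assumption, since the text reports only the expected proportion (10%), the population size (224) and the confidence level (95%).

```python
import math


def sample_size_proportion(population, expected_p, precision, z=1.96):
    """Sample size to estimate a proportion in a finite population.

    population: total number of CPGs in the sampling frame
    expected_p: anticipated proportion (here, GRADE use)
    precision:  absolute half-width of the confidence interval (ASSUMED 0.05)
    z:          normal quantile for the confidence level (1.96 for 95%)
    """
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * expected_p * (1 - expected_p) / precision ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size_proportion(population=224, expected_p=0.10, precision=0.05))
```

Under these assumptions the calculation returns 86, which matches the reported sample size; a different precision would give a different number.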

CPG selection and collected variables
Two team members (LEC, GTF) extracted the CPG characteristics in duplicate; disagreements were resolved by discussion. The recorded variables were: year of publication, type of CPG (full update/partial update), librarian involvement, number of questions, adaptation strategy, type of studies included (international guidelines, non-Cochrane and Cochrane systematic reviews), quality assessment of CPGs and systematic reviews, number of articles obtained by the search strategy, use of the GRADE approach, adherence to the GRADE framework, recommendation systems, type of recommendations, number of recommendations and good clinical practice statements.

AGREE II tool
We used the AGREE II tool to assess the quality of CPG development. (6) AGREE II is a validated instrument composed of 23 items grouped under 6 domains, plus one final item to evaluate the overall quality of the CPG. The domains are: Scope and Purpose (3 items), Stakeholder Involvement (3 items), Rigor of Development (8 items), Clarity of Presentation (3 items), Applicability (4 items), and Editorial Independence (2 items). For each item, each appraiser scores the statement on a Likert scale from 1 (Strongly disagree) to 7 (Strongly agree). A score between 2 and 6 is assigned when the reporting of the AGREE II item does not meet the full criteria or considerations; the score depends on the completeness and quality of reporting. Scores increase as more criteria are met and more considerations are addressed. (5,14,15) Following the recommendations of the AGREE collaboration, each CPG was evaluated independently by three assessors. To reduce variability among the assessors, we conducted intensive training in the tool, led by one of the investigators (IDF), an expert in the use of AGREE II. After the initial training, the assessors were calibrated by independently assessing two CPGs (one of low quality and one of high quality), followed by feedback from the trainer (IDF) and a group discussion to resolve any issues that arose.
We performed two rounds of assessment and feedback. We considered that the assessors disagreed when their scores differed by 3 or more points; disagreements were resolved by discussion. If agreement was not reached after the discussion, two reviewers (IDF, LEC) made the final assessment and decision. Final scores per domain were calculated by summing the scores of the 3 assessors for each item in a domain and scaling the total as a percentage of the maximum possible score for that domain. Therefore, each domain was scored in a range from 0 to 100%.
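The domain scoring and the disagreement rule can be sketched as follows. This is an illustration only: it assumes the scaling used in the AGREE II manual, (obtained score minus minimum possible score) divided by (maximum possible score minus minimum possible score), which is what yields a 0 to 100% range. The function names are hypothetical, and the actual calculations in the study were made by the My AGREE-PLUS platform.

```python
def scaled_domain_score(ratings, min_rating=1, max_rating=7):
    """Scaled AGREE II domain score as a percentage.

    ratings: one list of item scores per assessor, all for the same domain,
             e.g. [[2, 3, 2], [3, 3, 2], [2, 4, 3]] for three assessors
             rating a three-item domain.
    """
    n_assessors = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every assessor must rate the same number of items")
    obtained = sum(sum(r) for r in ratings)
    minimum = min_rating * n_items * n_assessors
    maximum = max_rating * n_items * n_assessors
    return 100 * (obtained - minimum) / (maximum - minimum)


def needs_discussion(item_scores, threshold=3):
    """True when assessors' scores for one item differ by `threshold` or more."""
    return max(item_scores) - min(item_scores) >= threshold


# Overall-quality item rated 2 by each of three assessors: about 16.7%.
print(round(scaled_domain_score([[2], [2], [2]]), 1))
# Scores of 2, 5 and 3 on one item differ by 3 points and go to discussion.
print(needs_discussion([2, 5, 3]))
```

Under this scaling, an overall rating of 2 from all three assessors corresponds to about 16.7%, and a rating of 4 to 50%.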
The AGREE collaboration does not define minimum domain scores or patterns of scores across domains to differentiate between high-quality and poor-quality guidelines. (20) Instead, it recommends that users compare the scores across CPGs and set specific thresholds based on the context in which the guidelines are to be used. All assessments were performed using the My AGREE-PLUS online tool provided by the AGREE collaboration (http://www.agreetrust.org/resource-centre/agree-plus/).

GRADE framework adherence
We describe the use of the GRADE framework in the included CPGs. GRADE can be used to define the CPG questions, develop systematic reviews (SRs), identify the importance of the outcomes, prepare evidence profiles and summary of findings (SoF) tables, assess the overall quality of the evidence and decide on the direction and strength of the recommendation. Currently, there is no validated tool to assess adherence to GRADE. We considered the use of the GRADE framework appropriate if the CPGs reported the following items: (7)(8)(9)(10)
• Questions with an explicit statement of each PICO element.
• SoF tables with the following characteristics: number of studies, number of participants, risk of bias domains, relative effect, absolute effect and certainty of the evidence.
• Explicit reference to all GRADE domains:
  • Rating down criteria: risk of bias, imprecision, indirectness, inconsistency and publication bias.
  • Rating up criteria: large magnitude of an effect, dose-response gradient and effect of plausible residual confounding.
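The checklist above can be expressed as a small data structure, which makes the "all items reported" rule explicit. This is a hypothetical sketch: the class, field and label names are illustrative and do not reproduce the extraction form used in the study.

```python
from dataclasses import dataclass, field

# Labels are illustrative; they mirror the items listed in the text.
SOF_FIELDS = {
    "number of studies", "number of participants", "risk of bias domains",
    "relative effect", "absolute effect", "certainty of the evidence",
}
RATING_DOWN = {
    "risk of bias", "imprecision", "indirectness",
    "inconsistency", "publication bias",
}
RATING_UP = {
    "large magnitude of effect", "dose-response gradient",
    "plausible residual confounding",
}


@dataclass
class GradeReport:
    """What one CPG reports about its use of GRADE (hypothetical record)."""
    pico_explicit: bool
    sof_fields: set = field(default_factory=set)
    domains_referenced: set = field(default_factory=set)


def adherence(report):
    """Evaluate each checklist item separately."""
    return {
        "pico": report.pico_explicit,
        "sof": SOF_FIELDS <= report.sof_fields,
        "domains": (RATING_DOWN | RATING_UP) <= report.domains_referenced,
    }


def appropriate_use(report):
    """Appropriate use requires every checklist item to be met."""
    return all(adherence(report).values())


partial = GradeReport(pico_explicit=False, sof_fields=set(SOF_FIELDS))
print(adherence(partial))
print(appropriate_use(partial))
```

Keeping the per-item results separate from the overall verdict makes it possible to report, as the results do, how many CPGs met each item individually.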

Statistical Analysis
We calculated descriptive statistics to summarize our results. We assessed the normality of the continuous variables with the Shapiro-Wilk test. Results were expressed as mean and standard deviation (SD) if we failed to reject a normal distribution, and as median and interquartile range (IQR) if normality was rejected. We calculated proportions and frequencies for categorical variables. All analyses were performed using SPSS (IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp). A Venn diagram was created with the eulerr web platform (http://eulerr.co/) to display the combination of the grading systems. (16)
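The reporting rule for continuous variables can be sketched as below. This is an illustration under stated assumptions: the function name and the 0.05 threshold are hypothetical, the Shapiro-Wilk p-value is taken as an input computed elsewhere (for example in SPSS), and the quartile method used here may differ from the one SPSS applies.

```python
import statistics


def summarize(values, shapiro_p, alpha=0.05):
    """Choose mean (SD) or median (IQR) from a normality test result.

    values:    observations of one continuous variable
    shapiro_p: p-value of a Shapiro-Wilk test computed elsewhere (ASSUMED input)
    alpha:     significance threshold for rejecting normality (ASSUMED 0.05)
    """
    if shapiro_p >= alpha:
        # Normality not rejected: report mean and standard deviation.
        return {
            "statistic": "mean (SD)",
            "center": statistics.mean(values),
            "spread": statistics.stdev(values),
        }
    # Normality rejected: report median and interquartile range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "statistic": "median (IQR)",
        "center": statistics.median(values),
        "spread": q3 - q1,
    }


scores = [16.6, 16.6, 16.6, 16.6, 33.3, 16.6, 16.6, 50.0, 16.6]  # made-up data
print(summarize(scores, shapiro_p=0.001))
```

The example data are invented for illustration. They show how a heavily concentrated distribution can produce an IQR of 0 even when a few values lie well above the median.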

General description
We included 86 CPGs in our analysis. The authors used the NICE and SIGN approaches, which had not adopted the GRADE framework.

AGREE II critical appraisal
The overall quality assessment obtained a median score of 16.6% (IQR 0.0) (Figure 2). Table 3 reports the scores by item; the items with the lowest median score (1) were the views and preferences of the target population (patients, public, etc.) and the methods for formulating the recommendations.

GRADE framework
Twenty-five (29.1%) of the CPGs reported using the GRADE framework to assess the quality of evidence. However, most of these CPGs combined GRADE with other grading systems; only five CPGs used the GRADE framework exclusively. Regarding adherence to GRADE, none of the included CPGs reported an explicit statement of each PICO element, only one (4%) reported SoF tables, and none made explicit reference to all GRADE domains.

DISCUSSION
Our study found that CENETEC guidelines had low scores on the AGREE II tool. Using this instrument, we identified the following pitfalls in the eligible guidelines: i) most of the health-related questions were not created using the explicit elements of the PICO format; ii) the target audience was excessively broad, aiming to reach nurses, doctors and other healthcare professionals across the different levels of health care; iii) the views and preferences of the target population were not included; iv) the systematic review methods were inconsistent across the guidelines; v) the CPGs mixed different grading systems; vi) no formal external peer review was conducted; vii) the transition from the evidence to the recommendations was not transparent and clear; and viii) the recommendations were sometimes unclear and ambiguous. We also found that guideline adherence to the GRADE framework was very low. Of the approximately 30% of guidelines that purported to use GRADE, only one included SoF tables, and none transparently addressed the GRADE domains for rating the quality of the evidence down or up.
The strengths of this study include the intensive training in the AGREE II tool provided to the reviewers, the experience of the team in the use of the GRADE framework, and the relatively large number of CPGs included. Our study has the following limitations. Although we provided intensive training before the CPG assessment, most of the reviewers were not experts in guideline development, so the possibility of incomplete understanding remains. In addition, no standardized and valid tool to assess adherence to the GRADE framework is available. To address this issue, we created an ad hoc instrument, the reliability and validity of which remain untested.
In 2019, Cabrera et al. reviewed the guideline development process with respect to the GRADE framework in guidelines developed in Latin American and Caribbean countries. (17) The CPGs were developed by Colombia (68%), Peru (13%), Chile (9%), Argentina, Costa Rica, Brazil, Honduras and the Dominican Republic.
The authors focused their analysis on identifying the methods for grading the quality of evidence and for topic prioritization, without formally assessing adherence to the GRADE framework. They conducted a literature search in MEDLINE, SciELO and Embase, and also searched guideline repositories and other governmental websites. They did not, however, include the "Catálogo Maestro", and thus excluded all the CENETEC CPGs. Of the 1,370 CPGs the authors assessed, only 98 (8.9%) followed the GRADE framework. We identified a higher proportion of CPGs using the GRADE approach (29.1%), but found very poor adherence to GRADE, an issue not addressed in this previous work.
In general, our findings show that, despite the impressive and admirable production of CPGs by CENETEC, the guidelines have a number of methodological limitations.

Table 3. AGREE II critical appraisal by item, median (min - max)
1. The overall objective(s) of the guideline is (are) specifically described. 3 (2 - 5)
4. The guideline development group includes individuals from all relevant professional groups. 3 (2 - 5)
5. The views and preferences of the target population (patients, public, etc.) have been sought. 1
9. The strengths and limitations of the body of evidence are clearly described. 2 (2 - 3)
10. The methods for formulating the recommendations are clearly described. 1
11. The health benefits, side effects, and risks have been considered in formulating the recommendations. 2 (1 - 4)
12. There is an explicit link between the recommendations and the supporting evidence. 3 (2 - 4)
13. The guideline has been externally reviewed by experts prior to its publication. 3 (2 - 5)
16. The different options for management of the condition or health issue are clearly presented. 2 (1 - 4)
19. The guideline provides advice and/or tools on how the recommendations can be put into practice. 3 (2 - 4)
20. The potential resource implications of applying the recommendations have been considered. 2 (1 - 5)
Editorial independence, median (min - max)
22. The views of the funding body have not influenced the content of the guideline.

Abbreviation list
AGREE = Appraisal of Guidelines for Research and Evaluation.
CENETEC = National Healthcare Technology Excellence Center (acronym in Spanish: CENETEC).
CPGs = Clinical Practice Guidelines.
EtD = Evidence to Decision.
EBM = Evidence-based Medicine.
GRADE = Grading of Recommendations, Assessment, Development and Evaluation.
NICE = National Institute for Health and Care Excellence.
OCEBM = Oxford Centre for Evidence-Based Medicine.

Table 2 summarizes the guideline development process. Librarian involvement was reported in 32 CPGs (37.2%), international guidelines were included in 66 CPGs (76.7%), Cochrane SRs in 36 CPGs (41.9%) and non-Cochrane SRs in 59 CPGs (68.6%); no critical appraisal was reported. Several of the guidelines (48.8%) used more than one evidence grading system in the same document to provide their recommendations. The systems implemented were the approaches by the National Institute for Health and Care Excellence (NICE)

CPGs produced by CENETEC during this period had a low score by AGREE II standards and low adherence to the GRADE framework. Improving the training of methodologists in the GRADE methodology and addressing the specific pitfalls in the guideline development process that we have highlighted should be a priority for the CENETEC guideline process. Enhancing the developers' capacity and the quality of the guidelines will result in better and more trustworthy guidelines for use in Mexico.

CONFLICT OF INTEREST: The authors declared that they do not have financial conflicts of interest. Also, none of the authors are part of the CENETEC organization. RN, IDF and GG are members of the GRADE working group. IDF is part of the AGREE collaboration.

Selecting an instrument, selection of statistical tests, interpretation of statistical analyses and manuscript feedback.

Table 3. AGREE II critical appraisal, overall domains (%), N = 86. Table notes: minimal value 16.6, maximal value 33.3; minimal value 16.6, maximal value 50.

IDF: