Methods

Some systematic review approaches, such as those produced by the Cochrane Collaboration, aim to summarize exhaustively the evidence about the effects of a given intervention. The realist review approach differs in that identifying the effects of interventions is not the end result so much as a step toward understanding the causal processes that produce those effects 1-3. This is similar to what the field of evaluation describes as reverse logic analysis, in which the aim is to identify the causal links between the characteristics of given interventions and their outcomes in order to provide insights on how to produce similar outcomes when designing new interventions 4. We initially expected to be able to conduct a detailed reverse logic analysis based on the available scientific literature documenting home care delivery models. However, the literature identified provided too few insights into the causal processes involved to allow us to go beyond identifying three main characteristics of promising interventions.

Search method

To maximize the breadth of the search, we relied on three sequential search approaches. The starting point was a keyword-based search in MEDLINE and CINAHL conducted in June 2019. This search identified 1628 non-duplicate references, which were reviewed independently by two reviewers on the basis of title and abstract. Two criteria were used. First, the document had to provide relevant information on the delivery of case-managed, integrated or consumer-directed home and community services. Home and community services could include, but could not be limited to, medical care. Second, the population receiving the care needed to be community dwelling, with either a majority aged 65 years and over, or a subsample of persons aged 65 and over for whom results were reported separately. Among the references identified, 107 were selected for full-text analysis. The full text was then independently appraised by two reviewers. Thirty-five articles were selected at the end of this first step.
For the second step, the bibliographies of the 35 papers previously selected were compiled and reviewed to identify potentially relevant titles. This led to the identification of 94 new references, which were then reviewed by two independent reviewers following the same process as in the first step. Of those, 50 documents were selected for full-text review and 34 were included in the analysis. We also included one paper independently identified by a co-author. At the end of the second step, 70 documents had been selected.
The third step was a reverse search in MEDLINE for all articles citing at least one of the 70 documents identified through the previous two steps. This led to the identification of 1102 non-duplicate references. Of those, 71 had already been reviewed (recaptures). The remaining 1034 were processed in the same way as described above. Of those, 78 were deemed appropriate for a full-text review and 42 were retained. At this stage, a second paper provided by a co-author was also added. The low number of recaptures suggests that the total number of articles that fit our focus of interest is likely very large 5.
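The reasoning behind the recapture argument can be illustrated with a simple two-sample capture-recapture estimate. The sketch below uses the Lincoln-Petersen estimator, which is an assumption made here for illustration; the text does not state which estimator, if any, underlies the cited reasoning. The function name and the counts are hypothetical and are not the figures of this review.

```python
def lincoln_petersen(first_sample: int, second_sample: int, recaptures: int) -> float:
    """Estimate the size of a population from two overlapping samples.

    first_sample: items found by the first search
    second_sample: items found by the second search
    recaptures: items found by both searches
    """
    if recaptures <= 0:
        raise ValueError("at least one recapture is needed for this estimator")
    return first_sample * second_sample / recaptures


# Hypothetical counts, chosen only to show the direction of the effect:
# with the same sample sizes, fewer recaptures imply a larger population.
heavy_overlap = lincoln_petersen(1000, 1000, 500)
light_overlap = lincoln_petersen(1000, 1000, 50)
print(heavy_overlap, light_overlap)
```

With sample sizes held constant, dividing the number of recaptures by ten multiplies the estimated pool of relevant articles by ten, which is the intuition behind reading a low recapture count as a sign of a large underlying literature.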
In the end, 113 documents were included in the analysis (the complete list can be accessed as a PubMed bibliography at https://www.ncbi.nlm.nih.gov/myncbi/1Dm-PibJgyqcPF/bibliography/public/). Figure 1 (below) provides a flowchart of the process.
[Insert figure 1: Search process flowchart]
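As a cross-check, the counts reported for the three search steps can be tallied as follows. This is a minimal sketch: the dictionary keys and the function name are illustrative choices, and only the numbers come from the text above.

```python
# Counts reported in the text for each of the three search steps.
steps = [
    {"name": "keyword search", "full_text": 107, "retained": 35, "coauthor_added": 0},
    {"name": "bibliography search", "full_text": 50, "retained": 34, "coauthor_added": 1},
    {"name": "citation search", "full_text": 78, "retained": 42, "coauthor_added": 1},
]


def cumulative_totals(steps):
    """Return the running number of included documents after each step."""
    running = 0
    totals = []
    for step in steps:
        running += step["retained"] + step["coauthor_added"]
        totals.append(running)
    return totals


totals = cumulative_totals(steps)
print(totals)  # [35, 70, 113]
```

The running totals match the 35, 70 and 113 documents reported at the end of each step.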

Relevance and strength appraisal

Our approach to full-text appraisal relied on two scores: one for methodological quality (strong=3, acceptable=2, weak=1) and one for relevance (highly relevant=3, some relevant elements=2, not relevant=1). Documents were retained at this stage only if they had a combined score of 3 or above. The inclusion threshold was deliberately set low to maximize the sensitivity of the search. Divergences in scoring were resolved by discussion, and consensus was reached in all cases.
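The scoring rule described above can be sketched as follows. The sketch assumes that the combined score is the sum of the two scores, which the text does not state explicitly; the names used are illustrative.

```python
# Score scales as given in the text.
QUALITY = {"strong": 3, "acceptable": 2, "weak": 1}
RELEVANCE = {"highly relevant": 3, "some relevant elements": 2, "not relevant": 1}
THRESHOLD = 3  # combined score required for inclusion


def combined_score(quality: str, relevance: str) -> int:
    """Combined score, ASSUMED here to be the sum of the two scores."""
    return QUALITY[quality] + RELEVANCE[relevance]


def is_included(quality: str, relevance: str) -> bool:
    """Apply the inclusion threshold to one appraised document."""
    return combined_score(quality, relevance) >= THRESHOLD


# List every score pair that would be excluded under the sum assumption.
excluded = [
    (q, r) for q in QUALITY for r in RELEVANCE if not is_included(q, r)
]
print(excluded)  # [('weak', 'not relevant')]
```

Under that reading, only a document rated both weak and not relevant falls below the threshold, which is consistent with a threshold chosen to favor sensitivity.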
As this system relies on a moving threshold regarding the strength of the study design, some additional discussion is warranted. The type of review we conducted is integrative and iterative in nature. First, it is integrative, as different types of evidence (descriptions, typologies, outcome evaluations, etc.) are brought together with the aim of identifying desirable characteristics of home care delivery models. Second, it is iterative, as the focus of the review was refined on an ongoing basis as it progressed. Therefore, some of the documents included in the analysis offer robust evidence while others are descriptive or rely on weak study designs. However, when analyzing the data itself, the strength of the evidence was taken into account in a contextualized way. For example, study design matters when analyzing potential links between interventions and outcomes, but it matters much less than face validity when assessing the usefulness of a typology of home care delivery models.

Data organization and coding

The documents were coded on an ongoing basis throughout the three phases according to a modified PICO grid. The main modification to the PICO format was that the "C" here refers to the causality presumptions made in the paper (what underlying hypotheses are made or implied linking the intervention to the expected outcomes?). The other items were the usual elements of the PICO format: Population (Who is receiving the services? For example, health issue, age, insurance status, location, etc.); Intervention (What services are being offered? Professionals involved, intensity, duration, etc.); and Outcomes (What outcomes are described or measured?). We also coded articles by country and type of method. When relevant, additional information, such as formal definitions of home care, was retrieved during coding. Some documents included in the analysis could not easily be classified within this coding grid (for example, broadly focused reviews of the literature), and those were coded on an ad hoc basis. The coding results for each article were 164 words long on average (standard deviation 68 words), for a total of 18585 words. The analysis also relied heavily on multiple iterative readings of the full text of each document.
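One possible way to represent a record of the modified PICO grid, and to compute the kind of length summary reported above, is sketched below. The class, its field names, the choice of which fields count toward the word total, and the two example records are all hypothetical; the text does not describe how the coding was stored, nor whether the standard deviation is the sample or the population version (the sample version is used here).

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class CodingRecord:
    """One coded document in a modified PICO grid (illustrative layout)."""
    population: str      # who is receiving the services
    intervention: str    # what services are being offered
    causality: str       # presumed link between intervention and outcomes
    outcomes: str        # outcomes described or measured
    country: str = ""
    method_type: str = ""
    notes: str = ""      # e.g. formal definitions of home care

    def word_count(self) -> int:
        """Count words in the free-text fields (an assumed convention)."""
        fields = [self.population, self.intervention, self.causality,
                  self.outcomes, self.notes]
        return sum(len(text.split()) for text in fields)


def summarize_lengths(records):
    """Total, mean and sample standard deviation of record lengths.

    Requires at least two records, because stdev needs two data points.
    """
    counts = [record.word_count() for record in records]
    return {"total": sum(counts), "mean": mean(counts), "sd": stdev(counts)}


# Hypothetical records, for illustration only.
records = [
    CodingRecord("community-dwelling adults aged 65 and over",
                 "case-managed home services",
                 "coordination reduces unmet needs",
                 "hospital admissions"),
    CodingRecord("older adults with dementia",
                 "consumer-directed care",
                 "choice improves fit of services",
                 "satisfaction",
                 notes="formal definition of home care given"),
]
summary = summarize_lengths(records)
print(summary["total"], summary["mean"])
```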