Adopting software engineering principles when developing research-focused software for large data sets will yield more accurate and more performant software packages.
Body
Maintenance
In developing any software artifact, maintenance should be a key consideration from the start. Many research scientists regard their software as one-off tools that will not be used regularly or for long, so most do not consider long-term maintenance during development. In my professional software experience, taking a long-term view from the outset saves time and effort when a software product unexpectedly sees long-term use or success.
The consensus across several sources is that software maintenance costs are large and increasing \cite{Glass2001,Koskinen2015,Dehaghani2013}; some put maintenance at 90\% of total software cost. For research and statistical software, the chief factor in maintenance cost is the time of the people who create and use the software. As part of the recent push to make research results reproducible and replicable, some recommend making code openly available to anyone who might wish to repeat or further analyze results \citep{leek_opinion:_2015}. A reproducible and replicable solution therefore requires a long-term, maintenance-oriented view.
Many techniques can help reduce the cost of maintenance and speed up development. While best practices such as version control and open access to results and papers are becoming widespread, others are important but need further attention: documentation, language choice, and software testing practices.
Documentation
While the purpose of software is to instruct a computer to perform a specific operation, with current technologies that instruction must be written by humans. Software documentation conveys information to other users and developers in a richer language than the chosen programming language allows. One of the most influential papers in this area is "Literate Programming" \citep{Knuth1984}. Although decades old, its principles have yet to become common practice among researchers without computer science training. Its key idea is a single source document that contains both the software code and a prose description of that code. That document is processed in two ways: tangling, which extracts the machine-readable code, and weaving, which produces the formatted documentation.
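To make the two operations concrete, the following is a minimal sketch in base R, following Knuth's usage in which tangling extracts the code and weaving produces the documentation. It assumes a noweb-like convention where a code chunk opens with a line of the form `<<name>>=` and closes with `@`. The toy document and the helper names `in_chunk`, `tangle`, and `weave` are hypothetical and are not taken from any package.

```r
# Toy literate source: prose mixed with one delimited code chunk.
# The "<<name>>=" / "@" delimiters are an assumed convention for this sketch.
literate_source <- c(
  "The mean of a sample summarizes its location.",
  "<<compute-mean>>=",
  "x <- c(2, 4, 6)",
  "m <- mean(x)",
  "@",
  "The value is stored in m for later use."
)

# TRUE for lines that sit inside a code chunk (delimiter lines excluded).
in_chunk <- function(lines) {
  open  <- grepl("^<<.*>>=$", lines)
  close <- lines == "@"
  inside <- (cumsum(open) - cumsum(close)) > 0
  inside & !open
}

# Tangle: keep only the code, ready to be parsed and run.
tangle <- function(lines) {
  lines[in_chunk(lines)]
}

# Weave: keep prose as is, indent code so it reads as a listing,
# and drop the chunk delimiters.
weave <- function(lines) {
  code  <- in_chunk(lines)
  delim <- grepl("^<<.*>>=$", lines) | lines == "@"
  out   <- ifelse(code, paste0("    ", lines), lines)
  out[!delim]
}

# Run the tangled code in its own environment.
env <- new.env()
eval(parse(text = tangle(literate_source)), envir = env)

# Print the woven, human-readable document.
writeLines(weave(literate_source))
```

Both outputs come from the same file, so the description and the code cannot drift apart without the author seeing both side by side. Production tools handle chunk options, named chunk references, and typeset output, which this sketch deliberately omits.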
In the R language, literate programming can be accomplished with specially formatted comments and the roxygen2 package \cite{Wickham2017}. An abbreviated example taken from a function header looks like this: