The availability of large digitized text corpora raises the question of how to analyze them efficiently in order to detect relevant semantic effects in the underlying discourse. A plethora of quantitative methods exists that deliver quick results and may be able to bring hidden structures to light. Such “distant reading” approaches \cite{Moretti2013,Underwood2017} may use a variety of text mining tools, which typically operate on the local (contextual) and global (collection-wide) distribution of words. They analyze and model the semantics and topics of textual material (a) on the lexical level, e.g., word embeddings; (b) within text zones, e.g., topic modeling of sentences, paragraphs, or documents; (c) and sometimes with respect to groups of documents within a collection, e.g., text classification/clustering. In order to produce “interesting results”, many natural language preprocessing decisions and much hyper-parameter tuning of the modeling methods are typically needed. The qualitative interpretation of the results of these quantitative analyses remains a challenging art, although, for instance, in the case of latent topic modeling progress has been made in expressing the quality of topics in terms of their semantic coherence \cite{Chang2009}. Beyond that, explorative user interfaces and appropriate visualizations of aggregated results generally serve as the epistemological course of action.
Rather than focusing on recent algorithmic advances, we critically assess the concrete corpus-linguistic research question of interpretability and the detection of semantic and socio-cultural changes observable in longitudinal text corpora.
Our case study applies text mining methods to two large longitudinal text corpora in two different languages and two different domains: first, English texts on medical history from 1500 to 1800; second, official Swiss governmental publications (Federal Gazette) from 1849 to 2017 in German, comprising more than 700,000 pages and more than 250 million tokens. Both corpora are special domain collections; however, given the large time span covered, they contain many heterogeneous subtopics. Our research goal is to detect and quantitatively assess relevant changes of conceptual and topic trends over time. At the same time, we ask which methods deliver interpretable results, and which methods lead to output that is difficult to interpret or that reflects noise, superficial features, or fluctuations in the data rather than historical semantic developments. We rely on frequently used approaches, including Dirichlet topic modeling, document classification, word embeddings, and kernel density estimation, to detect relevant signals in our longitudinal corpora.
The textual preprocessing can vary according to whether the following subtasks are performed or not: normalization of spelling variants, lemmatization, stop word removal, and OCR error removal. The raw output of OCR software often contains erroneous word forms that are very specific to a time period and its print characteristics (e.g., black letter fonts). These superficial features can then dominate time-specific analyses.
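A minimal Python sketch of such a toggleable preprocessing step could look as follows; the spelling-variant lookup, stop word list, and OCR noise heuristic are illustrative placeholders, and lemmatization would in practice be delegated to a language-specific tool, so it is omitted here.
\begin{verbatim}
import re

# Illustrative placeholder resources; real lists would be compiled per corpus.
SPELLING_VARIANTS = {"vnd": "und", "seyn": "sein"}
STOP_WORDS = {"und", "der", "die", "das", "sein"}

def looks_like_ocr_noise(token: str) -> bool:
    """Heuristic OCR filter: very short fragments or tokens dominated by
    non-alphabetic characters are treated as recognition errors."""
    if len(token) < 2:
        return True
    alpha_ratio = sum(c.isalpha() for c in token) / len(token)
    return alpha_ratio < 0.75

def preprocess(text, normalize=True, remove_stop_words=True, filter_ocr=True):
    """Tokenize and optionally apply each preprocessing subtask."""
    tokens = re.findall(r"\S+", text.lower())
    if filter_ocr:
        tokens = [t for t in tokens if not looks_like_ocr_noise(t)]
    if normalize:
        tokens = [SPELLING_VARIANTS.get(t, t) for t in tokens]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens
\end{verbatim}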
Quantitative distributional approaches can vary according to the window of context: single sentences or groups of several sentences, paragraphs, pages, articles, or even larger texts up to volumes or books. Older periods typically have far less text material available than more recent periods; therefore, subsampling the more abundantly available material into equally large strata per period can be helpful. The binning of years, e.g., into periods of 30, 60, or 150 years, influences how much material is available for the quantitative methods (typically, the more the better), but also limits the possibility of detecting more fine-grained changes.
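The binning and stratified subsampling can be sketched as follows, assuming each document carries a publication year; the bucket boundaries and the field names are illustrative.
\begin{verbatim}
import random
from collections import defaultdict

def bin_year(year, start=1500, bin_size=60):
    """Map a publication year to a coarse time bucket, e.g. '1560-1619'."""
    lower = start + ((year - start) // bin_size) * bin_size
    return f"{lower}-{lower + bin_size - 1}"

def subsample_equal_strata(docs, bin_size=60, seed=42):
    """Downsample every time bucket to the size of the smallest bucket,
    so that later, text-rich periods do not dominate the analysis."""
    buckets = defaultdict(list)
    for doc in docs:                      # doc = {"year": ..., "tokens": ...}
        buckets[bin_year(doc["year"], bin_size=bin_size)].append(doc)
    smallest = min(len(items) for items in buckets.values())
    rng = random.Random(seed)
    return {period: rng.sample(items, smallest)
            for period, items in sorted(buckets.items())}
\end{verbatim}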
Our quantitative toolset comprises the popular Dirichlet topic modeling \cite{Blei2003}, which implements unsupervised probabilistic clustering of texts into latent topics. Each topic is represented as a word distribution and is normally interpreted by its n most probable words. We vary the size of texts from sentence level up to page level. We compute the topic shifts over time by aggregating the topic proportions separately for each time period considered. A normalized view for each topic then allows us to easily compare the prominence of a topic across the considered time periods (selfcitation).
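The aggregation of topic proportions per period can be sketched with gensim's LDA implementation, assuming the texts are already preprocessed token lists and each text is assigned to a time bucket; the hyper-parameters shown are placeholders, not the settings used in our experiments.
\begin{verbatim}
from collections import defaultdict
from gensim import corpora, models

def topic_trends(texts, periods, num_topics=50):
    """Fit an LDA model on tokenized texts and aggregate the inferred
    topic proportions per time period; periods[i] is the bucket of texts[i]."""
    dictionary = corpora.Dictionary(texts)
    bows = [dictionary.doc2bow(tokens) for tokens in texts]
    lda = models.LdaModel(bows, num_topics=num_topics,
                          id2word=dictionary, passes=10, random_state=1)

    totals = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for bow, period in zip(bows, periods):
        for topic_id, prob in lda.get_document_topics(bow,
                                                      minimum_probability=0.0):
            totals[period][topic_id] += prob
        counts[period] += 1

    # Average topic proportion per period; normalizing each topic's values
    # across periods then makes its rise and fall directly comparable.
    return {p: [v / counts[p] for v in totals[p]] for p in totals}
\end{verbatim}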
Our second approach targets the lexical level and reuses the byproducts of supervised text classification. We use the time buckets of the year of publication as classification labels for the text snippets. Performing regularized logistic regression on word unigram features results in a weight for each word. Words with high weights are important features for a class, that is, a time period. This approach has been successfully applied to the English medical texts (selfcitation). In semantic natural language processing tasks, much progress has been made in recent years by switching from atomic representations of words (which are sparse) to dense continuous vector space representations, so-called “word embeddings” \cite{Mikolov2013}. Modern distributional approaches build task-specific word embeddings, that is, representations that are specifically learnt for a supervised problem such as text classification \cite{Joulin2016}. In our experiments, we exploit task-specific word embeddings by clustering these representations first, and then aggregating their occurrences over the time periods. The general idea behind this approach is that the shared and distributed representation of words captures the conceptual content better than atomic representations.
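The feature-weight inspection can be illustrated with a scikit-learn sketch that trains an L2-regularized logistic regression on unigram counts with time buckets as classes and reads off the highest-weighted words per bucket; it assumes more than two buckets (one coefficient row per class) and does not cover the subsequent embedding clustering step.
\begin{verbatim}
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def period_indicative_words(snippets, period_labels, top_n=20):
    """Train a regularized logistic regression on word unigram features with
    time buckets as classes; return the highest-weighted words per period."""
    vectorizer = CountVectorizer(analyzer="word", ngram_range=(1, 1))
    X = vectorizer.fit_transform(snippets)
    clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
    clf.fit(X, period_labels)

    vocab = np.array(vectorizer.get_feature_names_out())
    top_words = {}
    for class_idx, period in enumerate(clf.classes_):
        weights = clf.coef_[class_idx]        # one weight per unigram feature
        top_words[period] = vocab[np.argsort(weights)[::-1][:top_n]].tolist()
    return top_words
\end{verbatim}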
A third approach, which directly combines Dirichlet topic modeling and word embeddings, has recently been developed \cite{Moody2016}. The promise of this approach is to deliver interpretable, that is, coherent, topics based on distributed continuous word representations, which are known to be strong at capturing semantic similarity.