\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\begin{document}
\title{Dataflows}
\author[1]{Beyonix}%
\affil[1]{Affiliation not available}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\sloppy
This document describes the principles and requirements for our
dataflow language.
\par\null
\subsection*{Principles}
{\label{644318}}
\begin{itemize}
\tightlist
\item
Incremental: All dataflows are incremental. Old data does not need to
be recomputed when updates take place.
\item
Distributed: Computational graphs are distributed such that any peer
can subscribe to any step, either fully (replicated) or partially
(sharded).
\item
General purpose: We aim to maximise the utility of the language, given
the above constraints. We aim to be able to operate on scalars
(boolean, integral, floating-point, date, string), products, records and
co-products, collections (list, matrix, spreadsheet?, bag, set, map,
tree, graph) and some more exotic types (interval, interval-trie,
continuous).
\item
Pure: Most, if not all, use cases that we foresee can be solved without
impurity. As long as we make it easy to integrate with impure
languages, our dataflows should remain pure.
\item
Extensible: Much of the power of this language comes from its data
structures and the operations they support. The ability to add new
data structures and operations would be a major asset, but for the sake
of efficiency and focused expressiveness, such additions are perhaps
best defined in a different language, with their interface brought
into scope for the dataflows.
\item
Safe: The type system should be sufficiently expressive to guard
against errors that can be caught at compile time. Algebraic properties
(associativity, idempotency, commutativity, boundedness, invertibility)
seem to be prime candidates for inclusion in the type system.
\end{itemize}
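As a small worked illustration of the incremental principle, consider
summing a bag of numbers. The notation here ($X$ for the existing data,
$\Delta X$ for the newly inserted elements, $\uplus$ for bag union) is
our own and is only assumed for this sketch. Because summation
distributes over bag union, only the update has to be processed:
\[
\mathrm{sum}(X \uplus \Delta X) = \mathrm{sum}(X) + \mathrm{sum}(\Delta X).
\]
For example, with $X = \{1, 2, 3\}$ the maintained result is $6$. After
inserting $\Delta X = \{4, 5\}$ the new result is $6 + (4 + 5) = 15$,
obtained without revisiting the elements $1$, $2$ and $3$.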
\par\null
\subsection*{Features}
{\label{393952}}
\subsubsection*{Conditional branching / control
flow}
{\label{335637}}
It is hard to imagine a general-purpose language without this feature.
It is, however, much harder to support in our setting, because unbounded
conditionals can lead to major changes in the result.
We aim to implement if/match branching through parallelisation: the
results of all branches are maintained at all times, and the resulting
datasets are parameterised by the condition so that the right value can
be selected. This approach is complicated by the fact that the variables
that are part of the conditional clause may be referenced within any of
the bodies, and may thus hold values that are invalid in the context of
that body. To guard against this, each body has to be split into a part
that can be evaluated without knowing the actual value and a part that
depends on the actual value. Alternatively, we leave this optimisation
as a user responsibility.
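\par
The following is a minimal sketch of this scheme, not a settled design.
The names $c$, $b_0$, $b_1$, $p_i$, $q_i$ and $h_i$ are hypothetical and
introduced only for illustration. For a variable $v$ that occurs in the
condition, the result is selected from the branch results, all of which
are maintained:
\[
r(v) =
\begin{cases}
b_1(v) & \text{if } c(v), \\
b_0(v) & \text{otherwise.}
\end{cases}
\]
Splitting each body means writing it as
\[
b_i(v) = h_i\bigl(p_i,\; q_i(v)\bigr),
\]
where $p_i$ does not depend on $v$ and can always be maintained, while
$q_i(v)$ is evaluated only for values of $v$ that select branch $i$.
As an assumed example, take ``if $v \neq 0$ then $t / v$ else $0$'' for
some maintained total $t$. The first body is invalid for $v = 0$, so we
would set $p_1 = t$, $q_1(v) = 1 / v$ and $h_1(p, q) = p \cdot q$. The
total $t$ is then kept up to date under every update, and the division
is only attempted where $c(v)$ holds.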
\par\null
\subsubsection*{Looping}
{\label{903374}}
See ``differential dataflow''.
\par\null
\subsubsection*{Monoid algebra}
{\label{426281}}
We implement the same combinators as the monoid algebra:
map, cmap (ringad (monad + monoid) flatten?), zip, coGroup and groupBy.
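\par
To make the intended behaviour of two of these combinators concrete, the
laws below give one common reading of cmap and map over bags. This is a
sketch under our own assumptions (bag union $\uplus$, the empty bag
$\emptyset$ and singleton bags $\{x\}$), not a definitive specification
of the combinators:
\begin{align*}
\mathrm{cmap}(f, \emptyset) &= \emptyset, \\
\mathrm{cmap}(f, \{x\}) &= f(x), \\
\mathrm{cmap}(f, X \uplus Y) &= \mathrm{cmap}(f, X) \uplus \mathrm{cmap}(f, Y), \\
\mathrm{map}(g, X) &= \mathrm{cmap}(\lambda x.\, \{g(x)\},\, X).
\end{align*}
Under this reading cmap is a monoid homomorphism from
$(\uplus, \emptyset)$ to itself: it preserves the identity and
distributes over the merge operation. More generally, a homomorphism $h$
from a monoid $(\otimes, 1_\otimes)$ to a monoid $(\oplus, 1_\oplus)$
satisfies $h(1_\otimes) = 1_\oplus$ and
$h(x \otimes y) = h(x) \oplus h(y)$, which is the property that lets an
update be processed independently of the data already seen.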
\par\null\par\null
\subsection*{Questions}
{\label{811419}}
\subsubsection*{Type system}
{\label{878198}}
We suspect that there is a useful type system that captures abstract
algebraic properties (associativity, commutativity, idempotency,
invertibility) and that is compatible with the monotonic type system when
data structures and functions are monotonic (associative, commutative,
idempotent). Here, however, the notion of ordering becomes relevant, and
it is unclear how ordering works when only a subset of the algebraic
properties applies. Monotonicity/antitonicity seems to be an independent
notion that a function may or may not satisfy, regardless of any of
its other properties.
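\par
The standard definitions below may help to sharpen the question. The
symbols $\sqcup$, $\oplus$, $\le$ and $\preceq$ are our own notation for
this sketch. When an operation $\sqcup$ is associative, commutative and
idempotent, it induces a partial order:
\[
x \le y \iff x \sqcup y = y .
\]
Reflexivity follows from idempotency, antisymmetry from commutativity,
and transitivity from associativity. If only associativity and an
identity element are available, one can still define
\[
x \preceq y \iff \exists z.\; x \oplus z = y ,
\]
but this is in general only a preorder. In $(\mathbb{Z}, +, 0)$, for
instance, $x \preceq y$ holds for every pair, since $z = y - x$ always
exists, so the relation carries no information. A function $f$ is
monotone with respect to a given order if $x \le y$ implies
$f(x) \le f(y)$, and antitone if $x \le y$ implies $f(y) \le f(x)$.
Either property is stated relative to the orders on the domain and
codomain, and is not implied by the algebraic properties of $f$ itself.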
\par\null\par\null
\selectlanguage{english}
\FloatBarrier
\end{document}