Awarded “Best Student Paper” and published in Proceedings of the 16th International Conference on Artificial General Intelligence, Stockholm, 2023. To make accurate inferences in an interactive setting, an agent must not confuse passive observation of events with having intervened to cause them. The do operator formalises interventions so that we may reason about their effect. Yet there exist Pareto optimal mathematical formalisms of general intelligence in an interactive setting which, presupposing no explicit representation of intervention, make maximally accurate inferences. We examine one such formalism. We show that in the absence of a do operator, an intervention can be represented by a variable. We then argue that variables are abstractions, and that the need to explicitly represent interventions in advance arises only because we presuppose these sorts of abstractions. The aforementioned formalism avoids this and so, initial conditions permitting, representations of relevant causal interventions will emerge through induction. These emergent abstractions function as representations of one’s self and of any other object, inasmuch as the interventions of those objects impact the satisfaction of goals. We argue that this explains how one might reason about one’s own identity and intent, those of others, one’s own as perceived by others, and so on. In a narrow sense this describes what it is to be aware, and is a mechanistic explanation of aspects of consciousness.
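
As a rough illustration of the claim that an intervention can be represented by a variable, the sketch below follows the familiar augmented-model construction, in which an ordinary binary variable F_x switches the mechanism of the intervened-upon variable. It is a toy example for intuition only, not the formalism examined in the paper; the structure and probabilities are invented for the illustration.

    import random

    # Intervention encoded as an ordinary variable: when f_x = 1, the natural
    # mechanism for X is replaced by "X := 1" (the augmented-model construction).
    def sample(f_x):
        z = random.random() < 0.5                                  # confounder Z
        if f_x:
            x = 1                                                  # intervened: X clamped to 1
        else:
            x = int(random.random() < (0.9 if z else 0.1))         # observational mechanism for X
        y = int(random.random() < (0.8 if z else 0.2))             # Y depends on Z only
        return x, y

    def estimate_p_y(f_x, given_x=None, n=200_000):
        hits = total = 0
        for _ in range(n):
            x, y = sample(f_x)
            if given_x is None or x == given_x:
                total += 1
                hits += y
        return hits / total

    print("P(Y=1 | X=1)   ~", estimate_p_y(f_x=0, given_x=1))      # ~0.74, confounded by Z
    print("P(Y=1 | F_x=1) ~", estimate_p_y(f_x=1))                 # ~0.50, equals P(Y=1 | do(X=1))

In this toy model, conditioning on the ordinary variable F_x recovers the distribution that the do operator would otherwise denote, while conditioning on X alone does not.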
Please note this is a draft on which we are seeking feedback. Substantial changes and a third author are likely to be added in the next version. The hard problem of consciousness asks why there is something it is like to be a conscious organism. We address this from first principles, by constructing a formalism that unifies lower- and higher-order theories of consciousness. We assume pancomputationalism and hold that the environment learns organisms that exhibit fit behaviour via the algorithm we call natural selection. Selection learns organisms that learn to classify causes, facilitating adaptation. Recent experimental and mathematical computer science elucidates how. Scaling this capacity implies a progressively higher order of “causal identity”, facilitating reafference and P-consciousness, then self-awareness and A-consciousness, and then meta-self-awareness. We then use this to resolve the hard problem in precise terms. First, we deny that a philosophical zombie is in all circumstances as capable as a P-conscious being. This is because a variable presupposes an object to which a value is assigned. Whether X causes Y depends on the choice of X, so causality is learned by learning X such that X causes Y, not by presupposing X and then learning whether X causes Y (presupposing rather than inferring abstractions can reduce sample efficiency in learning). However, learning is a discriminatory process that requires states be differentiated by value. Without objects, variables or values, there is only quality. By this we mean an organism is attracted to or repulsed by a physical state. Learning reduces quality into objects by constructing policies classifying the cause of affect (“representations” are just behaviour triggered by phenomenal content). Where selection pressures require an organism to classify its own interventions, that policy (a “1st-order causal identity”) has a quality that persists across interventions, and so there is something it is like to be that organism. Thus organisms have P-consciousness because it allows them to adapt with greater sample efficiency and infer the cause of affect. We then argue that neither P- nor A-consciousness alone is remarkable, but when P-consciousness gives rise to A-consciousness we obtain “H-consciousness” (what Boltuc argues is the crux of the hard problem). This occurs when selection pressures require organism o to infer organism u’s prediction of o’s interventions (a “2nd-order causal identity” approximating intent). A-consciousness is the contents of 2nd-order causal identities, and by predicting another’s prediction of one’s own 1st-order causal identities it becomes possible to know what one knows and feels, and to act upon this information to communicate meaning in the Gricean sense. Thus P- and A-consciousness are two aspects of H-consciousness, the process of learning and acting in accord with a hierarchy of causal identities that simplify the environment into classifiers of cause and affect. We call this the psychophysical principle of causality.
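
As a caricature of the claim that causality is learned by learning X such that X causes Y, the sketch below searches over candidate abstractions of a raw state (here, simple bit projections) and keeps the one whose forced value most reliably forces the outcome. It is an illustration only, not the formalism of the draft; the “intervention” is simulated by overwriting a bit of the state.

    import random

    def outcome(state):
        # Hidden ground truth: only bit 0 of the raw micro-state determines Y.
        return state[0]

    def score(candidate_bit, trials=20_000):
        # How reliably does forcing this candidate variable force the outcome?
        agree = 0
        for _ in range(trials):
            forced = random.randint(0, 1)
            state = [random.randint(0, 1) for _ in range(3)]
            state[candidate_bit] = forced            # simulated intervention
            agree += outcome(state) == forced
        return agree / trials

    scores = {bit: score(bit) for bit in range(3)}   # bit 0 scores ~1.0, bits 1 and 2 ~0.5
    print("learned variable X = bit", max(scores, key=scores.get))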
Artificial general intelligence (AGI) may herald our extinction, according to AI safety research. Yet claims regarding AGI must rely upon mathematical formalisms – theoretical agents we may analyse or attempt to build. AIXI appears to be the only such formalism supported by proof that its behaviour is optimal, a consequence of its use of compression as a proxy for intelligence. Unfortunately, AIXI is incomputable and claims regarding its behaviour are highly subjective. We argue that this is because AIXI formalises cognition as taking place in isolation from the environment in which goals are pursued (Cartesian dualism). We propose an alternative, supported by proof and experiment, which overcomes these problems. Integrating research from cognitive science with AI, we formalise an enactive model of learning and reasoning to address the problem of subjectivity. This allows us to formulate a different proxy for intelligence, called weakness, which addresses the problem of incomputability. We prove that optimal behaviour is attained when weakness is maximised. This proof is supplemented by experimental results comparing weakness and description length (the closest analogue to compression possible without reintroducing subjectivity). Weakness outperforms description length, suggesting it is a better proxy. Furthermore, we show that, if cognition is enactive, then minimisation of description length is neither necessary nor sufficient to attain optimal performance. These results undermine the notion that compression is closely related to intelligence. We conclude with a discussion of limitations, implications and future research. There remain several open questions regarding the implementation of scalable general intelligence. In the short term, these results may be best utilised to improve the performance of existing systems. For example, our results explain why DeepMind’s Apperception Engine is able to generalise effectively, and how to replicate that performance by maximising weakness. Likewise, in the context of neural networks, our results suggest both limitations of “scale is all you need” and how those limitations can be overcome.
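
For concreteness, the fragment below states the two proxies as selection rules over candidate hypotheses, each represented here by an explicit extension (the set of situations in which it holds) and an explicit encoding; weakness is taken as the cardinality of the extension and description length as the length of the encoding. The candidates and numbers are placeholders invented for the illustration, not the paper’s definitions, data or experiment.

    # Two selection rules over candidate hypotheses. The candidates below are
    # placeholders: "extension" is the set of situations a hypothesis holds in,
    # "encoding" an arbitrary stand-in for its description.
    candidates = {
        "h1": {"extension": {"s1", "s2", "s3", "s4", "s5"}, "encoding": "0110100111"},
        "h2": {"extension": {"s1", "s2"},                   "encoding": "0110"},
    }

    def weakness(h):
        return len(h["extension"])            # cardinality of the extension

    def description_length(h):
        return len(h["encoding"])             # length of the encoding

    best_by_weakness = max(candidates, key=lambda k: weakness(candidates[k]))
    best_by_mdl      = min(candidates, key=lambda k: description_length(candidates[k]))
    print("weakness maximised by:", best_by_weakness)            # h1
    print("description length minimised by:", best_by_mdl)       # h2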