Next-generation networks based on millimeter-wave (mmWave) technology offer data rates of a few gigabits per second. Heterogeneous networks (HetNets) with mmWave capability must be reliable, adaptable, and energy-efficient. Optimizing the transmission power of small cells is suggested as a way to improve performance. This motivates the use of intelligent techniques, namely a Q-learning approach, to provide seamless connectivity. Q-learning combined with a Markov decision process (MDP) based technique brings a higher degree of automation, cooperation, and intelligence to distributed HetNets. The first objective is to implement a cooperative online learning scheme that allocates power based on an MDP in a two-tier HetNet. The second objective is to maximize energy efficiency through effective power distribution. The suggested technique improves the network's overall energy efficiency and capacity by allocating power judiciously according to the MDP state-transition probabilities. Cooperative learning techniques and suitable Markov state models can improve both macrocell and femtocell service, enhancing the user experience.
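
As a rough illustration of Q-learning-based power allocation over an MDP, the following minimal sketch trains a single femtocell agent with tabular Q-learning and an energy-efficiency reward. It is not the scheme evaluated in this work: the power levels, interference states, channel gain, circuit power, uniform state transitions, and all function names are hypothetical assumptions chosen for illustration, and the cooperative exchange between cells is omitted.

```python
import math
import random

# All constants below are hypothetical values chosen for illustration only.
POWER_LEVELS_W = [0.01, 0.05, 0.1, 0.2]   # candidate femtocell transmit powers (W)
INTERFERENCE_W = [0.0, 0.002, 0.01]       # one interference level per MDP state (W)
NOISE_W = 1e-3                            # receiver noise power (W)
CHANNEL_GAIN = 0.05                       # link gain between femtocell and user
CIRCUIT_POWER_W = 0.1                     # static circuit power (W)


def energy_efficiency(state, action):
    """Reward: spectral efficiency divided by total consumed power."""
    p = POWER_LEVELS_W[action]
    sinr = CHANNEL_GAIN * p / (NOISE_W + INTERFERENCE_W[state])
    rate = math.log2(1.0 + sinr)          # bit/s/Hz (Shannon capacity)
    return rate / (p + CIRCUIT_POWER_W)


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    n_states, n_actions = len(INTERFERENCE_W), len(POWER_LEVELS_W)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = rng.randrange(n_states)
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                          # explore
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
        reward = energy_efficiency(state, action)
        # Assumption: the next interference state is drawn uniformly at random.
        next_state = rng.randrange(n_states)
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Index of the highest-valued power level for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
```

Here `policy[s]` is the index into `POWER_LEVELS_W` that the agent prefers in interference state `s`. In a cooperative variant, cells might additionally share Q-values or rewards, which this sketch does not model.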