On Reinforcement Learning, Nurturing, and the Evolution of Risk Neutrality
Reinforcement learning depends on agents being learning individuals: when agents rely on instinct rather than gathering data and acting on it, the population tends to be less successful than a true RL population. “Riskiness” is our elementary metric for how willing an individual or population is to rely on learning. With a high learning parameter, which is how we denote riskiness in this paper, agents find the safest option and seldom deviate from it, essentially using learning to become non-learning individuals. With a low learning parameter, agents ignore recency entirely and seek out the highest reward regardless of the risk. In this paper we attempt to evolve “risk neutrality” in a population by adding a safe-exploration nurturing period, during which agents are free to explore without consequence. We identified the environmental conditions under which our hypotheses are mostly satisfied and found that nurturing enables agents to distinguish between two different risky options and thereby evolve risk neutrality. Too long a nurturing period causes the evolution to waver before settling on a path, with essentially random results, while a short nurturing period yields a successful evolution of risk neutrality. The non-nurturing case evolves risk aversion by default, as we expected from a reinforcement learning system: agents cannot distinguish between the good risk and the bad risk, so they avoid risk altogether.
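The setup described above can be sketched as a toy bandit simulation. This is our illustrative reconstruction, not the authors' code: the arm payoffs, the learning parameter `alpha`, and the `nurture_steps` cutoff are all assumed values chosen to mirror the abstract's description of a safe option, a good risk, a bad risk, a recency-weighted learner, and a consequence-free nurturing period.

```python
import random

def simulate(alpha=0.1, nurture_steps=50, total_steps=500, seed=0):
    """Toy three-armed bandit (illustrative sketch, not the paper's code).
    Arm 0: safe, fixed reward 1.0.
    Arm 1: 'good risk'  -- high variance, higher expected value (0.3 * 5.0 = 1.5).
    Arm 2: 'bad risk'   -- high variance, lower expected value (0.1 * 5.0 = 0.5)."""
    rng = random.Random(seed)
    q = [0.0, 0.0, 0.0]   # learned value estimates per arm
    score = 0.0           # fitness accumulated outside the nurturing period

    def pull(arm):
        if arm == 0:
            return 1.0
        if arm == 1:
            return 5.0 if rng.random() < 0.3 else 0.0
        return 5.0 if rng.random() < 0.1 else 0.0

    for t in range(total_steps):
        nurtured = t < nurture_steps
        if nurtured:
            arm = rng.randrange(3)                    # free exploration
        else:
            arm = max(range(3), key=lambda a: q[a])   # greedy on learned values
        r = pull(arm)
        # Recency-weighted update: alpha plays the role of the
        # "learning parameter" the abstract uses to denote riskiness.
        q[arm] += alpha * (r - q[arm])
        if not nurtured:
            score += r    # nurturing period carries no fitness consequence
    return q, score

q, score = simulate()
```

Varying `nurture_steps` here gives a feel for the abstract's result: with no nurturing the learner rarely samples the risky arms enough to tell the good risk from the bad one, while a moderate nurturing window lets the estimates separate before choices start counting.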