Search results for "Reinforcement learning"
Showing 10 of 95 documents
Development of a Simulator for Prototyping Reinforcement Learning-Based Autonomous Cars
2022
Autonomous driving is a research field that has received growing attention in recent years, with increasing applications of reinforcement learning (RL) algorithms. It is impractical to train an autonomous vehicle thoroughly in the physical space, i.e., the so-called 'real world'; therefore, simulators are used in almost all training of autonomous driving algorithms. There are numerous autonomous driving simulators, but very few are specifically targeted at RL. RL-based cars are challenging to train due to the variety of possible reward functions. There is a lack of simulators addressing many central RL research tasks within autonomous driving, such as scene understanding, localization and mapping, pla…
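The abstract's point about the variety of reward functions can be illustrated with a sketch: a gym-style toy environment where the reward function is injected as a parameter, so different RL reward formulations can be prototyped against the same dynamics. All names (`ToyCarEnv`, the 1-D speed dynamics) are illustrative assumptions, not the paper's simulator.

```python
# Minimal sketch (not the paper's simulator): a 1-D "car" environment whose
# reward function is injected as a parameter, illustrating why supporting many
# reward formulations matters for RL-focused driving simulators.

class ToyCarEnv:
    def __init__(self, reward_fn, target_speed=10.0):
        self.reward_fn = reward_fn          # callable(speed, throttle) -> float
        self.target_speed = target_speed
        self.speed = 0.0

    def reset(self):
        self.speed = 0.0
        return self.speed

    def step(self, throttle):
        # Simplistic dynamics: throttle in [-1, 1] changes speed, with drag.
        self.speed = max(0.0, self.speed + 2.0 * throttle - 0.1)
        reward = self.reward_fn(self.speed, throttle)
        return self.speed, reward

# Two of the many possible reward formulations the abstract alludes to:
def speed_tracking_reward(speed, throttle, target=10.0):
    return -abs(speed - target)             # penalise deviation from target speed

def comfort_reward(speed, throttle):
    return -abs(throttle)                   # penalise aggressive control inputs

env = ToyCarEnv(speed_tracking_reward)
s = env.reset()
s, r = env.step(1.0)
```

Swapping `speed_tracking_reward` for `comfort_reward` changes the learned behaviour without touching the simulator itself, which is the kind of flexibility an RL-targeted simulator needs.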
Towards Model-Based Reinforcement Learning for Industry-Near Environments
2019
Deep reinforcement learning has over the past few years shown great potential in learning near-optimal control in complex simulated environments with little visible information. Rainbow (Q-Learning) and PPO (Policy Optimisation) have shown outstanding performance in a variety of tasks, including the Atari 2600, MuJoCo, and Roboschool test suites. Although these algorithms are fundamentally different, both suffer from high variance, low sample efficiency, and hyperparameter sensitivity that, in practice, make them a no-go for critical operations in industry.
Generating Hyperspectral Skin Cancer Imagery using Generative Adversarial Neural Network
2020
In this study we develop a proof of concept for using generative adversarial neural networks to produce hyperspectral skin cancer imagery. A generative adversarial neural network consists of two competing neural networks: the generator tries to produce data similar to the measured data, while the discriminator tries to classify data correctly as real or fake. This is a reinforcement-style learning setup, in which both models receive feedback based on their performance. To train the discriminator, we use data measured from skin cancer patients. The aim of the study is to develop a generator for augmenting hyperspectral skin cancer imagery.
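The generator/discriminator game described above can be sketched in one dimension with hand-derived gradients. This is illustrative only: the study works on hyperspectral images with real networks, whereas here the generator is a linear map G(z) = a·z + b trying to match a Gaussian "measurement" distribution, and the discriminator is a logistic classifier D(x) = sigmoid(w·x + c).

```python
# Minimal 1-D GAN sketch (illustrative assumptions, not the paper's model).
# Discriminator step minimises -log D(real) - log(1 - D(fake));
# generator step uses the non-saturating loss -log D(fake).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

a, b = 1.0, 0.0          # generator parameters: G(z) = a*z + b
w, c = 0.0, 0.0          # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64
real_mean = 3.0          # the "measured" data distribution: N(3, 1)

for _ in range(2000):
    x_real = rng.normal(real_mean, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b

    # Discriminator update (gradients of the logistic loss, by hand).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    gw = np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    gc = np.mean(-(1 - d_real) + d_fake)
    w, c = w - lr * gw, c - lr * gc

    # Generator update: push fakes toward what D currently calls "real".
    d_fake = sigmoid(w * x_fake + c)
    ga = np.mean(-(1 - d_fake) * w * z)
    gb = np.mean(-(1 - d_fake) * w)
    a, b = a - lr * ga, b - lr * gb

fake_mean = float(np.mean(a * rng.normal(0, 1, 10000) + b))
```

After training, the generator's output distribution drifts toward the "measured" distribution, which is the mechanism the abstract relies on for data augmentation.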
Towards safe reinforcement-learning in industrial grid-warehousing
2020
Reinforcement learning has proven profoundly successful at learning optimal policies for simulated environments using distributed training with extensive compute capacity. Model-free reinforcement learning uses the notion of trial and error, where error is a vital part of teaching the agent to behave optimally. In mission-critical, real-world environments, there is little tolerance for failure, which can cause damaging effects on humans and equipment. In these environments, current state-of-the-art reinforcement learning approaches are not sufficient to learn optimal control policies safely. On the other hand, model-based reinforcement learning tries to encode environment tra…
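The model-based idea the abstract contrasts with trial and error can be sketched: learn a transition and reward model from logged interactions, then plan on the model rather than failing in the real system. The environment here is a deterministic 1-D corridor with a goal at the right end, an assumption for illustration, not the paper's grid-warehouse setup.

```python
# Sketch of model-based RL: fit a model from experience, then plan on it.
import random

N_STATES, ACTIONS, GOAL = 6, (-1, +1), 5

def real_step(s, a):                     # the "real" environment (deterministic)
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# 1) Collect experience with a random policy and record the model.
random.seed(0)
model = {}                               # (s, a) -> (s', r)
for _ in range(500):
    s = random.randrange(N_STATES)
    a = random.choice(ACTIONS)
    model[(s, a)] = real_step(s, a)      # deterministic, so one sample suffices

# 2) Plan on the learned model with value iteration: no further real-world
#    trial and error is needed once the model is in hand.
V, gamma = [0.0] * N_STATES, 0.9
for _ in range(100):
    V = [max(model[(s, a)][1] + gamma * V[model[(s, a)][0]] for a in ACTIONS)
         for s in range(N_STATES)]

policy = [max(ACTIONS, key=lambda a: model[(s, a)][1] + gamma * V[model[(s, a)][0]])
          for s in range(N_STATES)]
```

In a safety-critical setting the appeal is exactly this separation: the risky exploration happens once (or on logged data), and subsequent policy improvement happens against the model.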
RDF* Graph Database as Interlingua for the TextWorld Challenge
2019
This paper briefly describes the top-scoring submission to the First TextWorld Problems: A Reinforcement and Language Learning Challenge. To alleviate the partial observability problem characteristic of TextWorld games, we split the Agent into two independent components, Observer and Actor, communicating only via the interlingua of the RDF* graph database. The RDF* graph database serves as the "world model" memory, incrementally updated by the Observer via FrameNet-informed Natural Language Understanding techniques, and is used by the Actor for efficient exploration and planning of game action sequences. We find that the deep-learning approach works best for the Observer componen…
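The Observer/Actor split can be sketched schematically, with a plain set of (subject, predicate, object) triples standing in for the RDF* graph database. The toy pattern-matching and planning below are stand-ins, not the submission's FrameNet-based NLU or planner; the point is the architecture: the two components share nothing except the triple store.

```python
# Schematic Observer/Actor sketch; a set of triples plays the "interlingua".
world_model = set()   # shared world model: the only channel between components

class Observer:
    """Turns textual observations into triples in the shared world model."""
    def update(self, text):
        # Toy pattern: "You see a <obj> on the <place>."
        words = text.lower().rstrip(".").split()
        if "see" in words and "on" in words:
            obj = words[words.index("a") + 1]
            place = words[words.index("the") + 1]
            world_model.add((obj, "on", place))

class Actor:
    """Chooses actions by querying the world model, never the raw text."""
    def act(self, goal_obj):
        for subj, pred, obj in world_model:
            if subj == goal_obj and pred == "on":
                return f"take {goal_obj} from {obj}"
        return "look"    # nothing known yet: explore

obs, actor = Observer(), Actor()
obs.update("You see a knife on the table.")
action = actor.act("knife")
```

Because the Actor only ever sees the accumulated graph, observations from earlier turns remain available, which is how the design mitigates partial observability.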
AI for Resource Allocation and Resource Allocation for AI: a two-fold paradigm at the network edge
2022
5G-and-beyond and Internet of Things (IoT) technologies are pushing a shift from the classic cloud-centric view of the network to a new edge-centric vision. In such a perspective, the computation, communication and storage resources are moved closer to the user, to the benefit of network responsiveness/latency, and of an improved context-awareness, that is, the ability to tailor the network services to the live user's experience. However, these improvements do not come for free: edge networks are highly constrained, and do not match the resource abundance of their cloud counterparts. In such a perspective, the proper management of the few available resources is of crucial importance to impr…
Explainable Reinforcement Learning with the Tsetlin Machine
2021
The Tsetlin Machine is a recent supervised machine learning algorithm that has obtained competitive results in several benchmarks, both in terms of accuracy and resource usage. It has been used for convolution, classification, and regression, producing interpretable rules. In this paper, we introduce the first framework for reinforcement learning based on the Tsetlin Machine. We combine the value iteration algorithm with the regression Tsetlin Machine, as the value-function approximator, to investigate the feasibility of training the Tsetlin Machine through bootstrapping. Moreover, we document the robustness and accuracy of learning on several instances of the grid-world problem.
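The Tsetlin regression machinery is not reproduced here, but the value-iteration bootstrap it approximates can be shown in tabular form on a small grid-world; the table below plays the role the regression Tsetlin Machine plays as value-function approximator in the paper. The grid size, step cost, and discount are illustrative choices.

```python
# Tabular value iteration on a 4x4 grid-world with a goal in the corner:
# each sweep "bootstraps" V(s) from the current estimates of successor states.
N = 4
GOAL, GAMMA, STEP_COST = (3, 3), 0.9, -1.0
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

V = {(i, j): 0.0 for i in range(N) for j in range(N)}
for _ in range(100):
    for s in list(V):
        if s == GOAL:
            V[s] = 0.0                           # terminal state
            continue
        best = float("-inf")
        for di, dj in ACTIONS:
            # Moves off the grid leave the agent in place.
            ni = min(max(s[0] + di, 0), N - 1)
            nj = min(max(s[1] + dj, 0), N - 1)
            best = max(best, STEP_COST + GAMMA * V[(ni, nj)])
        V[s] = best
```

Replacing the dictionary lookup/assignment with predict/update calls on a regressor is exactly the step the paper takes, which is what makes bootstrapped targets (targets computed from the approximator's own current outputs) the interesting feasibility question.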
A formal proof of the ε-optimality of discretized pursuit algorithms
2015
Learning Automata (LA) can be reckoned the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of discretized algorithms has been proven to converge even faster than their continuous counterparts. However, it has recently been reported that the proofs of ε-optimality for all the algorithms reported over the past three decades have been flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the proba…
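A discretized pursuit automaton of the kind analysed above can be sketched as follows: action probabilities move in fixed steps of size 1/(r·N) toward the action with the best running reward estimate. The Bernoulli environment, resolution, and initial-sampling phase are illustrative parameters, not values from the paper.

```python
# Minimal discretized pursuit automaton sketch (illustrative parameters).
import random

random.seed(1)
reward_prob = [0.8, 0.3, 0.2]            # unknown Bernoulli reward probabilities
r = len(reward_prob)
N = 100                                  # discretization resolution
delta = 1.0 / (r * N)                    # smallest probability step

counts = [0] * r
rewards = [0.0] * r
p = [1.0 / r] * r                        # action probability vector

# Seed the reward estimates by sampling each action a few times.
for i in range(r):
    for _ in range(20):
        counts[i] += 1
        rewards[i] += (random.random() < reward_prob[i])

for _ in range(3000):
    i = random.choices(range(r), weights=p)[0]   # pick an action per p
    counts[i] += 1
    rewards[i] += (random.random() < reward_prob[i])
    est = [rewards[k] / counts[k] for k in range(r)]
    m = max(range(r), key=lambda k: est[k])      # current best estimate
    # Pursuit step: shift delta of mass from every other action toward m.
    for k in range(r):
        if k != m:
            p[k] = max(p[k] - delta, 0.0)
    p[m] = 1.0 - sum(p[k] for k in range(r) if k != m)
```

The fixed step size delta is what makes the scheme "discretized": the probability vector lives on a lattice, which is the property the convergence (ε-optimality) proofs reason about.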
Evolution and Learning: Evolving Sensors in a Simple MDP Environment
2003
Natural intelligence and autonomous agents face difficulties when acting in information-dense environments. Assailed by a multitude of stimuli, they have to make sense of the inflow of information, filtering and processing what is necessary but discarding what is unimportant. This paper investigates the interactions between the evolution of the sensorial channel extracting information from the environment and the simultaneous individual adaptation of agent control. Our particular goal is to study the influence of learning on the evolution of sensors, with learning duration as the tunable parameter. A genetic algorithm governs the evolution of sensors appropriate for the a…
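The kind of genetic algorithm that could govern sensor evolution can be sketched with bit-string "sensor masks": each bit switches one sensory channel on or off, and fitness rewards reading informative channels while charging a cost per sensor. The encoding, fitness function, and GA parameters here are illustrative assumptions, not the paper's.

```python
# Generic GA sketch: truncation selection with elitism, one-point crossover,
# and per-bit mutation, evolving which sensory channels an agent reads.
import random

random.seed(0)
GENOME_LEN, POP, GENS = 12, 30, 60
USEFUL = {0, 2, 5, 7}                    # hypothetical informative channels

def fitness(genome):
    # Reward reading useful channels, penalise the cost of extra sensors.
    return sum(1.0 for i in USEFUL if genome[i]) - 0.2 * sum(genome)

def mutate(genome, rate=0.05):
    return [b ^ (random.random() < rate) for b in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP)]
initial_best = max(fitness(g) for g in pop)
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP // 3]              # keep the best third unchanged
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(POP - len(elite))]
final_best = max(fitness(g) for g in pop)
```

In the paper's setting, fitness would come from agent performance after a learning phase rather than a fixed formula, which is how learning duration feeds back into sensor evolution.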
Optimization of anemia treatment in hemodialysis patients via reinforcement learning
2013
Objective: Anemia is a frequent comorbidity in hemodialysis patients that can be successfully treated by administering erythropoiesis-stimulating agents (ESAs). ESA dosing is currently based on clinical protocols that often do not account for the high inter- and intra-individual variability in the patient's response. As a result, the hemoglobin level of some patients oscillates around the target range, which is associated with multiple risks and side effects. This work proposes a methodology based on reinforcement learning (RL) to optimize ESA therapy. Methods: RL is a data-driven approach for solving sequential decision-making problems that are formulated as Markov decision processes (MDP…
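The MDP formulation can be sketched with tabular Q-learning on a hypothetical discretization: state = hemoglobin band, action = discrete ESA dose level, reward = negative distance from the target band. The toy response dynamics below are a deliberately simplified assumption for illustration; the paper's clinical setting involves highly variable patient responses and a richer model.

```python
# Hypothetical dosing MDP solved with tabular Q-learning (illustrative only).
import random

random.seed(0)
N_HB, TARGET = 11, 5        # discretized hemoglobin bands; band 5 = target range
DOSES = [0, 1, 2]           # discrete ESA dose levels
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

def step(hb, dose):
    # Toy deterministic response: hemoglobin drifts down one band per period
    # and rises with dose (real responses are highly patient-variable).
    hb2 = min(max(hb + dose - 1, 0), N_HB - 1)
    return hb2, -abs(hb2 - TARGET)       # reward: stay near the target band

Q = [[0.0] * len(DOSES) for _ in range(N_HB)]
for _ in range(4000):
    hb = random.randrange(N_HB)          # start each episode at a random band
    for _ in range(20):                  # one treatment episode
        d = (random.randrange(len(DOSES)) if random.random() < EPS
             else max(DOSES, key=lambda a: Q[hb][a]))
        hb2, rwd = step(hb, d)
        Q[hb][d] += ALPHA * (rwd + GAMMA * max(Q[hb2]) - Q[hb][d])
        hb = hb2

policy = [max(DOSES, key=lambda a: Q[s][a]) for s in range(N_HB)]
```

The learned policy recovers the clinically intuitive shape (high dose when hemoglobin is low, none when it is high, a maintenance dose at target); in the paper's data-driven setting the transition dynamics come from patient records rather than an assumed formula.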