Search results for "Q-learning"
Showing 10 of 12 documents
Low Latency Ambient Backscatter Communications with Deep Q-Learning for Beyond 5G Applications
2020
Low latency is a critical requirement of beyond-5G services. Previously, the aspect of latency has been extensively analyzed in conventional and modern wireless networks. With the rapidly growing research interest in wireless-powered ambient backscatter communications, it has become ever more important to meet delay constraints while maximizing the achievable data rate. Therefore, to address the issue of latency in backscatter networks, this paper provides a deep Q-learning based framework for delay-constrained ambient backscatter networks. To do so, a Q-learning model for the ambient backscatter scenario has been developed. In addition, an algorithm has been proposed that employs deep neur…
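At the heart of any deep Q-learning framework of this kind is the bootstrapped TD target used to train the network. A minimal sketch, assuming a linear "network" and made-up dimensions purely for illustration (the paper's actual model is not given in this excerpt):

```python
import numpy as np

# Illustrative assumptions: a linear Q-function and arbitrary sizes/weights.
n_features, n_actions = 4, 3
W_target = np.full((n_features, n_actions), 0.5)  # frozen target-network weights

def q_values(weights, state):
    # Q(s, .) for a linear function approximator
    return state @ weights

def td_target(reward, next_state, gamma=0.99, done=False):
    # y = r + gamma * max_a Q_target(s', a); no bootstrapping at terminal states
    if done:
        return reward
    return reward + gamma * float(np.max(q_values(W_target, next_state)))

s_next = np.array([1.0, 0.0, 0.0, 0.0])
y = td_target(reward=1.0, next_state=s_next)  # 1.0 + 0.99 * 0.5 = 1.495
```

The online network would then be regressed toward `y`; only the target computation is sketched here.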
Solving POMDP Problems Using Historical Elements, and Its Optimization
2017
Within Elvis Egle's bachelor's thesis "POMDP problēmu risināšana, izmantojot vēsturiskus elementus, un tās optimizācija" (Solving POMDP Problems Using Historical Elements, and Its Optimization), algorithms for partially observable Markov decision processes (POMDPs) were studied. Based on Jānis Zuters's research paper "Sequence Q-Learning: a Memory-based Method Towards Solving POMDP", which describes an approach to solving POMDPs, an equivalent machine-learning algorithm was implemented according to the information provided in the paper. The thesis investigated the effectiveness of this algorithm, and several options for its further improvement were examined and proposed. The implemented program was applied to a specific problem, for which better performance was observed with n…
Learning competitive pricing strategies by multi-agent reinforcement learning
2003
Abstract In electronic marketplaces, automated and dynamic pricing is becoming increasingly popular. Agents that perform this task can improve themselves by learning from past observations, possibly using reinforcement learning techniques. Co-learning of several adaptive agents against each other may lead to unforeseen results and increasingly dynamic market behavior. In this article we shed some light on price developments arising from a simple price-adaptation strategy. Furthermore, we examine several adaptive pricing strategies and their learning behavior in a co-learning scenario with different levels of competition. Q-learning manages to learn best-reply strategies well, but is e…
Towards a Deep Reinforcement Learning Approach for Tower Line Wars
2017
There have been numerous breakthroughs in reinforcement learning in recent years, perhaps most notably Deep Reinforcement Learning successfully playing and winning relatively advanced computer games. There is undoubtedly an anticipation that Deep Reinforcement Learning will play a major role when the first AI masters the complicated gameplay needed to beat a professional Real-Time Strategy game player. For this to be possible, there needs to be a game environment that targets and fosters AI research, and specifically Deep Reinforcement Learning. Some game environments already exist; however, these are either overly simplistic, such as Atari 2600, or complex, such as StarCraft II fro…
AIOC2: A deep Q-learning approach to autonomic I/O congestion control in Lustre
2021
Abstract In high performance computing systems, I/O congestion is a common problem in large-scale distributed file systems. However, current implementations mainly require administrators to manually design low-level implementations and optimizations. We propose an adaptive I/O congestion control framework, named AIOC2, which can not only adaptively tune the I/O congestion control parameters, but also exploit the deep Q-learning method to train the parameters and optimize the tuning for different types of workloads from the server and the client at the same time. AIOC2 combines feedback-based dynamic I/O congestion control and deep Q-learning parameter tuning technology to …
Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment
2021
We focus on the important problem of emergency evacuation, which could clearly benefit from reinforcement learning but has been largely unaddressed. Emergency evacuation is a complex task which is difficult to solve with reinforcement learning, since an emergency situation is highly dynamic, with many changing variables and complex constraints that make it difficult to train on. In this paper, we propose the first fire evacuation environment for training reinforcement learning agents for evacuation planning. The environment is modelled as a graph capturing the building structure. It consists of realistic features like fire spread, uncertainty and bottlenecks. We have implemented the envir…
Interpretable Option Discovery Using Deep Q-Learning and Variational Autoencoders
2021
Deep Reinforcement Learning (RL) is unquestionably a robust framework to train autonomous agents in a wide variety of disciplines. However, traditional deep and shallow model-free RL algorithms suffer from low sample efficiency and inadequate generalization for sparse state spaces. The options framework with temporal abstractions [18] is perhaps the most promising method to solve these problems, but it still has noticeable shortcomings. It only guarantees local convergence, and it is challenging to automate initiation and termination conditions, which in practice are commonly hand-crafted.
Sequence Q-learning: A memory-based method towards solving POMDP
2015
Partially observable Markov decision process (POMDP) models a control problem where states are only partially observable by an agent. The two main approaches to solving such tasks are those of value functions and direct search in policy space. This paper introduces the Sequence Q-learning method, which extends the well-known Q-learning algorithm towards the ability to solve POMDPs by adding a special sequence-management framework, advancing from action values to "sequence" values and including the "sequence continuity principle".
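The baseline Q-learning update that Sequence Q-learning builds on can be sketched in a few lines; the toy two-state MDP and the learning parameters below are illustrative, not from the paper:

```python
# Standard tabular Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    best_next = max(Q[s_next].values())  # greedy bootstrap over next actions
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Toy two-state MDP with all values initialized to zero (illustrative only)
Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 0.0}}
q = q_learning_update(Q, s=0, a="right", r=1.0, s_next=1)  # 0.1 * 1.0 = 0.1
```

Sequence Q-learning replaces the per-action values above with values over action sequences; that extension is not reproduced here.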
Learning Automata Based Q-learning for Content Placement in Cooperative Caching
2019
An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. Firstly, a supervised feed-forward back-propagation connectionist model based neural network (SFBC-NN) is invoked for user mobility and content popularity prediction. More particularly, practical data collected from a GPS-tracker app on smartphones is used to test the accuracy of the mobility prediction. Then, a learning automata-based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked for Q-learning to obtain an optimal action selection in a random and stationary environment. It is p…
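The excerpt does not say which learning-automaton scheme LAQL uses; the classical linear reward-inaction (L_R-I) update, sketched below as an assumption, is one standard choice for action selection in a stationary random environment:

```python
# Linear reward-inaction (L_R-I) learning-automaton update: on a rewarded
# play, shift probability mass toward the chosen action; on a penalized
# play, leave the action probabilities unchanged.
def lri_update(probs, chosen, rewarded, lr=0.1):
    if rewarded:
        probs = [
            p + lr * (1.0 - p) if a == chosen else p - lr * p
            for a, p in enumerate(probs)
        ]
    return probs

probs = lri_update([0.5, 0.5], chosen=0, rewarded=True)  # -> [0.55, 0.45]
```

Note that the update keeps the probabilities summing to one, since the mass added to the chosen action equals the mass removed from the others.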
A Comparative Analysis of Multiple Biasing Techniques for Q-biased Softmax Regression Algorithm
2021
Over the past several years, the popularity of robotic workers has surged tremendously. Several tasks which were previously considered insurmountable can now be performed by robots efficiently and with ease. This is mainly due to the advances made in the fields of control systems and artificial intelligence in recent years. Lately, Reinforcement Learning (RL) has captured the spotlight in the field of robotics. Instead of explicitly specifying the solution of a particular task, RL enables the robot (agent) to explore its environment and, through trial and error, choose the appropriate response. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax …