Search results for "Reinforcement learning"
Showing 10 of 95 documents
Reinforcement learning approach to nonequilibrium quantum thermodynamics
2021
We use a reinforcement learning approach to reduce entropy production in a closed quantum system brought out of equilibrium. Our strategy makes use of an external control Hamiltonian and a policy gradient technique. Our approach bears no dependence on the quantitative tool chosen to characterize the degree of thermodynamic irreversibility induced by the dynamical process being considered, requires little knowledge of the dynamics itself, and does not require tracking the quantum state of the system during the evolution, thus embodying an experimentally non-demanding approach to the control of non-equilibrium quantum thermodynamics. We successfully apply our methods to the case of single- …
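The abstract above names a policy gradient technique as its learning core. As a hedged illustration only (a toy one-step control problem with made-up rewards, not the paper's quantum setting), a minimal REINFORCE-style update over a softmax policy can be sketched as:

```python
import math
import random

def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(rewards, episodes=2000, lr=0.1, seed=0):
    # REINFORCE with a running baseline on a toy one-step problem:
    # rewards[a] is the (hypothetical) mean reward of control action a.
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = rewards[a] + rng.gauss(0.0, 0.1)  # noisy reward signal
        baseline += 0.01 * (r - baseline)     # variance-reduction baseline
        advantage = r - baseline
        for i in range(len(prefs)):
            # Gradient of log pi(a) w.r.t. preference i under softmax.
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad
    return softmax(prefs)

probs = policy_gradient([0.2, 1.0, 0.5])
```

Note that the agent only needs scalar reward feedback, not the underlying dynamics, which mirrors the "experimentally non-demanding" point made in the abstract.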
MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays
2016
Multi-agent systems allow the modelling of complex, heterogeneous, and distributed systems in a realistic way. MARL-Ped is a multi-agent system tool, based on the MPI standard, for the simulation of different scenarios of pedestrians who autonomously learn the best behavior by Reinforcement Learning. MARL-Ped uses one MPI process for each agent by design, with a fixed fine-grain granularity. This requirement limits the performance of simulations when the number of available processors is smaller than the number of agents. On the other hand, Hitmap is a library to ease the programming of parallel applications based on distributed arrays. It includes abstractions for the automatic parti…
Towards Intelligent IoT Networks: Reinforcement Learning for Reliable Backscatter Communications
2019
Backscatter communication is becoming the focal point of research for low-powered Internet of things (IoT). However, the intelligence aspect of the backscattering devices is not well-defined. Since future IoT networks are going to be a formidable platform of intelligent sensing devices operating in a self-organizing manner, it is necessary to incorporate learning capabilities in backscatter devices. Motivated by this objective, this paper aims to employ reinforcement learning for improving the performance of backscatter networks. In particular, a multicluster backscatter communication model is developed for short-range information sharing. This is followed by a power allocation algorithm usi…
Online fitted policy iteration based on extreme learning machines
2016
Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains limited by several causes. Particularly important among these are the large quantity of data required by the agent to learn useful policies and the poor scalability to high-dimensional problems due to the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that makes progress on both fronts. OFPI is based on a semi-batch scheme that increases the convergence speed by reusing data and enables the use of global approximators by reformulating the valu…
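The title names extreme learning machines as the approximator behind OFPI. As a hedged sketch (toy 1-D data and hypothetical function names, not the paper's algorithm), an ELM-style global approximator fits a fixed random hidden layer's output weights in closed form:

```python
import numpy as np

def elm_regressor(X, y, n_hidden=50, seed=0):
    # ELM-style global approximator: a fixed random hidden layer followed
    # by output weights fitted in closed form via least squares.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form fit
    return lambda Xq: np.tanh(Xq @ W + b) @ beta

# Fit a smooth 1-D target as a stand-in for a value function.
X = np.linspace(0.0, np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()
predict = elm_regressor(X, y)
err = np.max(np.abs(predict(X) - y))
```

Because only the linear output weights are trained, refitting on a new batch of transitions is cheap, which is one reason ELMs suit semi-batch schemes like the one described.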
Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations
2020
Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents…
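The abstract describes deriving behavior with a greedy policy over a learned value table. As a minimal illustration (toy states and actions, invented for this sketch), extracting that policy from a tabular action-value function looks like:

```python
def greedy_policy(q_table):
    # Derive a deterministic policy from a tabular action-value function:
    # in each state, choose the action with the highest learned value.
    return {state: max(actions, key=actions.get)
            for state, actions in q_table.items()}

# Hypothetical learned table, for illustration only.
q = {"s0": {"left": 0.1, "right": 0.7},
     "s1": {"left": 0.4, "right": 0.2}}
policy = greedy_policy(q)  # {'s0': 'right', 's1': 'left'}
```

The authoring difficulty the abstract raises is visible even here: nudging a single entry of `q` can flip the greedy action in a state, so the mapping from value edits to policy changes is hard to predict.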
2019
As rats learn to search for multiple sources of food or water in a complex environment, they generate increasingly efficient trajectories between reward sites. Such spatial navigation capacity involves the replay of hippocampal place-cells during awake states, generating small sequences of spatially related place-cell activity that we call "snippets". These snippets occur primarily during sharp-wave-ripples (SWRs). Here we focus on the role of such replay events, as the animal is learning a traveling salesperson task (TSP) across multiple trials. We hypothesize that snippet replay generates synthetic data that can substantially expand and restructure the experience available and make learni…
Reinforcement learning in synthetic gene circuits.
2020
Synthetic gene circuits allow programming in DNA the expression of a phenotype at a given environmental condition. The recent integration of memory systems with gene circuits opens the door to their adaptation to new conditions and their re-programming. This lays the foundation to emulate neuromorphic behaviour and solve complex problems similarly to artificial neural networks. Cellular products such as DNA or proteins can be used to store memory in both digital and analog formats, allowing cells to be turned into living computing devices able to record information regarding their previous states. In particular, synthetic gene circuits with memory can be engineered into living systems to al…
Cortical Recruitment Determines Learning Dynamics and Strategy
2018
Salience is a broad and widely used concept in neuroscience whose neuronal correlates, however, remain elusive. In behavioral conditioning, salience is used to explain various effects, such as stimulus overshadowing, and refers to how fast and strongly a stimulus can be associated with a conditioned event. Here, we show that sounds of diverse quality, but equal intensity and perceptual detectability, can recruit different levels of population activity in mouse auditory cortex. When using these sounds as cues in a Go/NoGo discrimination task, the degree of cortical recruitment matches the salience parameter of a reinforcement learning model used to analyze learning speed. We test an …
Acute stress impairs reward positivity effect in probabilistic learning
2019
Decision making based on feedback learning requires a series of cognitive processes, including estimating the probability of particular outcomes and modulating expectations between expected versus actual outcomes. It has been suggested that stress affects decision making and subsequent processing of feedback valence and magnitude. However, less is known about the effect of acute stress on reward expectancy. In the current study, participants performed a probabilistic learning task, in which they learned an association between response and feedback within different reward expectancy trials (30% and 70%) under the conditions of stress (threat of shock) and safety (no shock). We recorded event…
Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning
2015
This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed-Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and plays one of N possible slot machines, or arms, and either receives a reward or a penalty. The objective of the gambler is then to locate the most rewarding arm to play, while maximizing his winnings in the process. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only…
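For context on the Thompson sampling component named in the title, here is a hedged sketch of the standard Beta-Bernoulli variant on a stationary bandit (made-up arm probabilities; the paper's non-stationary, deceptive N-Door setting is considerably harder):

```python
import random

def thompson_sampling(success_probs, pulls=5000, seed=1):
    # Beta-Bernoulli Thompson sampling: keep a Beta posterior per arm,
    # sample a win-rate estimate from each, and pull the highest sample.
    rng = random.Random(seed)
    n = len(success_probs)
    wins = [1.0] * n    # Beta alpha parameter (uniform prior)
    losses = [1.0] * n  # Beta beta parameter
    counts = [0] * n
    for _ in range(pulls):
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(n)]
        a = max(range(n), key=lambda i: samples[i])
        counts[a] += 1
        if rng.random() < success_probs[a]:  # stationary environment here
            wins[a] += 1
        else:
            losses[a] += 1
    return counts

counts = thompson_sampling([0.2, 0.7, 0.4])
```

Posterior sampling handles exploration automatically: uncertain arms occasionally produce high samples and get pulled, while play concentrates on the best arm as its posterior sharpens.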