Search results for "Bandit"
showing 10 items of 18 documents
Simple learning rules to cope with changing environments
2008
10 pages; International audience; We consider an agent that must choose repeatedly among several actions. Each action has a certain probability of giving the agent an energy reward, and costs may be associated with switching between actions. The agent does not know which action has the highest reward probability, and the probabilities change randomly over time. We study two learning rules that have been widely used to model decision-making processes in animals: one deterministic and the other stochastic. In particular, we examine the influence of the rules' 'learning rate' on the agent's energy gain. We compare the performance of each rule with the best performance attainable when the agent …
Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning
2015
This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed-Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and plays one out of N possible slot machines, or arms, and receives either a reward or a penalty. The objective of the gambler is then to locate the most rewarding arm to play, while maximizing his winnings in the process. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only…
Nemo teneatur ad impossibile. Las consecuencias de la pragmática para la extirpación del bandolerismo valenciano: cláusulas relativas a la punición d…
2014
Obsessed with stopping the problem of banditry at any price, in mid-1586 Viceroy Aytona published in Valencia a pragmatic (pragmática) that placed the main responsibility for fighting crime on the lords of localities and the municipal authorities. Given the substantial fines imposed on them under it during the 18 years the rule was in force, in particular for failing to comply with the clauses concerning the investigation and punishment of homicides, it can be concluded that the crown found in it a powerful instrument for obliging the lords and the local oligarchies to collaborate more closely in the arduous task of ensuring the…
Near surface seismostratigraphic modelling of the Bandita plain in Palermo town (Italy) from integrated analysis of HVSR and stratigraphic data
2016
The Horizontal to Vertical Spectral Ratio (HVSR) noise method (Nakamura, 1989) is nowadays widely used to estimate the resonance frequencies of geological structures (Bonnefoy-Claudet, 2006). However, HVSR is often also used to obtain information on the depth of the seismic bedrock and on the thickness and seismic velocity of the process overburden deposits, using inversion techniques of the H/V curve (Fäh et al., 2003). This nevertheless produces results with large uncertainty intervals for the parameters, which must therefore be constrained by detailed stratigraphic information. An application of HVSR inversion is presented in order to verify the effectiveness of this technique for purposes of…
Pliegos poéticos de bandoleros en la Cataluña del barroco. Un ejemplo de literatura propagandística
2018
This study analyses the popular bandit literature of seventeenth-century Catalonia. Banditry was an important phenomenon in the Principality during the first decades of the Baroque, and its repercussions notably reached the chapbook literature (literatura de cordell) of the period. These chapbooks, which were meant to reach all audiences, have received few studies considering their importance. For that reason, the article analyses the two main historical moments that brought about the greatest boom in the publication of verse bandit chapbooks in Catalonia, namely the signing of the unions and brotherhoods against the bandits (1606) and the repressive system of…
An AI for dominion based on Monte-Carlo methods
2014
Master's thesis in Information and Communication Technology (IKT590), Universitetet i Agder, 2014. To the best of our knowledge there exists no Artificial Intelligence (AI) for Dominion which uses Monte Carlo methods that is competitive on a human level. This thesis presents such an AI, and tests it against some of the top Dominion strategies available. Although in a limited testing environment, the results show that our AI is capable of competing with human players, while keeping processing time per move at an acceptable level for human players. Although the approach for our AI is built on previous knowledge about Upper Confidence Bounds (UCB) and UCB applied to Trees (UCT), an approach for handling the st…
Perfiles básicos del bandolerismo morisco valenciano: del desarme a la expulsión (1563-1609)
2009
In these pages, the image of the problem of Valencian Morisco banditry handed down by Sebastián García Martínez is contrasted with an alternative view, based on two main sources: the account books of the Maestre Racional and the criminal conclusions of the Real Audiencia. Two fundamental aspects are revised: the geography of the phenomenon, based on the distinction between the outlaws' places of origin and the scenes where they perpetrated their crimes, and its evolution from 1563 to 1609, a period over which several phases can be distinguished, both from the perspective of criminal activity and from that of the repressive energy…
Alocucion de ... Pio ... VI tenida en el Consistorio secreto dia 13 de Noviembre 1775 de la preciosa muerte de Jacinto Castañeda español i Vicente …
1775
Copies circulate without a title page, beginning with an introductory letter from Fr. Francisco Ruiz, Provincial of the Province of Aragon, and including, on the verso of p. 11, a prayer by the said Provincial. Woodcut coat of arms of Pius VI on the two facing title pages. Signatures: [ ]8. Footnotes. Catchwords. Double title page, in Latin and Spanish, and bilingual text in two columns: Latin and Spanish.
Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters
2010
Published version of an article from Lecture Notes in Computer Science. Also available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-13033-5_21 The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Alt…
Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning
2015
The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one out of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received, when the probabilities of receiving rewards are unknown. Thus, the gambler must, as quickly as possible, identify the arm with the largest probability of producing rewards, compactly capturing the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only tol…
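Illustrative note: several of the abstracts above describe the basic multi-armed bandit setting, in which a gambler repeatedly pulls one of N arms with unknown reward probabilities. The sketch below shows one way Thompson sampling could be applied to that basic setting. It assumes stationary Bernoulli rewards that are observed directly and uniform Beta(1, 1) priors, so it is not the N-Door or non-stationary method of any listed paper. The function name thompson_sampling and the reward probabilities are hypothetical, chosen only for illustration.

```python
import random


def thompson_sampling(reward_probs, n_rounds, seed=0):
    """Play a simulated Bernoulli bandit and return the pull count per arm.

    reward_probs is the (hypothetical) true reward probability of each arm;
    the learner never reads it directly, only the simulated rewards.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    successes = [0] * n_arms  # rewards observed per arm
    failures = [0] * n_arms   # penalties observed per arm

    for _ in range(n_rounds):
        # Draw one plausible reward probability per arm from its
        # Beta(successes + 1, failures + 1) posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Pull the arm whose sampled probability is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated environment: reward with the arm's true probability.
        if rng.random() < reward_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Arms with few observations have wide posteriors and are still sampled occasionally (exploration), while arms with many observed rewards are pulled most often (exploitation), so the pull counts should concentrate on the arm with the highest reward probability.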