Learning Automata Based Q-learning for Content Placement in Cooperative Caching

RESEARCH PRODUCT

Yue Chen, Yuanwei Liu, Zhong Yang, Lei Jiao

subject

Signal Processing (eess.SP); Optimization problem; Learning automata; Computer science; Mean opinion score; Q-learning; Networking & telecommunications; Engineering and technology; Action selection; Intelligent agent; Recurrent neural network; FOS: Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Quality of experience; Artificial intelligence; Electrical and Electronic Engineering; Electrical Engineering and Systems Science - Signal Processing; VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550

description

An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. First, a supervised feed-forward back-propagation connectionist model based neural network (SFBC-NN) is invoked to predict user mobility and content popularity. In particular, real-world data collected from a GPS-tracker app on smartphones are used to test the accuracy of the mobility prediction. Then, a learning automata-based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked within Q-learning to obtain an optimal action selection in a random and stationary environment. It is proven that the LA-based action selection scheme enables every state to select the optimal action with arbitrarily high probability, provided that Q-learning eventually converges to the optimal Q value. To characterize the performance of the proposed algorithms, the sum MOS of users is used to define the reward function. Extensive simulations reveal that: 1) the prediction error of SFBC-NN decreases as the number of iterations and nodes increases; 2) the proposed LAQL achieves a significant performance improvement over traditional Q-learning; 3) the cooperative caching scheme outperforms non-cooperative caching and random caching by 3% and 4%, respectively.
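
The idea of pairing Q-learning with a learning automaton for action selection can be illustrated with a minimal sketch. This is not the authors' LAQL algorithm: the linear reward-inaction update, the rule of reinforcing the currently greedy action, the toy reward table, and the names `la_update` and `laql` are all assumptions made here for illustration. Each state keeps a probability vector over actions, samples from it, applies the standard Q-learning update, and then shifts probability mass toward the action that is greedy under the current Q values.

```python
import random


def la_update(probs, chosen, rate):
    """Linear reward-inaction step (assumed scheme): move mass toward `chosen`."""
    return [p + rate * (1.0 - p) if a == chosen else p * (1.0 - rate)
            for a, p in enumerate(probs)]


def laql(rewards, next_state, episodes=500, alpha=0.1, gamma=0.9,
         rate=0.05, seed=0):
    """Toy Q-learning loop with LA-based action selection (hypothetical sketch).

    rewards[s][a]    -- deterministic reward for action a in state s
    next_state[s][a] -- deterministic successor state
    """
    rng = random.Random(seed)
    n_states, n_actions = len(rewards), len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Every state starts with a uniform action-probability vector.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(episodes):
        # Sample an action from the automaton's probability vector.
        action = rng.choices(range(n_actions), weights=probs[state])[0]
        reward = rewards[state][action]
        nxt = next_state[state][action]
        # Standard Q-learning temporal-difference update.
        q[state][action] += alpha * (
            reward + gamma * max(q[nxt]) - q[state][action])
        # LA step: reinforce the action that is greedy under the current Q.
        greedy = max(range(n_actions), key=lambda a: q[state][a])
        probs[state] = la_update(probs[state], greedy, rate)
        state = nxt
    return q, probs


if __name__ == "__main__":
    # Single-state toy problem: action 0 pays 1.0, action 1 pays 0.0.
    q, probs = laql([[1.0, 0.0]], [[0, 0]])
    print(probs[0])
```

In this toy setting the probability vector concentrates on the rewarding action. Note that this simple variant can lock onto an early greedy action before the Q values have settled, which is consistent with the abstract's guarantee being conditional on Q-learning converging to the optimal Q value.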

year

2019-03-14

http://arxiv.org/abs/1903.06235