RESEARCH PRODUCT

Learning Automata Based Q-learning for Content Placement in Cooperative Caching

Yue Chen, Yuanwei Liu, Zhong Yang, Lei Jiao

subject

Signal Processing (eess.SP); Optimization problem; Learning automata; business.industry; Computer science; Mean opinion score; Q-learning; ComputingMilieux_LEGALASPECTSOFCOMPUTING; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; Action selection; Intelligent agent; Recurrent neural network; FOS: Electrical engineering electronic engineering information engineering; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Quality of experience; Artificial intelligence; Electrical and Electronic Engineering; Electrical Engineering and Systems Science - Signal Processing; business; VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550; computer

description

An optimization problem of content placement in cooperative caching is formulated, with the aim of maximizing the sum mean opinion score (MOS) of mobile users. First, a supervised feed-forward back-propagation connectionist model based neural network (SFBC-NN) is invoked for user mobility and content popularity prediction. In particular, practical data collected from a GPS-tracker app on smartphones is used to test the accuracy of mobility prediction. Then, a learning automata-based Q-learning (LAQL) algorithm for cooperative caching is proposed, in which learning automata (LA) are invoked for Q-learning to obtain an optimal action selection in a random and stationary environment. It is proven that the LA-based action selection scheme enables every state to select the optimal action with arbitrarily high probability, provided that Q-learning eventually converges to the optimal Q value. To characterize the performance of the proposed algorithms, the sum MOS of users is used to define the reward function. Extensive simulations reveal that: 1) the prediction error of SFBC-NN decreases as the number of iterations and nodes increases; 2) the proposed LAQL achieves a significant performance improvement over traditional Q-learning; 3) the cooperative caching scheme outperforms non-cooperative caching and random caching by 3% and 4%, respectively.
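The idea of pairing learning automata with Q-learning can be illustrated with a minimal tabular sketch. The description above does not state the LA update rule, the state and action spaces, or the MOS-based reward, so everything concrete here is an assumption made for illustration: a pursuit-style linear update of the action probabilities, a hypothetical three-state toy environment (`toy_step`) with random rewards and transitions, and the parameter names `alpha`, `gamma`, and `la_rate`. It is not the authors' implementation.

```python
import random


def toy_step(state, action, rng, n_states=3):
    """Hypothetical stationary random environment (not from the paper).

    The best action in state s is (s + 1) % 3; it pays reward 1 with
    probability 0.9, any other action with probability 0.1.
    """
    best_action = (state + 1) % 3
    p_reward = 0.9 if action == best_action else 0.1
    reward = 1.0 if rng.random() < p_reward else 0.0
    next_state = rng.randrange(n_states)  # uniform random transition
    return next_state, reward


def laql(step, n_states, n_actions, n_steps=30000,
         alpha=0.1, gamma=0.5, la_rate=0.001, seed=0):
    """Tabular Q-learning whose actions are drawn by one automaton per state."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Each state owns an action-probability vector, initially uniform.
    p = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    state = rng.randrange(n_states)
    for _ in range(n_steps):
        # The automaton, not an epsilon-greedy rule, selects the action.
        action = rng.choices(range(n_actions), weights=p[state])[0]
        next_state, reward = step(state, action, rng)
        # Standard Q-learning temporal-difference update.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        # Assumed pursuit-style LA update: shift probability mass toward
        # the action with the highest current Q estimate in this state.
        greedy = max(range(n_actions), key=lambda a: q[state][a])
        for a in range(n_actions):
            if a == greedy:
                p[state][a] += la_rate * (1.0 - p[state][a])
            else:
                p[state][a] *= 1.0 - la_rate
        state = next_state
    return q, p


if __name__ == "__main__":
    q_table, probs = laql(toy_step, n_states=3, n_actions=3)
    for s, row in enumerate(probs):
        print(s, [round(x, 3) for x in row])
```

In this sketch the update keeps each probability vector normalized, and a small `la_rate` leaves time for the Q estimates to settle before the automaton commits, which loosely mirrors the condition in the description that Q-learning must converge for the optimal action to be selected with high probability.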

http://arxiv.org/abs/1903.06235