Ph.D. proposal on “Transfer in multi-armed bandit and reinforcement learning”

Keywords: reinforcement learning, multi-armed bandit, transfer learning, exploration-exploitation, representation learning, hierarchical learning.

Research Topic

The main objective of this Ph.D. research project is to advance the state of the art in multi-armed bandits and reinforcement learning (RL) through the development of novel transfer learning algorithms.

Reinforcement learning (RL) formalizes the problem of learning an optimal behavior policy from experience collected directly from an unknown environment. Such a general model already provides powerful tools that can be used to learn from data in a very diverse range of applications (e.g., see successful applications of RL to computer games, energy management, logistics, and autonomous robotics). Nonetheless, the practical limitations of current algorithms have encouraged research into efficient ways of integrating expert prior knowledge into the learning process. Although this improves the performance of RL algorithms, it dramatically reduces their autonomy, since it requires constant supervision by a domain expert. A solution to this problem is provided by transfer learning, which is directly motivated by the observation that one of the key features that allows humans to accomplish complicated tasks is their ability to build general knowledge from past experience and transfer it when learning new tasks. We believe that bringing this transfer capability to existing machine learning algorithms will enable them to solve series of tasks in complex and unknown environments. The objective is to develop algorithms that not only learn from experience but also extract knowledge and transfer it across different tasks, thus obtaining a dramatic speed-up in the learning process and a significant improvement in overall performance. The general objective of this Ph.D. project is therefore to design RL algorithms able to incrementally discover, construct, and transfer “prior” knowledge in a fully automatic way.

Research Program

While the idea of transfer learning has been applied to a range of machine learning problems, its integration into RL is much more complicated. In fact, the number of possible scenarios and the types of knowledge that can be constructed and transferred are much larger than in simpler problems, such as supervised learning. During the Ph.D. we will thus investigate a variety of approaches to transfer in RL, ranging from transfer of samples to transfer of representations. In more detail, we will focus on three aspects of RL algorithms that could significantly benefit from transfer of knowledge:

(i) Exploration. What kind of knowledge transfer can provably improve the exploration-exploitation performance of an RL agent in terms of sample complexity and regret?
(ii) Representation. Which representation techniques are best suited to transfer in RL?
(iii) Hierarchical structures. Is it possible to prove the advantage of hierarchical structures (e.g., options) over flat structures in RL? Under which assumptions? How can we create such hierarchies automatically?

The previous questions will require theoretical, algorithmic and empirical study. The Ph.D. will cover different learning scenarios (e.g., multi-armed bandit, linear bandit, contextual bandit, full reinforcement learning) and different validation environments (e.g., fully synthetic, off-line evaluation from logged data, online simulation). As such, we expect the Ph.D. to produce a variety of results:

  • Theoretical study of the conditions under which transfer methods improve over standard RL algorithms without transfer, and of the type of improvement they bring.
  • Empirical validation of the proposed algorithms and comparison with existing transfer and no-transfer methods.
  • Investigation of the application of transfer in RL to real-world problems such as recommendation systems, trading, and computer games.


The applicant must have a Master of Science in Computer Science, Statistics, or related fields, possibly with a background in reinforcement learning, bandits, or optimization. Candidates with a very strong background in either mathematics or computer science will be considered. The working language in the lab is English; good written and oral communication skills are required.


The application should include a brief description of research interests and past experience, a CV, degrees and grades, a copy of the Master's thesis (or a draft thereof), a motivation letter (short but pertinent to this call), relevant publications, and other relevant documents. Candidates are encouraged to provide letter(s) of recommendation and contact information for reference persons. Please send your application as a single PDF. The deadline for the application is May 10, 2015. The final decision will be communicated in June/July 2015.

  • Application closing date: May 15, 2015
  • Interviews: May/June, 2015
  • Duration: 3 years (full-time position)
  • Starting date: October 15, 2015 (flexible)
  • Supervisor: Alessandro Lazaric
  • Place: SequeL, INRIA Lille – Nord Europe

Working environment

The Ph.D. candidate will work in the SequeL lab at Inria Lille – Nord Europe, located in Lille. Inria is France’s leading institution in Computer Science, with over 2800 scientists employed, of whom around 250 are in Lille. Lille is the capital of the north of France, a metropolis with 1 million inhabitants, with excellent train connections to Brussels (30 min), Paris (1h) and London (1h30). The research team SequeL (Sequential Learning) is composed of about 20 members working in machine learning, notably in reinforcement learning, multi-armed bandits, statistical learning, and sequence prediction. The Ph.D. program will be co-funded by the ANR ExTra-Learn project, which is entirely focused on the problem of transfer in RL.


  • Salary: 1957,54 € for the first two years and 2058,84 € for the third year
  • Salary after taxes: around 1597,11 € for the first two years and 1679,76 € for the third year (benefits included)
  • Possibility of French courses
  • Help for housing
  • Participation for public transport
  • Scientific Resident card and help for husband/wife visa


D. Calandriello, A. Lazaric, M. Restelli. “Sparse Multi-task Reinforcement Learning”. In Proceedings of the Twenty-Eighth Annual Conference on Neural Information Processing Systems (NIPS’14), 2014.
M. Gheshlaghi-Azar, A. Lazaric, E. Brunskill. “Resource-efficient Stochastic Optimization of a Locally Smooth Function under Correlated Bandit Feedback”. In Proceedings of the Thirty-First International Conference on Machine Learning (ICML’14), 2014.
M. Azar, A. Lazaric, E. Brunskill. “Sequential Transfer in Multi-armed Bandit with Finite Set of Models”. In Proceedings of the Twenty-Seventh Annual Conference on Neural Information Processing Systems (NIPS’13), pp. 2220-2228, 2013.
A. Lazaric and M. Restelli. “Transfer from Multiple MDPs”. In Proceedings of the Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS’11), 2011.
A. Lazaric. “Transfer in Reinforcement Learning: a Framework and a Survey”. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the Art, Springer, 2011.
M. E. Taylor and P. Stone. “Transfer Learning for Reinforcement Learning Domains: A Survey”. Journal of Machine Learning Research, 10(1): pp. 1633–1685, 2009.
R. S. Sutton and A. Barto. Reinforcement Learning: an Introduction. MIT Press, Cambridge, MA, 1998.
