


{"id":69,"date":"2016-10-21T12:57:01","date_gmt":"2016-10-21T10:57:01","guid":{"rendered":"http:\/\/project.inria.fr\/badass\/?page_id=69"},"modified":"2020-11-28T10:19:30","modified_gmt":"2020-11-28T09:19:30","slug":"publication","status":"publish","type":"page","link":"https:\/\/project.inria.fr\/badass\/publication\/","title":{"rendered":"Publications"},"content":{"rendered":"<div>The following works have been supported by the French Agence Nationale de la Recherche (ANR), under grant ANR-16- CE40-0002 (project BADASS).<\/div>\n<div><\/div>\n<div>Please take a look at <a href=\"https:\/\/haltools.inria.fr\/Public\/afficheRequetePubli.php?typdoc=(%27ART%27,%27COMM%27,%27POSTER%27,%27THESE%27,%27HDR%27,%27OTHERREPORT%27)&amp;projet_anr=16-CE40-0002&amp;CB_auteur=oui&amp;CB_titre=oui&amp;CB_article=oui&amp;CB_resume=oui&amp;langue=Anglais&amp;tri_exp=annee_publi&amp;tri_exp2=typdoc&amp;tri_exp3=date_publi&amp;ordre_aff=TA&amp;Fen=Aff&amp;css=..\/css\/VisuRubriqueEncadre.css\">this link on HaL<\/a> for a complete list of publications.<\/div>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #999999;\">(WP3)<\/span> Monte-Carlo Tree Search by Best Arm Identification, <\/strong><span class=\"author\"><a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authIdHal_s\/emilie-kaufmann\" target=\"_blank\" rel=\"nofollow noopener\">E. Kaufmann<\/a>\u00a0<\/span> <span class=\"author\"> <a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authFullName_s\/Wouter+M.+Koolen\" target=\"_blank\" rel=\"nofollow noopener\">W. 
Koolen<\/a><\/span> in <span style=\"color: #999999;\"><i>NIPS 2017 &#8211; 31st Annual Conference on Neural Information Processing Systems<\/i>, Dec 2017, Long Beach, United States.<\/span> <a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-01535907v2\">HaL<\/a><br \/>\n<span style=\"color: #333399;\"><em>\u00abRecent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems. We study the game tree search problem, where the goal is to quickly identify the optimal move in a given game tree by sequentially sampling its stochastic payoffs. We develop new algorithms for trees of arbitrary depth, that operate by summarizing all deeper levels of the tree into confidence intervals at depth one, and applying a best arm identification procedure at the root. We prove new sample complexity guarantees with a refined dependence on the problem instance. We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees.\u00bb<\/em><\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #999999;\">(WP2)<\/span> Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings<\/strong>, <span class=\"author\"> <a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authIdHal_s\/remi-bonnefoi\" target=\"_blank\" rel=\"nofollow noopener\">R. Bonnefoi<\/a> <\/span> <span class=\"author\"> <a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authIdHal_s\/lilian-besson\" target=\"_blank\" rel=\"nofollow noopener\">L. Besson<\/a>\u00a0<\/span> <span class=\"author\"> <a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authIdHal_s\/christophe-moy\" target=\"_blank\" rel=\"nofollow noopener\">C. 
Moy<\/a>\u00a0<\/span> <span class=\"author\"> <a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authIdHal_s\/emilie-kaufmann\" target=\"_blank\" rel=\"nofollow noopener\">E. Kaufmann<\/a>\u00a0<\/span><span class=\"author\"> <a href=\"https:\/\/hal.archives-ouvertes.fr\/search\/index\/q\/*\/authFullName_s\/Jacques+Palicot\" target=\"_blank\" rel=\"nofollow noopener\">J. Palicot<\/a> <\/span> in <span style=\"color: #999999;\"><i>CROWNCOM 2017 &#8211; 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks<\/i>, Sep 2017, Lisbon, Portugal. <a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-01575419v1\">HaL<\/a><\/span><br \/>\n<span style=\"color: #333399;\"><em>\u00abThe deployment of future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that adaptive devices, in unlicensed bands, can use Multi-Armed Bandit (MAB) learning algorithms to improve their exploitation of the available resources. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, for decentralized Spectrum Sensing decision-making in IoT networks, as well as the learning performance with a growing number of intelligent devices. We show that using learning algorithms helps fit more devices into such networks, even when all end-devices are intelligent and change channel dynamically. In the studied scenario, stochastic (MAB) learning provides a gain of up to 16% in terms of successful transmission probability, and achieves near-optimal performance even in non-stationary, non-i.i.d. situations with a majority of intelligent devices.\u00bb<\/em><\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #999999;\">(WP2)<\/span> Efficient tracking of a growing number of experts<\/strong>, J. Mourtada, O.-A. Maillard in <span style=\"color: #999999;\">Algorithmic Learning Theory (ALT), 2017.<\/span><br \/>\n<span style=\"color: #333399;\"><em>\u00abWe consider a variation on the problem of prediction with expert advice, where new forecasters that were unknown until then may appear at each round. As often in prediction with expert advice, designing an algorithm that achieves near-optimal regret guarantees is straightforward, using aggregation of experts. However, when the comparison class is sufficiently rich, for instance when the best expert and the set of experts itself change over time, such strategies require maintaining a prohibitive number of weights (typically exponential with the time horizon). By contrast, designing strategies that both achieve a near-optimal regret and maintain a reasonable number of weights is highly non-trivial. We consider three increasingly challenging objectives (simple regret, shifting regret and sparse shifting regret) that extend existing notions defined for a fixed expert ensemble; in each case, we design strategies that achieve tight regret bounds, adaptive to the parameters of the comparison class, while being computationally inexpensive. Moreover, our algorithms are anytime, agnostic to the number of incoming experts and completely parameter-free. 
Such remarkable results are made possible thanks to two simple but highly effective recipes: first, the \u201cabstention trick\u201d, which comes from the specialist framework and makes it possible to handle the least challenging notions of regret, but is limited when addressing more sophisticated objectives; second, the \u201cmuting trick\u201d, which we introduce to give more flexibility. We show how to combine these two tricks in order to handle the most challenging class of comparison strategies.\u00bb<\/em><\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #999999;\">(WP1)<\/span> Boundary Crossing Probabilities for General Exponential Families<\/strong>, O.-A. Maillard in <span style=\"color: #999999;\">Algorithmic Learning Theory (ALT), 2017<\/span>. <a href=\"https:\/\/arxiv.org\/abs\/1705.08814\">ArXiv<\/a>.<br \/>\n<span style=\"color: #333399;\"><em>\u00abWe consider parametric exponential families of dimension K on the real line. We study a variant of boundary crossing probabilities coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension K. Formally, our result is a concentration inequality that bounds the probability that B<sup>\u03c8<\/sup>(\u03b8\u0302<sub>n<\/sub>, \u03b8<sup>\u22c6<\/sup>) \u2265 f(t\/n)\/n, where \u03b8<sup>\u22c6<\/sup> is the parameter of an unknown target distribution, \u03b8\u0302<sub>n<\/sub> is the empirical parameter estimate built from n observations, \u03c8 is the log-partition function of the exponential family and B<sup>\u03c8<\/sup> is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function f is logarithmic, as it enables the analysis of the regret of the state-of-the-art KL-UCB and KL-UCB+ strategies, whose analysis was left open in such generality. Indeed, previous results only hold for the case when K = 1, while we provide results for arbitrary finite dimension K, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques needed to achieve these strong results already existed three decades ago in the work of T.L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques that we believe are useful beyond the application to stochastic multi-armed bandits.\u00bb<\/em><\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"color: #999999;\">(WP3)<\/span> Spectral Learning from a Single Trajectory under Finite-State Policies<\/strong>, B. Balle, O.-A. 
Maillard in <span style=\"color: #999999;\">International Conference on Machine Learning (ICML), 2017<\/span>.<br \/>\n<span style=\"color: #333399;\"><em>\u00abWe present spectral methods of moments for learning sequential models from a single trajectory, in stark contrast with the classical literature that assumes the availability of multiple i.i.d. trajectories. Our approach leverages an efficient SVD-based learning algorithm for weighted automata and provides the first rigorous analysis for learning many important models using dependent data. We state and analyze the algorithm under three increasingly difficult scenarios: probabilistic automata, stochastic weighted automata, and reactive predictive state representations controlled by a finite-state policy. Our proofs include novel tools for studying mixing properties of stochastic weighted automata.\u00bb<\/em><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The following works have been supported by the French Agence Nationale de la Recherche (ANR), under grant ANR-16-CE40-0002 (project BADASS). Please take a look at this link on HaL for a complete list of publications. (WP3) Monte-Carlo Tree Search by Best Arm Identification, E. Kaufmann\u00a0 W. 
Koolen in NIPS\u2026<\/p>\n<p> <a class=\"continue-reading-link\" href=\"https:\/\/project.inria.fr\/badass\/publication\/\"><span>Continue reading<\/span><i class=\"crycon-right-dir\"><\/i><\/a> <\/p>\n","protected":false},"author":950,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-69","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/pages\/69","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/users\/950"}],"replies":[{"embeddable":true,"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/comments?post=69"}],"version-history":[{"count":15,"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/pages\/69\/revisions"}],"predecessor-version":[{"id":194,"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/pages\/69\/revisions\/194"}],"wp:attachment":[{"href":"https:\/\/project.inria.fr\/badass\/wp-json\/wp\/v2\/media?parent=69"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}